Denoising diffusion probabilistic models (DDPMs) have emerged as a powerful framework for a wide range of generative tasks, such as image and audio synthesis. DDPMs have also been used to generate mixed-type tabular data containing both continuous and discrete variables. However, current approaches to training DDPMs on mixed-type tabular data tend to inherit the imbalanced feature distributions present in the training dataset, which can result in biased sampling. In this work, we introduce FairDDPM, a diffusion model capable of generating data that is balanced with respect to a specified set of sensitive attributes. We demonstrate that FairDDPM effectively mitigates class imbalance present in the training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that FairDDPM outperforms existing tabular data synthesis methods in terms of both machine learning efficiency and fairness.
Mixed-type modeling and classifier-free guidance.
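The classifier-free guidance step can be sketched as follows. This is a minimal illustration, not FairDDPM's implementation: it assumes the standard formulation in which the denoiser is evaluated both with and without the condition (here, a sensitive attribute), and the two noise predictions are combined with a guidance scale. The arrays `eps_c` and `eps_u` below are hypothetical stand-ins for the outputs of a trained denoising network.

```python
import numpy as np

def cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: combine conditional and unconditional
    noise predictions.

    guidance_scale = 0 recovers the unconditional prediction,
    guidance_scale = 1 the conditional one, and values > 1
    extrapolate past the conditional prediction to strengthen
    the conditioning signal.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Hypothetical noise predictions for one sample (stand-ins for a
# denoiser evaluated with and without the sensitive-attribute label).
eps_c = np.array([0.5, -0.2])  # conditional prediction
eps_u = np.array([0.1, 0.1])   # unconditional prediction

# With guidance_scale = 2, the estimate moves beyond eps_c,
# away from eps_u: 0.1 + 2*(0.5 - 0.1) = 0.9, etc.
print(cfg_noise_estimate(eps_c, eps_u, 2.0))  # → [ 0.9 -0.5]
```

At sampling time, dropping the condition during training (e.g. replacing the attribute label with a null token for a fraction of batches) is what lets a single network supply both predictions; the guidance scale then trades sample diversity against adherence to the specified attribute.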