Balanced mixed-type tabular data synthesis with diffusion models

Rice University
arXiv 2023

Abstract

Denoising diffusion probabilistic models (DDPMs) have emerged as a powerful framework for a wide range of generative tasks, such as image and audio synthesis. DDPMs have also been used to generate mixed-type tabular data with both continuous and discrete variables. However, current approaches to training DDPMs on mixed-type tabular data tend to inherit the imbalanced feature distributions present in the training dataset, which can result in biased sampling. In this work, we introduce FairDDPM, a diffusion model capable of generating data that is balanced with respect to a specified set of sensitive attributes. We demonstrate that FairDDPM effectively mitigates class imbalance in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that FairDDPM outperforms existing methods for synthesizing tabular data in terms of both machine learning efficiency and fairness.

Method

Mixed-type modeling and classifier-free guidance.
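As a rough illustration of the guidance idea named above, the sketch below shows the standard classifier-free guidance combination of conditional and unconditional noise estimates, together with one simple way to draw sensitive-attribute labels uniformly so that each group is requested equally often at sampling time. This is not the FairDDPM implementation: the function names, the `guidance_weight` parameter, and the uniform label schedule are assumptions made for illustration only, and the denoising network that would produce the two noise estimates is omitted.

```python
import numpy as np


def cfg_noise_estimate(eps_cond, eps_uncond, guidance_weight):
    """Combine noise estimates with the usual classifier-free guidance rule.

    eps_cond:   noise predicted with the condition (e.g. a sensitive attribute).
    eps_uncond: noise predicted with the condition dropped.
    guidance_weight = 0 recovers the purely conditional estimate.
    """
    return (1.0 + guidance_weight) * eps_cond - guidance_weight * eps_uncond


def sample_balanced_labels(num_samples, num_groups, rng):
    """Hypothetical helper: assign group labels so every group is (almost)
    equally represented, then shuffle the order."""
    labels = np.arange(num_samples) % num_groups
    rng.shuffle(labels)
    return labels


rng = np.random.default_rng(0)

# Request 9 synthetic rows spread evenly over 3 sensitive-attribute groups.
labels = sample_balanced_labels(num_samples=9, num_groups=3, rng=rng)
group_counts = np.bincount(labels, minlength=3)

# Toy noise estimates standing in for the outputs of a denoising network.
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.5, 1.0])
guided = cfg_noise_estimate(eps_cond, eps_uncond, guidance_weight=2.0)
```

In a full sampler, the guided estimate would replace the plain conditional estimate at every reverse diffusion step, with the balanced labels supplied as the condition.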

Results

Some figures and tables.

Team

Zeyu Yang

Rice University

Peikun Guo

Rice University

Khadija Zanna

Rice University

Akane Sano

Rice University