Document - Boosting synthetic data generation with effective nonlinear causal discovery

2021

Conference article Open Access

Boosting synthetic data generation with effective nonlinear causal discovery

Cinquini M., Giannotti F., Guidotti R.

Data generation, Synthetic datasets, Pattern mining, Explainability, Causal discovery

Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, artificial intelligence explanation, etc. In all such contexts, it is important to generate plausible data samples. A common assumption of approaches widely used for data generation is the independence of the features. However, the variables of a dataset typically depend on one another, and these dependencies are not considered in data generation, leading to the creation of implausible records. The main problem is that dependencies among variables are typically unknown. In this paper, we design a synthetic dataset generator for tabular data that is able to discover nonlinear causalities among the variables and use them at generation time. State-of-the-art methods for nonlinear causal discovery are typically inefficient. We boost them by restricting the causal discovery to the features appearing in the frequent patterns efficiently retrieved by a pattern mining algorithm. To validate our proposal, we design a framework for generating synthetic datasets with known causalities. Wide experimentation on many synthetic datasets and real datasets with known causalities shows the effectiveness of the proposed method.
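The idea of restricting causal discovery to features that appear together in frequent patterns can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the features are already discretized, replaces a real pattern mining algorithm with a brute-force count of 2-itemsets, and stops before any causal test is run. The function name `candidate_feature_pairs` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_feature_pairs(rows, min_support=0.5):
    """Return feature pairs that co-occur in at least one frequent 2-itemset.

    rows: list of dicts mapping feature name -> discrete (already binned) value.
    min_support: minimum fraction of rows in which an itemset must appear.
    Hypothetical helper: a brute-force stand-in for a real pattern miner.
    """
    n = len(rows)
    counts = Counter()
    for row in rows:
        # Each item is a (feature, value) pair; sorting gives a canonical order.
        items = sorted(row.items())
        for a, b in combinations(items, 2):
            counts[(a, b)] += 1

    pairs = set()
    for ((f1, _), (f2, _)), count in counts.items():
        if count / n >= min_support:
            pairs.add((f1, f2))
    return pairs


# Toy records (invented for illustration only).
rows = [
    {"age": "old", "income": "high", "city": "A"},
    {"age": "old", "income": "high", "city": "B"},
    {"age": "young", "income": "low", "city": "A"},
    {"age": "old", "income": "high", "city": "C"},
]

# Only these pairs would be passed on to a (costly) nonlinear causal test.
pairs = candidate_feature_pairs(rows, min_support=0.5)
print(pairs)
```

In this toy case only one of the three possible feature pairs survives the support threshold, so a pairwise causal discovery step would run once instead of three times. The saving grows with the number of features.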

Source: CogMI 2021 - Third IEEE International Conference on Cognitive Machine Intelligence, pp. 54–63, Online conference, 13-15/12/2021

Cite as

BibTeX entry

@inproceedings{oai:it.cnr:prodotti:468813,
	title = {Boosting synthetic data generation with effective nonlinear causal discovery},
	author = {Cinquini, M. and Giannotti, F. and Guidotti, R.},
	doi = {10.1109/cogmi52975.2021.00016},
	booktitle = {CogMI 2021 - Third IEEE International Conference on Cognitive Machine Intelligence},
	pages = {54--63},
	note = {Online conference, 13-15/12/2021},
	year = {2021}
}

CNR authors and affiliations

CNR authors

Giannotti, Fosca
0000-0003-3099-3835

Laboratories

Knowledge Discovery and Data Mining (2002-ongoing)

DOI

10.1109/cogmi52975.2021.00016

Also available from

ieeexplore.ieee.org
