Although AI holds strong promise in numerous high-stakes domains, the lack of high-quality datasets is a significant hurdle to its development and leads to missed opportunities. Synthetic data has the potential to fuel the development of AI by unlocking the information in datasets that are small, sensitive, or biased.
Our Synthcity project is an open-source initiative to create a software platform for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more.
Synthcity provides practitioners with a single access point to cutting-edge research across diverse problem settings. It offers the community a playground for rapid experimentation and prototyping, a one-stop shop for state-of-the-art (SOTA) benchmarks, and an opportunity to extend research impact.
You can find all information in our white paper: https://arxiv.org/abs/2301.07573
The library can be accessed on GitHub: https://github.com/vanderschaarlab/synthcity
It is also available on PyPI and can be installed with pip: https://pypi.org/project/synthcity/
If you are interested in learning more about how to use Synthcity and the creative use of synthetic data, sign up to our next Inspiration Exchange session on 1 February at 4 pm GMT here.
For those who want a deeper understanding of the theory, algorithms, best practices, and limitations of synthetic data generation, we are running a Synthetic Data Tutorial at AAAI on 8 February.
You can find all information about the session here.