van der Schaar Lab

Synthcity and using Synthetic Data

Although AI holds strong promise in numerous high-stakes domains, the lack of high-quality datasets creates a significant hurdle for the development of AI, leading to missed opportunities. Synthetic data has the potential to fuel the development of AI by unleashing the information in datasets that are small, sensitive or biased.

Our Synthcity project is an open-source initiative to create a software platform for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more.

Synthcity provides practitioners with a single access point to cutting-edge research across diverse problem settings. It offers the community a playground for rapid experimentation and prototyping, a one-stop shop for SOTA benchmarks, and an opportunity to extend research impact.

You can find all information in our white paper: https://arxiv.org/abs/2301.07573

The library can be accessed on GitHub: https://github.com/vanderschaarlab/synthcity

It is also available on PyPI, so it can be installed with pip: https://pypi.org/project/synthcity/

If you are interested in learning more about how to use Synthcity and the creative use of synthetic data, sign up to our next Inspiration Exchange session on 1 February at 4 pm GMT here.

To gain an even deeper knowledge of the theory, algorithms, best practices, as well as limitations of synthetic data generation, we are running a Synthetic Data Tutorial at AAAI on 8 February.

You can find all information about the session here.

Zhaozhi Qian

After obtaining an MSc in Machine Learning at UCL, Zhaozhi Qian started his career as a data scientist at the largest mobile gaming company in Europe. Three years later, he decided it might be more fulfilling to apply AI to cure cancer than to make gamers hit the purchase button 1% more often.

He thus joined the group in 2019 as a PhD student focusing on robust and interpretable learning for longitudinal data. So far, his work has included inferring latent disease interaction networks from Electronic Health Records, uncovering the causal structure between events that unfold over time, and calibrating the predictive uncertainty under domain shift.

Zhaozhi also worked as a contractor in the NHS during the COVID-19 pandemic, contributing his analytical skills to the national response.

Bogdan Cebere

Bogdan is one of the lab’s research engineers, having joined the team in 2021. He received his bachelor’s degree in computer science in 2012 and his master’s degree in distributed systems in 2014, both from the University of Bucharest.

Prior to joining the van der Schaar Lab, Bogdan worked for roughly 10 years at a cybersecurity company. During this time, he contributed to a range of research projects related to network security, cryptography, and data privacy, which required high-performance solutions in embedded or cloud environments.

Bogdan has also made substantial contributions to open-source projects, mostly focused on privacy-preserving techniques for machine learning. Some of his key contributions in this space have been for the OpenMined community; he and his collaborators published this work in workshops at the NeurIPS and ICLR conferences.

Bogdan is driven to keep learning new things every day, and to keep improving—that’s his main reason for joining the van der Schaar lab.

Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London.

Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.

Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine.

Andreas Bedorf