
Resources for Prof. Mihaela van der Schaar’s keynote at MICCAI 2023
Synthetic Data: Powerful creation not second rate copy
Slides
Resources
Here you will find the links to various resources and papers referenced in the keynote. The resources within each category are listed in order of mention.
van der Schaar Lab papers:
- Qian, Z., Cebere, B.-C., & van der Schaar, M. (2023). Synthcity: facilitating innovative use cases of synthetic data in different data modalities. Accepted NeurIPS 2023.
- B. van Breugel, T. Kyono, J. Berrevoets, M. van der Schaar. DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks.
- Liu, Tennison, Zhaozhi Qian, Jeroen Berrevoets, and Mihaela van der Schaar. GOGGLE: Generative modelling for tabular data by learning relational structure. In The Eleventh International Conference on Learning Representations. 2023.
- Yoon, J., Jordon, J., & van der Schaar, M. (2018, July). RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks. In International Conference on Machine Learning (pp. 5699-5707). PMLR.
- B. van Breugel, N. Seedat, F. Imrie, M. van der Schaar. Can you rely on your model evaluation? Improving model evaluation with synthetic test data. Accepted NeurIPS 2023.
- Chan, A. J., Bica, I., Huyuk, A., Jarrett, D., & van der Schaar, M. (2021). The Medkit-Learn(ing) environment: Medical decision modelling through simulation. NeurIPS 2021.
- J. Berrevoets, D. Jarrett, A. Chan, M. van der Schaar (2023). AllSim: Systematic Simulation and Benchmarking of Repeated Resource Allocation Policies in Multi-User Systems with Varying Resources. Accepted NeurIPS 2023.
- Jordon, J., Yoon, J., & van der Schaar, M. (2018). Measuring the quality of synthetic data for use in competitions. arXiv preprint arXiv:1806.11345.
- A. Alaa, B. van Breugel, E. Saveliev, M. van der Schaar. How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models. ICML 2022.
- Jordon, J., Jarrett, D., Saveliev, E., Yoon, J., Elbers, P., Thoral, P., … & van der Schaar, M. (2021, August). Hide-and-seek privacy challenge: Synthetic data generation vs. patient re-identification. In NeurIPS 2020 Competition and Demonstration Track (pp. 206-215). PMLR.
- van Breugel, B., Qian, Z., & van der Schaar, M. (2023). Synthetic data, real errors: how (not) to publish and use synthetic data. ICML 2023.
Tutorials:
- ICML 2021: Synthetic Healthcare Data Generation and Assessment [YouTube video]
- AAAI 2023: Synthetic Data Tutorial
Other papers referenced:
- Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Fei-Fei, L., 2009, June. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.
- Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F., & Mahmood, F. (2021). Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering, 5(6), 493-497.
Media articles and blog posts:
- van der Schaar lab: “The case for Reality-centric AI”
- The Economist: “From not working to neural networking”
- CNBC: “Hospital execs say they are getting flooded with requests for your health data”
- The HIPAA Journal: “May 2021 Healthcare Data Breach Report”
- Health IT Security: “Patient Data Privacy Lawsuit Against Google, UChicago Dismissed”
- van der Schaar lab: “Synthetic data: breaking the data logjam in machine learning for healthcare”
- MIT Technology Review: “Google’s medical AI was super accurate in a lab. Real life was a different story”
- van der Schaar lab: “Hide-and-seek privacy challenge”
Software:
Other resources: