An actionable checklist with a data-centric lens across the ML pipeline.
Data-centric AI has emerged as an important concept for improving ML systems [1,2,3,4]. However, there is currently no standardised process for communicating the design of data-centric ML pipelines.
Furthermore, there is no guide to the considerations necessary for data-centric AI systems, which makes the agenda hard to engage with in practice.
DC-Check addresses this by providing an actionable checklist that covers all stages of the ML pipeline.

Who is DC-Check for?
DC-Check is aimed at both practitioners and researchers.
- Both: Each component of DC-Check includes a set of data-centric questions to guide developers in day-to-day development.
- Practitioners: We suggest concrete data-centric tools and modeling approaches based on these considerations.
- Researchers: We highlight open research opportunities needed to advance the field of data-centric AI.
DC-Check isn’t just a documentation tool
DC-Check goes beyond documentation: it supports practitioners and researchers in achieving greater transparency and accountability around data-centric considerations in ML pipelines. We believe this transparency and accountability can be useful to a range of stakeholders, including developers, researchers, policymakers, regulators, and organizational decision-makers, who need to understand the design considerations at each stage of the ML pipeline.
DC-Check covers the end-to-end ML pipeline
DC-Check is an actionable checklist that advocates for a data-centric lens encompassing the following stages of the ML pipeline:
Data: considerations to improve the quality of data used for model training.

Training: considerations based on understanding the data that affect model training.

Testing: considerations around data-centric approaches to test ML models.

Deployment: considerations related to the data post-deployment.
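To make the Data stage concrete, the sketch below shows what a simple data-centric check might look like in practice. This is a hypothetical illustration only, not part of DC-Check itself: it audits a dataset for a few issues a data-stage checklist item might flag, such as missing values, duplicate rows, and label imbalance, before any model training begins.

```python
# Hypothetical sketch (not DC-Check code): a minimal pre-training
# data audit covering a few data-stage considerations.
from collections import Counter


def audit_dataset(rows, labels):
    """Return simple data-quality signals for a tabular dataset."""
    n = len(rows)
    # Rows containing at least one missing value.
    rows_with_missing = sum(1 for r in rows if any(v is None for v in r))
    # Exact duplicate rows (same feature values repeated).
    duplicate_rows = n - len({tuple(r) for r in rows})
    # Share of the most common label, as a crude imbalance signal.
    label_counts = Counter(labels)
    majority_label_share = max(label_counts.values()) / len(labels)
    return {
        "n_rows": n,
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
        "majority_label_share": majority_label_share,
    }


# Example usage on a tiny toy dataset.
rows = [[1.0, 2.0], [1.0, 2.0], [3.0, None], [4.0, 5.0]]
labels = ["a", "a", "a", "b"]
report = audit_dataset(rows, labels)
```

In a real pipeline, each signal in the report would be compared against a threshold agreed on during the Data stage, and failures would block training until the data issues are addressed.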
