
Cellarity Releases Novel, Open-Source, Single-Cell Dataset

The dataset will be publicly available for a Kaggle competition presented at NeurIPS 2022 and hosted by Open Problems in Single-Cell Analysis, in collaboration with the Chan Zuckerberg Initiative, Chan Zuckerberg Biohub, Yale University, and Helmholtz Munich

Cellarity, a life sciences company founded by Flagship Pioneering to transform the way medicines are created, announced today the release of a unique single-cell dataset to accelerate innovation in mapping multimodal genetic information across cell states and over time. This dataset will be used to power a competition hosted by Open Problems in Single-Cell Analysis.

Cells are among the most complex and dynamic systems and are regulated by the interplay of DNA, RNA, and proteins. Recent technological advances have made it possible to measure these cellular features, and the resulting data provide, for the first time, a direct and comprehensive view spanning the layers of gene regulation that drive biological systems and give rise to disease.

“Advancements in single-cell technologies now make it possible to decode genetic regulation, and we are excited to generate another first-of-its-kind dataset to support Open Problems in Single-Cell Analysis,” said Fabrice Chouraqui, PharmD, CEO of Cellarity and a CEO-Partner at Flagship Pioneering. “Developing new machine learning algorithms that can predict how a single-cell genome can drive a diversity of cellular states will provide new insights into how cells and tissues move from health to disease and support informed design of new medicines.”

To drive innovation for such data, Cellarity generated a time course profiling in vitro differentiation of blood progenitors, a dataset designed in collaboration with scientists at Yale University, Chan Zuckerberg Biohub, and Helmholtz Munich. This time course will be used to power a competition to develop algorithms that learn the underlying relationships between DNA, RNA, and protein modalities across time. Solving this open problem will help elucidate complex regulatory processes that are the foundation for cell differentiation in health and disease.

“While multimodal single-cell data is increasingly available, methods to analyze these data are still scarce and often treat cells as static snapshots without modeling the underlying dynamics of cell state,” said Daniel Burkhardt, Ph.D., cofounder of Open Problems in Single-Cell Analysis and Machine Learning Scientist at Cellarity. “New machine learning algorithms are needed to learn the rules that govern complex cell regulatory processes so we can predict how cell state changes over time. We hope these new algorithms can augment the value of existing or future single-modality datasets, which can be cost-effectively generated at higher quality to streamline and accelerate research.”

In 2021, Cellarity partnered with Open Problems collaborators to develop the first benchmark competition for multimodal single-cell data integration using a first-of-its-kind multi-omics benchmarking dataset (NeurIPS 2021). This dataset was the largest atlas of the human bone marrow measured across DNA, RNA, and proteins and was used to predict one modality from another and learn representations of multiple modalities measured in the same cells. The 2021 competition saw winning submissions from both computational biologists with deep single-cell expertise and machine learning practitioners for whom this competition marked their first foray into biology. This translation of knowledge across disciplines is expected to drive more powerful algorithms to learn fundamental rules of biology.

For 2022, Cellarity and Open Problems are extending the challenge to drive innovation in modeling temporal single-cell data measured in multiple modalities at multiple time points. For this year’s competition, Cellarity generated a 300,000-cell time course dataset of CD34+ hematopoietic stem and progenitor cells (HSPCs) from four human donors at five time points. HSPCs are stem cells that give rise to all other cells in the blood throughout adult life, and a 10-day time course captures important biology in CD34+ HSPCs. Solving these prediction problems over time is expected to yield new insights into how gene regulation influences differentiation.

Entries to the competition will be accepted until November 15, 2022. For more information, visit the competition page on Kaggle.
