PerturbOutcome

Counterfactual Prediction

Task Overview

We define a task for predicting the gene expression responses of single cells to chemical and genetic perturbations, with the aim of measuring model generalization across cell lines and perturbation types. Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. Furthermore, counterfactual prediction of drug-based perturbations at single-cell resolution could support the development of cell-type-specific drugs and treatments, facilitating precision medicine. The task is predictive rather than generative. It is formalized as a function that takes a cell, described by attributes such as cell line, disease, and tissue, and a perturbation, such as a drug or a CRISPR-based perturbation, and outputs the gene expression counts of the cell after that perturbation.
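The formalization above can be sketched as a typed function signature. This is only an illustration of the task's inputs and output, not the PyTDC API: the `Cell` and `Perturbation` classes, their field names, and the `identity_baseline` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Cell:
    """Hypothetical container for a cell and its attributes."""
    cell_line: str
    disease: str
    tissue: str
    expression: Sequence[float]  # unperturbed expression counts, one per gene


@dataclass(frozen=True)
class Perturbation:
    """Hypothetical container for a chemical or genetic perturbation."""
    kind: str    # e.g. "drug" or "crispr"
    target: str  # e.g. a drug name or the targeted gene


def identity_baseline(cell: Cell, perturbation: Perturbation) -> List[float]:
    """Trivial predictor with the task's signature.

    It assumes the perturbation has no effect and returns the
    unperturbed counts; a real model would replace this body.
    """
    return list(cell.expression)
```

Any model for this task would expose the same shape: one cell plus one perturbation in, one vector of per-gene counts out. A no-effect predictor like this one is also a common sanity-check baseline to compare learned models against.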

Impact

Machine learning has significantly advanced the ability to predict how single cells respond to chemical and genetic perturbations, a capability that is crucial for understanding cellular behavior and developing new therapeutic strategies. Machine learning models contribute by improving predictive accuracy, handling dose dependencies, managing complex perturbations, and optimizing experimental designs. These advances allow more efficient and accurate exploration of cellular responses, facilitating drug discovery and the development of personalized medicine.

Generalization

We measure model generalization across seen and unseen perturbations and across seen and unseen cell lines.
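One way to make the seen/unseen distinction concrete is to group evaluation records by whether their perturbation and cell line appeared in training. The sketch below is a minimal illustration under assumed names; the record keys (`perturbation`, `cell_line`) and the function `bucket_by_novelty` are hypothetical and do not describe how PyTDC builds its splits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def bucket_by_novelty(
    records: Iterable[dict],
    train_perturbations: Set[str],
    train_cell_lines: Set[str],
) -> Dict[Tuple[str, str], List[dict]]:
    """Group records into (perturbation, cell line) seen/unseen buckets."""
    buckets = defaultdict(list)
    for rec in records:
        pert = "seen" if rec["perturbation"] in train_perturbations else "unseen"
        line = "seen" if rec["cell_line"] in train_cell_lines else "unseen"
        buckets[(pert, line)].append(rec)
    return dict(buckets)


# Hypothetical evaluation records; names are placeholders, not real data.
eval_records = [
    {"perturbation": "drug_A", "cell_line": "line_1"},
    {"perturbation": "drug_B", "cell_line": "line_1"},
    {"perturbation": "drug_A", "cell_line": "line_2"},
    {"perturbation": "drug_B", "cell_line": "line_2"},
]
buckets = bucket_by_novelty(
    eval_records,
    train_perturbations={"drug_A"},
    train_cell_lines={"line_1"},
)
```

Reporting a metric separately for each of the four buckets shows whether a model's performance drops on unseen perturbations, unseen cell lines, or both.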

Product

Drug Repurposing, Predicting Adverse Drug Reactions, Biopharmaceuticals

Pipeline Stage

Target Discovery, Phenotypic Screening

Available Datasets

scperturb

Usage Example

You can access these datasets using the PyTDC library:

from tdc_ml.multi_pred import PerturbOutcome

# Load a dataset
data = PerturbOutcome(name='scperturb')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)