OncoPolyPharmacology

DrugSyn

Dataset Description

A large-scale oncology screen produced by Merck & Co., in which each sample consists of two compounds and a cell line. The dataset covers 583 distinct combinations, each tested against 39 human cancer cell lines derived from 7 different tissue types. Pairwise combinations were constructed from 38 diverse anticancer drugs (14 experimental and 24 approved). Synergy scores were calculated from Loewe additivity values using the batch processing mode of Combenefit. The genomic features come from the ArrayExpress database (accession number: E-MTAB-3610) and were quantile normalized and summarized with Factor Analysis for Robust Microarray Summarization (FARMS). The processed data are provided by DeepSynergy.

Task Description

Regression. Given the gene expression profile of a cell line and the SMILES strings of the two drugs in a combination, predict the drug synergy score.
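The task description does not name an evaluation metric. As a minimal sketch, assuming mean squared error and Pearson correlation are acceptable ways to score a synergy regressor, the two can be computed with the standard library alone. The helper names `mse` and `pearson` and the toy values are illustrative assumptions, not part of the dataset or its loader.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted synergy scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Toy values (made up for illustration, not taken from the dataset)
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]

print("MSE:", mse(measured, predicted))
print("Pearson r:", pearson(measured, predicted))
```

Reporting both is a design choice: MSE penalizes absolute error in the synergy score, while Pearson correlation only measures whether predictions rank and scale with the measurements.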

Dataset Statistics

23,052 drug combination-cell line data points, covering 39 cancer cell lines and 37 drugs.

Available Splits

Random Split, Cold Cell Line Split
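To illustrate what a cold cell line split means, the sketch below partitions toy records so that every cell line lands in exactly one of train, valid, or test. It is a standalone illustration under stated assumptions, not the library's own splitting code: the function `cold_cell_line_split`, the record keys (`drug1`, `drug2`, `cell_line`, `synergy`), and the 70/10/20 fractions are all hypothetical.

```python
import random

def cold_cell_line_split(records, key="cell_line", frac=(0.7, 0.1, 0.2), seed=42):
    """Split records so that no cell line appears in more than one subset."""
    # Shuffle the unique cell lines, not the individual records.
    cell_lines = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(cell_lines)

    n = len(cell_lines)
    n_train = round(frac[0] * n)
    n_valid = round(frac[1] * n)
    groups = {
        "train": set(cell_lines[:n_train]),
        "valid": set(cell_lines[n_train:n_train + n_valid]),
        "test": set(cell_lines[n_train + n_valid:]),
    }
    # Assign each record to the subset that owns its cell line.
    return {
        name: [r for r in records if r[key] in members]
        for name, members in groups.items()
    }

# Toy records (hypothetical field names and values)
toy = [
    {"drug1": f"D{i}", "drug2": f"D{i + 1}", "cell_line": f"CL{c}", "synergy": float(i - c)}
    for c in range(10)
    for i in range(3)
]

split = cold_cell_line_split(toy)
for name, rows in split.items():
    print(name, len(rows), sorted({r["cell_line"] for r in rows}))
```

Unlike a random split, this setup evaluates a model on cell lines it has never seen during training, which is usually the harder and more realistic setting.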

Usage Example

from tdc_ml.multi_pred import DrugSyn

data = DrugSyn(name='OncoPolyPharmacology')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)

License

This dataset is licensed under CC BY 4.0.