OncoPolyPharmacology
Dataset Description
A large-scale oncology screen produced by Merck & Co. [1], in which each sample consists of two compounds and a cell line. The dataset covers 583 distinct combinations, each tested against 39 human cancer cell lines derived from 7 different tissue types. Pairwise combinations were constructed from 38 diverse anticancer drugs (14 experimental and 24 approved). Synergy scores are calculated as Loewe Additivity values using the batch processing mode of Combenefit. The genomic features come from the ArrayExpress database (accession number: E-MTAB-3610) and were quantile normalized and summarized with Factor Analysis for Robust Microarray Summarization (FARMS). The processed data is provided by DeepSynergy [2].
Task Description
Regression. Given the gene expression profile of a cell line and the SMILES strings of the two drugs in a combination, predict the drug synergy score.
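Because the task is regression, predictions are usually scored against the measured synergy values with an error or correlation metric. The sketch below is a minimal illustration using mean squared error and Pearson correlation. The choice of metrics, the function names, and the toy numbers are all assumptions made for illustration; none of them come from the dataset or from the library.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted synergy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted synergy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Toy values for illustration only; these are NOT taken from the dataset.
measured = [10.0, -5.0, 0.0, 20.0]
predicted = [8.0, -3.0, 1.0, 18.0]

mse = mean_squared_error(measured, predicted)
r = pearson_r(measured, predicted)
print(f"MSE={mse:.2f}  Pearson r={r:.3f}")
```

In practice the two lists would hold the synergy labels of a held-out split and a model's predictions for the same rows.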
Dataset Statistics
23,052 drug combination-cell line data points, covering 39 cancer cell lines and 37 drugs
Available Splits
Usage Example
from tdc_ml.multi_pred import DrugSyn

data = DrugSyn(name='OncoPolyPharmacology')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)
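The usage example relies on whatever split the library returns by default. When the goal is to judge generalization to unseen drug combinations, one option is to hold out whole drug pairs so that no pair appears in both training and test data. The following is a minimal sketch of that idea on toy records. The function `split_by_drug_pair` and the keys `drug1`, `drug2`, `cell_line`, and `synergy` are hypothetical names chosen for illustration; they are not part of the `tdc_ml` API and may not match the dataset's actual column names.

```python
import random


def split_by_drug_pair(records, test_fraction=0.2, seed=0):
    """Hold out whole drug pairs so no test pair is seen during training."""
    # Sort the two drug names so (A, B) and (B, A) count as the same pair.
    pairs = sorted({tuple(sorted((r["drug1"], r["drug2"]))) for r in records})
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_test = max(1, round(len(pairs) * test_fraction))
    test_pairs = set(pairs[:n_test])

    train, test = [], []
    for r in records:
        key = tuple(sorted((r["drug1"], r["drug2"])))
        (test if key in test_pairs else train).append(r)
    return train, test


# Toy records with hypothetical keys; the values are NOT from the dataset.
records = [
    {"drug1": d1, "drug2": d2, "cell_line": c, "synergy": 0.0}
    for d1, d2 in [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("C", "D")]
    for c in ["cell_1", "cell_2"]
]

train, test = split_by_drug_pair(records, test_fraction=0.25)
print(len(train), "training records,", len(test), "test records")
```

Treating a pair as unordered matters here: if (A, B) and (B, A) were handled as different pairs, the same combination could leak from training into test.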
References
- [1] O’Neil, Jennifer, et al. “An unbiased oncology compound screen to identify novel combination strategies.” Molecular cancer therapeutics 15.6 (2016): 1155-1162.
- [2] Preuer, Kristina, et al. “DeepSynergy: predicting anti-cancer drug synergy with Deep Learning.” Bioinformatics 34.9 (2018): 1538-1546.
License
This dataset is licensed under CC BY 4.0.