DrugComb

DrugSyn

Dataset Description

This dataset contains the summarized results of drug combination screening studies on the NCI-60 cancer cell lines (excluding the MDA-N cell line). A total of 129 drugs are tested across 59 cell lines, resulting in 297,098 unique drug combination-cell line pairs. For each drug, we provide the canonical SMILES string queried from PubChem. For each cell line, we include the following features downloaded from NCI’s CellMiner interface: 25,723 gene features capturing transcript expression levels averaged from five microarray platforms, 627 microRNA expression features, and 3,171 proteomic features that capture the abundance levels of a subset of proteins.

The dataset includes two kinds of labels, five in total. The first, the combination sensitivity score (CSS), measures drug combination sensitivity and is derived from the relative IC50 values of the compounds and the area under their dose-response curves. The other four labels capture the synergy between the two drugs. Synergy is a dimensionless measure of how far an observed drug combination response deviates from the effect expected if the drugs did not interact. It is calculated under four different models: the Bliss model, Highest Single Agent (HSA), the Loewe additivity model, and Zero Interaction Potency (ZIP).
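To make "deviation from the expected effect of non-interaction" concrete, here is a minimal sketch of the two simplest reference models, Bliss independence and HSA, at a single dose pair. It is a simplification and not how the dataset's scores were produced: those come from full dose-response matrices. The function names are hypothetical, and inhibition values are assumed to be fractions between 0 and 1. Loewe and ZIP are left out because they need fitted dose-response curves.

```python
def bliss_expected(a: float, b: float) -> float:
    """Expected combined inhibition if the two drugs act independently."""
    return a + b - a * b


def hsa_expected(a: float, b: float) -> float:
    """Expected combined inhibition under Highest Single Agent: the better drug alone."""
    return max(a, b)


def synergy_excess(observed: float, a: float, b: float, model: str = "bliss") -> float:
    """Observed minus expected inhibition; positive values suggest synergy.

    `a` and `b` are the single-agent inhibitions (fractions in [0, 1]) and
    `observed` is the inhibition measured for the combination.
    """
    if model == "bliss":
        expected = bliss_expected(a, b)
    elif model == "hsa":
        expected = hsa_expected(a, b)
    else:
        raise ValueError(f"unsupported model: {model}")
    return observed - expected


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    print(synergy_excess(0.8, 0.5, 0.5, model="bliss"))
    print(synergy_excess(0.8, 0.5, 0.5, model="hsa"))
```

Because the two models set different baselines, the same observation can score differently under each, which is one reason the dataset reports all four synergy labels side by side.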

Task Description

Given the chemical structures of the two combined drugs and the genomic features of a particular cell line, predict the drug synergy level or the drug combination sensitivity in that cell line.

Dataset Statistics

129 drugs are tested across 59 cell lines, resulting in a total of 297,098 unique drug combination-cell line pairs.

Available Splits

Random Split
Cold Cell Line Split
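A cold cell line split keeps every cell line in exactly one partition, so the test set contains only cell lines the model has never seen. The sketch below illustrates the idea on plain Python records. It is not the library's implementation: the function name, the `Cell_Line_ID` key, and the default fractions are assumptions made for illustration.

```python
import random


def cold_cell_line_split(records, frac=(0.7, 0.1, 0.2), seed=42, key="Cell_Line_ID"):
    """Split records so that each cell line appears in only one partition.

    `records` is a list of dicts; `key` names the cell line field
    (a hypothetical column name, adjust to the actual data).
    """
    # Shuffle the unique cell lines, not the rows, so whole groups move together.
    cell_lines = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(cell_lines)

    n_train = int(frac[0] * len(cell_lines))
    n_valid = int(frac[1] * len(cell_lines))
    groups = {
        "train": set(cell_lines[:n_train]),
        "valid": set(cell_lines[n_train:n_train + n_valid]),
        "test": set(cell_lines[n_train + n_valid:]),
    }
    # Assign each row to the partition that owns its cell line.
    return {
        name: [r for r in records if r[key] in members]
        for name, members in groups.items()
    }


if __name__ == "__main__":
    # Toy records with made-up identifiers.
    toy = [
        {"Cell_Line_ID": f"CL{i}", "Drug1": "A", "Drug2": "B", "Y": 0.0}
        for i in range(10)
        for _ in range(3)
    ]
    split = cold_cell_line_split(toy)
    print({name: len(rows) for name, rows in split.items()})
```

A random split, by contrast, shuffles individual rows, so the same cell line usually appears in both training and test data and the reported performance tends to look better than it would on a new cell line.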

Usage Example

from tdc_ml.multi_pred import DrugSyn

data = DrugSyn(name='DrugComb')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)

License

This dataset is licensed under CC BY 4.0.