CYP3A4 Substrate, Carbon-Mangels et al.
ADME
Dataset Description
CYP3A4 is a major drug-metabolizing enzyme, found mainly in the liver and the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body. TDC uses the dataset from [1], which merges information on substrates and nonsubstrates from six publications.
Task Description
Binary Classification. Given a drug SMILES string, predict whether it is a substrate of the CYP3A4 enzyme.
Dataset Statistics
667 drugs.
Available Splits
- Random Split
- Scaffold Split
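To make the random split concrete, here is a minimal sketch in plain Python of how 667 compounds could be partitioned by index. The 70/10/20 train/valid/test fractions, the seed, and the helper name `random_split` are assumptions chosen for illustration; this is not TDC's implementation, and the exact set sizes TDC produces may differ by rounding.

```python
import random


def random_split(n_items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle indices 0..n_items-1 and cut them into train/valid/test.

    Hypothetical helper for illustration only; fractions and seed are assumed.
    """
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = int(frac[0] * n_items)
    n_valid = int(frac[1] * n_items)
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # remainder goes to the test set
    }


split = random_split(667)  # 667 drugs, per the dataset statistics
print({name: len(idx) for name, idx in split.items()})
```

A scaffold split differs in that molecules sharing the same core structure are kept together in one set, so the test set contains scaffolds the model has not seen. That requires a cheminformatics toolkit to compute scaffolds from SMILES and is not sketched here.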
Usage Example
from tdc_ml.single_pred import ADME

data = ADME(name='CYP3A4_Substrate_CarbonMangels')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)
References
- [1] Carbon‐Mangels, Miriam, and Michael C. Hutter. “Selecting relevant descriptors for classification by Bayesian estimates: a comparison with decision trees and support vector machines approaches for disparate data sets.” Molecular Informatics 30.10 (2011): 885-895.
- [2] Cheng, Feixiong, et al. “admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties.” Journal of Chemical Information and Modeling 52.11 (2012): 3099-3105.
License
This dataset is licensed under CC BY 4.0.