Datasets Overview
PyTDC provides a comprehensive collection of curated datasets spanning the entire drug discovery pipeline, from target discovery to clinical trial outcomes.
Single-Instance Prediction
Datasets for predicting properties of individual molecules, including ADME, toxicity, quantum mechanics, and more.
Multi-Instance Prediction
Datasets for predicting interactions between multiple entities, such as drug-target, drug-drug, and protein-protein interactions.
Generation Tasks
Datasets for generative modeling tasks, including molecule generation, retrosynthesis, and structure-based drug design.
Key Features
- Curated and standardized datasets from authoritative sources
- Multiple split strategies for robust model evaluation
- Comprehensive metadata and documentation
- Easy-to-use Python API for seamless integration
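
To make the point about split strategies concrete, here is a minimal sketch that contrasts a random split with a group-based split, where each group could stand in for a molecular scaffold. It uses only the standard library and is not PyTDC's own implementation: the function names `random_split` and `group_split`, the 70/10/20 fractions, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test by fraction."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(frac[0] * n)
    n_valid = round(frac[1] * n)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


def group_split(items, group_of, frac=(0.7, 0.1, 0.2)):
    """Assign whole groups to train, then valid, then test.

    No group is ever shared between two splits, so the test set only
    contains groups the model has not seen during training.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Largest groups first; ties are broken by group key for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))

    n = len(items)
    train_cap = round(frac[0] * n)
    valid_cap = round(frac[1] * n)
    split = {"train": [], "valid": [], "test": []}
    for _, members in ordered:
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["valid"]) + len(members) <= valid_cap:
            split["valid"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Hypothetical toy data: 20 records, each tagged with one of 10 group labels.
records = [(f"mol{i}", f"scaffold{i % 10}") for i in range(20)]

by_random = random_split(records)
by_group = group_split(records, group_of=lambda r: r[1])

for name, part in by_group.items():
    print(name, len(part), sorted({r[1] for r in part}))
```

A random split can place members of the same group in both the training and test sets, which tends to make evaluation look more favorable than it would on genuinely new chemistry. A group-based split avoids that overlap at the cost of less exact split sizes.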