Datasets Overview

PyTDC provides curated datasets that span the drug discovery pipeline, from target discovery to clinical trial outcomes.

Single-Instance Prediction

Datasets for predicting properties of individual molecules, such as ADME, toxicity, and quantum-mechanical properties.

Multi-Instance Prediction

Datasets for predicting interactions between multiple entities, such as drug-target, drug-drug, and protein-protein interactions.

Generation Tasks

Datasets for generative modeling tasks, including molecule generation, retrosynthesis, and structure-based drug design.
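
As a rough sketch, the three task families above correspond to three subpackages of the `tdc` Python package. The loader classes and dataset names below (`ADME`/`Caco2_Wang`, `DTI`/`DAVIS`, `MolGen`/`ZINC`) are one example per family and should be checked against the installed PyTDC version. `EXAMPLES` and `load_example` are hypothetical helper names introduced only for this illustration.

```python
import importlib

# One example loader per task family: subpackage -> (loader class, dataset name).
# These pairs are assumptions to verify against your PyTDC version.
EXAMPLES = {
    "single_pred": ("ADME", "Caco2_Wang"),  # single-instance prediction
    "multi_pred": ("DTI", "DAVIS"),         # multi-instance prediction
    "generation": ("MolGen", "ZINC"),       # generation tasks
}


def load_example(family):
    """Instantiate the example loader for one task family.

    The tdc subpackage is imported lazily, so this module can be read and
    imported without PyTDC installed. Calling the function requires PyTDC
    and, on first use, network access to download the dataset.
    """
    loader_name, dataset_name = EXAMPLES[family]
    module = importlib.import_module(f"tdc.{family}")
    loader_cls = getattr(module, loader_name)
    return loader_cls(name=dataset_name)
```

With PyTDC installed, `load_example("single_pred").get_split()` should return the default train, validation, and test partitions for that dataset.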

Key Features

  • Curated and standardized datasets from authoritative sources
  • Multiple split strategies for robust model evaluation
  • Comprehensive metadata and documentation
  • Python API for loading datasets and their train, validation, and test splits
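
To illustrate why more than one split strategy matters, the following is a minimal pure-Python sketch contrasting a random split with a "cold" split, in which every record for a given entity lands in the same fold. It is not PyTDC's implementation. In PyTDC itself, splits are requested through a dataset's `get_split` method (for example with a `method` argument). The function names and the toy records below are made up for this illustration.

```python
import random


def random_split(records, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle the records and cut them into train/valid/test by fraction."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


def cold_split(records, key, frac=(0.7, 0.1, 0.2), seed=42):
    """Split by entity: the fractions apply to distinct values of `key`,
    so no entity appears in more than one fold."""
    rng = random.Random(seed)
    entities = sorted({r[key] for r in records})
    rng.shuffle(entities)
    n_train = round(frac[0] * len(entities))
    n_valid = round(frac[1] * len(entities))
    folds = {"train": [], "valid": [], "test": []}
    fold_of = {}
    for i, entity in enumerate(entities):
        if i < n_train:
            fold_of[entity] = "train"
        elif i < n_train + n_valid:
            fold_of[entity] = "valid"
        else:
            fold_of[entity] = "test"
    for r in records:
        folds[fold_of[r[key]]].append(r)
    return folds


# Toy placeholder records, not real data: 10 drugs, 7 targets, binary label.
records = [
    {"Drug": f"drug_{i % 10}", "Target": f"target_{i % 7}", "Y": i % 2}
    for i in range(70)
]

random_folds = random_split(records)
cold_folds = cold_split(records, key="Drug")

# Under the cold split, the test drugs never occur in the training fold.
train_drugs = {r["Drug"] for r in cold_folds["train"]}
test_drugs = {r["Drug"] for r in cold_folds["test"]}
print(sorted(train_drugs & test_drugs))
```

A random split typically lets the same drug appear in both training and test data, which can overstate how well a model generalizes to unseen compounds. The cold split removes that overlap by construction.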