IEDB, Jespersen et al.
Epitope
Dataset Description
Epitope prediction aims to identify the active (binding) region of an antigen. This dataset comes from BepiPred-2.0 [2], which curated it from IEDB [1]. It contains B-cell epitope and non-epitope amino acids determined from crystal structures.
Task Description
Token-level classification. Given an amino acid sequence, predict which amino acid tokens are active in binding. That is, X is an amino acid sequence and Y is a list of indices of the active positions in X.
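Because Y is a list of indices rather than a per-residue label vector, token-level models usually need it expanded into one label per amino acid. The following is a minimal sketch of that conversion. It assumes the indices are 0-based, which the description above does not state, and the helper name indices_to_labels, the toy sequence, and the example positions are all hypothetical.

```python
from typing import List


def indices_to_labels(sequence: str, active_positions: List[int]) -> List[int]:
    """Return one 0/1 label per residue; 1 marks an active (epitope) position.

    Assumes 0-based indices; check the loaded data before relying on this.
    """
    labels = [0] * len(sequence)
    for pos in active_positions:
        if not 0 <= pos < len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        labels[pos] = 1
    return labels


sequence = "MKTAYIAK"          # toy sequence, not taken from the dataset
active_positions = [1, 2, 5]   # hypothetical active positions
labels = indices_to_labels(sequence, active_positions)
print(list(zip(sequence, labels)))
```

The resulting list has the same length as the sequence, so it can be aligned with per-token model outputs when computing a classification loss.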
Dataset Statistics
3,159 antigens.
Available Splits
Random Split
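As an illustration of what a random split does, the sketch below shuffles antigen records with a fixed seed and cuts them into train, validation, and test partitions. The 0.7/0.1/0.2 fractions, the seed, and the function name random_split are assumptions made for this example and are not taken from this page; the library's own split logic may differ.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items reproducibly and cut them into three partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder goes to test
    }


# Use integer IDs as stand-ins for the 3,159 antigens.
split = random_split(range(3159))
print({name: len(part) for name, part in split.items()})
```

Splitting at the antigen level keeps every residue of a given sequence in the same partition.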
Usage Example
from tdc_ml.single_pred import Epitope

data = Epitope(name='IEDB_Jespersen')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)
References
- [1] Vita, Randi, et al. “The immune epitope database (IEDB): 2018 update.” Nucleic acids research 47.D1 (2019): D339-D343.
- [2] Jespersen, Martin Closter, et al. “BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.” Nucleic acids research 45.W1 (2017): W24-W29.
License
This dataset is licensed under CC BY 4.0.