MHC Class I, IEDB-IMGT, Nielsen et al.
PeptideMHC
Dataset Description
Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T cells. This dataset was curated for NetMHCpan and compiles MHC class I binding data from the IEDB and IMGT/HLA databases.
Task Description
Regression. Given the amino acid sequence of a peptide and the pseudo amino acid sequence of an MHC molecule, predict their binding affinity.
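Most models for this task first turn the two sequences into one numeric input. The sketch below shows one simple option, a one-hot encoding of the peptide (zero-padded to a fixed length) concatenated with a one-hot encoding of the MHC pseudo sequence. The function names, the maximum peptide length of 15 and the pseudo-sequence length of 34 are illustrative assumptions, not part of the dataset specification; check the actual lengths in the data before relying on them.

```python
# Hypothetical featurizer (not part of the dataset or library API):
# one-hot encode a peptide and an MHC pseudo sequence, then concatenate.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str, length: int) -> list[float]:
    """Return a flat one-hot vector of size length * 20.

    Positions beyond len(seq) stay all-zero (padding); residues outside
    the 20 standard amino acids (e.g. 'X') are also left as zeros.
    """
    if len(seq) > length:
        raise ValueError(f"sequence longer than {length}: {seq}")
    vec = [0.0] * (length * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec


def featurize(peptide: str, mhc_pseudo: str,
              max_peptide_len: int = 15, pseudo_len: int = 34) -> list[float]:
    """Concatenate peptide and pseudo-sequence encodings into one input vector.

    max_peptide_len and pseudo_len are assumed values for illustration.
    """
    return one_hot(peptide, max_peptide_len) + one_hot(mhc_pseudo, pseudo_len)


if __name__ == "__main__":
    x = featurize("SIINFEKL", "Y" * 34)
    print(len(x), sum(x))
```

The resulting vectors can be passed to any standard regressor together with the affinity labels; richer encodings (for example substitution-matrix profiles) follow the same pattern of encoding each sequence and concatenating.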
Dataset Statistics
185,985 peptide-MHC pairs, 43,018 unique peptides, and 150 MHC class I molecules
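Counts like these can be re-derived from the loaded table. The helper below is a minimal, hypothetical sketch that takes two parallel sequences of peptide and MHC values and counts pairs and distinct entries. The column names in the usage line are a guess; inspect `df.columns` first. Also note that counting distinct pseudo sequences may give a smaller number than counting MHC names if two alleles share a pseudo sequence.

```python
from collections.abc import Iterable


def summarize(peptides: Iterable[str], mhcs: Iterable[str]) -> dict[str, int]:
    """Count peptide-MHC pairs plus the distinct peptides and MHC entries.

    The two inputs are treated as parallel columns of the same table.
    """
    pairs = list(zip(peptides, mhcs))
    return {
        "pairs": len(pairs),
        "peptides": len({p for p, _ in pairs}),
        "mhcs": len({m for _, m in pairs}),
    }


if __name__ == "__main__":
    # Toy data: 3 pairs, 2 distinct peptides, 2 distinct MHC entries.
    print(summarize(["AAA", "AAA", "CCC"], ["m1", "m2", "m1"]))
    # → {'pairs': 3, 'peptides': 2, 'mhcs': 2}
```

On the real data this would be called as, for example, `summarize(df["Peptide"], df["MHC"])`, where both column names are assumptions.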
Available Splits
Random Split
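A random split shuffles the peptide-MHC pairs and cuts them into train, validation and test sets. The sketch below reproduces the idea in plain Python; the 70/10/20 fractions and seed 42 mirror what we understand to be the defaults of `get_split`, but confirm them against the library documentation. Because the split is made at the pair level, the same peptide or MHC molecule can appear in both training and test sets, which may make test scores optimistic for unseen peptides or alleles.

```python
import random


def random_split(records, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle records reproducibly and cut them into train/valid/test.

    frac and seed are assumed defaults for illustration only.
    """
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(records)
    random.Random(seed).shuffle(items)  # local RNG: no global state touched
    n_train = int(frac[0] * len(items))
    n_valid = int(frac[1] * len(items))
    return {
        "train": items[:n_train],
        "valid": items[n_train:n_train + n_valid],
        "test": items[n_train + n_valid:],  # remainder goes to test
    }


if __name__ == "__main__":
    toy = [(f"pep{i}", f"mhc{i % 3}") for i in range(10)]
    parts = random_split(toy)
    print({name: len(rows) for name, rows in parts.items()})
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without changing the global random state used elsewhere in a training script.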
Usage Example
```python
from tdc_ml.multi_pred import PeptideMHC

data = PeptideMHC(name='MHC1_IEDB-IMGT_Nielsen')

# Access the data
df = data.get_data()
print(df.head())

# Get train/val/test splits
split = data.get_split()
print(split)
```
References
- [1] Nielsen, Morten, and Massimo Andreatta. “NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets.” Genome Medicine 8.1 (2016): 1-9.
- [2] Vita, Randi, et al. “The immune epitope database (IEDB): 2018 update.” Nucleic Acids Research 47.D1 (2019): D339-D343.
- [3] Zeng, Haoyang, and David K. Gifford. “Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design.” Cell Systems 9.2 (2019): 159-166.
License
This dataset is licensed under CC BY 4.0.