Bioinformatics resources for the International Centre for Cancer Vaccine Science
Project leader: Javier Alfaro
University of Gdańsk
International Centre for Cancer Vaccine Science
The clinical relevance of immune cells in the control of human cancers is now well established. However, identifying tumour-specific antigens that allow the immune system to distinguish cancer cells from normal cells remains a challenge. To be immunogenic, somatic mutations must give rise to peptides that are processed and bind to at least one of the patient's major histocompatibility complex (MHC) class I or class II allelic products. Breakthroughs in genomics and proteomics have made it possible to discover both recurring and patient-specific neoantigens arising from tumour-specific mutations. However, only a small fraction of the somatic mutations in any patient yields an epitope, and only a small fraction of the population is expected to carry any given recurring mutation. Prioritizing which neoantigens to characterize is therefore essential to the success of cancer vaccine science, and it depends on the development of bioinformatics pipelines.

The International Centre for Cancer Vaccine Science (ICCVS), funded by the Foundation for Polish Science (FNP), is an innovative new partnership between the University of Gdańsk and the University of Edinburgh that addresses this major challenge in cancer medicine. The centre in Gdańsk is seeking dedicated computational resources for the development and application of bioinformatics tools and methods.
Project aim 1: Cancer neo-epitope discovery platform development.
The first stage of the project involves neo-epitope discovery using matched tumour and normal patient samples and cancer cell lines. The team will develop and apply a computational pipeline to identify mutated genes, mutant mRNA, RNA editing events, intron translation and chromosomal fusions from next-generation DNA and RNA sequencing data. Having identified these aberrations, the team will characterize immunopeptidomes by standard immunoaffinity purification followed by mass spectrometry.
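As a minimal illustration of one step in such a pipeline, the Python sketch below enumerates the candidate mutant peptides that a single missense mutation could give rise to. It assumes the mutation has already been called and mapped to a 0-based protein position, and it only considers peptides of 8 to 11 residues, the typical length range of MHC class I ligands; indels, frameshifts, RNA editing events and fusions would need separate handling. The function name `mutant_peptides` is hypothetical and is not part of any existing ICCVS tool.

```python
from __future__ import annotations


def mutant_peptides(protein: str, position: int, alt_aa: str,
                    lengths: range = range(8, 12)) -> list[tuple[str, str]]:
    """Return (wild-type, mutant) peptide pairs that contain the mutated residue.

    protein  -- wild-type protein sequence in one-letter amino-acid codes
    position -- 0-based index of the substituted residue
    alt_aa   -- the substituting amino acid
    lengths  -- peptide lengths to enumerate (8-11 residues, typical of MHC class I ligands)
    """
    if not 0 <= position < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutant = protein[:position] + alt_aa + protein[position + 1:]
    pairs = []
    for k in lengths:
        # A window starting at `start` covers the mutation if
        # start <= position <= start + k - 1, and it must fit inside the protein.
        first = max(0, position - k + 1)
        last = min(position, len(protein) - k)
        for start in range(first, last + 1):
            pairs.append((protein[start:start + k], mutant[start:start + k]))
    return pairs


# Example: KRAS G12D (residue 12, i.e. 0-based index 11) within the first 28 residues of KRAS.
kras_nterm = "MTEYKLVVVGAGGVGKSALTIQLIQNHF"
pairs = mutant_peptides(kras_nterm, 11, "D")
print(len(pairs), pairs[0])
```

In this example every window of length k that covers the mutation contributes one pair, so the four lengths give 8 + 9 + 10 + 11 = 38 candidates. Downstream steps such as MHC binding prediction and filtering against the normal proteome are not shown.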
Project aim 2: Predicting neoantigen presentation from genomics and transcriptomics.
The centre will generate a large dataset of immunopeptidomes derived by mass spectrometry, alongside matching genomic and transcriptomic datasets, and will also collect a large body of publicly available immunopeptidomic data. Using machine learning strategies, the team will develop a predictor of neoantigen presentation trained on both the in-house and the public data. The goal is to predict, from genomic and transcriptomic data, which cancer-specific peptides will later be detected by mass spectrometry as cell-surface antigens. The resulting model will be used to accelerate neoantigen discovery and reduce its cost.
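The modelling idea can be illustrated with a deliberately small sketch: a logistic-regression classifier, written in plain Python so that it has no dependencies, that scores candidate peptides by their probability of being detected by mass spectrometry. Logistic regression here is only a stand-in for whatever machine learning strategy the team ultimately adopts. The two features (source-transcript expression as log2(TPM + 1) and a predicted MHC binding score) and the six training examples are invented placeholders, not ICCVS data.

```python
import math

# Hypothetical feature vectors: [log2(TPM + 1) of the source transcript,
# predicted MHC binding score in 0..1]. Labels: 1 = detected by MS, 0 = not.
# These numbers are invented for illustration only.
X = [[1.0, 0.1], [2.0, 0.2], [0.5, 0.3],
     [6.0, 0.9], [7.5, 0.8], [5.0, 0.95]]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that a peptide with features x is detected (presented)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # p - y is the derivative of the log-loss with respect to the logit.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = train_logistic(X, y)
print(predict_proba(w, b, [6.75, 0.85]))  # highly expressed, strong binder
print(predict_proba(w, b, [0.75, 0.2]))   # weakly expressed, weak binder
```

A real predictor would use far richer features (peptide sequence, HLA allotype, proteasomal cleavage, transcript abundance) and held-out evaluation, but the framing is the same: genomic and transcriptomic evidence in, probability of mass-spectrometry detection out.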
This is a major computational project at the ICCVS. The team will regularly analyse cancer samples using genomics, transcriptomics and proteomics workflows. These pipelines will be used not only for the project described above but also for the other research collaborations at the ICCVS, so we will need to establish and benchmark a series of pipelines for the analysis of these samples. We aim to include the Triton cluster as part of our analysis workflow.

To carry out activities at the ICCVS, we will require 64 TB of disk space to house numerous genomics, transcriptomics and proteomics datasets. Of these 64 TB, at least 20 TB should be RAID-protected backup storage for raw data files.

Our team is growing and will soon consist of 7 bioinformaticians, who will run these pipelines in parallel. Each of them will need to be able to submit 400 processes in parallel while working on different parts of the project.

In addition to cluster compute resources, we need Windows capacity, because many proteomics software packages run only on Windows. We will require the ability to launch Windows virtual machines with an 8 TB virtual hard disk, 28 cores and 64 to 128 GB of RAM to run MaxQuant and PeptideShaker, among other proteomics packages. MaxQuant in particular is highly memory- and processor-intensive and requires these resources.
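The headline figures in this request can be cross-checked with simple arithmetic. The Python snippet below does so under two stated assumptions: that each submitted process occupies one scheduler slot at peak, and that the 20 TB RAID-protected share is part of the 64 TB total rather than in addition to it, as the text states. The variable names are illustrative only.

```python
"""Back-of-the-envelope check of the resources requested for the ICCVS team."""

BIOINFORMATICIANS = 7
PROCESSES_PER_PERSON = 400     # parallel processes each team member may submit
TOTAL_STORAGE_TB = 64
RAID_BACKUP_TB = 20            # minimum RAID-protected share for raw data files
VM_CORES = 28
VM_RAM_GB = (64, 128)          # requested RAM range for one Windows VM

# Worst case: every team member has a full batch running at the same time
# (assumes one scheduler slot per process).
peak_processes = BIOINFORMATICIANS * PROCESSES_PER_PERSON

# The RAID share is carved out of the 64 TB, so at most this much remains for other data.
other_storage_tb = TOTAL_STORAGE_TB - RAID_BACKUP_TB

# RAM available per core on one Windows VM, in GB, rounded to two decimals.
ram_per_core_gb = tuple(round(ram / VM_CORES, 2) for ram in VM_RAM_GB)

print(f"Peak concurrent processes: {peak_processes}")
print(f"Storage outside the RAID share: up to {other_storage_tb} TB")
print(f"RAM per VM core: {ram_per_core_gb[0]}-{ram_per_core_gb[1]} GB")
```

On these figures the cluster would need to accommodate up to 2,800 concurrent processes from the team, at most 44 TB would remain outside the RAID-protected share, and each Windows VM would offer roughly 2.3 to 4.6 GB of RAM per core.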