Grant/Project completed
Research on genetic anomalies acquired during a lifetime as risk factors for cancer and other diseases
Grant ID: PT00762
Project leader: Dariusz Kedra
Gdański Uniwersytet Medyczny
Międzynarodowa Agenda Badawcza
Gdańsk
Start date: 2019-08-23
End date: 2022-03-30
Project summary
We are generating, and obtaining from our collaborators, DNA/RNA sequencing data from hundreds of patients. Since we sequence at least three tissues for each patient, we are already dealing with a large data set of ~15k sequencing files. To analyze them, we map the reads to the human genome and then perform mutation calling (DNA) or expression analysis (RNA). Our in-house server resources cannot process such data fast enough or in parallel.
The pipelines require the following programs: bbmap (Java), bwa (C), sambamba (Dlang), samtools (C), picard (Java), GATK (Java), Platypus (Python/Cython/C), GRIDSS (Java), MANTA (C++), STAR mapper (C++), GMAP/GSNAP (C).
The most CPU- and RAM-intensive steps of our pipelines (mapping the sequences and sorting the results) do not require inter-process communication, so they can be run independently. We will require at least 64 GB of RAM for mapping and mutation calling. bwa, sambamba and the other programs can use multiple threads (we have tested them with up to 40 threads at a time).
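The independence of the mapping and sorting steps can be illustrated with a minimal Python sketch. It is not the project's actual pipeline. It assumes paired-end FASTQ input, uses samtools for sorting (sambamba could be used instead), and all file names and function names are hypothetical. The commands are only built as strings here and are not executed.

```python
import shlex


def build_mapping_command(reference, fastq_r1, fastq_r2, out_bam,
                          threads=40, sort_threads=4):
    """Build one self-contained shell pipeline for a single sample.

    bwa mem maps the reads with `threads` worker threads and streams SAM
    to samtools sort, which writes a coordinate-sorted BAM file.
    The split between mapping and sorting threads is an assumption.
    """
    bwa = ["bwa", "mem", "-t", str(threads), reference, fastq_r1, fastq_r2]
    # "-@" sets the number of additional samtools threads; "-" reads stdin.
    sort = ["samtools", "sort", "-@", str(sort_threads), "-o", out_bam, "-"]
    return " | ".join(shlex.join(cmd) for cmd in (bwa, sort))


def jobs_per_node(node_ram_gb, node_cores,
                  ram_per_job_gb=64, threads_per_job=40):
    """Estimate how many independent jobs fit on one node at the same time.

    Defaults follow the figures in the summary (64 GB RAM, 40 threads);
    the limiting resource decides the result.
    """
    return min(node_ram_gb // ram_per_job_gb, node_cores // threads_per_job)


if __name__ == "__main__":
    # Hypothetical sample names; each command can be submitted as its own job
    # because no sample depends on the output of another.
    samples = ["P001_blood", "P001_tumor", "P001_skin"]
    for name in samples:
        print(build_mapping_command(
            "GRCh38.fa", f"{name}_R1.fastq.gz", f"{name}_R2.fastq.gz",
            f"{name}.sorted.bam"))
    print(jobs_per_node(node_ram_gb=256, node_cores=80))
```

Because each command is self-contained, the jobs can be handed to any batch scheduler one file at a time, and `jobs_per_node` gives a rough upper bound on how many of them a single node could run concurrently.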