Research on genetic anomalies acquired during life as risk factors for cancer and other diseases

Project leader: Dariusz Kedra

Gdański Uniwersytet Medyczny

International Research Agenda


Opening date: 2019-08-23

Project summary

We are generating DNA/RNA sequencing data, and obtaining more from our collaborators, for hundreds of patients. Because we sequence at least three tissues per patient, we are already dealing with a large data set of ~15k sequencing files. To analyze them, we map the reads to the human genome and then perform mutation calling (DNA) or expression analysis (RNA). Our in-house servers cannot process this volume of data quickly enough or with sufficient parallelism.
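A minimal sketch of how the per-patient workflow described above could be expanded into independent jobs, assuming each sequencing file is tagged with a patient, a tissue and an assay type (DNA or RNA). The `SeqFile` record, the stage names, the tissue labels and the file names are hypothetical; they only illustrate the split between DNA (mapping, then mutation calling) and RNA (mapping, then expression analysis), plus a check for patients with fewer than three sequenced tissues.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqFile:
    """Hypothetical description of one sequencing file."""
    patient: str
    tissue: str
    assay: str  # "DNA" or "RNA"
    path: str


def stages_for(f: SeqFile) -> list[str]:
    """Both assays are mapped to the human genome first; DNA then goes
    to mutation calling, RNA to expression analysis."""
    if f.assay == "DNA":
        return ["map", "call_mutations"]
    if f.assay == "RNA":
        return ["map", "expression"]
    raise ValueError(f"unknown assay: {f.assay!r}")


def build_jobs(files: list[SeqFile]) -> list[tuple[str, list[str]]]:
    """One independent job per file, identified by its path."""
    return [(f.path, stages_for(f)) for f in files]


def under_sampled(files: list[SeqFile], min_tissues: int = 3) -> list[str]:
    """Patients with fewer than `min_tissues` distinct tissues, sorted."""
    tissues: dict[str, set[str]] = defaultdict(set)
    for f in files:
        tissues[f.patient].add(f.tissue)
    return sorted(p for p, ts in tissues.items() if len(ts) < min_tissues)


# Hypothetical example input.
files = [
    SeqFile("P001", "tissue1", "DNA", "P001_tissue1_dna.fq.gz"),
    SeqFile("P001", "tissue2", "DNA", "P001_tissue2_dna.fq.gz"),
    SeqFile("P001", "tissue3", "RNA", "P001_tissue3_rna.fq.gz"),
    SeqFile("P002", "tissue1", "DNA", "P002_tissue1_dna.fq.gz"),
]

for path, stages in build_jobs(files):
    print(path, "->", " -> ".join(stages))
print(under_sampled(files))  # ['P002']
```

Keeping one job per file is what makes the data set easy to distribute: each entry in the job list can be scheduled on its own.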
The pipelines require the following programs: bbmap (Java), bwa (C), sambamba (D), samtools (C), picard (Java), GATK (Java), Platypus (Python/Cython/C), GRIDSS (Java), MANTA (C++), the STAR mapper (C++) and GMAP/GSNAP (C).
The most CPU- and RAM-intensive steps of our pipelines, such as mapping the sequences and sorting the results, do not require inter-process communication, so they can be run independently. Mapping and mutation calling require at least 64 GB of RAM. bwa, sambamba and several of the other programs can use multiple threads (we have tested them with up to 40 threads at a time).
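Because these steps need no inter-process communication, the workload is embarrassingly parallel: each file can be processed as a separate job, and the number of jobs that fit on one node at once is bounded by whichever runs out first, RAM or cores. A minimal sketch, assuming 64 GB of RAM per job (as stated above) and a chosen thread count per job. The node sizes, file names and the `mapping_command` and `run_all` helpers are hypothetical. The command pipes `bwa mem -t` into `samtools sort -@ ... -o`; sambamba is not used in the pipe here, and real pipelines would add read-group and other options.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def jobs_per_node(node_ram_gb: int, node_cores: int,
                  job_ram_gb: int = 64, job_threads: int = 40) -> int:
    """Independent jobs that fit on one node simultaneously,
    limited by RAM or cores, whichever is exhausted first."""
    if job_ram_gb <= 0 or job_threads <= 0:
        raise ValueError("job requirements must be positive")
    return min(node_ram_gb // job_ram_gb, node_cores // job_threads)


def mapping_command(ref: str, r1: str, r2: str, out_bam: str, threads: int) -> str:
    """Hypothetical helper: paired-end bwa mem alignment piped into
    samtools sort, which reads the SAM stream from stdin ("-")."""
    q = shlex.quote
    return (f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} "
            f"| samtools sort -@ {threads} -o {q(out_bam)} -")


def run_all(commands: list, max_parallel: int) -> list:
    """Run independent shell pipelines, at most `max_parallel` at once.
    Threads suffice because the real work happens in child processes;
    bash's pipefail makes a failure in any stage (e.g. bwa) visible."""
    def run(cmd: str) -> int:
        return subprocess.run(["bash", "-o", "pipefail", "-c", cmd]).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


# Hypothetical node: 256 GB RAM, 128 cores -> min(256 // 64, 128 // 40) = 3
print(jobs_per_node(256, 128))  # 3
print(mapping_command("hg38.fa", "S1_R1.fq.gz", "S1_R2.fq.gz", "S1.sorted.bam", 40))
```

On a cluster these commands would more likely be submitted as separate scheduler jobs than run through a local pool; the capacity calculation applies either way. How to divide threads between bwa and the sort step is a tuning choice not covered here.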

