Basic data about the project
Project title: MOST DANYCH – DATA BRIDGE, Multidisciplinary Open Knowledge Transfer System – stage II: Open Research Data
Name of the Operational Program: Digital Poland for 2014-2020
Implementing institution: Centrum Projektów Polska Cyfrowa
Priority axis: 02. E-administration and open government
Action: 02.03. Digital availability and usability of public sector information
Sub-action: 02.03.01. Digitally sharing public sector information from administrative and science sources
Project no: POPC.02.03.01-00-0033/17
The project responds to the following needs:
- The need to create a dedicated repository of open research data (ORD). In addition to the available libraries of scientific articles, new repositories for various types of source data are also needed.
- The need to increase the reusability and processing of Big Data (open research data) using a supercomputer.
- The lack of adequate standards for describing open research data in the Pomeranian and Polish academic community, and of an appropriate competence center supporting their standardization and use.
- The need to provide a universal tool, available to the Polish scientific community, that supports the publishing process of open scientific journals.
The essence of the MOST DANYCH – DATA BRIDGE project is to design and build a platform for collecting, searching, analyzing and sharing open research data, and to populate it with unique data, such as human tissue imaging or sea state measurements, collected from the three most important universities in Pomerania: Gdańsk University of Technology (GUT), Medical University of Gdańsk (MUG) and University of Gdańsk (UG). These data will be made available free of charge to the scientific community, entrepreneurs and the public, with the possibility of processing them on the Triton supercomputer.
The main objective of this project is to increase the availability, improve the quality and expand the reusability of the scientific resources of the most important universities in Pomerania (GUT, UG, MUG). This will be done by creating a center for collecting and sharing research data on an open access platform and for analyzing those data, so that scientists can carry out new research scenarios.
The following specific objectives will be achieved under the project:
- Collecting and sharing open research data within the Data Bridge platform, based on an object-oriented data repository located in CI TASK. It is planned to collect 27,000 research results with a total volume of 142 TB, including 120 TB of human tissue imaging. Moreover, CI TASK will produce software components enabling the transfer, analysis and viewing of the collected data, e.g. through a virtual microscope.
- Enabling the reuse and processing of the large volumes of collected data (Big Data analysis) using the Triton supercomputer, located in CI TASK, as well as analysis applications implemented by CI TASK employees.
- Establishing the MOST KOMPETENCJI competence center at the Gdańsk University of Technology, whose task is to raise awareness of Open Access and ORD in the scientific community, including the use of metadata description standards. CI TASK employees are involved in this work.
- Collecting information on the policies of Polish scientific publishers towards Open Access, along with a service supporting the editing of open scientific journals. It is estimated that 5 journals will use this service. CI TASK publishes TASK Quarterly, a quarterly journal published in Open Access.
Project implementation stages:
- 31.12.2018 – Project initiation including: analysis, design, team organization and preparation of a public procurement plan.
- 31.12.2019 – Development of standards for research data description and of a prototype of the platform and data repositories.
- 31.03.2021 – Launch of the MOST DANYCH – DATA BRIDGE platform and supporting tools, including the construction of the main services provided by the platform.
- 30.09.2021 – Implementation of other functionalities of the MOST DANYCH – DATA BRIDGE platform and its improvement based on cooperation with real users and the data provided by them.
Scientific data analysis cycle
The diagram shows a typical scientific data analysis cycle in MOST DANYCH – DATA BRIDGE, using medical data as an example.
- Data can be acquired from various sources, e.g. tissue scanners, medical histories, medical tests and laboratory test results.
- The above data is collected in an object-oriented data store (software: CEPH), along with appropriate metadata and links.
- The requested analysis is performed on the Triton supercomputer (over 38,000 computing cores) using Big Data software such as Apache Spark and/or AI software such as TensorFlow (machine learning). Example operations include high-pass and low-pass filtering, brightness modification (sharpening), image smoothing, contour/border detection, and classification and counting of objects (e.g. cells).
- The results of the analysis are stored in the object-oriented data store and presented graphically on the internet portal, using modern information technologies.
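Two of the operations named in the analysis step, image smoothing (a low-pass filter) and counting of objects such as cells, can be illustrated on a toy image. The sketch below is only an illustration in plain Python: it is not the platform's Apache Spark or TensorFlow code, and the function names `box_blur` and `count_objects`, the 6x6 sample image and the threshold value are assumptions made for this example.

```python
from collections import deque


def box_blur(image):
    """Smooth an image with a 3x3 mean filter (a simple low-pass filter).

    Border pixels are averaged over the neighbours that exist.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def count_objects(image, threshold):
    """Count 4-connected regions of pixels brighter than `threshold`."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            # New object found: flood-fill it so it is counted only once.
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical 6x6 grayscale "tissue image" with two bright blobs.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 0, 0],
]

smoothed = box_blur(toy)
print(count_objects(toy, threshold=4))  # prints 2
```

On real tissue scans the same steps would typically be distributed across many cores and applied tile by tile; the toy version only shows the logic of each step.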