Data Science – Data Sciences in Biomedicine

In recent years, data science has gained importance in many areas, particularly in medicine and the life sciences. From predicting disease outbreaks to personalizing treatment, data science is changing the way we understand, diagnose, and treat human health.


What is Data Science?

Data science involves collecting, analyzing, and interpreting large amounts of data to uncover patterns that lead to new insights. As an interdisciplinary field, it combines mathematics, statistics, and computer science to approach problems from several angles at once. In biomedicine, the data might include gene sequences, measurement data, and chemical compounds; in the clinic, they can also include health data such as blood values. In biomedical research, data science turns the massive and ever-growing volumes of information generated in laboratories into concrete findings, which in turn can be used to generate new hypotheses. In healthcare, data science makes it possible to use information from hospitals to improve patient care.

How Does Data Science Help?

Some of the most important areas where data science is used in the biomedical field include:

Disease Prediction and Prevention & Personalized Medicine

By analyzing patient data, data scientists can build models that predict who is at risk for certain diseases. For example, some algorithms examine patterns in health records to estimate the likelihood of heart disease, diabetes, or even cancer. The genome is another source of such predictions: by looking at combinations of particular gene variants in an individual's genome, one can assess that person's risk of developing a particular disease (1). This kind of predictive analysis enables early intervention, which can save lives and reduce healthcare costs.

Every human body is different, and what works for one patient may not work for another. Data science helps tailor treatments to individuals by analyzing genetic data, lifestyle factors, and medical history. This personalized approach is especially important in areas such as oncology, where some cancer treatments are more effective in patients with certain genetic markers. For example, combining genetic data with pharmacological data can help determine which medications are likely to offer the most promising therapy for which patients (2).
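The genome-based risk assessment mentioned above is commonly expressed as a polygenic risk score: a weighted sum over many gene variants, where each weight is the variant's estimated effect size and each count is how many copies of the risk allele (0, 1, or 2) a person carries (1). The sketch below shows only that arithmetic. The variant names, effect sizes, and genotype are hypothetical values chosen for illustration; real scores use thousands to millions of variants with weights from large genetic studies and are interpreted relative to a reference population.

```python
def polygenic_risk_score(effect_sizes: dict[str, float], genotype: dict[str, int]) -> float:
    """Weighted sum of risk-allele counts.

    effect_sizes: per-variant weight (e.g. a log odds ratio estimated in a study)
    genotype: number of copies of the risk allele (0, 1, or 2) for each variant
    Variants that are not present in the genotype are skipped.
    """
    score = 0.0
    for variant, beta in effect_sizes.items():
        dosage = genotype.get(variant)
        if dosage is None:
            continue  # variant not measured for this person
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count {dosage} for {variant}")
        score += beta * dosage
    return score


# Hypothetical example: three variants with made-up effect sizes.
effects = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}
person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
print(round(polygenic_risk_score(effects, person), 2))  # 0.19
```

A negative weight, as for the hypothetical `variant_B`, represents a variant associated with lower risk, so it pulls the score down.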


Medical Imaging, Diagnostics & Medical Infrastructure

Traditionally, radiologists examine medical images manually to detect abnormalities. Now, AI-powered tools can assist doctors by automatically detecting signs of disease in X-rays or MRI scans, sometimes with an accuracy that matches or even exceeds that of human experts. These tools can speed up diagnosis and reduce the risk of error (see also the article "Medicine and AI").

One example of data science applied directly in acute clinical care is the "ICU Cockpit" platform used at University Hospital Zurich (3). There, data from patients in the intensive care unit are processed in real time to give healthcare staff on site decision support and predictions.

Pharmaceutical Research and Development

Developing a new drug is a lengthy and costly process, often taking more than a decade and costing billions of dollars. Data science accelerates this process by identifying promising active compounds, predicting their effects, and helping to design better clinical trials. This allows new therapies to reach the market faster and more efficiently. For example, algorithms can screen databases of billions of chemical compounds and select the most suitable candidates for further laboratory investigation (4). Data science is also an important tool for handling and analyzing the large volumes of data generated in clinical trials.
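Selecting compounds from a large database, as described above, usually combines filtering with ranking. As a minimal sketch of one common early filter, the code below applies Lipinski's "rule of five", a widely used drug-likeness heuristic swapped in here for illustration (not necessarily what reference (4) describes), to precomputed molecular descriptors, and then ranks the remaining compounds by a score from a hypothetical activity model. The compound names, descriptor values, and the `predicted_activity` field are invented for the example; in practice, descriptors are computed with cheminformatics software.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str                  # hypothetical identifier
    mol_weight: float          # molecular weight in daltons
    log_p: float               # octanol-water partition coefficient
    h_donors: int              # hydrogen-bond donors
    h_acceptors: int           # hydrogen-bond acceptors
    predicted_activity: float  # hypothetical model score; higher = more promising


def passes_rule_of_five(c: Compound) -> bool:
    """Lipinski's rule of five, allowing at most one violation."""
    violations = sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])
    return violations <= 1


def shortlist(compounds: list[Compound], top_n: int) -> list[Compound]:
    """Keep drug-like compounds, then return the top_n by predicted activity."""
    candidates = [c for c in compounds if passes_rule_of_five(c)]
    candidates.sort(key=lambda c: c.predicted_activity, reverse=True)
    return candidates[:top_n]


library = [
    Compound("cpd-1", 350.0, 2.1, 1, 4, 0.82),
    Compound("cpd-2", 620.0, 6.3, 2, 8, 0.95),  # two violations: filtered out
    Compound("cpd-3", 410.0, 3.5, 3, 6, 0.67),
]
print([c.name for c in shortlist(library, 2)])  # ['cpd-1', 'cpd-3']
```

Filtering before ranking means a compound with a high predicted score, such as `cpd-2` here, is still excluded if it is unlikely to be suitable as an oral drug.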

Epidemiology

During events such as the COVID-19 pandemic, data science played a crucial role in tracking the spread of the virus, predicting case numbers, and guiding health policy decisions. By analyzing trends and modeling disease dynamics, data scientists help health authorities respond more effectively to new threats. One very important platform during the COVID-19 pandemic was Nextstrain (5), a database and analysis tool developed in part by Swiss research institutions. It collects and documents viral gene sequences from around the world, making it possible to track the virus’s evolution and spread (Nextstrain.org).
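Modeling disease dynamics, as mentioned above, is often introduced with the textbook SIR (susceptible, infected, recovered) compartmental model. The sketch below integrates it with simple fixed-step Euler updates. The transmission rate `beta`, recovery rate `gamma`, population size, and step count are illustrative assumptions, not COVID-19 estimates, and this is not how Nextstrain works: Nextstrain analyzes viral genome sequences rather than simulating case counts.

```python
def simulate_sir(population: float, initial_infected: float, beta: float,
                 gamma: float, days: int, steps_per_day: int = 10):
    """Euler integration of the SIR model.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns one (S, I, R) tuple per day, including day 0.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative run: basic reproduction number R0 = beta / gamma = 3.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"Infections peak around day {peak_day}")
```

The ratio `beta / gamma` decides the outcome: above 1 the outbreak grows before burning out, below 1 it shrinks from the start, which is why interventions that lower transmission can matter so much.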

The Swiss Institute of Bioinformatics maintains several projects in which data processing is used to improve health, such as cancer diagnostics or the monitoring of pathogenic organisms based on genome data:

https://www.sib.swiss/about/flagship-projects#spsp

The Future of Medicine – An Interplay Between Data Sciences and the Health System

The integration of data science into medicine is constantly evolving. As more health systems adopt digital technologies, the scope of available data continues to increase. At the same time, advances in artificial intelligence, computing power, and data storage will allow for even faster and more comprehensive data processing. Ultimately, the goal is a healthcare system where diagnoses are more accurate, treatments more personalized, and new discoveries are made more quickly.

Sources

1. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. May 18, 2020;12(1):44.

2. Sadee W, Wang D, Hartmann K, Toland AE. Pharmacogenomics: Driving Personalized Medicine. Pharmacol Rev. July 2023;75(4):789–814.

3. Boss JM, Narula G, Straessle C, Willms J, Azzati J, Brodbeck D, et al. ICU Cockpit: a platform for collecting multimodal waveform data, AI-based computational disease modeling and real-time decision support in the intensive care unit. J Am Med Inform Assoc. June 14, 2022;29(7):1286–91.

4. Edfeldt K, Edwards AM, Engkvist O, Günther J, Hartley M, Hulcoop DG, et al. A data science roadmap for open science organizations engaged in early-stage drug discovery. Nat Commun. July 5, 2024;15(1):5640.

5. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Kelso J, editor. Bioinformatics. December 1, 2018;34(23):4121–3.

(Image: Google DeepMind / Unsplash)