Statistics for Big Data

Our research covers a number of topics, such as:

Our research generates methods to learn molecular networks from omics data. In particular, identifying networks, or parts of them, that differ between disease stages is a first step towards network medicine.

Statistical omics
Many types of omics data (genomics, proteomics, radiomics) require dedicated statistical models to test for associations with clinical parameters. Much of this data comes with a specific structure, which we usually aim to incorporate in our solutions.

Co-data learning
Omics data is big in two ways: a) the number of features and b) the number of sources of auxiliary data, known as co-data. We develop methods and machine learners that use co-data jointly with the main data, yielding better predictions and markers for several applications.

Machine learning & Big data
Application, development and interpretation of several machine learners (with a focus on tree-based learners) for a variety of big data applications. In addition, inference and meta-analysis for variable importance metrics such as Shapley values.
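
To make the notion of a Shapley value concrete, here is a minimal sketch that computes exact Shapley values by enumerating all feature coalitions. The toy model, the data point and the all-zero baseline are hypothetical choices made for illustration only; they are not taken from our own methods or software, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value` over features 0..n_features-1."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one interaction term (an assumption for illustration).
def model(z):
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


x = (1.0, 3.0, 2.0)         # hypothetical data point to explain
baseline = (0.0, 0.0, 0.0)  # assumed reference values for "absent" features


def coalition_value(subset):
    # Features outside the coalition are replaced by their baseline value.
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(z)


phi = shapley_values(coalition_value, 3)
print(phi)
```

In this sketch the values sum to the difference between the prediction at the data point and the prediction at the baseline, and the interaction term is split equally between the two features involved.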

Causal inference
How can we draw conclusions about what causes what in complex problems that apply advanced machine learners to very large data sets? We apply and develop methodology to answer this question.

Record linkage
Clinical data contains detailed information on, usually, a limited number of individuals. Such information can be enriched by linking these records with those from public repositories containing environmental information at postal code level. Our job is to create a reliable linkage based on statistical models.
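
As a generic illustration of statistical record linkage, the sketch below scores a pair of records with Fellegi-Sunter style match weights. It is not a description of our own linkage models: the field names, the example records and the agreement probabilities m and u are all hypothetical values chosen for the example.

```python
from math import log2

# Assumed (hypothetical) agreement probabilities per field:
# m = P(field agrees | records truly match), u = P(field agrees | records do not match).
FIELDS = {
    "postal_code": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "sex": (0.99, 0.50),
}


def match_weight(record_a, record_b):
    """Sum of log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if record_a[field] == record_b[field]:
            total += log2(m / u)              # agreement is evidence for a match
        else:
            total += log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total


# Hypothetical records, for illustration only.
clinical = {"postal_code": "1081HV", "birth_year": 1960, "sex": "F"}
candidate_same = {"postal_code": "1081HV", "birth_year": 1960, "sex": "F"}
candidate_other = {"postal_code": "3584CX", "birth_year": 1960, "sex": "M"}

weight_same = match_weight(clinical, candidate_same)
weight_other = match_weight(clinical, candidate_other)
print(weight_same, weight_other)
```

A pair with a high total weight is treated as a likely link and a pair with a low weight as a likely non-link. In practice the thresholds and the m and u probabilities would be estimated from the data rather than fixed by hand as they are here.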