Statistics for Big Data

Our research covers a number of topics, such as:

Our research generates methods to learn molecular networks from omics data. In particular, identifying networks, or parts of them, that differ between disease stages is a first step towards network medicine.

Statistical omics
Many types of omics data (genomics, proteomics, radiomics) require dedicated statistical models to test for associations with clinical parameters. Much of this data comes with a specific structure, which we usually aim to incorporate in our solutions.

Co-data learning
Omics data is 2 x big data: a) the number of features and b) the sources of auxiliary data, known as co-data. We develop methods and machine learners that jointly use co-data and the main data, yielding better predictions and markers for several applications.

Big longitudinal data
Cohort data often consists of a large number of samples with an even larger number of variables for, usually, a limited number of time points. We model such high-dimensional longitudinal data and develop methods for inference in such settings.

Causal inference
How can we draw conclusions about what causes what in complex problems where advanced machine learners are applied to very large numbers of data points? We apply and develop methodology to answer this question.

Record linkage
Clinical data contains detailed clinical information on, usually, a limited number of individuals. Such information can be enriched by linking these records with those from public repositories containing environmental information at the postal code level. Our job is to create a reliable linkage based on statistical models.