Making sense of Big Data

Statistics for Big Data

Welcome to the site of the Big Statistics section! Our aim is to link Big Data to clinical response by novel, problem-specific statistical methods. Here, 'Big' means: big in sample size (n) and/or big in number of variables (p), including high-dimensional ("omics") data. 

As part of the Department of Epidemiology and Data Science of the Amsterdam UMC, our section is involved in consultancy, research and teaching. More information about who we are, our work and how to reach us can be found in these pages.

Big Statistics


Networks: Developing methods to learn molecular networks from omics data. 

Statistical omics: Building dedicated statistical models to test associations of omics variables with clinical parameters. 

Co-data learning: Improving prediction and variable selection by accounting for complementary data.

Big longitudinal data: Modelling high-dimensional longitudinal data.

Causal inference: Drawing conclusions on what causes what in complex learning problems applied to (big) data.

Record linkage: Linking clinical information with anonymized information at an aggregated level (e.g. postal code).



Software & Support

Research Support
Big data analysis support is core business for our group. We supply tailored solutions for a variety of big data analysis questions in the VUmc, covering study design, preprocessing and downstream analysis. We collaborate with researchers from a variety of disciplines, such as oncology, cardiology and neurology. 


Software is the tool for disseminating our research. We have contributed >20 R packages (>30,000 downloads per year) to well-known public repositories like CRAN, Bioconductor and GitHub.



We love data. The more, the better. Many of us have experience with omics data, which refers to the high-throughput quantification of some pool of molecules. Often, these data are high-dimensional, meaning they have more features than observations. Our group provides statistical support for the processing and analysis of a wide variety of omics data, such as genomic, metabolomic, and radiomic data. Our expertise ranges from next-generation sequencing platforms for genomics to various platforms for proteomics/metabolomics and imaging features (radiomics).

We also support analysis of truly big data, with a focus on observational cohorts. We are, for example, involved in the exposome project for the analysis of large longitudinal data. 


Latest News

New Name


Due to the merger, our department has a new name: Epidemiology & Data Science. This website represents the new Big Statistics section, which hosts statisticians from both locations, VUmc and AMC, working on big data.


Award for Mirrelijn!


Mirrelijn van Nee won the award for Best Student Oral Presentation at IBC 2020!


PLRS performs well


In a recent paper, our software for DNA copy number - gene expression integration, PLRS, developed by Gwenael Leday, was compared to several other tools. It was rated as "best performing"!
