Making sense of Big Data

Statistics for Big Data

Welcome to the site of the Big Statistics section! Our aim is to link Big Data to clinical response by novel, problem-specific statistical methods. Here, 'Big' means: big in sample size (n) and/or big in number of variables (p), including high-dimensional ("omics") data.

As part of the Department of Epidemiology and Data Science of the Amsterdam UMC, our section is involved in consultancy, research and teaching. More information about who we are, our work and how to reach us can be found in these pages.

Big Statistics

Statistics & Machine Learning

Networks: Developing methods to learn molecular networks from omics data.

Statistical omics: Building dedicated statistical models to test associations of omics variables with clinical parameters.

Co-data learning: Improving prediction and variable selection by accounting for complementary data.

Machine learning & Big data: Application, development and interpretation of several machine learners (with a focus on tree-based learners) for a variety of big data applications.

Causal inference: Drawing conclusions on what causes what in complex learning problems applied to (big) data.

Record linkage: Linking clinical information with anonymized information on an aggregated level (e.g. postal code).

More on Statistics

Software & Support

Research Support

Big data analysis support is core business for our group. We supply tailored solutions for a variety of big data analysis questions in the Amsterdam UMC, covering study design, preprocessing and downstream analysis. We collaborate with researchers from a variety of disciplines, such as oncology, cardiology and neurology.

More on Consultancy


Software is the tool for disseminating our research. We have contributed >20 R packages (>30,000 downloads per year) to well-known public repositories like CRAN, Bioconductor and GitHub.

More on Software


We love data. The more, the better. Many of us have experience with omics data, which refers to the high-throughput quantification of a pool of molecules. Often, these data are high-dimensional, meaning they have more features than observations. Our group provides statistical support for the processing and analysis of a wide variety of omics data, such as genomic, metabolomic, and radiomic data. Our expertise ranges from next-generation sequencing platforms for genomics to various platforms for proteomics/metabolomics and imaging features (radiomics).

We also support the analysis of truly big-n data, with a focus on observational cohorts. We are, for example, involved in the exposome project for the analysis of large longitudinal data.

More on Data