Making sense of Big Data

News

Publication: flexible co-data learning for high-dimensional prediction

A paper by our team’s PhD student Mirrelijn van Nee, “flexible co-data learning for high-dimensional prediction”, has been accepted for publication in Statistics in Medicine. Well done! Written jointly with Lodewyk Wessels from the NKI and Mark van de Wiel, the paper explains how to learn flexibly from auxiliary information on the variables, or co-data, to improve prediction and covariate selection in high-dimensional data. In cancer genomics, for example, such co-data may be groups of genes corresponding to pathways, or p-values from previously published studies. The simplest format of co-data consists of a few non-overlapping groups, but co-data may also come in other formats, such as hierarchical groups, many overlapping groups, or continuous co-data. The method introduced, termed “ecpc”, handles these various types of co-data by including shrinkage on the group level. In addition, multiple co-data sources may be combined, unpenalised covariates may be included, and a model with a user-defined number of variables may be selected a posteriori. The method is available in the R package ecpc on CRAN.