My primary interest is in modeling data stemming from complex phenomena that have been characterized in many dimensions. Preferably, these data have been acquired in studies with intricate experimental designs. Examples of such studies can be found in molecular biology, where many cellular traits are often interrogated simultaneously using the latest omics techniques. When the study’s biological research question is of a systems (rather than reductionistic) nature, e.g. unravelling the regulatory network of a pathway instead of the change in expression of an isolated gene, novel methodology not found in traditional statistical textbooks is required. This novel methodology comprises, amongst others: 1) the formulation of multivariate statistical models, e.g. graphical models, describing (say) cellular processes; 2) the learning of model parameters from the (high-dimensional) data; and 3) understanding the models' limitations in their capacity to explain observed data, e.g. from dysregulated cellular processes. These three aspects are iterated to improve upon the employed models. The application of the resulting statistical methodology is not limited to molecular biology. Other scientific disciplines have seen the arrival of high-throughput techniques. For instance, in movement sciences accelerometers, which measure an individual’s 3-dimensional acceleration in ever smaller epochs over long stretches of time, are now commonplace. There too such novel methodology applies, although, as with accelerometer data, modifications are usually required.
The interplay of high-throughput data and the novel statistical methodology they require is an exciting, challenging and dynamic scientific field to work in. Moreover, with one foot in the dept. of Epidemiology and Data Science of the Amsterdam UMC and the other in the dept. of Mathematics of the Vrije Universiteit Amsterdam, I am well-positioned to contribute to this field. The former brings me into close contact with the context researchers, who provide both the data and the questions that need to be answered with these data. The latter provides access to my colleagues, with whom I discuss the latest developments in theoretical statistics and mathematics and their relevance for the data-related problems I encounter in the hospital. This bears fruit, as can be witnessed from the string of research papers in leading statistical journals on penalized learning of multivariate models, usually of a graphical nature. Virtually all these papers include an application involving high-throughput data stemming from or inspired by my collaborations in the Amsterdam UMC.
To encourage practitioners to use my methodology, I (co-)author and/or maintain three R packages (porridge, rags2ridges and ragt2ridges) that come with detailed manuals. The latter two are one-stop shops that facilitate the learning of graphical models, the downstream analyses of the resulting networks, and various network visualizations. For these two packages extended vignettes have been developed. Both vignettes form the backbone of highly appreciated pre-conference courses.
van Wieringen, W.N., Stam, K.A., Peeters, C.F.W., van de Wiel, M.A. (2020), "Updating of the Gaussian graphical model through targeted penalized estimation", Journal of Multivariate Analysis, 178, article 104621.
Bilgrau, A.E., Peeters, C.F.W., Eriksen, P.S., Bogsted, M., van Wieringen, W.N. (2020), "Targeted fused ridge estimation of inverse covariance matrices from multiple high-dimensional data classes", Journal of Machine Learning Research, 21(26), 1-52.
Van Wieringen, W.N. (2019), "The generalized ridge estimator of the inverse covariance matrix", Journal of Computational and Graphical Statistics, 28(4), 932-942.
Miok, V., Wilting, S.M., van Wieringen, W.N. (2017), "Ridge estimation of the VAR(1) model and its time series chain graph from multivariate time-course omics data", Biometrical Journal, 59(1), 172-191.
Van Wieringen, W.N., Peeters, C.F.W. (2016), "Ridge estimation of inverse covariance matrices from high-dimensional data", Computational Statistics and Data Analysis, 103, 284-303.