PLNmodels: a collection of models for multivariate analysis in microbial ecology

Probability and Statistics

Seminar room M3-324
Mahendra Mariadassou
Monday, 29 April 2019 - 14:00 - 15:00

Microbial ecosystems play a major role in fields as diverse as human health, biowaste treatment and food production, and understanding them is increasingly relevant to developing sustainable practices. High-throughput sequencing allows for precise quantification of the taxa and functions present in the microbiome. Many dedicated tools have been developed in the last few years for the statistical analysis of microbiome data, but the field remains very active due to the challenging nature of the data. Microbiome data are high-dimensional, sparse, multivariate, highly structured and integer-valued; they vary over several orders of magnitude and are subject to differences in sequencing depth.

Many methods, including compositional methods, rely on a log-transformation of the counts followed by standard multivariate methods designed for Gaussian settings. They deal with sparsity by adding pseudo-counts to the data. In this work, we introduce a generic multivariate framework based on the Poisson log-normal distribution, in which the counts are modeled directly: they are Poisson distributed conditional on latent (hidden) Gaussian variables. This probabilistic model can accommodate the confounding effect of known covariates, varying sample sizes (through offset terms) and mixed marker genes (e.g. 16S and ITS). We show how it can be used to perform dimension reduction, classification and network inference.
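To make the two-layer structure concrete, here is a minimal simulation sketch of a Poisson log-normal model with covariates and offsets. It assumes the usual formulation in which the log-intensity of each count is the sum of an offset, a linear covariate effect and a latent Gaussian term. The function name `simulate_pln`, its arguments and all numerical values are hypothetical choices for illustration; this is not the interface of the PLNmodels package, and it only generates data rather than performing inference.

```python
import numpy as np


def simulate_pln(X, B, Sigma, offsets, rng):
    """Draw counts from a Poisson log-normal model (illustrative sketch).

    X: (n, d) covariates; B: (d, p) regression coefficients;
    Sigma: (p, p) covariance of the latent Gaussian layer;
    offsets: (n,) log-scale offsets, e.g. log sequencing depth.
    """
    n, p = X.shape[0], B.shape[1]
    # Latent Gaussian layer: one p-dimensional vector per sample.
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Log-intensity = offset + covariate effect + latent term.
    log_rate = offsets[:, None] + X @ B + Z
    # Counts are Poisson distributed conditional on the latent layer.
    Y = rng.poisson(np.exp(log_rate))
    return Y, Z


# Hypothetical toy dimensions: 50 samples, 4 taxa, intercept + 1 covariate.
rng = np.random.default_rng(0)
n, p, d = 50, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B = rng.normal(scale=0.5, size=(d, p))
Sigma = 0.5 * np.eye(p) + 0.2  # variance 0.7, covariance 0.2 between taxa
offsets = np.log(rng.integers(1000, 5000, size=n)) - np.log(1000)
Y, Z = simulate_pln(X, B, Sigma, offsets, rng)
```

In this sketch the dependence between taxa lives entirely in `Sigma`, the covariance of the latent layer, while differences in sequencing depth enter only through `offsets`, so the counts in `Y` never need to be log-transformed or padded with pseudo-counts.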

Based on
Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” Ann. Appl. Stat. 12 (4). The Institute of Mathematical Statistics: 2674–98.
Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Sparse Network Reconstruction from Count Data.” arXiv:1806.03120.