MachOmics: drag and drop data to quickly build a machine learning model, with plots of performance metrics and interpretable feature importances.

Vignettes: examples of machine learning workflows (in R) are provided based on existing publications by other research groups. In most cases, analyses are validated and extended.

  • Microbial Load Predictor (16S): Validating the performance of a novel tool that predicts the number of bacterial cells in a metagenomic sample. A regression problem with internal and external validation. Additional models are evaluated.
  • Predicting Crohn’s Disease (16S): Validating a study from 2014 using (low biomass) microbiome samples obtained from biopsies in new-onset Crohn’s disease. Results could not be validated, and no reason for the discrepancy could be identified. Only internal validation was employed. Additional models are evaluated.
  • Predicting IBD (Metagenome): Validating a meta-analysis from 2023 using stool microbiome samples obtained from three cohorts in curatedMetagenomicData. Three other cohorts were not available for download, so results could not be fully validated. Hybrid up/down sampling is applied. Additional models are evaluated.
  • Predicting IBD (RNAseq): Re-analyzing a large RNAseq dataset obtained from mucosal biopsies of CD, UC, and non-IBD controls. Focusing on rectum biopsies, applying machine learning to predict inflamed vs non-inflamed samples, and then predicting IBD using non-inflamed samples. Downsampling and ensemble learning are applied.
  • Predicting Disease (Metagenome): Validating a meta-analysis of microbiome data linked to several different diseases. Downsampling and multiclass learning are applied, and additional models are evaluated.
  • Predicting IBD (Metabolome): Validating a metabolomic analysis of microbiome data from IBD and non-IBD subjects. Downsampling is applied and additional models are evaluated. Feature importances are extracted and contextualized with the literature.
  • Predicting Metabolites (Metagenome): Validating the performance of machine learning in predicting metabolite abundances from genus-level metagenomic data in IBD microbiomes. A series of regression problems with internal and external validation. Feature importances are extracted and contextualized with the literature. Additional models are evaluated.
  • Predicting Mouse Age (Metagenome): Validating the hypothesis that a model trained on the microbiomes of ad libitum-fed mice will overestimate the ages of calorie-restricted mice. A regression problem without optimization. Conclusions were not affected by the removal of pseudoreplicates. Additional models are evaluated.
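
Several vignettes above mention downsampling to handle class imbalance. The vignettes themselves are written in R; the sketch below uses dependency-free Python only to illustrate the general idea, and is not the code used in any vignette. The function name `downsample`, the sample IDs, and the class counts are hypothetical. It assumes the simplest variant: randomly dropping samples from larger classes until every class matches the smallest one.

```python
import random
from collections import Counter, defaultdict


def downsample(samples, labels, seed=0):
    """Randomly drop samples from larger classes so all classes match the smallest."""
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the smallest class sets the target size for every class.
    n_min = min(len(group) for group in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement so no sample is duplicated.
        for sample in rng.sample(group, n_min):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Hypothetical toy data: 8 "IBD" samples and 3 "control" samples.
samples = [f"s{i}" for i in range(11)]
labels = ["IBD"] * 8 + ["control"] * 3

balanced_samples, balanced_labels = downsample(samples, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that held-out data keeps its natural class ratio.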

Overfit Check: comments on poor machine learning engineering leading to overfitting.
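
As a minimal illustration of what overfitting looks like (not taken from the Overfit Check page itself), the Python sketch below fits a 1-nearest-neighbour classifier to purely random features and labels. All names and sizes here are hypothetical. Because the model memorizes its training data, training accuracy is perfect, while accuracy on independent data stays near chance; a large gap of this kind is the usual warning sign.

```python
import random

rng = random.Random(42)


def make_data(n, n_features=5):
    """Random features with random binary labels: there is no real signal to learn."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == label for x, label in zip(X, y))
    return hits / len(y)


train_X, train_y = make_data(200)
test_X, test_y = make_data(200)

# Each training point is its own nearest neighbour, so the model simply recalls its label.
train_acc = accuracy(train_X, train_y, train_X, train_y)
# Independent data reveals that nothing generalizable was learned.
test_acc = accuracy(train_X, train_y, test_X, test_y)

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```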

Models: common machine learning algorithms are listed, illustrated, and described.

About: source code for analyses and a list of publications by Peter Dobranowski.