MachOmics: drag and drop data to quickly build a machine learning
model, with plots for performance metrics and interpretable feature
importances.
Vignettes: examples of machine learning workflows (in R), based on
existing publications by other research groups. In most cases, the
published analyses are validated and extended.
- Microbial Load Predictor (16S): Validating the performance of a novel
tool that predicts the number of bacterial cells in a metagenomic
sample. A regression problem with internal and external validation.
Additional models are evaluated.
- Predicting Crohn’s Disease (16S): Validating a study from 2014 using
(low biomass) microbiome samples obtained from biopsies in new-onset
Crohn’s disease. The results could not be validated, and no reason for
the discrepancy was discernible. Only internal validation was employed.
Additional models are evaluated.
- Predicting IBD (Metagenome): Validating a meta-analysis from 2023
using stool microbiome samples obtained from three cohorts in
curatedMetagenomicData. Three other cohorts were not available for
download, so the results could not be fully validated. Hybrid up/down
sampling is applied. Additional models are evaluated.
- Predicting IBD (RNAseq): Re-analyzing a large RNAseq dataset obtained
from mucosal biopsies of CD, UC, and non-IBD controls. The analysis
focuses on rectal biopsies: machine learning is first applied to predict
inflamed vs non-inflamed samples, and then to predict IBD using
non-inflamed samples. Downsampling and ensemble learning are applied.
- Predicting Disease (Metagenome): Validating a meta-analysis of
microbiome data linked to several different diseases. Downsampling and
multiclass learning are applied, and additional models are evaluated.
- Predicting IBD (Metabolome): Validating a metabolomic analysis of
microbiome data from IBD and non-IBD subjects. Downsampling is applied
and additional models are evaluated. Feature importances are extracted
and contextualized with the literature.
- Predicting Metabolites (Metagenome): Validating the performance of
machine learning in predicting metabolite abundances from genus-level
metagenomic data in IBD microbiomes. A series of regression problems
with internal and external validation. Feature importances are extracted
and contextualized with the literature. Additional models are evaluated.
- Predicting Mouse Age (Metagenome): Validating the hypothesis that a
model trained on the microbiomes of ad libitum-fed mice will predict the
ages of calorie-restricted mice to be higher than they actually are. A
regression problem without optimization. Conclusions were not affected
by the removal of pseudoreplicates. Additional models are evaluated.
Overfit Check: comments on poor machine learning engineering practices
that lead to overfitting.
Models: common machine learning algorithms are listed, illustrated, and
described.
About: source code for analyses and a list of publications by Peter
Dobranowski.