MachOmics: Machine Learning for Omics Data

Launch the interactive MachOmics application to build machine learning models with your omics data. Upload your feature data and metadata, configure the model, and visualize performance metrics and feature importances.

Launch MachOmics App

Methods

(Optional) Data processing: Features are filtered to retain those present in 10% or more of samples. NA values are replaced with either 0 or a pseudocount equal to half the minimum non-zero value. Data are then log2-transformed. Class distributions are visualized using ggplot2 version 3.5.2.
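The processing steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the app's R code. It assumes that a feature counts as "present" in a sample when its value is neither NA nor 0, that the pseudocount is taken over the whole retained table, and that zeros receive the pseudocount as well so that log2 stays defined. The function name `preprocess` and the example table are hypothetical.

```python
import math

def preprocess(table, min_prevalence=0.10):
    """table maps feature name -> list of values, one per sample (None = NA)."""
    def present(v):
        # Assumption: NA and 0 both count as "absent".
        return v is not None and v != 0

    # 1. Keep features present in at least min_prevalence of samples.
    kept = {
        name: values
        for name, values in table.items()
        if sum(present(v) for v in values) / len(values) >= min_prevalence
    }

    # 2. Pseudocount: half the smallest non-zero value in the retained table
    #    (assumption: computed over the whole table, not per feature).
    nonzero = [v for values in kept.values() for v in values if present(v)]
    pseudocount = min(nonzero) / 2

    # 3. Replace absent values with the pseudocount, then log2-transform.
    return {
        name: [math.log2(v if present(v) else pseudocount) for v in values]
        for name, values in kept.items()
    }

# Hypothetical example: feature_b is absent in every sample and is dropped.
example = {
    "feature_a": [4.0, None, 8.0, 2.0],
    "feature_b": [None, None, None, None],
}
processed = preprocess(example)
```

Here the smallest non-zero value is 2.0, so the pseudocount is 1.0 and the missing entry of feature_a becomes log2(1.0) = 0.0.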

Model building: A random forest (ranger version 0.17.0) is trained to predict the target variable. In parallel, “null” data are generated by shuffling the target variable, and a second model is trained on these data. This process is repeated 15 times using 15 different pre-set seeds.
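The null-data step can be illustrated as follows. This is a sketch in plain Python of seeded label shuffling only; model fitting is omitted, and the seed values (1 to 15) and the function name `shuffled_targets` are assumptions, since the app's actual pre-set seeds are not stated here.

```python
import random

# Hypothetical seeds: the app's real pre-set seeds are not documented here.
SEEDS = list(range(1, 16))

def shuffled_targets(target, seeds=SEEDS):
    """Return one shuffled copy of the target per seed (the 'null' targets)."""
    null_sets = []
    for seed in seeds:
        rng = random.Random(seed)   # independent, reproducible generator per seed
        shuffled = list(target)     # copy so the real target is left untouched
        rng.shuffle(shuffled)
        null_sets.append(shuffled)
    return null_sets

target = ["case", "control"] * 5
null_sets = shuffled_targets(target)
```

Shuffling breaks any link between features and target while preserving the class balance, so a model trained on each shuffled copy gives a baseline for chance-level performance. Fixing the seeds makes every repeat reproducible.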

Model evaluation: For both classification and regression problems, the out-of-bag predictions are compared to their true values. For classification problems, the app plots a confusion matrix (showing the median number of predictions per group), a comparison of the area under the receiver operating characteristic (ROC) curve (AUC; computed with pROC version 1.18.5), and the ROC curves. 95% confidence intervals are estimated from the 15 iterations. AUCs are compared using a non-parametric Wilcoxon test. For regression problems, Spearman’s rho is calculated between the predictions and the true values, and rho values are compared using a Wilcoxon test. Scatterplots show median predicted values versus the corresponding true values.
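To make the two metrics concrete, the sketch below computes an AUC and a Spearman correlation by hand in plain Python. It is illustrative only and is not how pROC or the app computes them internally: the AUC is obtained from its pairwise definition (the fraction of positive/negative pairs ranked correctly, with ties counted as half), and Spearman's rho as the Pearson correlation of average ranks. All function names and example values are hypothetical.

```python
def auc(scores, labels, positive):
    """AUC as the share of (positive, negative) pairs ordered correctly."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical out-of-bag class probabilities and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = ["case", "case", "case", "control", "control", "control"]
example_auc = auc(scores, labels, positive="case")   # 8 of 9 pairs ordered correctly

# Hypothetical regression predictions: monotonic but not linear.
example_rho = spearman_rho([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])
```

Because both metrics depend only on ranks, they are unaffected by monotonic rescaling of the predictions.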

Important features: Feature importances are calculated by permutation, yielding the % decrease in accuracy when a feature’s values are permuted. Feature importances from the real and null datasets are compared with a Wilcoxon test, and p-values undergo Benjamini-Hochberg adjustment. Significant features are plotted. To determine the direction of association between a feature and the target, multiple linear models are fit using the formula y = value ~ target, and coefficients are extracted.
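The Benjamini-Hochberg adjustment mentioned above can be written out as a short sketch. This plain-Python version follows the standard step-up procedure (each sorted p-value is multiplied by m/rank, then made monotone from the largest downward). It is not the app's own code, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank among ascending p-values
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-feature p-values from the real-versus-null comparison.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Adjusted values are never smaller than the raw p-values and are capped at 1, which controls the expected share of false discoveries among the features called significant.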

Citation

If you found this app helpful, please cite or acknowledge Peter Dobranowski https://pdobrano25.github.io/ml_website/index.html