🧬 Single-Cell Multi-Omics Integration

A Comprehensive Guide to Methods, Frameworks, and Best Practices (2015-2025)
From early paired measurements to modern foundation models: how to integrate RNA-seq, ATAC-seq, protein abundance, spatial data, and perturbation responses at single-cell resolution. Based on a systematic analysis of 40+ landmark papers spanning a decade of innovation.
40+ papers analyzed (2015-2025) | 6 integration categories | 10+ omics modalities | 100M+ cells in foundation model training

📖 What is Single-Cell Multi-Omics Integration?

Single-cell multi-omics integration combines measurements of different molecular layers (transcriptome, epigenome, proteome, spatial location) from the same or related cells to build comprehensive cellular maps. This integration is essential for understanding cell states, developmental trajectories, disease mechanisms, and therapeutic responses.

Why Multi-Omics Integration Matters

Integration Paradigms (Fu et al., Nature Methods, 2025; Liu et al., Nature Methods, 2025)

The benchmark papers establish four integration categories (vertical, diagonal, mosaic, and cross). This guide adds spatial and perturbation integration, giving six paradigms in total:

🔵 Vertical Integration

RNA, ATAC, and protein measured in the same cells (all modalities paired).

🟢 Diagonal Integration

Dataset 1 (RNA) and Dataset 2 (ATAC): non-overlapping modalities measured in different cells.

🟡 Mosaic Integration

Dataset 1 (RNA + ATAC), Dataset 2 (RNA + protein), Dataset 3 (RNA only): partial modality overlap across datasets.

🔴 Cross Integration

Batches 1-3, each with all modalities: integration plus correction of batch effects.

🟣 Spatial Integration

Molecular profiles combined with spatial (X, Y) coordinates.

🟠 Perturbation Integration

Control, Perturbation A, and Perturbation B conditions: multi-modal measurements under perturbations.

Each paradigm addresses different data structures and analytical challenges in multi-omics analysis.

📅 Evolution Timeline: From Paired Measurements to Foundation Models

2015-2017: Early Multi-Modal Technologies

Key Innovations:

  • G&T-seq (2015): Parallel genome (DNA) and transcriptome (RNA) sequencing from the same cell
  • CITE-seq (2017): RNA + surface protein via antibody tags
  • mixOmics (2017): Statistical framework for multi-block data

Era Characteristic: Experimental methods development; simple statistical integration

2019-2020: Statistical Methods Era

Key Innovations:

  • DIABLO (2019): Multi-omics discriminant analysis
  • MOFA+ (2020): Multi-omics factor analysis with covariates

Era Characteristic: Matrix factorization; interpretable latent factors; limited scalability

2021-2022: Deep Learning Breakthrough

Key Innovations:

  • totalVI (2021): VAE for RNA + protein integration
  • Seurat WNN (2021): Weighted nearest neighbor multi-modal analysis
  • Concerto (2022): Contrastive learning for 10M+ cells

Era Characteristic: VAE dominance; scalability improvements; atlas-scale analyses

2023: Optimal Transport & Graph Methods

Key Innovations:

  • CellOT (2023): Neural optimal transport for perturbations
  • SIMBA (2023): Graph embedding with cells + features co-embedded

Era Characteristic: Theoretical rigor; optimal transport theory

2024: Foundation Model Era Begins

Key Innovations:

  • scGPT (2024): 100M parameter transformer on 33M cells

Era Characteristic: Pre-training paradigm; 10M+ cell datasets; transfer learning

2025: Specialized Foundation Models

Key Innovations:

  • CellWhisperer (2025): Instruction-tuned multimodal foundation model
  • Nicheformer (2025): Spatial multi-omics foundation model
  • OmiCLIP (2025): Visual-omics foundation model (H&E + transcriptomics)
  • MORPH (2025): Cross-condition perturbation prediction

Era Characteristic: Task-specific foundation models; comprehensive benchmarking; clinical translation focus

🔬 Method Taxonomy: Algorithmic Approaches

By Computational Framework

🧠 Variational Autoencoders (VAE-based)

Principle: Learn probabilistic latent representations with encoder-decoder architecture

Advantages: Uncertainty quantification; generative modeling; missing data imputation

totalVI (2021) - RNA+Protein
MultiVI (2023) - Mosaic integration
scVI (2018) - Single modality
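
The VAE principle above can be made concrete with the two ingredients every such model shares: sampling the latent code with the reparameterization trick, and a loss that adds a reconstruction term to a KL penalty. The sketch below is a minimal illustration in plain Python, not the objective of totalVI, MultiVI, or scVI. The unit-variance Gaussian reconstruction term and all function names are assumptions made for illustration; single-cell models typically use count-based likelihoods instead.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def negative_elbo(x, x_reconstructed, mu, logvar):
    """Squared-error reconstruction term (assumed Gaussian, up to a constant) plus KL."""
    reconstruction = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, logvar)


# Hypothetical encoder output for one cell with a 2-dimensional latent space.
rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
```

The KL term is what regularizes the latent space and makes sampling (and hence imputation of a missing modality) possible; the encoder and decoder networks that would produce `mu`, `logvar`, and `x_reconstructed` are left out.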

🔄 Contrastive Learning

Principle: Learn representations by pulling similar samples together, pushing dissimilar apart

Advantages: Scalability to millions of cells; no explicit pairing needed; robust embeddings

Concerto (2022) - 10M+ cells
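
"Pulling similar samples together, pushing dissimilar apart" is commonly implemented with an InfoNCE-style loss. The sketch below assumes that `anchors[i]` and `positives[i]` are two embeddings (views) of the same cell and that every other cell in the batch serves as a negative. It is a generic illustration in plain Python, not Concerto's teacher-student objective; the function names and the temperature value are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of picking the matching positive among all positives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

The loss is near zero when each anchor is most similar to its own positive and grows when embeddings of different cells are confused with each other.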

📊 Graph Neural Networks (GNN)

Principle: Model cells as graph nodes; aggregate information from neighborhoods

Advantages: Captures cell-cell relationships; flexible message passing; spatial awareness

SIMBA (2023) - Cells + features co-embedding
GLUE (2022) - Graph-based integration
SIMVI (2025) - Spatial + intrinsic disentanglement
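
Neighborhood aggregation, the core of the GNN principle above, can be sketched as one round of mean message passing over a cell graph. This is a deliberately simplified illustration: there are no learned weights, attention coefficients, or nonlinearities, all of which real GNN layers add. The function name and the `self_weight` mixing parameter are assumptions.

```python
def message_passing_step(features, neighbors, self_weight=0.5):
    """Blend each cell's features with the mean of its neighbors' features.

    features:  dict mapping cell id -> list of floats
    neighbors: dict mapping cell id -> list of neighboring cell ids
    """
    updated = {}
    for cell, x in features.items():
        nbrs = neighbors.get(cell, [])
        if not nbrs:
            updated[cell] = list(x)  # isolated cell: nothing to aggregate
            continue
        mean = [sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(len(x))]
        updated[cell] = [self_weight * xi + (1.0 - self_weight) * mi
                         for xi, mi in zip(x, mean)]
    return updated


# Toy graph: cell "a" is connected to "b" and "c"; "c" has no outgoing edges.
features = {"a": [1.0], "b": [3.0], "c": [5.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
smoothed = message_passing_step(features, neighbors)
```

For spatial data the neighbor lists would come from physical proximity, which is how a graph layer becomes "spatially aware".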

🚀 Optimal Transport

Principle: Find minimal-cost mapping between cell distributions

Advantages: Theoretical guarantees; preserves distributional structure; interpretable

CellOT (2023) - Perturbation prediction
Labeled GWOT (2025) - Cross-modality alignment
SCOT (2022) - Gromov-Wasserstein
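
A minimal-cost mapping between two cell distributions can be approximated with entropic regularization and Sinkhorn iterations. The sketch below shows that generic technique in plain Python. It is not the algorithm of the methods listed above: CellOT is described as using input convex neural networks, and SCOT and Labeled GWOT use Gromov-Wasserstein formulations. The function name, `epsilon`, and the iteration count are assumptions.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.1, n_iters=200):
    """Entropic optimal transport plan between histograms a (rows) and b (columns).

    cost[i][j] is the cost of moving mass from source cell i to target cell j.
    Very large cost / epsilon ratios can underflow; a log-domain version avoids that.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Two source cells and two target cells; matching indices are cheap to pair.
plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
```

Each row of `plan` says how a source cell's mass is distributed over target cells, which is the interpretable coupling that OT-based methods build on.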

🤖 Foundation Models (Transformers)

Principle: Pre-train large models on massive datasets; fine-tune for specific tasks

Advantages: Transfer learning; few-shot adaptation; generalizable representations

CellWhisperer (2025) - Instruction-tuned
Nicheformer (2025) - Spatial specialist
scGPT (2024) - 33M cells pretrain
OmiCLIP (2025) - Visual-omics CLIP

🔗 Matrix Factorization & Classical

Principle: Decompose data matrices into latent factor representations

Advantages: Interpretable factors; computationally efficient; well-understood theory

MOFA+ (2020) - Multi-omics factors
Seurat CCA/WNN (2021) - Canonical correlation
mixOmics (2017) - Multiblock projection to latent structure (PLS)
DIABLO (2019) - Discriminant analysis
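
Decomposing a data matrix into latent factors can be illustrated with the simplest possible case: extracting a single factor by power iteration, so that a cells-by-features matrix X is approximated by sigma * u * v^T. This is a generic sketch in plain Python and not the model behind MOFA+, DIABLO, or mixOmics, which add sparsity, multiple views, or supervision. The function name is an assumption, and the code assumes X is not orthogonal to the starting vector.

```python
import math


def rank1_factor(X, n_iters=100):
    """Leading factor of X: cell scores u, feature loadings v, and strength sigma."""
    n_rows, n_cols = len(X), len(X[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u, sigma = [0.0] * n_rows, 0.0
    for _ in range(n_iters):
        # Cell scores: project every cell onto the current loadings, then normalize.
        u = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # Feature loadings: project every feature onto the current cell scores.
        v = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return u, sigma, v


# Toy matrix (3 cells x 2 features) generated by exactly one factor.
X = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, sigma, v = rank1_factor(X)
```

The interpretability advantage comes from `v`: each latent factor has explicit per-feature loadings that can be inspected directly.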

By Scale Capability

Scalability Tiers

  • Small Scale (<10K cells): MOFA+, DIABLO, mixOmics - ideal for pilot studies
  • Medium Scale (10K-100K cells): Seurat WNN, totalVI, MultiVI - standard analyses
  • Large Scale (100K-1M cells): Concerto, SIMBA, scBridge - atlas construction
  • Atlas Scale (>1M cells): Foundation models (scGPT, CellWhisperer), SnapATAC2 - population studies
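
The tiers above can be encoded as a small lookup. The sketch below only restates the list. The function name is hypothetical, and the handling of the exact boundaries (100K cells counts as medium, 1M cells as large) is an assumption, since the tier ranges share their endpoints.

```python
def scale_tier(n_cells):
    """Return (tier name, example methods from this guide) for a dataset size."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    if n_cells < 10_000:
        return "small", ["MOFA+", "DIABLO", "mixOmics"]
    if n_cells <= 100_000:
        return "medium", ["Seurat WNN", "totalVI", "MultiVI"]
    if n_cells <= 1_000_000:
        return "large", ["Concerto", "SIMBA", "scBridge"]
    return "atlas", ["scGPT", "CellWhisperer", "SnapATAC2"]


tier, candidates = scale_tier(250_000)
```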

📄 Landmark Papers by Computational Framework (2015-2025)

🧬 Experimental Technologies (Foundation)

CITE-seq: Simultaneous epitope and transcriptome measurement in single cells

2017 | Nature Methods | RNA+Protein
Pioneering technology combining RNA-seq with antibody-derived tags (ADT) for protein quantification. Enabled paired transcriptome-proteome measurements at single-cell resolution.
Key Contributions:
  • Antibody-oligonucleotide conjugation method
  • Validated on PBMC immune cell populations
  • Foundation for multi-modal single-cell biology

scONE-seq: A single-cell multi-omics method enables simultaneous dissection of phenotype and genotype heterogeneity from frozen tumors

2023 | Science Advances | DNA+RNA
scONE-seq is a versatile single-cell multi-omics method that simultaneously profiles whole-genome DNA and full-length RNA from the same cell in a one-pot reaction, enabling multi-omics analysis of frozen biobanked tumor samples and revealing transcriptionally normal-like tumor clones.
Applications:
  • Works with frozen tissue samples
  • Simultaneous DNA and RNA profiling
  • Tumor heterogeneity analysis

🔷 Variational Autoencoders & Probabilistic Models

Joint probabilistic modeling of single-cell multi-omic data with totalVI

2021 | Nature Methods | VAE-based
Variational autoencoder for integrating RNA and protein measurements. Models technical effects including batch, background noise, and protein zero-inflation.
Capabilities:
  • Batch correction across technologies
  • Protein imputation from RNA
  • Uncertainty quantification

MultiVI: deep generative model for the integration of multimodal data

2023 | Nature Methods | Mosaic Integration
Variational inference framework for mosaic multi-omics integration. Handles incomplete modality measurements across datasets with joint latent space.
Features:
  • Handles RNA+ADT+ATAC combinations
  • Missing modality imputation
  • Joint latent space shared by paired and single-modality cells

Cobolt: integrative analysis of multimodal single-cell sequencing data

2021 | Genome Biology | VAE-based
Cobolt uses Multimodal Variational Autoencoders (MVAE) based on hierarchical generative models to enable coherent integration of multi-modality single-cell data with single-modality datasets, creating a unified representation for downstream analysis.
Innovations:
  • Hierarchical latent variable model
  • Handles incomplete modality measurements
  • Supports SNARE-seq and other multimodal data

🕸️ Graph-Based Methods

Integrated analysis of multimodal single-cell data (Seurat WNN)

2021 | Cell | Weighted NN
Weighted nearest neighbor (WNN) algorithm for integrating paired multi-modal measurements. Learns cell-specific modality weights to construct a unified cell similarity graph.
Key Innovations:
  • Cell-specific modality weighting
  • Works with RNA+ADT, RNA+ATAC
  • Integrated into widely-used Seurat package
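
The last step of a weighted-nearest-neighbor scheme, combining per-modality similarities with cell-specific weights and keeping the top-k neighbors, can be sketched as follows. This is a simplified illustration, not the Seurat implementation: how the weights are learned is omitted entirely, so both the affinity matrices and the weights are taken as given inputs, and all names are hypothetical.

```python
def weighted_neighbors(affinity_by_modality, weights, k):
    """Top-k neighbors per cell under a cell-specific weighted sum of affinities.

    affinity_by_modality: dict modality -> n x n affinity matrix (higher = more similar)
    weights:              list of dicts; weights[i][modality] is cell i's weight
    """
    modalities = list(affinity_by_modality)
    n_cells = len(weights)
    neighbors = []
    for i in range(n_cells):
        scored = []
        for j in range(n_cells):
            if i == j:
                continue
            score = sum(weights[i][m] * affinity_by_modality[m][i][j] for m in modalities)
            scored.append((score, j))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


# Cell 0 resembles cell 1 in RNA but cell 2 in protein (ADT).
rna = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.5], [0.1, 0.5, 1.0]]
adt = [[1.0, 0.1, 0.9], [0.1, 1.0, 0.5], [0.9, 0.5, 1.0]]
rna_heavy = [{"rna": 0.8, "adt": 0.2}] * 3
adt_heavy = [{"rna": 0.2, "adt": 0.8}] * 3
nn_rna = weighted_neighbors({"rna": rna, "adt": adt}, rna_heavy, k=1)
nn_adt = weighted_neighbors({"rna": rna, "adt": adt}, adt_heavy, k=1)
```

With RNA-heavy weights cell 0 is linked to cell 1, and with protein-heavy weights to cell 2, which shows why letting each cell carry its own weights matters when modalities disagree.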

SIMBA: single-cell embedding along with features

2023 | Nature Methods | Graph Embedding
Graph-based co-embedding framework placing cells and features (genes, peaks, motifs) in shared space. Enables clustering-free marker discovery and regulatory network inference.
Innovations:
  • Unified cell-feature embedding space
  • Multi-omics native support
  • Scales to 1.3M cells in 1.5 hours

🔄 Optimal Transport Methods

Learning single-cell perturbation responses using neural optimal transport (CellOT)

2023 | Nature Methods | Optimal Transport
Input convex neural networks for optimal transport maps. Predicts single-cell perturbation responses from unaligned control/treatment populations.
Applications:
  • Drug response prediction
  • Genetic knockout effects
  • Cross-patient generalization

Cross-modality matching and prediction of perturbation responses with labeled Gromov-Wasserstein optimal transport

2025 | AISTATS | Optimal Transport
Label-constrained Gromov-Wasserstein OT for cross-modality alignment. Achieves L-fold computational speedup while improving alignment quality with perturbation labels.
Innovations:
  • Incorporates perturbation labels
  • RNA → protein prediction
  • Dose-response preservation

🧠 Deep Learning & Neural Networks

Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram

2021 | Nature Methods | Spatial Transcriptomics
Tangram is a deep learning framework that aligns single-cell/single-nucleus RNA-seq data to any form of spatial transcriptomics data to generate genome-wide spatial expression maps at single-cell resolution and integrate them with anatomical references.
Key Features:
  • Works with any spatial technology (MERFISH, Visium, smFISH)
  • Expands gene coverage to genome-wide scale
  • Automated histological registration module
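
Genome-wide expansion through a cell-to-location mapping can be pictured as a matrix projection: if `mapping[i][s]` is the probability that cell i sits at spatial location s, the expected expression at each location is the mapping-weighted sum of single-cell profiles. The sketch below shows only this projection step, under the assumption that a mapping has already been learned; the training procedure is not shown and the function name is hypothetical.

```python
def project_to_space(mapping, expression):
    """Expected expression per spatial location.

    mapping[i][s]:    probability that cell i is located at spot s
    expression[i][g]: expression of gene g in cell i
    Returns a spots x genes matrix.
    """
    n_cells = len(mapping)
    n_spots = len(mapping[0])
    n_genes = len(expression[0])
    return [
        [sum(mapping[i][s] * expression[i][g] for i in range(n_cells))
         for g in range(n_genes)]
        for s in range(n_spots)
    ]


# Three cells, two spots, two genes; the third cell is split between both spots.
mapping = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
expression = [[2.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
spatial_expression = project_to_space(mapping, expression)
```

Because every gene measured in the single-cell data passes through the same mapping, genes absent from the spatial panel also receive a predicted spatial pattern.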

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning

2022 | Nature Biotechnology | RNA+ATAC
scJoint is a scalable transfer learning method using neural networks to integrate atlas-scale scRNA-seq and scATAC-seq data through semisupervised learning, achieving 84% label transfer accuracy while handling over 1 million cells in 2 hours.
Key Achievements:
  • Processes 1M+ cells in 2 hours
  • 84% label transfer accuracy
  • Effective batch correction across platforms

scBridge embraces cell heterogeneity in single-cell RNA-seq and ATAC-seq data integration

2023 | Nature Communications | Transfer Learning
scBridge is a heterogeneous transfer learning method that exploits cell heterogeneity to progressively integrate scRNA-seq and scATAC-seq data by first identifying and integrating "reliable" scATAC-seq cells with smaller omics differences, then using them as bridges to integrate the remaining cells.
Novel Approach:
  • Progressive integration strategy
  • Exploits cell heterogeneity as advantage
  • Superior performance on challenging datasets

scMODAL: a general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links

2025 | Nature Communications | GAN-based
scMODAL is a deep learning framework leveraging neural networks and generative adversarial networks to align single-cell multi-omics datasets with limited known feature correlations, enabling accurate cross-modality integration, feature imputation, and regulatory relationship inference.
Features:
  • GAN-based architecture for alignment
  • Works with limited feature correlations
  • Cross-modality imputation and prediction

A visual-omics foundation model to bridge histopathology with spatial transcriptomics (OmiCLIP)

2025 | Nature Methods | Visual-Omics
CLIP-based foundation model trained on 2.2M H&E image-transcriptomics pairs. Enables gene expression prediction from routine histology images.
Capabilities:
  • Image → transcriptomics prediction
  • Tissue section alignment
  • Cell type annotation from H&E
  • Spatial decomposition

🎯 Contrastive Learning

Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale (Concerto)

2022 | Nature Machine Intelligence | Contrastive Learning
Self-distillation framework with asymmetric teacher-student architecture. Demonstrates linear scalability to 10M+ cells with 100x speedup over baselines.
Achievements:
  • 10M cell reference atlas in 1.5 hours
  • Query mapping in 8 seconds (10K cells)
  • Superior clustering and classification

🤖 Foundation Models & Transformers

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

2024 | Nature Methods | Foundation Model
100M parameter transformer pre-trained on 33M cells. Generative pretraining with cell-as-sentence paradigm. Supports multi-omics fine-tuning and zero-shot predictions.
Capabilities:
  • Cell type annotation
  • Batch correction
  • Perturbation prediction
  • Gene network inference
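
One way to picture the "cell-as-sentence" paradigm is to turn each cell's expression profile into a sequence of (gene, value-bin) tokens that a transformer can consume. The binning scheme below is an illustrative assumption (zero counts get bin 0, non-zero counts are split into rank-based bins within the cell); it is not presented as scGPT's documented tokenizer, and the function name is hypothetical.

```python
import bisect


def cell_to_tokens(expression, n_bins=5):
    """Convert one cell's {gene: count} profile into (gene, bin) tokens.

    Bin 0 marks unexpressed genes; bins 1..n_bins rank the non-zero counts
    within this cell, so tokens are comparable across sequencing depths.
    """
    nonzero = sorted(value for value in expression.values() if value > 0)
    tokens = []
    for gene, value in expression.items():
        if value <= 0:
            tokens.append((gene, 0))
            continue
        rank = bisect.bisect_left(nonzero, value)  # non-zero values strictly below
        bin_id = 1 + min(n_bins - 1, rank * n_bins // len(nonzero))
        tokens.append((gene, bin_id))
    return tokens


# Hypothetical gene names and counts for a single cell.
tokens = cell_to_tokens({"A": 0, "B": 1, "C": 2, "D": 4, "E": 8}, n_bins=2)
```

Binning within each cell makes the token values relative rather than absolute, which is one common way to reduce sensitivity to batch and depth differences during pre-training.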

CellWhisperer: An instruction-tuned foundation model for single-cell multimodal analysis

2025 | Nature Biotechnology | Foundation Model
Instruction-tuned multimodal foundation model supporting natural language queries. First single-cell model with conversational interface for biological questions.
Features:
  • Natural language biological queries
  • Multi-task learning (classification, clustering, prediction)
  • Zero-shot generalization

🌉 Mosaic & Bridge Integration

Stabilized mosaic single-cell data integration using unshared features (StabMap)

2023 | Nature Biotechnology | Mosaic Integration
StabMap enables mosaic data integration of single-cell datasets by exploiting non-overlapping features through traversal of a mosaic data topology, allowing multi-hop integration where some datasets share no common features.
Unique Capabilities:
  • Multi-hop integration without direct feature overlap
  • Leverages non-overlapping features
  • Supports supervised and unsupervised modes
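
The "mosaic data topology" idea can be sketched as a graph whose nodes are datasets and whose edges connect datasets that share at least one feature. A dataset with no features in common with the reference can still be reached through intermediate datasets. The code below only builds that graph and finds the shortest hop sequence; the actual projection of cells along the path is not shown, and all names are hypothetical.

```python
from collections import deque


def mosaic_topology(features_by_dataset):
    """Connect every pair of datasets that share at least one feature."""
    names = list(features_by_dataset)
    graph = {name: set() for name in names}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if features_by_dataset[a] & features_by_dataset[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def hop_path(graph, source, reference):
    """Breadth-first search for the shortest dataset-to-dataset path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == reference:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# The ATAC-only and RNA-only datasets share no features; the multiome bridges them.
features = {
    "rna_only": {"gene1", "gene2"},
    "multiome": {"gene2", "peak1"},
    "atac_only": {"peak1", "peak2"},
}
graph = mosaic_topology(features)
path = hop_path(graph, "atac_only", "rna_only")
```

If the graph is disconnected, `hop_path` returns `None`, which corresponds to a collection of datasets that cannot be integrated this way without an additional bridging dataset.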

Dictionary learning for integrative, multimodal and scalable single-cell analysis (Bridge Integration)

2023 | Nature Biotechnology | Bridge Integration
Dictionary learning approach that integrates single-cell datasets across modalities by using a multi-omic dataset as a molecular bridge. Each cell in the bridge dataset acts as a dictionary element for reconstructing single-modality datasets in a shared space.
Applications:
  • Mapping single-modality data (e.g., scATAC-seq) onto scRNA-seq references
  • Cross-modality harmonization
  • Scalable integration via sketching

🔬 Perturbation & Response Prediction

Predicting cell morphological responses to perturbations using generative modeling (IMPA)

2025 | Nature Communications | Generative Model
IMPA is a generative style-transfer model that predicts cellular morphological responses to unseen chemical and genetic perturbations while accounting for batch effects in high-content imaging screens.
Applications:
  • Drug response prediction from imaging
  • Handles batch effects in HCS
  • Generative modeling for perturbation screens

📊 Benchmark & Review Papers

Multitask benchmarking of single-cell multimodal omics integration methods

2025 | Nature Methods | Benchmark
Comprehensive evaluation of 40 integration methods across 7 tasks using 64 real + 22 simulated datasets. Establishes integration taxonomy (vertical/diagonal/mosaic/cross) and provides decision trees for method selection.
Key Findings:
  • No universal winner; task-dependent performance
  • Deep learning dominates diagonal/cross integration
  • Batch correction often trades off with biological preservation

Benchmarking single-cell multi-modal data integrations

2025 | Nature Methods | Benchmark
Systematic assessment across usability, accuracy, and robustness dimensions. Evaluates 40 algorithms with 101 benchmark datasets spanning diverse technologies and cell types.
Evaluation Framework:
  • Multi-gradient AUC for robustness
  • Hardware scalability testing (500GB RAM, 24h limits)
  • Cross-modality imputation assessment

How to build the virtual cell with artificial intelligence: Priorities and opportunities (AIVC)

2024 | Cell | Perspective
Vision paper outlining the AI Virtual Cell framework. Proposes universal representations across molecular, cellular, and multicellular scales with virtual instruments for manipulation and decoding.
Framework Components:
  • Universal representations (URs) across scales
  • Virtual instruments (manipulators & decoders)
  • Foundation model architecture for cells

The Human Cell Atlas: from a cell census to a unified foundation model

2024 | Nature | Perspective
Roadmap for Human Cell Atlas evolution from cell census to foundation models. Outlines 5 perspectives: cell census, 3D maps, genotype-phenotype maps, developmental maps, and foundation models.

🔧 Methods Development & Innovation

SnapATAC2: A fast, scalable and versatile tool for analysis of single-cell omics data

2024 | Nature Methods | scATAC-seq
Matrix-free spectral embedding for chromatin accessibility data. Achieves linear time/memory complexity enabling analysis of million-cell datasets on standard hardware.
Performance:
  • O(n) complexity vs O(n²) for competitors
  • 63.4% cost reduction vs ArchR
  • 200K cells in 13.4 minutes

SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data

2025 | Nature Communications | Spatial
Dual encoder VAE with MLP for intrinsic variation and GAT for spatial variation. First identifiable framework for separating intrinsic cell state from spatial microenvironment effects.
Innovations:
  • Asymmetric regularization for identifiability
  • Single-cell spatial effect estimation
  • Causal inference integration via double machine learning (DML)

scMODAL: a general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links

2025 | Nature Communications | GAN-based
GAN-based integration using feature links (e.g., gene-protein pairs). First method to effectively handle weakly-linked modalities through adversarial alignment.
Capabilities:
  • 29-34% imputation improvement
  • Works with minimal feature links
  • Integrates CITE-seq + CyTOF

MORPH predicts the single-cell outcome of genetic perturbations across conditions and data modalities

2025 | bioRxiv | Perturbation
Discrepancy-based VAE with attention mechanism for perturbation prediction. Handles both transcriptomic and imaging modalities with prior knowledge integration (DepMap, GenePT).
Features:
  • Cross-cell line transfer learning
  • Combinatorial perturbation modeling
  • Active learning for experiment design

MetaQ: fast, scalable and accurate metacell inference via single-cell quantization

2025 | Nature Communications | Scalability
Deep learning with quantization for metacell construction. Achieves O(n) complexity and 100x speedup over SEACells while maintaining superior classification accuracy.
Performance:
  • 100K cells in 0.3h vs 26.7h (SEACells)
  • 88% balanced accuracy vs 84% (baseline)
  • Native multi-omics support

ADTnorm: robust integration of single-cell protein measurement across CITE-seq datasets

2025 | Nature Communications | Protein
Functional data analysis for protein expression normalization. Landmark alignment preserves negative peaks essential for cell type annotation while harmonizing batch effects.
Applications:
  • Cross-institutional integration
  • Antibody titration optimization
  • Auto-gating (80-100% accuracy)

📚 Comprehensive Method Comparison

By Integration Category & Performance

Method | Year | Category | Modalities | Scale | Key Strength

Vertical Integration (Paired Multi-Modal)
Seurat WNN | 2021 | Vertical | RNA+ADT, RNA+ATAC | ~100K cells | Cell-specific modality weighting; widely adopted
totalVI | 2021 | Vertical/Cross | RNA+ADT | ~50K cells | Probabilistic; batch correction; imputation
Multigrate | 2024 | Vertical/Cross | RNA+ADT+ATAC | ~100K cells | Tri-modal support; robust performance

Diagonal Integration (Unpaired, Non-Overlapping)
scBridge | 2023 | Diagonal | RNA+ATAC | ~50K cells | Superior dimensionality reduction & clustering
GLUE | 2022 | Diagonal | RNA+ATAC | ~50K cells | Graph neural network; best batch correction
scJoint | 2022 | Diagonal | RNA+ATAC | ~100K cells | Multi-batch integration; transfer learning

Mosaic Integration (Overlapping Incomplete)
StabMap | 2023 | Mosaic | Any combination | ~50K cells | Flexible; efficient; handles any modality pattern
MultiVI | 2023 | Mosaic | RNA+ADT+ATAC | ~100K cells | VAE-based; missing modality imputation
Cobolt | 2021 | Mosaic | RNA+ADT+ATAC | ~50K cells | Bayesian framework; uncertainty quantification

Spatial Integration
SIMVI | 2025 | Spatial | Spatial transcriptomics | ~60K cells | Disentangles intrinsic vs spatial variation
OmiCLIP | 2025 | Spatial | H&E + ST | 2.2M pairs | Visual-omics foundation model; H&E → gene expression
Tangram | 2021 | Spatial | Spatial mapping | ~50K cells | Maps scRNA-seq to spatial coordinates

Perturbation-Aware Integration
CellOT | 2023 | Perturbation | RNA-seq (protein/imaging) | ~50K cells | Neural OT; single-cell predictions
MORPH | 2025 | Perturbation | RNA + Imaging | ~300K cells | Cross-modality; cross-cell line transfer
Labeled GWOT | 2025 | Perturbation | RNA + Protein | ~50K cells | Label-constrained OT; L-fold speedup

Classical/Statistical Methods
MOFA+ | 2020 | Vertical | Any | ~10K cells | Interpretable factors; handles covariates
mixOmics | 2017 | Vertical | Any | ~5K cells | Multiblock projection to latent structure (PLS); statistical rigor

💡 Practical Implementation Guide

Choosing the Right Method: Decision Framework

Step 1: Identify Your Data Structure

  • All cells have all modalities? → Vertical integration
  • Different batches, different modalities? → Diagonal
  • Mixed modality availability? → Mosaic
  • Multiple batches, all modalities? → Cross
  • Spatial data? → Spatial methods
  • Perturbation data? → Perturbation-aware
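
The questions in Step 1 can be expressed as a small decision function over a description of which modalities each dataset (or batch) measures. This is a sketch of the checklist above, not a tool from any of the cited papers. The function name, the input format, and the choice to let the spatial and perturbation flags take priority are assumptions.

```python
def choose_paradigm(datasets, spatial=False, perturbation=False):
    """Suggest an integration paradigm.

    datasets: dict mapping dataset/batch name -> set of measured modalities
    """
    if not datasets:
        raise ValueError("need at least one dataset")
    if spatial:
        return "spatial"
    if perturbation:
        return "perturbation"
    modality_sets = [frozenset(mods) for mods in datasets.values()]
    all_modalities = frozenset().union(*modality_sets)
    if all(mods == all_modalities for mods in modality_sets):
        # Every dataset measures every modality: paired data, possibly in batches.
        return "vertical" if len(modality_sets) == 1 else "cross"
    # No modality is shared by any two datasets.
    if sum(len(mods) for mods in modality_sets) == len(all_modalities):
        return "diagonal"
    return "mosaic"


suggestion = choose_paradigm({
    "dataset1": {"RNA", "ATAC"},
    "dataset2": {"RNA", "protein"},
    "dataset3": {"RNA"},
})
```

In practice a spatial or perturbation study may also need one of the first four paradigms for its molecular layers, so the returned label is a starting point rather than a final answer.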

Step 2: Consider Your Computational Resources

Limited Resources (no GPU, <32GB RAM):

  • Classical methods

Moderate Resources (GPU optional, 32-64GB RAM):

  • Most VAE-based and graph methods

High-End Resources (GPU required, 64GB+ RAM):

  • Foundation models
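
The resource tiers in Step 2 can likewise be written as a filter on method classes. The sketch below restates the three tiers. The function name is hypothetical, and treating each tier as cumulative (more resources keep the cheaper options available) is an assumption.

```python
def feasible_method_classes(ram_gb, has_gpu):
    """List the method classes from this guide that fit the available hardware."""
    classes = ["classical methods"]
    if ram_gb >= 32:
        # GPU is optional at this tier.
        classes.append("VAE-based and graph methods")
    if ram_gb >= 64 and has_gpu:
        classes.append("foundation models")
    return classes


options = feasible_method_classes(ram_gb=48, has_gpu=False)
```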

Common Pitfalls & Best Practices

Software Ecosystem & Tools