Overview
Trajectory inference (TI) methods reconstruct the dynamic processes of cellular differentiation, development, and state transitions from single-cell data. These computational approaches allow researchers to understand how cells progress through different states over time, identify key transition points, and discover the genes that drive these changes.
Key Considerations
Selecting the right trajectory inference method depends on several factors:
- Dataset size: From thousands to millions of cells
- Data type: Standard scRNA-seq, spatial, multimodal, or temporal
- Experimental design: Single snapshot vs. multiple timepoints
- Biological complexity: Linear, branching, cyclical, or disconnected trajectories
- Computational resources: Available memory and processing power
Major Method Categories
1. RNA Velocity-Based Methods
These methods infer future cell states by modeling the relationship between unspliced and spliced mRNA:
velocyto
2018
Nature
The pioneering RNA velocity method that introduced the concept of using splicing dynamics to predict future cell states.
- First RNA velocity implementation
- Steady-state model assumption
- Works with standard scRNA-seq protocols
Algorithm: Estimates RNA velocity by modeling splicing kinetics under steady-state assumptions. Calculates velocity as v = u - γs, where u is unspliced mRNA, s is spliced mRNA, and γ is the degradation rate fitted via linear regression. Projects velocities onto PCA/t-SNE/UMAP embeddings to visualize cell state transitions.
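The steady-state calculation above can be illustrated for a single gene with a minimal sketch. It is not velocyto's implementation: the function names and toy counts are hypothetical, and the regression here uses all cells and a line through the origin as a simplifying assumption.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin, so that u is approximately gamma * s."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return numerator / denominator


def steady_state_velocity(unspliced, spliced):
    """Return (gamma, per-cell velocity) with velocity v = u - gamma * s."""
    gamma = fit_gamma(unspliced, spliced)
    return gamma, [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical normalized counts for one gene across five cells.
spliced = [0.0, 1.0, 2.0, 3.0, 4.0]
unspliced = [0.0, 0.5, 1.5, 1.5, 2.0]

gamma, velocity = steady_state_velocity(unspliced, spliced)
print(f"gamma = {gamma:.3f}")
print("velocity per cell:", [round(v, 3) for v in velocity])
```

Cells above the fitted line (positive v) have more unspliced mRNA than the steady state predicts, which is read as the gene being induced; cells below the line are read as repression.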
scVelo
2020
Nature Biotechnology
An improved RNA velocity framework with dynamical modeling that accounts for transcriptional induction, repression, and steady-state. Most widely used velocity method.
- Dynamical and stochastic models
- Improved accuracy over velocyto
- Latent time estimation
- Driver gene identification
- Compatible with Scanpy ecosystem
Algorithm: Models RNA velocity using a dynamical system of unspliced (u) and spliced (s) mRNA: du/dt = α - βu, ds/dt = βu - γs, where α is the transcription rate, β is the splicing rate, and γ is the degradation rate. Learns gene-specific kinetic parameters via expectation-maximization, then projects velocity vectors onto low-dimensional embeddings.
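To make the two rate equations concrete, the sketch below integrates them forward in time with a simple Euler scheme for one gene during induction. It only simulates the model; it does not perform scVelo's parameter inference, and the rate values and function name are illustrative assumptions.

```python
def simulate_splicing(alpha, beta, gamma, t_end=20.0, dt=0.01, u0=0.0, s0=0.0):
    """Euler integration of du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s."""
    u, s = u0, s0
    n_steps = int(round(t_end / dt))
    trajectory = [(0.0, u, s)]
    for step in range(1, n_steps + 1):
        du = alpha - beta * u        # transcription minus splicing
        ds = beta * u - gamma * s    # splicing minus degradation
        u += dt * du
        s += dt * ds
        trajectory.append((step * dt, u, s))
    return trajectory


# Hypothetical kinetic rates for a single gene switched on at t = 0.
traj = simulate_splicing(alpha=2.0, beta=1.0, gamma=0.5)
t, u_final, s_final = traj[-1]

# The model's steady state is u* = alpha / beta and s* = alpha / gamma.
print(f"t = {t:.1f}: u = {u_final:.3f}, s = {s_final:.3f}")
```

Early in the simulation unspliced mRNA rises ahead of spliced mRNA, which is the lag that velocity methods exploit to infer the direction of change.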
veloVI
2024
Nature Methods
Probabilistic RNA velocity inference using variational inference with uncertainty quantification.
- Deep generative modeling framework
- Accounts for technical noise
- Uncertainty estimates for velocity
- Better handling of low counts
Algorithm: Uses variational autoencoders (VAEs) to model splicing dynamics probabilistically. Learns latent variables for transcriptional state and kinetic parameters while accounting for technical noise. Provides uncertainty quantification by sampling from posterior distributions of both velocity direction (intrinsic) and future states (extrinsic).
UniTVelo
2022
Nature Communications
Unified RNA velocity framework using unified latent time modeling across the transcriptome via Radial Basis Functions.
- Unified latent time estimation
- Phase portraits for visualization
- Top-down time modeling approach
- Improved stability in velocity estimates
Algorithm: Uses Radial Basis Functions (RBFs) to learn a unified latent time shared across all genes, moving from gene-specific to transcriptome-wide temporal modeling. Fits splicing dynamics to this shared time coordinate, improving consistency across genes and reducing over-fitting from noisy gene-specific estimates.
2. Optimal Transport-Based Methods
These methods use optimal transport theory to match cells across conditions or timepoints:
Waddington-OT
2019
Cell
Uses optimal transport to infer developmental trajectories and fate probabilities across timepoints.
- Temporal trajectory reconstruction
- Fate probability predictions
- Perturbation analysis
- Requires multiple timepoints
Algorithm: Computes optimal transport maps between consecutive time points by minimizing Wasserstein distance with entropy regularization (Sinkhorn algorithm). Models development as a sequence of transport maps, allowing computation of cell fate probabilities and ancestor/descendant relationships across temporal data.
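The entropy-regularized coupling between two timepoints can be sketched with a plain Sinkhorn iteration. This is a generic textbook version under simplifying assumptions (balanced marginals, one-dimensional toy cell states, squared-distance cost); it omits the growth-rate handling used in practice, and all names and values are hypothetical.

```python
import math


def sinkhorn(a, b, cost, epsilon=0.5, n_iter=200):
    """Entropic OT: alternately rescale rows and columns of K = exp(-cost / epsilon)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: plan[i][j] is the mass moved from early cell i to late cell j.
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical 1-D expression states at an early and a late timepoint.
early = [0.0, 1.0]
late = [0.1, 0.9, 1.1]
cost = [[(x - y) ** 2 for y in late] for x in early]
a = [1.0 / len(early)] * len(early)   # uniform mass on early cells
b = [1.0 / len(late)] * len(late)     # uniform mass on late cells

plan = sinkhorn(a, b, cost)
for i, row in enumerate(plan):
    total = sum(row)
    # Row-normalizing the plan gives descendant probabilities for each early cell.
    print(f"early cell {i}:", [round(p / total, 3) for p in row])
```

Normalizing a row of the plan gives a distribution over descendants; normalizing a column gives a distribution over ancestors.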
MOSCOT
2025
Nature (preprint 2023)
Multi-Omics Single-Cell Optimal Transport: a highly scalable OT framework, handling over 1.7 million cells with linear time complexity.
- Atlas-scale: handles 500K+ cells efficiently
- Multi-omics integration (RNA, ATAC, protein)
- Spatial and spatiotemporal mapping
- Temporal trajectory inference
- Neural OT solvers for speed
- Handles unbalanced problems (growth/death)
Algorithm: Uses entropic Gromov-Wasserstein optimal transport with low-rank factorizations for linear time/memory complexity. Employs neural network parameterizations for transport maps and integrates multiple modalities through fused optimal transport. Supports unbalanced formulations via Kullback-Leibler divergence for modeling cell proliferation/death.
GENOT
2024
NeurIPS 2024
Generative entropic neural optimal transport with uncertainty quantification for trajectory inference using flow matching.
- Stochastic OT framework
- Gene regulatory modeling
- Handles unbalanced transport
- Cross-modality translation
- Uncertainty quantification
Algorithm: Learns stochastic transport plans using entropic Wasserstein and Gromov-Wasserstein flow matching. Neural networks parameterize velocity fields that interpolate between distributions. Provides uncertainty estimates through stochastic sampling and supports unbalanced formulations for modeling growth dynamics.
3. Graph-Based & Pseudotime Methods
Traditional approaches that construct trajectories using dimensionality reduction and graph structures:
Monocle 3
2019
Nature
Advanced trajectory inference framework using UMAP and principal graphs, designed to scale to millions of cells and handle discontinuous trajectories. Used to analyze the Mouse Organogenesis Cell Atlas (2 million cells).
- Ultra-scalable: handles millions of cells efficiently
- UMAP-based dimensionality reduction
- Discontinuous trajectories and convergent fates
- Louvain partitioning for cell communities
- Principal graph learning with SimplePPT
- Loop detection for cyclical trajectories
- Moran's I test for spatial autocorrelation
- Pseudotime via geodesic distance
- Integrated with Seurat/Scanpy
Algorithm: (1) Dimensionality reduction via UMAP for fast embedding of large datasets, (2) Louvain clustering for initial partitioning, (3) Enhanced SimplePPT algorithm learns principal graphs allowing disconnected components and convergent paths, (4) Pseudotime computed as geodesic distance along learned graph from root cells, (5) Differential expression via Moran's I statistic for spatial autocorrelation.
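Step (4), pseudotime as geodesic distance from a root, can be sketched with Dijkstra's algorithm on a small weighted graph. The graph below is a hypothetical stand-in for a learned principal graph, not output from Monocle 3, and the node names are invented for illustration.

```python
import heapq


def geodesic_pseudotime(graph, root):
    """Shortest-path distance from root to every node; unreachable nodes stay at infinity."""
    dist = {node: float("inf") for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, weight in graph[node]:
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical principal graph: one branching component plus a disconnected one.
graph = {
    "root": [("branch_point", 1.0)],
    "branch_point": [("root", 1.0), ("fate_A", 2.0), ("fate_B", 0.5)],
    "fate_A": [("branch_point", 2.0)],
    "fate_B": [("branch_point", 0.5)],
    "island_1": [("island_2", 1.0)],
    "island_2": [("island_1", 1.0)],
}

pseudotime = geodesic_pseudotime(graph, "root")
print(pseudotime)
```

Nodes in a disconnected component keep an infinite distance, which mirrors why each partition of a discontinuous trajectory needs its own root.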
Monocle 2
2017
Nature Methods
Earlier version using reversed graph embedding for trajectory reconstruction. Still widely used for smaller datasets.
- DDRTree algorithm
- Branch point identification
- Differential expression testing along pseudotime
- Works well for less than 50K cells
Algorithm: Uses the DDRTree (Discriminative Dimensionality Reduction via learning a Tree) algorithm, which performs reversed graph embedding. Constructs a principal tree in reduced dimensional space that captures branching differentiation paths. Assumes continuous manifold structure without allowing convergence or disconnected components.
Slingshot
2018
BMC Genomics
Flexible trajectory inference using cluster-based minimum spanning trees and principal curves.
- Works with any dimensionality reduction
- Cluster-based approach
- Multiple lineage support
- Well-integrated with Bioconductor
Algorithm: Two-stage approach: (1) Constructs minimum spanning tree on cluster centroids to identify lineage structure, (2) Fits simultaneous principal curves through low-dimensional space for each lineage, allowing shared early segments. Assigns cells pseudotime along curves and lineage weights based on proximity.
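Stage (1) can be sketched as Prim's algorithm over cluster centroids. This covers only the lineage-structure step; the principal-curve fitting of stage (2) is not shown. The cluster labels, coordinates, and function names are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math


def cluster_centroids(points, labels):
    """Mean coordinate of the cells in each cluster."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, x in enumerate(point):
            acc[k] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def minimum_spanning_tree(centroids, start):
    """Prim's algorithm: repeatedly attach the closest cluster not yet in the tree."""
    in_tree = {start}
    edges = []
    while len(in_tree) < len(centroids):
        best = None
        for a in in_tree:
            for b in centroids:
                if b in in_tree:
                    continue
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges


# Hypothetical 2-D embedding with four clusters.
points = [(-0.1, 0.0), (0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (2.0, 1.0), (2.0, -1.5)]
labels = ["stem", "stem", "progenitor", "progenitor", "fate_A", "fate_B"]

centroids = cluster_centroids(points, labels)
edges = minimum_spanning_tree(centroids, start="stem")
print(edges)
```

Each path from the starting cluster to a leaf of the tree defines one lineage, and lineages share their early segments where the paths overlap.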
PAGA
2019
Genome Biology
Partition-based graph abstraction that creates coarse-grained trajectory representations. Exceptionally scalable: 1.3M cells in 90 seconds.
- Graph abstraction approach
- Handles complex topologies
- Integrated with Scanpy
- Good for exploratory analysis
- 130× faster than UMAP
Algorithm: Creates partition-based graph where nodes represent cell clusters and edges represent connectivity strength. Computes edge weights using statistical tests on inter-cluster vs. intra-cluster distances. Preserves global topology while allowing multi-resolution analysis through hierarchical clustering. Can initialize UMAP for faster embedding.
4. Hybrid & Multi-Method Approaches
Methods that combine multiple signals or integrate different trajectory inference approaches:
CellRank 2
2024
Nature Methods
Unified framework combining RNA velocity, pseudotime, gene expression, and experimental time for robust fate predictions. Scales to millions of cells.
- Multi-view learning approach
- Combines velocity, pseudotime, and real time
- Handles multimodal data
- Terminal state identification
- Driver gene discovery
- Integrated with Scanpy ecosystem
Algorithm: Modular framework with kernels for different data views (velocity, pseudotime, real time, metabolic labeling). Combines kernels via weighted aggregation into cell-cell transition matrix. Computes fate probabilities using Markov chain analysis and identifies terminal states via eigenvector decomposition. 30× faster than CellRank 1.
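The kernel-combination and fate-probability steps can be sketched on a four-state toy Markov chain. This does not use the CellRank API: terminal states are supplied by hand rather than identified from the spectrum, absorption probabilities are found by fixed-point iteration rather than a linear solver, and every matrix and name is a hypothetical illustration.

```python
def combine_kernels(matrices, weights):
    """Weighted average of row-stochastic matrices (the result is also row-stochastic)."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


def fate_probabilities(transition, terminal, n_iter=200):
    """Probability of being absorbed in each terminal state, from every starting state."""
    n = len(transition)
    probs = [[1.0 if i == t else 0.0 for t in terminal] for i in range(n)]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in terminal:
                updated.append(probs[i])  # terminal states stay fixed
            else:
                updated.append([
                    sum(transition[i][j] * probs[j][k] for j in range(n))
                    for k in range(len(terminal))
                ])
        probs = updated
    return probs


# States: 0 = early, 1 = intermediate, 2 = terminal fate A, 3 = terminal fate B.
velocity_kernel = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
pseudotime_kernel = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.0, 0.4, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

combined = combine_kernels([velocity_kernel, pseudotime_kernel], weights=[0.5, 0.5])
fates = fate_probabilities(combined, terminal=[2, 3])
print("fate probabilities from the early state:", [round(p, 3) for p in fates[0]])
```

Changing the weights shifts how much each data view influences the combined transition matrix, and therefore the resulting fate probabilities.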
DELVE
2024
Nature Communications
Feature selection for trajectory analysis that identifies genes driving dynamic processes.
- Dynamic feature selection
- Identifies trajectory-driving genes
- Works with velocity or pseudotime
- Removes redundant features
- Improves downstream analysis
Algorithm: Unsupervised bottom-up approach identifying dynamic gene/protein modules. Uses graph-based methods to find features that robustly recapitulate cellular trajectories while removing redundancy. Works across modalities (scRNA-seq, mass cytometry, imaging) by evaluating feature contribution to trajectory preservation.
5. Specialized Methods
Purpose-built tools for specific biological questions or data types:
CASi
2024
Scientific Reports
Discovers novel cell types and subpopulations along developmental trajectories using cross-timepoint analysis.
- Novel cell type discovery
- Temporal single-cell data
- Handles rare populations
- Annotates discovered types
Algorithm: Neural network architecture for cross-timepoint annotation and automatic feature selection. Detects potentially novel cell types that emerge over developmental time by comparing cell type distributions across timepoints. Uses attention mechanisms to identify discriminative features for each discovered population.
sciCSR
2024
Nature Methods (online Nov 2023)
Specialized method for B cell development using class-switch recombination as molecular clock.
- B cell trajectory inference
- Uses CSR as temporal marker
- High temporal resolution
- Links phenotype to maturation
Algorithm: Leverages class-switch recombination (CSR) events as intrinsic molecular timestamps. Constructs Markov state model of B cell differentiation states based on immunoglobulin isotype expression patterns. Achieves ~0.9 cosine similarity in BCR isotype predictions by modeling CSR dynamics.
TIGON
2024
Nature Machine Intelligence
Models growth dynamics explicitly during trajectory inference to account for proliferation using dynamic unbalanced optimal transport.
- Growth rate modeling
- Proliferation-aware trajectories
- Birth/death process integration
- Corrects for cell cycle effects
- Infers gene regulatory networks
Algorithm: Uses dynamic unbalanced optimal transport based on Wasserstein-Fisher-Rao distance to simultaneously reconstruct trajectories and model population growth/death. Employs neural ODEs implemented in PyTorch. Learns cell-cell communication and gene regulatory networks while accounting for proliferation dynamics.
PRESCIENT
2021
Nature Communications
Learns potential landscapes from temporal single-cell data to predict differentiation trajectories. Developed by Gifford lab at MIT CSAIL.
- Potential landscape modeling
- Trajectory perturbation analysis
- In silico perturbations
- Requires temporal data
- Waddington landscape framework
Algorithm: Models cell differentiation as diffusion over Waddington potential landscapes. Neural networks parameterize potential functions, learning landscape geometry from temporal data. Predicts cell trajectories through gradient descent on learned potentials. Enables in silico perturbation analysis by modifying landscape topography.
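The idea of cells descending a potential landscape can be sketched with a hand-written one-dimensional double-well potential standing in for the learned neural network. This is an assumption-laden toy, not PRESCIENT's model: the potential, step size, and noise scaling are chosen purely for illustration.

```python
import math
import random


def potential_gradient(x):
    """Gradient of the toy potential Psi(x) = (x**2 - 1)**2, with wells at x = -1 and x = +1."""
    return 4.0 * x * (x * x - 1.0)


def simulate_cell(x0, n_steps=2000, dt=0.01, sigma=0.0, rng=None):
    """Euler-Maruyama steps: drift down the potential plus optional Gaussian noise."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0) if sigma > 0 else 0.0
        x += -potential_gradient(x) * dt + sigma * math.sqrt(2.0 * dt) * noise
    return x


# Without noise, the starting side of the barrier at x = 0 decides the fate.
fate_right = simulate_cell(0.2)
fate_left = simulate_cell(-0.2)
print(f"start  0.2 -> {fate_right:.3f}")
print(f"start -0.2 -> {fate_left:.3f}")

# With noise, repeated simulations from the same start give a distribution over fates.
noisy_fates = [simulate_cell(0.05, sigma=0.5, rng=random.Random(seed)) for seed in range(20)]
print("fraction ending in the right well:", sum(x > 0 for x in noisy_fates) / len(noisy_fates))
```

In this picture an in silico perturbation amounts to shifting a cell's starting state (or reshaping the potential) and re-running the simulation to see whether its fate changes.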
FLOW-MAP
2020
Nature Protocols
Trajectory visualization tool optimized for flow and mass cytometry data using force-directed graph layouts.
- Graph-based layout
- Optimized for flow/CyTOF
- Interactive visualization
- Handles large datasets
- Force-directed embedding
Algorithm: Constructs k-nearest neighbor graph with edges constrained to sequential timepoints. Applies ForceAtlas2 force-directed layout algorithm to create 2D visualization preserving temporal ordering. Supports density-dependent downsampling and hierarchical clustering for scalability across variable dataset sizes.
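The time-constrained graph construction can be sketched as a k-nearest-neighbor search restricted to cells from the same or an adjacent timepoint. This is one reading of the constraint described above, not FLOW-MAP's code; the layout step (ForceAtlas2) and downsampling are not shown, and the data and function name are hypothetical.

```python
import math


def time_constrained_knn(points, timepoints, k=2):
    """Undirected kNN edges, allowing neighbors only from the same or an adjacent timepoint."""
    edges = set()
    for i, (p, t) in enumerate(zip(points, timepoints)):
        candidates = [
            (math.dist(p, q), j)
            for j, (q, s) in enumerate(zip(points, timepoints))
            if j != i and abs(s - t) <= 1
        ]
        for _, j in sorted(candidates)[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical cells: two at timepoint 0, one at timepoint 1, one at timepoint 2.
# The last cell is close to the first two in marker space but two timepoints away.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.05, 0.0)]
timepoints = [0, 0, 1, 2]

edges = time_constrained_knn(points, timepoints, k=2)
print(sorted(edges))
```

The cell at timepoint 2 connects only through the timepoint 1 cell despite sitting next to the timepoint 0 cells, so the resulting graph preserves temporal ordering.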
PHLOWER
2025
Nature Methods
Hierarchical lineage tree inference with probabilistic modeling for complex multi-branching developmental systems using Hodge Laplacian decomposition.
- Hierarchical tree structures
- Probabilistic framework
- Handles uncertainty
- Complex lineage relationships (up to 26 branches)
- Multimodal RNA+ATAC support
Algorithm: Uses Hodge Laplacian decomposition on simplicial complexes to infer hierarchical lineage trees. Decomposes cell-cell relationships into gradient (hierarchical), curl (cyclical), and harmonic (equilibrium) components. Provides uncertainty quantification through probabilistic modeling. Works with multimodal data (RNA+ATAC).
Detailed Method Comparison
This table provides a comprehensive comparison of key features across different trajectory inference methods:
| Method | Year | Max Scale | Multiple Timepoints | Single Snapshot | Spatial Data | Multimodal | Growth/Death | Key Strength |
|---|---|---|---|---|---|---|---|---|
| Monocle 3 | 2019 | 2M+ cells | ? | ? | ? | ? | ? | Ultra-scalable, discontinuous trajectories, convergent fates |
| Monocle 2 | 2017 | ~50K cells | ? | ? | ? | ? | ? | DDRTree algorithm, well-established |
| MOSCOT | 2025 | 1.7M+ cells | ? | ? | ? | ? | ? | Most scalable OT, spatial, multimodal |
| scVelo | 2020 | ~100K cells | ? | ? | ? | ? | ? | Dynamic velocity, most popular |
| GENOT | 2024 | Variable | ? | ? | ? | ? | ? | Stochastic OT, uncertainty quantification |
| CellRank 2 | 2024 | 1.3M+ cells | ? | ? | ? | ? | ? | Multi-view integration |
| veloVI | 2024 | ~50K cells | ? | ? | ? | ? | ? | Uncertainty quantification |
| UniTVelo | 2022 | ~100K cells | ? | ? | ? | ? | ? | Batch integration |
| CASi | 2024 | ~100K cells | ? | ? | ? | ? | ? | Novel cell type discovery |
| sciCSR | 2024 | ~50K cells | ? | ? | ? | ? | ? | B cell specialization |
| Slingshot | 2018 | ~50K cells | ? | ? | ? | ? | ? | Flexible, Bioconductor integration |
| PAGA | 2019 | ~200K cells | ? | ? | ? | ? | ? | Graph abstraction, exploratory |
Best Practices
General Recommendations
- Start simple: Begin with scVelo for standard scRNA-seq before trying more complex methods
- Validate predictions: Use experimental data or known biology to validate trajectory predictions
- Combine methods: Tools like CellRank 2 allow integration of multiple signals
- Consider scale: For atlas-scale data (>500K cells), use MOSCOT or Monocle 3
- Quality control: Ensure high-quality data preprocessing before trajectory inference
- Assess topology: Use Monocle 3 for complex branching or disconnected structures
When to Use RNA Velocity vs Optimal Transport vs Graph-Based Methods
RNA Velocity (scVelo, veloVI):
- Single timepoint with sufficient sequencing depth
- Interested in immediate transcriptional dynamics
- Need to identify driving genes
- Standard droplet-based scRNA-seq
Optimal Transport (MOSCOT, GENOT):
- Multiple experimental timepoints available
- Very large datasets (>200K cells)
- Spatial or multimodal data
- Need to model growth/death explicitly
- Low sequencing depth (insufficient for velocity)
Graph-Based (Monocle 3, PAGA, Slingshot):
- Complex developmental trajectories with multiple branches
- Discontinuous trajectories or convergent developmental paths (Monocle 3)
- Large-scale datasets (100K-2M+ cells with Monocle 3)
- Exploratory analysis of trajectory topology (PAGA)
- When data spans multiple timepoints with gaps
- Need for pseudotime ordering along complex paths