Trajectory Inference Methods

A comprehensive guide to computational tools for reconstructing cellular developmental trajectories

📋 Overview

Trajectory inference (TI) methods reconstruct the dynamic processes of cellular differentiation, development, and state transitions from single-cell data. These computational approaches allow researchers to understand how cells progress through different states over time, identify key transition points, and discover the genes that drive these changes.

🎯 Key Considerations

Selecting the right trajectory inference method depends on several factors:

  • Dataset size: From thousands to millions of cells
  • Data type: Standard scRNA-seq, spatial, multimodal, or temporal
  • Experimental design: Single snapshot vs. multiple timepoints
  • Biological complexity: Linear, branching, cyclical, or disconnected trajectories
  • Computational resources: Available memory and processing power

🔬 Major Method Categories

1. RNA Velocity-Based Methods

These methods infer future cell states by modeling the relationship between unspliced and spliced mRNA:

velocyto

2018

Nature

The pioneering RNA velocity method that introduced the concept of using splicing dynamics to predict future cell states.

  • First RNA velocity implementation
  • Steady-state model assumption
  • Works with standard scRNA-seq protocols
Algorithm: Estimates RNA velocity by modeling splicing kinetics under steady-state assumptions. Calculates velocity as v = u - γs, where u is unspliced mRNA, s is spliced mRNA, and γ is the degradation rate fitted via linear regression. Projects velocities onto PCA/t-SNE/UMAP embeddings to visualize cell state transitions.
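
The steady-state calculation described for velocyto can be illustrated with a minimal sketch. It assumes one gene, a least-squares fit through the origin over all cells, and made-up toy counts; the function names are hypothetical and this is not velocyto's own code or API.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin for u ~ gamma * s (one gene)."""
    num = sum(s * u for s, u in zip(spliced, unspliced))
    den = sum(s * s for s in spliced)
    return num / den if den > 0 else 0.0


def steady_state_velocity(spliced, unspliced):
    """Return (gamma, per-cell velocities v = u - gamma * s) for one gene."""
    gamma = fit_gamma(spliced, unspliced)
    velocities = [u - gamma * s for s, u in zip(spliced, unspliced)]
    return gamma, velocities


# Toy counts for one gene across five cells (illustrative numbers only).
spliced = [1.0, 2.0, 3.0, 4.0, 5.0]
unspliced = [0.5, 1.5, 1.5, 2.5, 2.0]
gamma, velocities = steady_state_velocity(spliced, unspliced)

# Positive velocity: more unspliced mRNA than the steady state predicts
# (gene likely being induced); negative velocity suggests repression.
for s, u, v in zip(spliced, unspliced, velocities):
    print(f"s={s:.1f} u={u:.1f} v={v:+.3f}")
```

Cells above the fitted line get positive velocities and cells below it get negative ones, which is the signal later projected onto an embedding.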

scVelo

2020

Nature Biotechnology

An improved RNA velocity framework with dynamical modeling that accounts for transcriptional induction, repression, and steady-state. Most widely used velocity method.

  • Dynamical and stochastic models
  • Improved accuracy over velocyto
  • Latent time estimation
  • Driver gene identification
  • Compatible with Scanpy ecosystem
Algorithm: Models RNA velocity using a dynamical system of unspliced (u) and spliced (s) mRNA: du/dt = α - βu, ds/dt = βu - γs, where α is the transcription rate, β is the splicing rate, and γ is the degradation rate. Learns gene-specific kinetic parameters via expectation-maximization, then projects velocity vectors onto low-dimensional embeddings.
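
To make the two rate equations concrete, the following sketch integrates them with a plain forward-Euler scheme for a single gene. The parameter values, step size, and function name are assumptions chosen for illustration; parameter fitting (the expectation-maximization step) is not shown, and this is not scVelo's implementation.

```python
def simulate_splicing(alpha, beta, gamma, u0=0.0, s0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s."""
    u, s = u0, s0
    trajectory = [(u, s)]
    for _ in range(steps):
        du = alpha - beta * u          # transcription minus splicing
        ds = beta * u - gamma * s      # splicing minus degradation
        u += dt * du
        s += dt * ds
        trajectory.append((u, s))
    return trajectory


# Induction phase: transcription switched on, starting from zero mRNA.
induction = simulate_splicing(alpha=2.0, beta=1.0, gamma=0.5)
u_end, s_end = induction[-1]

# Repression phase: transcription switched off, starting from the induced state.
repression = simulate_splicing(alpha=0.0, beta=1.0, gamma=0.5, u0=u_end, s0=s_end)

print("induced steady state (u, s):", round(u_end, 3), round(s_end, 3))
```

With constant rates the system settles at u = α/β and s = α/γ, and during induction the unspliced count rises before the spliced count, which is the curvature seen in phase portraits.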

veloVI

2024

Nature Methods

Probabilistic RNA velocity inference using variational inference with uncertainty quantification.

  • Deep generative modeling framework
  • Accounts for technical noise
  • Uncertainty estimates for velocity
  • Better handling of low counts
Algorithm: Uses variational autoencoders (VAEs) to model splicing dynamics probabilistically. Learns latent variables for transcriptional state and kinetic parameters while accounting for technical noise. Provides uncertainty quantification by sampling from posterior distributions of both velocity direction (intrinsic) and future states (extrinsic).

UniTVelo

2022

Nature Communications

RNA velocity framework that models a single latent time shared across the transcriptome using Radial Basis Functions.

  • Unified latent time estimation
  • Phase portraits for visualization
  • Top-down time modeling approach
  • Improved stability in velocity estimates
Algorithm: Uses Radial Basis Functions (RBFs) to learn a unified latent time shared across all genes, moving from gene-specific to transcriptome-wide temporal modeling. Fits splicing dynamics to this shared time coordinate, improving consistency across genes and reducing over-fitting from noisy gene-specific estimates.

2. Optimal Transport-Based Methods

These methods use optimal transport theory to match cells across conditions or timepoints:

Waddington-OT

2019

Cell

Uses optimal transport to infer developmental trajectories and fate probabilities across timepoints.

  • Temporal trajectory reconstruction
  • Fate probability predictions
  • Perturbation analysis
  • Requires multiple timepoints
Algorithm: Computes optimal transport maps between consecutive time points by minimizing Wasserstein distance with entropy regularization (Sinkhorn algorithm). Models development as a sequence of transport maps, allowing computation of cell fate probabilities and ancestor/descendant relationships across temporal data.
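
The entropy-regularized transport step can be sketched with a small Sinkhorn iteration. The example below is a balanced toy problem between two timepoints in one dimension; the cell coordinates, the regularization strength `epsilon`, and the function name are assumptions, and growth-rate handling is left out.

```python
import math


def sinkhorn(cost, a, b, epsilon=1.0, iterations=500):
    """Entropic optimal transport plan between marginals a and b."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy 1-D "expression" values for three early and three late cells.
early = [0.0, 1.0, 2.0]
late = [0.5, 1.5, 2.5]
cost = [[(x - y) ** 2 for y in late] for x in early]
a = [1.0 / 3.0] * 3
b = [1.0 / 3.0] * 3

plan = sinkhorn(cost, a, b)
# Row-normalizing the plan gives, for each early cell, a distribution
# over late cells (its putative descendants).
fate = [[p / sum(row) for p in row] for row in plan]
```

Chaining such plans across consecutive timepoints is what allows ancestor and descendant distributions to be propagated through a time course.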

MOSCOT

2025

Nature (preprint 2023)

Multi-Omics Single-Cell Optimal Transport: the most scalable OT framework, handling over 1.7 million cells with linear time complexity.

  • Atlas-scale: handles 500K+ cells efficiently
  • Multi-omics integration (RNA, ATAC, protein)
  • Spatial and spatiotemporal mapping
  • Temporal trajectory inference
  • Neural OT solvers for speed
  • Handles unbalanced problems (growth/death)
Algorithm: Uses entropic Gromov-Wasserstein optimal transport with low-rank factorizations for linear time/memory complexity. Employs neural network parameterizations for transport maps and integrates multiple modalities through fused optimal transport. Supports unbalanced formulations via Kullback-Leibler divergence for modeling cell proliferation/death.

GENOT

2024

NeurIPS 2024

Gene-regulated neural optimal transport with uncertainty quantification for trajectory inference using flow matching.

  • Stochastic OT framework
  • Gene regulatory modeling
  • Handles unbalanced transport
  • Cross-modality translation
  • Uncertainty quantification
Algorithm: Learns stochastic transport plans using entropic Wasserstein and Gromov-Wasserstein flow matching. Neural networks parameterize velocity fields that interpolate between distributions. Provides uncertainty estimates through stochastic sampling and supports unbalanced formulations for modeling growth dynamics.

3. Graph-Based & Pseudotime Methods

Traditional approaches that construct trajectories using dimensionality reduction and graph structures:

Monocle 3

2019

Nature

Advanced trajectory inference framework using UMAP and principal graphs, designed to scale to millions of cells and handle discontinuous trajectories. Used to analyze the Mouse Organogenesis Cell Atlas (2 million cells).

  • Ultra-scalable: handles millions of cells efficiently
  • UMAP-based dimensionality reduction
  • Discontinuous trajectories and convergent fates
  • Louvain partitioning for cell communities
  • Principal graph learning with SimplePPT
  • Loop detection for cyclical trajectories
  • Moran's I test for spatial autocorrelation
  • Pseudotime via geodesic distance
  • Integrated with Seurat/Scanpy
Algorithm: (1) Dimensionality reduction via UMAP for fast embedding of large datasets, (2) Louvain clustering for initial partitioning, (3) Enhanced SimplePPT algorithm learns principal graphs allowing disconnected components and convergent paths, (4) Pseudotime computed as geodesic distance along learned graph from root cells, (5) Differential expression via Moran's I statistic for spatial autocorrelation.
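
Step (4), pseudotime as geodesic distance along the learned graph, can be sketched with Dijkstra's algorithm. The graph below is a hand-written toy standing in for a learned principal graph; node names, edge lengths, and the function name are hypothetical, and this is not Monocle 3's code.

```python
import heapq


def geodesic_pseudotime(graph, root):
    """Shortest-path distance from root to every node of a weighted graph.

    graph maps each node to a list of (neighbor, edge_length) pairs.
    """
    dist = {node: float("inf") for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy principal graph: one branch point at "a" and a disconnected node.
graph = {
    "root": [("a", 1.0)],
    "a": [("root", 1.0), ("b1", 2.0), ("b2", 1.5)],
    "b1": [("a", 2.0)],
    "b2": [("a", 1.5), ("c", 0.5)],
    "c": [("b2", 0.5)],
    "island": [],
}
pseudotime = geodesic_pseudotime(graph, "root")
```

Nodes in a component that does not contain the root keep an infinite distance, which mirrors the need to choose a root separately for each disconnected partition.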

Monocle 2

2017

Nature Methods

Earlier version using reversed graph embedding for trajectory reconstruction. Still widely used for smaller datasets.

  • DDRTree algorithm
  • Branch point identification
  • Differential expression testing along pseudotime
  • Works well for fewer than 50K cells
Algorithm: Uses DDRTree (Discriminative Dimensionality Reduction via Tree) algorithm which performs reversed graph embedding. Constructs a principal tree in reduced dimensional space that captures branching differentiation paths. Assumes continuous manifold structure without allowing convergence or disconnected components.

Slingshot

2018

BMC Genomics

Flexible trajectory inference using cluster-based minimum spanning trees and principal curves.

  • Works with any dimensionality reduction
  • Cluster-based approach
  • Multiple lineage support
  • Well-integrated with Bioconductor
Algorithm: Two-stage approach: (1) Constructs minimum spanning tree on cluster centroids to identify lineage structure, (2) Fits simultaneous principal curves through low-dimensional space for each lineage, allowing shared early segments. Assigns cells pseudotime along curves and lineage weights based on proximity.
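
Stage (1) can be sketched as follows: compute cluster centroids in the reduced space, build a minimum spanning tree over them, and read lineages as paths from a chosen start cluster to the leaves. The coordinates, cluster names, and function names are made up for illustration, and stage (2), the principal-curve fit, is not shown.

```python
import math


def centroids(points, labels):
    """Mean coordinate of each cluster."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, x in enumerate(point):
            acc[k] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc) for label, acc in sums.items()}


def minimum_spanning_tree(nodes, start):
    """Prim's algorithm on Euclidean distances between named centroids."""
    in_tree, edges = {start}, []
    while len(in_tree) < len(nodes):
        _, a, b = min(
            (math.dist(nodes[a], nodes[b]), a, b)
            for a in in_tree for b in nodes if b not in in_tree
        )
        edges.append((a, b))
        in_tree.add(b)
    return edges


def lineages(edges, start):
    """All paths from the start cluster to a leaf of the tree."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        extensions = [n for n in adjacency.get(path[-1], []) if n not in path]
        if not extensions:
            paths.append(path)
        for n in extensions:
            stack.append(path + [n])
    return sorted(paths)


points = [(0, 0), (0.2, 0.1), (2, 0), (2.2, -0.1), (4, 1.5), (4.2, 1.6), (4, -1.5), (4.1, -1.7)]
labels = ["stem", "stem", "prog", "prog", "fateA", "fateA", "fateB", "fateB"]
centers = centroids(points, labels)
tree = minimum_spanning_tree(centers, start="stem")
print(lineages(tree, "stem"))
```

The two lineages share the stem and progenitor clusters, which is the shared early segment that the simultaneous principal curves are then fitted through.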

PAGA

2019

Genome Biology

Partition-based graph abstraction that creates coarse-grained trajectory representations. Exceptionally scalable: 1.3M cells in 90 seconds.

  • Graph abstraction approach
  • Handles complex topologies
  • Integrated with Scanpy
  • Good for exploratory analysis
  • 130× faster than UMAP
Algorithm: Creates partition-based graph where nodes represent cell clusters and edges represent connectivity strength. Computes edge weights using statistical tests on inter-cluster vs. intra-cluster distances. Preserves global topology while allowing multi-resolution analysis through hierarchical clustering. Can initialize UMAP for faster embedding.
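
The coarse-graining idea can be illustrated with a simplified connectivity score: count the edges observed between two clusters in a cell-level neighbor graph and divide by the number expected if edges were placed at random given each cluster's degree. This ratio is an assumption chosen for clarity, not PAGA's exact test statistic, and the function name and toy graph are hypothetical.

```python
from collections import Counter


def abstract_graph(edges, labels):
    """Cluster-level connectivity scores from an undirected cell-level graph.

    edges: iterable of (cell_i, cell_j); labels: dict mapping cell -> cluster.
    Score = observed inter-cluster edges / expected under random wiring.
    """
    degree = Counter()    # edge endpoints per cluster
    between = Counter()   # observed edges per cluster pair
    total = 0
    for i, j in edges:
        a, b = labels[i], labels[j]
        degree[a] += 1
        degree[b] += 1
        total += 1
        if a != b:
            between[tuple(sorted((a, b)))] += 1
    scores = {}
    for (a, b), observed in between.items():
        expected = degree[a] * degree[b] / (2.0 * total)
        scores[(a, b)] = observed / expected
    return scores


# Toy graph: three clusters of three cells; A-B share two edges, B-C one.
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C", 7: "C", 8: "C"}
edges = [
    (0, 1), (1, 2), (0, 2),      # within A
    (3, 4), (4, 5), (3, 5),      # within B
    (6, 7), (7, 8), (6, 8),      # within C
    (2, 3), (1, 3),              # A-B
    (5, 6),                      # B-C
]
scores = abstract_graph(edges, labels)
```

Thresholding such scores yields a small graph over clusters in which weakly supported connections are dropped, so the global topology can be inspected before looking at individual cells.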

4. Hybrid & Multi-Method Approaches

Methods that combine multiple signals or integrate different trajectory inference approaches:

CellRank 2

2024

Nature Methods

Unified framework combining RNA velocity, pseudotime, gene expression, and experimental time for robust fate predictions. Scales to millions of cells.

  • Multi-view learning approach
  • Combines velocity, pseudotime, and real time
  • Handles multimodal data
  • Terminal state identification
  • Driver gene discovery
  • Integrated with Scanpy ecosystem
Algorithm: Modular framework with kernels for different data views (velocity, pseudotime, real time, metabolic labeling). Combines kernels via weighted aggregation into cell-cell transition matrix. Computes fate probabilities using Markov chain analysis and identifies terminal states via eigenvector decomposition. 30× faster than CellRank 1.
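
The kernel-combination and fate-probability steps can be sketched on a four-state toy chain. The two transition matrices, the equal weights, and the choice of terminal states are assumptions supplied by hand here; terminal-state detection is not shown, and the function names are hypothetical rather than CellRank's API.

```python
def combine_kernels(kernels, weights):
    """Weighted average of row-stochastic transition matrices."""
    total = sum(weights)
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += (weight / total) * kernel[i][j]
    return combined


def fate_probabilities(transition, terminal, sweeps=200):
    """Absorption probabilities into each terminal state (iterative solve)."""
    n = len(transition)
    probs = [[1.0 if i == t else 0.0 for t in terminal] for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in terminal:
                continue
            probs[i] = [
                sum(transition[i][j] * probs[j][k] for j in range(n))
                for k in range(len(terminal))
            ]
    return probs


# States 0 and 1 are transient; states 2 and 3 are absorbing terminal states.
velocity_kernel = [
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.3, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
pseudotime_kernel = [
    [0.4, 0.4, 0.0, 0.2],
    [0.0, 0.4, 0.2, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
transition = combine_kernels([velocity_kernel, pseudotime_kernel], [0.5, 0.5])
probs = fate_probabilities(transition, terminal=[2, 3])
```

Each row of `probs` sums to one and gives the chance that a cell starting in that state ends in each terminal state, which is the quantity used for fate mapping.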

DELVE

2024

Nature Communications

Feature selection for trajectory analysis that identifies genes driving dynamic processes.

  • Dynamic feature selection
  • Identifies trajectory-driving genes
  • Works with velocity or pseudotime
  • Removes redundant features
  • Improves downstream analysis
Algorithm: Unsupervised bottom-up approach identifying dynamic gene/protein modules. Uses graph-based methods to find features that robustly recapitulate cellular trajectories while removing redundancy. Works across modalities (scRNA-seq, mass cytometry, imaging) by evaluating feature contribution to trajectory preservation.

5. Specialized Methods

Purpose-built tools for specific biological questions or data types:

CASi

2024

Scientific Reports

Discovers novel cell types and subpopulations along developmental trajectories using cross-timepoint analysis.

  • Novel cell type discovery
  • Temporal single-cell data
  • Handles rare populations
  • Annotates discovered types
Algorithm: Neural network architecture for cross-timepoint annotation and automatic feature selection. Detects potentially novel cell types that emerge over developmental time by comparing cell type distributions across timepoints. Uses attention mechanisms to identify discriminative features for each discovered population.

sciCSR

2024

Nature Methods (online Nov 2023)

Specialized method for B cell development using class-switch recombination as molecular clock.

  • B cell trajectory inference
  • Uses CSR as temporal marker
  • High temporal resolution
  • Links phenotype to maturation
Algorithm: Leverages class-switch recombination (CSR) events as intrinsic molecular timestamps. Constructs Markov state model of B cell differentiation states based on immunoglobulin isotype expression patterns. Achieves ~0.9 cosine similarity in BCR isotype predictions by modeling CSR dynamics.

TIGON

2024

Nature Machine Intelligence

Models growth dynamics explicitly during trajectory inference to account for proliferation using dynamic unbalanced optimal transport.

  • Growth rate modeling
  • Proliferation-aware trajectories
  • Birth/death process integration
  • Corrects for cell cycle effects
  • Infers gene regulatory networks
Algorithm: Uses dynamic unbalanced optimal transport based on the Wasserstein-Fisher-Rao distance to reconstruct trajectories and model population growth/death simultaneously. Employs neural ODEs implemented in PyTorch. Learns cell-cell communication and gene regulatory networks while accounting for proliferation dynamics.

PRESCIENT

2021

Nature Communications

Learns potential landscapes from temporal single-cell data to predict differentiation trajectories. Developed by Gifford lab at MIT CSAIL.

  • Potential landscape modeling
  • Trajectory perturbation analysis
  • In silico perturbations
  • Requires temporal data
  • Waddington landscape framework
Algorithm: Models cell differentiation as diffusion over Waddington potential landscapes. Neural networks parameterize potential functions, learning landscape geometry from temporal data. Predicts cell trajectories through gradient descent on learned potentials. Enables in silico perturbation analysis by modifying landscape topography.
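
The idea of cells diffusing downhill on a potential can be sketched with an Euler-Maruyama simulation. A hand-written one-dimensional double-well potential stands in for the learned neural-network potential, and the `tilt` parameter is a hypothetical stand-in for an in silico perturbation; none of the names or values below come from PRESCIENT itself.

```python
import math
import random


def potential_gradient(x, tilt=0.0):
    """Gradient of the toy potential (x^2 - 1)^2 + tilt * x."""
    return 4.0 * x * (x * x - 1.0) + tilt


def simulate_cell(x0, sigma=0.3, tilt=0.0, dt=0.01, steps=2000, rng=None):
    """Euler-Maruyama integration of dx = -grad(potential) dt + sigma dW."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x += -potential_gradient(x, tilt) * dt + sigma * math.sqrt(dt) * noise
    return x


# Without noise, a cell simply descends into the nearest well (fate).
right_fate = simulate_cell(0.2, sigma=0.0)
left_fate = simulate_cell(-0.2, sigma=0.0)

# Tilting the landscape biases an undecided cell toward the right well.
perturbed = simulate_cell(0.0, sigma=0.0, tilt=-0.5)

# With noise, a population starting on the ridge splits between both wells.
rng = random.Random(1)
population = [simulate_cell(0.0, rng=rng) for _ in range(50)]
```

The two wells play the role of terminal fates, and changing the landscape before simulating is the toy analogue of asking how a perturbation shifts fate outcomes.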

FLOW-MAP

2020

Nature Protocols

Trajectory visualization tool optimized for flow and mass cytometry data using force-directed graph layouts.

  • Graph-based layout
  • Optimized for flow/CyTOF
  • Interactive visualization
  • Handles large datasets
  • Force-directed embedding
Algorithm: Constructs k-nearest neighbor graph with edges constrained to sequential timepoints. Applies ForceAtlas2 force-directed layout algorithm to create 2D visualization preserving temporal ordering. Supports density-dependent downsampling and hierarchical clustering for scalability across variable dataset sizes.
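
The timepoint-constrained graph construction can be sketched as a brute-force nearest-neighbor search that only considers candidates from the same or an adjacent timepoint. That reading of "sequential timepoints", the toy coordinates, and the function name are assumptions; the force-directed layout and downsampling steps are not shown.

```python
import math


def timepoint_constrained_knn(cells, k=2):
    """Undirected kNN edges restricted to the same or adjacent timepoints.

    cells: list of (timepoint, coordinates) tuples.
    """
    times = sorted({t for t, _ in cells})
    rank = {t: r for r, t in enumerate(times)}
    edges = set()
    for i, (ti, xi) in enumerate(cells):
        candidates = [
            (math.dist(xi, xj), j)
            for j, (tj, xj) in enumerate(cells)
            if j != i and abs(rank[ti] - rank[tj]) <= 1
        ]
        for _, j in sorted(candidates)[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Cell 4 (timepoint 2) sits next to the timepoint-0 cells in marker space,
# but the constraint prevents it from being linked to them directly.
cells = [
    (0, (0.0, 0.0)),
    (0, (0.1, 0.0)),
    (1, (1.0, 0.0)),
    (1, (1.1, 0.0)),
    (2, (0.05, 0.0)),
    (2, (2.0, 0.0)),
]
edges = timepoint_constrained_knn(cells, k=2)
```

Restricting edges this way keeps the layout ordered in time even when cells from distant timepoints look similar in marker space.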

PHLOWER

2025

Nature Methods

Hierarchical lineage tree inference with probabilistic modeling for complex multi-branching developmental systems using Hodge Laplacian decomposition.

  • Hierarchical tree structures
  • Probabilistic framework
  • Handles uncertainty
  • Complex lineage relationships (up to 26 branches)
  • Multimodal RNA+ATAC support
Algorithm: Uses Hodge Laplacian decomposition on simplicial complexes to infer hierarchical lineage trees. Decomposes cell-cell relationships into gradient (hierarchical), curl (cyclical), and harmonic (equilibrium) components. Provides uncertainty quantification through probabilistic modeling. Works with multimodal data (RNA+ATAC).

📊 Detailed Method Comparison

This table provides a comprehensive comparison of key features across different trajectory inference methods:

Method | Year | Max Scale | Multiple Timepoints | Single Snapshot | Spatial Data | Multimodal | Growth/Death | Key Strength
Monocle 3 | 2019 | 2M+ cells | ✓ | ✓ | ✗ | ✓ | ✗ | Ultra-scalable, discontinuous trajectories, convergent fates
Monocle 2 | 2017 | ~50K cells | ✓ | ✓ | ✗ | ✗ | ✗ | DDRTree algorithm, well-established
MOSCOT | 2025 | 1.7M+ cells | ✓ | ✗ | ✓ | ✓ | ✓ | Most scalable OT, spatial, multimodal
scVelo | 2020 | ~100K cells | ✗ | ✓ | ✗ | ✗ | ✗ | Dynamic velocity, most popular
GENOT | 2024 | Variable | ✓ | ✗ | ✗ | ✓ | ✓ | Stochastic OT, uncertainty quantification
CellRank 2 | 2024 | 1.3M+ cells | ✓ | ✓ | ✗ | ✓ | ✗ | Multi-view integration
veloVI | 2024 | ~50K cells | ✗ | ✓ | ✗ | ✗ | ✗ | Uncertainty quantification
UniTVelo | 2022 | ~100K cells | ✗ | ✓ | ✗ | ✗ | ✗ | Batch integration
CASi | 2024 | ~100K cells | ✓ | ✗ | ✗ | ✗ | ✗ | Novel cell type discovery
sciCSR | 2024 | ~50K cells | ✓ | ✗ | ✗ | ✗ | ✗ | B cell specialization
Slingshot | 2018 | ~50K cells | ✓ | ✓ | ✗ | ✗ | ✗ | Flexible, Bioconductor integration
PAGA | 2019 | ~200K cells | ✓ | ✓ | ✗ | ✗ | ✗ | Graph abstraction, exploratory

🗺️ Method Selection Guide

🎯 Quick Reference Summary

  • Single snapshot: scVelo (development) | veloVI (uncertainty) | UniTVelo (unified time)
  • <200K cells + temporal: GENOT (stochastic/unbalanced) | PRESCIENT (landscape) | Monocle 2
  • 200K-1M cells: MOSCOT (multimodal/spatial) | Monocle 3 (complex topologies)
  • >1M cells: MOSCOT, Monocle 3, CellRank 2, or PAGA (atlas-scale options)
  • B cells: sciCSR | Novel types: CASi | Drivers: DELVE + scVelo
  • Cross-modality: GENOT | Multi-method: CellRank 2
  • Growth dynamics: TIGON or GENOT (unbalanced) | Visualization: FLOW-MAP
  • Complex branching/discontinuous: Monocle 3 | Convergent fates: Monocle 3

💡 Best Practices

General Recommendations

When to Use RNA Velocity vs Optimal Transport vs Graph-Based Methods

RNA Velocity (scVelo, veloVI):

Optimal Transport (MOSCOT, GENOT):

Graph-Based (Monocle 3, PAGA, Slingshot):