Trajectory Inference Methods

📋 Overview

Trajectory inference (TI) methods reconstruct the dynamic processes of cellular differentiation, development, and state transitions from single-cell data. These computational approaches allow researchers to understand how cells progress through different states over time, identify key transition points, and discover the genes that drive these changes.

🎯 Key Considerations

Selecting the right trajectory inference method depends on several factors:

Dataset size: From thousands to millions of cells
Data type: Standard scRNA-seq, spatial, multimodal, or temporal
Experimental design: Single snapshot vs. multiple timepoints
Biological complexity: Linear, branching, cyclical, or disconnected trajectories
Computational resources: Available memory and processing power

🔬 Major Method Categories

1. RNA Velocity-Based Methods

These methods infer future cell states by modeling the relationship between unspliced and spliced mRNA:

velocyto

2018

Nature

The pioneering RNA velocity method that introduced the concept of using splicing dynamics to predict future cell states.

First RNA velocity implementation
Steady-state model assumption
Works with standard scRNA-seq protocols

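Algorithm: Estimates RNA velocity by modeling splicing kinetics under steady-state assumptions. Calculates velocity as v = u - γs, where u is unspliced mRNA, s is spliced mRNA, and γ is the degradation rate relative to the splicing rate (which is scaled to 1). γ is fitted per gene by linear regression on cells at the extremes of the gene's phase portrait, which are assumed to be at steady state. Projects velocities onto PCA/t-SNE/UMAP embeddings to visualize cell state transitions.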
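
The steady-state estimate can be sketched in a few lines of numpy. This is a minimal illustration, not velocyto's code: the function names are hypothetical, the splicing rate is assumed to be 1, and the 5% quantile cutoff for "steady-state" cells is an assumed choice.

```python
import numpy as np

def fit_gamma_extreme_quantiles(u, s, q=0.05):
    """Fit gamma for one gene by least squares through the origin, using only
    cells in the lowest and highest q-quantiles of s (assumed near steady state)."""
    lo, hi = np.quantile(s, [q, 1.0 - q])
    mask = (s <= lo) | (s >= hi)
    # Zero-intercept least squares: gamma = <u, s> / <s, s> over the masked cells.
    return float(np.dot(u[mask], s[mask]) / np.dot(s[mask], s[mask]))

def steady_state_velocity(u, s, gamma):
    """Steady-state velocity v = u - gamma * s (splicing rate scaled to 1)."""
    return u - gamma * s

# Toy gene whose cells all lie on the steady-state line u = 0.5 * s.
s = np.linspace(0.0, 10.0, 101)
u = 0.5 * s
gamma = fit_gamma_extreme_quantiles(u, s)
v = steady_state_velocity(u, s, gamma)  # ~0 everywhere: every cell sits on the line
```

Cells above the fitted line (u > γs) get positive velocity, meaning the gene is being induced; cells below it get negative velocity.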

scVelo

2020

Nature Biotechnology

An improved RNA velocity framework with dynamical modeling that accounts for transcriptional induction, repression, and steady-state. Most widely used velocity method.

Dynamical and stochastic models
Improved accuracy over velocyto
Latent time estimation
Driver gene identification
Compatible with Scanpy ecosystem

Algorithm: Models RNA velocity using dynamical system of unspliced (u) and spliced (s) mRNA: du/dt = α - βu, ds/dt = βu - γs, where α is transcription rate, β is splicing rate, and γ is degradation rate. Learns gene-specific kinetic parameters via expectation-maximization, then projects velocity vectors onto low-dimensional embeddings.
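
To see what the dynamical model implies, the ODE system above can be integrated numerically. This forward-Euler sketch uses hypothetical helper names and assumed rates α = 2, β = 1, γ = 0.5; scVelo itself works with the analytic solution and fits the rates with EM, so this only illustrates the dynamics of an induction phase.

```python
import numpy as np

def simulate_splicing(alpha, beta, gamma, t_max=20.0, dt=0.01, u0=0.0, s0=0.0):
    """Forward-Euler integration of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s."""
    n_steps = int(round(t_max / dt))
    u, s = np.empty(n_steps + 1), np.empty(n_steps + 1)
    u[0], s[0] = u0, s0
    for k in range(n_steps):
        du = alpha - beta * u[k]
        ds = beta * u[k] - gamma * s[k]
        u[k + 1] = u[k] + dt * du
        s[k + 1] = s[k] + dt * ds
    return u, s

def velocity(u, s, beta, gamma):
    """RNA velocity is the rate of change of spliced mRNA: ds/dt = beta*u - gamma*s."""
    return beta * u - gamma * s

# Induction phase (alpha > 0) starting from zero expression.
u, s = simulate_splicing(alpha=2.0, beta=1.0, gamma=0.5)
v = velocity(u, s, beta=1.0, gamma=0.5)
```

Expression rises toward the steady state u* = α/β = 2 and s* = α/γ = 4, and velocity is positive during induction and decays to zero at steady state. Starting from high expression with α = 0 would simulate the repression phase instead.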

veloVI

2024

Nature Methods

Probabilistic RNA velocity inference using variational inference with uncertainty quantification.

Deep generative modeling framework
Accounts for technical noise
Uncertainty estimates for velocity
Better handling of low counts

Algorithm: Uses variational autoencoders (VAEs) to model splicing dynamics probabilistically. Learns latent variables for transcriptional state and kinetic parameters while accounting for technical noise. Provides uncertainty quantification by sampling from posterior distributions of both velocity direction (intrinsic) and future states (extrinsic).
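
The intrinsic-uncertainty idea can be illustrated with a small numpy sketch. Here, Gaussian draws stand in for veloVI's posterior velocity samples, and uncertainty is summarized as the variance of the cosine similarity between each sample and the mean velocity; both the stand-in samples and this exact summary are assumptions of the sketch, and the helper name is hypothetical.

```python
import numpy as np

def intrinsic_uncertainty(velocity_samples):
    """Variance, across samples, of the cosine similarity between each sampled
    velocity vector and the mean velocity (for one cell).

    velocity_samples: array of shape (n_samples, n_genes).
    """
    mean_v = velocity_samples.mean(axis=0)
    cos = velocity_samples @ mean_v / (
        np.linalg.norm(velocity_samples, axis=1) * np.linalg.norm(mean_v)
    )
    return float(np.var(cos))

rng = np.random.default_rng(0)
true_v = np.array([1.0, -0.5, 0.25])
# Gaussian noise around a fixed velocity stands in for posterior samples.
confident = true_v + 0.01 * rng.standard_normal((200, 3))
uncertain = true_v + 1.0 * rng.standard_normal((200, 3))
```

A cell whose sampled velocities all point the same way scores near zero; a cell whose samples scatter in direction scores higher.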

UniTVelo

2022

Nature Communications

RNA velocity framework that models a single latent time shared across the transcriptome, using radial basis functions (RBFs).

Unified latent time estimation
Phase portraits for visualization
Top-down time modeling approach
Improved stability in velocity estimates

Algorithm: Models each gene's spliced expression as a radial basis function (RBF) of a single latent time shared across all genes, moving from gene-specific to transcriptome-wide temporal modeling. Fitting splicing dynamics to this shared time coordinate improves consistency across genes and reduces overfitting to noisy gene-specific estimates.
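
With a shared latent time, each gene's velocity becomes a closed-form derivative. The sketch below assumes the RBF form s(t) = h·exp(-a(t - τ)²) + o for spliced expression (parameter names h, a, τ, o and all values are illustrative) and recovers unspliced expression from ds/dt = βu - γs; it is not UniTVelo's fitting code.

```python
import numpy as np

def rbf_spliced(t, h, a, tau, o):
    """Assumed RBF form for one gene: s(t) = h * exp(-a * (t - tau)**2) + o."""
    return h * np.exp(-a * (t - tau) ** 2) + o

def rbf_velocity(t, h, a, tau, o):
    """Closed-form ds/dt of the RBF above (the offset o drops out)."""
    return -2.0 * a * (t - tau) * h * np.exp(-a * (t - tau) ** 2)

def implied_unspliced(t, h, a, tau, o, beta, gamma):
    """Rearranging ds/dt = beta*u - gamma*s gives u = (ds/dt + gamma*s) / beta."""
    return (rbf_velocity(t, h, a, tau, o) + gamma * rbf_spliced(t, h, a, tau, o)) / beta

# One latent time grid shared by all genes; each gene has its own (h, a, tau, o).
t = np.linspace(0.0, 1.0, 11)
genes = {"early": (2.0, 20.0, 0.2, 0.1), "late": (3.0, 15.0, 0.8, 0.0)}
velocities = {name: rbf_velocity(t, *params) for name, params in genes.items()}
```

Because every gene is read off the same time axis, an "early" gene (peak at τ = 0.2) already has negative velocity late in the process while a "late" gene (peak at τ = 0.8) is still being induced at the start.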

2. Optimal Transport-Based Methods

These methods use optimal transport theory to match cells across conditions or timepoints:

Waddington-OT

2019

Cell

Uses optimal transport to infer developmental trajectories and fate probabilities across timepoints.

Temporal trajectory reconstruction
Fate probability predictions
Perturbation analysis
Requires multiple timepoints

Algorithm: Computes optimal transport maps between consecutive time points by minimizing the entropy-regularized Wasserstein transport cost (solved with the Sinkhorn algorithm), with relaxed marginal constraints that account for cell growth and death. Models development as a sequence of transport maps, allowing computation of cell fate probabilities and ancestor/descendant relationships across temporal data.
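
The entropic transport step between two timepoints can be sketched with plain Sinkhorn iterations. This is the balanced version only (Waddington-OT additionally relaxes the marginals using growth-rate estimates); the toy data, the ε value, and the helper name are assumptions of the sketch, not the wot package's API.

```python
import numpy as np

def sinkhorn(a, b, cost, epsilon=0.1, n_iter=1000):
    """Entropy-regularized OT plan between weight vectors a (n,) and b (m,)."""
    K = np.exp(-cost / epsilon)             # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                     # rescale rows toward marginal a
        v = b / (K.T @ u)                   # rescale columns to marginal b
    return u[:, None] * K * v[None, :]      # plan P = diag(u) K diag(v)

rng = np.random.default_rng(0)
x_day0 = rng.normal(size=(5, 2))            # toy cells at the earlier timepoint
x_day1 = x_day0 + 1.0                       # toy cells at the later timepoint
cost = ((x_day0[:, None, :] - x_day1[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()                    # rescale so exp(-cost/epsilon) stays well conditioned
a = np.full(5, 1 / 5)
b = np.full(5, 1 / 5)
plan = sinkhorn(a, b, cost, epsilon=0.1)
descendants_of_0 = plan[0] / plan[0].sum()  # predicted fate distribution of cell 0
```

Normalizing row i of the plan gives the predicted descendant distribution of cell i at the later timepoint; normalizing a column instead gives the ancestor distribution of a later cell.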

MOSCOT

2025

Nature (preprint 2023)

Multi-Omics Single-Cell Optimal Transport: the most scalable OT framework covered here, handling over 1.7 million cells with linear time complexity.

Atlas-scale: handles 1.7M+ cells efficiently
Multi-omics integration (RNA, ATAC, protein)
Spatial and spatiotemporal mapping
Temporal trajectory inference
Neural OT solvers for speed
Handles unbalanced problems (growth/death)

Algorithm: Uses entropic Gromov-Wasserstein optimal transport with low-rank factorizations for linear time/memory complexity. Employs neural network parameterizations for transport maps and integrates multiple modalities through fused optimal transport. Supports unbalanced formulations via Kullback-Leibler divergence for modeling cell proliferation/death.
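
Unbalanced transport replaces the hard marginal constraints of Sinkhorn with KL penalties, so mass can be created or destroyed. The scaling iterations below follow the standard KL-relaxed Sinkhorn update (Chizat et al.), with τ as an assumed name for the penalty weight; this is a numpy illustration with a hypothetical helper name, not moscot's solver, which is built on OTT-JAX.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, cost, epsilon=0.1, tau=1.0, n_iter=2000):
    """Entropic OT with KL-penalized (rather than exact) marginals.

    tau weights the KL penalty on each marginal; the exponent tau / (tau + epsilon)
    tends to 1 as tau grows, recovering the balanced Sinkhorn update.
    """
    K = np.exp(-cost / epsilon)
    exponent = tau / (tau + epsilon)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** exponent
        v = (b / (K.T @ u)) ** exponent
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(1)
x_early, x_late = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
cost = ((x_early[:, None, :] - x_late[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()
a = np.full(4, 0.25)        # total mass 1 at the early timepoint
b_grown = np.full(4, 0.5)   # total mass 2: the population has doubled
plan = unbalanced_sinkhorn(a, b_grown, cost, epsilon=0.1, tau=0.5)
```

A large τ approaches the balanced plan; a small τ lets the row and column sums drift away from a and b, which is how proliferation and death are absorbed.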

GENOT

2024

NeurIPS 2024

GENOT (generative entropic neural optimal transport): stochastic neural OT for trajectory inference, trained with flow matching and providing uncertainty quantification.

Stochastic OT framework
Conditional generative modeling of transport maps
Handles unbalanced transport
Cross-modality translation
Uncertainty quantification

Algorithm: Learns stochastic transport plans using entropic Wasserstein and Gromov-Wasserstein flow matching. Neural networks parameterize velocity fields that interpolate between distributions. Provides uncertainty estimates through stochastic sampling and supports unbalanced formulations for modeling growth dynamics.
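
The flow-matching idea can be shown with a linear model in place of a neural network. Under the assumptions of this sketch (pairs (x0, x1) given directly rather than sampled from an entropic OT plan, straight-line interpolation paths, and hypothetical helper names), the learned velocity field pushes new source cells to where the pairing sends them.

```python
import numpy as np

def flow_matching_pairs(x0, x1, t):
    """Straight-path targets: x_t = (1 - t) x0 + t x1, target velocity x1 - x0."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    return xt, x1 - x0

def fit_linear_velocity(xt, t, target):
    """Least-squares fit of v(x, t) = [x, t, 1] @ W (stand-in for a neural net)."""
    features = np.column_stack([xt, t, np.ones_like(t)])
    W, *_ = np.linalg.lstsq(features, target, rcond=None)
    return W

def integrate(x, W, n_steps=100):
    """Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(len(x), k * dt)
        x = x + dt * np.column_stack([x, t, np.ones_like(t)]) @ W
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(500, 2))
x1 = x0 + np.array([2.0, -1.0])   # toy pairing: a pure shift
t = rng.uniform(size=500)
xt, target = flow_matching_pairs(x0, x1, t)
W = fit_linear_velocity(xt, t, target)
new_x0 = rng.normal(size=(10, 2))
pushed = integrate(new_x0, W)
```

Because the toy pairing is a pure shift, the least-squares velocity field recovers that constant shift, and integrating it moves fresh samples by the same amount. GENOT is described as instead learning, for each source cell, a flow from random noise to its conditional target distribution, which is what lets it represent one-to-many (stochastic) maps.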

3. Graph-Based & Pseudotime Methods

Traditional approaches that construct trajectories using dimensionality reduction and graph structures:

Monocle 3

2019

Nature

Advanced trajectory inference framework using UMAP and principal graphs, designed to scale to millions of cells and handle discontinuous trajectories. Used to analyze the Mouse Organogenesis Cell Atlas (2 million cells).

Ultra-scalable: handles millions of cells efficiently
UMAP-based dimensionality reduction
Discontinuous trajectories and convergent fates
Louvain partitioning for cell communities
Principal graph learning with SimplePPT
Loop detection for cyclical trajectories
Moran's I test for autocorrelation of expression among neighboring cells
Pseudotime via geodesic distance
Interoperable with Seurat objects

Algorithm: (1) Dimensionality reduction via UMAP for fast embedding of large datasets, (2) Louvain clustering for initial partitioning, (3) Enhanced SimplePPT algorithm learns principal graphs allowing disconnected components and convergent paths, (4) Pseudotime computed as geodesic distance along the learned graph from root cells, (5) Genes that vary along the trajectory identified with Moran's I, which tests whether a gene's expression is autocorrelated among neighboring cells in the graph.
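
Steps (4) and (5) can be illustrated on a toy graph. The sketch below computes pseudotime as Dijkstra shortest-path distance from a root node and scores a gene with Moran's I; it runs on a hand-written adjacency list rather than Monocle 3's learned principal graph, and the helper names are hypothetical (Monocle 3 itself is an R package).

```python
import heapq
import numpy as np

def geodesic_pseudotime(adjacency, root):
    """Shortest-path distance from root over a weighted graph (Dijkstra).

    adjacency: dict node -> list of (neighbor, edge_length). Nodes unreachable
    from root (e.g., a separate partition) are simply absent from the result.
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, np.inf):
            continue  # stale heap entry
        for nbr, length in adjacency[node]:
            nd = d + length
            if nd < dist.get(nbr, np.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def morans_i(x, W):
    """Moran's I of values x (n,) under a symmetric weight matrix W (n, n)."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

# Branching toy graph: root 0 -> 1, then 1 branches to 2 and 3.
graph = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0), (3, 2.0)], 2: [(1, 1.0)], 3: [(1, 2.0)]}
pseudotime = geodesic_pseudotime(graph, root=0)   # {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}

# A gene rising steadily along a 5-cell chain: neighbors have similar values.
W_chain = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
score = morans_i(np.arange(5.0), W_chain)         # 0.5: positive autocorrelation
```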

Monocle 2

2017

Nature Methods

Earlier version using reversed graph embedding for trajectory reconstruction. Still widely used for smaller datasets.

DDRTree algorithm
Branch point identification
Differential expression testing along pseudotime
Works well for fewer than 50K cells

Algorithm: Uses DDRTree (Discriminative Dimensionality Reduction via Tree) algorithm which performs reversed graph embedding. Constructs a principal tree in reduced dimensional space that captures branching differentiation paths. Assumes continuous manifold structure without allowing convergence or disconnected components.

Slingshot

2018

BMC Genomics

Flexible trajectory inference using cluster-based minimum spanning trees and principal curves.

Works with any dimensionality reduction
Cluster-based approach
Multiple lineage support
Well-integrated with Bioconductor

Algorithm: Two-stage approach: (1) Constructs a minimum spanning tree on cluster centroids to identify lineage structure, (2) Fits simultaneous principal curves through low-dimensional space for each lineage, allowing shared early segments. Assigns each cell a pseudotime along each curve, plus lineage weights based on its proximity to each curve.
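
Stage (1) can be sketched directly: build a minimum spanning tree over cluster centroids (Prim's algorithm here) and read lineages off as root-to-leaf paths. Euclidean centroid distance is an assumption of this sketch (Slingshot offers other cluster distance measures), the helper names are hypothetical, and stage (2), fitting principal curves, is not shown.

```python
import numpy as np

def mst_prim(dist):
    """Minimum spanning tree edges of a complete graph given a distance matrix."""
    n = len(dist)
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < dist[best[0], best[1]]):
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def lineages(edges, root):
    """Root-to-leaf paths through the tree; each path is one lineage."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        children = [c for c in nbrs.get(path[-1], []) if c not in path]
        if not children:
            paths.append(path)          # reached a leaf cluster
        for c in children:
            stack.append(path + [c])
    return paths

# Cluster 0 is the root; cluster 1 branches into clusters 2 and 3.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]])
dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
tree = mst_prim(dist)
paths = lineages(tree, root=0)          # two lineages: 0-1-2 and 0-1-3
```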

PAGA

2019

Genome Biology

Partition-based graph abstraction that creates coarse-grained trajectory representations. Exceptionally scalable: 1.3M cells in 90 seconds.

Graph abstraction approach
Handles complex topologies
Integrated with Scanpy
Good for exploratory analysis
130× faster than UMAP

Algorithm: Creates a partition-based graph where nodes represent cell clusters and edges represent connectivity strength. Computes edge weights by comparing the number of kNN-graph edges observed between two clusters with the number expected under random assignment of edges. Preserves global topology while allowing multi-resolution analysis through hierarchical clustering. Can initialize UMAP for faster embedding.
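
Scoring cluster pairs by observed versus expected kNN edges can be sketched as follows. The null model here is a configuration model (expected edges proportional to the product of the clusters' total degrees), which is an assumption of the sketch rather than PAGA's exact statistic; the helper name is hypothetical.

```python
import numpy as np

def cluster_connectivity(A, labels):
    """Observed / expected edge counts between clusters, configuration-model null.

    A: symmetric 0/1 adjacency (n, n) of a kNN graph; labels: cluster id per cell.
    """
    clusters = np.unique(labels)
    onehot = (labels[:, None] == clusters[None, :]).astype(float)   # (n, k)
    observed = onehot.T @ A @ onehot        # edges between each pair of clusters
    degree = onehot.T @ A.sum(axis=1)       # total degree of each cluster
    expected = np.outer(degree, degree) / A.sum()
    return observed / expected

# Two tightly knit triangles joined by a single bridge edge (2-3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
conn = cluster_connectivity(A, labels)
```

Within-cluster ratios come out above 1 and the bridged pair well below 1, which is the kind of contrast PAGA thresholds to decide which cluster pairs to connect in the abstracted graph.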

4. Hybrid & Multi-Method Approaches

Methods that combine multiple signals or integrate different trajectory inference approaches:

CellRank 2

2024

Nature Methods

Unified framework combining RNA velocity, pseudotime, gene expression, and experimental time for robust fate predictions. Scales to millions of cells.

Multi-view learning approach
Combines velocity, pseudotime, and real time
Handles multimodal data
Terminal state identification
Driver gene discovery
Integrated with Scanpy ecosystem

Algorithm: Modular framework with kernels for different data views (velocity, pseudotime, real time, metabolic labeling). Combines kernels via weighted aggregation into a cell-cell transition matrix. Identifies macrostates and terminal states with Generalized Perron Cluster Cluster Analysis (GPCCA), which operates on a real Schur decomposition of the transition matrix, and computes fate probabilities as absorption probabilities of the Markov chain. 30× faster than CellRank 1.
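
The two core operations described above, mixing kernels and computing fate probabilities, can be sketched with numpy. This is a toy illustration with hypothetical helper names and hand-made 4-cell transition matrices; CellRank's own kernels, estimators, and GPCCA-based state identification are not reproduced here.

```python
import numpy as np

def combine_kernels(matrices, weights):
    """Convex combination of row-stochastic matrices (weights sum to 1)."""
    return sum(w * T for w, T in zip(weights, matrices))

def absorption_probabilities(T, terminal):
    """Fate probabilities: chance that a walk from each transient cell ends in
    each terminal cell, from (I - Q) B = R, where Q is the transient->transient
    block of T and R the transient->terminal block."""
    transient = [i for i in range(T.shape[0]) if i not in terminal]
    Q = T[np.ix_(transient, transient)]
    R = T[np.ix_(transient, terminal)]
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)
    return transient, B

# 4 toy cells: 0 = progenitor, 1 = intermediate, 2 and 3 = terminal fates.
T_velocity = np.array([[0.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 0.8, 0.2],
                       [0.0, 0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0]])
T_pseudotime = np.array([[0.0, 0.5, 0.5, 0.0],
                         [0.0, 0.0, 0.5, 0.5],
                         [0.0, 0.0, 1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
T = combine_kernels([T_velocity, T_pseudotime], [0.5, 0.5])
transient, fate_probs = absorption_probabilities(T, terminal=[2, 3])
# fate_probs rows: cell 0 -> [0.7375, 0.2625], cell 1 -> [0.65, 0.35]
```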

DELVE

2024

Nature Communications

Feature selection for trajectory analysis that identifies genes driving dynamic processes.

Dynamic feature selection
Identifies trajectory-driving genes
Works with velocity or pseudotime
Removes redundant features
Improves downstream analysis

Algorithm: Unsupervised bottom-up approach identifying dynamic gene/protein modules. Uses graph-based methods to find features that robustly recapitulate cellular trajectories while removing redundancy. Works across modalities (scRNA-seq, mass cytometry, imaging) by evaluating feature contribution to trajectory preservation.
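
DELVE's final ranking step is described as scoring features with the Laplacian score on a cell graph. The sketch below shows only that scoring step on a given graph (building the graph from DELVE's dynamic seed modules is not shown); the helper name and toy chain graph are assumptions.

```python
import numpy as np

def laplacian_score(f, W):
    """Laplacian score of feature values f (n,) on symmetric graph weights W (n, n).
    Lower scores mean the feature varies smoothly over the graph."""
    d = W.sum(axis=1)                        # node degrees
    L = np.diag(d) - W                       # graph Laplacian
    f_tilde = f - (f @ d) / d.sum()          # subtract the degree-weighted mean
    return float(f_tilde @ L @ f_tilde) / float(f_tilde @ (d * f_tilde))

# 5-cell chain graph (a stand-in for a trajectory-preserving kNN graph).
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
smooth = np.arange(5.0)                               # changes gradually along the chain
alternating = np.array([0.0, 1.0, 0.0, 1.0, 0.0])     # flips between neighbors
scores = {"smooth": laplacian_score(smooth, W),
          "alternating": laplacian_score(alternating, W)}
# scores: smooth = 1/3, alternating = 2.0, so the smooth feature ranks first
```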

5. Specialized Methods

Purpose-built tools for specific biological questions or data types:

CASi

2024

Scientific Reports

Discovers novel cell types and subpopulations along developmental trajectories using cross-timepoint analysis.

Novel cell type discovery
Temporal single-cell data
Handles rare populations
Annotates discovered types

Algorithm: Neural network architecture for cross-timepoint annotation and automatic feature selection. Detects potentially novel cell types that emerge over developmental time by comparing cell type distributions across timepoints. Uses attention mechanisms to identify discriminative features for each discovered population.

sciCSR

2024

Nature Methods (online Nov 2023)

Specialized method for B cell development using class-switch recombination as molecular clock.

B cell trajectory inference
Uses CSR as temporal marker
High temporal resolution
Links phenotype to maturation

Algorithm: Leverages class-switch recombination (CSR) events as intrinsic molecular timestamps. Constructs Markov state model of B cell differentiation states based on immunoglobulin isotype expression patterns. Achieves ~0.9 cosine similarity in BCR isotype predictions by modeling CSR dynamics.
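
What makes CSR usable as a clock is its irreversibility: deletional recombination removes upstream constant-region genes, so a B cell can only switch to an isotype further downstream. The sketch below encodes that constraint in a Markov transition matrix ordered along the human IGH locus. The transition counts are random placeholders and the helper name is hypothetical, so this illustrates the constraint rather than sciCSR's actual model.

```python
import numpy as np

# Human IGH constant-region genes in locus order (5' to 3'). IGHD is kept for
# ordering only: IgD is normally co-expressed with IgM by alternative splicing.
IGH_ORDER = ["IGHM", "IGHD", "IGHG3", "IGHG1", "IGHA1", "IGHG2", "IGHG4", "IGHE", "IGHA2"]

def csr_transition_matrix(counts):
    """Row-normalize isotype transition counts after zeroing 'backward' switches.

    Entries below the diagonal (switching to an upstream isotype) are masked,
    because the upstream constant genes have been deleted from the DNA.
    """
    masked = np.triu(np.asarray(counts, dtype=float))
    row_sums = masked.sum(axis=1, keepdims=True)
    return np.divide(masked, row_sums, out=np.zeros_like(masked), where=row_sums > 0)

rng = np.random.default_rng(0)
counts = rng.integers(1, 10, size=(len(IGH_ORDER), len(IGH_ORDER)))  # placeholder counts
T = csr_transition_matrix(counts)
```

The most downstream isotype (IGHA2) becomes an absorbing state, and the position of a cell's isotype along the locus orders cells in time.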

TIGON

2024

Nature Machine Intelligence

Models growth dynamics explicitly during trajectory inference to account for proliferation using dynamic unbalanced optimal transport.

Growth rate modeling
Proliferation-aware trajectories
Birth/death process integration
Corrects for cell cycle effects
Infers gene regulatory networks

Algorithm: Uses dynamic unbalanced optimal transport based on Wasserstein-Fisher-Rao distance to simultaneously reconstruct trajectories AND model population growth/death. Employs neural ODEs implemented in PyTorch. Learns cell-cell communication and gene regulatory networks while accounting for proliferation dynamics.
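
The unbalanced dynamics can be illustrated with weighted particles: each cell moves along a velocity field while its weight changes at a growth rate, so total mass is not conserved. In TIGON both fields are neural networks fitted under the Wasserstein-Fisher-Rao objective; here they are fixed, assumed functions, and the helper name is hypothetical.

```python
import numpy as np

def simulate_weighted_particles(x, w, velocity, growth, t_end=1.0, n_steps=1000):
    """Euler simulation of dx/dt = v(x) and dw/dt = g(x) * w.

    Positions follow the velocity field; weights grow (g > 0) or shrink (g < 0),
    so the total mass sum(w) can change over time.
    """
    dt = t_end / n_steps
    for _ in range(n_steps):
        g = growth(x)                 # growth rate at the current positions
        x = x + dt * velocity(x)
        w = w * (1.0 + dt * g)
    return x, w

x0 = np.random.default_rng(0).normal(size=(50, 2))
w0 = np.full(50, 1 / 50)
x, w = simulate_weighted_particles(
    x0, w0,
    velocity=lambda pts: np.full_like(pts, 0.5),      # assumed constant drift
    growth=lambda pts: np.full(len(pts), 0.7),        # assumed constant growth rate
)
```

With a constant growth rate g, total mass grows by about exp(g·t); position-dependent g would let some regions of state space expand while others shrink.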

PRESCIENT

2021

Nature Communications

Learns potential landscapes from temporal single-cell data to predict differentiation trajectories. Developed by Gifford lab at MIT CSAIL.

Potential landscape modeling
Trajectory perturbation analysis
In silico perturbations
Requires temporal data
Waddington landscape framework

Algorithm: Models cell differentiation as diffusion over a Waddington potential landscape: cells drift along the negative gradient of a potential function while subject to random noise. Neural networks parameterize the potential, which is learned from temporal data by matching simulated and observed cell populations. Predicts cell trajectories by simulating this drift-diffusion process. Enables in silico perturbation analysis by altering cells' initial gene expression and simulating forward.
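
The drift-diffusion view can be sketched with an Euler-Maruyama simulation of dx = -∇Φ(x)dt + σ dW. Here Φ is an assumed quadratic potential with a single attractor, standing in for PRESCIENT's neural-network potential fitted to time-course data, and the helper names are hypothetical.

```python
import numpy as np

def grad_quadratic_potential(x, center):
    """Gradient of the assumed potential Phi(x) = 0.5 * ||x - center||^2."""
    return x - center

def simulate(x0, center, sigma=0.1, dt=0.01, n_steps=500, seed=0):
    """Euler-Maruyama: x <- x - grad Phi(x) dt + sigma * sqrt(dt) * noise."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - grad_quadratic_potential(x, center) * dt + sigma * np.sqrt(dt) * noise
    return x

x0 = np.zeros((100, 2))              # 100 cells starting in the same state
center = np.array([3.0, -2.0])       # the potential's minimum (a "fate")
x_end = simulate(x0, center)
```

In this setting, an in silico perturbation amounts to changing the starting states x0 (for example, shifting one gene's value) and re-simulating to see whether cells settle in a different basin; a single-well potential like this one always returns them to the same attractor.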

FLOW-MAP

2020

Nature Protocols

Trajectory visualization tool optimized for flow and mass cytometry data using force-directed graph layouts.

Graph-based layout
Optimized for flow/CyTOF
Interactive visualization
Handles large datasets
Force-directed embedding

Algorithm: Constructs k-nearest neighbor graph with edges constrained to sequential timepoints. Applies ForceAtlas2 force-directed layout algorithm to create 2D visualization preserving temporal ordering. Supports density-dependent downsampling and hierarchical clustering for scalability across variable dataset sizes.
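
The timepoint constraint on the neighbor graph is easy to express in code. In this sketch timepoints are assumed to be consecutive integer indices, edges are allowed within a timepoint or between adjacent ones, and the helper name is hypothetical; FLOW-MAP's downsampling, clustering, and ForceAtlas2 layout are not shown.

```python
import numpy as np

def time_constrained_knn(X, timepoints, k=2):
    """Undirected k-nearest-neighbor edges, allowed only between cells whose
    timepoint indices differ by at most 1."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    allowed = np.abs(timepoints[:, None] - timepoints[None, :]) <= 1
    np.fill_diagonal(allowed, False)             # no self-edges
    dist = np.where(allowed, dist, np.inf)       # forbidden pairs can never be chosen
    edges = set()
    for i in range(len(X)):
        for j in np.argsort(dist[i])[:k]:
            if np.isfinite(dist[i, j]):
                edges.add((min(i, int(j)), max(i, int(j))))
    return edges

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))                     # toy marker expression
tp = np.repeat(np.arange(4), 3)                  # 4 timepoints, 3 cells each
edges = time_constrained_knn(X, tp, k=2)
```

The resulting edge set is what a force-directed layout would then arrange in 2D, so cells from distant timepoints are never pulled together directly.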

PHLOWER

2025

Nature Methods

Hierarchical lineage tree inference with probabilistic modeling for complex multi-branching developmental systems using Hodge Laplacian decomposition.

Hierarchical tree structures
Probabilistic framework
Handles uncertainty
Complex lineage relationships (up to 26 branches)
Multimodal RNA+ATAC support

Algorithm: Uses Hodge Laplacian decomposition on simplicial complexes to infer hierarchical lineage trees. Decomposes cell-cell relationships into gradient (hierarchical), curl (cyclical), and harmonic (equilibrium) components. Provides uncertainty quantification through probabilistic modeling. Works with multimodal data (RNA+ATAC).
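
The first step of a Hodge decomposition, splitting an edge flow into its gradient part and the remainder, can be sketched with a node-edge incidence matrix and least squares. This only separates gradient from non-gradient flow; splitting the remainder into curl and harmonic parts would also need the edge-triangle incidence matrix, and PHLOWER's construction of the simplicial complex from cells is not reproduced. Helper names are hypothetical.

```python
import numpy as np

def incidence_matrix(edges, n_nodes):
    """Edge-by-node matrix B with B[e, i] = -1 and B[e, j] = +1 for edge e = (i -> j),
    so (B @ phi)[e] = phi[j] - phi[i] is a potential difference along edge e."""
    B = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        B[e, i], B[e, j] = -1.0, 1.0
    return B

def gradient_part(flow, B):
    """Least-squares projection of an edge flow onto the gradient subspace im(B)."""
    phi, *_ = np.linalg.lstsq(B, flow, rcond=None)
    return B @ phi, phi

edges = [(0, 1), (1, 2), (2, 0)]                 # a single triangle
B = incidence_matrix(edges, 3)
grad_flow = B @ np.array([0.0, 1.0, 3.0])        # pure potential differences: [1, 2, -3]
circulation = np.ones(3)                          # flow going around the triangle
g1, _ = gradient_part(grad_flow, B)               # recovers grad_flow
g2, _ = gradient_part(circulation, B)             # ~0: a loop has no gradient part
```

The gradient part is what carries a "downhill" ordering of cells (the hierarchical component), while the circulating remainder captures cyclic structure.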

📊 Detailed Method Comparison

This table compares key features of selected trajectory inference methods; scale figures are approximate:

Method	Year	Approx. Max Scale	Multiple Timepoints	Single Snapshot	Spatial Data	Multimodal	Growth/Death	Key Strength
Monocle 3	2019	2M+ cells	✓	✓	✗	✓	✗	Ultra-scalable, discontinuous trajectories, convergent fates
Monocle 2	2017	~50K cells	✓	✓	✗	✗	✗	DDRTree algorithm, well-established
MOSCOT	2025	1.7M+ cells	✓	✗	✓	✓	✓	Most scalable OT, spatial, multimodal
scVelo	2020	~100K cells	✗	✓	✗	✗	✗	Dynamic velocity, most popular
GENOT	2024	Variable	✓	✗	✗	✓	✓	Stochastic OT, uncertainty quantification
CellRank 2	2024	1.3M+ cells	✓	✓	✗	✓	✗	Multi-view integration
veloVI	2024	~50K cells	✗	✓	✗	✗	✗	Uncertainty quantification
UniTVelo	2022	~100K cells	✗	✓	✗	✗	✗	Unified latent time
CASi	2024	~100K cells	✓	✗	✗	✗	✗	Novel cell type discovery
sciCSR	2024	~50K cells	✓	✗	✗	✗	✗	B cell specialization
Slingshot	2018	~50K cells	✓	✓	✗	✗	✗	Flexible, Bioconductor integration
PAGA	2019	1.3M+ cells	✓	✓	✗	✗	✗	Graph abstraction, exploratory

🗺️ Method Selection Guide

🎯 Quick Reference Summary
Single snapshot: scVelo (development) | veloVI (uncertainty) | UniTVelo (unified time)
<200K cells + temporal: GENOT (stochastic/unbalanced) | PRESCIENT (landscape) | Monocle 2
200K-1M cells: MOSCOT (multimodal/spatial) | Monocle 3 (complex topologies)
>1M cells: MOSCOT, Monocle 3, CellRank 2, or PAGA (atlas-scale options)
B cells: sciCSR | Novel types: CASi | Drivers: DELVE + scVelo
Cross-modality: GENOT | Multi-method: CellRank 2
Growth dynamics: TIGON or GENOT (unbalanced) | Visualization: FLOW-MAP
Complex branching, discontinuous trajectories, or convergent fates: Monocle 3

💡 Best Practices

General Recommendations

Start simple: If spliced and unspliced counts are available, begin with scVelo for standard scRNA-seq before trying more complex methods
Validate predictions: Use experimental data or known biology to validate trajectory predictions
Combine methods: Tools like CellRank 2 allow integration of multiple signals
Consider scale: For atlas-scale data (>500K cells), consider MOSCOT, Monocle 3, CellRank 2, or PAGA
Quality control: Ensure high-quality data preprocessing before trajectory inference
Assess topology: Use Monocle 3 for complex branching or disconnected structures

When to Use RNA Velocity vs Optimal Transport vs Graph-Based Methods

RNA Velocity (scVelo, veloVI):

Single timepoint with sufficient sequencing depth
Interested in immediate transcriptional dynamics
Need to identify driving genes
Standard droplet-based scRNA-seq

Optimal Transport (MOSCOT, GENOT):

Multiple experimental timepoints available
Very large datasets (>200K cells)
Spatial or multimodal data
Need to model growth/death explicitly
Low sequencing depth (insufficient for velocity)

Graph-Based (Monocle 3, PAGA, Slingshot):

Complex developmental trajectories with multiple branches
Discontinuous trajectories or convergent developmental paths (Monocle 3)
Large-scale datasets (100K-2M+ cells with Monocle 3)
Exploratory analysis of trajectory topology (PAGA)
When data spans multiple timepoints with gaps
Need for pseudotime ordering along complex paths