🧬 TranscriptFormer Dual Decoder Heads

Understanding how the model jointly predicts genes and their expression counts

Core architecture component for generative single-cell modeling

Architecture overview (diagram): Transformer Encoder → contextual state z_j^(L) → two parallel heads: the Gene Decoder (categorical over the gene vocabulary) and the Count Decoder (zero-truncated Poisson over expression counts).
🎯 Gene Decoder Head (Categorical)

Purpose: Predicts which gene to select next in the cell sentence

ω_j = softmax(MLP_ω(z_j^(L)))

Key Properties:

  • Output: Probability distribution over entire gene vocabulary (~25K-247K genes)
  • Architecture: Two-layer MLP + softmax normalization
  • Loss: Standard categorical cross-entropy
  • Context-aware: Depends on all previously selected genes

Example Output:

GAPDH: 0.23 · ACTB: 0.18 · TP53: 0.12 · all others: 0.47
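A minimal PyTorch sketch of such a head, assuming illustrative sizes (d_model = 512, a 25K-gene vocabulary); the layer names and dimensions are placeholders, not TranscriptFormer's actual implementation:

```python
import torch
import torch.nn as nn

class GeneDecoderHead(nn.Module):
    """Two-layer MLP producing logits over the gene vocabulary."""

    def __init__(self, d_model: int = 512, vocab_size: int = 25_000):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, vocab_size),  # one logit per gene
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, seq, d_model) contextual states z_j^(L) from the encoder
        return self.mlp(z)  # softmax is folded into the cross-entropy loss

gene_head = GeneDecoderHead()
z = torch.randn(2, 16, 512)                   # dummy encoder output
targets = torch.randint(0, 25_000, (2, 16))   # observed next-gene indices
loss_gene = nn.functional.cross_entropy(
    gene_head(z).flatten(0, 1), targets.flatten()
)
```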

📊 Count Decoder Head (Zero-Truncated Poisson)

Purpose: Predicts expression level (count) for the selected gene

c_j | g_j, context ∼ ZTP(λ_j)

Key Properties:

  • Output: A positive rate λ_j; sampled counts are always ≥ 1
  • Architecture: MLP with normalization to total count
  • Loss: Zero-truncated Poisson negative log-likelihood
  • Constraint: Counts sum to observed total transcripts in cell

Example Output:

GAPDH: 156 · ACTB: 89 · TP53: 23
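A companion sketch of the count head under the same illustrative assumptions; softplus is one common way to keep the rate positive, though the actual parameterization may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CountDecoderHead(nn.Module):
    """MLP predicting a strictly positive Poisson rate per position."""

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, 1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # softplus keeps lambda_j > 0; the normalization of rates to the
        # cell's observed total (noted above) is omitted in this sketch
        return F.softplus(self.mlp(z)).squeeze(-1)

count_head = CountDecoderHead()
z = torch.randn(2, 16, 512)
lam = count_head(z)                           # (batch, seq) rates
expected = lam / (1.0 - torch.exp(-lam))      # ZTP mean, always >= 1
```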

🔗 Joint Training & Coupling

Why Both Heads Are Essential:

L = L_gene + L_count
  • Sequential Dependency: Count decoder uses gene identity from gene decoder
  • Biological Realism: Models the fact that different genes have different typical expression levels
  • Generative Capability: Can sample both gene identity and count jointly
  • Context Sensitivity: Both decoders condition on cell context

Innovation: Unlike discriminative models that only classify, this architecture can generate realistic cell profiles by sampling from both distributions sequentially.
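A sketch of the joint objective, reusing the illustrative gene_head and count_head modules from the sketches above; the zero-truncated Poisson NLL follows the pmf given in the "Why Zero-Truncated Poisson?" section below:

```python
import torch
import torch.nn.functional as F

def ztp_nll(lam, k):
    """Negative log-likelihood of counts k >= 1 under ZTP(lam).

    -log P(k | lam) = lam - k*log(lam) + log(k!) + log(1 - exp(-lam))
    """
    return lam - k * torch.log(lam) + torch.lgamma(k + 1.0) + torch.log(-torch.expm1(-lam))

# Dummy batch: states, next-gene targets, and their observed counts
z = torch.randn(2, 16, 512)                       # encoder states z_j^(L)
gene_ids = torch.randint(0, 25_000, (2, 16))      # observed next genes
counts = torch.randint(1, 200, (2, 16)).float()   # observed counts (>= 1)

loss_gene = F.cross_entropy(gene_head(z).flatten(0, 1), gene_ids.flatten())
loss_count = ztp_nll(count_head(z), counts).mean()
loss = loss_gene + loss_count                     # L = L_gene + L_count
```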

🎮 Generation Demo

How the dual decoders work together to generate a cell sentence: starting from a 🧬 [START] token, the model alternately samples the next gene from the categorical head and its count from the zero-truncated Poisson head.
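A hedged sketch of this loop, assuming hypothetical encoder, gene_head, count_head, and START_ID names (the actual model's decoding procedure may differ). The ZTP draw uses simple rejection sampling:

```python
import torch

@torch.no_grad()
def generate_cell_sentence(encoder, gene_head, count_head, max_genes=64):
    """Alternate between sampling a gene and sampling its count."""
    tokens, counts = [START_ID], []           # START_ID: hypothetical start token
    for _ in range(max_genes):
        z = encoder(torch.tensor([tokens]))   # contextual states for the prefix
        probs = torch.softmax(gene_head(z)[0, -1], dim=-1)
        gene = torch.multinomial(probs, 1).item()
        lam = count_head(z)[0, -1]
        count = 0
        while count == 0:                     # rejection-sample the ZTP:
            count = int(torch.poisson(lam))   # redraw Poisson samples equal to 0
        tokens.append(gene)
        counts.append(count)
    return tokens[1:], counts
```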

🔬 Why Zero-Truncated Poisson?

Biological Motivation:

  • Gene expression counts are discrete, non-negative integers
  • Poisson distribution naturally models count data
  • Zero-truncation ensures only expressed genes appear in sequences
  • Context-dependent rates λ_j let the marginal count distribution capture the overdispersion common in single-cell data

Mathematical Form:

P(c = k | λ) = λ^k e^(−λ) / (k! · (1 − e^(−λ))),   k ≥ 1
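To make the formula concrete, a small self-contained check that the truncated pmf normalizes to one and that its mean equals λ / (1 − e^(−λ)):

```python
import math

def ztp_pmf(k: int, lam: float) -> float:
    """P(c = k | lam) for k >= 1 under a zero-truncated Poisson."""
    return lam**k * math.exp(-lam) / (math.factorial(k) * (1.0 - math.exp(-lam)))

lam = 2.5
support = range(1, 100)
print(sum(ztp_pmf(k, lam) for k in support))        # ~1.0: zero is excluded
print(sum(k * ztp_pmf(k, lam) for k in support))    # mean under the pmf
print(lam / (1.0 - math.exp(-lam)))                 # closed-form mean, matches
```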

⚡ Computational Advantages

Efficiency Benefits:

  • Shared Encoder: Single transformer processes both tasks
  • Parallel Computation: Both losses computed simultaneously
  • Parameter Sharing: Contextualized representations used by both heads
  • End-to-End Training: Joint optimization improves both tasks

vs. Separate Models: one shared encoder serves both heads, roughly halving the parameters and compute that two independently trained gene and count predictors would require