A geometric inductive bias for transformer reasoning
The central object is a fiber bundle \(\pi: E \to M\) where the base manifold \(M\) is a Poincaré ball \(\mathbb{B}^{64}\) with constant curvature \(\kappa = -1\), and each fiber \(F_x\) is a categorical distribution over \(K=16\) sections with \(P=8\) mixture components.
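Numerically, every base point must remain strictly inside the open unit ball for the hyperbolic metric to stay finite. A minimal sketch of the usual boundary clamp (the function name and the `eps` margin are illustrative assumptions, not taken from the codebase):

```python
import math

def project_to_ball(x, eps=1e-5):
    # Clamp a point into the open Poincare ball (|x| < 1), the base
    # manifold used here; eps keeps a safety margin from the boundary.
    norm = math.sqrt(sum(v * v for v in x))
    max_norm = 1.0 - eps
    if norm >= max_norm:
        return [v * max_norm / norm for v in x]
    return list(x)
```

Manifold libraries such as geoopt apply a similar clamp after each Riemannian update for the same reason.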
Hidden states from transformer layer 12 are projected into this bundle. The projection is a residual perturbation clamped to ≤10% of the base hidden state norm, preserving the pretrained distribution.
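The clamp can be sketched as a norm-ratio rescale of the adapter's residual (pure-Python sketch; `max_ratio=0.10` mirrors the ≤10% bound from the text, the rest is assumed):

```python
import math

def clamped_residual(h, delta, max_ratio=0.10):
    # Add the adapter perturbation to the hidden state, rescaling it so
    # its norm never exceeds max_ratio of the hidden state's own norm.
    h_norm = math.sqrt(sum(x * x for x in h))
    d_norm = math.sqrt(sum(x * x for x in delta))
    cap = max_ratio * h_norm
    if d_norm > cap > 0.0:
        delta = [x * cap / d_norm for x in delta]
    return [a + b for a, b in zip(h, delta)]
```

Because the perturbation is additive and bounded, a freshly initialized adapter leaves the pretrained representation almost unchanged.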
The adapter learns a metric tensor \(g_{ij}\) on \(M\), approximated via the Fisher information matrix of the fiber distributions. Curvature is computed via a log-determinant Laplacian estimator:
\[ K \approx -\frac{1}{2} \nabla^2 \log \det g \]
A curvature loss regularizes toward \(\kappa = -1\). The learned curvature at convergence is \(K = -5.63\) — strongly hyperbolic, overshooting the target but geometrically meaningful.
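The estimator can be approximated with central finite differences; a 2-D sketch (the closed-form 2×2 determinant and the step size `h` are illustrative assumptions):

```python
import math

def log_det_2x2(g):
    # Closed-form log-determinant of a 2x2 metric tensor.
    (a, b), (c, d) = g
    return math.log(a * d - b * c)

def curvature_estimate(metric_at, x, h=1e-3):
    # K ~ -1/2 * Laplacian of log det g, via central differences in
    # each coordinate; metric_at(x) returns the 2x2 metric tensor at x.
    f0 = log_det_2x2(metric_at(x))
    lap = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        lap += (log_det_2x2(metric_at(xp)) - 2 * f0
                + log_det_2x2(metric_at(xm))) / (h * h)
    return -0.5 * lap
```

For a flat (identity) metric the estimate is exactly zero, which is a useful sanity check before regularizing toward \(\kappa = -1\).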
Fiber state evolves via a Hamiltonian system with symplectic integration and a Lorentz-factor speed limiter (\(c = 5.0\)):
\[ \dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q} \]
The speed limiter prevents gradient explosion: \(v \to v / \sqrt{1 + \|v\|^2 / c^2}\). Combined with the SymplecticSPIDER optimizer, this replaces global gradient clipping, which was found to kill fiber gradients because the 7M curvature parameters dominate the global norm.
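A leapfrog sketch for a separable Hamiltonian \(H = \tfrac{1}{2}\|p\|^2 + V(q)\), with the limiter applied at the momentum half-step (the limiter placement and step size are assumptions; the text specifies only the rescaling rule and \(c = 5.0\)):

```python
import math

C = 5.0  # Lorentz-factor speed limit from the text

def limit_speed(v, c=C):
    # Lorentz-factor rescale: v -> v / sqrt(1 + |v|^2 / c^2),
    # so the rescaled norm is always strictly below c.
    gamma = math.sqrt(1.0 + sum(x * x for x in v) / (c * c))
    return [x / gamma for x in v]

def leapfrog_step(q, p, grad_V, dt=0.01):
    # One symplectic (leapfrog) step: qdot = dH/dp = p, pdot = -dH/dq.
    p_half = [pi - 0.5 * dt * gi for pi, gi in zip(p, grad_V(q))]
    p_half = limit_speed(p_half)
    q_new = [qi + dt * pi for qi, pi in zip(q, p_half)]
    p_new = [pi - 0.5 * dt * gi for pi, gi in zip(p_half, grad_V(q_new))]
    return q_new, p_new
```

The limiter is smooth (unlike hard clipping), so small momenta pass through almost unchanged while large ones saturate below \(c\).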
Optimization follows the natural gradient direction on the statistical manifold:
\[ \theta_{t+1} = \theta_t - \eta \, F^{-1} \nabla_\theta \mathcal{L} \]
This respects the Fisher-Rao metric and converges ~30% faster than Euclidean gradient descent.
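The update reduces to ordinary gradient descent when \(F\) is the identity; a 2-parameter sketch with a closed-form inverse (at scale a damped linear solve of \(F x = \nabla_\theta \mathcal{L}\) would replace the explicit inverse; all names here are illustrative):

```python
def natural_gradient_step(theta, grad, fisher, eta=0.1):
    # theta <- theta - eta * F^{-1} grad, for a 2x2 Fisher matrix F.
    (a, b), (c, d) = fisher
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    nat = [inv[i][0] * grad[0] + inv[i][1] * grad[1] for i in range(2)]
    return [t - eta * g for t, g in zip(theta, nat)]
```

Directions in which the Fisher matrix is large (sharply identifiable parameters) get correspondingly smaller steps.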
Adjacent tokens must agree on their fiber distributions. The Jensen-Shannon divergence between neighboring sections is penalized:
\[ \mathcal{L}_{\text{sheaf}} = \frac{1}{N} \sum_{i} \text{JSD}(\sigma_i \| \sigma_{i+1}) \]
This enforces local-to-global semantic coherence — the "gluing condition" of the sheaf.
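The loss above can be sketched directly from the definition of JSD over categorical sections (pure-Python sketch; function names are illustrative):

```python
import math

def kl(p, q):
    # KL divergence for categorical distributions; 0 * log(0) := 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetric and bounded by ln 2.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def sheaf_loss(sections):
    # Mean JSD between each adjacent pair of token sections.
    pairs = list(zip(sections, sections[1:]))
    return sum(jsd(p, q) for p, q in pairs) / len(pairs)
```

Because JSD is bounded, the penalty cannot blow up even when adjacent sections are fully disjoint.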
At inference time, the GSP reads curvature \(K\) and entropy \(S\) from the adapter and modulates generation parameters (temperature, top_p) via a GENERIC-compliant homeostatic controller. No retraining required.
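A sketch of one possible homeostatic rule. Only the inputs (K, S) and outputs (temperature, top_p) come from the text; the gains, targets, and clamp ranges below are invented for illustration:

```python
def modulate(K, S, K_target=-1.0, S_target=1.0,
             base_temp=0.7, base_top_p=0.9):
    # Hypothetical proportional controller: cool the temperature when
    # fiber entropy S runs high, widen the nucleus when curvature K
    # relaxes toward flat; all gains here are illustrative, not measured.
    temp = base_temp + 0.2 * (S_target - S)
    top_p = base_top_p + 0.05 * (K - K_target)
    return (min(max(temp, 0.1), 1.5), min(max(top_p, 0.5), 1.0))
```

Clamping keeps the sampler in a sane operating range even when the telemetry is far from its target, which is the homeostatic part of the design.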
A biologically-inspired memory system replaces the 3-tier IMD system.
Gradio-based inference UI with real-time telemetry: curvature heatmap, entropy gauge, fiber distribution visualization, thought trace, generation stability metrics. OOM-hardened for 5 GiB VRAM GPUs with dynamic token budgeting.
Verification metrics at convergence:

| Metric | Value | Interpretation |
|---|---|---|
| Curvature K | -5.63 | Strongly hyperbolic |
| Entropy S | 0.95 | Below uniform (ln16 ≈ 2.77), sections specialized |
| Jensen-Shannon Div. | 0.424 | Fibers differ across contexts |
| Parallel Transport | 0.041 | Near-zero holonomy — geometric consistency |
| Faithfulness | 6/6 PASS | All verification tests pass |

Benchmark performance is preserved after adapter training:

| Benchmark | Score | Notes |
|---|---|---|
| ARC-Challenge | 54.86% | Identical to base Qwen 2.5-7B |
| TruthfulQA (MC2) | 64.78% | Strong factual grounding |
| Winogrande | 71.03% | Commonsense reasoning intact |
| GSM8K | 75.51% | Multi-step math preserved |
Entropy was frozen at \(S = \ln(16) \approx 2.77\) for 3000 training steps due to 10 coupled gradient blockages (NaNs in arcosh, per-component Poincaré projection, global gradient clipping killing fiber gradients, etc.). After systematic diagnosis and the Phase A fixes, entropy unfroze and now oscillates in \([0.58, 1.04]\) with mean ≈ 0.85.
13 systematic ablations isolating each geometric component. Key findings:

| Component Removed | Accuracy Drop |
|---|---|
| Euclidean Target (κ=0) | -10.9% |
| No Curvature Loss | -9.5% |
| No Natural Gradients | -8.4% |
| No Sheaf Consistency | -5.6% |
| No Bundle Structure | -4.9% |