IGBundle: Fiber Bundle Adapters for Language Models

A geometric inductive bias for transformer reasoning

Author: Jesus Vilela Jato
Date: December 2025 – present (active research)
Affiliation: Independent Research
Team: Jesus Vilela (orchestrator) + Claude (Anthropic) + Gemini (Google)

1. Mathematical Foundations

1.1 Fiber Bundle Structure

The central object is a fiber bundle \(\pi: E \to M\) where the base manifold \(M\) is a Poincaré ball \(\mathbb{B}^{64}\) with constant curvature \(\kappa = -1\), and each fiber \(F_x\) is a categorical distribution over \(K=16\) sections with \(P=8\) mixture components.

Hidden states from transformer layer 12 are projected into this bundle. The projection is a residual perturbation clamped to ≤10% of the base hidden state norm, preserving the pretrained distribution.
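The clamped residual projection can be sketched as follows. The 10% ratio comes from the text; the function name and implementation details are illustrative, not the adapter's actual code:

```python
import numpy as np

def clamped_residual_projection(h, delta, max_ratio=0.10):
    """Add a learned perturbation `delta` to hidden state `h`, rescaling it
    so its norm never exceeds `max_ratio` (10%) of ||h||."""
    h_norm = np.linalg.norm(h, axis=-1, keepdims=True)
    d_norm = np.linalg.norm(delta, axis=-1, keepdims=True)
    # Scale factor <= 1: shrink delta only when it is too large.
    scale = np.minimum(1.0, max_ratio * h_norm / (d_norm + 1e-8))
    return h + scale * delta

h = np.ones(64)           # base hidden state, ||h|| = 8
delta = np.full(64, 0.5)  # raw perturbation, ||delta|| = 4 > 0.8
out = clamped_residual_projection(h, delta)
print(np.linalg.norm(out - h))  # ≈ 0.8, i.e. 10% of ||h||
```

Because the perturbation is a bounded residual, the base model's pretrained representations dominate the projected state.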

1.2 Riemannian Metric and Curvature

The adapter learns a metric tensor \(g_{ij}\) on \(M\), approximated via the Fisher information matrix of the fiber distributions. Curvature is computed via a log-determinant Laplacian estimator:

\[ K \approx -\frac{1}{2} \nabla^2 \log \det g \]

A curvature loss regularizes toward \(\kappa = -1\). The learned curvature at convergence is \(K = -5.63\) — strongly hyperbolic, overshooting the target but geometrically meaningful.
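The estimator can be illustrated on a 2D toy problem: a finite-difference (flat-Laplacian) evaluation of \(-\frac{1}{2}\nabla^2 \log \det g\) for the Poincaré-disk metric, standing in for the learned 64-dimensional metric. The numerical value this approximation returns is not the true sectional curvature, but its sign correctly flags hyperbolic geometry:

```python
import numpy as np

def log_det_g(x):
    """log det of the 2D Poincaré-disk metric g = (2/(1-||x||^2))^2 I,
    so det g = (2/(1-r^2))^4."""
    r2 = np.dot(x, x)
    return 4.0 * np.log(2.0 / (1.0 - r2))

def curvature_estimate(x, h=1e-3):
    """K ≈ -1/2 ∇² log det g via a central-difference Laplacian."""
    d = len(x)
    lap = 0.0
    for i in range(d):
        e = np.zeros(d); e[i] = h
        lap += (log_det_g(x + e) - 2.0 * log_det_g(x) + log_det_g(x - e)) / h**2
    return -0.5 * lap

K = curvature_estimate(np.zeros(2))
print(K)  # negative: the estimator flags hyperbolic geometry (≈ -8 at the origin)
```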

1.3 Symplectic Dynamics

Fiber state evolves via a Hamiltonian system with symplectic integration and a Lorentz-factor speed limiter (\(c = 5.0\)):

\[ \dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q} \]

The speed limiter prevents gradient explosion: \(v \to v / \sqrt{1 + \|v\|^2 / c^2}\). Combined with the SymplecticSPIDER optimizer, this replaces global gradient clipping, which was found to kill fiber gradients because the norm of the ~7M curvature parameters dominated the global clipping threshold.
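A minimal sketch of one symplectic (leapfrog) step with the Lorentz-factor limiter, using the stated \(c = 5.0\); the toy harmonic Hamiltonian is illustrative, and the SymplecticSPIDER optimizer itself is not reproduced here:

```python
import numpy as np

C = 5.0  # speed limit from the text

def limit(v, c=C):
    """Lorentz-style limiter: rescales v so its norm stays below c."""
    return v / np.sqrt(1.0 + np.dot(v, v) / c**2)

def leapfrog_step(q, p, grad_H_q, grad_H_p, dt=0.01):
    """One symplectic (leapfrog) step of q' = dH/dp, p' = -dH/dq,
    with the limiter applied to the velocity."""
    p = p - 0.5 * dt * grad_H_q(q)
    q = q + dt * limit(grad_H_p(p))
    p = p - 0.5 * dt * grad_H_q(q)
    return q, p

# Toy Hamiltonian H = (||p||^2 + ||q||^2) / 2: a harmonic oscillator.
gq = lambda q: q
gp = lambda p: p
q, p = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    q, p = leapfrog_step(q, p, gq, gp)
```

Note that the limited dynamics are still Hamiltonian (the limiter is the gradient of a relativistic kinetic energy), so the leapfrog step preserves energy to second order rather than drifting.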

1.4 Information Geometry

Optimization follows the natural gradient direction on the statistical manifold:

\[ \theta_{t+1} = \theta_t - \eta \, F^{-1} \nabla_\theta \mathcal{L} \]

This respects the Fisher-Rao metric and converges ~30% faster than Euclidean gradient descent.
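The update can be sketched directly; the Tikhonov damping term is an assumption added here for numerical stability, and the adapter's actual Fisher estimation is not shown:

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    """theta <- theta - lr * F^{-1} grad, with damping so the (estimated)
    Fisher matrix is safely invertible."""
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# On an ill-conditioned quadratic loss L = 1/2 theta^T A theta, the Fisher
# matches the Hessian A, and a full natural-gradient step (lr = 1) jumps
# straight to the minimum regardless of conditioning.
A = np.diag([100.0, 1.0])
theta = np.array([1.0, 1.0])
theta = natural_gradient_step(theta, A @ theta, A, lr=1.0)
print(theta)  # ≈ [0, 0]
```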

1.5 Sheaf Consistency

Adjacent tokens must agree on their fiber distributions. The Jensen-Shannon divergence between neighboring sections is penalized:

\[ \mathcal{L}_{\text{sheaf}} = \frac{1}{N} \sum_{i} \text{JSD}(\sigma_i \| \sigma_{i+1}) \]

This enforces local-to-global semantic coherence — the "gluing condition" of the sheaf.
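The loss above maps directly to code; the array shapes and the epsilon guard are illustrative:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two categorical distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def sheaf_loss(sections):
    """Mean JSD between each token's fiber distribution and its successor's.
    `sections`: array of shape (N+1, K) of categorical distributions."""
    return np.mean([jsd(sections[i], sections[i + 1])
                    for i in range(len(sections) - 1)])
```

JSD is symmetric and bounded by \(\ln 2\), so the loss is zero when adjacent fibers agree exactly and maximal when their supports are disjoint.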

2. Key Components

2.1 Geometric Steering Probe (GSP)

At inference time, the GSP reads curvature \(K\) and entropy \(S\) from the adapter and modulates generation parameters (temperature, top_p) via a GENERIC-compliant homeostatic controller. No retraining required.
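As a hypothetical sketch of such a homeostatic rule: the gains, set points, and clamping ranges below are invented for illustration and are not the GENERIC-compliant controller described above:

```python
def steer(curvature, entropy, base_temp=0.7, base_top_p=0.9,
          k_target=-1.0, s_target=0.85):
    """Illustrative homeostatic rule (all constants hypothetical): cool the
    sampler when entropy runs above its set point, warm it when below, and
    tighten top_p as curvature drifts from the target."""
    temp = base_temp * (1.0 - 0.2 * (entropy - s_target))
    top_p = base_top_p - 0.05 * min(abs(curvature - k_target), 2.0)
    return max(0.1, temp), min(max(0.5, top_p), 1.0)

print(steer(-1.0, 0.85))  # at both set points: returns the base settings
```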

2.2 Neuromorphic Memory (NMEM)

A biologically inspired memory subsystem that replaces the earlier 3-tier IMD system.

2.3 Neural Glass

Gradio-based inference UI with real-time telemetry: curvature heatmap, entropy gauge, fiber distribution visualization, thought trace, generation stability metrics. OOM-hardened for 5 GiB VRAM GPUs with dynamic token budgeting.

3. Experimental Results

3.1 Manifold Faithfulness (Tier 3)

| Metric | Value | Interpretation |
|---|---|---|
| Curvature K | -5.63 | Strongly hyperbolic |
| Entropy S | 0.95 | Below uniform (ln 16 ≈ 2.77); sections specialized |
| Jensen-Shannon Div. | 0.424 | Fibers differ across contexts |
| Parallel Transport | 0.041 | Near-zero holonomy; geometric consistency |
| Faithfulness | 6/6 PASS | All verification tests pass |

3.2 Benchmark Preservation

| Benchmark | Score | Notes |
|---|---|---|
| ARC-Challenge | 54.86% | Identical to base Qwen 2.5-7B |
| TruthfulQA (MC2) | 64.78% | Strong factual grounding |
| Winogrande | 71.03% | Commonsense reasoning intact |
| GSM8K | 75.51% | Multi-step math preserved |

3.3 Entropy Diagnostic History

Entropy was frozen at \(S = \ln(16) \approx 2.77\) for 3000 training steps due to 10 coupled gradient blockages (arcosh NaN, per-component Poincaré projection, global gradient clipping killing fiber gradients, etc.). After systematic diagnosis and the Phase A fixes, entropy unfroze and now oscillates in \([0.58, 1.04]\) with mean ≈ 0.85.

3.4 Ablation Studies

13 systematic ablations isolating each geometric component. Key findings:

| Component Removed | Accuracy Drop |
|---|---|
| Euclidean Target (κ=0) | -10.9% |
| No Curvature Loss | -9.5% |
| No Natural Gradients | -8.4% |
| No Sheaf Consistency | -5.6% |
| No Bundle Structure | -4.9% |

4. References

Full Thesis (PDF)