# LLM Benchmarks & Model Selection Guide
ArchSpine uses a High-Context Synthesis approach to maintain high semantic precision. This document provides observed token usage (benchmarks) and guidance on selecting the right LLM provider/model for your project.
## 1. Observed Token Benchmarks
The following data points are reconstructed from real synchronization logs of the ArchSpine repository itself (approx. 100 source files).
### Standard Mode (High-Precision)
This mode sends the full source header, structural skeletons of all dependencies, architectural rules, and git intent.
| File Complexity | Example File | Observed Input Tokens | Note |
|---|---|---|---|
| Small | README.md | ~4,000 - 9,000 | Basic content only. |
| Medium | src/core/sync.ts | ~25,000 - 35,000 | 10+ internal dependencies. |
| High | src/ast/extractor.ts | ~55,351 | Complex AST logic + multiple parsers. |
| Large Spec | archspine-protocol-v0.3.md | ~52,021 | Extensive prose + formatting. |
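The observed numbers above can be sanity-checked with a rough back-of-the-envelope estimate. The sketch below uses the common ~4 characters-per-token heuristic; the function name, parameters, and heuristic are illustrative assumptions, not part of ArchSpine's actual API.

```typescript
// Hypothetical sketch: rough input-token estimate for one sync request.
// The ~4 chars/token ratio is a coarse approximation; real tokenizers vary.
function estimateInputTokens(
  sourceChars: number,            // the file's own source header
  dependencySkeletonChars: number, // structural skeletons of dependencies
  ruleChars: number                // architectural rules + git intent
): number {
  const CHARS_PER_TOKEN = 4;
  return Math.ceil((sourceChars + dependencySkeletonChars + ruleChars) / CHARS_PER_TOKEN);
}

// A "Medium" file: ~60k chars of source, ~50k of skeletons, ~10k of rules
const estimate = estimateInputTokens(60_000, 50_000, 10_000);
console.log(estimate); // 30000 — in line with the Medium row above
```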
### Constrained Runtime Fallback
On low-TPM providers, ArchSpine may use a lighter runtime path internally to keep generation within budget.
| File Complexity | Example File | Target Input Tokens | Note |
|---|---|---|---|
| All Files | Any | < 8,000 | Internal low-budget fallback for constrained runtimes. |
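The fallback decision described above can be sketched as a simple budget check. This is an illustrative model only; the function and constant names are hypothetical, not ArchSpine internals.

```typescript
// Hypothetical sketch of the constrained-runtime fallback: if the standard
// prompt would exceed the low-TPM budget, take the lighter internal path.
const CONSTRAINED_BUDGET = 8_000; // target from the table above

function choosePromptPath(
  standardPromptTokens: number,
  budget: number = CONSTRAINED_BUDGET
): "standard" | "constrained-fallback" {
  return standardPromptTokens <= budget ? "standard" : "constrained-fallback";
}

console.log(choosePromptPath(55_351)); // "constrained-fallback" — the High row above
console.log(choosePromptPath(4_000));  // "standard" — a Small file fits the budget
```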
## 2. Model Selection Matrix
Choosing the right model depends on two primary metrics: Context Window and TPM (Tokens Per Minute).
| Model Tier | Providers | Best For | Context | Min. TPM |
|---|---|---|---|---|
| Performance | Claude 3.5 Sonnet, GPT-4o | Large codebases, complex rules. | 128k - 200k | > 300,000 |
| Economy | DeepSeek-V3 / R1 | Best ROI for ArchSpine. | 128k | > 500,000 |
| Local / Offline | Ollama, LM Studio | Privacy-first, air-gapped environments. | 128k (VRAM-limited) | N/A (start with `mode=standard`; constrained runtimes may fall back to a lighter internal path) |
| Limited / Free | Groq (Free), OpenRouter (Free) | Small projects, MVP testing. | 128k | Start with `mode=standard`; constrained runtimes may fall back to a lighter internal path |
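The matrix above can be read as a small decision function. The sketch below is an assumption-laden illustration (tier names and thresholds mirror the table, not any real ArchSpine API); always check your provider's actual limits.

```typescript
// Hypothetical tier selector mirroring the selection matrix above.
type Tier = "Performance" | "Economy" | "Local / Offline" | "Limited / Free";

function pickTier(tpmLimit: number, offlineRequired: boolean): Tier {
  if (offlineRequired) return "Local / Offline"; // privacy-first, air-gapped
  if (tpmLimit >= 500_000) return "Economy";     // DeepSeek-class TPM, best ROI
  if (tpmLimit >= 300_000) return "Performance"; // large hosted models
  return "Limited / Free";                       // free tiers; expect fallback behavior
}

console.log(pickTier(12_000, false));  // "Limited / Free" — typical free-tier TPM
console.log(pickTier(600_000, false)); // "Economy"
```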
## 3. Critical Concepts
### TPM vs. Context Window
- Context Window (128k+): Most modern models support 128k tokens. This is sufficient for ArchSpine's deep synthesis.
- TPM (Tokens Per Minute): This is the primary bottleneck for free APIs.
- Free Tiers (e.g., Groq) often limit you to 12k TPM.
- Observation: Since a single core file can require 55k tokens, a 12k TPM limit will cause immediate failure unless the runtime reduces prompt budget internally.
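The arithmetic behind that observation is simple. Assuming the 55k-token core file and 12k TPM figures above, a single request consumes several minutes of rate-limit budget before the model can even respond:

```typescript
// Back-of-the-envelope check: minutes of TPM budget consumed by one
// standard-mode prompt for a single high-complexity file.
const promptTokens = 55_000; // High-complexity file from the benchmark table
const tpmLimit = 12_000;     // typical free-tier TPM cap

const minutesToSend = promptTokens / tpmLimit;
console.log(minutesToSend.toFixed(1)); // "4.6" minutes of budget for one file
```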
## 4. Optimization Recommendations
- Use DeepSeek: For most users, DeepSeek provides the best balance of context window, high TPM limits, and low cost.
- Start with `mode=standard` on constrained providers: Free tiers and low-TPM environments should usually start with `spine llm set mode standard`.
- Use runtime mode as the public control surface: If `mode=standard` is still too heavy on constrained providers, treat lighter generation behavior as an internal runtime fallback rather than a primary user-facing knob.
- Local LLMs (Ollama / LM Studio): See the dedicated guide → `docs/guides/LOCAL-LLM.md`.
## 5. Benchmark Scope
These benchmarks exist to evaluate internal strategy choices that may later be absorbed into the default behavior of `mode=standard|heavy`. The strategy work serves the mode defaults; it does not replace the mode-first product surface with low-level runtime knobs.
For normal usage, prefer the higher-level runtime modes:
```
spine llm set mode standard
spine llm set mode heavy
```