ArchSpine Semantic Protocol Specification (v1.0.0)
This specification describes the current v1.0.x implementation of ArchSpine: the .spine/ directory model, the index contract, and the runtime behaviors that are already shipped today.
1. Machine-first design
.spine/ is first and foremost a structured semantic object for indexed repository inputs; the human-readable views are derived from it. The goal is to make repository semantics consumable by IDEs, agents, CI systems, and governance tooling.
In the current open-source line, the default semantic mirror is intentionally centered on code, schemas, repository automation, and the .spine control plane itself. Human-facing repository docs such as docs/**, README*, CONTRIBUTING.md, and SECURITY.md usually remain authoritative in their original location and are excluded from the default .spine mirror unless a repository opts in differently.
2. Physical layout
.spine/
├── manifest.json # Human-readable summary view
├── cache.db # Core SQLite state store
├── secrets.json # Project-local fallback secret store when OS credential backend is unavailable
├── index/ # Machine-readable semantic index
│   └── src/
│       └── auth.ts.json
├── atlas/ # Derived Markdown documentation layer
│   └── en-US/
│       └── src/
│           └── auth.ts.md
├── view/ # Experimental derived JSON reading layer (opt-in)
│   ├── public-surface.json
│   └── risk-hotspots.json
└── rules/ # Architecture rules in YAML
Protected outputs in the current line:
- .spine/index/**
- .spine/atlas/**
- .spine/view/**
- .spine/cache.db*
- .spine/.lock
ArchSpine CLI/runtime is the authoritative writer for these official .spine outputs. MCP and ordinary local agents are not formal .spine writers.
Runtime-local but non-distributable .spine files also include:
- .spine/protected-output-baseline.json
- .spine/secrets.json when project credential fallback is active
These files are operational state, not semantic snapshot outputs.
3. Core runtime mechanisms
3.1 Sync engine
- ScanPolicy defines file sources and ignore-chain behavior
- repo paths are normalized to repo-relative POSIX-style paths
- state updates are committed atomically through SQLite-backed flows
- .gitignore, .spineignore, and .spineignore.local are composed through an ordered ignore chain
- the default product boundary keeps code, schemas, and repository automation such as .github/workflows/** indexable while usually excluding human-facing repository docs from the semantic mirror
- incremental sync uses Git and hash-aware change detection
- deleted source files are cleaned up as orphaned index entries
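The path normalization, hash-aware change detection, and orphan cleanup described above can be sketched as follows. This is a minimal illustration, not ArchSpine's implementation: the function names, the choice of SHA-256, and the in-memory dictionaries standing in for the index and the working tree are all assumptions, and the Git-based half of change detection is left out.

```python
import hashlib
import posixpath


def normalize_repo_path(path: str) -> str:
    """Return a repo-relative path in POSIX style (forward slashes, no './')."""
    return posixpath.normpath(path.replace("\\", "/"))


def content_hash(data: bytes) -> str:
    """Hash file contents; SHA-256 is an assumption, not a documented choice."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(indexed: dict, working_tree: dict):
    """Compare stored hashes with the working tree.

    indexed:      normalized path -> hash recorded at the last sync
    working_tree: raw path -> current file bytes
    Returns (changed, orphaned) as sorted lists of normalized paths.
    """
    current = {
        normalize_repo_path(path): content_hash(data)
        for path, data in working_tree.items()
    }
    # New or modified files: no stored hash, or a different one.
    changed = sorted(p for p, h in current.items() if indexed.get(p) != h)
    # Index entries whose source file no longer exists.
    orphaned = sorted(p for p in indexed if p not in current)
    return changed, orphaned
```

Under these assumptions, a file whose bytes are unchanged is skipped regardless of how its path was spelled on disk, and any index entry without a matching source file is reported for cleanup.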
3.2 Skeleton-first extraction
- AST extraction captures deterministic import/export structure
- semantic generation builds on top of skeleton facts instead of replacing them
- recent Git intent can be injected to explain why a file changed
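To make "deterministic import/export structure" concrete, here is an analogous sketch that uses Python's standard ast module on Python source. ArchSpine's own extractor and its output shape are not described in this specification, so the function name, the "imports"/"exports" keys, and the rule that a leading underscore marks a private name are assumptions made for illustration only.

```python
import ast


def extract_skeleton(source: str) -> dict:
    """Collect import targets and public top-level definitions from Python source."""
    tree = ast.parse(source)
    imports, exports = [], []
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay distinguishable.
            imports.append("." * node.level + (node.module or ""))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_"):
                exports.append(node.name)
    # Sorting makes the result independent of statement order in the file.
    return {"imports": sorted(set(imports)), "exports": sorted(exports)}
```

Because the result depends only on the parsed syntax tree, the same source always yields the same facts, which is what lets a later semantic layer build on them rather than re-derive them.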
3.3 Layered aggregation
- file level: role, responsibilities, invariants, graph edges
- folder level: folder.json plus derived folder.md
- project level: project.json plus derived project.md
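The three levels can be pictured as a roll-up from file units to folder summaries to one project summary. The sketch below is a simplified illustration: the dictionary shapes, the "files", "roles", "folders", and "fileCount" keys, and the choice to group files by direct parent folder only are hypothetical, and the real folder.json and project.json contents are not defined in this section.

```python
import posixpath
from collections import defaultdict


def aggregate_folders(file_units: dict) -> dict:
    """Group file-level units by their direct parent folder."""
    grouped = defaultdict(lambda: {"files": [], "roles": set()})
    for path, unit in file_units.items():
        folder = posixpath.dirname(path) or "."  # "." stands for the repo root
        grouped[folder]["files"].append(path)
        grouped[folder]["roles"].add(unit["role"])
    return {
        folder: {"files": sorted(data["files"]), "roles": sorted(data["roles"])}
        for folder, data in grouped.items()
    }


def aggregate_project(folder_summaries: dict) -> dict:
    """Summarize all folder summaries into a single project-level record."""
    return {
        "folders": sorted(folder_summaries),
        "fileCount": sum(len(f["files"]) for f in folder_summaries.values()),
    }
```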
3.4 Experimental view derivation
When artifacts.experimentalViewLayer=true or SPINE_EXPERIMENTAL_VIEW_LAYER=true, the post-aggregation writer path may also derive:
- public-surface.json
- risk-hotspots.json
This layer is:
- derived from indexed and aggregated signals
- non-authoritative
- intended for fast comprehension
- intentionally outside the stable public artifact contract for the first open-source v1.0 release
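The opt-in condition can be read as "either switch turns the layer on". A minimal sketch of that check follows, assuming the configuration is available as a nested dictionary and that the environment variable is compared against the literal string "true"; whether other spellings such as "1" or "TRUE" are accepted is not stated here, and the function name is hypothetical.

```python
import os


def experimental_view_enabled(config: dict, env=None) -> bool:
    """Return True when either opt-in switch for the view layer is set."""
    env = os.environ if env is None else env
    # Environment switch: only the literal "true" is treated as enabled (assumption).
    if env.get("SPINE_EXPERIMENTAL_VIEW_LAYER") == "true":
        return True
    # Config switch: artifacts.experimentalViewLayer must be the boolean true.
    return config.get("artifacts", {}).get("experimentalViewLayer") is True
```

Passing env explicitly keeps the check easy to test; leaving it out falls back to the process environment.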
3.5 Writer path inventory and boundary contract
Current trusted writer paths:
- spine sync --full / spine sync: refresh the local runtime mirror and the local protected-output baseline
- when enabled, the same trusted sync writer path also writes .spine/view/**
- spine sync --hook: refresh the hook-oriented runtime subset and the same baseline without Atlas regeneration
- spine check / spine fix: update local runtime state in .spine/cache.db* and coordinate with .spine/.lock
- spine publish: maintainer publish workflow that first runs publish preflight, requires an existing runtime baseline (.spine/manifest.json plus .spine/protected-output-baseline.json), fails closed if .spine/.lock is active, stale, not owner-verifiable, or stored in a corrupt/unsupported legacy format, and then refreshes the distributable .spine/index/** and .spine/atlas/** snapshot through a full sync
Current host deployment convention:
- ordinary agents run in a writable repository for normal source work
- protected .spine outputs remain read-only by default
- .spine/view/** follows the same protected-output posture when enabled
- trusted spine write paths temporarily unlock and then re-lock those outputs
- the baseline file plus mutation warnings are soft-gate layers for exposing out-of-band edits, not a replacement for strong host isolation
- this model reduces accidental same-user writes in normal workflows; it does not claim to stop malicious same-permission processes
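The "temporarily unlock and then re-lock" behavior can be illustrated with file permission bits. This specification does not say how ArchSpine implements the read-only posture, so the use of chmod, the helper name temporarily_writable, and the temporary file below are assumptions; the sketch also inherits the stated limitation that a process with the same permissions can simply undo the lock.

```python
import os
import stat
import tempfile
from contextlib import contextmanager

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


@contextmanager
def temporarily_writable(path):
    """Unlock a protected output for one trusted write, then re-lock it."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)  # unlock for the owner only
    try:
        yield path
    finally:
        os.chmod(path, mode & ~WRITE_BITS)  # re-lock even if the write failed


# Usage: a protected output starts read-only, is rewritten, and ends read-only.
workdir = tempfile.mkdtemp()
unit_path = os.path.join(workdir, "auth.ts.json")
with open(unit_path, "w") as f:
    f.write("{}")
os.chmod(unit_path, 0o444)

with temporarily_writable(unit_path):
    with open(unit_path, "w") as f:
        f.write('{"role": "auth"}')

still_locked = not (os.stat(unit_path).st_mode & WRITE_BITS)
```

Putting the re-lock in a finally clause means a failed write cannot leave the output writable, which matches the soft-gate intent of reducing accidental edits.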
4. Index contract
Each indexed file is stored as a SpineUnit in .spine/index/<path>.json.
Key slices:
- identity: file path, hash, language, file kind, scope
- semantic: role, responsibilities, out-of-scope statements, invariants, public surface, and localized content (translations)
- skeleton: deterministic AST facts
- graph: dependency edges and related structure
- provenance: generation metadata
Additional runtime signals:
- ruleViolations
- driftDetected
- driftReason
- _thinking: (validation mode only) Chain-of-Thought scratchpad
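As a rough picture of the contract, the sketch below models a SpineUnit with one field per slice and maps a source path to its index location. The slice names and the three runtime signal names come from this section; everything inside each slice (the individual keys and all example values) is a hypothetical placeholder, not the real schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


def index_path(repo_path: str) -> str:
    """Map a repo-relative source path to its .spine/index/<path>.json location."""
    return f".spine/index/{repo_path}.json"


@dataclass
class SpineUnit:
    identity: dict    # file path, hash, language, file kind, scope
    semantic: dict    # role, responsibilities, invariants, public surface, ...
    skeleton: dict    # deterministic AST facts
    graph: dict       # dependency edges and related structure
    provenance: dict  # generation metadata
    ruleViolations: list = field(default_factory=list)
    driftDetected: bool = False
    driftReason: Optional[str] = None


# Placeholder values only; the inner keys are illustrative assumptions.
unit = SpineUnit(
    identity={"path": "src/auth.ts", "language": "typescript"},
    semantic={"role": "authentication entry point"},
    skeleton={"imports": [], "exports": []},
    graph={"edges": []},
    provenance={"generator": "example"},
)
serialized = json.dumps(asdict(unit), indent=2, sort_keys=True)
```

With this mapping, src/auth.ts lands at .spine/index/src/auth.ts.json, which matches the physical layout shown in section 2.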
5. v0.4 changes
Compared with v0.3, the v0.4 line introduced:
- Headless Generation: Shifting from prose generation to data-centric JSON extraction.
- Node-side Atlas Rendering: Local deterministic Markdown generation from SpineUnits.
- Multilingual Indexing: Support for the localized field in SpineSemantic to hold multiple language summaries.
- Intelligence Primitives: Integration of Few-Shot examples and Chain-of-Thought (CoT) reasoning for higher precision.
- SQLite-backed state via cache.db
- centralized tracking of violations and usage logs
- manifest.json as a summary view rather than the only source of truth
- protocol and implementation version alignment
- semantic drift detection and persisted drift history
- stronger transactional write behavior and lock handling
6. SQLite tables
Current core tables:
- files
- violations
- usage_logs
- drift_events
- symbols
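The sketch below creates the five tables in an in-memory database and shows how a multi-statement update can be committed atomically. Only the table names come from this specification; every column, the helper names, and the example rule identifier are hypothetical, and the real cache.db schema may differ substantially.

```python
import sqlite3

TABLES = ("files", "violations", "usage_logs", "drift_events", "symbols")


def open_state_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store with the five core tables (columns are illustrative)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, hash TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS violations (
            id INTEGER PRIMARY KEY, path TEXT NOT NULL, rule TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS usage_logs (
            id INTEGER PRIMARY KEY, model TEXT, tokens INTEGER);
        CREATE TABLE IF NOT EXISTS drift_events (
            id INTEGER PRIMARY KEY, path TEXT NOT NULL, reason TEXT);
        CREATE TABLE IF NOT EXISTS symbols (
            id INTEGER PRIMARY KEY, path TEXT NOT NULL, name TEXT NOT NULL);
        """
    )
    return conn


def record_file(conn: sqlite3.Connection, path: str, file_hash: str, rules: list) -> None:
    """Update a file's hash and its violations as one all-or-nothing change."""
    # "with conn" commits on success and rolls back if any statement raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files (path, hash) VALUES (?, ?)",
            (path, file_hash),
        )
        conn.execute("DELETE FROM violations WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO violations (path, rule) VALUES (?, ?)",
            [(path, rule) for rule in rules],
        )
```

If inserting a violation fails partway through, the hash update and the delete are rolled back with it, so the store never reflects half of a sync step.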
7. Runtime characteristics in the current line
The shipped v1.0.x line already includes:
- task-based execution flows with explicit stage input/output contracts
- TaskContext.state reserved for telemetry, with transient stage artifacts held in runtimeCache
- serial task orchestration with timing and logging
- file-lock coordination on top of SQLite transactions
- CLI surfaces that expose runtime progress and health clearly
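One way to picture serial orchestration with explicit stage contracts is sketched below. The names TaskContext, state, and runtimeCache come from this section; the Stage type, its requires/provides fields, the run_serially function, and the telemetry key format are hypothetical and only illustrate the separation between telemetry and transient artifacts.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskContext:
    state: dict = field(default_factory=dict)         # telemetry only
    runtimeCache: dict = field(default_factory=dict)  # transient stage artifacts


@dataclass
class Stage:
    name: str
    requires: tuple  # keys that must already be in runtimeCache
    provides: tuple  # keys the stage promises to add to runtimeCache
    run: Callable[[TaskContext], None]


def run_serially(stages: list, ctx: TaskContext) -> TaskContext:
    """Run stages one after another, checking contracts and recording timings."""
    for stage in stages:
        missing = [key for key in stage.requires if key not in ctx.runtimeCache]
        if missing:
            raise KeyError(f"{stage.name}: missing inputs {missing}")
        started = time.perf_counter()
        stage.run(ctx)
        ctx.state[f"{stage.name}.seconds"] = time.perf_counter() - started
        unmet = [key for key in stage.provides if key not in ctx.runtimeCache]
        if unmet:
            raise RuntimeError(f"{stage.name}: did not produce {unmet}")
    return ctx
```

Checking inputs before a stage and outputs after it turns a broken contract into an immediate, named failure instead of a confusing error several stages later.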
8. Manifest summary view
manifest.json is a human-readable summary surface, not the primary source of truth. In the current line it includes a sync summary with:
- sync mode and duration
- aggregate counters and token usage snapshots
- latest resolved LLM provider/model metadata
The sync LLM summary records:
- provider
- providerSource
- model
- modelSource
This metadata is intended for runtime traceability so users can see which resolved model produced the current .spine state.
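For orientation, a sync summary with this metadata might be assembled along the following lines. Only the four LLM field names are taken from this section; the surrounding keys ("mode", "durationMs", "counters", "llm"), the helper name, and all values are placeholders invented for illustration, not the actual manifest.json layout.

```python
import json

LLM_SUMMARY_KEYS = ("provider", "providerSource", "model", "modelSource")


def build_sync_summary(mode: str, duration_ms: int, counters: dict, llm: dict) -> dict:
    """Assemble a summary record, insisting on all four LLM metadata fields."""
    missing = [key for key in LLM_SUMMARY_KEYS if key not in llm]
    if missing:
        raise ValueError(f"LLM summary is missing: {missing}")
    return {
        "mode": mode,
        "durationMs": duration_ms,
        "counters": dict(counters),
        "llm": {key: llm[key] for key in LLM_SUMMARY_KEYS},
    }


# Placeholder values; real provider and model names are resolved at runtime.
summary = build_sync_summary(
    mode="full",
    duration_ms=1840,
    counters={"filesIndexed": 42},
    llm={
        "provider": "example-provider",
        "providerSource": "config",
        "model": "example-model",
        "modelSource": "env",
    },
)
rendered = json.dumps(summary, indent=2)
```

Recording the source alongside each resolved value is what makes the summary useful for traceability: it shows not just which model ran, but where that choice came from.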