Architecture
System overview — pillars, data model, snapshot system, and the embeddings-not-LLMs design choice
System Overview
Kural is organized around four pillars — Embed, Score, Audit, Place — that cross-validate each other. This page covers the mechanics underneath: how snapshots are built and cached, and what the data model looks like.
Today, kural ships as a single tier:
- CLI: scan, score, local snapshots on disk under .kural-db/.
A server tier and web dashboard are planned — see Roadmap.
How snapshots are built and cached
A snapshot is the serialized output of the Embed and Score pillars, written once per kural snapshot generate run. Every other command (audit, score, place, brief, advise) reads from a snapshot rather than rebuilding one.
Walk and parse — TypeScript-only today. The AST pass extracts functions, types, and descriptions; anything outside that surface is invisible to the rest of the system.
Embed with cache — each unit is keyed by facet_hash. Unchanged units are served from the previous active.db; only new or modified units hit the embedding provider. This is why first runs are slow and subsequent runs complete in seconds.
Score and store — fit, uniqueness, and subtree metrics are computed and written to a fresh active.db. The previous active.db is rotated into history/<snapshot-id>.db.
Snapshot layout, rotation rules, and cache-invalidation keys live in Database.
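The embed-with-cache step can be sketched as a partition of parsed units into cache hits and misses. This is a minimal illustration, not kural's actual code: the `ParsedUnit` shape, the `partitionByCache` name, and the idea of loading the previous active.db into a `Map` are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; kural's real types may differ.
type Embedding = number[];

interface ParsedUnit {
  name: string;
  facetHash: string; // cache key for this unit
  text: string; // content that would be sent to the embedding provider
}

interface CachePartition {
  hits: Map<string, Embedding>; // facetHash -> vector reused from the previous snapshot
  misses: ParsedUnit[]; // new or modified units that still need embedding
}

// Split units into those served from the previous snapshot and those
// that must be sent to the embedding provider.
function partitionByCache(
  units: ParsedUnit[],
  previous: Map<string, Embedding>,
): CachePartition {
  const hits = new Map<string, Embedding>();
  const misses: ParsedUnit[] = [];

  for (const unit of units) {
    const cached = previous.get(unit.facetHash);
    if (cached !== undefined) {
      hits.set(unit.facetHash, cached);
    } else {
      misses.push(unit);
    }
  }
  return { hits, misses };
}
```

On a second run over a mostly unchanged codebase, `misses` would be short, which matches the observation that only first runs are slow.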
Why embeddings, not LLMs
A deliberate architectural choice runs through every pillar: kural uses embedding models, not LLMs, for the work it does itself. Placement, similarity, drift detection, and clustering are vector problems — they don't need reasoning, they need a well-distributed space and statistical fences. Embeddings give exactly that, at two-to-three orders of magnitude lower cost than frontier LLMs and with the option to run fully local (BGE, Nomic, Qwen3 Embedding via Ollama).
The consequence: a full embedding pass over a 50k-LOC codebase costs cents at hosted rates and zero locally. An agent can run kural brief before every implementation, and kural audit after every meaningful change, without thinking about cost. Reasoning is reserved for the user's coding agent, where it actually buys something. Kural is the structural layer underneath — fast, local-capable, and cheap enough to call repeatedly.
Tech Stack
| Layer | Tool | Role |
|---|---|---|
| CLI framework | Gunshi | Command routing, plugin system |
| AI | Multiple providers (Vercel, OpenAI, OpenRouter, Ollama) | Embeddings, advise |
| Local persistence | TanStack DB + node-sqlite-persistence | Snapshot storage |
Data Model
KuralUnit (base for all)
Every parsed unit carries:
- name, path, description
- identityEmbedding: name + description vector
- leafEmbedding: name + description + structural signature vector
- facetHash: SHA-256 for cache invalidation
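As a rough sketch, the base unit and its hash might look like the following. The field names come from the list above, but the vector type, the choice of inputs to the hash, and the `computeFacetHash` helper are assumptions for illustration; the source does not say exactly which fields feed the SHA-256 digest.

```typescript
import { createHash } from "node:crypto";

// Field names follow the documented base unit; the types are assumed.
interface KuralUnit {
  name: string;
  path: string;
  description: string;
  identityEmbedding: number[]; // name + description vector
  leafEmbedding: number[]; // name + description + structural signature vector
  facetHash: string; // SHA-256 hex digest used for cache invalidation
}

// Hypothetical helper: hashes the text inputs that determine the embeddings,
// so a change to any of them produces a new cache key.
function computeFacetHash(
  name: string,
  description: string,
  signature: string,
): string {
  return createHash("sha256")
    .update(JSON.stringify([name, description, signature]))
    .digest("hex");
}
```

Serializing the inputs as a JSON array keeps field boundaries unambiguous, so `("ab", "c")` and `("a", "bc")` do not collide.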
Unit Types
- KuralFile — source files with functions, types, imports
- KuralType — interfaces, classes, type aliases with fields and references
- KuralFunction — functions with params, return types, purity, call graph
- KuralDirectory — directory hierarchy with children, descriptions from KURAL.md
Scores
Every node (types, functions, files, directories) gets a ScoreCard:
- fit: how well the node's content matches its parent's identity
- uniqueness: mean distance to siblings (2.0 = N/A)
- score: harmonic mean of fit and uniqueness
- childrenFit / childrenUniqueness / childrenScore: direct children quality (containers only)
- subtreeFit / subtreeUniqueness / subtreeScore: aggregate descendant health (containers only)
- overallScore: for a leaf, score; for a container, the harmonic mean of score and subtreeScore
- worstPair: the most similar child pair
- bestUncle: the uncle node where this unit would fit better (name + score)
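The two harmonic-mean rules can be written out as a small sketch. The function names are hypothetical, returning 0 for non-positive inputs is an assumption, and the handling of the 2.0 "N/A" uniqueness sentinel is deliberately left out because the source does not specify it.

```typescript
// Harmonic mean of two values; it is pulled toward the smaller input,
// so a node cannot score well on fit alone or on uniqueness alone.
function harmonicMean(a: number, b: number): number {
  if (a <= 0 || b <= 0) return 0; // assumption: non-positive inputs score zero
  return (2 * a * b) / (a + b);
}

// score: harmonic mean of fit and uniqueness.
function nodeScore(fit: number, uniqueness: number): number {
  return harmonicMean(fit, uniqueness);
}

// overallScore: a leaf uses its own score; a container combines its score
// with subtreeScore. Passing no subtreeScore marks the node as a leaf.
function overallScore(score: number, subtreeScore?: number): number {
  return subtreeScore === undefined ? score : harmonicMean(score, subtreeScore);
}
```

For example, a container with a score of 1 and a subtreeScore of 0.5 lands at about 0.67 rather than the arithmetic mean of 0.75, so a weak subtree drags the overall score down.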
Snapshot System
Snapshot = Isolated SQLite Database
Each snapshot is a self-contained SQLite database file. This enables clone-and-mutate workflows — copy a snapshot, run analysis/simulation on the clone, discard when done. The original stays untouched.
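A clone-and-mutate workflow can be sketched with plain file operations. The `withClone` helper is hypothetical, and the example uses a stand-in text file rather than a real SQLite database. It also assumes nothing is writing to the source file during the copy; a live SQLite database with pending writes would need more care.

```typescript
import {
  copyFileSync,
  existsSync,
  mkdtempSync,
  readFileSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Copy the source file, run the callback against the clone,
// and always discard the clone afterwards.
function withClone<T>(
  sourcePath: string,
  clonePath: string,
  run: (clonePath: string) => T,
): T {
  copyFileSync(sourcePath, clonePath);
  try {
    return run(clonePath);
  } finally {
    rmSync(clonePath, { force: true });
  }
}

// Usage with a stand-in file in a temporary directory.
const workDir = mkdtempSync(join(tmpdir(), "kural-clone-"));
const activePath = join(workDir, "active.db");
const advisePath = join(workDir, "advise.db");
writeFileSync(activePath, "original");

const seenInClone = withClone(activePath, advisePath, (path) => {
  writeFileSync(path, "mutated"); // simulate analysis that mutates the clone
  return readFileSync(path, "utf8");
});
```

After the call, the clone is gone and the original file still holds its initial contents.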
Snapshot ID
Format: <timestamp>-<short-commit-hash>
1711700400-a3f8b2c
- Sortable by time for history ordering
- Tied to code state via commit hash
- Deduplicable — same commit produces same hash suffix
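A sketch of formatting, parsing, and ordering IDs in this format follows. The helper names are hypothetical. Treating the timestamp as Unix seconds and the short hash as seven hex characters are assumptions inferred from the example ID, not documented facts.

```typescript
interface SnapshotId {
  timestamp: number; // assumed to be Unix seconds
  commit: string; // short commit hash
}

// Build an ID from a timestamp and a (possibly full-length) commit hash.
function formatSnapshotId(timestamp: number, commit: string): string {
  return `${timestamp}-${commit.slice(0, 7)}`; // 7 chars is an assumption
}

// Split an ID back into its parts, rejecting anything off-format.
function parseSnapshotId(id: string): SnapshotId {
  const match = /^(\d+)-([0-9a-f]+)$/.exec(id);
  if (!match) throw new Error(`Invalid snapshot id: ${id}`);
  return { timestamp: Number(match[1]), commit: match[2] };
}

// Order IDs oldest-first by their timestamp prefix, without mutating the input.
function sortSnapshotIds(ids: string[]): string[] {
  return [...ids].sort(
    (a, b) => parseSnapshotId(a).timestamp - parseSnapshotId(b).timestamp,
  );
}
```

Sorting on the parsed number rather than the raw string avoids misordering if timestamps ever differ in digit count.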
Local Storage Layout
.kural-db/
  <branch>/
    active.db      # current snapshot, clone-able
    advise.db      # ephemeral clone for AI simulation
    history/       # rotated snapshots (max 10)
      <snapshot-id>.db

Snapshot Lifecycle
- Active snapshot is rotated to history/<snapshot-id>.db; a fresh active.db is created
- Oldest unpinned history snapshots are evicted when count exceeds 10
See How snapshots are built and cached for the per-run flow.
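The eviction rule can be sketched as a pure selection function. The `HistoryEntry` shape and `selectEvictions` name are hypothetical, and the sketch assumes pinned snapshots count toward the limit of 10 but are never chosen for eviction; the source does not spell out that detail.

```typescript
interface HistoryEntry {
  id: string;
  timestamp: number;
  pinned: boolean;
}

const MAX_HISTORY = 10;

// Return the snapshots to evict: the oldest unpinned entries,
// as many as needed to bring the history back to the limit.
function selectEvictions(
  history: HistoryEntry[],
  max: number = MAX_HISTORY,
): HistoryEntry[] {
  const excess = history.length - max;
  if (excess <= 0) return [];
  return history
    .filter((entry) => !entry.pinned) // filter copies, so the input is not mutated
    .sort((a, b) => a.timestamp - b.timestamp)
    .slice(0, excess);
}
```

If every entry beyond the limit is pinned, the function returns fewer evictions than the excess and the history stays over the limit.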
Schema Design
Database Tables
| Table | Purpose |
|---|---|
| files | Source files with embeddings, imports, descriptions |
| types | Type/interface/class declarations with fields, refs |
| functions | Functions with params, return types, purity, call graph |
| directories | Directory hierarchy with children, descriptions |
| scores | Structural health metrics per file/directory |
| metadata | Key-value store (created_at, model_id, schema_version, axes) |
Schema Versioning
- schema_version is stored in every snapshot's metadata from day one
- Additive-only evolution is the default: new nullable columns only, no removals or renames
- Old snapshots have NULLs for new columns; readers handle them gracefully
- Breaking changes (rare) require a one-time migration over existing history snapshots
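A reader that tolerates additive changes might normalize rows as in the sketch below. The `complexity` column is purely hypothetical, standing in for any nullable column added after a snapshot was written, and the row shapes are assumptions rather than kural's actual schema.

```typescript
// Row as stored: older snapshots lack the column or hold NULL in it.
interface StoredFunctionRow {
  name: string;
  path: string;
  complexity?: number | null; // hypothetical column added in a later schema version
}

// Row as consumed by the rest of the reader: the field is always present.
interface FunctionRow {
  name: string;
  path: string;
  complexity: number | null;
}

// Map a stored row to the current shape, treating a missing column as null.
function normalizeFunctionRow(row: StoredFunctionRow): FunctionRow {
  return {
    name: row.name,
    path: row.path,
    complexity: row.complexity ?? null,
  };
}
```

Because new columns are nullable and nothing is renamed or removed, a single normalization step like this is enough for old and new snapshots alike.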
Embedding Model Versioning
- model_id is stored in snapshot metadata for context
- Scores are precomputed numbers, so no cross-snapshot vector comparison is needed
- If the model changes and scores shift, model_id in metadata explains why
What's next
Planned work — server tier, cloud sync, and the reactive dashboard — lives on the Roadmap page.