Architecture

System overview — pillars, data model, snapshot system, and the embeddings-not-LLMs design choice

System Overview

Kural is organized around four pillars — Embed, Score, Audit, Place — that cross-validate each other. This page covers the mechanics underneath: how snapshots are built and cached, and what the data model looks like.

Today, kural ships as a single tier:

CLI — scan, score, local snapshots on disk under .kural-db/.

A server tier and web dashboard are planned — see Roadmap.

How snapshots are built and cached

A snapshot is the serialized output of the Embed and Score pillars, written once per kural snapshot generate run. Every other command (audit, score, place, brief, advise) reads from a snapshot rather than rebuilding one.

Walk and parse — TypeScript-only today. The AST pass extracts functions, types, and descriptions; anything outside that surface is invisible to the rest of the system.

Embed with cache — each unit is keyed by facet_hash. Unchanged units are served from the previous active.db; only new or modified units hit the embedding provider. This is why first runs are slow and subsequent runs complete in seconds.

Score and store — fit, uniqueness, and subtree metrics are computed and written to a fresh active.db. The previous active.db is rotated into history/<snapshot-id>.db.
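The cache step above can be sketched as a simple partition: units whose hash already exists in the previous snapshot reuse their stored vector, and only the rest go to the provider. This is a minimal illustration, not kural's actual code; `ParsedUnit`, `partitionByCache`, and the in-memory `Map` standing in for the previous active.db are all hypothetical.

```typescript
// Hypothetical shapes: kural's real types and storage API are not shown on this page.
interface ParsedUnit {
  path: string;
  facetHash: string; // hash of the unit's embeddable content
}

type Embedding = number[];

// Split units into cache hits (reuse the stored vector) and misses (need the provider).
function partitionByCache(
  units: ParsedUnit[],
  previous: Map<string, Embedding>, // facetHash -> vector from the previous snapshot
): { hits: Map<string, Embedding>; misses: ParsedUnit[] } {
  const hits = new Map<string, Embedding>();
  const misses: ParsedUnit[] = [];
  for (const unit of units) {
    const cached = previous.get(unit.facetHash);
    if (cached !== undefined) {
      hits.set(unit.facetHash, cached);
    } else {
      misses.push(unit);
    }
  }
  return { hits, misses };
}

const previousVectors = new Map<string, Embedding>([["aaa", [0.1, 0.2]]]);
const partition = partitionByCache(
  [
    { path: "src/a.ts", facetHash: "aaa" }, // unchanged since the last run
    { path: "src/b.ts", facetHash: "bbb" }, // new or modified
  ],
  previousVectors,
);
console.log(partition.misses.map((u) => u.path)); // only src/b.ts needs embedding
```

On a warm cache the `misses` list is short, which is consistent with later runs finishing much faster than the first.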

Snapshot layout, rotation rules, and cache-invalidation keys live in Database.

Why embeddings, not LLMs

A deliberate architectural choice runs through every pillar: kural uses embedding models, not LLMs, for the work it does itself. Placement, similarity, drift detection, and clustering are vector problems — they don't need reasoning, they need a well-distributed space and statistical fences. Embeddings give exactly that, at two-to-three orders of magnitude lower cost than frontier LLMs and with the option to run fully local (BGE, Nomic, Qwen3 Embedding via Ollama).

The consequence: a full embedding pass over a 50k-LOC codebase costs cents at hosted rates and zero locally. An agent can run kural brief before every implementation, and kural audit after every meaningful change, without thinking about cost. Reasoning is reserved for the user's coding agent, where it actually buys something. Kural is the structural layer underneath — fast, local-capable, and cheap enough to call repeatedly.
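To make the "vector problem" framing concrete, here is a small sketch of placement as a nearest-neighbour lookup using cosine similarity. The function names, the candidate shape, and the toy two-dimensional vectors are assumptions for illustration; kural's actual placement logic and thresholds are not described on this page.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  name: string;
  embedding: number[];
}

// Pick the candidate directory whose embedding is closest to the unit's embedding.
function bestPlacement(
  unit: number[],
  candidates: Candidate[],
): { name: string; similarity: number } | undefined {
  let best: { name: string; similarity: number } | undefined;
  for (const c of candidates) {
    const similarity = cosineSimilarity(unit, c.embedding);
    if (best === undefined || similarity > best.similarity) {
      best = { name: c.name, similarity };
    }
  }
  return best;
}

// Toy vectors; real embeddings have hundreds or thousands of dimensions.
const placement = bestPlacement([1, 0.1], [
  { name: "auth", embedding: [1, 0] },
  { name: "billing", embedding: [0, 1] },
]);
console.log(placement?.name); // "auth"
```

No model call happens at lookup time: once vectors exist, the comparison is plain arithmetic.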

Tech Stack

Layer	Tool	Role
CLI framework	Gunshi	Command routing, plugin system
AI	Multiple providers (Vercel, OpenAI, OpenRouter, Ollama)	Embeddings, advise
Local persistence	TanStack DB + node-sqlite-persistence	Snapshot storage

Data Model

KuralUnit (base for all unit types)

Every parsed unit carries:

name, path, description
identityEmbedding — name + description vector
leafEmbedding — name + description + structural signature vector
facetHash — SHA256 for cache invalidation
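As a rough sketch, the fields above could be typed as follows. The interface name `KuralUnitSketch`, the vector representation, and the choice of inputs to the hash are assumptions; the page only states that facetHash is a SHA256 used for cache invalidation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical typing of the base unit; field names follow the list above.
interface KuralUnitSketch {
  name: string;
  path: string;
  description: string;
  identityEmbedding: number[]; // name + description vector
  leafEmbedding: number[]; // name + description + structural signature vector
  facetHash: string; // SHA256 hex digest
}

// Assumption: the hash covers the same text that gets embedded.
// JSON.stringify keeps field boundaries unambiguous, unlike plain concatenation.
function computeFacetHash(name: string, description: string, signature: string): string {
  return createHash("sha256")
    .update(JSON.stringify([name, description, signature]))
    .digest("hex");
}

const unchangedHash = computeFacetHash("parse", "Parses input", "(s: string) => Ast");
const sameAgainHash = computeFacetHash("parse", "Parses input", "(s: string) => Ast");
const editedHash = computeFacetHash("parse", "Parses source text", "(s: string) => Ast");

console.log(unchangedHash === sameAgainHash); // true: identical content, identical hash
console.log(unchangedHash === editedHash); // false: an edited description invalidates the cache
```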

Unit Types

KuralFile — source files with functions, types, imports
KuralType — interfaces, classes, type aliases with fields and references
KuralFunction — functions with params, return types, purity, call graph
KuralDirectory — directory hierarchy with children, descriptions from KURAL.md

Scores

Every node (types, functions, files, directories) gets a ScoreCard:

fit — how well the node's content matches its parent's identity
uniqueness — mean distance to siblings (a value of 2.0 is a sentinel meaning "not applicable")
score — harmonic mean of fit and uniqueness
childrenFit / childrenUniqueness / childrenScore — direct children quality (containers only)
subtreeFit / subtreeUniqueness / subtreeScore — aggregate descendant health (containers only)
overallScore — leaf: score. Container: harmonic mean of score and subtreeScore
worstPair — the most similar child pair
bestUncle — the uncle node where this unit would fit better (name + score)
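The harmonic-mean combinations above can be sketched in a few lines. This assumes fit and uniqueness are positive numbers on a comparable scale and does not model the 2.0 "N/A" sentinel; how kural treats zero or missing inputs is not stated here, so returning 0 for non-positive inputs is an assumption.

```typescript
// Harmonic mean of two values; it is dominated by the smaller input,
// so one weak dimension pulls the combined score down sharply.
function harmonicMean(a: number, b: number): number {
  if (a <= 0 || b <= 0) return 0; // assumption: non-positive inputs collapse to 0
  return (2 * a * b) / (a + b);
}

// Leaf: overallScore is just score. Container: harmonic mean of score and subtreeScore.
function overallScore(score: number, subtreeScore?: number): number {
  return subtreeScore === undefined ? score : harmonicMean(score, subtreeScore);
}

const leafScore = harmonicMean(0.8, 0.4); // fit = 0.8, uniqueness = 0.4
console.log(leafScore.toFixed(3)); // "0.533"

const containerOverall = overallScore(0.6, 0.3); // score = 0.6, subtreeScore = 0.3
console.log(containerOverall.toFixed(3)); // "0.400"
```

Compared with an arithmetic mean (0.6 for the leaf example), the harmonic mean penalizes the imbalance between fit and uniqueness.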

Snapshot IDs

Each snapshot ID pairs a numeric timestamp with a short commit hash, for example:

1711700400-a3f8b2c

Sortable by time for history ordering
Tied to code state via commit hash
Deduplicable — same commit produces same hash suffix
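A small sketch of working with IDs shaped like 1711700400-a3f8b2c. It assumes the numeric prefix is a timestamp and the suffix is an abbreviated hexadecimal commit hash; the helper names are hypothetical and not part of kural's API.

```typescript
interface SnapshotId {
  timestamp: number; // assumed to be a numeric timestamp
  commit: string; // assumed to be an abbreviated hex commit hash
}

function formatSnapshotId(id: SnapshotId): string {
  return `${id.timestamp}-${id.commit}`;
}

// Returns undefined for strings that do not match <digits>-<hex>.
function parseSnapshotId(raw: string): SnapshotId | undefined {
  const match = /^(\d+)-([0-9a-f]+)$/.exec(raw);
  if (match === null) return undefined;
  const [, timestamp, commit] = match;
  if (timestamp === undefined || commit === undefined) return undefined;
  return { timestamp: Number(timestamp), commit };
}

// Order history oldest-first by the parsed timestamp.
function sortOldestFirst(ids: string[]): string[] {
  return ids
    .map((raw) => parseSnapshotId(raw))
    .filter((id): id is SnapshotId => id !== undefined)
    .sort((a, b) => a.timestamp - b.timestamp)
    .map(formatSnapshotId);
}

const parsedId = parseSnapshotId("1711700400-a3f8b2c");
console.log(parsedId?.commit); // "a3f8b2c"

const orderedIds = sortOldestFirst(["1711700400-a3f8b2c", "1711600000-0c1d2e3"]);
console.log(orderedIds[0]); // "1711600000-0c1d2e3"
```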

Local Storage Layout

.kural-db/
  <branch>/
    active.db       # current snapshot, clone-able
    advise.db       # ephemeral clone for AI simulation
    history/        # rotated snapshots (max 10)
      <snapshot-id>.db

Snapshot Lifecycle

On each kural snapshot generate run, the current active.db is rotated to history/<snapshot-id>.db and a fresh active.db is written
When history holds more than 10 snapshots, the oldest unpinned ones are evicted until the count is back within the limit
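The eviction rule can be sketched as a pure function that returns which history entries to delete. The `HistoryEntry` shape and the `pinned` flag representation are hypothetical, and the sketch assumes pinned snapshots still count toward the limit of 10, which this page does not specify.

```typescript
interface HistoryEntry {
  id: string;
  timestamp: number;
  pinned: boolean; // hypothetical flag; how pinning is stored is not documented here
}

const MAX_HISTORY = 10;

// Returns the ids to evict: oldest unpinned entries first, until the count fits the limit.
function selectEvictions(history: HistoryEntry[], max: number = MAX_HISTORY): string[] {
  const excess = history.length - max;
  if (excess <= 0) return [];
  return history
    .filter((entry) => !entry.pinned)
    .sort((a, b) => a.timestamp - b.timestamp)
    .slice(0, excess)
    .map((entry) => entry.id);
}

// Twelve snapshots, the oldest one pinned: two unpinned entries must go.
const historyEntries: HistoryEntry[] = Array.from({ length: 12 }, (_, i) => ({
  id: `s${i + 1}`,
  timestamp: i + 1,
  pinned: i === 0,
}));
const evicted = selectEvictions(historyEntries);
console.log(evicted); // ["s2", "s3"]
```

If every entry is pinned, the function returns an empty list and the history stays over the limit, which is one possible policy among several.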

See How snapshots are built and cached for the per-run flow.

Schema Design

Database Tables

Table	Purpose
`files`	Source files with embeddings, imports, descriptions
`types`	Type/interface/class declarations with fields, refs
`functions`	Functions with params, return types, purity, call graph
`directories`	Directory hierarchy with children, descriptions
`scores`	Structural health metrics (ScoreCard) per scored node
`metadata`	Key-value store (created_at, model_id, schema_version, axes)

Schema Versioning

schema_version is stored in every snapshot's metadata from day one
Additive-only evolution is the default — new nullable columns only, no removals or renames
Old snapshots have NULLs for new columns; readers handle them gracefully
Breaking changes (rare) require a one-time migration over existing history snapshots
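A sketch of what "handle NULLs gracefully" might look like on the reader side. The column names, the version numbers, and both helper functions are hypothetical; they only illustrate the additive-only rule, where a column added later is NULL in older snapshots.

```typescript
// Hypothetical row shape: overall_score stands in for a column added in a later schema version.
interface ScoreRow {
  path: string;
  score: number;
  overall_score: number | null; // NULL in snapshots written before the column existed
}

// Readers fall back to a placeholder instead of failing on old snapshots.
function displayOverall(row: ScoreRow): string {
  return row.overall_score === null ? "n/a" : row.overall_score.toFixed(2);
}

// Additive changes never need migration; only snapshots older than the last
// breaking version (hypothetical parameter) do.
function needsMigration(snapshotVersion: number, lastBreakingVersion: number): boolean {
  return snapshotVersion < lastBreakingVersion;
}

const oldRow: ScoreRow = { path: "src/a.ts", score: 0.7, overall_score: null };
const newRow: ScoreRow = { path: "src/a.ts", score: 0.7, overall_score: 0.65 };

console.log(displayOverall(oldRow)); // "n/a"
console.log(displayOverall(newRow)); // "0.65"
console.log(needsMigration(2, 3)); // true
```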

Embedding Model Versioning

model_id is stored in snapshot metadata for context
Scores are precomputed numbers — no cross-snapshot vector comparison needed
If the model changes and scores shift, model_id in metadata explains why

What's next

Planned work — server tier, cloud sync, and the reactive dashboard — lives on the Roadmap page.