KURAL

Architecture

System overview — pillars, data model, snapshot system, and the embeddings-not-LLMs design choice

System Overview

Kural is organized around four pillars (Embed, Score, Audit, Place) that cross-validate each other. This page covers the mechanics underneath: how snapshots are built and cached, and what the data model looks like.

Today, kural ships as a single tier:

  • CLI — scan, score, local snapshots on disk under .kural-db/.

A server tier and web dashboard are planned — see Roadmap.

How snapshots are built and cached

A snapshot is the serialized output of the Embed and Score pillars, written once per kural snapshot generate run. Every other command (audit, score, place, brief, advise) reads from a snapshot rather than rebuilding one.

Walk and parse — TypeScript-only today. The AST pass extracts functions, types, and descriptions; anything outside that surface is invisible to the rest of the system.

Embed with cache — each unit is keyed by facet_hash. Unchanged units are served from the previous active.db; only new or modified units hit the embedding provider. This is why first runs are slow and subsequent runs complete in seconds.

Score and store — fit, uniqueness, and subtree metrics are computed and written to a fresh active.db. The previous active.db is rotated into history/<snapshot-id>.db.
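The "Embed with cache" step can be pictured as a partition of parsed units into cache hits and cache misses. The following is a minimal sketch, not kural's actual implementation: the ParsedUnit and EmbedPlan shapes, the planEmbedding helper, and the idea of holding the previous snapshot's vectors in a Map are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real kural types may differ.
type Vector = number[];

interface ParsedUnit {
  path: string;
  facetHash: string;
}

interface EmbedPlan {
  reused: Map<string, Vector>; // facetHash -> vector served from the previous snapshot
  toEmbed: ParsedUnit[];       // units that still need a call to the embedding provider
}

// Split units into cache hits and cache misses, keyed by facet hash.
function planEmbedding(
  units: ParsedUnit[],
  previous: Map<string, Vector>,
): EmbedPlan {
  const reused = new Map<string, Vector>();
  const toEmbed: ParsedUnit[] = [];
  for (const unit of units) {
    const cached = previous.get(unit.facetHash);
    if (cached !== undefined) {
      reused.set(unit.facetHash, cached);
    } else {
      toEmbed.push(unit);
    }
  }
  return { reused, toEmbed };
}

// Example: one unchanged unit, one new unit.
const previousVectors = new Map<string, Vector>([["hash-a", [0.1, 0.2]]]);
const plan = planEmbedding(
  [
    { path: "src/a.ts", facetHash: "hash-a" },
    { path: "src/b.ts", facetHash: "hash-b" },
  ],
  previousVectors,
);
console.log(plan.toEmbed.map((u) => u.path)); // only the unit missing from the cache
```

On a warm run most units land in reused, which is consistent with the observation that only the first run is slow.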

Snapshot layout, rotation rules, and cache-invalidation keys live in Database.

Why embeddings, not LLMs

A deliberate architectural choice runs through every pillar: kural uses embedding models, not LLMs, for the work it does itself. Placement, similarity, drift detection, and clustering are vector problems — they don't need reasoning, they need a well-distributed space and statistical fences. Embeddings give exactly that, at two-to-three orders of magnitude lower cost than frontier LLMs and with the option to run fully local (BGE, Nomic, Qwen3 Embedding via Ollama).

The consequence: a full embedding pass over a 50k-LOC codebase costs cents at hosted rates and zero locally. An agent can run kural brief before every implementation, and kural audit after every meaningful change, without thinking about cost. Reasoning is reserved for the user's coding agent, where it actually buys something. Kural is the structural layer underneath — fast, local-capable, and cheap enough to call repeatedly.

Tech Stack

Layer | Tool | Role
CLI framework | Gunshi | Command routing, plugin system
AI | Multiple providers (Vercel, OpenAI, OpenRouter, Ollama) | Embeddings, advise
Local persistence | TanStack DB + node-sqlite-persistence | Snapshot storage

Data Model

KuralUnit (base for all)

Every parsed unit carries:

  • name, path, description
  • identityEmbedding — name + description vector
  • leafEmbedding — name + description + structural signature vector
  • facetHash — SHA256 for cache invalidation
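As a rough illustration, the base unit could be modeled as below. The field names come from the list above; the computeFacetHash helper and the choice of which fields feed the hash (name, description, structural signature) are assumptions, since this page only states that the hash is SHA256 and is used for cache invalidation.

```typescript
import { createHash } from "node:crypto";

// Field names follow the list above; the vector representation is an assumption.
interface KuralUnit {
  name: string;
  path: string;
  description: string;
  identityEmbedding: number[]; // name + description vector
  leafEmbedding: number[];     // name + description + structural signature vector
  facetHash: string;           // SHA256 hex digest used for cache invalidation
}

// Hypothetical helper: hash the inputs that would change the embeddings.
// JSON.stringify on an array keeps field boundaries unambiguous.
function computeFacetHash(
  name: string,
  description: string,
  signature: string,
): string {
  return createHash("sha256")
    .update(JSON.stringify([name, description, signature]))
    .digest("hex");
}

const unit: KuralUnit = {
  name: "parseFile",
  path: "src/parse.ts",
  description: "Parses a source file into units",
  identityEmbedding: [],
  leafEmbedding: [],
  facetHash: computeFacetHash(
    "parseFile",
    "Parses a source file into units",
    "(path: string) => Unit[]",
  ),
};
console.log(unit.facetHash.length); // a SHA256 hex digest is 64 characters long
```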

Unit Types

  • KuralFile — source files with functions, types, imports
  • KuralType — interfaces, classes, type aliases with fields and references
  • KuralFunction — functions with params, return types, purity, call graph
  • KuralDirectory — directory hierarchy with children, descriptions from KURAL.md

Scores

Every node (types, functions, files, directories) gets a ScoreCard:

  • fit — how well the node's content matches its parent's identity
  • uniqueness — mean distance to siblings (2.0 = N/A)
  • score — harmonic mean of fit and uniqueness
  • childrenFit / childrenUniqueness / childrenScore — direct children quality (containers only)
  • subtreeFit / subtreeUniqueness / subtreeScore — aggregate descendant health (containers only)
  • overallScore — leaf: score. Container: harmonic mean of score and subtreeScore
  • worstPair — the most similar child pair
  • bestUncle — the uncle node where this unit would fit better (name + score)
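The two harmonic-mean rules above can be written out directly. This is a sketch under stated assumptions: the function names are hypothetical, inputs are assumed to lie in the range 0 to 1, and the 2.0 "N/A" sentinel for uniqueness is deliberately not handled because this page does not say how it is folded into score.

```typescript
// Harmonic mean of two non-negative values; returns 0 when either input is 0,
// so one weak component drags the combined score down sharply.
function harmonicMean(a: number, b: number): number {
  if (a <= 0 || b <= 0) return 0;
  return (2 * a * b) / (a + b);
}

// Leaf: overallScore is just score.
// Container: harmonic mean of score and subtreeScore.
function overallScore(score: number, subtreeScore?: number): number {
  return subtreeScore === undefined
    ? score
    : harmonicMean(score, subtreeScore);
}

const score = harmonicMean(0.9, 0.6); // fit = 0.9, uniqueness = 0.6
console.log(score.toFixed(2));                      // "0.72"
console.log(overallScore(score).toFixed(2));        // leaf: "0.72"
console.log(overallScore(score, 0.36).toFixed(2));  // container: "0.48"
```

The harmonic mean is a natural fit for this kind of scoring because it rewards balance: a node with high fit but low uniqueness scores closer to the lower of the two values than an arithmetic mean would suggest.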

Snapshot System

Snapshot = Isolated SQLite Database

Each snapshot is a self-contained SQLite database file. This enables clone-and-mutate workflows — copy a snapshot, run analysis/simulation on the clone, discard when done. The original stays untouched.
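Because a snapshot is a single file, the clone-and-mutate workflow reduces to a file copy. The sketch below uses Node's built-in fs module with a placeholder file standing in for a real SQLite database; the cloneSnapshot helper is hypothetical, and the sketch assumes no other process is writing to the source file while it is copied.

```typescript
import {
  appendFileSync,
  copyFileSync,
  mkdtempSync,
  readFileSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: copy the active snapshot to a scratch location.
function cloneSnapshot(activePath: string, clonePath: string): string {
  copyFileSync(activePath, clonePath);
  return clonePath;
}

// Demo in a temp directory; a text file stands in for a real active.db.
const dir = mkdtempSync(join(tmpdir(), "kural-demo-"));
const active = join(dir, "active.db");
writeFileSync(active, "original");

const clone = cloneSnapshot(active, join(dir, "advise.db"));
appendFileSync(clone, "-mutated"); // simulate analysis writing to the clone

const originalAfter = readFileSync(active, "utf8");
const cloneAfter = readFileSync(clone, "utf8");
rmSync(dir, { recursive: true, force: true }); // discard when done

console.log(originalAfter); // "original"
console.log(cloneAfter);    // "original-mutated"
```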

Snapshot ID

Format: <timestamp>-<short-commit-hash>

1711700400-a3f8b2c
  • Sortable by time for history ordering
  • Tied to code state via commit hash
  • Deduplicable — same commit produces same hash suffix
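A small helper pair shows how such an ID could be built, parsed, and ordered. This is an illustrative sketch: the function names are hypothetical, and treating the timestamp as Unix seconds is an assumption based on the example value above.

```typescript
interface SnapshotId {
  timestamp: number; // assumed to be Unix seconds
  commit: string;    // short commit hash
}

function formatSnapshotId(id: SnapshotId): string {
  return `${id.timestamp}-${id.commit}`;
}

function parseSnapshotId(raw: string): SnapshotId {
  const dash = raw.indexOf("-");
  const ts = raw.slice(0, dash);
  const commit = raw.slice(dash + 1);
  if (dash <= 0 || !/^\d+$/.test(ts) || !/^[0-9a-f]+$/.test(commit)) {
    throw new Error(`Invalid snapshot id: ${raw}`);
  }
  return { timestamp: Number(ts), commit };
}

// Order snapshot IDs oldest-first by their numeric timestamp prefix.
function sortByTime(ids: string[]): string[] {
  return [...ids].sort(
    (a, b) => parseSnapshotId(a).timestamp - parseSnapshotId(b).timestamp,
  );
}

const parsed = parseSnapshotId("1711700400-a3f8b2c");
console.log(parsed.timestamp, parsed.commit); // 1711700400 a3f8b2c
console.log(sortByTime(["1711700400-a3f8b2c", "1711600000-0c1d2e3"]));
```

Sorting on the parsed number rather than the raw string avoids surprises if two timestamps ever differ in digit count.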

Local Storage Layout

.kural-db/
  <branch>/
    active.db       # current snapshot, clone-able
    advise.db       # ephemeral clone for AI simulation
    history/        # rotated snapshots (max 10)
      <snapshot-id>.db

Snapshot Lifecycle

  1. Active snapshot is rotated to history/<snapshot-id>.db; a fresh active.db is created
  2. Oldest unpinned history snapshots are evicted when count exceeds 10
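The eviction rule in step 2 can be sketched as a pure function over the history listing. The HistoryEntry shape and selectEvictions name are hypothetical, and the sketch assumes the limit of 10 applies to the total history count, pinned entries included; the actual rotation rules are documented on the Database page.

```typescript
interface HistoryEntry {
  id: string;
  timestamp: number;
  pinned: boolean;
}

const MAX_HISTORY = 10;

// Return the ids to evict: the oldest unpinned entries, just enough to get
// back to the limit. Pinned entries are never selected, so the history can
// stay over the limit if too many entries are pinned.
function selectEvictions(
  history: HistoryEntry[],
  max: number = MAX_HISTORY,
): string[] {
  const excess = history.length - max;
  if (excess <= 0) return [];
  return history
    .filter((entry) => !entry.pinned)
    .sort((a, b) => a.timestamp - b.timestamp)
    .slice(0, excess)
    .map((entry) => entry.id);
}

// Twelve snapshots, the oldest one pinned: the next two oldest are evicted.
const sampleHistory: HistoryEntry[] = Array.from({ length: 12 }, (_, i) => ({
  id: `snap-${i + 1}`,
  timestamp: i + 1,
  pinned: i === 0,
}));
console.log(selectEvictions(sampleHistory)); // [ 'snap-2', 'snap-3' ]
```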

See How snapshots are built and cached for the per-run flow.

Schema Design

Database Tables

Table | Purpose
files | Source files with embeddings, imports, descriptions
types | Type/interface/class declarations with fields, refs
functions | Functions with params, return types, purity, call graph
directories | Directory hierarchy with children, descriptions
scores | Structural health metrics per file/directory
metadata | Key-value store (created_at, model_id, schema_version, axes)

Schema Versioning

  • schema_version is stored in every snapshot's metadata from day one
  • Additive-only evolution is the default — new nullable columns only, no removals or renames
  • Old snapshots have NULLs for new columns; readers handle them gracefully
  • Breaking changes (rare) require a one-time migration over existing history snapshots
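On the reader side, additive-only evolution means a newer column may be absent or NULL in an older snapshot. The sketch below shows one way a reader could tolerate that; the complexity column and both interfaces are invented purely for illustration and are not part of kural's schema.

```typescript
// Row as it might come back from an older or newer snapshot.
// `complexity` is a hypothetical column added in a later schema version.
interface RawFunctionRow {
  name: string;
  path: string;
  complexity?: number | null;
}

interface FunctionView {
  name: string;
  path: string;
  complexity: number | undefined; // undefined when the snapshot predates the column
}

// Normalize missing and NULL values to undefined so callers handle one case.
function readFunctionRow(row: RawFunctionRow): FunctionView {
  return {
    name: row.name,
    path: row.path,
    complexity: row.complexity ?? undefined,
  };
}

const oldRow = readFunctionRow({ name: "parse", path: "src/parse.ts", complexity: null });
const newRow = readFunctionRow({ name: "parse", path: "src/parse.ts", complexity: 3 });
console.log(oldRow.complexity); // undefined
console.log(newRow.complexity); // 3
```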

Embedding Model Versioning

  • model_id is stored in snapshot metadata for context
  • Scores are precomputed numbers — no cross-snapshot vector comparison needed
  • If the model changes and scores shift, model_id in metadata explains why

What's next

Planned work — server tier, cloud sync, and the reactive dashboard — lives on the Roadmap page.
