FAQ
Answers to the questions a skeptic would ask
Isn't this just another linter?
TypeScript isn't a linter either. Both are discipline layers — TypeScript checks type discipline, kural checks structural discipline. Linting checks style; type checking checks contracts; kural checks whether code lives in the right directory, whether siblings overlap in purpose, whether a module's description matches what it actually contains.
You can have zero lint errors, full type safety, and still have a function that belongs in analysis/ sitting in utils/, silently pulling every module that imports it toward the wrong neighborhood in vector space. That's the class of problem kural catches — and the class no linter or type checker ever will, because it's structural, not syntactic.
Doesn't the annotation burden defeat the purpose?
The AI annotates, not you. After you install skills with kural skill, the agent gets full context on description principles and kural params. It reads audit findings and resolves them through the four-phase pipeline (fix incomplete docs, then documentation quality, then structure, then suppress). The annotations are the agent's own memory of architectural intent, persisting across sessions.
You can bootstrap with just file-level JSDoc and @kuralPure/@kuralCauses on functions. The incomplete-docs audit tells you exactly what's missing. Add @kuralUtil, @kuralPatterns, and @kuralBound as audit findings guide you — see the full params spec for triggers and scope.
Aren't the params just patching a flawed embedding model?
Params express structural realities that no embedding can infer from code — behavioral intent, architectural roles, structural grouping:
- @kuralPure: two functions with identical signatures where one hits the database and the other is pure computation. No embedding distinguishes these from code alone.
- @kuralPatterns: three near-identical siblings generated from one concept. Without the annotation, the system sees duplicates; with it, they collapse into one representative, which is the correct structural reading.
- @kuralBound inward: a barrel file whose identity derives from what it wires, not what it contains. The embedding from its name and description would be misleading without declaring this.
Params are a type system for the structural space. TypeScript types don't patch a flawed runtime — they express intent that values structurally cannot convey. Params express structural intent that code text structurally cannot convey.
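The @kuralPure case above can be sketched as two functions with the same signature and different behavior. This is an illustration only: the Order type and the in-memory ledger are hypothetical stand-ins, and the free-text argument after @kuralCauses is an assumed syntax (see the full params spec for the real form).

```typescript
// Hypothetical domain types and an in-memory stand-in for a database table.
interface Order { id: string; subtotal: number; taxRate: number }
const ledger: { orderId: string; total: number }[] = [];

/**
 * Computes the payable total from values already held in memory.
 * @param order - The order to total.
 * @returns Subtotal plus tax.
 * @kuralPure
 */
function totalOf(order: Order): number {
  return order.subtotal * (1 + order.taxRate);
}

/**
 * Persists the payable total so billing can reconcile it later.
 * @param order - The order to record.
 * @returns The total that was written.
 * @kuralCauses writes one row to the ledger (argument syntax is assumed)
 */
function recordTotal(order: Order): number {
  const total = order.subtotal * (1 + order.taxRate);
  ledger.push({ orderId: order.id, total }); // the side effect the tag declares
  return total;
}
```

Both functions have the type (order: Order) => number and nearly identical bodies, so the signature and code text alone say little about which one touches storage. The tag carries that difference.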
How does an AI agent actually use Kural?
The same way it uses a type checker — to verify its own work before and after writing code. Brief is the pre-flight check; audit is the post-flight check.
- Before writing code (pre-flight): run kural brief "description of what I'm about to build" to get siblings, utilities, symbols, and patterns to reuse or imitate. The agent sees the structural neighborhood before generating anything.
- Write code with descriptions following the skill's principles (exclusivity language, no sibling vocabulary, role-based descriptions).
- After writing (post-flight): run kural audit to surface findings the way tsc surfaces type errors.
- Resolve findings through the four-phase pipeline: fix incomplete docs first, then documentation quality, then structural issues, then suppress with @kuralResidual only as a last resort.
- Regenerate snapshot: kural snapshot generate src. Scores recalibrate, embeddings update.
- Compare: kural score -c <old-snapshot-id> to verify the fix improved things.
The discipline runs in a loop: brief → write → audit → fix → regenerate. Each cycle sharpens the structural map. The agent both reads and writes the map.
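As a rough sketch, an agent harness could represent one cycle as a list of command lines built from the commands named above. The planCycle helper and its return shape are hypothetical; the code only assembles argument lists and does not run anything.

```typescript
type Argv = string[];

interface CyclePlan {
  preFlight: Argv[];  // run before writing code
  postFlight: Argv[]; // run after writing code
}

// Builds the command sequence for one brief -> write -> audit -> fix -> regenerate cycle.
// Resolving findings happens between "kural audit" and "kural snapshot generate src".
function planCycle(intent: string, oldSnapshotId: string): CyclePlan {
  return {
    preFlight: [["kural", "brief", intent]],
    postFlight: [
      ["kural", "audit"],
      ["kural", "snapshot", "generate", "src"],
      ["kural", "score", "-c", oldSnapshotId],
    ],
  };
}

// Hypothetical intent text and snapshot id, for illustration only.
const plan = planCycle("parse retry policy from config", "snap-001");
```

Keeping each command as an argument array rather than a single string avoids shell-quoting problems when the intent description contains spaces or quotes.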
What about snapshot staleness during active sessions?
Snapshots cache embeddings by facet_hash. Only units whose content has changed get re-embedded. On a typical incremental run, most units hit the cache and the generate step completes in seconds.
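The caching idea can be sketched as a map keyed by a content hash. This is a minimal illustration, not kural's implementation: how facet_hash is actually derived is not specified here, so the FNV-1a function below is a stand-in, and EmbeddingCache and the embed callback are hypothetical names.

```typescript
type Embedding = number[];

// Stand-in content hash (32-bit FNV-1a over UTF-16 code units).
function facetHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16);
}

class EmbeddingCache {
  private store = new Map<string, Embedding>();
  hits = 0;
  misses = 0;

  // Returns the cached embedding when the content hash is known,
  // and calls the (expensive) embed function only for new content.
  get(text: string, embed: (t: string) => Embedding): Embedding {
    const key = facetHash(text);
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const fresh = embed(text);
    this.store.set(key, fresh);
    return fresh;
  }
}
```

On a second run where only one unit's text has changed, only that unit misses the cache; every unchanged unit is served from the map.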
For longer sessions with many changes, regenerate between meaningful milestones — after a feature branch is complete, after a refactor, or before running kural brief for a new implementation.
What happens when the embedding model changes?
model_id is stored in snapshot metadata. Scores are precomputed numbers — no cross-snapshot vector comparison is needed. If the model changes and scores shift, the dashboard shows the shift and model_id explains why.
Switching models mid-project doesn't break anything, but be aware that audit findings and placement decisions are calibrated to the embedding space of the model that generated the snapshot. Compare scores across snapshots generated by the same model.
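One way to enforce that advice in tooling is a guard that refuses to diff scores across models. In the sketch below only model_id comes from the text above; the Snapshot shape, the scores field, and compareScores are hypothetical.

```typescript
interface Snapshot {
  id: string;
  model_id: string;
  scores: Record<string, number>; // hypothetical: unit path -> score
}

// Returns per-unit score deltas, or null when the snapshots were
// generated by different embedding models and are not comparable.
function compareScores(older: Snapshot, newer: Snapshot): Record<string, number> | null {
  if (older.model_id !== newer.model_id) return null;
  const deltas: Record<string, number> = {};
  for (const [unit, score] of Object.entries(newer.scores)) {
    const previous = older.scores[unit];
    if (previous !== undefined) deltas[unit] = score - previous;
  }
  return deltas;
}
```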
Doesn't ask-user mean the system doesn't know?
It's an honest answer, not a failure. The placement engine's four-tier cascade escalates when the statistical signal falls below what known-good placements achieved during calibration. Forcing a placement at low confidence would be worse — it would give a confident answer to a question where no confident answer exists.
For agents, ask-user is a useful signal: it means the query didn't match any existing neighborhood decisively, and the agent should either use more expensive inference for the final placement decision, or create a genuinely new module. The parent neighborhood is still surfaced — it tells the agent where the new concept belongs, even if it can't name the exact directory.
What about the cold start problem?
On a new or undocumented codebase, Kural starts with whatever descriptions exist. The incomplete-docs audit produces the first actionable list — every function missing a description, every directory missing a KURAL.md. Fix those, regenerate, and the structural map becomes meaningful.
You don't need all annotations on day one. Start with descriptions and @kuralPure/@kuralCauses. The audits themselves guide further annotation: vocabulary-bleed tells you when descriptions converge, outliers tells you when something doesn't fit, and @kuralResidual lets you acknowledge intentional architecture without fixing it.
How is this different from RAG-based code search?
RAG retrieves text by similarity — "find me code that looks like this." Kural places units in structural space, scores organizational health, and audits for specific drift patterns that RAG cannot detect:
- RAG can find computeFit if you search for "fitness metric." It cannot tell you that computeFit and computeChildrenFit are structural repetitions that should be grouped (@kuralPatterns).
- RAG can find similar functions. It cannot tell you that one directory's description has drifted into another module's vocabulary (vocabulary-bleed).
- RAG can retrieve code. kural brief retrieves with structural context (siblings, utilities, symbols, patterns), so the agent knows not just what exists, but how it relates and whether the new code belongs nearby.
Brief does retrieve, but within a scored, audited space. The retrieval is a byproduct of placement, not the other way around.
There's also a cost dimension. Many RAG pipelines pay LLM-rerank or LLM-summarization costs on every query. Kural's entire pipeline runs on embedding-tier compute — orders of magnitude cheaper, locally runnable, and fast enough that an agent can call kural brief before every implementation without thinking about cost.
Can Kural handle large codebases?
The embedding pipeline caches by facet_hash. On subsequent runs, only new or modified units are re-embedded. Merge-candidates and duplicates are computed within sibling groups, not across the entire codebase. The 14 audits use statistical fencing calibrated from each codebase's own distributions, so they scale with the number of sibling groups, not the total unit count.
For very large codebases (10k+ files), the initial embedding pass is the most expensive step. Incremental runs after that are fast.
Does the placement engine make locally greedy decisions?
The four-tier cascade addresses this directly:
- Chain search: softmaxes children at each directory level, with an adaptive temperature (T = 1/numChildren) that makes routing more decisive when there are more choices.
- Alien detection: catches queries that are semantically foreign to the entire codebase, before any directory-level choice is made.
- Bridge classification: when confidence is low but the query isn't alien, classifies the query against seven architectural archetypes (orchestrator, processor, presenter, resolver, gateway, adapter, entry-point) and routes by layer.
- Safety gate: when confidence is between bridgeThreshold and safetyGate, escalates rather than forcing a bad placement.
The chain search is locally greedy, but three escape hatches catch cases where local information is insufficient.
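The adaptive temperature in the chain search can be sketched as a standard temperature-scaled softmax with T = 1/numChildren. The function names and the idea of feeding in one similarity score per child directory are assumptions for illustration; only the temperature rule comes from the description above.

```typescript
// Temperature-scaled softmax; subtracting the max keeps exp() numerically stable.
function softmax(scores: number[], temperature: number): number[] {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Adaptive temperature: T = 1 / numChildren, so more children means a sharper distribution.
function routeProbabilities(childScores: number[]): number[] {
  return softmax(childScores, 1 / childScores.length);
}

// Hypothetical similarity scores for three child directories.
const adaptive = routeProbabilities([0.9, 0.5, 0.4]);
const flat = softmax([0.9, 0.5, 0.4], 1);
```

For these three scores the top child receives roughly 0.66 of the probability mass at T = 1/3, against roughly 0.44 at T = 1. Dividing by a smaller temperature widens the gaps between scores before exponentiation, which is why routing gets more decisive as the number of children grows.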
What if my codebase isn't TypeScript?
Currently TypeScript-only. The parser extracts types, functions, and JSDoc from the TypeScript AST. The structural scoring concepts generalize — fit, uniqueness, audits, and placement are language-agnostic — but the ingestion pipeline would need a new parser for each language.
Can I suppress audits for an entire directory?
Currently, @kuralResidual suppresses per-finding per-unit, and audits.disable in config disables an audit entirely. There is no path-scoped suppression — suppressing outliers for src/legacy/ only is not yet supported. This is on the project backlog.
For directories with entrenched but intentional architecture, annotate with @kuralBound, @kuralUtil, or @kuralBorrows — these params suppress specific audits implicitly and apply to the entire annotated unit.
What do the audits cover, and why this set?
The number 14 isn't arbitrary: each audit measures a distinct structural property that the others can't catch. The audits cover six categories, most of them fenced statistically:
| Category | Audits | Strategy |
|---|---|---|
| Bloating & Size | bloated-directories, bloated-files | Dendrogram gap |
| Outliers & Cohesion | outliers, merge-candidates | Z-score, MAD-based |
| Containment | containments, misplaced | Z-score |
| Cross-module duplicates | duplicates, util-duplicates | Z-score |
| Vocabulary & Coherence | focal-drift, vocabulary-bleed, incoherent, incoherent-utils, weak-identity | Z-score, MAD-based, deterministic |
| Documentation | incomplete-docs | Rule-based |
Each audit measures a different structural property, but all fences flow through the same sensitivity parameter. One knob governs them all — see Tuning for how every threshold resolves to sensitivity, data, or design intent.
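To make "MAD-based" concrete, here is a minimal sketch of a robust lower fence computed from a sibling group's own scores. It is the textbook median and median-absolute-deviation construction, not kural's code; the fence width k is a hypothetical parameter, and how kural's sensitivity setting maps onto it is not specified here.

```typescript
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Lower fence = median - k * 1.4826 * MAD.
// The 1.4826 factor scales MAD to match a standard deviation for normal data.
function lowerFence(scores: number[], k: number): number {
  const center = median(scores);
  const mad = median(scores.map((s) => Math.abs(s - center)));
  return center - k * 1.4826 * mad;
}

// Returns the scores that fall below the group's own fence.
function belowFence(scores: number[], k: number): number[] {
  const fence = lowerFence(scores, k);
  return scores.filter((s) => s < fence);
}

// Hypothetical fit scores for five siblings in one directory.
const flagged = belowFence([0.82, 0.8, 0.79, 0.81, 0.35], 3);
```

Because the fence comes from the group's median and spread rather than a fixed cutoff, a tightly clustered directory flags small deviations while a naturally loose one tolerates more. Here only the 0.35 sibling falls below the fence.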
Doesn't bad description quality make the whole system unreliable?
Descriptions carry 50% weight in the identity vector, so yes — garbage in, garbage out. But the system has self-correcting feedback loops:
- incomplete-docs flags missing descriptions, @param, @returns, and @kuralPure/@kuralCauses.
- vocabulary-bleed flags when descriptions converge on another module's domain vocabulary.
- incoherent flags when a directory's description says one thing but its content says another.
- weak-identity flags when most children fit an uncle better than their own parent.
An agent that writes lazy descriptions will see these audits fire. Fixing the descriptions recalibrates the embeddings. The skill teaches the agent how to write them well. The loop is: bad descriptions → bad scores → audit findings → agent fixes descriptions → scores improve.
What's the single most important thing to get right?
Descriptions. Every other input — paths, signatures, call graphs, @kuralPure/@kuralCauses — is extracted from code. Descriptions are the primary human (or agent) input, and they carry 50% of the identity vector. Write them with exclusivity language, anchor with metaphors, and never borrow vocabulary from sibling modules. If descriptions are good, the system works. If they're vague, nothing else compensates. See Getting Started for the description principles in practice.
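A rough sketch of what a 50% description weight could look like in practice: the description embedding takes half of the blend and the remaining facets share the other half. The even split among the other facets, the normalization steps, and the function names are assumptions for illustration, not kural's actual formula.

```typescript
// Scales a vector to unit length; a zero vector is returned unchanged.
function normalize(v: number[]): number[] {
  const norm = Math.hypot(...v);
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Blends facet embeddings into one identity vector.
// Assumes every facet embedding has the same dimension as the description embedding.
function identityVector(description: number[], otherFacets: number[][]): number[] {
  const blended = normalize(description).map((x) => 0.5 * x); // description: 50%
  const share = otherFacets.length > 0 ? 0.5 / otherFacets.length : 0; // rest split evenly (assumed)
  for (const facet of otherFacets) {
    normalize(facet).forEach((x, i) => {
      blended[i] += share * x;
    });
  }
  return normalize(blended);
}

// Toy 2-d embeddings: the description points along x, one facet along y, one along x.
const identity = identityVector([1, 0], [[0, 1], [1, 0]]);
```

With half the weight on one input, a vague description moves the whole vector, and no combination of the code-derived facets can fully pull it back.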