Audits

Audits surface specific, actionable issues in the codebase's structural organization. Where scoring produces aggregate health numbers, audits produce named findings — "this function is an outlier", "these two types are near-duplicates", "this directory's vocabulary bleeds into its sibling's domain."

Score and Audit are two lenses on the same vector space, deliberately different so that each cross-validates the other. Fixing an audit finding should improve the scores. If a fix lowers a score, it introduced a new issue, and the chain of fixes isn't complete yet.

1. How Audits Work

Most audits apply statistical fencing to detect values that deviate from the expected distribution. Two fencing strategies handle different data characteristics. The size audits use a dendrogram gap test instead, and incomplete-docs is a plain rule check.

Standard fencing (Z-score)

upperFence = mean + sensitivity × σ
lowerFence = mean - sensitivity × σ

Used when the sample is large enough for parametric assumptions to hold. Used by the merge-candidates, containments, misplaced, incoherent, and weak-identity audits.
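A minimal TypeScript sketch of the standard fences follows. It is illustrative rather than Kural's own code: the function names are hypothetical, and σ is assumed to be the population standard deviation.

```typescript
// Standard (Z-score) fencing: a minimal sketch, not Kural's implementation.
// Assumption: σ is the population standard deviation.
interface Fences {
  upper: number;
  lower: number;
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function stdDev(values: number[]): number {
  const m = mean(values);
  return Math.sqrt(values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length);
}

function zFences(values: number[], sensitivity = 2.0): Fences {
  const m = mean(values);
  const sigma = stdDev(values);
  return { upper: m + sensitivity * sigma, lower: m - sensitivity * sigma };
}

// Hypothetical sibling similarities: one pair stands far above the rest.
const similarities = [0.41, 0.45, 0.38, 0.44, 0.4, 0.43, 0.39, 0.42, 0.97];
const fences = zFences(similarities);
const aboveUpper = similarities.filter((s) => s > fences.upper); // [0.97]
```

Here the upper fence lands near 0.83, so only 0.97 is flagged.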

Robust fencing (MAD-based)

spread            = max(MAD, IQR/2)
robustUpperFence  = median + sensitivity × 1.4826 × spread
robustLowerFence  = median - sensitivity × 1.4826 × spread

MAD (Median Absolute Deviation) resists outliers better than standard deviation. The constant 1.4826 makes MAD consistent with σ for normally distributed data.

The spread estimate is max(MAD, IQR/2). Under normality the two are equal (IQR ≈ 2·MAD), so on well-behaved data the max returns MAD. When the center concentrates while the tails persist (typical of tightened embedding distributions), MAD shrinks faster than IQR/2, and the IQR-derived value floors the spread, preventing fence collapse.

When both MAD and IQR are zero (for example, when all values are identical), the fence returns its sentinel: no outliers exist.

Used for outlier detection and vocabulary bleed.
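The sketch below shows the robust fence with the max(MAD, IQR/2) floor. It assumes linearly interpolated quantiles and models the zero-variation sentinel as `null`. Both choices, and all names, are illustrative rather than taken from Kural.

```typescript
// Robust (MAD-based) fencing with the max(MAD, IQR/2) spread floor.
// Assumptions: quantiles use linear interpolation between order statistics,
// and the zero-variation sentinel is modelled as `null` (hypothetical).
const MAD_TO_SIGMA = 1.4826;

function quantile(values: number[], q: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

interface RobustFences {
  upper: number;
  lower: number;
}

function robustFences(values: number[], sensitivity = 2.0): RobustFences | null {
  const median = quantile(values, 0.5);
  const mad = quantile(values.map((v) => Math.abs(v - median)), 0.5);
  const iqr = quantile(values, 0.75) - quantile(values, 0.25);
  const spread = Math.max(mad, iqr / 2);
  if (spread === 0) return null; // no variation: no outliers exist
  const width = sensitivity * MAD_TO_SIGMA * spread;
  return { upper: median + width, lower: median - width };
}

// A tight center with a persistent lower tail: MAD ≈ 0.01 but IQR/2 ≈ 0.1,
// so the IQR-derived value sets the spread and the fence does not collapse.
const scores = [0.3, 0.4, 0.5, 0.69, 0.7, 0.7, 0.7, 0.71, 0.72];
const robust = robustFences(scores);
const flat = robustFences([0.5, 0.5, 0.5, 0.5]); // null
```

With MAD alone, the lower fence would sit near 0.670 and also flag 0.5. With the IQR/2 floor, it sits near 0.403 and flags only the two deepest tail values, 0.3 and 0.4.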

Dendrogram gap

hasSignificantGap = maxGap > medianGap × (1 + sensitivity)

Hierarchical clustering of a container's children produces a dendrogram. A largest gap that stands far above the median gap marks a natural split point. Used by bloated-directories and bloated-files to identify where a container should be divided.
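A sketch of the gap test follows. The text does not say how gaps are measured, so this assumes they are the differences between consecutive merge heights of the children's dendrogram. The names are hypothetical.

```typescript
// Dendrogram gap test: a minimal sketch. Assumption: the gaps are the
// differences between consecutive merge heights of an agglomerative
// clustering of the container's children (n children give n - 1 merges).
function medianOf(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

interface GapResult {
  hasSignificantGap: boolean;
  splitHeight: number; // cutting the tree here separates the clusters
}

function dendrogramGap(mergeHeights: number[], sensitivity = 2.0): GapResult {
  const heights = [...mergeHeights].sort((a, b) => a - b);
  if (heights.length < 2) return { hasSignificantGap: false, splitHeight: NaN };
  const gaps = heights.slice(1).map((h, i) => h - heights[i]);
  let maxIdx = 0;
  for (let i = 1; i < gaps.length; i++) if (gaps[i] > gaps[maxIdx]) maxIdx = i;
  const maxGap = gaps[maxIdx];
  return {
    hasSignificantGap: maxGap > medianOf(gaps) * (1 + sensitivity),
    splitHeight: (heights[maxIdx] + heights[maxIdx + 1]) / 2,
  };
}

// Six children in two tight groups: five merges, with a jump before the last.
const twoGroups = dendrogramGap([0.1, 0.12, 0.14, 0.15, 0.8]);
// Evenly spaced merges: no gap stands out.
const even = dendrogramGap([0.1, 0.2, 0.3, 0.4, 0.5]);
```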

The sensitivity parameter (default 2.0) controls how many deviations from center constitute an anomaly. Higher values produce stricter thresholds and fewer findings.

2. The 14 Audits

Bloating & Size

bloated-directories — Detects directories whose children cluster into distinct groups via dendrogram gap analysis. Requires at least 3 children to attempt clustering, and each cluster must have at least 3 members to count as substantial. Suggests a natural split point for reorganization. In the Kural codebase, src/ carries @kuralResidual bloated-directories [070d22b3] because its breadth — spanning ingestion, scoring, audits, storage, commands, and UI — is the intended architecture.

bloated-files — Applies the same dendrogram logic to the functions and types within a file. A split that merely separates types from functions is treated as natural; a split between semantic clusters is actionable.

Outliers & Cohesion

outliers — Detects children whose mean cosine similarity to their siblings falls below a robust lower fence. The per-parent spread is floored by a codebase-derived baseline (Q1 of the pooled |value − group_median| across every sibling group), so a tight inlier band cannot collapse the fence onto itself when one sibling stands apart. Excludes util, helper, and @kuralBound inward nodes. Populates a shared outlierKeys set that downstream audits use to adjust their thresholds.
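The sketch below walks through the outliers computation on toy embeddings. It is a hypothetical reconstruction, not Kural's code. It assumes linearly interpolated quantiles and combines the baseline with the robust spread as max(MAD, IQR/2, baseline). It leaves out the util, helper, and @kuralBound exclusions as well as the outlierKeys hand-off.

```typescript
// The outliers audit: a minimal sketch over hypothetical sibling groups of
// embedding vectors. Assumptions: quantiles interpolate linearly, and the
// codebase baseline floors the spread via max(MAD, IQR/2, baseline).
// Exclusions (util, helper, @kuralBound inward) are omitted.
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function quantile(values: number[], q: number): number {
  const s = [...values].sort((x, y) => x - y);
  const pos = (s.length - 1) * q;
  const lo = Math.floor(pos);
  return s[lo] + (s[Math.ceil(pos)] - s[lo]) * (pos - lo);
}

// Mean cosine similarity of each child to all of its siblings.
function meanSiblingSims(children: Vec[]): number[] {
  return children.map((c, i) => {
    const others = children.filter((_, j) => j !== i);
    return others.reduce((sum, o) => sum + cosine(c, o), 0) / others.length;
  });
}

function absDeviations(values: number[]): number[] {
  const med = quantile(values, 0.5);
  return values.map((v) => Math.abs(v - med));
}

// Returns [groupIndex, childIndex] for each flagged child.
function findOutliers(groups: Vec[][], sensitivity = 2.0): Array<[number, number]> {
  const sims = groups.map(meanSiblingSims);
  // Codebase baseline: Q1 of |value - group median|, pooled over every group.
  const pooled: number[] = [];
  for (const g of sims) pooled.push(...absDeviations(g));
  const baseline = quantile(pooled, 0.25);

  const flagged: Array<[number, number]> = [];
  sims.forEach((g, gi) => {
    const med = quantile(g, 0.5);
    const mad = quantile(absDeviations(g), 0.5);
    const halfIqr = (quantile(g, 0.75) - quantile(g, 0.25)) / 2;
    const spread = Math.max(mad, halfIqr, baseline);
    const lowerFence = med - sensitivity * 1.4826 * spread;
    g.forEach((v, ci) => {
      if (v < lowerFence) flagged.push([gi, ci]);
    });
  });
  return flagged;
}

// Group 0: three near-parallel children and one orthogonal child.
// Group 1: three mutually orthogonal children (all equally dissimilar).
const groups: Vec[][] = [
  [[1, 0], [1, 0.1], [1, -0.1], [0, 1]],
  [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
];
const flaggedChildren = findOutliers(groups); // [[0, 3]]
```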

merge-candidates — Detects sibling pairs whose similarity exceeds the upper fence: near-duplicates within the same parent. File-level pairs and leaf-level pairs get separate fences, computed from file sibling similarities and leaf sibling similarities respectively. Filters out caller-callee pairs and @kuralBound inward nodes.
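A minimal sketch of merge-candidate filtering for a single population follows; it would run once for file-level pairs and once for leaf-level pairs. It assumes a Z-score upper fence over that population's pair similarities and represents caller-callee edges as a hypothetical set of `caller->callee` strings. The @kuralBound inward filter is omitted.

```typescript
// Merge-candidate detection for one population (file-level or leaf-level),
// as a minimal sketch. Assumptions: the fence is mean + sensitivity * σ over
// that population's sibling-pair similarities, and call edges arrive as a
// hypothetical set of "caller->callee" strings.
interface SiblingPair {
  a: string;
  b: string;
  similarity: number;
}

function upperFence(values: number[], sensitivity: number): number {
  const m = values.reduce((s, v) => s + v, 0) / values.length;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - m) ** 2, 0) / values.length);
  return m + sensitivity * sd;
}

function mergeCandidates(
  pairs: SiblingPair[],
  callEdges: Set<string>,
  sensitivity = 2.0,
): SiblingPair[] {
  const fence = upperFence(pairs.map((p) => p.similarity), sensitivity);
  const calls = (x: string, y: string) =>
    callEdges.has(`${x}->${y}`) || callEdges.has(`${y}->${x}`);
  // Keep near-duplicates, but drop pairs where one sibling calls the other.
  return pairs.filter((p) => p.similarity > fence && !calls(p.a, p.b));
}

// Ten ordinary sibling pairs plus two near-duplicates (all names hypothetical).
const baseSims = [0.4, 0.42, 0.38, 0.41, 0.39, 0.4, 0.43, 0.37, 0.41, 0.39];
const pairs: SiblingPair[] = baseSims.map((similarity, i) => ({
  a: `fn${i}`,
  b: `fn${i + 1}`,
  similarity,
}));
pairs.push({ a: "parseTagA", b: "parseTagB", similarity: 0.96 });
pairs.push({ a: "loadConfig", b: "readConfigFile", similarity: 0.95 });

// loadConfig calls readConfigFile, so only the parseTag pair survives.
const candidates = mergeCandidates(pairs, new Set(["loadConfig->readConfigFile"]));
```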

Containment & Hierarchy

containments — Detects parents where one child's dominance gap is an upper outlier, meaning the parent is essentially a wrapper around a single child. In src/ui/hero.ts, renderHero dominates its parent, but it is declared @kuralBound outward, so the containment finding on the parent is suppressed.

misplaced — Detects nodes that fit better under a different parent. Uses the bestUncle metric: if a node's similarity to an uncle exceeds its similarity to its own parent by a statistically significant delta, the node may belong elsewhere. Applies a more lenient fence for nodes already identified as outliers. In the Kural codebase, src/commands/audit/ carries @kuralResidual misplaced [ab944cdf] — audit command logic must live near commands, even though it's semantically close to src/audits/.
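The bestUncle comparison can be sketched as follows. The text does not say how much more lenient the outlier fence is, so this sketch assumes a Z-score fence over all deltas and a hypothetical `outlierSensitivity` of 1.0 for nodes in `outlierKeys`. All names and data are illustrative.

```typescript
// The bestUncle test behind misplaced, as a minimal sketch. Assumptions: the
// delta fence is a Z-score upper fence over all nodes' deltas, and the more
// lenient fence for known outliers is a hypothetical lower sensitivity.
interface NodeFit {
  key: string;
  parentSim: number;
  uncleSims: Record<string, number>; // similarity to each of the parent's siblings
}

interface Misplacement {
  key: string;
  uncle: string;
  delta: number;
}

// The uncle with the largest (uncle similarity - parent similarity).
function bestUncle(node: NodeFit): { uncle: string; delta: number } | null {
  let best: { uncle: string; delta: number } | null = null;
  for (const uncle of Object.keys(node.uncleSims)) {
    const delta = node.uncleSims[uncle] - node.parentSim;
    if (best === null || delta > best.delta) best = { uncle, delta };
  }
  return best;
}

function zUpper(values: number[], sensitivity: number): number {
  const m = values.reduce((s, v) => s + v, 0) / values.length;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - m) ** 2, 0) / values.length);
  return m + sensitivity * sd;
}

function findMisplaced(
  nodes: NodeFit[],
  outlierKeys: Set<string>,
  sensitivity = 2.0,
  outlierSensitivity = 1.0, // hypothetical "more lenient" setting
): Misplacement[] {
  const scored: Misplacement[] = [];
  for (const node of nodes) {
    const best = bestUncle(node);
    if (best !== null) scored.push({ key: node.key, uncle: best.uncle, delta: best.delta });
  }
  const deltas = scored.map((s) => s.delta);
  const strict = zUpper(deltas, sensitivity);
  const lenient = zUpper(deltas, outlierSensitivity);
  return scored.filter((s) => s.delta > (outlierKeys.has(s.key) ? lenient : strict));
}

// Hypothetical nodes: parent fit 0.5, one uncle at 0.5 + delta, another far away.
const exampleDeltas = [-0.2, -0.15, -0.25, -0.1, -0.2, -0.18, 0.3, 0.12];
const nodes: NodeFit[] = exampleDeltas.map((d, i) => ({
  key: `node${i}`,
  parentSim: 0.5,
  uncleSims: { nearbyDir: 0.5 + d, distantDir: 0.1 },
}));
const strictOnly = findMisplaced(nodes, new Set<string>());
const withOutlier = findMisplaced(nodes, new Set(["node7"]));
```

node7's delta of 0.12 clears only the lenient fence (about 0.10), not the strict one (about 0.28). So it is suggested only once the outliers audit has already marked it.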

Cross-Module Duplicates

duplicates — Detects semantically identical units separated by module boundaries. Covers cross-file leaves (functions/types in different files), cross-directory files, and cross-population pairs (util vs domain). Filters out pairs sharing the same @kuralPatterns group, caller-callee pairs, and companion groups.

util-duplicates — Detects util-scoped units in separate files whose embeddings exceed the merge fence. Covers the util-to-util gap that the main duplicates audit doesn't reach.

Vocabulary & Identity

focal-drift — Applies only to @kuralBound outward units. Detects when the declared focal node is no longer the most similar child to its parent — the file's purpose has shifted. Either the tag is stale or a new function has grown to overtake the declared focal. Reports the overtaking child's name and both similarity values.
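A minimal sketch of the focal-drift check follows, assuming each child's similarity to its parent is already computed. renderHero comes from the text. The other child names, the similarity values, and all function and field names are hypothetical.

```typescript
// focal-drift as a minimal sketch, assuming each child's similarity to its
// parent file is precomputed. Field and function names are hypothetical.
interface ChildFit {
  name: string;
  parentSim: number;
}

interface FocalDrift {
  declared: string;
  overtakenBy: string;
  declaredSim: number;
  overtakingSim: number;
}

function focalDrift(children: ChildFit[], declaredFocal: string): FocalDrift | null {
  const declared = children.find((c) => c.name === declaredFocal);
  if (declared === undefined) return null; // the tag names no existing child
  let top = declared;
  for (const c of children) if (c.parentSim > top.parentSim) top = c;
  if (top === declared) return null; // the declared focal is still the best fit
  return {
    declared: declared.name,
    overtakenBy: top.name,
    declaredSim: declared.parentSim,
    overtakingSim: top.parentSim,
  };
}

// renderHero is the declared focal; the sibling names are hypothetical.
const stable = focalDrift(
  [{ name: "renderHero", parentSim: 0.82 }, { name: "formatTagline", parentSim: 0.64 }],
  "renderHero",
); // null
const drifted = focalDrift(
  [{ name: "renderHero", parentSim: 0.71 }, { name: "buildHeroLayout", parentSim: 0.79 }],
  "renderHero",
);
```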

vocabulary-bleed — Detects directories whose identity embedding drifts closer to a non-sibling module than to their weakest sibling. In embedding space, this means the directory's description borrows vocabulary from outside its domain. Reports the top 3 closest non-siblings pulling the identity away. Requires at least 2 other candidates for the statistical test.
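The core comparison can be sketched as below. It assumes precomputed identity similarities and omits the audit's statistical test over the candidates. It shows only which non-siblings out-pull the weakest sibling, plus the top-3 cut. Module names are hypothetical.

```typescript
// The core comparison behind vocabulary-bleed, as a minimal sketch. It
// assumes identity similarities are precomputed and does not reproduce the
// audit's statistical test. All module names are hypothetical.
interface ModuleSim {
  name: string;
  sim: number;
}

function bleedSources(siblings: ModuleSim[], nonSiblings: ModuleSim[]): ModuleSim[] {
  if (siblings.length === 0) return [];
  const weakestSibling = Math.min(...siblings.map((s) => s.sim));
  return nonSiblings
    .filter((m) => m.sim > weakestSibling) // pulls harder than the weakest sibling
    .sort((a, b) => b.sim - a.sim)
    .slice(0, 3); // report the top 3 pulls
}

const pulls = bleedSources(
  [{ name: "siblingA", sim: 0.62 }, { name: "siblingB", sim: 0.48 }],
  [
    { name: "remoteW", sim: 0.31 },
    { name: "remoteX", sim: 0.71 },
    { name: "remoteY", sim: 0.55 },
    { name: "remoteZ", sim: 0.5 },
    { name: "remoteV", sim: 0.49 },
  ],
);
```

Four non-siblings beat the weakest sibling (0.48), but only the three strongest are reported.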

Documentation & Coherence

incoherent — Detects non-util containers whose identity-to-content similarity is a lower outlier: the name and description say one thing, the actual content another. The fence is capped at the median label-fit to avoid flagging near-perfect alignment as an issue. Pattern nodes (synthetic centroids materialised from @kuralPatterns-tagged code) are skipped. Their names are intentionally abstract category labels, so identity-content similarity is meaningless by construction.
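A minimal sketch of the incoherent fence follows. It assumes the Z-score lower fence from standard fencing, a cap of min(fence, median), and precomputed label fits for non-util containers. Paths and names are hypothetical.

```typescript
// incoherent as a minimal sketch: a Z-score lower fence on label fit
// (identity-to-content similarity), capped at the median, with pattern
// nodes skipped. Assumes precomputed fits; names are hypothetical.
interface ContainerFit {
  path: string;
  labelFit: number;
  isPattern: boolean;
}

function incoherentContainers(containers: ContainerFit[], sensitivity = 2.0): string[] {
  const pool = containers.filter((c) => !c.isPattern);
  const fits = pool.map((c) => c.labelFit);
  const m = fits.reduce((s, v) => s + v, 0) / fits.length;
  const sd = Math.sqrt(fits.reduce((s, v) => s + (v - m) ** 2, 0) / fits.length);
  const sorted = [...fits].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const fence = Math.min(m - sensitivity * sd, median); // never above the median
  return pool.filter((c) => c.labelFit < fence).map((c) => c.path);
}

const sampleFits = [0.72, 0.7, 0.75, 0.68, 0.71, 0.73, 0.69, 0.74, 0.2];
const containers: ContainerFit[] = sampleFits.map((labelFit, i) => ({
  path: `dir${i}`,
  labelFit,
  isPattern: false,
}));
// A pattern node scores poorly by construction and is skipped.
containers.push({ path: "pattern:validators", labelFit: 0.05, isPattern: true });
const flaggedDirs = incoherentContainers(containers); // ["dir8"]
```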

incoherent-utils — Same logic for util containers, scored within their own population.

weak-identity — Detects containers where more than half the children fit a sibling module better than their own parent. Uses the drift ratio — the fraction of children whose uncle fit exceeds parent fit — and flags containers where this ratio is a Z-score upper outlier. Reports the strongest competing uncle and the parent-vs-uncle fit comparison. Requires at least 2 eligible children for the statistical test.
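A sketch of the drift ratio and its fence follows. The text names both a majority condition and a Z-score test. This sketch assumes both must hold and computes the fence over all eligible containers. It omits the strongest-uncle report, and all names are hypothetical.

```typescript
// weak-identity as a minimal sketch. Assumptions: parent fit and best uncle
// fit are precomputed per child, the Z-score fence spans all eligible
// containers, and "more than half" is applied as an extra check.
interface ChildFits {
  parentFit: number;
  uncleFit: number;
}

interface ContainerChildren {
  path: string;
  children: ChildFits[];
}

// Fraction of children whose uncle fit exceeds their parent fit.
function driftRatio(children: ChildFits[]): number {
  return children.filter((c) => c.uncleFit > c.parentFit).length / children.length;
}

function weakIdentity(containers: ContainerChildren[], sensitivity = 2.0): string[] {
  const eligible = containers.filter((c) => c.children.length >= 2);
  const ratios = eligible.map((c) => driftRatio(c.children));
  const m = ratios.reduce((s, v) => s + v, 0) / ratios.length;
  const sd = Math.sqrt(ratios.reduce((s, v) => s + (v - m) ** 2, 0) / ratios.length);
  const fence = m + sensitivity * sd;
  return eligible.filter((_, i) => ratios[i] > 0.5 && ratios[i] > fence).map((c) => c.path);
}

// Hypothetical container where `drifting` of `total` children fit an uncle better.
function makeContainer(path: string, drifting: number, total = 4): ContainerChildren {
  const children: ChildFits[] = [];
  for (let i = 0; i < total; i++) {
    children.push(i < drifting ? { parentFit: 0.4, uncleFit: 0.6 } : { parentFit: 0.7, uncleFit: 0.5 });
  }
  return { path, children };
}

const dirs = [0, 0, 1, 0, 0, 1, 0, 3].map((d, i) => makeContainer(`dir${i}`, d));
// A single-child container is ineligible and does not enter the statistics.
dirs.push({ path: "tiny", children: [{ parentFit: 0.1, uncleFit: 0.9 }] });
const weak = weakIdentity(dirs); // ["dir7"]
```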

incomplete-docs — Detects units missing required documentation. Functions need a description, @param and @returns tags, and either @kuralPure or @kuralCauses. Types need a description. Files need a file-level JSDoc comment. Directories need a KURAL.md with a description.
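These rules reduce to a checklist. The sketch assumes a hypothetical pre-parsed representation of each unit's documentation. It is not Kural's parser, and it does not special-case functions without parameters.

```typescript
// incomplete-docs as a minimal sketch over a hypothetical pre-parsed shape.
// For directories, `description` stands for the KURAL.md description; for
// files, the file-level JSDoc. Zero-parameter functions are not special-cased.
type UnitKind = "function" | "type" | "file" | "directory";

interface UnitDocs {
  kind: UnitKind;
  description?: string;
  tags: string[]; // tag names found in the doc comment, e.g. "@param"
}

function missingDocs(unit: UnitDocs): string[] {
  const missing: string[] = [];
  if (!unit.description || unit.description.trim() === "") missing.push("description");
  if (unit.kind === "function") {
    const has = (tag: string) => unit.tags.indexOf(tag) !== -1;
    if (!has("@param")) missing.push("@param");
    if (!has("@returns")) missing.push("@returns");
    if (!has("@kuralPure") && !has("@kuralCauses")) missing.push("@kuralPure or @kuralCauses");
  }
  return missing;
}

const complete = missingDocs({
  kind: "function",
  description: "Parses a JSDoc block.",
  tags: ["@param", "@returns", "@kuralPure"],
}); // []
const sparse = missingDocs({ kind: "function", description: "", tags: ["@param"] });
const bareDir = missingDocs({ kind: "directory", tags: [] }); // ["description"]
```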

3. The Feedback Loop

Audits and scores cross-validate:

Score reveals regions — a low childrenUniqueness tells you siblings are clumped

Audit reveals specifics — merge-candidates names the exact pair

Fix the finding — move, rename, merge, or split as the audit suggests

Re-run — if the score improves, the fix helped

Chain fixes — if the score drops, check which new audit finding appeared. The chain continues until no new findings are introduced and the scores improve.

A single fix that drops the score isn't a failure; it signals that the fix introduced another issue. A complete chain of fixes ends with no new findings and improved scores.

4. Suppression

@kuralResidual

When an audit finding is architecturally intentional, @kuralResidual suppresses it:

@kuralResidual <audit-name> [<hash>]

The hash ties the suppression to the current code structure. If the code changes, the hash goes stale and the suppression stops applying, forcing re-evaluation.
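A minimal sketch of hash-bound suppression follows. It assumes kebab-case audit names and hex hashes, as in the examples. It takes the current structural hash as input because the text does not say how that hash is computed. The second hash below is made up to show a stale suppression.

```typescript
// @kuralResidual handling as a minimal sketch. Assumptions: audit names are
// lowercase kebab-case, hashes are hex as in the examples, and the current
// structural hash is supplied by the caller.
interface Residual {
  audit: string;
  hash: string;
}

const RESIDUAL_PATTERN = /@kuralResidual\s+([a-z-]+)\s+\[([0-9a-f]+)\]/;

function parseResidual(line: string): Residual | null {
  const match = RESIDUAL_PATTERN.exec(line);
  return match ? { audit: match[1], hash: match[2] } : null;
}

// A finding is suppressed only while a residual names its audit and its hash
// still matches the current structure; a stale hash stops suppressing it.
function isSuppressed(audit: string, currentHash: string, residuals: Residual[]): boolean {
  return residuals.some((r) => r.audit === audit && r.hash === currentHash);
}

const residual = parseResidual("@kuralResidual bloated-directories [070d22b3]");
const residuals: Residual[] = residual ? [residual] : [];
const stillSuppressed = isSuppressed("bloated-directories", "070d22b3", residuals); // true
const stale = isSuppressed("bloated-directories", "9c41e2aa", residuals); // false
```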

Examples from the Kural codebase:

Location	Suppressed audit	Reason
`src/KURAL.md`	`bloated-directories [070d22b3]`	Root intentionally spans the full lifecycle
`src/config/KURAL.md`	`containments [576a50e8]`	Config intentionally contains all settings
`src/ingestion/parse/jsdoc.ts`	`containments [a1e18864]`	JSDoc parser intentionally dominates its file
`src/commands/audit/KURAL.md`	`misplaced [ab944cdf]`	Audit command must live near commands, not audits

Implicit suppression

@kuralBound and @kuralHelper suppress specific audits automatically — see Kural Params for the full suppression matrix.

5. Audit Execution Order

Audits run in a defined sequence. The outliers audit runs early and populates a shared outlierKeys set in the audit context, which downstream audits read. The misplaced audit, for example, applies a more lenient fence to nodes already identified as outliers: a node that is already an outlier needs less additional evidence before a misplacement suggestion is warranted.
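The ordering dependency can be illustrated with stub audits that share one context. Everything here except the outlierKeys hand-off is hypothetical, and the stubs do no real analysis.

```typescript
// Ordered audit execution with a shared context, as a minimal sketch. The
// Audit interface, context shape, and stub audits are hypothetical; only the
// outlierKeys hand-off mirrors the text.
interface AuditContext {
  outlierKeys: Set<string>;
  findings: string[];
}

interface Audit {
  name: string;
  run(ctx: AuditContext): void;
}

function runAudits(audits: Audit[]): AuditContext {
  const ctx: AuditContext = { outlierKeys: new Set<string>(), findings: [] };
  for (const audit of audits) audit.run(ctx); // later audits see earlier state
  return ctx;
}

const NODE = "src/example/node.ts"; // hypothetical node key

const outliersStub: Audit = {
  name: "outliers",
  run(ctx) {
    ctx.outlierKeys.add(NODE);
  },
};

const misplacedStub: Audit = {
  name: "misplaced",
  run(ctx) {
    const fence = ctx.outlierKeys.has(NODE) ? "lenient" : "strict";
    ctx.findings.push(`misplaced checked ${NODE} with the ${fence} fence`);
  },
};

const inOrder = runAudits([outliersStub, misplacedStub]);
const reversed = runAudits([misplacedStub, outliersStub]);
```

Running misplaced before outliers silently falls back to the strict fence, which illustrates why the sequence matters.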

6. Configuration

{
  sensitivity: 2.0,       // Standard deviations from center (higher = stricter)
  disable: []             // Audit names to skip entirely
}

Sensitivity is the only tuning knob; disable simply skips audits. At 2.0, findings are moderate: most real issues surface without excessive noise. At 3.0, only strong outliers are flagged. Below 1.5, expect many findings. See Tuning for how every threshold in the system resolves to sensitivity, data, or design intent.
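For illustration, the configuration can be typed and defaulted as below. A helper counts values beyond a symmetric Z-score fence at several sensitivities. AuditConfig, resolveConfig, and countFlagged are hypothetical names, and real audits apply their own fences.

```typescript
// The documented configuration as a typed object, plus a helper showing how
// sensitivity trades findings for strictness. All names are hypothetical,
// and the symmetric Z-score fence stands in for each audit's own fence.
interface AuditConfig {
  sensitivity: number; // standard deviations from center (higher = stricter)
  disable: string[]; // audit names to skip entirely
}

const DEFAULT_CONFIG: AuditConfig = { sensitivity: 2.0, disable: [] };

function resolveConfig(user: Partial<AuditConfig> = {}): AuditConfig {
  return {
    sensitivity: user.sensitivity ?? DEFAULT_CONFIG.sensitivity,
    disable: user.disable ?? DEFAULT_CONFIG.disable,
  };
}

function countFlagged(values: number[], sensitivity: number): number {
  const m = values.reduce((s, v) => s + v, 0) / values.length;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - m) ** 2, 0) / values.length);
  return values.filter((v) => Math.abs(v - m) > sensitivity * sd).length;
}

const config = resolveConfig({ disable: ["incomplete-docs"] });
const sample = [0.1, 0.2, 0.25, 0.3, 0.3, 0.35, 0.4, 0.45, 0.6, 0.9];
const counts = [1.5, config.sensitivity, 3.0].map((s) => countFlagged(sample, s));
```

On this sample the counts are 1, 1, and 0: raising sensitivity can only shrink the set of findings.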
