KURAL
Codebase realities

Pattern Nodes

One concept, one vote — deduplication via invisible folders

Units tagged with the same @kuralPatterns ID are structural repetitions of a single concept. The scoring, auditing, embedding, and placement systems all need to treat them as one entity — but today each system reimplements this reduction independently via deduplicateByGroup. Pattern nodes unify this by materializing the group as a real node in the in-memory tree.


1. The Problem

A file with 3 fitMetric-tagged functions and 2 ungrouped functions has 5 children. Three of those children are near-identical by design. This causes:

| System | Distortion |
| --- | --- |
| Embedding | File's leaf vector is 60% "fit computation", skewed by repetition count |
| Scoring | Requires deduplicateByGroup centroid collapse before every uniqueness computation |
| Auditing | Every audit must filter pattern pairs before comparing siblings |
| Placement | Near-identical siblings cannibalize each other's softmax probability |

All four systems solve the same problem in isolation. The tree should solve it once.


2. Pattern = Invisible Folder

A pattern group is an unnamed container nested inside a file. Its members are structural instances of one concept, just as files inside a directory are structural members of one module.

metrics.ts                       <- visible file (5 children today)
  [fitMetric]/                   <- pattern node (invisible folder)
    computeFit
    computeChildrenFit
  [uniquenessMetric]/            <- pattern node (invisible folder)
    computeUniqueness
    computeChildrenUniqueness
  findBestUncle                  <- ungrouped, direct child of file

After materialization, the file has 3 children: two pattern nodes and one function. Each child contributes one concept.


3. Materialization

Pattern nodes are synthesized in buildTree, after all DB-sourced nodes are wired. They exist only in the in-memory NodeMap — no database record is created.

Algorithm

materializePatterns(nodes):
  for each file node:
    group leaf children by patterns field
    for each group with 2+ members:
      create PatternNode:
        key:       pattern:{filePath}:{patternId}
        kind:      "pattern"
        name:      patternId
        identity:  centroid(members.map(m => m.identity))
        leaf:      centroid(members.map(m => m.leaf))
        childKeys: [member keys]
        parentKey: file key
      reparent members: set member.parentKey = pattern key
      replace members in file.childKeys with pattern key
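
The pseudocode above could be sketched in TypeScript roughly as follows. This is a minimal illustration, not the project's actual buildTree code: SketchNode is a hypothetical, trimmed-down node shape, centroid is assumed to be an element-wise mean, and the file's key is assumed to stand in for {filePath}.

```typescript
// Hypothetical, trimmed-down node shape; the real BaseNode likely has more fields.
type Vec = number[];
type SketchNode = {
  key: string;
  kind: "file" | "function" | "pattern";
  name: string;
  identity: Vec;
  leaf: Vec;
  parentKey: string | null;
  childKeys: string[];
  patterns: string | null; // pattern ID from @kuralPatterns, if any
};

// Assumption: the centroid is the element-wise mean of equal-length vectors.
function centroid(vectors: Vec[]): Vec {
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) for (let i = 0; i < dim; i++) sum[i] += v[i];
  return sum.map((x) => x / vectors.length);
}

function materializePatterns(nodes: Map<string, SketchNode>): void {
  // Snapshot the files first so that adding pattern nodes does not disturb iteration.
  const files = Array.from(nodes.values()).filter((n) => n.kind === "file");
  for (const file of files) {
    // Group leaf children by their patterns field.
    const groups = new Map<string, SketchNode[]>();
    for (const childKey of file.childKeys) {
      const child = nodes.get(childKey);
      if (!child || child.patterns === null || child.childKeys.length > 0) continue;
      const members = groups.get(child.patterns) ?? [];
      members.push(child);
      groups.set(child.patterns, members);
    }
    groups.forEach((members, patternId) => {
      if (members.length < 2) return; // a single member stays a direct child
      const patternKey = `pattern:${file.key}:${patternId}`; // file.key used as filePath
      nodes.set(patternKey, {
        key: patternKey,
        kind: "pattern",
        name: patternId,
        identity: centroid(members.map((m) => m.identity)),
        leaf: centroid(members.map((m) => m.leaf)),
        parentKey: file.key,
        childKeys: members.map((m) => m.key),
        patterns: null,
      });
      // Reparent members, then put the pattern node where the first member was.
      const memberKeys = new Set(members.map((m) => m.key));
      for (const m of members) m.parentKey = patternKey;
      const firstIndex = file.childKeys.findIndex((k) => memberKeys.has(k));
      file.childKeys = file.childKeys.filter((k) => !memberKeys.has(k));
      file.childKeys.splice(firstIndex, 0, patternKey);
    });
  }
}
```

Run against the metrics.ts example from section 2, this sketch leaves the file with three children: the two pattern nodes and findBestUncle. Placing the pattern node at the first member's position is a choice made here for stable ordering; the source does not specify one.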

Node Type

type PatternNode = BaseNode & {
  kind: "pattern";
  childKeys: string[];
};

Added to the CodeNode union. isLeaf returns false for pattern nodes.


4. Impact by System

Embedding

Before: blendFiles() computes mean(allChildLeaves) — 3 fitMetric leaves inflate the file's leaf toward "fit computation."

After: blendFiles() computes mean(patternLeaves + ungroupedLeaves) — the fitMetric pattern node contributes one leaf vector. Three concepts, equal weight, 33% each.

The embedding pipeline does not need the tree. At the blendFiles step, children are grouped by their patterns field. Per-group centroids replace individual members in the mean computation. The formula becomes:

file.leaf = 0.5 × file.identity + 0.5 × mean(groupCentroids + ungroupedLeaves)

Directory leaf computation is unaffected — directories aggregate file/subdirectory leaves, which are already corrected.
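
As an illustration of the revised file-leaf formula, the sketch below groups children by their patterns field and averages one vector per concept. The names blendFileLeaf and LeafChild are hypothetical, the centroid is assumed to be an element-wise mean, and the fallback for a file with no children is an assumption made only so the function is total.

```typescript
type Vec = number[];
// Hypothetical child shape: only the fields the formula needs.
type LeafChild = { leaf: Vec; patterns: string | null };

// Element-wise mean of equal-length vectors.
function mean(vectors: Vec[]): Vec {
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) for (let i = 0; i < dim; i++) sum[i] += v[i];
  return sum.map((x) => x / vectors.length);
}

function blendFileLeaf(identity: Vec, children: LeafChild[]): Vec {
  const groups = new Map<string, Vec[]>();
  const concepts: Vec[] = []; // one vector per concept
  for (const child of children) {
    if (child.patterns === null) {
      concepts.push(child.leaf); // ungrouped leaf counts once
      continue;
    }
    const leaves = groups.get(child.patterns) ?? [];
    leaves.push(child.leaf);
    groups.set(child.patterns, leaves);
  }
  // Each pattern group contributes a single centroid, however many members it has.
  groups.forEach((leaves) => concepts.push(mean(leaves)));
  if (concepts.length === 0) return identity.slice(); // assumption: no children, keep identity
  const childMean = mean(concepts);
  return identity.map((x, i) => 0.5 * x + 0.5 * childMean[i]);
}

// Three fitMetric leaves and two ungrouped leaves: three concepts, one third each.
const exampleLeaf = blendFileLeaf([0, 0], [
  { leaf: [1, 0], patterns: "fitMetric" },
  { leaf: [1, 0], patterns: "fitMetric" },
  { leaf: [1, 0], patterns: "fitMetric" },
  { leaf: [0, 1], patterns: null },
  { leaf: [0, 1], patterns: null },
]);
```

In this toy input the children term weights "fit computation" at one third; a plain mean over all five leaves would weight it at three fifths.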

Scoring

deduplicateByGroup in metrics.ts is no longer needed. A file's getEligibleChildren returns pattern nodes and ungrouped leaves. Pattern nodes participate in uniqueness computation like any container — their identity (the centroid) is the representative vector. getEligibleChildren also implements the util sandbox: util parents include all children, domain parents exclude util children.

Fit for pattern members measures cosineSimilarity(patternNode.identity, member.leaf) — "does this member match the pattern's centroid?" A mis-tagged member scores low, which is correct. Util containers now get fit and childrenFit when their parent is also util — within the util tree, these measurements are meaningful.
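
To make the member-fit measurement concrete, here is a small sketch of cosine similarity between a pattern node's identity and a member's leaf. The vectors are invented toy values, and patternMemberFit is a hypothetical helper name rather than a function from the codebase.

```typescript
type Vec = number[];

// Standard cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: fit of one member against its pattern node's centroid identity.
function patternMemberFit(patternIdentity: Vec, memberLeaf: Vec): number {
  return cosineSimilarity(patternIdentity, memberLeaf);
}

// Toy vectors: the first member points the same way as the centroid, the second does not.
const patternIdentity: Vec = [1, 0.1];
const wellTaggedFit = patternMemberFit(patternIdentity, [0.9, 0.1]); // close to 1
const misTaggedFit = patternMemberFit(patternIdentity, [0, 1]); // close to 0
```

A member whose leaf points away from the centroid scores near zero, which is the signal the paragraph above describes for a mis-tagged unit.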

Auditing

Members of a pattern are no longer siblings of the file's other children: they sit under their own pattern node, while ungrouped units stay under the file and members of other patterns sit under their own pattern nodes. Sibling pair collection skips pattern nodes as parents, since intra-pattern similarity is by design. Duplicate detection skips pairs sharing the same pattern ID, and also skips cross-pattern pairs within the same file (different pattern nodes sharing a file parent). Outliers and containments still audit pattern internals, because a pattern group could have a mis-tagged member or be dominated by one child.

deduplicateByGroup in audits/groups.ts is no longer used by scoring — the tree handles it. Bloated, outliers, and containments audits iterate getEligibleChildren, which returns pattern nodes directly.
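
The sibling-pair rule can be illustrated with a short sketch. It covers only the "skip pattern nodes as parents" step; the extra duplicate-detection filters are not modelled, and AuditNode and collectSiblingPairs are hypothetical names.

```typescript
// Hypothetical minimal node shape for the audit sketch.
type AuditNode = {
  key: string;
  kind: "directory" | "file" | "function" | "pattern";
  childKeys: string[];
};

// Collect every unordered pair of siblings, except pairs whose parent is a pattern node.
function collectSiblingPairs(nodes: Map<string, AuditNode>): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  nodes.forEach((parent) => {
    if (parent.kind === "pattern") return; // intra-pattern similarity is by design
    const kids = parent.childKeys;
    for (let i = 0; i < kids.length; i++) {
      for (let j = i + 1; j < kids.length; j++) {
        pairs.push([kids[i], kids[j]]);
      }
    }
  });
  return pairs;
}

// The metrics.ts example after materialization.
const auditNodes = new Map<string, AuditNode>();
const addAuditNode = (key: string, kind: AuditNode["kind"], childKeys: string[]): void => {
  auditNodes.set(key, { key, kind, childKeys });
};
addAuditNode("metrics.ts", "file", ["[fitMetric]", "[uniquenessMetric]", "findBestUncle"]);
addAuditNode("[fitMetric]", "pattern", ["computeFit", "computeChildrenFit"]);
addAuditNode("[uniquenessMetric]", "pattern", ["computeUniqueness", "computeChildrenUniqueness"]);
for (const leafKey of ["computeFit", "computeChildrenFit", "computeUniqueness", "computeChildrenUniqueness", "findBestUncle"]) {
  addAuditNode(leafKey, "function", []);
}
const siblingPairs = collectSiblingPairs(auditNodes);
```

For this tree the sketch yields three pairs, all among the file's direct children; computeFit and computeChildrenFit are never compared with each other.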

Placement

When routing a query down the tree, each child gets a softmax probability. A pattern node gets one probability instead of N near-identical members competing for it. Routing accuracy then depends on how distinct the concepts are, not on how many times one of them is repeated.
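
A toy calculation shows the effect on probability mass. The scores are invented and the softmax here is the textbook form with no temperature; how the router actually scores children is not specified in this document.

```typescript
// Numerically stable softmax over raw scores.
function softmax(scores: number[]): number[] {
  const maxScore = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((acc, e) => acc + e, 0);
  return exps.map((e) => e / total);
}

// Before: three fitMetric members and two ungrouped functions, all scoring equally.
// Each of the five children gets 0.2, so the single "fit" concept holds 0.6 in total.
const routingBefore = softmax([1, 1, 1, 1, 1]);

// After: one pattern node and two ungrouped functions, same scores.
// Each of the three concepts gets one third.
const routingAfter = softmax([1, 1, 1]);

// Total mass held by the repeated concept before and after materialization.
const fitMassBefore = routingBefore[0] + routingBefore[1] + routingBefore[2];
const fitMassAfter = routingAfter[0];
```

With equal scores, the repeated concept drops from three fifths of the total mass to one third, and no individual member is left competing with its near-identical siblings.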

Subtree Aggregation

Pattern nodes are containers, so collectSubtree traverses them. However, a pattern node's childrenFit is ~1.0 by construction (members are close to their own centroid). This would dilute subtree fit averages with uninformative perfect scores.

Fix: collectSubtree excludes the own scores of kind === "pattern" nodes from the aggregate. It still traverses them, so their children's scores are collected and propagated upward.
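
A minimal sketch of that traversal rule follows. The ScoredNode shape, the fit and childrenFit fields as nullable numbers, and the returned SubtreeScores structure are all assumptions for illustration; the real collectSubtree may gather different data.

```typescript
// Hypothetical node shape carrying two per-node scores.
type ScoredNode = {
  key: string;
  kind: "directory" | "file" | "function" | "pattern";
  childKeys: string[];
  fit: number | null;
  childrenFit: number | null; // null for leaves
};
type SubtreeScores = { fits: number[]; childrenFits: number[] };

function collectSubtree(nodes: Map<string, ScoredNode>, rootKey: string): SubtreeScores {
  const out: SubtreeScores = { fits: [], childrenFits: [] };
  const stack: string[] = [rootKey];
  while (stack.length > 0) {
    const key = stack.pop() as string;
    const node = nodes.get(key);
    if (node === undefined) continue;
    // Pattern nodes do not contribute their own scores...
    if (node.kind !== "pattern") {
      if (node.fit !== null) out.fits.push(node.fit);
      if (node.childrenFit !== null) out.childrenFits.push(node.childrenFit);
    }
    // ...but they are still traversed, so their members are collected.
    for (const childKey of node.childKeys) stack.push(childKey);
  }
  return out;
}
```

In a file with one pattern node (childrenFit near 1.0) and one ungrouped function, only the file's own childrenFit reaches the average, while the fit values of the pattern's members are still collected.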


5. What Pattern Nodes Do NOT Have

| Property | File/Directory | Pattern Node |
| --- | --- | --- |
| Description | Human-written | None |
| Database record | Persisted | In-memory only |
| Identity source | Embedded from facets | Centroid of member identities |
| Leaf source | Aggregated from children | Centroid of member leaves |

The patterns field on function/type rows is the source of truth. buildTree materializes the hierarchy every run.


6. Nesting (Deferred)

A unit with two @kuralPatterns tags would represent nested grouping — a sub-pattern within a pattern. No unit in the codebase carries multiple tags today. The patterns field remains string | null.

When needed, the materialization algorithm extends naturally: group by first tag, then sub-group within each group by second tag, centroid upward level by level. The patterns field would change to string[] | null.


7. Relationship to Real Folders

If someone creates a real fit/ directory and moves pattern members into it, the directory absorbs the pattern. Leftover members tagged fitMetric but still in the original file are misplaced — the misplaced audit flags them because they fit the new fit/ directory better than their current parent.

The pattern tag on the leftover is a breadcrumb naming exactly where it should go.

| State | Meaning |
| --- | --- |
| Pattern, no folder | Grouping exists in code, not yet in filesystem |
| Folder, no pattern | Grouping exists in filesystem, tag can be removed |
| Partial overlap | Misplacement: audit detects and suggests the move |
