CodeDNA Annotation Standard · v0.8 · MIT

The code that explains itself.

AI tools editing your code make mistakes because they lack context. CodeDNA embeds that context directly in the files, like DNA in every cell. Zero RAG. Minimal drift.

+13pp F1 improvement · Gemini 2.5 Flash, p=0.040
3 LLMs tested · positive Δ on all 3 models
−52% retry risk reduction · high-risk runs: 52% → 25%
0 infrastructure required · no RAG, no vector DB
The Problem

AI makes mistakes when it lacks context.

Imagine asking a contractor to renovate your apartment without showing them the floor plan. Same problem with AI editing your code. CodeDNA is the floor plan.

Sliding Window
AI reads code in windows of 50–100 lines. If a critical rule is defined 200 lines above, the model never sees it and breaks it.
❌ Scenario S4: violates MAX_DISCOUNT_RATE
Cascade Effect
You change a function. Three other parts of the system depend on it. Without a map, the AI updates only the file it sees and leaves the rest broken.
❌ Scenario S5: KeyError at runtime in main.py
The Solution: CodeDNA
Context lives in the file itself — not in an external document. Each snippet the AI reads carries architectural constraints for that module.
+13pp F1 on SWE-bench (Gemini 2.5 Flash, p=0.040)
Memory Stack

Where CodeDNA sits in the AI memory stack.

Every AI coding agent relies on multiple memory layers. Most of them are external to the code. CodeDNA is the only layer that lives inside the source files themselves — it travels with every clone, fork, and CI pipeline.

| Layer | Examples | Where it lives | Shared across tools? |
| --- | --- | --- | --- |
| LLM / Agent | Claude, GPT-4, Cursor, Copilot | Cloud | |
| External memory | Chat history, Memory API | Cloud / external DB | ✗ tool-specific |
| Markdown / Config | CLAUDE.md, .cursorrules, AGENTS.md | Repo (outside source files) | partial |
| CodeDNA | exports, rules, agent, message, .codedna | Inside every source file + repo root | always |
Setup

Works with your favorite AI tool.

One command installs CodeDNA rules for any AI coding assistant. Or pick the file for your tool and paste it into your project.

Step 1 — Install for your AI tool (instructions + enforcement hooks)
bash <(curl -fsSL https://raw.githubusercontent.com/Larens94/codedna/main/integrations/install.sh) claude-hooks
# or: cursor-hooks  copilot-hooks  cline-hooks  opencode  windsurf
Step 2 — Annotate existing files (CLI)
pip install git+https://github.com/Larens94/codedna.git
# set ANTHROPIC_API_KEY first
codedna init ./          # first-time: annotates every .py file
codedna update ./        # incremental: only unannotated files
codedna check ./         # coverage report, no changes
Claude Code
Active enforcement
4 hooks: SessionStart, PreToolUse, PostToolUse, Stop. Validates every .py write automatically.
Install guide →
Cursor
Active enforcement
2 hooks in .cursor/hooks/: validates on every file edit, reminds at session end. Requires v1.7+.
Install guide →
GitHub Copilot
Active enforcement
3 hooks in .github/hooks/: session start context, post-write validation, session end reminder.
Install guide →
Cline
Active enforcement
2 hooks in .clinerules/hooks/: TaskStart context injection, PostToolUse validation. Requires v3.36+.
Install guide →
OpenCode
Active enforcement
AGENTS.md + JS plugin in .opencode/plugins/. Validates 11 languages on every write.
Install guide →
Windsurf
Instructions only
Copy .windsurfrules to your project root. Cascade reads it automatically.
Install guide →
Antigravity / Agents
Instructions only
Copy .agents/workflows/codedna.md to your project. Compatible with Antigravity and custom agent frameworks.
View file →

Full Install Guide
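All the enforcement hooks perform the same core check on every write: does the file carry a manifest header? A minimal standalone version of that check (the three field names come from the v0.8 header spec; the function name and sample are illustrative, not the hooks' actual API) could look like this:

```python
import ast

REQUIRED_FIELDS = ("exports:", "used_by:", "rules:")  # v0.8 header fields

def missing_header_fields(source: str) -> list[str]:
    """Return the required fields absent from the module docstring."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    return [field for field in REQUIRED_FIELDS if field not in docstring]

annotated = (
    '"""pricing.py - Pricing engine.\n'
    "\n"
    "exports: apply_discount(cents, tier) -> int\n"
    "used_by: checkout.py -> build_cart\n"
    "rules:   NEVER exceed MAX_DISCOUNT_RATE\n"
    '"""\n'
)
print(missing_header_fields(annotated))   # []  (header complete)
print(missing_header_fields("x = 1\n"))   # all three fields reported missing
```

A hook would run a check like this on each written .py file and block or warn when the returned list is non-empty.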

How It Works

Three levels. Every snippet is self-contained.

Like DNA in a living organism, where every cell carries the complete blueprint: with CodeDNA, 10 random lines from anywhere in a file are enough for the AI to act correctly.

Level 1
Module Header (Python-native)
A concise module docstring at the top of every file. The AI reads it before the code and already knows the purpose, the public API, who depends on this file, and the hard constraints. The Python-native format maximises comprehension for LLMs trained on Python corpora.
pricing.py
"""pricing.py — Pricing engine with tier discounts.

exports: apply_discount(cents, tier) -> int
used_by: checkout.py → build_cart
rules:   NEVER exceed MAX_DISCOUNT_RATE from config.py;
         apply_discount() must cap before returning.
         DB: discount_tiers(tier, multiplier).
"""
Level 2
Sliding-Window Annotations
Rules: docstrings on critical functions, written organically by agents as they discover constraints. Each agent that fixes a bug leaves knowledge for the next. Even in a 10-line extract the AI receives all the context it needs.
pricing.py · lines 210–226
def apply_discount(cents: int, tier: str) -> int:
    """Apply tier discount to price in cents.

    Rules:   MUST cap discount before returning — exceeding
             MAX_DISCOUNT_RATE is a financial compliance bug.
             After fix #42: also check tier != 'internal'.
    """
    discount = get_multiplier(tier)
    discount = min(discount, MAX_DISCOUNT_RATE)
    return int(cents * (1.0 - discount))
Level 3
Semantic Naming
Variable names encode type, shape and origin. A 10-line snippet carries significantly more context — reducing backward tracing for the most common cases.
Comparison
# ❌ Ambiguous — euros? cents?
price = request.json.get("price")
data  = get_users()

# ✅ CodeDNA — type, domain, origin are clear
int_cents_price_from_request = request.json.get("price")
list_dict_users_from_db      = get_users()
Bonus
Manifest-Only Planner
To plan changes across 10+ files, the AI reads .codedna first, then only the module docstring of each file (first 8–12 lines) — building the full map in very few tokens.
Planner — Standard
# 1. Read .codedna — project structure
# 2. Read module docstring (8–12 lines each)
# 3. Filter: used_by mentions target? Include
#    rules mentions task domain? Include
# 4. Build exports → used_by graph
# 5. Open in full ONLY the relevant files
# Cost: ~50 tok × N files = complete map
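The planner loop above can be sketched in a few lines of Python. This is a simplified illustration, not the actual planner: header parsing is reduced to line prefixes, and only the used_by: filter (step 3) is shown.

```python
def parse_header(docstring: str) -> dict[str, str]:
    """Pull the exports:/used_by:/rules: lines out of a module docstring."""
    fields: dict[str, str] = {}
    for line in docstring.splitlines():
        stripped = line.strip()
        for key in ("exports", "used_by", "rules"):
            if stripped.startswith(key + ":"):
                fields[key] = stripped.split(":", 1)[1].strip()
    return fields

def plan_files(headers: dict[str, str], target: str) -> list[str]:
    """Step 3 of the planner: keep files whose used_by: mentions the target."""
    relevant = [target]
    for path, docstring in headers.items():
        if target in parse_header(docstring).get("used_by", ""):
            relevant.append(path)
    return relevant

# The first header lines of each file stand in for a real docstring read.
headers = {
    "pricing.py": "exports: apply_discount\nused_by: checkout.py -> build_cart",
    "utils.py":   "exports: fmt_date\nused_by: reports.py",
}
print(plan_files(headers, "checkout.py"))  # ['checkout.py', 'pricing.py']
```

Only the files this returns are then opened in full, which is what keeps the map at roughly ~50 tokens per file.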
Illustrative Scenarios

Problems CodeDNA is designed to solve.

5 scenarios that illustrate the categories of errors AI agents make without architectural context. For measured results, see the SWE-bench benchmark below.

Scenario S4
Sliding Window — Hidden Constraint
The AI reads only lines 200–250. The max discount limit is in the manifest (line 7). Without CodeDNA it ignores it and applies illegal discounts of 50%+.
Scenario S5
Cascade Change — Domino Effect
utils.py is modified. Without used_by:, the AI only updates utils.py and leaves main.py with a runtime KeyError.
Scenario S6
Ambiguous Type — Euros or Cents?
price = 1999 — euros or cents? Without semantic naming the AI gets the unit wrong. With CodeDNA: int_cents_price_from_request — zero ambiguity.
Scenario S7
💥 Broken Dependency — Silent Rename
format_revenue() is renamed to format_currency(). The rules: field records the rename. The Control agent calls the old name: crash.
Scenario S8
🗺️ Planning — 8 Files, Manifest Only
The AI must find the 2 right files to change by reading only the module docstrings (8–12 lines each). Using the exports: → used_by: graph it identifies exactly the 2 files.
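Scenario S5's failure mode is easy to reproduce in miniature. The file names and functions below are illustrative, not the actual benchmark code: two modules are simulated in one script, and only the "producer" side has been edited.

```python
# utils.py (simulated): an agent renamed the "total" key to "amount_cents"
# here, but without reading a used_by: line it never opened main.py.
def build_summary(cents: int) -> dict:
    """used_by: main.py -> print_summary  (rename keys in both files!)"""
    return {"amount_cents": cents}

# main.py (simulated): untouched, still reads the old key.
def print_summary(summary: dict) -> str:
    return f"total: {summary['total']}"

try:
    print_summary(build_summary(1999))
except KeyError as exc:
    print(f"KeyError at runtime: {exc}")  # the domino effect S5 describes
```

With the used_by: line in the header, the agent knows print_summary must be updated in the same change set.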
Multi-Model SWE-bench

Tested across multiple LLMs.

5 real Django issues from SWE-bench, tested across multiple LLMs. Same prompt, same tools, same tasks. Gemini 2.5 Flash: +13pp F1, p=0.040 · DeepSeek Chat: +9pp F1, p=0.11 · Gemini 2.5 Pro: +9pp F1, p=0.11. All 3 models improve.

File Localization F1 — Control vs CodeDNA by Model

Navigation Demo — django__django-11808 · DeepSeek Chat · 5 runs


Without CodeDNA: agent opens random files, stops early — 2/10 critical files found.  |  With CodeDNA: follows used_by: chain — 6/10 critical files found.

Best Model
🏆 Gemini 2.5 Flash — +13pp F1
From 60% to 72%. Wins 4 out of 5 tasks. Δ up to +21pp on delegation chains (Task 13495). p=0.040, Wilcoxon W+=14, N=5 tasks × ≥5 runs at T=0.1.
Control 60% · CodeDNA 72%
DeepSeek Chat
DeepSeek Chat — +9pp F1 (p=0.11)
From 50% to 60%. Wins 4/5 tasks. Notable: +35pp on cross-cutting task 11808 — opposite direction from Gemini Flash (−1pp). Task 13495 anomaly (−9pp) under investigation. Not statistically significant.
Control 50% · CodeDNA 60%
Key Finding
🧠 Model-Agnostic Benefits
All 3 models tested improve with CodeDNA. The benefit is strongest on tasks requiring cross-module navigation, exactly where AI agents struggle most.
Chain tasks: +9pp to +21pp

🔬 Methodology: 5 SWE-bench Django tasks × 3 models (Gemini 2.5 Flash ✓, DeepSeek Chat ✓, Gemini 2.5 Pro ✓). 5 runs/task at T=0.1. Identical system prompt, same 3 tools (read_file, list_files, grep), max 30 turns. Metric: File Localization F1 (ground-truth files from patch). Statistical test: Wilcoxon signed-rank (one-tailed). Script: benchmark_agent/swebench/run_agent_multi.py.

Comparison

CodeDNA vs. existing approaches

| Approach | Token overhead | Context drift | Retrieval latency | Sliding-window | Infrastructure |
| --- | --- | --- | --- | --- | --- |
| CLAUDE.md / CursorRules | Low | Medium | Zero | No | External file |
| RAG / Vector DB | Low | Medium | High | No | DB + embedding |
| MemGPT | Medium | Low | Medium | No | Complex system |
| CodeDNA ✦ | Low (inline) | Low | Zero | Yes ✓ | None |
Status

Roadmap

Done
Level 1 — Manifest Header (v0.1–v0.4)
FILE, PURPOSE, DEPENDS_ON, EXPORTS, AGENT_RULES, REQUIRED_BY, DB_TABLES, LAST_MODIFIED.
Done
Level 2 — Sliding-Window Annotations (v0.2–v0.3)
@REQUIRES-READ, @SEE, @MODIFIES-ALSO, @BREAKS-IF-RENAMED — solves the sliding-window problem.
Done
Level 3 — Semantic Naming + CONTEXT_BUDGET (v0.3)
Naming scheme <type>_<shape>_<domain>_<origin> · Manifest-Only Planner read pattern.
Done
LLM-Optimised Format (v0.5)
Python-native module docstring (L1) + function-level Rules: docstrings (L2). Maximises comprehension for LLMs trained on Python corpora.
Done
Enterprise Benchmark — 105 files, 3 bugs, 48 distractors
−29% tool calls, 0 incorrect root-cause identifications (vs 1 for the Control). Replicable on disk.
Done
Multi-Model SWE-bench Benchmark — 5 tasks, 5 runs/task
Gemini 2.5 Flash: ctrl=60%, DNA=72%, Δ=+13pp, p=0.040 · DeepSeek Chat: ctrl=50%, DNA=60%, Δ=+9pp, p=0.11 · Gemini 2.5 Pro: ctrl=60%, DNA=69%, Δ=+9pp, p=0.11
Done
White Paper / arXiv preprint
Formal study with reproducible methodology, DNA analogy, and comparison against SWE-bench, LoCoBench-Agent, ETH Zurich (2026).
Done
Redundancy Audit (v0.8) ✦ Current
Header reduced to 3 fields: exports, used_by, rules. rules: promoted to required — the inter-agent communication channel. Python-only focus.
Done
M1 — CLI & Auto-Annotation
AST skeleton extraction · codedna init, codedna update, codedna check · pip installable · Claude Code Challenge: 7/7 patch files in ~8 min vs 6/7 in ~10–11 min (control). Results →
Done
M3 — Multi-Tool Enforcement Hooks
Active hooks for Claude Code (4), Cursor (2), GitHub Copilot (3), Cline (2), OpenCode (plugin). Validates on every write — no manual reminder needed. Pre-commit hook for all tools.
Done
M4 — Language Extension
CLI supports 10 languages: Python (AST), TypeScript/JS, Go, PHP, Rust, Java, Kotlin, Ruby, C#, Swift. validate_manifests.py supports template engines (Blade, Jinja2, ERB, Vue, Svelte…).
Next
M2 — Benchmark Expansion
20+ SWE-bench tasks across multiple projects · 5+ LLMs · confidence intervals · Zenodo dataset · public dashboard.
Next
M5 — VS Code Extension & GitHub Action
VS Code extension (used_by graph, stale annotation highlight) · GitHub Action for CI/CD validation.
Next
M6 — Research Paper & Dissemination
Finalize paper · submit to ICSE NIER / LLM4Code workshop · contribute annotations to Flask, FastAPI and one non-Python project.

M1–M5 are part of a funding application to NLnet NGI0 Commons Fund. If you find CodeDNA useful, ⭐ the repo and share it.