Agent Deep-Dive Analysis

Design Decisions

1. Skill-Per-Directory Architecture

What it is: Each skill is a self-contained directory containing a SKILL.md file with frontmatter, an optional references/ subdirectory for methodology docs, and an optional templates/ directory for output generation.

Where it appears: .claude/skills/debate/, .claude/skills/jit-learning/, .claude/skills/visualize-codebase/, etc.

What problem it solves: Skills can be installed, updated, or removed independently. A user can copy just debate/ without pulling in visualization templates. Claude Code auto-discovers skills from the directory structure.

What would break without it: All skill logic would live in a single monolithic instruction file. Updates to one skill would risk regressions in others. Users couldn't selectively install skills.

Alternatives: A single SKILLS.md file with all skills concatenated was the original approach. It was abandoned because (a) it bloated the context window on every invocation and (b) it offered no way to selectively distribute individual skills via make copy-claude-skill SKILL=X.
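As a rough sketch, the auto-discovery convention amounts to globbing for SKILL.md markers. The code below is illustrative stdlib Python, not Claude Code's actual implementation:

```python
from pathlib import Path

def discover_skills(root: str = ".claude/skills") -> dict[str, Path]:
    """Illustrative sketch: a skill is any subdirectory containing SKILL.md."""
    skills = {}
    for skill_md in sorted(Path(root).glob("*/SKILL.md")):
        skills[skill_md.parent.name] = skill_md.parent
    return skills
```

Because each skill carries its own marker file, adding or removing a directory changes the skill set without touching any shared registry.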

2. Template-Driven Generation with Structural Fidelity

What it is: All generated HTML files must match a canonical template skeleton exactly. Only project-specific content (data arrays, text) may differ. CSS, DOM structure, class names, and navigation placeholders must be verbatim copies.

Where it appears: visualize-codebase/templates/index.html, interactive.html, deep-dive.html. Phase 4 validation explicitly diffs structural skeletons.

What problem it solves: Prevents CSS/DOM drift across regenerations. When the visualize-index skill injects shared navigation via <!-- nav:inject -->, it can rely on the placeholder being in the exact right position. Ensures visual consistency across all projects in a gallery.

What would break without it: Each regeneration could introduce subtle layout differences. The gallery navigation injection would fail if the comment placeholder was missing or moved. Cross-project styling would diverge.

Alternatives: Using a templating engine (Jinja2, Handlebars) was considered but rejected because it would add a dependency. The current approach keeps templates as plain HTML files that Claude copies and fills in.
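A structural-fidelity check of the kind Phase 4 performs could be sketched with only the stdlib HTML parser. The normalization below (tags, class attributes, and comment placeholders kept; text content dropped) is an assumption about what "structural skeleton" means, not the skill's actual diff procedure:

```python
from html.parser import HTMLParser

class SkeletonExtractor(HTMLParser):
    """Collects tags, class attributes, and comments; ignores text content."""
    def __init__(self):
        super().__init__()
        self.skeleton = []
    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.skeleton.append(f"<{tag} class={cls!r}>")
    def handle_endtag(self, tag):
        self.skeleton.append(f"</{tag}>")
    def handle_comment(self, data):
        # Placeholders like <!-- nav:inject --> are structural, so keep them.
        self.skeleton.append(f"<!--{data.strip()}-->")

def same_skeleton(template_html: str, generated_html: str) -> bool:
    a, b = SkeletonExtractor(), SkeletonExtractor()
    a.feed(template_html)
    b.feed(generated_html)
    return a.skeleton == b.skeleton
```

Under this check, two files that differ only in text pass, while a renamed class or a missing nav:inject placeholder fails.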

3. Data-Driven Python SVG Generator

What it is: A single Python script (generate_visuals.py, ~800 lines) with two sections: (1) shared helper functions and renderers that never change, and (2) a DATA dict that gets replaced with project-specific content.

Where it appears: visualize-codebase/templates/generate_visuals.py

What problem it solves: Zero pip dependencies. Deterministic SVG output. The LLM only needs to fill in a data structure, not write rendering code. All XML escaping is handled by the shared helpers.

What would break without it: Each visualization would require hand-crafted SVG strings with ad-hoc escaping, leading to XML parsing failures from unescaped & characters. Output would be inconsistent across projects.

Alternatives: SVG libraries (svgwrite, drawSvg) add pip dependencies. D3.js-based generation requires Node.js. Browser-rendered SVG (via headless Chrome) adds infrastructure complexity. The stdlib-only approach is the simplest that works.
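The two-section layout can be illustrated in miniature. DATA and render_slide below are simplified stand-ins for the real generate_visuals.py, but they show the escaping contract the shared helpers enforce:

```python
from xml.sax.saxutils import escape

# Project-specific section: the LLM replaces only this dict.
DATA = {
    "title": "Auth & Billing Services",  # raw & must be escaped on output
    "boxes": [("API Gateway", 60), ("Postgres", 120)],
}

# Shared renderer section: never changes across projects.
def render_slide(data: dict) -> str:
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">']
    parts.append(f'<text x="10" y="20">{escape(data["title"])}</text>')
    for label, y in data["boxes"]:
        parts.append(f'<rect x="10" y="{y}" width="180" height="40"/>')
        parts.append(f'<text x="20" y="{y + 25}">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

All text flows through escape(), so an ampersand in the data cannot produce malformed XML; xmllint then acts as the second layer of defense.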

4. Three-Pass Adversarial Review (Advocate/Critic/Judge)

What it is: The debate review skill enforces three distinct analytical passes with different objectives. The Advocate steelmans first (Rapoport Rules), the Critic challenges assumptions (ACH methodology), and the Judge synthesizes a verdict.

Where it appears: .claude/skills/debate/SKILL.md and references/review.md

What problem it solves: Prevents the common failure mode of jumping straight to criticism without understanding. Forces the reviewer to demonstrate comprehension before challenging. The Judge pass prevents both false positives (Critic overreach) and false negatives (Advocate bias).

What would break without it: Single-pass reviews tend to be either too generous (missing real issues) or too harsh (nitpicking without context). The three-pass structure produces balanced, actionable findings.

Alternatives: Single-pass review (standard code review), two-pass (pro/con only), or unstructured adversarial analysis. The three-pass approach was chosen because it maps to established intelligence analysis tradecraft (ACH) and philosophical methodology (Hegelian dialectic).
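The pass ordering and the Judge's filtering role can be sketched as data plus a trivial orchestrator. This is a hypothetical structure, not the SKILL.md prompt text:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    claim: str
    upheld: bool = False

# Pass objectives paraphrased from the skill description.
PASSES = [
    ("advocate", "Steelman first (Rapoport Rules): restate, agree, learn, then critique."),
    ("critic", "Challenge assumptions against competing hypotheses (ACH)."),
    ("judge", "Synthesize a verdict: reject Critic overreach, correct Advocate bias."),
]

def run_review(critic_findings: list[Finding],
               judge: Callable[[Finding], bool]) -> list[Finding]:
    # Judge pass: only findings the judge upholds reach the final verdict.
    for f in critic_findings:
        f.upheld = judge(f)
    return [f for f in critic_findings if f.upheld]
```

The point of the structure is that Critic output is never final: every finding passes through the Judge's filter before it appears in the verdict.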

5. Permission Model: Deny-by-Default with Explicit Allow List

What it is: settings.json explicitly lists every Bash command pattern that Claude is allowed to run, plus a deny list for destructive operations.

Where it appears: .claude/settings.json

What problem it solves: Prevents accidental destructive operations like rm -rf, git push, or helm install. The allow list ensures only read-only and generation commands run without user confirmation.

What would break without it: Claude could accidentally push code, delete files, or modify cluster state. The deny rule for git push is especially important: skills like /commit stage and commit changes but must never push without explicit user approval.

Alternatives: Relying on Claude's built-in safety checks alone, or using a broader allow list. The explicit approach was chosen because the cost of a false positive (blocking a safe command) is low, while the cost of a false negative (running a destructive command) is high.
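The allow/deny semantics can be sketched with glob matching. The patterns below are illustrative, not the actual settings.json schema:

```python
from fnmatch import fnmatch

# Illustrative patterns; the real settings.json syntax may differ.
ALLOW = ["git status", "git diff*", "ls*", "python3 generate_visuals.py"]
DENY = ["rm -rf*", "git push*", "helm install*"]

def is_permitted(command: str) -> bool:
    """Deny-by-default: denied patterns always win, then the allow list."""
    if any(fnmatch(command, pat) for pat in DENY):
        return False
    return any(fnmatch(command, pat) for pat in ALLOW)
```

Anything not explicitly allowed falls through to False, which is the deny-by-default property the design relies on.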


Invariants

Skill Structure Constraints

Template Fidelity Constraints

SVG Generation Constraints

Methodology Constraints

Idempotency


Failure Modes

Codebase Visualization Flow

Step | What Can Fail | Current Handling | Correctness
Phase 0: Explore | Subagents return incomplete findings (miss routes, types, etc.) | Multiple subagents with overlapping scope; manual review of findings before generation | Partial — depends on subagent thoroughness; no automated completeness check
Phase 1: SVG generation | Unescaped XML characters in DATA dict strings | xml.sax.saxutils.escape() applied to all text; xmllint validation catches survivors | Correct — two layers of defense
Phase 1: SVG generation | Python script crashes due to missing DATA keys | KeyError traceback; slide not generated | Partial — fails loud but no graceful degradation
Phase 2: Interactive HTML | GSAP CDN unavailable (network down, CDN outage) | No fallback; animations silently fail; static layout still renders | Partial — page is usable but animations don't work
Phase 4: Validation | xmllint not installed on system | Command fails; user sees error | Correct — validation is explicit, not silent
Gallery index build | Project folder missing slide_01_eli5.svg | Folder is excluded from discovery (requires both slide_01 and index.html) | Correct — explicit inclusion criteria
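The gallery's explicit inclusion criteria from the last row can be sketched directly. Folder contents follow the table; the function name is hypothetical:

```python
from pathlib import Path

def gallery_projects(root: str) -> list[str]:
    """A folder joins the gallery only if both required artifacts exist."""
    required = ("slide_01_eli5.svg", "index.html")
    return sorted(
        d.name for d in Path(root).iterdir()
        if d.is_dir() and all((d / f).exists() for f in required)
    )
```

A half-generated project is silently excluded rather than rendered as a broken gallery card.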

Debate Review Flow

Step | What Can Fail | Current Handling | Correctness
Mode parsing | Unrecognized mode (not review/red-team/steelman) | SKILL.md defaults to review mode | Correct — safe default
Pass 1: Advocate | LLM doesn't steelman effectively (confirms bias instead) | Rapoport Rules enforce structure: restate, agree, learn, then critique | Partial — methodology helps but can't guarantee the LLM follows it perfectly
Pass 2: Critic | False positive findings (Critic finds issues that don't exist) | Pass 3 Judge rejects Critic overreach as a dedicated category | Correct — built-in correction mechanism
Evidence matrix | Evidence supports both sides equally (non-diagnostic) | ACH methodology rates diagnostic value; low-diagnostic evidence is flagged | Correct — methodology handles this case explicitly
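The non-diagnostic-evidence case from the last row can be sketched as a scoring rule. The numeric consistency scale and zero threshold are assumptions; ACH itself rates diagnosticity qualitatively:

```python
def is_diagnostic(consistency: dict[str, float], threshold: float = 0.0) -> bool:
    """Hypothetical scoring: evidence discriminates only if its consistency
    scores differ across hypotheses; equal scores mean non-diagnostic."""
    scores = list(consistency.values())
    return (max(scores) - min(scores)) > threshold
```

Evidence that fits every hypothesis equally well carries no weight in the matrix, so it is flagged rather than counted.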

Cascading Failures


Architecture Strengths

  1. Zero external dependencies: No pip install, no npm install, no Docker required. Python uses only stdlib. HTML uses a single CDN script (GSAP). This eliminates version conflicts, supply chain risks, and installation friction.
  2. Skill isolation: Each skill is a self-contained directory. Adding, removing, or updating one skill cannot break another. This enables selective distribution via make copy-claude-skill SKILL=X.
  3. Template-driven idempotent generation: Output is deterministic given the same input data. Templates enforce visual consistency. Structural validation catches drift.
  4. Methodology grounding: Skills aren't ad-hoc prompts. They're grounded in established frameworks: ACH (intelligence analysis), Rapoport Rules (philosophy of argument), Bloom's Taxonomy (cognitive science), Hegelian dialectic (synthesis).
  5. Progressive disclosure: The 11-slide altitude descent (50,000 ft to ground) means any audience can engage at their level. Executives see slides 1-3; engineers dive into slides 8-11.

Architecture Risks

  1. LLM fidelity to methodology: Skills describe detailed methodologies, but the LLM may not follow them precisely on every invocation. There's no runtime enforcement — only the quality of the prompt matters.
  2. Template rigidity vs. customization: The strict template conformance rule means any visual customization requires modifying the canonical template, which affects all future projects. There's no per-project override mechanism.
  3. Single-CDN dependency for interactivity: If the GSAP CDN is unavailable, interactive walkthroughs lose all animation. A local fallback copy would eliminate this risk but adds a file to maintain.
  4. No automated testing: The project has no test suite. Validation relies on xmllint for SVGs and manual template conformance checks. A regression in the Python generator or HTML templates would only be caught at generation time.