Agent Deep-Dive Analysis
Design Decisions
1. Skill-Per-Directory Architecture
What it is: Each skill is a self-contained directory containing a SKILL.md file with frontmatter, an optional
references/ subdirectory for methodology docs, and an optional templates/ directory for output generation.
Where it appears: .claude/skills/debate/, .claude/skills/jit-learning/,
.claude/skills/visualize-codebase/, etc.
What problem it solves: Skills can be installed, updated, or removed independently. A user can copy just
debate/ without pulling in visualization templates. Claude Code auto-discovers skills from the directory structure.
What would break without it: All skill logic would live in a single monolithic instruction file. Updates to one skill would risk regressions in others. Users couldn't selectively install skills.
Alternatives: The original approach was a single SKILLS.md file with all skills concatenated. It was
abandoned because (a) it bloated the context window on every invocation and (b) it offered no way to selectively distribute individual skills via
make copy-claude-skill SKILL=X.
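Concretely, the per-skill layout looks like the following (a sketch assembled from the directories and files named in this document; the exact tree may differ):

```
.claude/skills/
├── debate/
│   ├── SKILL.md              # frontmatter + skill instructions
│   └── references/
│       └── review.md         # methodology doc
├── jit-learning/
│   └── SKILL.md
└── visualize-codebase/
    ├── SKILL.md
    └── templates/
        ├── index.html
        ├── interactive.html
        ├── deep-dive.html
        └── generate_visuals.py
```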
2. Template-Driven Generation with Structural Fidelity
What it is: All generated HTML files must match a canonical template skeleton exactly. Only project-specific content (data arrays, text) may differ. CSS, DOM structure, class names, and navigation placeholders must be verbatim copies.
Where it appears: visualize-codebase/templates/index.html, interactive.html,
deep-dive.html. Phase 4 validation explicitly diffs structural skeletons.
What problem it solves: Prevents CSS/DOM drift across regenerations. When the visualize-index skill
injects shared navigation via <!-- nav:inject -->, it can rely on the placeholder being in the exact right
position. Ensures visual consistency across all projects in a gallery.
What would break without it: Each regeneration could introduce subtle layout differences. The gallery navigation injection would fail if the comment placeholder was missing or moved. Cross-project styling would diverge.
Alternatives: Using a templating engine (Jinja2, Handlebars) was considered but rejected because it would add a dependency. The current approach keeps templates as plain HTML files that Claude copies and fills in.
3. Data-Driven Python SVG Generator
What it is: A single Python script (generate_visuals.py, ~800 lines) with two sections: (1) shared
helper functions and renderers that never change, and (2) a DATA dict that gets replaced with project-specific content.
Where it appears: visualize-codebase/templates/generate_visuals.py
What problem it solves: Zero pip dependencies. Deterministic SVG output. The LLM only needs to fill in a data structure, not write rendering code. All XML escaping is handled by the shared helpers.
What would break without it: Each visualization would require hand-crafted SVG strings with ad-hoc escaping,
leading to XML parsing failures from unescaped & characters. Output would be inconsistent across projects.
Alternatives: SVG libraries (svgwrite, drawSvg) add pip dependencies. D3.js-based generation requires Node.js. Browser-rendered SVG (via headless Chrome) adds infrastructure complexity. The stdlib-only approach is the simplest that works.
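The split between fixed renderers and a replaceable DATA dict can be sketched as follows (function names, signatures, and dict keys here are illustrative, not the actual generate_visuals.py API):

```python
from xml.sax.saxutils import escape

# Shared helpers: fixed across projects; all escaping happens in one place.
def text_el(x, y, s, fill="#e2e8f0"):
    return f'<text x="{x}" y="{y}" fill="{fill}">{escape(s)}</text>'

def render_slide(data):
    body = "".join(
        text_el(20, 40 + i * 24, line) for i, line in enumerate(data["lines"])
    )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="800" height="600">'
        '<rect width="100%" height="100%" fill="#0f172a"/>' + body + "</svg>"
    )

# Project-specific section: the LLM replaces only this dict, never the renderers.
DATA = {"lines": ["Fetch & parse", "Render <slides>"]}

svg = render_slide(DATA)
```

Because user-facing strings only reach the SVG through the shared helper, an `&` or `<` in the data cannot produce invalid XML.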
4. Three-Pass Adversarial Review (Advocate/Critic/Judge)
What it is: The debate review skill enforces three distinct analytical passes with different objectives. The Advocate steelmans first (Rapoport Rules), the Critic challenges assumptions (ACH methodology), and the Judge synthesizes a verdict.
Where it appears: .claude/skills/debate/SKILL.md and references/review.md
What problem it solves: Prevents the common failure mode of jumping straight to criticism without understanding. Forces the reviewer to demonstrate comprehension before challenging. The Judge pass prevents both false positives (Critic overreach) and false negatives (Advocate bias).
What would break without it: Single-pass reviews tend to be either too generous (missing real issues) or too harsh (nitpicking without context). The three-pass structure produces balanced, actionable findings.
Alternatives: Single-pass review (standard code review), two-pass (pro/con only), or unstructured adversarial analysis. The three-pass approach was chosen because it maps to established intelligence analysis tradecraft (ACH) and philosophical methodology (Hegelian dialectic).
5. Permission Model: Deny-by-Default with Explicit Allow List
What it is: settings.json explicitly lists every Bash command pattern that Claude is allowed to run,
plus a deny list for destructive operations.
Where it appears: .claude/settings.json
What problem it solves: Prevents accidental destructive operations like rm -rf, git push,
or helm install. The allow list ensures only read-only and generation commands run without user confirmation.
What would break without it: Claude could accidentally push code, delete files, or modify cluster state. The deny-list
entry for git push is especially important because skills like /commit stage and commit changes but should never
push without explicit user approval.
Alternatives: Relying on Claude's built-in safety checks alone, or using a broader allow list. The explicit approach was chosen because the cost of a false positive (blocking a safe command) is low, while the cost of a false negative (running a destructive command) is high.
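In Claude Code's settings format, such a policy takes roughly this shape (the patterns below are illustrative examples; the real allow and deny lists live in .claude/settings.json):

```json
{
  "permissions": {
    "allow": [
      "Bash(git status:*)",
      "Bash(git diff:*)",
      "Bash(python3 generate_visuals.py:*)",
      "Bash(xmllint --noout:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push:*)",
      "Bash(helm install:*)"
    ]
  }
}
```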
Invariants
Skill Structure Constraints
- Every skill must have a SKILL.md: Claude Code discovers skills by the presence of this file. Without it, the skill directory is invisible.
- SKILL.md frontmatter must include name and description: The description field contains trigger patterns that Claude Code uses for intent matching.
- Allowed tools must be explicitly listed: If a skill needs Bash access, the SKILL.md must declare it. This prevents skills from silently escalating their tool usage.
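Put together, a conforming SKILL.md might open with frontmatter like this (field values are illustrative):

```yaml
---
name: debate
description: Adversarial three-pass review. Use when the user asks to
  debate, review, red-team, or steelman a design or document.
allowed-tools: Read, Grep, Glob
---
```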
Template Fidelity Constraints
- <!-- nav:inject --> must appear immediately after <body>: The visualize-index skill replaces this comment with a <script> tag. If it's missing or moved, navigation injection fails silently.
- CSS must be single-line compact format: The template validation in Phase 4 compares CSS blocks. Multi-line or reformatted CSS would fail structural conformance checks.
- Generated HTML must be standalone: No build step, no npm install, no server. Open the file in a browser and it works.
SVG Generation Constraints
- All text must be XML-escaped: Every string passed to SVG renderers goes through xml.sax.saxutils.escape(). Unescaped &, <, or > characters produce invalid XML that xmllint rejects.
- Output files must validate with xmllint: Phase 4 runs xmllint --noout on every generated SVG. A broken SVG is a bug, not a warning.
- Dark theme colors are fixed: Background #0f172a, primary text #e2e8f0, muted text #94a3b8. Changing these would break visual consistency across the gallery.
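The escaping invariant can be demonstrated with Python's own XML parser, which rejects the same malformed documents xmllint does (a minimal illustration, not the project's validation code):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Profit & loss <Q4>"

# Unescaped text produces invalid XML: the parser rejects it,
# just as xmllint --noout would.
try:
    ET.fromstring(f"<svg><text>{raw}</text></svg>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# Escaped text parses cleanly.
ET.fromstring(f"<svg><text>{escape(raw)}</text></svg>")

print(parsed_raw)  # False: the unescaped '&' breaks the document
```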
Methodology Constraints
- Debate review must complete all three passes: Skipping the Advocate pass means the Critic lacks context. Skipping the Judge pass means no verdict synthesis.
- Red team must apply the 10th Man Rule: Even if all 5 attack vectors find no issues, the methodology requires constructing at least one plausible failure scenario.
- JIT Learning must produce exactly 5 pages: One page per Bloom's Taxonomy step. This constraint ensures comprehensive coverage and prevents shallow analysis.
Idempotency
- SVG generation is idempotent: Running the Python script twice with the same DATA dict produces identical output.
- Gallery index rebuild is idempotent: Running /visualize-index twice produces the same gallery (assuming no new projects were added).
- Navigation injection is idempotent: Replacing <!-- nav:inject --> with a script tag, then running again, does not duplicate the script tag (the comment is consumed).
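The consume-the-placeholder pattern behind the last point can be sketched as follows (the actual visualize-index implementation may differ; the injected tag is illustrative):

```python
PLACEHOLDER = "<!-- nav:inject -->"
NAV_SCRIPT = '<script src="nav.js"></script>'  # hypothetical injected tag

def inject_nav(html: str) -> str:
    # The placeholder is consumed on the first run, so every
    # subsequent run finds nothing to replace and is a no-op.
    return html.replace(PLACEHOLDER, NAV_SCRIPT)

page = "<body><!-- nav:inject --><h1>Gallery</h1></body>"
once = inject_nav(page)
twice = inject_nav(once)
assert once == twice                 # idempotent
assert once.count(NAV_SCRIPT) == 1   # no duplicated script tag
```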
Failure Modes
Codebase Visualization Flow
| Step | What Can Fail | Current Handling | Correctness |
|---|---|---|---|
| Phase 0: Explore | Subagents return incomplete findings (miss routes, types, etc.) | Multiple subagents with overlapping scope; manual review of findings before generation | Partial — depends on subagent thoroughness; no automated completeness check |
| Phase 1: SVG generation | Unescaped XML characters in DATA dict strings | xml.sax.saxutils.escape() applied to all text; xmllint validation catches survivors | Correct — two layers of defense |
| Phase 1: SVG generation | Python script crashes due to missing DATA keys | KeyError traceback; slide not generated | Partial — fails loud but no graceful degradation |
| Phase 2: Interactive HTML | GSAP CDN unavailable (network down, CDN outage) | No fallback; animations silently fail; static layout still renders | Partial — page is usable but animations don't work |
| Phase 4: Validation | xmllint not installed on system | Command fails; user sees error | Correct — validation is explicit, not silent |
| Gallery index build | Project folder missing slide_01_eli5.svg | Folder is excluded from discovery (requires both slide_01 and index.html) | Correct — explicit inclusion criteria |
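The inclusion rule in the last row amounts to a two-file existence check, roughly as follows (filenames from the table; the function name and signature are hypothetical):

```python
from pathlib import Path

def discover_projects(root: Path) -> list[str]:
    # A folder joins the gallery only if both required artifacts exist;
    # incomplete folders are silently excluded rather than half-rendered.
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir()
        and (p / "slide_01_eli5.svg").exists()
        and (p / "index.html").exists()
    )
```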
Debate Review Flow
| Step | What Can Fail | Current Handling | Correctness |
|---|---|---|---|
| Mode parsing | Unrecognized mode (not review/red-team/steelman) | SKILL.md defaults to review mode | Correct — safe default |
| Pass 1: Advocate | LLM doesn't steelman effectively (confirms bias instead) | Rapoport Rules enforce structure: restate, agree, learn, then critique | Partial — methodology helps but can't guarantee LLM follows it perfectly |
| Pass 2: Critic | False positive findings (Critic finds issues that don't exist) | Pass 3 Judge rejects Critic overreach as a dedicated category | Correct — built-in correction mechanism |
| Evidence matrix | Evidence supports both sides equally (non-diagnostic) | ACH methodology rates diagnostic value; low-diagnostic evidence is flagged | Correct — methodology handles this case explicitly |
Cascading Failures
- If settings.json is corrupted: Claude Code falls back to default permissions. All Bash commands require manual approval. Skills still work but every command prompts.
- If a template file is deleted: The corresponding skill fails at generation time. Other skills are unaffected (skill isolation). The error is visible in Claude's output.
- If CLAUDE.md is removed: Claude Code loses operational guidelines but skills still function. Commit format, task management, and verification practices degrade to defaults.
Architecture Strengths
- Zero external dependencies: No pip install, no npm install, no Docker required. Python uses only stdlib. HTML uses a single CDN script (GSAP). This eliminates version conflicts, supply chain risks, and installation friction.
- Skill isolation: Each skill is a self-contained directory. Adding, removing, or updating one skill cannot break another. This enables selective distribution via make copy-claude-skill SKILL=X.
- Template-driven idempotent generation: Output is deterministic given the same input data. Templates enforce visual consistency. Structural validation catches drift.
- Methodology grounding: Skills aren't ad-hoc prompts. They're grounded in established frameworks: ACH (intelligence analysis), Rapoport Rules (philosophy of argument), Bloom's Taxonomy (cognitive science), Hegelian dialectic (synthesis).
- Progressive disclosure: The 11-slide altitude descent (50,000 ft to ground) means any audience can engage at their level. Executives see slides 1-3; engineers dive into slides 8-11.
Architecture Risks
- LLM fidelity to methodology: Skills describe detailed methodologies, but the LLM may not follow them precisely on every invocation. There's no runtime enforcement — only the quality of the prompt matters.
- Template rigidity vs. customization: The strict template conformance rule means any visual customization requires modifying the canonical template, which affects all future projects. There's no per-project override mechanism.
- Single-CDN dependency for interactivity: If the GSAP CDN is unavailable, interactive walkthroughs lose all animation. A local fallback copy would eliminate this risk but adds a file to maintain.
- No automated testing: The project has no test suite. Validation relies on xmllint for SVGs and manual template conformance checks. A regression in the Python generator or HTML templates would only be caught at generation time.