Agent Deep-Dive Analysis

Design Decisions

1. Skill-Per-Directory Architecture

What it is: Each skill is a self-contained directory containing a SKILL.md file with frontmatter, an optional references/ subdirectory for methodology docs, and an optional templates/ directory for output generation.

Where it appears: .claude/skills/debate/, .claude/skills/jit-learning/, .claude/skills/visualize-codebase/, etc.

What problem it solves: Skills can be installed, updated, or removed independently. A user can copy just debate/ without pulling in visualization templates. Claude Code auto-discovers skills from the directory structure.

What would break without it: All skill logic would live in a single monolithic instruction file. Updates to one skill would risk regressions in others. Users couldn't selectively install skills.

Alternatives: A single SKILLS.md file with all skills concatenated was the original approach. It was abandoned because (a) it bloated the context window on every invocation and (b) it offered no way to selectively distribute individual skills via make copy-claude-skill SKILL=X.
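As a rough sketch, the auto-discovery convention amounts to globbing for SKILL.md markers. The code below is illustrative stdlib Python, not Claude Code's actual implementation:

```python
from pathlib import Path

def discover_skills(root: str = ".claude/skills") -> dict[str, Path]:
    """Illustrative sketch: a skill is any subdirectory containing SKILL.md."""
    skills = {}
    for skill_md in sorted(Path(root).glob("*/SKILL.md")):
        skills[skill_md.parent.name] = skill_md.parent
    return skills
```

Because each skill carries its own marker file, adding or removing a directory changes the skill set without touching any shared registry.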

2. Template-Driven Generation with Structural Fidelity

What it is: All generated HTML files must match a canonical template skeleton exactly. Only project-specific content (data arrays, text) may differ. CSS, DOM structure, class names, and navigation placeholders must be verbatim copies.

Where it appears: visualize-codebase/templates/index.html, interactive.html, deep-dive.html. Phase 4 validation explicitly diffs structural skeletons.

What problem it solves: Prevents CSS/DOM drift across regenerations. When the visualize-index skill injects shared navigation via <!-- nav:inject -->, it can rely on the placeholder being in the exact right position. Ensures visual consistency across all projects in a gallery.

What would break without it: Each regeneration could introduce subtle layout differences. The gallery navigation injection would fail if the comment placeholder was missing or moved. Cross-project styling would diverge.

Alternatives: Using a templating engine (Jinja2, Handlebars) was considered but rejected because it would add a dependency. The current approach keeps templates as plain HTML files that Claude copies and fills in.
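A structural-fidelity check of the kind Phase 4 performs could be sketched with only the stdlib HTML parser. The normalization below (tags, class attributes, and comment placeholders kept; text content dropped) is an assumption about what "structural skeleton" means, not the skill's actual diff procedure:

```python
from html.parser import HTMLParser

class SkeletonExtractor(HTMLParser):
    """Collects tags, class attributes, and comments; ignores text content."""
    def __init__(self):
        super().__init__()
        self.skeleton = []
    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.skeleton.append(f"<{tag} class={cls!r}>")
    def handle_endtag(self, tag):
        self.skeleton.append(f"</{tag}>")
    def handle_comment(self, data):
        # Placeholders like <!-- nav:inject --> are structural, so keep them.
        self.skeleton.append(f"<!--{data.strip()}-->")

def same_skeleton(template_html: str, generated_html: str) -> bool:
    a, b = SkeletonExtractor(), SkeletonExtractor()
    a.feed(template_html)
    b.feed(generated_html)
    return a.skeleton == b.skeleton
```

Under this check, two files that differ only in text pass, while a renamed class or a missing nav:inject placeholder fails.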

3. Data-Driven Python SVG Generator

What it is: A single Python script (generate_visuals.py, ~800 lines) with two sections: (1) shared helper functions and renderers that never change, and (2) a DATA dict that gets replaced with project-specific content.

Where it appears: visualize-codebase/templates/generate_visuals.py

What problem it solves: Zero pip dependencies. Deterministic SVG output. The LLM only needs to fill in a data structure, not write rendering code. All XML escaping is handled by the shared helpers.

What would break without it: Each visualization would require hand-crafted SVG strings with ad-hoc escaping, leading to XML parsing failures from unescaped & characters. Output would be inconsistent across projects.

Alternatives: SVG libraries (svgwrite, drawSvg) add pip dependencies. D3.js-based generation requires Node.js. Browser-rendered SVG (via headless Chrome) adds infrastructure complexity. The stdlib-only approach is the simplest that works.
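The two-section layout can be illustrated in miniature. DATA and render_slide below are simplified stand-ins for the real generate_visuals.py, but they show the escaping contract the shared helpers enforce:

```python
from xml.sax.saxutils import escape

# Project-specific section: the LLM replaces only this dict.
DATA = {
    "title": "Auth & Billing Services",  # raw & must be escaped on output
    "boxes": [("API Gateway", 60), ("Postgres", 120)],
}

# Shared renderer section: never changes across projects.
def render_slide(data: dict) -> str:
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">']
    parts.append(f'<text x="10" y="20">{escape(data["title"])}</text>')
    for label, y in data["boxes"]:
        parts.append(f'<rect x="10" y="{y}" width="180" height="40"/>')
        parts.append(f'<text x="20" y="{y + 25}">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

All text flows through escape(), so an ampersand in the data cannot produce malformed XML; xmllint then acts as the second layer of defense.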

4. Three-Pass Adversarial Review (Advocate/Critic/Judge)

What it is: The debate review skill enforces three distinct analytical passes with different objectives. The Advocate steelmans first (Rapoport Rules), the Critic challenges assumptions (ACH methodology), and the Judge synthesizes a verdict.

Where it appears: .claude/skills/debate/SKILL.md and references/review.md

What problem it solves: Prevents the common failure mode of jumping straight to criticism without understanding. Forces the reviewer to demonstrate comprehension before challenging. The Judge pass prevents both false positives (Critic overreach) and false negatives (Advocate bias).

What would break without it: Single-pass reviews tend to be either too generous (missing real issues) or too harsh (nitpicking without context). The three-pass structure produces balanced, actionable findings.

Alternatives: Single-pass review (standard code review), two-pass (pro/con only), or unstructured adversarial analysis. The three-pass approach was chosen because it maps to established intelligence analysis tradecraft (ACH) and philosophical methodology (Hegelian dialectic).
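The pass ordering and the Judge's filtering role can be sketched as data plus a trivial orchestrator. This is a hypothetical structure, not the SKILL.md prompt text:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    claim: str
    upheld: bool = False

# Pass objectives paraphrased from the skill description.
PASSES = [
    ("advocate", "Steelman first (Rapoport Rules): restate, agree, learn, then critique."),
    ("critic", "Challenge assumptions against competing hypotheses (ACH)."),
    ("judge", "Synthesize a verdict: reject Critic overreach, correct Advocate bias."),
]

def run_review(critic_findings: list[Finding],
               judge: Callable[[Finding], bool]) -> list[Finding]:
    # Judge pass: only findings the judge upholds reach the final verdict.
    for f in critic_findings:
        f.upheld = judge(f)
    return [f for f in critic_findings if f.upheld]
```

The point of the structure is that Critic output is never final: every finding passes through the Judge's filter before it appears in the verdict.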

5. Permission Model: Deny-by-Default with Explicit Allow List

What it is: settings.json explicitly lists every Bash command pattern that Claude is allowed to run, plus a deny list for destructive operations.

Where it appears: .claude/settings.json

What problem it solves: Prevents accidental destructive operations like rm -rf, git push, or helm install. The allow list ensures only read-only and generation commands run without user confirmation.

What would break without it: Claude could accidentally push code, delete files, or modify cluster state. The deny rule for git push is especially important: skills like /commit stage and commit changes but must never push without explicit user approval.

Alternatives: Relying on Claude's built-in safety checks alone, or using a broader allow list. The explicit approach was chosen because the cost of a false positive (blocking a safe command) is low, while the cost of a false negative (running a destructive command) is high.
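The allow/deny semantics can be sketched with glob matching. The patterns below are illustrative, not the actual settings.json schema:

```python
from fnmatch import fnmatch

# Illustrative patterns; the real settings.json syntax may differ.
ALLOW = ["git status", "git diff*", "ls*", "python3 generate_visuals.py"]
DENY = ["rm -rf*", "git push*", "helm install*"]

def is_permitted(command: str) -> bool:
    """Deny-by-default: denied patterns always win, then the allow list."""
    if any(fnmatch(command, pat) for pat in DENY):
        return False
    return any(fnmatch(command, pat) for pat in ALLOW)
```

Anything not explicitly allowed falls through to False, which is the deny-by-default property the design relies on.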


Invariants

Skill Structure Constraints

Template Fidelity Constraints

SVG Generation Constraints

Methodology Constraints

Idempotency


Failure Modes

Codebase Visualization Flow

Step | What Can Fail | Current Handling | Correctness
Phase 0: Explore | Subagents return incomplete findings (miss routes, types, etc.) | Multiple subagents with overlapping scope; manual review of findings before generation | Partial — depends on subagent thoroughness; no automated completeness check
Phase 1: SVG generation | Unescaped XML characters in DATA dict strings | xml.sax.saxutils.escape() applied to all text; xmllint validation catches survivors | Correct — two layers of defense
Phase 1: SVG generation | Python script crashes due to missing DATA keys | KeyError traceback; slide not generated | Partial — fails loud but no graceful degradation
Phase 2: Interactive HTML | GSAP CDN unavailable (network down, CDN outage) | No fallback; animations silently fail; static layout still renders | Partial — page is usable but animations don't work
Phase 4: Validation | xmllint not installed on system | Command fails; user sees error | Correct — validation is explicit, not silent
Gallery index build | Project folder missing slide_01_eli5.svg | Folder is excluded from discovery (requires both slide_01 and index.html) | Correct — explicit inclusion criteria
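The gallery's explicit inclusion criteria from the last row can be sketched directly. Folder contents follow the table; the function name is hypothetical:

```python
from pathlib import Path

def gallery_projects(root: str) -> list[str]:
    """A folder joins the gallery only if both required artifacts exist."""
    required = ("slide_01_eli5.svg", "index.html")
    return sorted(
        d.name for d in Path(root).iterdir()
        if d.is_dir() and all((d / f).exists() for f in required)
    )
```

A half-generated project is silently excluded rather than rendered as a broken gallery card.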

Debate Review Flow

Step | What Can Fail | Current Handling | Correctness
Mode parsing | Unrecognized mode (not review/red-team/steelman) | SKILL.md defaults to review mode | Correct — safe default
Pass 1: Advocate | LLM doesn't steelman effectively (confirms bias instead) | Rapoport Rules enforce structure: restate, agree, learn, then critique | Partial — methodology helps but can't guarantee the LLM follows it perfectly
Pass 2: Critic | False positive findings (Critic finds issues that don't exist) | Pass 3 Judge rejects Critic overreach as a dedicated category | Correct — built-in correction mechanism
Evidence matrix | Evidence supports both sides equally (non-diagnostic) | ACH methodology rates diagnostic value; low-diagnostic evidence is flagged | Correct — methodology handles this case explicitly
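The non-diagnostic-evidence case from the last row can be sketched as a scoring rule. The numeric consistency scale and zero threshold are assumptions; ACH itself rates diagnosticity qualitatively:

```python
def is_diagnostic(consistency: dict[str, float], threshold: float = 0.0) -> bool:
    """Hypothetical scoring: evidence discriminates only if its consistency
    scores differ across hypotheses; equal scores mean non-diagnostic."""
    scores = list(consistency.values())
    return (max(scores) - min(scores)) > threshold
```

Evidence that fits every hypothesis equally well carries no weight in the matrix, so it is flagged rather than counted.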

Cascading Failures


Architecture Strengths

  1. Zero external dependencies: No pip install, no npm install, no Docker required. Python uses only stdlib. HTML uses a single CDN script (GSAP). This eliminates version conflicts, supply chain risks, and installation friction.
  2. Skill isolation: Each skill is a self-contained directory. Adding, removing, or updating one skill cannot break another. This enables selective distribution via make copy-claude-skill SKILL=X.
  3. Template-driven idempotent generation: Output is deterministic given the same input data. Templates enforce visual consistency. Structural validation catches drift.
  4. Methodology grounding: Skills aren't ad-hoc prompts. They're grounded in established frameworks: ACH (intelligence analysis), Rapoport Rules (philosophy of argument), Bloom's Taxonomy (cognitive science), Hegelian dialectic (synthesis).
  5. Progressive disclosure: The 11-slide altitude descent (50,000 ft to ground) means any audience can engage at their level. Executives see slides 1-3; engineers dive into slides 8-11.

Architecture Risks

  1. LLM fidelity to methodology: Skills describe detailed methodologies, but the LLM may not follow them precisely on every invocation. There's no runtime enforcement — only the quality of the prompt matters.
  2. Template rigidity vs. customization: The strict template conformance rule means any visual customization requires modifying the canonical template, which affects all future projects. There's no per-project override mechanism.
  3. Single-CDN dependency for interactivity: If the GSAP CDN is unavailable, interactive walkthroughs lose all animation. A local fallback copy would eliminate this risk but adds a file to maintain.
  4. No automated testing: The project has no test suite. Validation relies on xmllint for SVGs and manual template conformance checks. A regression in the Python generator or HTML templates would only be caught at generation time.