Loading...
Evaluate design across visual hierarchy, UX quality, and AI anti-pattern detection.
Additional context needed: what the interface is trying to accomplish.
Before gathering assessments, do two small bookkeeping steps. They cost almost nothing and they're what makes critique iterative across runs.
Resolve the primary artifact. The user's phrasing ("the homepage", "the pricing flow") is not stable enough to track across runs. Resolve it to a concrete file path or URL: the same one you'd already need to scan code or open in a browser. Examples:
site/pages/index.astro (or http://localhost:3000/ if you're inspecting live)src/components/Settings.tsx)bun dev vs bun preview), file paths don't.Compute the slug. Run:
node .agents/skills/impeccable/scripts/critique-storage.mjs slug "<resolved-path-or-url>"
Keep the printed slug. It identifies this target's stream across runs. If the command exits non-zero ("no stable slug for input"), skip persistence for this run and tell the user; the trend won't update but the critique still goes ahead.
Read the ignore list at .impeccable/critique/ignore.md if it exists. Plain markdown; each non-empty, non-comment line is something the user has marked as "do not re-raise" (deferred tradeoffs, designer-intended deviations, detector false-positives the user accepts). When a finding's text matches a line here (case-insensitive substring against rule name or snippet), drop it silently. Do not mention it in the report. This is the ONLY input critique consumes from prior runs; anchoring on prior findings would defeat the point of independent assessment.
Launch two independent assessments. Neither may see the other's output. This isolation is what makes the combined score honest. Running both in one head silently anchors them to each other; do not shortcut it for cost, speed, or context-size reasons.
Delegate each assessment to a separate sub-agent (Claude Code's Agent tool, Codex's subagent spawning, etc.). Each returns structured findings as text. Do NOT output findings to the user yet.
Fall back to sequential in-head work only if the environment genuinely cannot spawn sub-agents.
Tab isolation: When browser automation is available, each assessment MUST create its own new tab. Never reuse an existing tab, even if one is already open at the correct URL. This prevents the two assessments from interfering with each other's page state.
Read the relevant source files (HTML, CSS, JS/TS) and, if browser automation is available, visually inspect the live page. Create a new tab for this; do not reuse existing tabs. After navigation, label the tab by setting the document title:
document.title = '[LLM] ' + document.title;
Think like a design director. Evaluate:
AI Slop Detection (CRITICAL): Does this look like every other AI-generated interface? Review against ALL DON'T guidelines from the parent impeccable skill (already loaded in this context). Check for AI color palette, gradient text, dark glows, glassmorphism, hero metric layouts, identical card grids, generic fonts, and all other tells. The test: If someone said "AI made this," would you believe them immediately?
Holistic Design Review: visual hierarchy (eye flow, primary action clarity), information architecture (structure, grouping, cognitive load), emotional resonance (does it match brand and audience?), discoverability (are interactive elements obvious?), composition (balance, whitespace, rhythm), typography (hierarchy, readability, font choices), color (purposeful use, cohesion, accessibility), states & edge cases (empty, loading, error, success), microcopy (clarity, tone, helpfulness).
Cognitive Load (consult cognitive-load):
Emotional Journey:
Nielsen's Heuristics (consult heuristics-scoring): Score each of the 10 heuristics 0-4. This scoring will be presented in the report.
Return structured findings covering: AI slop verdict, heuristic scores, cognitive load assessment, what's working (2-3 items), priority issues (3-5 with what/why/fix), minor observations, and provocative questions.
Run the bundled deterministic detector, which flags 27 specific patterns (AI slop tells + general design quality).
CLI scan:
npx impeccable detect --json [--fast] [target]
[target] (anything with markup). Do not pass CSS-only files.--fast (regex-only, skips jsdom)Browser visualization: required when browser automation tools are available AND the target is a viewable page. The [Human] overlay tab is the user-facing deliverable; the critique is incomplete without it. Skip only if the target is not a viewable page (CSS-only file, non-browser target).
The overlay is a visual aid for the user. It highlights issues directly in their browser. Do NOT scroll through the page to screenshot overlays. Instead, read the console output to get the results programmatically.
npx impeccable live &
Note the port printed to stdout (auto-assigned). Use --port=PORT to fix it.javascript_tool so the user can distinguish it:
document.title = '[Human] ' + document.title;
javascript_tool (replace PORT with the port from step 1):
const s = document.createElement('script'); s.src = 'http://localhost:PORT/detect.js'; document.head.appendChild(s);
read_console_messages with pattern impeccable. The detector logs all findings with the [impeccable] prefix. Do NOT scroll through the page to take screenshots of the overlays.npx impeccable live stop
For multi-view targets, inject on 3-5 representative pages. If injection fails, continue with CLI results only.
Return: CLI findings (JSON), browser console findings (if applicable), and any false positives noted.
Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.
Structure your feedback as a design director would:
Consult heuristics-scoring
Present the Nielsen's 10 heuristics scores as a table:
| # | Heuristic | Score | Key Issue | |---|-----------|-------|-----------| | 1 | Visibility of System Status | ? | [specific finding or "n/a" if solid] | | 2 | Match System / Real World | ? | | | 3 | User Control and Freedom | ? | | | 4 | Consistency and Standards | ? | | | 5 | Error Prevention | ? | | | 6 | Recognition Rather Than Recall | ? | | | 7 | Flexibility and Efficiency | ? | | | 8 | Aesthetic and Minimalist Design | ? | | | 9 | Error Recovery | ? | | | 10 | Help and Documentation | ? | | | Total | | ??/40 | [Rating band] |
Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.
Start here. Does this look AI-generated?
LLM assessment: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.
Deterministic scan: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.
Visual overlays (if browser was used): Tell the user that overlays are now visible in the [Human] tab in their browser, highlighting the detected issues. Summarize what the console output reported.
A brief gut reaction: what works, what doesn't, and the single biggest opportunity.
Highlight 2-3 things done well. Be specific about why they work.
The 3-5 most impactful design problems, ordered by importance.
For each issue, tag with P0-P3 severity (consult heuristics-scoring for severity definitions):
Consult personas
Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If AGENTS.md contains a ## Design Context section from impeccable teach, also generate 1-2 project-specific personas from the audience/brand info.
For each selected persona, walk through the primary user action and list specific red flags found:
Alex (Power User): No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.
Jordan (First-Timer): Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.
Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.
Quick notes on smaller issues worth addressing.
Provocative questions that might unlock better solutions:
Remember:
Once the report above is finalized, write it to .impeccable/critique/ so the user can refer back, and so $impeccable polish can pick up the priority issues without a copy-paste.
Skip this step if the Setup slug was null (vague or root-level target).
Write the body to a temp file so you can pipe it to the helper. Use the full report (heuristic table, anti-patterns verdict, priority issues, persona red flags) but stop before the "Ask the User" / "Recommended Actions" sections that come later.
Pass the structured metadata through IMPECCABLE_CRITIQUE_META (JSON), then run the write command:
IMPECCABLE_CRITIQUE_META='{"target":"<user phrasing>","total_score":<n>,"p0_count":<n>,"p1_count":<n>}' \
node .agents/skills/impeccable/scripts/critique-storage.mjs write <slug> <body-file>
The helper prints the absolute path it wrote.
Read the trend for context:
node .agents/skills/impeccable/scripts/critique-storage.mjs trend <slug> 5
This returns a JSON array of the last 5 frontmatter entries (including the one you just wrote).
Append a single line to the user-visible output, after the report and before the questions:
Trend for
<slug>(last 5 runs): 24 → 28 → 32 → 29 → 32 Wrote.impeccable/critique/<filename>.
If this is the first run for the slug, the trend is just one score; say so: "First run for this target, no trend yet."
This is fire-and-forget. Do not show the user the helper's JSON output; only the human-readable trend line and the written path. Failures here should not block the rest of the flow; print the error and move on.
After presenting findings, use targeted questions based on what was actually found. STOP and use Codex's structured user-input/question tool when available; if unavailable, ask directly in chat to clarify what you cannot infer. These answers will shape the action plan.
Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):
Priority direction: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.
Design intent: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.
Scope: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".
Constraints (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.
Rules for questions:
After receiving the user's answers, present a prioritized action summary reflecting the user's priorities and scope from Ask the User.
List recommended commands in priority order, based on the user's answers:
$command-name: Brief description of what to fix (specific context from critique findings)$command-name: Brief description (specific context)
...Rules for recommendations:
$impeccable polish as the final step if any fixes were recommendedAfter presenting the summary, tell the user:
You can ask me to run these one at a time, all at once, or in any order you prefer.
Re-run
$impeccable critiqueafter fixes to see your score improve.
Write conversion-focused marketing copy for homepages, landing pages, and pricing pages.
SEO and AEO optimization for Google, ChatGPT, and Perplexity with structured data and EEAT.
Comprehensive SEO audit covering crawlability, on-page optimization, and content quality.
Changelog coming soon.