Stop Putting Best Practices in Skills

Vercel demonstrated that AGENTS.md outperforms skills in their agent evals. AGENTS.md hit 100% pass rate on general framework knowledge. Skills with explicit instructions reached 79%. In 56% of cases, the agent had access to a skill but never invoked it. Their conclusion: skills work for vertical, action-specific workflows, not for general best practices.

Their evals were single-shot, though. One prompt, one response, done. Skills depend on context to be called. The model sees a name and a one-line description and has to decide in a single cold shot whether to invoke. In a real session, you go back and forth, context accumulates, the model picks up patterns. Single-shot penalizes skills by testing them in conditions nobody actually uses them in.

Then there’s Superpowers. People install it, and Claude Code starts following TDD, writing plans before coding, and debugging systematically. It bundles best practices as skills and people swear by it. If skills are supposed to lose to AGENTS.md, why does Superpowers work so well?

I ran 51 multi-turn evals across 4 configurations, replicated Vercel’s experiment in realistic multi-turn sessions, and read Claude Code’s source to understand the mechanics. Skills and CLAUDE.md are both just prompts. Same markdown, same model. The only difference is whether the prompt reaches the model. CLAUDE.md reaches it every time. Skills depend on a chain of decisions that fails 34-94% of the time. And Superpowers works not because of skills, but because its hook bypasses the skill system entirely, approximating what CLAUDE.md does natively.

TL;DR: Skills and CLAUDE.md are both just prompts. When skills get invoked, they work just as well. The problem is they only get invoked 6-66% of the time. CLAUDE.md is always in context. Put guidelines in CLAUDE.md; use skills for on-demand recipes.

How skills actually work in Claude Code

Before the data, a look at how skills work under the hood. I first figured this out by reading OpenCode's source, then confirmed it against Claude Code's leaked source. I reference file paths from that codebase throughout this section; search GitHub and you'll find them.

Discovery: name and description

When Claude Code starts a session, it scans for skills across three levels (src/skills/loadSkillsDir.ts):

  1. Managed: /etc/claude-code/.claude/skills/ (org-wide)
  2. User: ~/.claude/skills/ (your personal skills)
  3. Project: .claude/skills/ (checked into the repo)

For each skill directory, it reads the SKILL.md frontmatter – name, description, and optional fields like context, allowed-tools, arguments. The full markdown body stays on disk.

At session init, only name and description reach the model. formatCommandDescription() in src/tools/SkillTool/prompt.ts produces one line per skill: - {name}: {description}. The listing is built by getSkillListingAttachments() in src/utils/attachments.ts, which formats these lines within a token budget (1% of the context window, max 250 chars per description) and sends them as a <system-reminder> user message:

The following skills are available for use with the Skill tool:

- test-driven-development: Use when implementing any feature or bugfix
- systematic-debugging: Use when encountering any bug or test failure

Two short strings. The model knows skills exist, but it hasn't read them. This is the bottleneck. The model sees what is available, not how to apply it. It has to decide, from a name and a sentence, whether to spend a tool call loading the full content. In my evals, it almost never does.
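
The discovery step can be sketched in shell. This is my approximation, not Claude Code's actual code: it pulls name and description out of each SKILL.md's frontmatter and emits the one-line-per-skill listing, applying the 250-char description cap described above.

```shell
# Rough sketch of skill discovery (helper logic is mine, not the
# actual loadSkillsDir.ts): read name and description from each
# SKILL.md's frontmatter and emit one listing line per skill.
list_skills() {
  local dir name desc
  for dir in "$1"/*/; do
    [ -f "${dir}SKILL.md" ] || continue
    name=$(sed -n 's/^name:[[:space:]]*//p' "${dir}SKILL.md" | head -n 1)
    desc=$(sed -n 's/^description:[[:space:]]*//p' "${dir}SKILL.md" | head -n 1)
    # %.250s caps the description at 250 characters.
    printf -- '- %s: %.250s\n' "$name" "$desc"
  done
}
```

Everything else in the SKILL.md body stays on disk; only these lines reach the model at session init.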

Invocation: full content on demand

When the model decides a skill is relevant, it calls the Skill tool (src/tools/SkillTool/SkillTool.ts):

{ "tool": "Skill", "input": { "skill": "test-driven-development" } }

The SkillTool loads the full SKILL.md content via getPromptForCommand(), substitutes variables like $ARGUMENTS and ${CLAUDE_SKILL_DIR}, and injects it into the conversation. Two execution modes:

  • Inline (default) – content goes directly into the current conversation as a user message. The model reads the instructions and follows them in the same context.
  • Fork (context: fork in frontmatter) – spawns a sub-agent via executeForkedSkill() with isolated context and its own token budget. The result comes back without bloating the parent conversation.

Same mechanism that lets the model read files or run bash commands. It asks, the runtime reads a markdown file, the content comes back.

CLAUDE.md: always in context

CLAUDE.md takes a different path. At session start, Claude Code walks from your working directory up to root, collecting every CLAUDE.md, .claude/CLAUDE.md, and .claude/rules/*.md it finds (src/utils/claudemd.ts). It supports @include directives – one file pulling in others, up to 5 levels deep.

All of this feeds into getUserContext() in src/context.ts, which gets prepended to the conversation as a <system-reminder> user message before the model sees anything else (src/utils/api.ts).
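
The walk-up collection can be sketched like this (my approximation of the behavior, not the actual src/utils/claudemd.ts; @include handling and .claude/rules/*.md globbing are omitted, and the root-most-first ordering is my assumption):

```shell
# Sketch: from the working directory up to /, collect every
# CLAUDE.md and .claude/CLAUDE.md along the way.
collect_claude_md() {
  local dir f found=""
  dir=$(cd "$1" && pwd)
  while :; do
    for f in "$dir/CLAUDE.md" "$dir/.claude/CLAUDE.md"; do
      if [ -f "$f" ]; then
        # Prepend, so files from higher directories end up first.
        found="$f
$found"
      fi
    done
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  printf '%s' "$found"
}
```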

| Content | When it loads | How it loads |
| --- | --- | --- |
| CLAUDE.md | Every session, automatically | Prepended as first user message |
| Skill listings | Every session, automatically | Name + description only |
| Skill content | On demand, when the model calls the Skill tool | Full markdown injected into conversation |

CLAUDE.md is always there. Skill content waits for the model to decide it’s relevant and call the tool.

What Superpowers actually does

Superpowers registers a SessionStart hook. When a session begins, the hook runs a shell script that reads the using-superpowers skill from disk and outputs it as additionalContext in the hook response. Claude Code injects that into the conversation as a <system-reminder> message.
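
A minimal SessionStart hook in this style might look like the following sketch (paths and layout are illustrative, not the plugin's actual script; the hookSpecificOutput shape follows Claude Code's hook-response format):

```shell
# Hedged sketch of a Superpowers-style SessionStart hook. It reads a
# skill from disk and returns it as additionalContext in the hook's
# JSON response; Claude Code injects that into the conversation.
skill_file="${SKILL_FILE:-$HOME/.claude/skills/using-superpowers/SKILL.md}"
content=$(cat "$skill_file" 2>/dev/null || true)

# jq builds the response object and handles JSON string escaping.
response=$(jq -n --arg ctx "$content" '{
  hookSpecificOutput: {
    hookEventName: "SessionStart",
    additionalContext: $ctx
  }
}')
echo "$response"
```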

The content is aggressive. The skill wraps instructions in <EXTREMELY_IMPORTANT> tags:

IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.

This is not negotiable. This is not optional. You cannot rationalize your way out of this.

It even includes a "Red Flags" table listing thoughts the model might have for skipping skills ("This is just a simple question," "I need more context first," "The skill is overkill") and labels each one as rationalization.

So Superpowers doesn't wait for the model to discover skills. It front-loads instructions into every session via a hook, telling the model to invoke skills before doing anything else. This is basically the same idea as having a CLAUDE.md with a hint ("invoke the relevant skill before coding"), just louder. Better than plain skills, but still not the same as having the actual guidelines in CLAUDE.md from the start. The model still has to invoke the skill, read the content, and follow it. Three steps that can fail. CLAUDE.md skips all three.

The activation gap

I call this the activation gap. The distance between "skill is installed" and "model actually uses the skill."

I ran single-shot evals first to confirm Vercel's numbers: 31 tasks across React and Next.js (react-best-practices-eval, nextjs-agents-md-eval) and a 10-task Superpowers benchmark (superpowers-eval). Similar results. Vanilla skills: 0% invocation. The model never opens the drawer on its own. AGENTS.md: 76-90% pass rate.

But as I mentioned, single-shot isn't how people work. So I built a multi-turn eval suite.

The multi-turn eval

5 scenarios, 3-4 turns each. Plain Node.js/TypeScript so framework knowledge isn't a confounding variable. The prompts are the kind of thing you'd actually type.

Scenario 1: TDD – Email Validator (4 turns)

| Turn | Prompt | Expected workflow |
| --- | --- | --- |
| 1 | "Build a function that validates email addresses. It should handle basic formats like user@domain.com and reject obviously invalid ones like missing @ or empty strings." | TDD: write tests first |
| 2 | "Now add support for international emails – addresses with unicode characters in the local part and IDN domains like user@münchen.de." | TDD: extend tests first |
| 3 | "I found a bug – plus aliases like user+tag@gmail.com are being rejected. Fix it." | Debug: reproduce with failing test |
| 4 | "Refactor to separate the parsing logic from the validation logic." | Refactor: ensure tests pass after |

Scenario 2: Debugging – Broken LRU Cache (3 turns)

Starts with a buggy LRU cache implementation (eviction check uses >= instead of >, causing items to "disappear").

| Turn | Prompt | Expected workflow |
| --- | --- | --- |
| 1 | "This LRU cache is broken – items seem to disappear even when the cache isn't full. Can you figure out what's wrong and fix it?" | Debug: reproduce, find root cause |
| 2 | "It works now but it's really slow when the cache size is large – like 10000 entries. Can you improve the performance?" | Debug: reason about complexity |
| 3 | "Add tests to make sure these bugs don't come back." | TDD: write regression tests |

Scenario 3: Planning – Rate Limiter (3 turns)

| Turn | Prompt | Expected workflow |
| --- | --- | --- |
| 1 | "I need a rate limiter for an API. Limit each client to 100 requests per minute. Give me a plan before coding." | Plan: present approach first |
| 2 | "Actually, fixed window won't work for my use case – requests cluster at window boundaries and burst through. I need sliding window instead." | Plan: revise, explain trade-offs |
| 3 | "Implement it and add tests." | TDD: write tests, then implement |

Scenario 4: Refactoring – Express Middleware (4 turns)

Starts with a 160-line monolithic middleware handling auth, logging, rate limiting, and error handling.

| Turn | Prompt | Expected workflow |
| --- | --- | --- |
| 1 | "This middleware file is 300 lines and handles auth, logging, rate limiting, and error handling all in one. Help me understand what it does." | Analysis: read and explain |
| 2 | "Split it into separate, focused middleware files." | Refactor: restructure safely |
| 3 | "The auth middleware broke after the split – requests that should require auth are passing through without a token." | Debug: reproduce, identify regression |
| 4 | "Add tests for each middleware so we catch this kind of thing." | TDD: write isolated tests |

Scenario 5: Mixed – HTTP Client Retry (3 turns)

Starts with a basic HTTP client without retry logic.

| Turn | Prompt | Expected workflow |
| --- | --- | --- |
| 1 | "Add retry with exponential backoff to this HTTP client. It should retry on 5xx errors and network failures, up to 3 retries." | TDD or plan first |
| 2 | "It's retrying on 400 Bad Request errors too. That's wrong – 4xx should fail immediately without retrying." | Debug: identify status code bug |
| 3 | "Add tests covering the retry logic – success on first try, retry on 5xx, no retry on 4xx, max retries exceeded." | TDD: comprehensive test suite |

Each scenario crosses workflow boundaries. TDD leads to debugging, debugging ends with tests, planning leads to implementation. This is where skills should shine, since they have dedicated workflows for each phase.

Four configurations:

| Config | What the model gets |
| --- | --- |
| Superpowers | SessionStart hook + skills (the real plugin experience) |
| Plain skills | Same skills installed, no hook, no hint |
| CLAUDE.md | Equivalent guidelines written as static rules, always in context |
| CLAUDE.md + hint | One-liner in CLAUDE.md saying "invoke the relevant skill before coding" + skills installed |

Same model (claude-opus-4-6), same tasks, same workspace setup. All runs executed in a clean environment with ~/.claude/plugins, ~/.claude/skills, ~/.claude/settings.json, and ~/.claude/CLAUDE.md temporarily disabled. Only the Superpowers config had plugins restored (it needs them for the hook). Each turn was capped at 15 agentic steps.

Results

Skill invocations

| Config | Invocations | Rate |
| --- | --- | --- |
| Superpowers (hook) | 10/15 | 66% |
| CLAUDE.md + hint | 5/15 | 33% |
| Plain skills | 1/15 | 6% |
| CLAUDE.md (guidelines) | n/a | n/a |

8 of 68 total turns hit the 15 max-turns limit (marked MT in the tables below). That just means the model ran out of agentic steps before finishing, not that it wasn't doing useful work: in most of those turns the model was actively writing tests and implementation and simply needed more steps to complete. Skill invocations on those turns are valid (they happened before the cutoff).

Multi-turn helps Superpowers a lot. From 10% in single-shot to 66% here. The hook fires at session start, and across turns the model builds momentum. Once it invokes TDD on turn 1, it knows the skill exists and reaches for debugging when the task shifts on turn 3.

The CLAUDE.md hint works, but only in a clean environment. This was the Vercel-style config. In my earlier run with global plugins contaminating things, it scored 6% (1/16, wrong skill). Clean run: 33% (5/15, correct skills). The hint is sensitive to noise. Competing global skills and plugins dilute its effect.

Plain skills got one spontaneous invocation out of 15 turns. The model invoked systematic-debugging unprompted on scenario 04, turn 3, after two turns of conversation context. So multi-turn can trigger invocation without a hook, but it's rare.

Clean environment matters more than I expected. Every config did better in the clean run. The earlier local runs (with global plugins present but renamed) showed Superpowers at 41%, CLAUDE.md+hint at 6%, plain skills at 0%. Clean run: 66%, 33%, 6%. Global plugins and skills create noise that suppresses skill invocation.

When Superpowers invokes skills

The pattern is consistent across all runs (local, Docker, and clean):

| Scenario | Turn 1 | Turn 2 | Turn 3 | Turn 4 |
| --- | --- | --- | --- | --- |
| 01 email | TDD | – | debugging | – |
| 02 LRU cache | debugging | – | TDD | |
| 03 rate limiter | brainstorming | – | TDD | |
| 04 middleware | – | – | debugging | TDD |
| 05 HTTP retry | brainstorming | – | verification | |

Skills fire at transitions, when the workflow changes (coding to debugging, debugging to testing). On continuation turns the model doesn't re-invoke. It keeps the momentum from the previous invocation. Which makes sense. You don't re-read the TDD manual every time you write a new test.

TDD compliance

Skill invocations are one metric. Did the agent actually follow the workflow? I checked whether test files were written before implementation files on the key TDD turns.

| Scenario | Superpowers | Plain skills | CLAUDE.md | CLAUDE.md + hint |
| --- | --- | --- | --- | --- |
| 01 email t1 | test first | impl first | test first | test first |
| 02 LRU t1 | test first | test first | test first | test first |
| 03 rate limiter t3 | test first | impl first | test first MT | impl first |
| 05 HTTP retry | test first (t2) | test only (t3) | test first (t1) | test first (t1) |

Here's the thing: Superpowers and CLAUDE.md are basically tied. Both wrote tests first on 4 out of 4 measured scenarios. CLAUDE.md + hint got 3/4. Plain skills got 1/4.

Having the guidelines in CLAUDE.md didn't make the model follow TDD any better than an invoked skill did. When Superpowers fires, the workflow quality is just as good. They're all prompt. Same markdown, same instructions, same model. The only difference is whether the prompt reaches the model. CLAUDE.md reaches it every time. Superpowers reaches it 66% of the time.

The interesting case is scenario 04 (refactor middleware, turn 2). No config wrote tests before refactoring. They all jumped straight to splitting the middleware into files. The "write tests before restructuring" guideline needs to be stronger, regardless of delivery mechanism.

Why this happens

Both CLAUDE.md and skill listings arrive through the same channel: <system-reminder> wrapped user messages. No architectural trust difference. The difference is just presence.

CLAUDE.md content is always in the context window. Every turn, every decision, the guidelines are right there. Skill content requires the model to read the name+description listing, decide the skill is relevant, call the Skill tool, wait for the content, then follow it. Each step can fail.

So the activation gap isn't a quality problem. It's a reliability problem. When skills get invoked, they work. They just don't always get invoked. Superpowers gets to 66% in clean multi-turn. The CLAUDE.md hint gets 33%. Neither reaches 100%.

CLAUDE.md gets 100% presence. No invocation needed.

Skills are recipes, CLAUDE.md is the health code

Think of it like a kitchen.

CLAUDE.md is the health code. Wash your hands, sanitize surfaces, check temperatures. Every cook follows these rules on every shift. They're non-negotiable and always visible, posted on the wall. You don't wait for someone to ask "should I wash my hands before touching food?" It's the baseline.

Skills are recipes. You pull the recipe for bouillabaisse when someone orders bouillabaisse. You don't tape every recipe to the wall next to the health code. That's noise. Recipes have their moment. The health code is constant.

Superpowers tries to turn recipes into the health code by having a hook shout "READ THE RECIPES" at the start of every shift. It works most of the time. But you could just put the important rules on the wall.

CLAUDE.md is for guidelines. Conventions, coding standards, workflow rules, TDD processes, debugging protocols. Anything the agent must follow every session. "Write tests before implementation" is a health code rule. It goes in CLAUDE.md.

Skills are for recipes. Specific, on-demand procedures you invoke when the moment calls for it. "Generate a database migration," "scaffold a component," "run the release checklist." These don't need to be in context all the time. They need to be there when you ask for them. Use context: fork for heavy recipes that would bloat the main context.
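
A recipe-style skill, for contrast, might look like this (name, description, and steps are hypothetical; the frontmatter fields are the ones described in the discovery section):

```markdown
---
name: generate-migration
description: Use when the user asks to create a database migration
context: fork
---

1. Read the current schema files to understand the starting state.
2. Generate the migration with the project's migration tool.
3. Run it against a scratch database and report the outcome.
```

The context: fork line is what keeps a heavy recipe cheap: the sub-agent burns its own token budget and only the result returns to the parent conversation.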

Hooks are for automation, not instruction delivery. Pre-commit validation, linting, notifications. If you're using a hook to inject guidelines (like Superpowers does), it works at 66% in clean multi-turn, but CLAUDE.md would do the same job at 100% with zero activation gap.

| Mechanism | Presence | Invocation needed | Clean multi-turn rate |
| --- | --- | --- | --- |
| CLAUDE.md (health code) | 100% | No | n/a, always there |
| Superpowers (hook + recipes) | Hook: 100%, content: 66% | Yes | 66% |
| CLAUDE.md + hint + skills | 100% (hint), 33% (content) | Yes | 33% |
| Plain skills (recipes on shelf) | Listing only | Yes | 6% |

General guidelines don't belong in skills. Skills are not how you say "always do X." They're how you say "when you need to do Y, here's how."

Full turn-by-turn results

Every turn, every config. MT marks turns that hit the 15 max-turns limit.

Superpowers (hook + skills) – 10/15 invocations (66%)

| Scenario | Turn | Skill invoked | First file written |
| --- | --- | --- | --- |
| 01 email | t1 (tdd) | test-driven-development | validateEmail.test.ts (test first) |
| 01 email | t2 (tdd) | – | – (used Edit) |
| 01 email | t3 (debug) | systematic-debugging | – |
| 01 email | t4 (refactor) | – | validateEmail.ts |
| 02 LRU | t1 (debug) | systematic-debugging | lru-cache.test.ts (test first) |
| 02 LRU | t2 (debug) | – | lru-cache.ts |
| 02 LRU | t3 (tdd) | test-driven-development | – |
| 03 rate limiter | t1 (plan) | brainstorming | – (planning, no code) |
| 03 rate limiter | t2 (plan) | – | – |
| 03 rate limiter | t3 (tdd) | test-driven-development | rate-limiter.test.ts (test first) MT |
| 04 middleware | t1 (analysis) | – | – (reading code) |
| 04 middleware | t2 (refactor) | – | logging.ts (impl first) |
| 04 middleware | t3 (debug) | systematic-debugging | middleware.test.ts |
| 04 middleware | t4 (tdd) | test-driven-development | logging.test.ts (test first) MT |
| 05 HTTP retry | t1 (tdd) | brainstorming | – |
| 05 HTTP retry | t2 (debug) | – | http-client.test.ts (test first) |
| 05 HTTP retry | t3 (tdd) | verification-before-completion | – |

Plain skills (no hook, no hint) – 1/15 invocations (6%)

| Scenario | Turn | Skill invoked | First file written |
| --- | --- | --- | --- |
| 01 email | t1 (tdd) | – | validateEmail.ts (impl first) |
| 01 email | t2 (tdd) | – | – |
| 01 email | t3 (debug) | – | – |
| 01 email | t4 (refactor) | – | validateEmail.ts |
| 02 LRU | t1 (debug) | – | lru-cache.test.ts (test first) |
| 02 LRU | t2 (debug) | – | lru-cache.ts |
| 02 LRU | t3 (tdd) | – | – |
| 03 rate limiter | t1 (plan) | – | scalable-bubbling-lagoon.md |
| 03 rate limiter | t2 (plan) | – | types.ts (impl first) MT |
| 03 rate limiter | t3 (tdd) | – | – |
| 04 middleware | t1 (analysis) | – | – |
| 04 middleware | t2 (refactor) | – | logging.ts (impl first) |
| 04 middleware | t3 (debug) | systematic-debugging | middleware.test.ts |
| 04 middleware | t4 (tdd) | – | logging.test.ts |
| 05 HTTP retry | t1 (tdd) | – | – |
| 05 HTTP retry | t2 (debug) | – | – |
| 05 HTTP retry | t3 (tdd) | – | http-client.test.ts |

CLAUDE.md (guidelines, no skills) – 0/14 invocations (n/a)

| Scenario | Turn | First file written |
| --- | --- | --- |
| 01 email | t1 (tdd) | validateEmail.test.ts (test first) |
| 01 email | t2 (tdd) | – |
| 01 email | t3 (debug) | – |
| 01 email | t4 (refactor) | – |
| 02 LRU | t1 (debug) | lru-cache.test.ts (test first) |
| 02 LRU | t2 (debug) | bench.ts |
| 02 LRU | t3 (tdd) | – |
| 03 rate limiter | t1 (plan) | – |
| 03 rate limiter | t2 (plan) | – |
| 03 rate limiter | t3 (tdd) | rate-limiter.test.ts (test first) MT |
| 04 middleware | t1 (analysis) | – |
| 04 middleware | t2 (refactor) | tests first (4 test files before impl) MT |
| 04 middleware | t3 (debug) | – |
| 04 middleware | t4 (tdd) | – |
| 05 HTTP retry | t1 (tdd) | http-client.test.ts (test first) |
| 05 HTTP retry | t2 (debug) | – |
| 05 HTTP retry | t3 (tdd) | – |

CLAUDE.md + hint (skills installed) – 5/15 invocations (33%)

| Scenario | Turn | Skill invoked | First file written |
| --- | --- | --- | --- |
| 01 email | t1 (tdd) | test-driven-development | validateEmail.test.ts (test first) |
| 01 email | t2 (tdd) | – | – |
| 01 email | t3 (debug) | – | – |
| 01 email | t4 (refactor) | – | validateEmail.ts |
| 02 LRU | t1 (debug) | systematic-debugging | lru-cache.test.ts (test first) |
| 02 LRU | t2 (debug) | – | lru-cache.ts |
| 02 LRU | t3 (tdd) | – | – |
| 03 rate limiter | t1 (plan) | – | snazzy-juggling-glacier.md |
| 03 rate limiter | t2 (plan) | – | rate-limiter.ts (impl first) |
| 03 rate limiter | t3 (tdd) | – | types.ts (impl first) MT |
| 04 middleware | t1 (analysis) | – | – |
| 04 middleware | t2 (refactor) | – | – |
| 04 middleware | t3 (debug) | systematic-debugging | – |
| 04 middleware | t4 (tdd) | test-driven-development | logging.test.ts (test first) MT |
| 05 HTTP retry | t1 (tdd) | test-driven-development | http-client.test.ts (test first) MT |
| 05 HTTP retry | t2 (debug) | – | – |
| 05 HTTP retry | t3 (tdd) | – | – |

Methodology

Execution

Each scenario runs as a multi-turn claude -p session:

# Turn 1: fresh session
claude -p --model claude-opus-4-6 --output-format stream-json \
  --verbose --dangerously-skip-permissions --max-turns 15 \
  "$PROMPT" > turn-1.jsonl

# Extract session ID
sid=$(grep -o '"session_id":"[^"]*"' turn-1.jsonl | head -1 | cut -d'"' -f4)

# Turn 2+: resume same session
claude -p --model claude-opus-4-6 --output-format stream-json \
  --verbose --dangerously-skip-permissions --max-turns 15 \
  --resume "$sid" "$PROMPT" > turn-2.jsonl

Environment isolation

All runs disabled user-level configuration to prevent contamination:

# Disabled at start, restored on exit (trap)
~/.claude/plugins      -> ~/.claude/plugins.eval-disabled
~/.claude/skills       -> ~/.claude/skills.eval-disabled
~/.claude/settings.json -> ~/.claude/settings.json.eval-disabled
~/.claude/CLAUDE.md    -> ~/.claude/CLAUDE.md.eval-disabled

Only the Superpowers config re-enabled ~/.claude/plugins (the hook + skills come from the plugin). OAuth auth stays in the macOS keychain, unaffected by the rename.

Config setup per workspace

Each scenario gets a fresh /tmp workspace with package.json, tsconfig.json, seed files (if any), and npm install. Then:

  • Superpowers: plugin provides the SessionStart hook + skills via .claude/settings.json
  • Plain skills: skills copied into workspace .claude/skills/, no hook
  • CLAUDE.md: CLAUDE.md with equivalent TDD/debugging/planning guidelines
  • CLAUDE.md + hint: CLAUDE.md with "Before writing code, first explore the project structure, then invoke the relevant skill for the task at hand." + skills copied into workspace .claude/skills/

Max turns

Each turn was capped at 15 agentic steps (--max-turns 15). 8 of 68 turns hit this limit; the affected turns are marked MT in the results tables. In most MT turns the model was actively writing tests and code and just needed more steps to finish. A trailing || true prevents a truncated turn's nonzero exit status from killing the runner script.

Measurement

Skill invocations are extracted from stream-json transcripts by searching for "name":"Skill" in assistant messages. TDD compliance is measured by the order of Write tool calls – whether test files (.test.ts, .spec.ts) appear before implementation files.
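
The extraction can be sketched as two small helpers (names are mine, not the eval repo's; the input is the stream-json transcript shown under Execution):

```shell
# Invocations: assistant tool calls named "Skill" in the transcript.
count_skill_invocations() {
  grep -c '"name":"Skill"' "$1" || true
}

# TDD compliance: does the first Write tool call target a test file?
first_file_kind() {
  local first
  first=$(grep -o '"file_path":"[^"]*"' "$1" | head -n 1)
  case "$first" in
    *.test.ts\"*|*.spec.ts\"*) echo "test first" ;;
    *)                         echo "impl first" ;;
  esac
}
```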

Reproducibility

A Dockerfile is included for fully isolated runs (requires ANTHROPIC_API_KEY):

docker build -t multiturn-eval .
docker run -d -e ANTHROPIC_API_KEY=$KEY \
  -v "$(pwd)/results:/home/evaluser/eval/results" \
  multiturn-eval

I validated the eval across three environments: local with plugin rename, Docker with zero user config, and local with full config disabled. Superpowers invocation patterns were identical across all three.

The repos are open if you want to reproduce or poke at the data.

What to do with this

If you're setting up Claude Code for a project:

  • TDD, debugging protocols, code style, naming conventions go in CLAUDE.md. These are rules you want followed every session. No invocation, no activation gap, 100% presence.
  • "Scaffold a service," "generate a migration," "run the release checklist" go in skills. These are procedures you call when you need them. Use context: fork if they're heavy.
  • If you need CLAUDE.md to reference extra documentation, make it an index. Point to files. Same pattern Claude Code uses for its own memory: a root file that links to specifics.
  • If you're using Superpowers and it works for you, keep using it. Now you know why it works (the hook) and where it drops off (34% of turns in multi-turn, more in single-shot). Moving your guidelines to CLAUDE.md would close that gap.
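
An index-style CLAUDE.md, per the third point above, might look like this (file names are illustrative; @include is the directive described in the discovery section):

```markdown
# Project guidelines (always in context)

@include .claude/rules/tdd.md
@include .claude/rules/debugging.md

For architecture details, read docs/architecture.md before large changes.
```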

Skills are not broken. They're just not for guidelines.


Edy Silva

I own a computer
