Back to Thoughts & Insights
June 6, 2026About 14 min read#security#ai

Turning Your AI Into an Adversarial Security Agent: The SKILLS.md Framework

A continuation of: Breaking to Build: How CTF and Bug Bounty Hunting Rewires System Design

In my previous article, I explored how offensive security permanently changes the way engineers think about systems. Once you've spent enough time exploiting race conditions, bypassing authorization boundaries, abusing SSRF chains, and breaking assumptions hidden deep inside application logic, you stop viewing software as a collection of features.

You start viewing it as an attack surface.

That shift fundamentally changes how you design production systems. The problem is that modern software development is no longer purely human-driven. Today, a massive percentage of engineering work happens alongside AI coding assistants. Tools now generate thousands of lines of code faster than most engineers can review them.

And that introduces a brand new problem.

  • AI systems are optimized for one thing: Generate code that works.

  • Attackers are optimized for something completely different: Find code that breaks.

That difference matters. A generated API endpoint might pass every functional test while still exposing a devastating BOLA (Broken Object Level Authorization) vulnerability. A generated webhook handler might function perfectly while allowing SSRF into your internal infrastructure. A generated payment workflow might appear correct while collapsing into a double-spend condition under concurrent execution.

The code works. The architecture fails. And that is exactly where real-world vulnerabilities are born.

The Missing Layer in AI-Assisted Development

Most teams currently treat AI coding agents like extremely fast junior engineers. They give them instructions like:

  • "Build this feature"

  • "Refactor this service"

  • "Create this migration"

The model responds by optimizing for correctness, readability, and implementation speed. Security is rarely treated as a first-class objective.

Most AI systems are never explicitly taught to think like attackers. They are taught how software should behave; they are not taught how software is abused.

That distinction becomes increasingly dangerous as organizations move toward autonomous code generation, AI-assisted architecture, and agentic development workflows.

The solution turns out to be surprisingly simple: instead of prompting for features alone, we inject a persistent security reasoning framework directly into the agent's operating context.

That framework is SKILLS.md.

What Is SKILLS.md?

SKILLS.md is a structured operational framework that teaches an AI agent how to evaluate software through an adversarial lens. It is not a prompt, a simple checklist, or another copy-paste of the OWASP Top 10. It is a behavioral framework that continuously pushes the model to ask "How would an attacker abuse this?" before it asks "How do I implement this?"

The goal is to transplant the mindset developed through years of CTF competitions, bug bounty hunting, and incident response directly into the AI’s reasoning process.

Why Traditional Security Checklists Fail

Most security documentation focuses on known vulnerability categories (XSS, SQLi, CSRF, SSRF, IDOR). These are important, but attackers rarely think in categories. They think in assumptions.

Every vulnerability exists because somebody assumed something was true:

  • The frontend won't send invalid values.

  • Only authenticated users can reach this endpoint.

  • This request executes once at a time.

  • Nobody can access that internal network.

Bug bounty hunting teaches you something uncomfortable: assumptions are where systems fail. Security is often less about blocking payloads and more about eliminating dangerous assumptions. SKILLS.md is built entirely around that philosophy.

The Evolution From Builder To Breaker

Plaintext

Traditional Engineering:
Requirement ──> Implementation ──> Testing ──> Deployment

Security-Oriented Engineering:
Requirement ──> Implementation ──> Abuse Analysis ──> Boundary Verification ──> Concurrency Analysis ──> Deployment

The first workflow asks: Does this feature work?

The second asks: What happens when somebody intentionally tries to break it?

SKILLS.md forces AI agents into the second mode.

The Specifications: SKILLS.md

Modern AI tools and tools like Claude Code have evolved past static, single-file home directory configurations. They utilize the Agent Skills Standard, which relies on a nested folder footprint (skills/<skill-name>/SKILL.md) and mandatory YAML frontmatter.

The frontmatter contains semantic metadata. When you start an AI session, the engine scans the description block to automatically determine when to pull this skill into context.

Here is the production-ready implementation file.

Markdown

---
name: secure
description: Evaluates software architecture and code through an adversarial lens. Automatically invokes when generating APIs, designing features, reviewing code, or managing authentication, state, and data boundaries.
user-invocable: true
---

# Security-First Architecture Skill

**Axiom:** Inputs malicious. Clients untrusted. Networks hostile. Dependencies may be compromised. Never trust; always verify at execution point.

---

## Domain Controls

| # | Domain | Attacks | Key Controls | Core Question |
|---|--------|---------|-------------|---------------|
| 1 | **State / Race** | TOCTOU, double-spend, optimistic-lock loss | `SELECT FOR UPDATE`; distributed lock (Redis `SET NX PX`); validate ETag every write | Can same op succeed twice in parallel, or state change between check and act? |
| 2 | **AuthZ / BOLA** | IDOR, BOLA, mass assignment, GraphQL introspection | Ownership check at data layer, not route; allowlist binding (strong_params/Pydantic DTO); disable introspection in prod; tenant-scope every query | What changes if resource ID or any request field changes? Who verified ownership? |
| 3 | **SSRF** | Metadata endpoint, DNS rebind, redirect chain, `gopher://` | Resolve DNS → re-validate IP vs RFC-1918+169.254+fc00 denylist; allowlist domains; HTTPS-only via egress proxy; proxy follows redirects, not app code | Who controls final destination after DNS resolution and redirects? |
| 4 | **Path Traversal** | `../../`, URL-encode, null byte, Zip Slip, symlink | `realpath()`/`Path.resolve()` → verify under root; never concat user input to paths; use UUIDs as storage keys; validate archive entries before extract | Can user-controlled string, after normalization, escape storage boundary? |
| 5 | **Blast Radius** | Lateral movement, over-permissive IAM, credential reuse | Non-root containers, read-only rootfs, `--cap-drop ALL`; per-service least-privilege IAM; separate creds per env/service; mTLS between services | If this service is fully compromised, what else is reachable without new creds? |
| 6 | **Fail-Closed** | Broad catch-continue, feature-flag default-on, null bypass | Default DENY in every auth check, exception block, conditional; feature flags off for security features; `default: deny` in all security-relevant switches | What does system permit on unexpected error, null, or undefined in auth path? |
| 7 | **Secrets / Crypto** | Log leak, `alg:none` JWT, weak HMAC, IV reuse, timing oracle, weak PRNG | Secrets Manager + rotation; CSPRNG only; pin JWT alg server-side; RS256/ES256 cross-service; `timingSafeEqual`; AES-256-GCM unique nonces; ban MD5/SHA-1/DES/RC4 | Can credential be recovered, forged, or brute-forced from token, log, or build output? |
| 8 | **Supply Chain** | Typosquat, dep-confusion, `postinstall` RCE, unpinned deps | Exact-version pin + lockfile; verify signatures/hashes; audit `postinstall` scripts; private registry namespace; `npm audit`/`pip-audit`/`cargo audit` in CI | What third-party code executes in build/runtime that we don't own and audit? |
| 9 | **Event Integrity** | Replay, out-of-order, schema-invalid crash, webhook spoof | Idempotency keys (Redis TTL); strict schema → DLQ on malformed; HMAC-SHA256 webhook sig + 5-min timestamp window; sequence numbers for ordering | Can replaying/reordering an event corrupt state? Is every inbound event cryptographically authenticated? |
| 10 | **LLM / AI** | Prompt injection (direct/indirect), agentic tool abuse, secondary injection | Treat model output as untrusted input; explicit tool-auth gateway per user permission; strict output schema (JSON Schema/Pydantic); human-in-loop for irreversible actions; no ambient creds in agent env | What auth boundaries exist between model output and execution? Can injected content in retrieved data override system instructions? |
| 11 | **Injection** | SQLi, XSS, CMDi, SSTI, XXE, NoSQLi, ReDoS | Parameterized queries; `execFile([...])` not `exec(string)`; context-aware output encoding; disable XML external entities; reject `$`-prefix JSON keys; audit regexes for backtracking; never eval user input in templates | Does any user-controlled string reach an interpreter without structural separation? |
| 12 | **AuthN / Session / OAuth** | Credential stuffing, session fixation, missing server-side invalidation, redirect-URI wildcard, missing PKCE, token-in-URL | CSPRNG session tokens; rotate on login/priv-escalation; server-side invalidation on logout; `Secure;HttpOnly;SameSite=Strict`; Argon2id (64MB/3-iter) or bcrypt≥12; exact-match `redirect_uri`; PKCE for public clients; bind `state` to session; identical error messages + timing | Can attacker reuse, predict, fix, or intercept session/token/auth-code without knowing original secret? |
| 13 | **Log Exposure** | Credential leak, PII capture, log forgery via `\n\r`, stack-trace disclosure | Scrub tokens/JWTs/keys/PAN/SSN/email before every log write; allowlist loggable fields; sanitize `\n\r\033` in user input; disable verbose stack traces in prod responses | If every log line leaked, what sensitive data would be visible? |
| 14 | **Deserialization** | pickle/ObjectInputStream/`yaml.load`/Marshal RCE, gadget chains, DoS via nested structures | Never deserialize untrusted data with native formats; use JSON/Protobuf + schema validation; if unavoidable: allowlist filter + HMAC-sign payload; always `yaml.safe_load` | Does any code path deserialize attacker-influenced data with a class-instantiating deserializer? |
| 15 | **Rate Limits / DoS** | Credential stuffing, ReDoS, zip bomb, billion-laughs, large upload, unbounded pagination | Rate limits per-IP + per-user at gateway and app; tightest on auth/reset/OTP; cap body/upload size at ingress; GraphQL depth+complexity limits; timeouts on all external calls + queries; queue expensive ops; `limit ≤ 100` on pagination; return 429 + `Retry-After` | Can a single actor trigger resource consumption that degrades availability for others? |
| 16 | **HTTP Controls** | CORS credential theft, XSS via missing CSP, clickjacking, MIME-sniff bypass, SSL-strip | Exact-origin CORS allowlist (never reflect `Origin`, never `*` + credentials); HSTS `max-age=63072000;includeSubDomains;preload`; CSP allowlist `script-src`, no `unsafe-inline/eval`; `X-Content-Type-Options:nosniff`; `X-Frame-Options:DENY`; `Referrer-Policy:strict-origin-when-cross-origin`; `__Host-` cookie prefix | Can cross-origin page, framed page, or MIME-sniffed resource exploit browser trust in this origin? |
| 17 | **File Upload** | Zip Slip, polyglot exec, oversized upload, SVG XSS, archive bomb | UUID storage keys (never user filename); separate origin (S3 bucket/CDN subdomain) + `Content-Disposition:attachment`; validate by magic bytes not MIME; enforce size+count at ingress; reject SVG/HTML or sanitize with DOMPurify; cap archive extraction size + entry count | Can an uploaded file execute code, escape storage, or gain application-origin trust? |
| 18 | **Info Disclosure** | Stack traces, version headers, debug endpoints, username enum, timing leak | Opaque error IDs to clients, full detail server-side only; remove `Server`/`X-Powered-By`/`X-AspNet-Version`; disable debug/admin/introspection in prod (fail startup if debug=prod); identical error messages + response times; scan for exposed debug routes | Does any response, header, error, or timing difference reveal internal structure to unauthorized caller? |
| 19 | **Build Pipeline** | Compromised CI, over-permissive build IAM, unsigned images, secrets in logs, fork PR secret leak | Pin CI actions to commit SHAs; secrets only on protected branches; read-only source + write-only artifact IAM for build; sign images (cosign/Sigstore) + verify at deploy; OIDC ephemeral creds (no long-lived keys); branch protection: review + CI + signed commits required | Can compromised dependency update, CI job, or PR introduce malicious code reaching production without human review? |

---

## Verification Protocol (run sequentially on every review/design)

1. **Trust Boundaries**   map all input sources (HTTP, queues, webhooks, files, env); strip all safety assumptions
2. **Concurrency**   find TOCTOU windows; verify DB-level locking on every state mutation
3. **AuthZ**   confirm ownership check at data-access layer, not route layer, on every request
4. **Privileges**   least-privilege IAM, service accounts, container caps, DB permissions
5. **Blast Radius**   full compromise simulation: map reachable services/creds/data without new credentials
6. **Interpreter Paths**   trace user input to SQL/shell/template/XML/YAML/pickle; confirm structural separation
7. **Auth Surface**   session lifecycle (issue→rotate→invalidate), MFA, OAuth parameter binding, credential storage
8. **Log Audit**   confirm no secrets/PII/payment/raw bodies in any log path including error handlers and APM agents
9. **Rate Controls**   sensitive endpoints have per-user + per-IP limits; expensive ops queued with concurrency caps
10. **HTTP Controls**   CORS allowlist, security headers, CSP, no verbose error leakage on all response paths
11. **Upload & Deserialization**   files stored outside app origin with server-generated keys; no native deserializer on untrusted data
12. **Build Pipeline**   CI secrets scoped, actions SHA-pinned, images signed and verified at deploy

> **Target:** Code that behaves predictably when an adversary is actively attempting to shatter it.

Installation Guide

To ensure your AI assistant picks up this framework without breaking file path scopes, use the explicit terminal setups below depending on your favorite environment.

1. Claude Code

Claude Code evaluates configurations from your global home configuration space (~/.claude) or local workspaces (.claude).

  • Global Installation (Applies across all code repositories on your machine without altering git states):

    Bash

    mkdir -p ~/.claude/skills/security-review
    # Save the Markdown block above into this file:
    nano ~/.claude/skills/security-review/SKILL.md
    
* **Project-Specific Installation** *(Committed directly into git to enforce security rules across the whole engineering team)*:
  ```bash
  mkdir -p .claude/skills/security-review
  nano .claude/skills/security-review/SKILL.md

2. Cursor (and custom IDEs)

Cursor indexes markdown definitions gracefully via workspace indexing or dedicated custom instructions.

Bash

mkdir -p .cursor/skills/security-review
nano .cursor/skills/security-review/SKILL.md

(Alternatively, you can save it as a top-level SKILLS.md file in your root workspace).

3. Orchestrated Agent Frameworks (CrewAI / LangGraph)

For autonomous multi-agent pipelines, pass the file directly as system background data inside your orchestration configuration:

YAML

agent:
  role: Adversarial Security Auditor
  backstory: You analyze architectural code changes strictly through the lens of SKILLS.md rules.
  instructions:
    - Ingest the custom SKILLS.md baseline constraints.
    - Check every generated code route against Concurrency and Trust Boundaries.

How to Use the Framework

Once installed, you don’t need to repeatedly copy-paste security prompts. The framework leverages both passive and active execution behaviors.

Method A: Automated Semantic Triggering (Passive Mode)

Because the custom frontmatter contains a deep description string, the AI continuously evaluates your inputs. If you type a standard prompt that crosses defensive boundaries, the engine auto-activates the skill behind the scenes.

  • Your Prompt: "Write an endpoint that takes a user's uploaded image URL, downloads it, and processes metadata."

  • The AI's Internal Action: The engine intercepts words like URL and downloads. It auto-loads security-review from disk, catches the SSRF / Deterministic Routing rule, and adds domain validation code before outputting the feature.

Method B: Manual Slash Invocation (Active Mode)

If you want to explicitly mandate an application review, call the skill directly via standard interface paths.

  • In Claude Code: Use the custom command shortcut directly inside your terminal session:

    Bash

    /security-review Review our new database migration file for potential data isolation vulnerabilities.
    
* **In Cursor Composer:** Force index mapping by targeting the file handle directly inside the chat bar:
  ```text
  Please build out our stripe payment callback router following the criteria defined in @SKILL.md

Real-World Transformations: Before and After

When SKILLS.md is active, the agent stops acting like a passive code generator and starts acting like an unyielding architecture reviewer.

Example: Payment Balance Deduction

  • Without SKILLS.md: The user asks for a simple point redemption function. The AI generates a standard SELECT balance followed by an UPDATE balance sequence. It looks clean, passes unit tests, but immediately falls to a race condition exploit when a user executes parallel curl requests.

  • With SKILLS.md: The agent's internal reasoning detects a state change trigger. It forces the SQL generation to include row-level isolation via SELECT ... FOR UPDATE or requires a strict Idempotency-Key header transaction check.

Example: User-Configured Webhooks

  • Without SKILLS.md: The user prompts the AI to build an outbound webhook engine so users can get alerts. The AI uses a simple Axios/Fetch call passing the target parameter. An attacker signs up, sets their webhook to [http://169.254.169.254/latest/meta-data/](http://169.254.169.254/latest/meta-data/), and extracts cloud infrastructure IAM keys.

  • With SKILLS.md: The agent flags the user-controlled URL routing pattern. It refuses to output the code until it builds an accompanying domain allowlist check, wraps the execution in an isolated egress proxy, or isolates the protocol rules.

The Bigger Shift

Today, engineers review AI-generated code. Tomorrow, AI systems will review AI-generated code. Eventually, entire engineering workflows will become completely autonomous.

When that happens, security can no longer exist as an afterthought or a final manual compliance checklist performed at the tail end of a sprint. It has to become a core property of the AI's internal reasoning loop.

AI does not automatically inherit security instincts. It inherits whatever mental models we explicitly give it. If you train an AI to think only like an engineer, it will build systems. If you train it to think like an attacker, it will help you build resilient systems.

The future belongs to the teams that can do both. Secure software is not created by accident; it is forged when someone spends enough time thinking about how it breaks first.

Related Thoughts

Share Your Thoughts

Be the first to comment

Share your thoughts on this post

Join the conversation

Stay in the loop

New posts and projects land in your inbox. No spam, no filler - just the good stuff.