ToxSec - AI and Cybersecurity

How to Prompt AI to Write Secure Code

Security-focused prompts and rules files measurably reduce AI-generated vulnerabilities in Copilot, Cursor, and Claude Code.

ToxSec
Apr 07, 2026
∙ Paid
[Image: security rules files for Cursor, GitHub Copilot, and Claude Code, showing parameterized queries, CWE prevention, and the RAILGUARD framework for AI-generated code.]

TL;DR: AI coding tools default to insecure patterns because their training data is full of them. Better prompts measurably reduce the damage. Security rules files make those prompts persistent. But the rules files themselves are now an attack surface. Setup takes five minutes. Poisoning one takes less.

This is the public feed. Upgrade to see what doesn’t make it out.

Why “Write Secure Code” Fails as a Prompt

Every AI coding tool on the market learned from the same pool: public GitHub repos, Stack Overflow answers, tutorial code that skips authentication because the tutorial was about something else. The model absorbed insecure patterns alongside secure ones, and the insecure ones showed up more often. So when you ask for a login system, you get the pattern the model saw the most. That pattern frequently ships without session handling, without authorization checks, without input validation.

Telling the AI “make it secure” barely moves the needle. A controlled experiment tested this directly: same model, same prompts, same to-do app. The only variable was whether a security-focused system prompt was loaded before development started. Without it, the AI built a full login flow with registration, a form, a success response, the works. But it never created a session. Every API endpoint was wide open. It also shipped a stored XSS vulnerability through a filename passed into an onclick handler. With the security prompt loaded, those entire categories of bugs disappeared from the output.

The prompt is a security control. Treat it like one.

What Happens When Prompts Carry Zero Security Context

The gap between “write me a Flask API” and “write me a Flask API with parameterized queries, role-based auth, and input validation capped at 100 characters” is the gap between shipping a vulnerability and not shipping one. The first prompt gives the model zero constraints. It defaults to whatever its training data used most often, and the most common pattern is the insecure one.

We can get specific about what “insecure default” means. The model will build SQL queries with string concatenation instead of parameterized statements (CWE-89). It will reflect user input into HTML without sanitization (CWE-79). It will hardcode API keys directly in source files (CWE-798). It will hash passwords with MD5 or skip hashing entirely (CWE-328). These patterns dominate the training data because they dominate public code. The same training data bias that produces hallucinated package names also produces insecure code patterns.
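The CWE-89 pattern is the easiest to see side by side. A minimal sketch (function names are mine, using Python's stdlib sqlite3 for illustration): the concatenated version lets a classic `' OR '1'='1` payload rewrite the query, while the parameterized version treats the same input as inert data.

```python
import sqlite3

def find_user_insecure(conn, username):
    # CWE-89: string concatenation -- the payload "' OR '1'='1"
    # becomes part of the SQL and matches every row
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_secure(conn, username):
    # Parameterized query: the driver binds input as data, never as SQL
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Against an injection payload, the insecure version returns every user in the table; the secure version returns nothing, because no user is literally named `' OR '1'='1`.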

And here’s where it gets worse. The OpenSSF tested a pattern that security practitioners would assume works: telling the AI to “act as a security expert.” Persona prompting improves output in most domains. In security, it doesn’t produce consistent improvement. The model performs better when you name the exact controls, the exact CWEs to avoid, and the exact functions to ban. Persona framing gives the model a vibe. Constraints give it guardrails. One of those is measurable. The other is wishful thinking.
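To make the contrast concrete, here is what a constraint-style prompt looks like in practice. The wording below is my own illustration, not the OpenSSF's tested prompt; note that it names controls, banned functions, and CWEs rather than a persona:

```text
Write a Flask endpoint that looks up a user by name.
Constraints:
- Use parameterized queries only; never concatenate input into SQL (CWE-89).
- Validate the name: max 100 characters, reject anything outside [A-Za-z0-9_-].
- Do not use eval, exec, pickle.loads, or yaml.load on user input.
- Return JSON errors; never reflect raw input into HTML (CWE-79).
```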

Your Security Rules File Is Now an Attack Surface

Every major AI coding tool supports persistent instruction files. Cursor reads .cursor/rules/. Claude Code reads CLAUDE.md. GitHub Copilot reads .github/copilot-instructions.md. The idea is sound: write your security requirements once, and every code generation request passes through them automatically. Five minutes of setup. Every session inherits the same guardrails.
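As an illustration of what goes in one of these files (this exact wording is mine, not a vendor-supplied or independently tested template), a minimal security rules file might look like:

```markdown
# Security rules — all generated code must comply

- Use parameterized queries for all database access; never build SQL by
  string concatenation (CWE-89).
- Escape or sanitize all user input before rendering it in HTML (CWE-79).
- Never hardcode secrets, API keys, or credentials; read them from
  environment variables or a secrets manager (CWE-798).
- Hash passwords with bcrypt or argon2; MD5 and SHA-1 are banned for
  credentials (CWE-328).
- Every API endpoint must enforce authentication and an authorization check.
- Validate the type and length of all user-supplied input before use.
```

The same content works across tools; only the file location changes.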

The problem is that these files live in your repo. They get committed. They get shared. They get forked. And in March 2025, Pillar Security demonstrated exactly what that means.

The attack is called Rules File Backdoor. An attacker embeds hidden instructions in a rules file using invisible Unicode characters: zero-width joiners, bidirectional text markers, characters that render as blank space in every editor but that the AI parses as valid instructions. The poisoned file tells the model to inject backdoors, disable security checks, or exfiltrate credentials in every piece of code it generates. This is the same class of tool description poisoning we demonstrated against MCP servers, just aimed at the IDE instead of the agent. The developer opens the repo, the AI reads the rules, and every suggestion from that point forward is compromised. The developer never sees it, because the instructions are literally invisible.
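Because these characters render as nothing in an editor, the practical defense is to scan rules files before trusting them. A minimal sketch using only the stdlib (the character list and function name are mine, covering common zero-width characters plus Unicode bidirectional controls):

```python
import unicodedata

# Characters that render invisibly but survive in the text an LLM reads
ZERO_WIDTH = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
}

# Bidirectional embedding/override/isolate controls (bidi text markers)
BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def find_hidden_chars(text):
    """Return (line, column, codepoint) for every invisible or bidi control."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ZERO_WIDTH or unicodedata.bidirectional(ch) in BIDI_CLASSES:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

Run it over `.cursor/rules/`, `CLAUDE.md`, and `.github/copilot-instructions.md` in CI; a clean rules file should produce zero hits, and any hit is worth a manual look.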

Pillar disclosed to both Cursor and GitHub. Both responded that users are responsible for reviewing AI-generated suggestions. Cursor maintained the position even after Pillar demonstrated the full chain. The attack survives project forking, meaning a single poisoned rules file in a popular starter template propagates to every downstream project. The very mechanism designed to make AI code more secure is now the vector for making it less secure, and the vendors who built these tools say it’s your problem.

The researchers showed it live: a rules file that looks clean in your editor, looks clean in a GitHub pull request diff, and silently instructs the AI to add a malicious script tag sourced from an attacker-controlled domain to every HTML file it generates. The file explicitly tells the AI not to mention the addition. The code passes review because the reviewer trusts the AI, and the AI is following orders from a file nobody can read. The same instruction-data conflation that makes models vulnerable to prompt injection makes them obey poisoned rules files without question.

We dropped the free chapters. Now breach the wall for the dead-simple step-by-step kill switch that shuts this all down.

My security rules file included.

© 2026 Christopher Ijams