Why AI Security Programmes Fail Before They Start | Anwer Gertani

The most common AI security failure mode: applying AI to processes that were never defined, documented, or cleaned up. You do not automate chaos. You accelerate it.

This is Post 2 of 7 in the series “Building Security Operations That AI Can Run.”

The typical AI security programme failure follows a recognisable pattern. The organisation purchases an AI-augmented SOAR platform or an AI-driven detection product. The vendor deploys it in a week and points it at the existing alert stream. Initial demos look impressive. Within ninety days, the team is handling more alerts than before, the false-positive rate is higher than expected, analyst satisfaction has dropped, and the programme sponsor is being asked to justify the investment. The conclusion drawn is usually that the AI was not good enough. The actual problem is that the AI worked exactly as designed — it amplified the underlying process, which was undefined and

Large language model deployments fail differently from classical ML — and in ways that are less visible and therefore more dangerous. A classical ML triage model that fails produces wrong labels with incorrect confidence scores: detectable through the comparison record described in Post 4. An LLM that fails produces plausible-sounding but incorrect reasoning — a confident, coherent explanation for why an alert should be dismissed that happens to be wrong. It can also be manipulated by adversary-controlled content it is asked to process: a phishing email body, a malware string, a log entry containing instructions designed to alter the LLM’s analysis. This is prompt injection, and it is not a configuration error — it is an intrinsic property of language models processing untrusted input. Any security programme deploying LLMs to analyse adversary-generated content needs a threat model for its AI inputs before deployment. Most do not have one.

The one thing organisations consistently skip is the step that makes everything else possible: getting explicit about what they are trying to decide. A security operation running on undocumented tribal knowledge can function — experienced analysts carry the playbook in their heads. But that operation cannot be augmented by AI, because AI has no playbook to augment. And it cannot be safely augmented by an LLM agent, because an agent operating on undocumented intent will fill the gaps in its instructions with inferences that may or may not match what the security team actually intends. The absence of explicit process definition is a risk multiplier for every class of AI, and a critical vulnerability for agentic AI specifically.

The fix is targeted documentation focused on the ten high-volume, high-impact decisions that drive the majority of analyst time. Document how those decisions are currently made, what data they rely on, and what the correct behaviour is when the input is ambiguous or adversary-controlled. That last element is the foundation of a prompt injection defence. It does not require deep AI expertise. It requires the same honest process documentation that the rest of the programme depends on.