AI as Second Opinion: Building the Trust Record | Anwer Gertani

Do not start with AI making decisions. Start with AI making recommendations alongside the decisions your analysts are already making. Trust is earned by evidence, not declared by procurement.

This is Post 4 of 7 in the series “Building Security Operations That AI Can Run.”

The most common mistake in deploying AI for security operations is giving it authority before it has earned trust.

The right deployment sequence inverts the authority relationship. In the first phase, the AI model operates in shadow mode — it produces recommendations in parallel with analyst decisions, but analysts make decisions independently without seeing the AI output. After a defined period, the comparison record is reviewed: how often did the AI recommendation match the analyst decision? The divergence cases are the most valuable. Some reveal analyst error. Others reveal model error. A third category reveals genuine ambiguity. Each category requires a different response.

In the second phase, the AI recommendation is surfaced to analysts before they make their decision, but without binding authority. Analysts can see what the model recommends and choose to follow it or override it, with a lightweight logging mechanism capturing their choice and reasoning.

The comparison record methodology above applies directly to classical ML models. Large language models require important additions. Because LLMs reason probabilistically, a high agreement rate on standard historical cases can significantly understate the risk: an LLM can achieve 95 percent agreement on routine triage decisions while being highly vulnerable to adversary-crafted inputs that manipulate its reasoning on the cases that matter most. The trust record for an LLM deployment must therefore include adversarial testing — deliberate attempts to manipulate the model’s recommendations using the kinds of content it will encounter in production.

The threshold for moving a decision class to AI-primary authority should be set by the full trust record — agreement rate plus adversarial robustness — not by either alone. A reasonable threshold is sustained agreement of 95 percent or more on a specific, well-defined decision class over a sufficient volume of cases, combined with demonstrated resistance to prompt injection attempts representative of the adversary techniques active in your sector.