Teaching SOC Analysts to Interrogate Algorithms Without a PhD
Mansfeld-Südharz, Germany - October 15, 2025
The hydrogen valve on line 3 does not care that your neural network achieved 99.3 % accuracy on the test set. It cares whether the 0.7 % mistake will open it at 3 a.m. while the maintenance crew is sipping coffee in the canteen. That 0.7 % translates into a pressure delta of 200 bar, a temperature spike of 80 °C, and a headline that no county councillor wants to read. The CypSec Academy therefore teaches explainable AI the way chemical plants teach safety: if the operator cannot interrogate the decision in less time than it takes to finish a cigarette, the decision is treated as wrong by default. The curriculum is now 18 months old, has produced 37 graduates, and has already prevented one production halt by forcing an algorithm to admit it had confused a firmware update with a brute-force attack. The algorithm sulked, the valve stayed shut, and the county kept its tax base. That is the only accuracy metric we track.
The first lesson is vocabulary, not mathematics. Apprentices learn that an “activation map” is simply a heat-map that shows which pixels the model stared at before it shouted “malware.” They are taught to overlay the map on the original packet capture and to ask the same question they would ask a junior operator: “What did you see that I didn’t?” If the heat-map highlights the TCP checksum field, the apprentice learns to suspect a false positive, because checksum values are effectively noise: they change with every packet and carry no attacker intent. If the heat-map highlights a sequence of NOP sleds, the apprentice learns to escalate, because NOP sleds look like runway lights to anyone who has ever debugged a buffer overflow. The interrogation takes 45 seconds, well within the time it takes to finish a cigarette, which means the analyst can perform it while the shift foreman is still walking over. The cigarette rule is not pedagogical flair; it is a safety habit borrowed from the chemical plant next door, where every abnormal pressure reading must be explained before the next batch is released.
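For readers who want to picture the overlay step, the sketch below sums per-byte saliency into named TCP header fields and applies the classroom rule. The byte offsets follow the standard TCP header layout; the saliency array and the triage wording are hypothetical stand-ins for the academy’s real tooling, not its production code.

    # A minimal sketch of the overlay step, assuming a per-byte saliency array
    # already produced by the model. Field names and rules are illustrative.
    TCP_FIELDS = {              # byte offsets inside the TCP header
        "src_port": (0, 2),
        "dst_port": (2, 4),
        "seq_num": (4, 8),
        "ack_num": (8, 12),
        "flags": (12, 14),
        "window": (14, 16),
        "checksum": (16, 18),
        "urgent_ptr": (18, 20),
    }

    def field_attention(saliency, fields=TCP_FIELDS):
        """Sum per-byte saliency into named header fields, highest first."""
        totals = {name: sum(saliency[a:b]) for name, (a, b) in fields.items()}
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    def triage_hint(ranked):
        """Apply the classroom rule: checksum on top smells like a false positive."""
        top_field, _ = ranked[0]
        if top_field == "checksum":
            return "suspect false positive: model stared at the checksum"
        return f"escalate for review: model stared at {top_field}"

    # Example: a fake saliency map where the checksum bytes dominate.
    saliency = [0.01] * 20
    saliency[16] = saliency[17] = 0.6
    print(triage_hint(field_attention(saliency)))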
The second lesson is counterfactual testing, but without the word “counterfactual.” Apprentices are taught to ask “what-if” questions using a slider tool that masks parts of the input and measures the drop in confidence. The tool is a simplified version of SHAP, but the interface looks like a car radio: slide the bass to zero and hear whether the song still sounds malicious. If masking the destination port drops confidence from 97 % to 12 %, the apprentice learns that the model was mostly listening to the port, which is a legitimate signal. If masking the payload bytes drops confidence only to 94 %, the apprentice learns that the model was not really looking at the exploit, which is a red flag for a false positive. The slider produces a percentage, not a p-value, because percentages are what chemical plants use when they dilute concentrations. By week eight, apprentices can run the slider while talking to the shift foreman, turning explainability into a conversation rather than a lecture.
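The arithmetic behind the slider fits in a few lines. The sketch below masks one span of a packet at a time and reports how far the confidence falls; the score() function, the field offsets and the toy packet are hypothetical, and the masking is occlusion-style rather than an exact SHAP computation.

    # A minimal sketch of the masking "slider", assuming a score() function that
    # returns the model's malicious-confidence for a raw byte sequence.
    def confidence_drop(score, packet_bytes, start, end, fill=0x00):
        """Mask one span of the input and report how far confidence falls."""
        baseline = score(packet_bytes)
        masked = bytearray(packet_bytes)
        masked[start:end] = bytes([fill]) * (end - start)
        return baseline, score(bytes(masked))

    def slider_report(score, packet_bytes, spans):
        """Run the slider over named spans, like turning channels down one by one."""
        for name, (start, end) in spans.items():
            before, after = confidence_drop(score, packet_bytes, start, end)
            verdict = "model relied on this" if before - after > 0.5 else "barely mattered"
            print(f"{name:12s} {before:5.0%} -> {after:5.0%}  ({verdict})")

    # Example with a toy scorer that only looks at the destination port bytes.
    def toy_score(b):
        return 0.97 if b[2:4] == b"\x1f\x90" else 0.12   # port 8080 in network order

    pkt = b"\x04\xd2\x1f\x90" + bytes(60)                # src 1234, dst 8080, padding
    slider_report(toy_score, pkt, {"dst_port": (2, 4), "payload": (20, 64)})

The toy scorer is deliberately one-eyed: masking the destination port collapses its confidence, while masking the payload changes nothing, which is exactly the pattern the apprentices are trained to flag.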
The third lesson is local explanation, not global explanation. Global explanations—“the model is 94 % accurate”—are useless to a SOC analyst who needs to know why this specific frame triggered an alert. Apprentices are taught to generate local explanations using LIME, but the output is visualised as a traffic-light strip that sits above the packet payload. Green means “the model looked here and found nothing suspicious,” amber means “look again,” red means “escalate now.” The strip is generated in real time and disappears after 30 minutes, long enough for the analyst to make a decision but short enough to avoid storage bloat. The 30-minute rule is borrowed from shift logs in chemical plants, where entries older than one shift are considered historical fiction. The strip therefore functions like a hand-written note taped to a gauge: temporary, specific, and actionable.
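To make the traffic-light idea concrete, here is a minimal sketch using the open-source lime and scikit-learn libraries. The flow features, the toy data and the colour thresholds are hypothetical illustrations, not the academy’s production values.

    # A minimal sketch of turning a LIME local explanation into a traffic-light
    # strip, assuming the lime and scikit-learn packages are installed.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    rng = np.random.default_rng(0)
    feature_names = ["dst_port", "payload_entropy", "nop_run_len", "checksum_ok"]
    X = rng.random((500, 4))
    y = (X[:, 2] > 0.7).astype(int)          # toy rule: long NOP runs are malicious

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    explainer = LimeTabularExplainer(
        X, feature_names=feature_names, class_names=["benign", "malicious"]
    )

    def traffic_light(weight):
        """Map a LIME weight to the strip colours used on the analyst console."""
        if weight > 0.10:
            return "RED    escalate now"
        if weight > 0.02:
            return "AMBER  look again"
        return "GREEN  nothing suspicious here"

    frame = X[0]                              # the specific frame that fired an alert
    explanation = explainer.explain_instance(frame, clf.predict_proba, num_features=4)
    for feature, weight in explanation.as_list():
        print(f"{feature:30s} {traffic_light(weight)}")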
"If the AI cannot explain itself before the cigarette burns out, the valve stays shut."
The fourth lesson is human-in-the-loop, but without the loop becoming a noose. Apprentices are taught that every AI alert must be confirmed or rejected by a human within 15 minutes, but the confirmation is itself logged and fed back into the model. If an analyst marks an alert as false positive, the model weights that feature downwards for the next 24 hours, creating a local feedback circuit that learns from the county’s own traffic patterns rather than from generic benchmarks. The circuit is therefore a closed loop inside the county boundary, ensuring that the model adapts to the county’s local baseline without drifting back toward generic, global behaviour. The loop is monitored by the same apprentice who triggered it, which means that mistakes are corrected by the same eyes that made them, a pedagogical closure that turns every false positive into a free lesson.
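A rough sketch of how such a 24-hour penalty could be wired is shown below, assuming the alert score can be recomposed from per-feature contributions. Every name, and the damping factor itself, is illustrative rather than a description of the academy’s deployed circuit.

    # A minimal sketch of the 24-hour feedback circuit; all names are hypothetical.
    import time

    DAMPING = 0.5                 # halve the penalised feature's influence
    TTL_SECONDS = 24 * 60 * 60    # penalties expire after one day

    class FeedbackCircuit:
        def __init__(self):
            self._penalties = {}  # feature name -> expiry timestamp

        def mark_false_positive(self, feature):
            """Analyst rejected an alert: dampen the feature that drove it."""
            self._penalties[feature] = time.time() + TTL_SECONDS

        def adjusted_score(self, contributions):
            """Re-score an alert from per-feature contributions, applying penalties."""
            now = time.time()
            total = 0.0
            for feature, value in contributions.items():
                if self._penalties.get(feature, 0) > now:
                    value *= DAMPING
                total += value
            return total

    # Example: the checksum feature caused yesterday's false alarm.
    circuit = FeedbackCircuit()
    circuit.mark_false_positive("checksum_entropy")
    print(circuit.adjusted_score({"checksum_entropy": 0.6, "nop_run_len": 0.3}))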
The fifth lesson is liability, not accuracy. Apprentices are taught that the final decision is always human, which means the final liability is also human. The academy issues a certificate that states: “The holder is trained to overrule any AI recommendation when safety is in doubt.” The sentence is not boilerplate; it is printed in bold and must be signed by both the apprentice and the academy director. The signature functions like a safety valve: it gives the apprentice permission to disagree with the algorithm without needing to understand the algorithm. In the chemical plant next door, operators are similarly trained to overrule automated valves when pressure gauges contradict the digital readout. The same logic now applies to packets: if the human does not understand the AI’s reasoning, the AI loses the argument by default. The rule is written into the county’s civil-protection manual, which means that overruling the AI is not insubordination; it is due diligence.
The sixth lesson is continuous interrogation, not continuous improvement. Graduates must spend one week per year back in the academy, teaching the next cohort how to break the previous year’s model. The exercise forces alumni to keep up with adversarial research, because the only way to teach breaking is to stay ahead of fixing. The loop ensures that explainability does not ossify into ritual; it evolves as fast as the threats evolve. Last year, a graduate discovered that the model could be fooled by padding malicious payloads with NOP sleds that carried valid checksums, a technique that looked like firmware to the model but like an exploit to the human. The discovery was fed back into the training set, and the model’s false-positive rate dropped by 18 % within a month. The graduate received no bonus, but she did receive a red sticker that says “I taught the machine to confess,” which she stuck on her locker door like a badge of honour.
The result is a workforce that treats explainable AI the way chemical operators treat pressure gauges: as tools that must be read, questioned and occasionally ignored. The academy does not graduate data scientists; it graduates safety engineers who happen to speak Python. Hydrogen valves therefore stay closed until the analyst can explain why they should open, and the explanation must fit inside a cigarette break. The cigarette rule is written on the wall of the SOC in letters large enough to read at 3 a.m., which is the only time of day when explanations really matter.
The Cyber Resilience Alliance is a public-private partnership established in 2025, led by CypSec, Validato and the County of Mansfeld-Südharz. The Alliance operates a sovereign private-cloud security stack, a shared SOC and a cyber academy, aiming to make Mansfeld-Südharz the reference site for rural cyber resilience by 2030.
Media Contact: Daria Fediay, Chief Executive Officer at CypSec - daria.fediay@cypsec.de.