Sandboxing & Fuzzing: Zero-Day Hunting Inside CRA

Mansfeld-Südharz, Germany - October 24, 2025

Inside the containerised engine that turns malformed packets into patches—hours before they reach production

The first thing you notice in Hall 14 is the low hum of negative air pressure. Racks are arranged like refrigerator trucks: each drawer is a sealed sled, fans pulling air inward so nothing leaks out. Inside those drawers we run every untrusted artifact the Alliance collects—phishing attachments, firmware blobs, software updates from county suppliers—against a pair of tools that never sleep: a fuzzing orchestrator that mutates inputs until something breaks, and a sandbox that executes the breakage again and again while logging every register flip. Nothing here is theoretical; the rig is already chewing through two million mutations per hour on a tenant fleet that spans chemical plants, clinics and the county’s own SAP payroll. The goal is not to prove we can break code—anyone can—but to compress the delta between first crash and deployable patch to the length of a single shift, a tempo rural industry can actually afford.

The architecture starts with a plain Kubernetes cluster running on bare-metal nodes equipped with AMD EPYC chips that support nested virtualization. Each fuzzing job is wrapped into a Firecracker micro-VM: 128 MB of RAM, one vCPU, a 100 ms boot time. That frugality matters because fuzzing scales linearly with cores; the smaller the unit, the more instances we can spawn. A scheduler written in Rust keeps 30 000 of these micro-VMs alive across the rack, migrating them away from any node that shows correctable memory errors, an early-warning system for Rowhammer-style faults that might otherwise poison results. When a mutated input triggers a crash, the scheduler does not simply note the exit code; it snapshots the entire VM memory, deduplicates the page cache against a reference image, and stores a delta that rarely exceeds 40 kB. Those deltas are time-stamped and piped into a Merkle tree so that any later dispute, such as a vendor claiming “not reproducible”, can be settled by replaying the exact bitstream on an independent node.
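The snapshot-and-delta step can be illustrated with a minimal Python sketch. It is not the scheduler's actual code (which is written in Rust); the 4 KiB page size, the SHA-256 hash, the function names and the rule of duplicating the last node on odd levels are all assumptions made for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page granularity, not stated in the article


def page_delta(reference: bytes, snapshot: bytes, page_size: int = PAGE_SIZE) -> dict[int, bytes]:
    """Keep only the pages of `snapshot` that differ from `reference`."""
    delta = {}
    for offset in range(0, len(snapshot), page_size):
        page = snapshot[offset:offset + page_size]
        if page != reference[offset:offset + page_size]:
            delta[offset // page_size] = page
    return delta


def apply_delta(reference: bytes, delta: dict[int, bytes], page_size: int = PAGE_SIZE) -> bytes:
    """Rebuild the crash snapshot; assumes both images have the same length."""
    pages = [reference[o:o + page_size] for o in range(0, len(reference), page_size)]
    for index, page in delta.items():
        pages[index] = page
    return b"".join(pages)


def leaf_hash(timestamp: str, delta: dict[int, bytes]) -> bytes:
    """Hash one time-stamped delta into a Merkle leaf."""
    h = hashlib.sha256()
    stamp = timestamp.encode()
    h.update(b"\x00" + len(stamp).to_bytes(4, "big") + stamp)  # 0x00 marks a leaf
    for index in sorted(delta):
        h.update(index.to_bytes(8, "big"))
        h.update(delta[index])
    return h.digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the odd node
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()  # 0x01 marks an inner node
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

An independent node holding the same reference image could call `apply_delta` to rebuild the snapshot, then recompute the leaf and root to check that the replayed bytes are the ones originally recorded.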

Fuzzing itself is handled by a custom engine we call CRA-Fuzz, forked from the open-source honggfuzz but stripped of its LLVM dependence and rewired to feed our own coverage format. The mutation strategy is boring on purpose: bit flips, boundary integers, dictionary tokens scraped from German-language RFCs, plus a genetic algorithm that breeds inputs which touch new basic blocks. Boring is fast; fast is parallel. A single PLC firmware image—say, a 2 MB binary that controls caustic soda flow—will be split into 4 000 equal shards, each shard assigned to 1 000 micro-VMs that mutate for fifteen minutes, then swap results via a shared radix tree. Coverage is measured with Intel PT traces, not instrumentation, so we do not need source code or debug symbols. That matters more than it sounds: 80 % of the industrial software we receive is compiled without symbols and covered by NDAs that forbid reverse engineering. By operating purely on the instruction stream we stay on the right side of those contracts while still delivering stack-level diagnostics to vendors who agree to remediate.
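The "boring on purpose" mutation strategy can be sketched in a few lines of Python. This is a simplified evolutionary loop without crossover, not CRA-Fuzz itself: `run_target` is a hypothetical stand-in for whatever executes the input and returns the set of basic blocks seen in the trace, and the operator names and boundary values are assumptions.

```python
import random
from typing import Callable, Iterable, Sequence

BOUNDARY_BYTES = (0x00, 0x01, 0x7F, 0x80, 0xFF)  # assumed single-byte edge values


def bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Flip exactly one randomly chosen bit."""
    if not data:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)
    buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def boundary_byte(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one byte with a boundary value."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.choice(BOUNDARY_BYTES)
    return bytes(buf)


def insert_token(data: bytes, rng: random.Random, dictionary: Sequence[bytes]) -> bytes:
    """Splice a dictionary token in at a random offset."""
    pos = rng.randrange(len(data) + 1)
    return data[:pos] + rng.choice(dictionary) + data[pos:]


def fuzz(
    seed: bytes,
    run_target: Callable[[bytes], Iterable[int]],
    dictionary: Sequence[bytes],
    rounds: int,
    rng: random.Random,
) -> tuple[list[bytes], set[int]]:
    """Keep every mutant that reaches a basic block not seen before."""
    corpus = [seed]
    seen = set(run_target(seed))
    for _ in range(rounds):
        parent = rng.choice(corpus)
        op = rng.choice(("flip", "boundary", "token"))
        if op == "flip":
            child = bit_flip(parent, rng)
        elif op == "boundary":
            child = boundary_byte(parent, rng)
        else:
            child = insert_token(parent, rng, dictionary)
        blocks = set(run_target(child))
        if not blocks <= seen:  # new coverage: promote the child to a parent
            seen |= blocks
            corpus.append(child)
    return corpus, seen
```

Because each round depends only on the current corpus and a coverage callback, many such loops can run in parallel and merge their `corpus` and `seen` sets periodically, which is the pattern the paragraph above describes.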

Once a crash is confirmed, the artifact is promoted to the sandbox tier, physically the same racks but scheduled under stricter isolation. Here the micro-VM is replaced with a full QEMU instance that provides emulated CAN-bus, Modbus and OPC-UA interfaces identical to those found in the county’s chemical plants. The exploit candidate is replayed, this time while a side-channel logger records timing variations, RF emissions and power-draw fluctuations. Any deviation from a clean baseline is encoded into a compact side-channel signature that can later be scanned for in production networks without ever executing the malicious input again. In May we caught a malformed S7 packet that brute-forced a PLC password in 38 milliseconds; the side-channel signature derived from that single run now sits in every participating plant’s IDS, blocking the technique even though the vendor patch is still two weeks away. That gap (signature today, patch tomorrow) is where rural operational technology lives, and it is the gap the Alliance was built to close.
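The article does not specify how a deviation from the baseline is encoded. One plausible reading, sketched below in Python, stores a rounded z-score per channel for every channel that departs clearly from the clean run. The channel names, the threshold of three standard deviations and the matching tolerance are all hypothetical.

```python
from statistics import mean, pstdev


def derive_signature(
    baseline: dict[str, list[float]],
    observed: dict[str, list[float]],
    threshold: float = 3.0,  # assumed cut-off, in baseline standard deviations
) -> dict[str, int]:
    """Rounded z-score per channel, keeping only channels beyond the threshold."""
    signature = {}
    for channel, samples in baseline.items():
        sigma = pstdev(samples)
        if sigma == 0 or channel not in observed:
            continue  # a flat or missing channel carries no usable deviation
        z = (mean(observed[channel]) - mean(samples)) / sigma
        if abs(z) >= threshold:
            signature[channel] = round(z)
    return signature


def matches(signature: dict[str, int], candidate: dict[str, int], tolerance: int = 1) -> bool:
    """True if every channel in the signature appears in the candidate within tolerance."""
    return bool(signature) and all(
        channel in candidate and abs(candidate[channel] - z) <= tolerance
        for channel, z in signature.items()
    )
```

A detector built this way compares measurements only, so the malicious input itself never has to be executed again on the production side.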

"Zero-days are not mystical; they are edge-cases we simply have not met in a sandbox yet."

Automation does not end with detection. When confidence exceeds 95 %, the sandbox opens a merge request against a private Git repository that mirrors the vendor’s source tree (shared under NDA). Our wrapper adds a one-line null-check or bounds-test, plus a regression test that replays the crashing input inside the same micro-VM image. The vendor receives a pull request, not a vague advisory. Acceptance is voluntary, but the MoU we sign beforehand makes disclosure deadlines binding: critical bugs must be closed within 90 days, others within 180. So far 43 % of pull requests are merged without modification, 31 % receive minor style edits, and only 5 % are rejected outright, usually because the component is already end-of-life. Those metrics are published quarterly, creating a public scoreboard that nudges sluggish suppliers without naming or shaming.
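The promotion threshold and the MoU deadlines reduce to a small rule set. The Python sketch below assumes the clock starts on the day the fix is submitted to the vendor, which the article does not state; the function and constant names are illustrative.

```python
from datetime import date, timedelta

CONFIDENCE_THRESHOLD = 0.95  # "when confidence exceeds 95 %"
DEADLINE_DAYS = {"critical": 90, "other": 180}  # binding deadlines from the MoU


def should_open_merge_request(confidence: float) -> bool:
    """Only findings strictly above the threshold are turned into a proposed fix."""
    return confidence > CONFIDENCE_THRESHOLD


def remediation_deadline(submitted: date, critical: bool) -> date:
    """Date by which the vendor must close the bug (assumed to count from submission)."""
    severity = "critical" if critical else "other"
    return submitted + timedelta(days=DEADLINE_DAYS[severity])
```

For a critical finding submitted on 24 October 2025, `remediation_deadline(date(2025, 10, 24), critical=True)` returns 22 January 2026; the same call with `critical=False` returns 22 April 2026.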

All of this is orchestrated by a single pipeline file, fewer than 300 lines of YAML, that any new region can transplant into its own cluster. The file references container images we host on a European registry, so no build environment is needed; the only local input is a list of IP ranges and firmware hashes that define the attack surface to be tested. When South Tyrol stood up its node last month, the complete migration took four hours, most of which was spent waiting for 10 Gbit of firmware images to download. That portability is deliberate: we want the engine to feel like a utility, not a product pitch. If a county can run water pipes, it can run fuzz pipes; the competence required is the same: schedule maintenance, read logs, escalate anomalies.
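The local input is small enough to validate before any job is scheduled. The Python sketch below assumes the ranges are given in CIDR notation and the firmware hashes are hex-encoded SHA-256 digests; neither format is confirmed by the article, and `load_attack_surface` is a hypothetical helper, not part of the pipeline file.

```python
import ipaddress
import re

SHA256_HEX = re.compile(r"[0-9a-f]{64}")  # assumed hash format


def load_attack_surface(ip_ranges: list[str], firmware_hashes: list[str]):
    """Parse and validate the two region-specific inputs."""
    # ip_network raises ValueError on malformed ranges or stray host bits
    networks = [ipaddress.ip_network(r) for r in ip_ranges]
    hashes = []
    for raw in firmware_hashes:
        digest = raw.strip().lower()
        if not SHA256_HEX.fullmatch(digest):
            raise ValueError(f"not a SHA-256 hex digest: {raw!r}")
        hashes.append(digest)
    return networks, hashes
```

Rejecting malformed entries at this stage keeps a typo in a range or a hash from silently shrinking the tested attack surface.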

The long-term bet is that continuous mutation will become as routine as virus scanning once was. By 2027 every PLC, medical device and county laptop inside the Alliance will submit nightly firmware hashes to the fuzzing queue; by 2028 the goal is to reach a mean-time-to-patch of 72 hours for critical vulnerabilities, a speed that even multinational vendors struggle to match inside their own SDLC. Whether we hit that number or not, the underlying message is already stable: zero-days are not mystical; they are just edge-cases we have not met yet, and meeting them at county scale is cheaper—and faster—than waiting for the world to update.

The Cyber Resilience Alliance is a public-private partnership established in 2025, led by CypSec, Validato and the County of Mansfeld-Südharz. The Alliance operates a sovereign private-cloud security stack, a shared SOC and a cyber academy, aiming to make Mansfeld-Südharz the reference site for rural cyber resilience by 2030.

Media Contact: Daria Fediay, Chief Executive Officer at CypSec - daria.fediay@cypsec.de.
