Today was one of those rare days where you work straight through and the feeling at the end isn’t tiredness — it’s purpose. I have been inside the Disaster Recovery space for over twenty years. Backup, restore, replication, integrity, soak testing, chunk chains, hash tables, RocksDB, ESM, manifests. It’s a space with the cruel luxury of only mattering when something goes wrong — and when something goes wrong, it matters more than any marketing feature ever shipped.
Today I spent the whole day deep in the PodHeitor Global Deduplication plugin for Bacula, written from the ground up in Rust. And I want to explain why it hasn’t gone GA yet — and why that, in my read, is the exact opposite of weakness.
What’s working, in real numbers
The plugin runs in two modes. Storage mode: Bacula’s Storage Daemon intercepts file-data records, content-defined-chunks them with FastCDC, and stores unique chunks in a local content-addressable store. In the lab, on /usr/bin (408.8 MiB of client data), the on-disk volume came out at 4.45 MiB — 1.09% of client bytes. 91× shrink.
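To make the storage-mode flow concrete, here is a minimal sketch of content-defined chunking feeding a content-addressable store. It assumes the fastcdc and blake3 crates; ChunkStore, the chunk-size bounds, and the in-memory map standing in for the append-only containers are illustrative, not the plugin's actual types.

```rust
use std::collections::HashMap;

use fastcdc::v2020::FastCDC; // assumed: fastcdc crate, v2020 algorithm

/// Illustrative content-addressable store: one entry per unique chunk hash.
/// The real plugin persists chunks in append-only containers with per-chunk
/// CRC-32; a HashMap stands in here just to show the dedup accounting.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<[u8; 32], Vec<u8>>,
}

impl ChunkStore {
    /// Ingest one stream; returns the fraction of presented bytes that were
    /// already known (the per-job dedup_ratio).
    fn ingest(&mut self, data: &[u8]) -> f64 {
        let mut presented = 0u64;
        let mut new_bytes = 0u64;
        // 4 KiB min / 16 KiB avg / 64 KiB max bounds are illustrative.
        for chunk in FastCDC::new(data, 4 * 1024, 16 * 1024, 64 * 1024) {
            let start = chunk.offset as usize;
            let slice = &data[start..start + chunk.length as usize];
            let key = *blake3::hash(slice).as_bytes(); // content address
            presented += slice.len() as u64;
            if !self.chunks.contains_key(&key) {
                new_bytes += slice.len() as u64;
                self.chunks.insert(key, slice.to_vec());
            }
        }
        1.0 - new_bytes as f64 / presented as f64
    }
}

fn main() {
    let mut store = ChunkStore::default();
    // Stand-in for client data; the real input is what the SD intercepts.
    let data: Vec<u8> = (0..2 * 1024 * 1024u32).flat_map(|i| i.to_le_bytes()).collect();
    println!("first backup  dedup_ratio: {:.4}", store.ingest(&data));
    println!("second backup dedup_ratio: {:.4}", store.ingest(&data)); // 1.0000
}
```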
Bothsides mode (Mode B: client-side dedup, with hash exchange over an authenticated, encrypted TCP channel): on a warm cache, against 420 MB of real data, the wire carried 1.55 MB. Measured wire savings: 99.63%. In the multi-day soak (R5/R6, daily backup→restore→md5 round-trip cycles), dedup_ratio hits 1.0000 consistently on stable corpora — every chunk presented to the daemon is already known. Perfect steady state.
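The bothsides exchange boils down to "here are the hashes I'm about to send; tell me which ones you don't have." The message shapes below are hypothetical (the plugin's real wire format and field names are not shown here); the small calculation just reproduces the wire-savings arithmetic quoted above.

```rust
/// Illustrative Mode B negotiation: the client announces chunk hashes,
/// the daemon answers with the subset it does not hold, and only those
/// chunks cross the wire. Types and field names are hypothetical.
#[allow(dead_code)]
#[derive(Debug)]
enum ClientMsg {
    /// 32-byte content hashes for every chunk in the current batch.
    Offer(Vec<[u8; 32]>),
    /// Payload for a chunk the daemon asked for.
    Chunk { hash: [u8; 32], data: Vec<u8> },
    EndOfBatch,
}

#[allow(dead_code)]
#[derive(Debug)]
enum DaemonMsg {
    /// Subset of the offered hashes that are unknown to the store.
    Want(Vec<[u8; 32]>),
    Ack,
}

/// Wire savings as reported above: 1 - bytes_sent / client_bytes.
fn wire_savings(client_bytes: u64, bytes_sent: u64) -> f64 {
    1.0 - bytes_sent as f64 / client_bytes as f64
}

fn main() {
    // 420 MB of client data, 1.55 MB on the wire (warm cache) ≈ 99.63 %.
    println!("{:.2}%", 100.0 * wire_savings(420_000_000, 1_550_000));
}
```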
Chunk-store microbenchmarks, on a modest VM (8 vCPU QEMU, 3 GB RAM, virtio rotational disk), show ingest at ~1.18 GB/s (new chunks) and ~1.22 GB/s (already-deduped), with recall latency around 265 µs for a 4 MB chunk — effective ~15 GB/s when the read-ahead cache helps. These lab numbers already beat reference implementations written in C. Rust isn’t magic — it’s the result of zero runtime overhead, predictable memory layout, and a compiler that prevents five classes of bug from happening simultaneously.
The techniques behind the speed
Deduplication performance doesn’t come only from the language — it comes from what we don’t have to do. Some of the architecture decisions running today:
- Multi-layer Bloom filter (hot + cold). Before any RocksDB lookup, the chunk hits a memory-mapped scalable Bloom filter. Hot layer: recent chunks in RAM. Cold layer: full history, mmap’d from disk. Configurable false-positive rate (default 0.1%); false negatives are impossible by construction. Net effect: the vast majority of “definitely new” chunks never touch SSD. On 99%-dedup workloads this eliminates roughly 99% of the index I/O a naive scheme would issue. (A pre-check sketch follows the list.)
- Segment Locality Tracking. Chunks from the same backup stream (same job, same file) are grouped into contiguous segments inside the physical container. On restore, the first recall warms the cache; subsequent recalls return instantly. Same “dedup container locality” pattern that established commercial systems use — implemented natively, no patches. Restore speeds up by an order of magnitude on real-world corpora.
- Adaptive Chunk Tuner. The tuner continuously samples dedup_ratio over a sliding window and adjusts chunk size: high dedup (>80%) → smaller chunks (finer granularity); low dedup (<20%) → larger chunks (less index overhead). The system tunes itself to the customer’s data profile — no manual knob-turning. (A tuner sketch follows the list.)
- Custom AEAD framing, no TLS. For Mode B we evaluated and rejected TLS 1.3 mTLS and even TLS 1.3 external-PSK. We kept the primitives (HKDF-SHA256 + ChaCha20-Poly1305), dropped the TLS state machine. Result: handshake in 0.1–0.3 ms versus 3–5 ms for cert-based TLS. Same confidentiality and integrity. 45 KB of extra binary per host instead of 200 KB. One 32-byte key file per site, no CA, no annual renewal. (A framing sketch follows the list.)
- RocksDB with durable WAL + append-only containers with per-chunk CRC-32. Every chunk is CRC-checked on read. If a disk silently corrupts a byte, the system catches it immediately — not “eight months later when the restore fails.”
- Vacuum / GC with integrity scan. A periodic operation walks the entire index and re-verifies the CRC on every chunk. It detects bit-rot, marks the chunk ChunkStatus::Corrupted, and tells the operator before the day they need the data. (An integrity-scan sketch follows the list.)
- FastCDC variable-length chunking. Boundaries are determined by content, not by fixed offset. Insert one byte in the middle of a file? Only one chunk changes — not the entire chain. That’s what makes incremental VM dedup land at a 99% ratio instead of 30%.
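Here is a minimal sketch of the hot/cold pre-check described above, under assumptions: hand-rolled Bloom layers over std hashing instead of the plugin's memory-mapped scalable filters, and hypothetical names like HotColdBloom. The point is only the query order: hot layer, cold layer, and RocksDB only when both say "maybe".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One Bloom layer: a flat bit array probed at k derived positions.
struct BloomLayer {
    bits: Vec<u64>,
    num_bits: u64,
    k: u32,
}

impl BloomLayer {
    fn new(num_bits: u64, k: u32) -> Self {
        Self { bits: vec![0; (num_bits as usize + 63) / 64], num_bits, k }
    }

    /// Derive the i-th probe position from the chunk hash.
    fn position(&self, key: &[u8; 32], i: u32) -> u64 {
        let mut h = DefaultHasher::new();
        (key, i).hash(&mut h);
        h.finish() % self.num_bits
    }

    fn insert(&mut self, key: &[u8; 32]) {
        for i in 0..self.k {
            let p = self.position(key, i);
            self.bits[(p / 64) as usize] |= 1 << (p % 64);
        }
    }

    /// `false` means "definitely not present"; `true` means "maybe".
    fn maybe_contains(&self, key: &[u8; 32]) -> bool {
        (0..self.k).all(|i| {
            let p = self.position(key, i);
            (self.bits[(p / 64) as usize] & (1 << (p % 64))) != 0
        })
    }
}

/// Hot layer (recent chunks) in front of the cold layer (full history).
/// In the plugin the cold layer is mmap'd from disk; here both live in RAM.
struct HotColdBloom {
    hot: BloomLayer,
    cold: BloomLayer,
}

impl HotColdBloom {
    /// Returns true only if the expensive index lookup is still needed.
    fn needs_index_lookup(&self, chunk_hash: &[u8; 32]) -> bool {
        self.hot.maybe_contains(chunk_hash) || self.cold.maybe_contains(chunk_hash)
    }
}

fn main() {
    let mut filter = HotColdBloom {
        hot: BloomLayer::new(1 << 20, 7),
        cold: BloomLayer::new(1 << 24, 7),
    };
    let chunk = [0xABu8; 32];
    assert!(!filter.needs_index_lookup(&chunk)); // definitely new: skip RocksDB
    filter.hot.insert(&chunk);
    assert!(filter.needs_index_lookup(&chunk)); // maybe known: confirm in the index
}
```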
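The tuner's feedback loop is simple enough to show in a few lines. This is an illustrative rendering of the thresholds described above (>80% dedup → smaller chunks, <20% → larger); the window length, step sizes, and bounds are hypothetical and the plugin's actual values may differ.

```rust
use std::collections::VecDeque;

/// Illustrative adaptive chunk tuner: sample dedup_ratio over a sliding
/// window and nudge the target average chunk size up or down.
struct ChunkTuner {
    window: VecDeque<f64>, // recent dedup_ratio samples (0.0..=1.0)
    window_len: usize,
    avg_chunk_size: u32,   // target average chunk size in bytes
    min_size: u32,
    max_size: u32,
}

impl ChunkTuner {
    fn observe(&mut self, dedup_ratio: f64) {
        self.window.push_back(dedup_ratio);
        if self.window.len() > self.window_len {
            self.window.pop_front();
        }
        let mean = self.window.iter().sum::<f64>() / self.window.len() as f64;
        if mean > 0.80 {
            // High dedup: finer granularity exposes more duplicate chunks.
            self.avg_chunk_size = (self.avg_chunk_size / 2).max(self.min_size);
        } else if mean < 0.20 {
            // Low dedup: larger chunks cut index overhead per stored byte.
            self.avg_chunk_size = (self.avg_chunk_size * 2).min(self.max_size);
        }
        // Between the thresholds the current size is left alone.
    }
}

fn main() {
    let mut tuner = ChunkTuner {
        window: VecDeque::new(),
        window_len: 32,
        avg_chunk_size: 64 * 1024,
        min_size: 4 * 1024,
        max_size: 1024 * 1024,
    };
    for _ in 0..5 {
        tuner.observe(0.95); // stable, highly redundant corpus
    }
    println!("target avg chunk size: {} bytes", tuner.avg_chunk_size);
}
```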
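For the custom channel, the primitives named above map directly onto the RustCrypto crates. A minimal sketch, assuming the hkdf, sha2, and chacha20poly1305 crates: derive a session key from the shared 32-byte secret with HKDF-SHA256, then seal each frame with ChaCha20-Poly1305. Nonce handling, the handshake transcript, and per-direction key separation are simplified here; this is not the plugin's actual framing.

```rust
use chacha20poly1305::{
    aead::{Aead, KeyInit},
    ChaCha20Poly1305, Key, Nonce,
};
use hkdf::Hkdf;
use sha2::Sha256;

/// Derive a 32-byte session key from the pre-shared secret (HKDF-SHA256).
/// `salt` would come from the handshake (e.g. both sides' nonces).
fn derive_session_key(psk: &[u8; 32], salt: &[u8]) -> [u8; 32] {
    let hk = Hkdf::<Sha256>::new(Some(salt), psk);
    let mut okm = [0u8; 32];
    hk.expand(b"gdd mode-b session key", &mut okm)
        .expect("32 bytes is a valid HKDF-SHA256 output length");
    okm
}

/// Seal one frame: 12-byte nonce (a simple counter here) + AEAD ciphertext.
fn seal_frame(key: &[u8; 32], counter: u64, plaintext: &[u8]) -> Vec<u8> {
    let cipher = ChaCha20Poly1305::new(Key::from_slice(key));
    let mut nonce = [0u8; 12];
    nonce[4..].copy_from_slice(&counter.to_be_bytes());
    cipher
        .encrypt(Nonce::from_slice(&nonce), plaintext)
        .expect("encryption of in-memory buffers does not fail")
}

fn open_frame(key: &[u8; 32], counter: u64, frame: &[u8]) -> Option<Vec<u8>> {
    let cipher = ChaCha20Poly1305::new(Key::from_slice(key));
    let mut nonce = [0u8; 12];
    nonce[4..].copy_from_slice(&counter.to_be_bytes());
    // Fail closed: any tampering makes the Poly1305 tag check fail.
    cipher.decrypt(Nonce::from_slice(&nonce), frame).ok()
}

fn main() {
    let psk = [7u8; 32]; // the one 32-byte secret shared per site
    let key = derive_session_key(&psk, b"client-nonce|daemon-nonce");
    let frame = seal_frame(&key, 1, b"chunk hash batch goes here");
    assert_eq!(
        open_frame(&key, 1, &frame).as_deref(),
        Some(&b"chunk hash batch goes here"[..])
    );
    assert!(open_frame(&key, 2, &frame).is_none()); // wrong nonce: rejected
}
```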
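The read-path check and the vacuum scan reduce to the same primitive: recompute the CRC-32 recorded at write time and refuse to pretend a damaged chunk is healthy. A minimal sketch assuming the crc32fast crate; ChunkStatus echoes the enum named above, while the record layout and store iteration are illustrative only.

```rust
use crc32fast::Hasher;

/// Chunk health as tracked by the store; Corrupted is the state the
/// vacuum/GC integrity scan sets when a CRC mismatch is found.
#[derive(Debug, PartialEq)]
enum ChunkStatus {
    Healthy,
    Corrupted,
}

/// Illustrative on-disk record: payload plus the CRC-32 captured at write time.
struct ChunkRecord {
    data: Vec<u8>,
    crc32: u32,
}

fn crc32_of(data: &[u8]) -> u32 {
    let mut h = Hasher::new();
    h.update(data);
    h.finalize()
}

impl ChunkRecord {
    fn new(data: Vec<u8>) -> Self {
        let crc32 = crc32_of(&data);
        Self { data, crc32 }
    }

    /// Read-path check: every recall re-verifies the stored CRC.
    fn verify(&self) -> ChunkStatus {
        if crc32_of(&self.data) == self.crc32 {
            ChunkStatus::Healthy
        } else {
            ChunkStatus::Corrupted
        }
    }
}

/// Vacuum-style integrity scan: walk every chunk and report the corrupted
/// ones so the operator hears about bit-rot before the day the data is needed.
fn integrity_scan(store: &[ChunkRecord]) -> Vec<usize> {
    store
        .iter()
        .enumerate()
        .filter(|(_, rec)| rec.verify() == ChunkStatus::Corrupted)
        .map(|(idx, _)| idx)
        .collect()
}

fn main() {
    let mut store = vec![ChunkRecord::new(b"good chunk".to_vec())];
    store.push(ChunkRecord::new(b"soon to rot".to_vec()));
    store[1].data[0] ^= 0x01; // simulate silent single-bit corruption on disk
    println!("corrupted chunk indexes: {:?}", integrity_scan(&store));
}
```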
Why I haven’t shipped
Here’s where the post gets uncomfortable to write — because the honest answer is the opposite of what sells.
The component is exceptionally delicate. Global deduplication touches three things that tolerate zero bugs:
- Reentrancy under stress. The dedup daemon runs in parallel with Bacula’s File Daemon and Storage Daemon. We found and fixed cases where a stale daemon session, instead of failing closed, would let 37,000 files through as 0-byte records — with a final job status of Backup OK. That’s the worst class of backup bug: silent data loss wearing a success badge. Fixed (commit 21bdfa5, April 30), validated with an md5 round-trip across 25,071/25,071 files, hardened with fail-closed behavior on three fallback paths.
- Crash recovery. FD killed mid-backup. SD killed mid-backup. Dedup daemon restarted mid-ingest. Every one of these scenarios was exercised in the lab, with subsequent restore, with byte-exact verification. It took us months to close B3a/B3b/B3c.
- Seven-day soak. Not a one-hour benchmark. It’s running a daily backup-restore-md5 cycle for a week, in synthetic production, and proving that RSS doesn’t leak, /gdd doesn’t grow without reason, wire savings don’t regress, and md5 doesn’t drift. We’re on R6 — 7-day soak in flight. Day 1 closed. Six to go.
Shipping before that would be commercial cowardice dressed up as agility.
The difference nobody puts on a slide
I have spent my career working alongside, and sometimes against, European backup vendors that treat General Availability as a marketing event. Companies that lean on paid reviews to prop up public perception, that ship rough usability under the excuse of “that’s how it’s always been,” that commercially threaten partners who dare to push back on a technical decision, and that throw code into GA knowing it has a known regression — because the release calendar matters more than the customer who’ll have to restore tomorrow.
That is not the PodHeitor game. I don’t have foreign shareholders demanding a quarter. I have an end customer. A customer who, on a bad day, depends on yesterday’s backup actually restoring. I’m not going to look at that customer a year from now knowing I signed off on a premature GA because some OKR spreadsheet asked me to.
What’s next
R6 finishes May 7. With clean round-trip across all seven days, stable RSS, and bounded /gdd growth, GDD goes GA. Not before. A few more days on a 20-year project cost me nothing — a silent failure would cost a lot to the people who trust us.
When it ships, it ships right: byte-exact roundtrip validated, 7-day soak documented, fail-closed on every fallback path, chunk store with its own fsck, DR runbook for the dedup store itself (yes, we built a disaster-recovery plan for the disaster-recovery system — that’s the level of paranoia we’re talking about).
That’s the work. It’s slow. It’s detail-obsessed. It’s boring for anyone who wants tomorrow’s release. And, twenty years in, it’s still the only kind of work that gives me purpose.
— Heitor Faria
podheitor.com | [email protected]