Technical whitepaper — PodHeitor Proxmox for Bacula

Internal architecture, operation modes, Veeam-style DR replication, NBD-based instant recovery, cross-hypervisor conversion (vSphere/Hyper-V → PVE), and the security model with TLS fingerprint pinning.

Technical companion to the PodHeitor Proxmox plugin page.

1. The problem: stock Bacula is hypervisor-blind

Bacula Community in its stock form has no hypervisor awareness. Without a plugin, backing up Proxmox VMs typically comes down to one of three approaches, all bad:

  • Host filesystem backup — captures .qcow2/.raw files in an inconsistent state, with no quiesce and no CBT. Restored VMs frequently boot into corruption.
  • vzdump + directory dump, then Bacula — doubles the storage footprint (1× PVE dataset + 1× vzdump output), and every incremental retransmits the entire dump because vzdump does not emit native deltas.
  • Bacula Enterprise PVE plugin — exists, but at enterprise pricing, with no cross-site DR replication and no cross-hypervisor conversion.

The PodHeitor Proxmox Plugin closes all three gaps in a single binary: VM-aware backup with CBT, Veeam-style cross-node DR replication, and cross-hypervisor restore (vSphere/Hyper-V → PVE).

2. Architectural model

The plugin follows the PodHeitor pattern of cdylib + standalone backend, communicating over PTCOMM (length-tagged framing on stdin/stdout). The motivation is threefold:

  1. Crash isolation. A panic in the NBD or QMP engine kills the backend, not bacula-fd. The cdylib observes EOF on the pipe, reports the job as failed, and the FD keeps serving other jobs.
  2. Parallelism freedom. The backend can open PVE REST + NBD + QMP connections in parallel without violating Bacula’s “one thread per bpContext” contract.
  3. License firewall. Since v2.0.0, no Bacula AGPLv3 source is statically linked.

2.1 Process topology

bacula-fd  →  podheitor-proxmox-fd.so (cdylib ~600 LoC)  →  podheitor-proxmox-backend (Rust ~4500 LoC)
                                                                ├─ PVE REST API (HTTPS/TLS pinned)
                                                                ├─ NBD Client (disk I/O)
                                                                └─ QMP Client (snapshot + dirty bitmap)

The backend hosts five engines (BackupEngine, RestoreEngine, ReplicationSender, ReplicationReceiver, and InstantRecoveryEngine); the active engine is selected via the plugin string's mode= parameter.

3. Operation modes

Mode            Function                                                              Engine
backup          VM-aware backup (Full/Inc/Diff) with CBT via QEMU dirty bitmaps       BackupEngine
seed            Initial full sync; auto-provisions the replica VM on the DR target   ReplicationSender
incremental     CBT-only DR incremental replication (dirty-bitmap deltas)            ReplicationSender
receiver        DR-target receiver daemon; listens on TCP dr_port (9190)             ReplicationReceiver
failover-exec   Boots the replica on DR (planned failover)                           ReplicationSender
failback-pre    Returns the replica to standby                                       ReplicationSender
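
For orientation, a minimal backup FileSet sketch. The mode, quiesce, pve_fingerprint and pve_insecure parameters are documented here; the plugin-string prefix (podheitor-proxmox:) and the vmid=, pve_host= and pve_token= parameters are illustrative assumptions, not confirmed syntax:

FileSet {
  Name = "pve-vm-101"
  Include {
    # mode= selects the backend engine; quiesce=yes asks the QEMU Guest Agent
    # for a consistent snapshot (section 4); fingerprint pinning per section 8.1
    Plugin = "podheitor-proxmox: mode=backup vmid=101 pve_host=pve1.example.lab pve_token=<api-token> quiesce=yes pve_fingerprint=AA:BB:CC:... pve_insecure=no"
  }
}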

4. CBT via QEMU dirty bitmaps

Instead of re-sending 100 GB every night, the plugin installs a persistent dirty bitmap in PVE’s QEMU through QMP block-dirty-bitmap-add. Every incremental:

  1. Takes a consistent snapshot (with quiesce=yes via QEMU Guest Agent when available).
  2. Reads only blocks flagged dirty since the last backup, via NBD BLOCK_STATUS.
  3. Streams those blocks over PTCOMM with offsets preserved.
  4. Resets the bitmap once the backup terminates OK.

A 100 GB VM with 2 GB modified → only 2 GB transferred. Without CBT (stock Bacula + filesystem backup), it would be 100 GB every night.
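
Driven from the Director, both levels reuse the same FileSet; only the Job level changes. A sketch, assuming the FileSet above (Client, Schedule, Storage and Pool names are placeholders):

Job {
  Name = "pve-vm-101-backup"
  Type = Backup
  Level = Incremental   # Incremental/Differential read only blocks the bitmap marks dirty
  Client = pve1-fd
  FileSet = "pve-vm-101"
  Schedule = "Nightly"
  Storage = File1
  Pool = Default
  Messages = Standard
}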

5. Veeam-style DR replication (v1.1.0)

The plugin implements a cross-node PVE-1 → PVE-2 DR pipeline with semantics close to Veeam Replication, but driven entirely from the Bacula Director (FileSet/Job).

5.1 Phases

  • Seed (mode=seed): initial full sync. Auto-provisions the replica VM on the DR target (cores, RAM, NICs, SCSI controllers, storage spec). A configuration sketch for the seed and incremental modes follows this list.
  • Continuous incremental (mode=incremental): dirty-bitmap deltas only.
  • Restore points: snapshots auto-rotated on the replica (default 7 points).
  • Verify (verify_sample_blocks=N): FNV-1a-64 hash of N sample blocks compared source ↔ DR. A mismatch fails the job rather than silently reporting OK.
  • Planned failover (mode=failover-exec): one run boots the replica on DR.
  • Planned failback (mode=failback-pre): returns replica to standby.
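
The configuration sketch referenced above: two FileSets, one per phase. mode, dr_port, dr_auth_token and verify_sample_blocks are documented parameters; the plugin-string prefix and the vmid= and dr_host= parameters are assumptions for illustration:

# Phase 1: one-shot seed job (auto-provisions the replica on the DR target)
Plugin = "podheitor-proxmox: mode=seed vmid=101 dr_host=pve2.example.lab dr_port=9190 dr_auth_token=<psk> verify_sample_blocks=150"

# Phase 2: scheduled incremental job (dirty-bitmap deltas only)
Plugin = "podheitor-proxmox: mode=incremental vmid=101 dr_host=pve2.example.lab dr_port=9190 dr_auth_token=<psk> verify_sample_blocks=150"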

5.2 DR channel authentication

Method             Parameter(s)                 When to use
PSK token (HMAC)   dr_auth_token                Default; quick setup between two controlled sites
TLS mutual auth    dr_auth_cert + dr_auth_key   Compliance / multi-tenant; rustls + PEM
Both combined      all three parameters         Defense in depth
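
For the mutual-TLS and combined rows, the corresponding plugin-string fragments (parameter names from the table above; the PEM paths are placeholders):

# TLS mutual auth only
dr_auth_cert=/etc/bacula/dr-client.pem dr_auth_key=/etc/bacula/dr-client.key

# Defense in depth: PSK token plus mutual TLS
dr_auth_token=<psk> dr_auth_cert=/etc/bacula/dr-client.pem dr_auth_key=/etc/bacula/dr-client.key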

5.3 Standalone receiver daemon

The DR target does not need to run bacula-fd. The package installs the systemd template [email protected]; the receiver listens on dr_port (default 9190), accepts authenticated streams, and writes disks over NBD to the local PVE's dr_storage. This reduces the attack surface on the DR side and simplifies partially air-gapped setups.

6. Instant Recovery via NBD overlay

Traditional recovery of a 500 GB VM can take hours of disk-write time before the service is back. The InstantRecoveryEngine bypasses this:

  1. Boots the VM in PVE pointed at a virtual disk served by NBD from the backend, reading directly from the Bacula restore stream.
  2. Overlay writes (ir_overlay_storage) capture guest changes on fast local storage.
  3. In the background, with ir_auto_migrate=yes, the disk is migrated to final storage (ir_target_storage) without downtime.

Typical RTO drops from hours to minutes. Key parameters: ir_nbd_bind, ir_nbd_port, ir_overlay_storage, ir_target_storage, ir_timeout (default 3600s).
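
A plugin-string sketch wiring these parameters together. The ir_* names and ir_auto_migrate are documented above; mode=instant-recovery, the plugin-string prefix, vmid= and the storage/port values are assumptions for illustration:

# Boot from the backup stream over NBD, capture guest writes in the overlay,
# then live-migrate to final storage in the background.
Plugin = "podheitor-proxmox: mode=instant-recovery vmid=101 ir_nbd_bind=0.0.0.0 ir_nbd_port=10809 ir_overlay_storage=local-zfs ir_target_storage=ceph-pool ir_auto_migrate=yes ir_timeout=3600"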

7. Cross-hypervisor conversion

The plugin reads backups produced by the sister plugins PodHeitor vSphere and PodHeitor Hyper-V and restores them directly into PVE — no manual conversion step:

Source              Source disk format   Conversion
VMware vSphere      VMDK                 VMDK → qcow2/raw via in-process library (no shell-out)
Microsoft Hyper-V   VHDX                 VHDX → qcow2/raw via in-process library

Lab-validated: Job 805 (Hyper-V → PVE) and Job 865 (VMware → PVE) restored VMs with successful boot on the destination PVE.

8. Security model

8.1 TLS Fingerprint Pinning (PVE API)

Stock Bacula typically trusts the system trust store; PVE certificates are self-signed by default. The plugin enforces explicit SHA-256 fingerprint pinning via pve_fingerprint=AA:BB:CC:..., with pve_insecure=no as the default. To obtain the fingerprint:

openssl s_client -connect pve-host:8006 </dev/null 2>/dev/null \
  | openssl x509 -noout -fingerprint -sha256 \
  | sed 's/.*Fingerprint=//'

Mismatch (rotated cert, MITM, host swapped) → job aborts with ERROR: TLS fingerprint mismatch. Expected AA:BB:... got CC:DD:....

8.2 Credentials

  • Passwords and tokens passed via FileSet plugin string (not stored on disk by the plugin).
  • bacula-dir.conf must have perms 600, owner bacula.
  • For production: integrate an external vault (HashiCorp Vault, AWS Secrets Manager) and template the plugin string from the Director.

9. Lab validation

Metric                                                   Result
Sequential Bacula jobs executed                          1,290+ JobIDs
Full/Inc/Diff backup + restore (same-host, cross-host)   OK
Cross-hypervisor Hyper-V → PVE (Job 805)                 OK
Cross-hypervisor VMware → PVE (Job 865)                  OK
Replication seed, 100 GB VM                              107 GB in 93.7 min, 19 MB/s sustained
Replication incrementals                                 15/15 back-to-back, 14.5 s average
Integrity verify                                         150 sample blocks, 0 mismatches
mTLS DR channel                                          v3 certificate with IP SAN; handshake + integrity OK
Planned failover + failback                              Single command, exit 0
Bacula-driven replication job                            JobId 3448, Termination=OK

Environment: Director Oracle Linux 9.6 + Bacula 15.0.3; PVE Site A Debian 12 + PVE 8.4.18; PVE Site B Debian 13 + PVE 9.x.

10. Documented anti-patterns

  • Do not disable pve_fingerprint in production. Setting pve_insecure=yes accepts any cert the PVE host presents — including MITM certs. Lab use only.
  • Do not run quiesce=yes without QEMU Guest Agent installed and active in the VM. Without the agent, the plugin auto-degrades to crash-consistent — but the operator must know this happened (check job log).
  • Do not run receiver and sender on the same PVE host. They open the same dr_port and the second one fails to bind.
  • Do not confuse backup_type=incremental (Bacula level) with mode=incremental (DR mode). The first is the FileSet's backup level; the second is the replication engine mode. The two are orthogonal (see the sketch below).
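
The contrast sketch for the last item (backup_type and mode are documented parameters; the prefix, vmid= and dr_host= are illustrative):

# Bacula backup level, consumed by the BackupEngine:
Plugin = "podheitor-proxmox: mode=backup backup_type=incremental vmid=101 ..."

# DR replication mode, consumed by the ReplicationSender; unrelated to the Bacula level:
Plugin = "podheitor-proxmox: mode=incremental vmid=101 dr_host=pve2.example.lab ..."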

11. License posture

Since v2.0.0, the plugin ships under LicenseRef-PodHeitor-Proprietary. No Bacula AGPLv3 source is statically linked into the .so. The cdylib is built from the pure-PodHeitor plugin-proxmox crate in the PodHeitor Rust cdylib workspace, with independent extern "C" bindings via the bacula-fd-abi crate.

Ready to evaluate?

30-day free trial for production Proxmox VE fleets. Guaranteed minimum 50% discount versus Bacula Enterprise, Veeam, or Commvault, with more capabilities included (DR replication, instant recovery, cross-hypervisor conversion).

Heitor Faria — Founder, PodHeitor International
[email protected]
☎ +1 (789) 726-1749 · +55 (61) 98268-4220 (WhatsApp)
🔗 PodHeitor Proxmox plugin page
