The first Bacula plugin built natively for HPC. It supports parallel filesystems (Lustre, GPFS / IBM Spectrum Scale, BeeGFS, CephFS, WekaFS), billion-file namespaces, Slurm/PBS/LSF-aware scheduling, AI/ML checkpoint-aware deduplication, and restripe-on-restore. Written in pure Rust, with process isolation via PTCOMM, so no AGPL-licensed Bacula code is statically linked.
Why a dedicated HPC plugin?
Bacula’s stock File Daemon walks the file tree in a single thread (`findlib/find_one.c`: `readdir`, then a sequential `save_file()` call per file). On a Lustre filesystem with 1 billion files this is a non-starter: the bottleneck is `readdir` + `lstat` metadata latency, not bandwidth.
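To make the contrast concrete, here is a minimal sketch of a multi-threaded metadata walk using only the Rust standard library (`std::thread::scope` needs Rust 1.63+). It is illustrative, not the plugin’s code: the name `parallel_count` is hypothetical, and it shares one mutex-guarded directory stack among all threads (simple work-sharing) instead of the rayon work-stealing pool with per-MDT/NSD/metadata-target workers described under the features. Each worker pops a directory, lists it, counts non-directory entries, and pushes subdirectories back, so several `readdir` calls are in flight at once rather than one.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Shared work pool: directories waiting to be listed, plus how many
/// workers are currently listing one (needed to detect termination).
struct Pool {
    pending: Vec<PathBuf>,
    active: usize,
}

/// Hypothetical helper: count non-directory entries under `root` using
/// `workers` threads that share one directory stack (work-sharing sketch).
pub fn parallel_count(root: &Path, workers: usize) -> usize {
    let pool = Mutex::new(Pool { pending: vec![root.to_path_buf()], active: 0 });
    let wake = Condvar::new();
    let total = Mutex::new(0usize);

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Pop a directory; exit once the stack is empty and no worker
                // is still listing a directory that could add more work.
                let dir = {
                    let mut p = pool.lock().unwrap();
                    loop {
                        if let Some(d) = p.pending.pop() {
                            p.active += 1;
                            break d;
                        }
                        if p.active == 0 {
                            return;
                        }
                        p = wake.wait(p).unwrap();
                    }
                };

                // On Linux, DirEntry::file_type() normally comes from readdir's
                // d_type, so most entries need no extra lstat call.
                let mut subdirs = Vec::new();
                let mut files = 0usize;
                if let Ok(entries) = fs::read_dir(&dir) {
                    for entry in entries.flatten() {
                        match entry.file_type() {
                            Ok(t) if t.is_dir() => subdirs.push(entry.path()),
                            Ok(_) => files += 1, // regular files, symlinks (not followed), etc.
                            Err(_) => {}
                        }
                    }
                }
                *total.lock().unwrap() += files;

                // Publish newly found subdirectories and wake idle workers.
                let mut p = pool.lock().unwrap();
                p.pending.extend(subdirs);
                p.active -= 1;
                wake.notify_all();
            });
        }
    });
    total.into_inner().unwrap()
}
```

For example, `parallel_count(Path::new("/lustre/project"), 32)` (a hypothetical path) would list that tree with 32 threads. A real walker would emit per-entry metadata for the backup stream rather than a bare count.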
Bacula Enterprise 18.2 ships dedicated plugins for HDFS, Quobyte, NDMP, NetApp, and Nutanix, but has no native support for the parallel filesystems that actually run HPC workloads. This plugin closes that gap.
Innovative features (the differentiators)
- Parallel namespace walker: rayon work-stealing, with one worker per Lustre MDT / GPFS NSD / BeeGFS metadata target. Replaces the FD’s single-threaded `find_one_file` and delivers 10–100× the metadata throughput.
- Namespace sharding (`Shard=N/M`): splits the namespace into shards by hash-of-inode or subtree pinning, so that one Bacula job per shard runs concurrently, each on its own SD stream. Bacula has no built-in within-job stream multiplexing, so sharding is the only way to saturate the HPC fabric.
- Filesystem-native incrementals: Lustre ChangeLogs, GPFS `mmapplypolicy`, CephFS `rstats` + `rctime`, and BeeGFS metadata-shard scans. True “changed since” detection without a billion `stat()` calls.
- Stripe-aware parallel reader: reads Lustre OSTs / GPFS NSDs in parallel, using the file layout from `llapi_layout_get_by_path`, and reassembles the data in order through PTCOMM. Naive sequential reads leave ≥80% of HPC bandwidth unused.
- Slurm/PBS/LSF orchestration: submits the scan as a Slurm job step on a compute node, quiesces competing jobs, and provides a `JobComp` hook for AI/ML checkpoint capture. Backup runs on the fast fabric, not on the login node.
- AI/ML checkpoint-aware dedup: plugs into PodHeitor Global Deduplication, with content-defined chunking tuned to tensor-stride boundaries. Training checkpoints differ by about 5% per epoch, so a 95%+ dedup ratio is realistic.
- Restripe-on-restore: persists the original Lustre layout as a `RestoreObject`, and restore recreates the striping before writing data. Preserves performance characteristics, not just bytes.
- Namespace-only “metadata snapshot” mode: fast nightly capture of inodes, ACLs, and xattrs, with bulk data backed up weekly. Catastrophic recovery needs the namespace back fast; bulk data can stream from tape.
- HSM-aware: Lustre HSM integration (archive/release/restore as a tier). Backup becomes a first-class HSM action, not a hostile scan.
- Bandwidth shaping by Slurm load: reads live cluster utilization, bursting while the cluster is idle and throttling during high-priority jobs. Static QoS is not enough on a shared HPC system.
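Hash-of-inode sharding can be sketched in a few lines. This is an illustration under assumptions, not the plugin’s implementation: `mix64`, `shard_of`, and `in_my_shard` are hypothetical names, the SplitMix64 mixing step stands in for whatever hash the plugin really uses, and the shard index and count are passed as plain integers rather than parsed from a `Shard=N/M` string. Every job runs the same function, so each inode is claimed by exactly one job.

```rust
/// SplitMix64 mixing step: a fixed, well-distributed 64-bit hash, so every
/// shard job computes the same result for the same inode on any host.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Hypothetical: which of `total` shards (numbered 0..total) owns this inode.
fn shard_of(inode: u64, total: u32) -> u32 {
    assert!(total > 0, "shard count must be positive");
    (mix64(inode) % u64::from(total)) as u32
}

/// Hypothetical: should the job running shard `index` of `total` back up `inode`?
fn in_my_shard(inode: u64, index: u32, total: u32) -> bool {
    shard_of(inode, total) == index
}
```

A fixed mixing function is used instead of `std::collections::hash_map::DefaultHasher` because the standard library does not guarantee that `DefaultHasher` produces the same values across Rust releases, and all shard jobs must agree. Hashing the inode rather than the path also keeps every hard link to a file in the same shard. Subtree pinning, the other strategy listed, would assign whole directories to shards instead and is not shown.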
Commercial differentiators
| Feature | Bacula Community | Bacula Enterprise / Veeam | PodHeitor HPC |
|---|---|---|---|
| Native Lustre / GPFS / BeeGFS / CephFS / WekaFS | No | No | Yes |
| Parallel walker (10–100× stock FD) | No | No | Yes |
| Namespace sharding + N SD streams | No | No | Yes |
| Filesystem-changelog incrementals | No | Partial | Yes |
| Slurm/PBS/LSF orchestration | No | No | Yes |
| Restripe-on-restore | No | No | Yes |
| HSM-as-tier | No | No | Yes |
| Cost | Free (no support) | $$$$ | ≥50% cheaper than Enterprise/Veeam |
Compatibility
- Bacula Community 15.0.3+
- Filesystems: Lustre 2.14+, IBM Spectrum Scale (GPFS) 5.x, BeeGFS 7.x, CephFS, WekaFS
- Schedulers: Slurm 22.05+, PBS Pro, LSF, OpenPBS
- Distros: RHEL/Oracle/Rocky/Alma 9.x, Debian 12+, Ubuntu 22.04+
- Architecture: x86_64 (musl static-pie binary)
- Rust toolchain: 1.95+ (build time only; no runtime dependency)
Installation
Install from the official .deb or .rpm packages; no building from source is required:
```shell
# RHEL / Oracle Linux / Rocky / Alma 9.x
sudo dnf install podheitor-hpc-plugin-0.1.0-1.el9.x86_64.rpm
# Optional sub-package on hosts with Lustre client:
sudo dnf install podheitor-hpc-plugin-lustre-0.1.0-1.el9.x86_64.rpm

# Debian / Ubuntu
sudo dpkg -i podheitor-hpc-plugin_0.1.0-1_amd64.deb
sudo dpkg -i podheitor-hpc-plugin-lustre_0.1.0-1_amd64.deb
```
The packages install `libpodheitor_hpc_fd.so` into `/opt/bacula/plugins/`, along with the `podheitor-hpc-backend` binary, the `podheitor-hpc` CLI, a systemd unit, and configuration examples. The post-install script restarts bacula-fd automatically.
Technical whitepaper
📘 Read the full technical whitepaper: internal architecture, parallelism model, sharding, changelog drivers, restripe-on-restore, Phase 10 benchmarks, and deployment topologies.
📄 Executive version (PDF): PodHeitor HPC Whitepaper PDF
Ready to switch?
Bring us your renewal or new-contract proposal from Bacula Enterprise, Veeam, Commvault or NetBackup. We commit to a minimum 50% discount, with more capabilities included.
Heitor Faria — Founder, PodHeitor International
✉ heitor@opentechs.lat
☎ +1 (786) 726-1749 · +55 (61) 98268-4220 (WhatsApp)
Free 30-day commercial trial for qualifying workloads.