Technical whitepaper — PodHeitor Nutanix AHV for Bacula

VM-level backup of Nutanix AHV via Prism v3+v4 with native Changed Regions Tracking, vendor-neutral PHCBT01 replication, inbound cross-hypervisor restore from Proxmox/vSphere/Hyper-V, and disk-only / alternate-cluster restore — all absent from the Bacula Enterprise 18.2.3 AHV plugin.

Companion technical document to the PodHeitor Nutanix plugin page.

1. Gaps in the Bacula Enterprise Nutanix plugin

Bacula Enterprise 18.2.3 ships a Nutanix AHV plugin — JVM-based, Prism v2/v3, no v4 CRT, no cross-restore, no vendor-neutral replication, no disk-only restore, no alternate-cluster restore. For customers running pc.2024.3+ who want multi-vendor DR, that leaves four large operational gaps:

  1. No v4 CRT. The compute-changed-regions API in pc.2024.3+ is faster and more granular than the legacy v3 changed_regions. BEE doesn’t consume it.
  2. No cross-restore. AHV backups restore only on AHV. Leaving Nutanix requires manual V2V.
  3. Replication coupled to Nutanix Protection Policies. Doesn’t work cross-vendor.
  4. JVM latency. GC pauses during streaming of large disks are measurable.

The PodHeitor Nutanix Plugin is a Rust sibling of podheitor-proxmox, podheitor-vsphere and podheitor-hyperv — reusing their on-wire formats byte-for-byte so restores are fully cross-compatible.

2. Architecture — two-process Rust

The plugin follows the PodHeitor pattern: a cdylib shim (in this case a C++ shim of ~120 LOC, constants-only, linked against the metaplugin framework) plus a standalone Rust backend, communicating via PTCOMM length-tagged framing on stdin/stdout; a framing sketch follows the diagram below.

┌──────────────────────────────────────────────────────────────────────────┐
│                      Bacula File Daemon (bacula-fd)                       │
│  ┌──────────────────────────────────────────────────────────────────┐    │
│  │  podheitor-nutanix-fd.so  (metaplugin C++ shim, ~120 LOC)        │    │
│  │  - PLUGINNAMESPACE="@nutanix"                                    │    │
│  └──────────────────────────────────────────────────────────────────┘    │
│                              │ PTCOMM over pipe (stdin/stdout)            │
└──────────────────────────────┼────────────────────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│              podheitor-nutanix-backend  (Rust binary)                     │
│  ┌──────────────┬──────────────┬──────────────┬──────────────────────┐   │
│  │ prism v3/v4  │ snapshot     │ iscsi        │ disk_reader          │   │
│  │ REST client  │ RAII guard   │ attach/detach│ (O_DIRECT /dev/sdX)  │   │
│  ├──────────────┼──────────────┼──────────────┼──────────────────────┤   │
│  │ crt (CBT)    │ backup.rs    │ restore.rs   │ replication.rs       │   │
│  └──────────────┴──────────────┴──────────────┴──────────────────────┘   │
└────┬─────────────────────────┬────────────────────────┬───────────────────┘
     │ HTTPS (9440)            │ iSCSI (DSIP:3260)      │ TLS (9848)
     ▼                         ▼                        ▼
┌──────────────┐       ┌──────────────────┐     ┌──────────────────┐
│ Prism Central│       │ Nutanix CVMs     │     │  DR Receiver     │
└──────────────┘       └──────────────────┘     └──────────────────┘
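
For illustration, a minimal Rust sketch of the length-tagged framing between shim and backend. Only the transport is stated in this document (length-tagged packets over stdin/stdout, with packet types such as D, F and T); the header layout used below (one ASCII type byte, a zero-padded six-digit length, a newline) is an assumption for illustration, not the normative PTCOMM spec.

  use std::io::{self, Read, Write};

  // Minimal sketch; the real backend wraps this in the full PTCOMM state machine.
  fn write_packet(out: &mut impl Write, kind: u8, payload: &[u8]) -> io::Result<()> {
      // Assumed header: one ASCII type byte ('D', 'F', 'T', ...) + 6-digit length + newline.
      write!(out, "{}{:06}\n", kind as char, payload.len())?;
      out.write_all(payload)?;
      out.flush()
  }

  fn read_packet(inp: &mut impl Read) -> io::Result<(u8, Vec<u8>)> {
      let mut header = [0u8; 8]; // type byte + 6-digit length + '\n'
      inp.read_exact(&mut header)?;
      let len: usize = std::str::from_utf8(&header[1..7])
          .ok()
          .and_then(|s| s.parse().ok())
          .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad length field"))?;
      let mut payload = vec![0u8; len];
      inp.read_exact(&mut payload)?;
      Ok((header[0], payload))
  }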

2.1 Two deployment modes, one binary

Mode                   Where backend runs               When to choose
proxy_mode=external    FD host outside the AHV cluster  Default; requires network route to Prism:9440 + DSIP:3260 and open-iscsi on FD host
proxy_mode=in_cluster  Linux VM inside the AHV cluster  Maximum throughput: data plane is local virtual-NIC to DSIP

3. Full backup — step by step

  1. PTCOMM handshake; receive JobInfo + Plugin params.
  2. Cluster discovery via Prism Central (v4 API with v3 fallback); returns the Prism Element IP plus a 15-minute JWT (cookie NTNX_IGW_SESSION).
  3. POST /api/nutanix/v3/vms/{uuid}/snapshot (or v4 equivalent). RAII SnapshotGuard guarantees delete on drop.
  4. Clone snapshot disks to a temporary Volume Group (Prism API).
  5. Attach VG to proxy/FD via iSCSI: iscsiadm -m discovery + iscsiadm -m node --login.
  6. Enumerate /dev/disk/by-path/... via sysfs scan; map disk ↔ block device by LUN.
  7. Emit FNAME packets: @nutanix/<cluster>/<vm-uuid>/vm-metadata.json, @nutanix/.../disks/disk-<idx>-<id>.raw.
  8. Stream bytes via O_DIRECT reads (1 MiB chunks, 4 KiB-aligned) → D-packets; see the sketch after this list.
  9. Log out of iSCSI, delete the VG, delete the snapshot (unless it is the CBT reference).
  10. Persist CBT state: reference_recovery_point_ext_id (v4) or snapshot_uuid (v3) at /var/lib/podheitor-nutanix/bitmap/<cluster>/<vm-uuid>.json.
  11. PTCOMM F (end-of-data), wait FD ack, T (terminate).
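
A minimal sketch of the step-8 streaming read, assuming a Linux FD host and the libc crate; the real backend forwards each chunk as D-packets rather than calling a closure.

  use std::alloc::{alloc, dealloc, Layout};
  use std::fs::OpenOptions;
  use std::io::Read;
  use std::os::unix::fs::OpenOptionsExt;

  // O_DIRECT requires aligned buffers; this document states 1 MiB reads, 4 KiB aligned.
  fn stream_disk(dev: &str, mut emit: impl FnMut(&[u8])) -> std::io::Result<()> {
      const CHUNK: usize = 1 << 20; // 1 MiB per read
      const ALIGN: usize = 4096;    // 4 KiB alignment

      let mut file = OpenOptions::new()
          .read(true)
          .custom_flags(libc::O_DIRECT)
          .open(dev)?;

      // A plain Vec<u8> gives no alignment guarantee, so allocate explicitly.
      let layout = Layout::from_size_align(CHUNK, ALIGN).unwrap();
      let ptr = unsafe { alloc(layout) };
      assert!(!ptr.is_null());
      let buf = unsafe { std::slice::from_raw_parts_mut(ptr, CHUNK) };

      loop {
          let n = file.read(buf)?;
          if n == 0 {
              break; // end of device
          }
          emit(&buf[..n]); // forwarded as D-packets by the real backend
      }
      unsafe { dealloc(ptr, layout) };
      Ok(())
  }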

4. Incremental — v4 CRT with v3 fallback

The snapshot and disk-streaming steps of the full flow become:

  1. Create new snapshot (current). Keep previous reference snapshot.
  2. Per disk: call compute-changed-regions (v4) or /data/changed_regions (v3) with (reference, current).
  3. Paginate: up to 10,000 regions per response, follow nextOffset until exhausted.
  4. Emit changed extents in PHCBT01 format (magic + original_size + region_count + {offset, length, data} × N) over D-packets; see the sketch after this list.
  5. Hybrid path: dense change sets → iSCSI attach + read; sparse change sets → REST range-GET, chosen by a density heuristic.
  6. On success, rotate: delete the old reference and promote current to the new reference.
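
A minimal sketch of the PHCBT01 framing named in step 4. The field order comes from this document; field widths and endianness (u64, little-endian) and the magic padding are assumptions for illustration.

  use std::io::{self, Write};

  // One changed region as returned by the CRT query (byte offset and length on the disk).
  struct Region {
      offset: u64,
      length: u64,
      data: Vec<u8>,
  }

  fn write_phcbt01<W: Write>(out: &mut W, original_size: u64, regions: &[Region]) -> io::Result<()> {
      out.write_all(b"PHCBT01\0")?;                          // magic (padded to 8 bytes here)
      out.write_all(&original_size.to_le_bytes())?;          // size of the full source disk
      out.write_all(&(regions.len() as u64).to_le_bytes())?; // region_count
      for r in regions {
          out.write_all(&r.offset.to_le_bytes())?;
          out.write_all(&r.length.to_le_bytes())?;
          out.write_all(&r.data)?;                           // raw changed bytes
      }
      Ok(())
  }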

5. RAII guards — deterministic cleanup

An orphaned Nutanix snapshot is a capacity leak in the cluster; an orphaned VG is a zombie LUN target; a stale iSCSI session leaves a zombie device file on the FD. The backend prevents all three with a chain of Rust Drop guards, unwound in reverse declaration order:

Guard              Resource              Drop action
SnapshotGuard      Prism recovery point  Delete via API
VolumeGroupGuard   Temporary VG          Delete via API
IscsiSessionGuard  iSCSI session         iscsiadm -m node --logout
CleanupGuard       Catch-all             Catches Drop panics; logs errors but never propagates

Drop ordering is load-bearing: iSCSI logout, then VG delete, then snapshot delete. Rust's reverse-declaration drop order gives that for free, but inserting a new guard between existing ones silently changes the order; the convention is encoded as a comment in back_up_vm and illustrated in the sketch below.
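
A minimal sketch of the guard convention in back_up_vm. The guard names come from the table above; the bodies and fields are placeholders (the real guards hold API clients, UUIDs and session IDs).

  struct SnapshotGuard;     // deletes the Prism recovery point on drop
  struct VolumeGroupGuard;  // deletes the temporary VG on drop
  struct IscsiSessionGuard; // runs iscsiadm -m node --logout on drop

  impl Drop for SnapshotGuard     { fn drop(&mut self) { /* DELETE recovery point via Prism */ } }
  impl Drop for VolumeGroupGuard  { fn drop(&mut self) { /* DELETE temporary VG via Prism */ } }
  impl Drop for IscsiSessionGuard { fn drop(&mut self) { /* spawn iscsiadm --logout */ } }

  fn back_up_vm_sketch() {
      // Locals drop in reverse declaration order, so declaring snapshot, VG, iSCSI
      // yields cleanup as: iSCSI logout, then VG delete, then snapshot delete.
      let _snapshot = SnapshotGuard;   // dropped last
      let _vg = VolumeGroupGuard;      // dropped second
      let _iscsi = IscsiSessionGuard;  // dropped first
      // ... attach, read and stream disks here; early returns and panics still unwind the guards.
  }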

6. Inbound cross-restore — Proxmox/vSphere/Hyper-V → AHV

Detection at FNAME scan (first file of the job):

FNAME prefix                 Pipeline
@proxmox/<vmid>/disks/*.raw  raw → qcow2 (qemu-img convert) → Image Service upload
@vsphere/<vm>/disks/*.vmdk   vmdk → qcow2 → Image Service upload
@hyperv/<vm>/disks/*.vhdx    vhdx → qcow2 → Image Service upload
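
A minimal sketch of the conversion step shared by the pipelines above (qemu-img is named for the raw path; the same tool handles vmdk and vhdx, and it auto-detects the source format). Paths and error handling are illustrative, and qemu-img is assumed to be on the proxy host's PATH.

  use std::process::Command;

  // -p prints progress; -O fixes the output format as qcow2 for the Image Service upload.
  fn convert_to_qcow2(src: &str, dst: &str) -> std::io::Result<()> {
      let status = Command::new("qemu-img")
          .args(["convert", "-p", "-O", "qcow2", src, dst])
          .status()?;
      if !status.success() {
          return Err(std::io::Error::new(
              std::io::ErrorKind::Other,
              format!("qemu-img convert failed for {src}"),
          ));
      }
      Ok(())
  }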

The VM config (Proxmox .conf, vSphere .vmx, Hyper-V .vmcx/XML) is translated to an AHV vm_spec_v3 (see the sketch after the table):

Source field        AHV target
Memory (MB)         resources.memory_size_mib
vCPUs               num_sockets × num_vcpus_per_socket
Firmware BIOS/UEFI  resources.boot_config.boot_type
SCSI/IDE disks      disk_list[].device_properties.device_type=DISK, adapter_type=SCSI
NIC MAC+VLAN        nic_list[].mac_address + subnet_reference via network_map
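
A minimal sketch of the field mapping above, building a v3 vm_spec JSON body with serde_json. The source struct is a trimmed stand-in for the parsed .conf/.vmx/XML config, and the exact boot_type enum values are assumptions for illustration.

  use serde_json::json;

  struct SourceVmConfig {
      name: String,
      memory_mb: u64,
      vcpus: u64,
      uefi: bool,
  }

  fn to_vm_spec_v3(src: &SourceVmConfig) -> serde_json::Value {
      // Assumed enum spelling for boot_type; hedge: check against the target PC version.
      let boot_type = if src.uefi { "UEFI" } else { "LEGACY" };
      json!({
          "spec": {
              "name": src.name.clone(),
              "resources": {
                  "memory_size_mib": src.memory_mb,  // Memory (MB) maps 1:1
                  "num_sockets": src.vcpus,          // vCPUs expressed as sockets × 1 core here
                  "num_vcpus_per_socket": 1,
                  "boot_config": {
                      "boot_type": boot_type
                  }
              }
          }
      })
  }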

7. Vendor-neutral replication (PHCBT01 over TLS)

Unlike Nutanix Protection Policies, which require Nutanix at both ends, PodHeitor replication operates over PHCBT01-over-TLS on port 9848 — same format as the Proxmox/vSphere/Hyper-V plugins. The receiver can be Nutanix, Proxmox, or any host with a PodHeitor peer FD.

  • Seed: initial full via the same backup path, marked as reference.
  • Bitmap-push: periodic cycles read the CRT delta and send it over TLS to the receiver (see the sketch after this list).
  • Failover modes: planned, unplanned, test, undo, permanent.
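
A minimal sketch of one bitmap-push cycle. Only PHCBT01 over TLS on port 9848 is stated above; the use of the native-tls crate and the function shape are assumptions for illustration.

  use std::io::Write;
  use std::net::TcpStream;
  use native_tls::TlsConnector;

  // Each frame is a PHCBT01-encoded changed-region batch produced by the CRT reader (section 4).
  fn push_cycle(receiver: &str, frames: &[Vec<u8>]) -> Result<(), Box<dyn std::error::Error>> {
      let connector = TlsConnector::new()?;
      let tcp = TcpStream::connect((receiver, 9848))?;
      let mut tls = connector.connect(receiver, tcp)?;
      for frame in frames {
          tls.write_all(frame)?;
      }
      tls.flush()?;
      Ok(())
  }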

8. Disk-only and alternate-cluster restore

Two modes absent from BEE AHV v18:

  • Disk-only restore: restores a single disk image (disk-<idx>-<id>.raw) to an arbitrary device path on the FD host, with no Image Service upload and no VM creation. Use cases: forensics and single-file recovery via a manual mount.
  • Alternate-cluster restore: the target_cluster param redirects the restore to a Prism Element cluster other than the source. Combine it with restore_vm_name= and network_map= to avoid name and IP collisions (see the sketch below).
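
A minimal sketch of how the restore-time parameters above might be grouped and parsed. Only the parameter names come from this document; the struct shape and the network_map pair syntax (source=target, comma-separated) are illustrative assumptions.

  use std::collections::HashMap;

  struct RestoreOptions {
      target_cluster: Option<String>,        // alternate-cluster restore: a PE other than the source
      restore_vm_name: Option<String>,       // rename on restore to avoid collisions
      network_map: HashMap<String, String>,  // source network -> target subnet
  }

  // Assumed syntax "src1=dst1,src2=dst2"; the real plugin's separator may differ.
  fn parse_network_map(raw: &str) -> HashMap<String, String> {
      raw.split(',')
          .filter_map(|pair| pair.split_once('='))
          .map(|(src, dst)| (src.trim().to_string(), dst.trim().to_string()))
          .collect()
  }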

9. Documented anti-patterns

  • Don’t use prism_insecure=true in production. It exists for Nutanix CE (default self-signed cert). Import the PC CA into the FD trust store instead of bypassing verification.
  • Don’t invert the guard Drop order. Deleting the snapshot before the iSCSI logout leaves zombie device files on the FD.
  • Don’t run proxy_mode=external without an explicit dsip=. The fallback to cluster_name works in a lab but is fragile in production (DNS, multi-DSIP setups).
  • Don’t run replication against a cluster with Protection Policies active on the same VM. Snapshots will conflict, and the current flow does not auto-detect this.

10. License posture

The plugin ships under LicenseRef-PodHeitor-Proprietary. The backend is a standalone Rust binary — no Bacula AGPLv3 source is statically linked. The C++ shim is minimal (~120 LOC, constants-only) and dynamically links against the Bacula metaplugin framework.

Ready to evaluate?

Free 30-day trial for Nutanix AHV clusters (Prism Central pc.2024.3+ recommended, pc.2023.x supported via v3 fallback). We guarantee at least 50% discount vs Bacula Enterprise, Veeam or Commvault, with cross-restore and vendor-neutral replication included.

Heitor Faria — Founder, PodHeitor International
[email protected]
☎ +1 (789) 726-1749 · +55 (61) 98268-4220 (WhatsApp)
🔗 PodHeitor Nutanix plugin page
