Technical whitepaper — PodHeitor Nutanix AHV for Bacula

VM-level backup of Nutanix AHV via Prism v3+v4 with native Changed Regions Tracking, vendor-neutral PHCBT01 replication, inbound cross-hypervisor restore from Proxmox/vSphere/Hyper-V, and disk-only / alternate-cluster restore — all absent from the Bacula Enterprise 18.2.3 AHV plugin.

Companion technical document to the PodHeitor Nutanix plugin page.

1. Gaps in the Bacula Enterprise Nutanix plugin

Bacula Enterprise 18.2.3 ships a Nutanix AHV plugin — JVM-based, Prism v2/v3, no v4 CRT, no cross-restore, no vendor-neutral replication, no disk-only restore, no alternate-cluster restore. For customers running pc.2024.3+ who want multi-vendor DR, that leaves four large operational gaps:

  1. No v4 CRT. The compute-changed-regions API in pc.2024.3+ is faster and more granular than the legacy v3 changed_regions. BEE doesn’t consume it.
  2. No cross-restore. AHV backups restore only on AHV. Leaving Nutanix requires manual V2V.
  3. Replication coupled to Nutanix Protection Policies. Doesn’t work cross-vendor.
  4. JVM latency. GC pauses during streaming of large disks are measurable.

The PodHeitor Nutanix Plugin is a Rust sibling of podheitor-proxmox, podheitor-vsphere and podheitor-hyperv — reusing their on-wire formats byte-for-byte so restores are fully cross-compatible.

2. Architecture — two-process Rust

The plugin follows the PodHeitor pattern: a cdylib shim (in this case a C++ shim of ~120 LOC, constants-only, linked against the metaplugin framework) plus a standalone Rust backend, communicating via PTCOMM length-tagged framing on stdin/stdout; a framing sketch follows the diagram below.

┌──────────────────────────────────────────────────────────────────────────┐
│                      Bacula File Daemon (bacula-fd)                       │
│  ┌──────────────────────────────────────────────────────────────────┐    │
│  │  podheitor-nutanix-fd.so  (metaplugin C++ shim, ~120 LOC)        │    │
│  │  - PLUGINNAMESPACE="@nutanix"                                    │    │
│  └──────────────────────────────────────────────────────────────────┘    │
│                              │ PTCOMM over pipe (stdin/stdout)            │
└──────────────────────────────┼────────────────────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│              podheitor-nutanix-backend  (Rust binary)                     │
│  ┌──────────────┬──────────────┬──────────────┬──────────────────────┐   │
│  │ prism v3/v4  │ snapshot     │ iscsi        │ disk_reader          │   │
│  │ REST client  │ RAII guard   │ attach/detach│ (O_DIRECT /dev/sdX)  │   │
│  ├──────────────┼──────────────┼──────────────┼──────────────────────┤   │
│  │ crt (CBT)    │ backup.rs    │ restore.rs   │ replication.rs       │   │
│  └──────────────┴──────────────┴──────────────┴──────────────────────┘   │
└────┬─────────────────────────┬────────────────────────┬───────────────────┘
     │ HTTPS (9440)            │ iSCSI (DSIP:3260)      │ TLS (9848)
     ▼                         ▼                        ▼
┌──────────────┐       ┌──────────────────┐     ┌──────────────────┐
│ Prism Central│       │ Nutanix CVMs     │     │  DR Receiver     │
└──────────────┘       └──────────────────┘     └──────────────────┘
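
For illustration, a minimal Rust sketch of the length-tagged framing between shim and backend. Only the transport is stated in this document (length-tagged packets over stdin/stdout, with packet types such as D, F and T); the header layout used below (one ASCII type byte, a zero-padded six-digit length, a newline) is an assumption for illustration, not the normative PTCOMM spec.

  use std::io::{self, Read, Write};

  // Minimal sketch; the real backend wraps this in the full PTCOMM state machine.
  fn write_packet(out: &mut impl Write, kind: u8, payload: &[u8]) -> io::Result<()> {
      // Assumed header: one ASCII type byte ('D', 'F', 'T', ...) + 6-digit length + newline.
      write!(out, "{}{:06}\n", kind as char, payload.len())?;
      out.write_all(payload)?;
      out.flush()
  }

  fn read_packet(inp: &mut impl Read) -> io::Result<(u8, Vec<u8>)> {
      let mut header = [0u8; 8]; // type byte + 6-digit length + '\n'
      inp.read_exact(&mut header)?;
      let len: usize = std::str::from_utf8(&header[1..7])
          .ok()
          .and_then(|s| s.parse().ok())
          .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad length field"))?;
      let mut payload = vec![0u8; len];
      inp.read_exact(&mut payload)?;
      Ok((header[0], payload))
  }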

2.1 Two deployment modes, one binary

Mode                   Where backend runs               When to choose
proxy_mode=external    FD host outside the AHV cluster  Default; requires network route to Prism:9440 + DSIP:3260 and open-iscsi on FD host
proxy_mode=in_cluster  Linux VM inside the AHV cluster  Maximum throughput: data plane is local virtual-NIC to DSIP

3. Full backup — step by step

  1. PTCOMM handshake; receive JobInfo + Plugin params.
  2. Cluster discovery via Prism Central (v4 API with v3 fallback); returns the Prism Element IP plus a 15-minute JWT (cookie NTNX_IGW_SESSION).
  3. POST /api/nutanix/v3/vms/{uuid}/snapshot (or v4 equivalent). RAII SnapshotGuard guarantees delete on drop.
  4. Clone snapshot disks to a temporary Volume Group (Prism API).
  5. Attach VG to proxy/FD via iSCSI: iscsiadm -m discovery + iscsiadm -m node --login.
  6. Enumerate /dev/disk/by-path/... via sysfs scan; map disk ↔ block device by LUN.
  7. Emit FNAME packets: @nutanix/<cluster>/<vm-uuid>/vm-metadata.json, @nutanix/.../disks/disk-<idx>-<id>.raw.
  8. Stream bytes via O_DIRECT reads (1 MiB chunks, 4 KiB-aligned) → D-packets; see the sketch after this list.
  9. Log out of iSCSI, delete the VG, delete the snapshot (unless it is the CBT reference).
  10. Persist CBT state: reference_recovery_point_ext_id (v4) or snapshot_uuid (v3) at /var/lib/podheitor-nutanix/bitmap/<cluster>/<vm-uuid>.json.
  11. PTCOMM F (end-of-data), wait FD ack, T (terminate).
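
A minimal sketch of the step-8 streaming read, assuming a Linux FD host and the libc crate; the real backend forwards each chunk as D-packets rather than calling a closure.

  use std::alloc::{alloc, dealloc, Layout};
  use std::fs::OpenOptions;
  use std::io::Read;
  use std::os::unix::fs::OpenOptionsExt;

  // O_DIRECT requires aligned buffers; this document states 1 MiB reads, 4 KiB aligned.
  fn stream_disk(dev: &str, mut emit: impl FnMut(&[u8])) -> std::io::Result<()> {
      const CHUNK: usize = 1 << 20; // 1 MiB per read
      const ALIGN: usize = 4096;    // 4 KiB alignment

      let mut file = OpenOptions::new()
          .read(true)
          .custom_flags(libc::O_DIRECT)
          .open(dev)?;

      // A plain Vec<u8> gives no alignment guarantee, so allocate explicitly.
      let layout = Layout::from_size_align(CHUNK, ALIGN).unwrap();
      let ptr = unsafe { alloc(layout) };
      assert!(!ptr.is_null());
      let buf = unsafe { std::slice::from_raw_parts_mut(ptr, CHUNK) };

      loop {
          let n = file.read(buf)?;
          if n == 0 {
              break; // end of device
          }
          emit(&buf[..n]); // forwarded as D-packets by the real backend
      }
      unsafe { dealloc(ptr, layout) };
      Ok(())
  }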

4. Incremental — v4 CRT with v3 fallback

The snapshot and disk-streaming steps of the full flow become:

  1. Create new snapshot (current). Keep previous reference snapshot.
  2. Per disk: call compute-changed-regions (v4) or /data/changed_regions (v3) with (reference, current).
  3. Paginate: up to 10,000 regions per response, follow nextOffset until exhausted.
  4. Emit changed extents in PHCBT01 format (magic + original_size + region_count + {offset, length, data} × N) over D-packets; see the sketch after this list.
  5. Hybrid path: dense change sets → iSCSI attach + read; sparse change sets → REST range-GET, chosen by a density heuristic.
  6. On success, rotate: delete the old reference and promote current to the new reference.
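
A minimal sketch of the PHCBT01 framing named in step 4. The field order comes from this document; field widths and endianness (u64, little-endian) and the magic padding are assumptions for illustration.

  use std::io::{self, Write};

  // One changed region as returned by the CRT query (byte offset and length on the disk).
  struct Region {
      offset: u64,
      length: u64,
      data: Vec<u8>,
  }

  fn write_phcbt01<W: Write>(out: &mut W, original_size: u64, regions: &[Region]) -> io::Result<()> {
      out.write_all(b"PHCBT01\0")?;                          // magic (padded to 8 bytes here)
      out.write_all(&original_size.to_le_bytes())?;          // size of the full source disk
      out.write_all(&(regions.len() as u64).to_le_bytes())?; // region_count
      for r in regions {
          out.write_all(&r.offset.to_le_bytes())?;
          out.write_all(&r.length.to_le_bytes())?;
          out.write_all(&r.data)?;                           // raw changed bytes
      }
      Ok(())
  }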

5. RAII guards — deterministic cleanup

An orphaned Nutanix snapshot is a capacity leak in the cluster; an orphaned VG is a zombie LUN target; a stale iSCSI session leaves a zombie device file on the FD. The backend prevents all three with a chain of Rust Drop guards, unwound in reverse declaration order:

Guard              Resource              Drop action
SnapshotGuard      Prism recovery point  Delete via API
VolumeGroupGuard   Temporary VG          Delete via API
IscsiSessionGuard  iSCSI session         iscsiadm -m node --logout
CleanupGuard       Catch-all             Catches Drop panics; logs errors but never propagates

Drop ordering is load-bearing: iSCSI logout, then VG delete, then snapshot delete. Rust's reverse-declaration drop order gives that for free, but inserting a new guard between existing ones silently changes the order; the convention is encoded as a comment in back_up_vm and illustrated in the sketch below.
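
A minimal sketch of the guard convention in back_up_vm. The guard names come from the table above; the bodies and fields are placeholders (the real guards hold API clients, UUIDs and session IDs).

  struct SnapshotGuard;     // deletes the Prism recovery point on drop
  struct VolumeGroupGuard;  // deletes the temporary VG on drop
  struct IscsiSessionGuard; // runs iscsiadm -m node --logout on drop

  impl Drop for SnapshotGuard     { fn drop(&mut self) { /* DELETE recovery point via Prism */ } }
  impl Drop for VolumeGroupGuard  { fn drop(&mut self) { /* DELETE temporary VG via Prism */ } }
  impl Drop for IscsiSessionGuard { fn drop(&mut self) { /* spawn iscsiadm --logout */ } }

  fn back_up_vm_sketch() {
      // Locals drop in reverse declaration order, so declaring snapshot, VG, iSCSI
      // yields cleanup as: iSCSI logout, then VG delete, then snapshot delete.
      let _snapshot = SnapshotGuard;   // dropped last
      let _vg = VolumeGroupGuard;      // dropped second
      let _iscsi = IscsiSessionGuard;  // dropped first
      // ... attach, read and stream disks here; early returns and panics still unwind the guards.
  }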

6. Inbound cross-restore — Proxmox/vSphere/Hyper-V → AHV

Detection at FNAME scan (first file of the job):

FNAME prefix                 Pipeline
@proxmox/<vmid>/disks/*.raw  raw → qcow2 (qemu-img convert) → Image Service upload
@vsphere/<vm>/disks/*.vmdk   vmdk → qcow2 → Image Service upload
@hyperv/<vm>/disks/*.vhdx    vhdx → qcow2 → Image Service upload
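
A minimal sketch of the conversion step shared by the pipelines above (qemu-img is named for the raw path; the same tool handles vmdk and vhdx, and it auto-detects the source format). Paths and error handling are illustrative, and qemu-img is assumed to be on the proxy host's PATH.

  use std::process::Command;

  // -p prints progress; -O fixes the output format as qcow2 for the Image Service upload.
  fn convert_to_qcow2(src: &str, dst: &str) -> std::io::Result<()> {
      let status = Command::new("qemu-img")
          .args(["convert", "-p", "-O", "qcow2", src, dst])
          .status()?;
      if !status.success() {
          return Err(std::io::Error::new(
              std::io::ErrorKind::Other,
              format!("qemu-img convert failed for {src}"),
          ));
      }
      Ok(())
  }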

The VM config (Proxmox .conf, vSphere .vmx, Hyper-V .vmcx/XML) is translated to an AHV vm_spec_v3 (see the sketch after the table):

Source field        AHV target
Memory (MB)         resources.memory_size_mib
vCPUs               num_sockets × num_vcpus_per_socket
Firmware BIOS/UEFI  resources.boot_config.boot_type
SCSI/IDE disks      disk_list[].device_properties.device_type=DISK, adapter_type=SCSI
NIC MAC+VLAN        nic_list[].mac_address + subnet_reference via network_map
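
A minimal sketch of the field mapping above, building a v3 vm_spec JSON body with serde_json. The source struct is a trimmed stand-in for the parsed .conf/.vmx/XML config, and the exact boot_type enum values are assumptions for illustration.

  use serde_json::json;

  struct SourceVmConfig {
      name: String,
      memory_mb: u64,
      vcpus: u64,
      uefi: bool,
  }

  fn to_vm_spec_v3(src: &SourceVmConfig) -> serde_json::Value {
      // Assumed enum spelling for boot_type; hedge: check against the target PC version.
      let boot_type = if src.uefi { "UEFI" } else { "LEGACY" };
      json!({
          "spec": {
              "name": src.name.clone(),
              "resources": {
                  "memory_size_mib": src.memory_mb,  // Memory (MB) maps 1:1
                  "num_sockets": src.vcpus,          // vCPUs expressed as sockets × 1 core here
                  "num_vcpus_per_socket": 1,
                  "boot_config": {
                      "boot_type": boot_type
                  }
              }
          }
      })
  }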

7. Vendor-neutral replication (PHCBT01 over TLS)

Unlike Nutanix Protection Policies, which require Nutanix at both ends, PodHeitor replication operates over PHCBT01-over-TLS on port 9848 — same format as the Proxmox/vSphere/Hyper-V plugins. The receiver can be Nutanix, Proxmox, or any host with a PodHeitor peer FD.

  • Seed: initial full via the same backup path, marked as reference.
  • Bitmap-push: periodic cycles read the CRT delta and send it over TLS to the receiver (see the sketch after this list).
  • Failover modes: planned, unplanned, test, undo, permanent.
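
A minimal sketch of one bitmap-push cycle. Only PHCBT01 over TLS on port 9848 is stated above; the use of the native-tls crate and the function shape are assumptions for illustration.

  use std::io::Write;
  use std::net::TcpStream;
  use native_tls::TlsConnector;

  // Each frame is a PHCBT01-encoded changed-region batch produced by the CRT reader (section 4).
  fn push_cycle(receiver: &str, frames: &[Vec<u8>]) -> Result<(), Box<dyn std::error::Error>> {
      let connector = TlsConnector::new()?;
      let tcp = TcpStream::connect((receiver, 9848))?;
      let mut tls = connector.connect(receiver, tcp)?;
      for frame in frames {
          tls.write_all(frame)?;
      }
      tls.flush()?;
      Ok(())
  }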

8. Disk-only and alternate-cluster restore

Two modes absent from BEE AHV v18:

  • Disk-only restore: restores a single disk image (disk-<idx>-<id>.raw) to an arbitrary device path on the FD host, with no Image Service upload and no VM creation. Use cases: forensics and single-file recovery via a manual mount.
  • Alternate-cluster restore: the target_cluster param redirects the restore to a Prism Element cluster other than the source. Combine it with restore_vm_name= and network_map= to avoid name and IP collisions (see the sketch below).
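
A minimal sketch of how the restore-time parameters above might be grouped and parsed. Only the parameter names come from this document; the struct shape and the network_map pair syntax (source=target, comma-separated) are illustrative assumptions.

  use std::collections::HashMap;

  struct RestoreOptions {
      target_cluster: Option<String>,        // alternate-cluster restore: a PE other than the source
      restore_vm_name: Option<String>,       // rename on restore to avoid collisions
      network_map: HashMap<String, String>,  // source network -> target subnet
  }

  // Assumed syntax "src1=dst1,src2=dst2"; the real plugin's separator may differ.
  fn parse_network_map(raw: &str) -> HashMap<String, String> {
      raw.split(',')
          .filter_map(|pair| pair.split_once('='))
          .map(|(src, dst)| (src.trim().to_string(), dst.trim().to_string()))
          .collect()
  }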

9. Documented anti-patterns

  • Don’t use prism_insecure=true in production. It exists for Nutanix CE (default self-signed cert). Import the PC CA into the FD trust store instead of bypassing verification.
  • Don’t invert the guard Drop order. Deleting the snapshot before the iSCSI logout leaves zombie device files on the FD.
  • Don’t run proxy_mode=external without an explicit dsip=. The fallback to cluster_name works in a lab but is fragile in production (DNS, multi-DSIP setups).
  • Don’t run replication against a cluster with Protection Policies active on the same VM. Snapshots will conflict, and the current flow does not auto-detect this.

10. License posture

The plugin ships under LicenseRef-PodHeitor-Proprietary. The backend is a standalone Rust binary — no Bacula AGPLv3 source is statically linked. The C++ shim is minimal (~120 LOC, constants-only) and dynamically links against the Bacula metaplugin framework.

Ready to evaluate?

Free 30-day trial for Nutanix AHV clusters (Prism Central pc.2024.3+ recommended, pc.2023.x supported via v3 fallback). We guarantee at least 50% discount vs Bacula Enterprise, Veeam or Commvault, with cross-restore and vendor-neutral replication included.

Heitor Faria — Founder, PodHeitor International
[email protected]
☎ +1 (789) 726-1749 · +55 (61) 98268-4220 (WhatsApp)
🔗 PodHeitor Nutanix plugin page
