DevOps · K8s · Volleyball · Travel
Explore NY Stream

Fix for Mksysb Restore on DVD-RAM using SAN-BOOT method on p595 in AIX

A mksysb restore on a SAN-boot LPAR fails when the bootable DVD-RAM cannot configure the SAN root disk, because the SDD PCM kernel extensions that cfgscsidisk depends on are missing from the install image's Phase 3 environment. The fix is to declare those SDD PCM files in /usr/lpp/bosinst/cdfs.optional.list so they are copied into the SPOT on the DVD, then rebuild the mksysb media. This guide walks through the root cause, the exact list entries, the rebuild and reinstall steps, and how to verify the SAN disk configures during a fresh restore.

This scenario is from the classic IBM Power 595 (p595) era running AIX 5.2/5.3 with the Subsystem Device Driver Path Control Module (SDD PCM) over Fibre Channel. The hardware and SDD PCM are end-of-life today, but the underlying lesson — making your multipath driver available to the BOS installer when restoring a SAN-boot LPAR — still matters on modern systems, so the modern equivalent is noted at the end.

The problem: mksysb restore fails on a SAN-boot LPAR

On a p595 you create a bootable mksysb on DVD-RAM to capture a master OS image, intending to clone or rebuild LPARs that boot from a SAN LUN over Fibre Channel. The restore boots fine from the optical media, the BOS menus appear, and the install begins — but the target SAN root disk (for example hdisk0) never shows up as an installable destination, or the install aborts because no rootvg disk can be configured.

Symptoms you will see in a debug boot log include a clean Phase 1, then a failure when cfgmgr tries to configure the disk in Phase 3:

  • Method: /etc/methods/cfgscsidisk not in boot image, configure in phase ...
  • Method error (/etc/methods/cfgscsidisk -2 -l hdisk0):
  • 0514-069 Failure loading the platform specific libcfg module.

That last message, 0514-069, is the giveaway: the disk-configuration method exists, but the platform-specific (SDD PCM) library it must load is not present in the running install environment.

Why the SAN disk fails to configure during install

An AIX optical install does not configure everything in one pass. cfgmgr runs in distinct phases against a stripped-down boot environment, and the install media carries two separate pieces of content:

  • The boot image (BOS boot) — a small RAM filesystem used in the very first phase. It contains a limited device-configuration set.
  • The SPOT (Shared Product Object Tree) — a fuller /usr tree that becomes available later in the install for additional device configuration.

During boot, /sbin/rc.boot populates the runtime directories by symlinking SPOT content into place. Drivers, methods and microcode are real directories rather than single links, so they are filled explicitly, conceptually like:

  ln -s /SPOT/usr/lib/drivers/* /usr/lib/drivers
  ln -s /SPOT/usr/lib/methods/* /usr/lib/methods
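
The effect of that linking step can be imitated in a scratch directory to see why a file that is absent from the SPOT is also absent at run time. This is only a sketch: link_spot_dir is a hypothetical helper, not part of rc.boot, and it works on whatever directories you pass in.

```shell
# Hypothetical helper: imitate the rc.boot linking step in a sandbox.
# Links every entry of a SPOT-style source directory into a runtime directory.
# Pass absolute paths so the symbolic links resolve.
link_spot_dir() {
    spot_dir=$1      # e.g. a scratch stand-in for /SPOT/usr/lib/drivers
    runtime_dir=$2   # e.g. a scratch stand-in for /usr/lib/drivers
    mkdir -p "$runtime_dir" || return 1
    for spot_entry in "$spot_dir"/*; do
        # Skip the unexpanded glob when the source directory is empty.
        [ -e "$spot_entry" ] || continue
        ln -sf "$spot_entry" "$runtime_dir/" || return 1
    done
}
```

Only names that exist in the source directory appear in the runtime directory, so a driver that never made it into the SPOT cannot be found later.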

Here is the trap that breaks a SAN-boot restore:

  • In Phase 1, the SDD PCM kernel extensions may be reachable (for example because a custom proto entry pulled them into the boot image), so things look healthy early.
  • By Phase 3, cfgscsidisk becomes available and runs against the SAN hdisk — but the SDD PCM run-time loadable module (RTL) and kernel extensions are not present in the SPOT, so the method cannot load its platform-specific libcfg module and dies with 0514-069.

In other words, the multipath driver that AIX needs to talk to a Fibre Channel SAN disk has to be present in the same phase where the disk is actually configured. The clean answer is to make the SDD PCM files part of the SPOT on the DVD so they are available in Phase 3 — not to hack them only into the early boot image.

The fix: declare SDD PCM files in cdfs.optional.list

The contents copied into the SPOT directory on a bootable AIX CD/DVD are governed by the BOS install list files:

File                                    Role
/usr/lpp/bosinst/cdfs.required.list     Mandatory content always placed on the media.
/usr/lpp/bosinst/cdfs.optional.list     Optional content to include; this is where you add SDD PCM.
/usr/lpp/bosinst/cdfs.optional.B.list   Secondary optional list (variant/boot content).

Each entry typically uses three whitespace-separated fields: the source path, the destination path (usually identical), and the owning fileset so the installer knows where the file comes from. To put the SDD PCM kernel extensions and method into the SPOT, add these lines to /usr/lpp/bosinst/cdfs.optional.list (the fileset name reflects your installed SDD PCM level — here, the AIX 5.2 build):

  /usr/lib/drivers/sddpcmke /usr/lib/drivers/sddpcmke devices.sddpcm.52.rte
  /usr/lib/drivers/sdduserke /usr/lib/drivers/sdduserke devices.sddpcm.52.rte
  /usr/lib/methods/sddpcmrtl /usr/lib/methods/sddpcmrtl devices.sddpcm.52.rte
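
If you prefer to script the edit rather than type the lines, something like the following could append them without creating duplicates. It is a minimal sketch: add_sddpcm_entries is a hypothetical helper name, and the fileset argument is whatever your system actually has installed.

```shell
# Hypothetical helper: append the three SDD PCM entries to a list file,
# skipping any entry that is already present.
# Usage on the source LPAR (after backing the file up):
#   add_sddpcm_entries /usr/lpp/bosinst/cdfs.optional.list devices.sddpcm.52.rte
add_sddpcm_entries() {
    list_file=$1
    fileset=$2
    for sdd_path in /usr/lib/drivers/sddpcmke \
                    /usr/lib/drivers/sdduserke \
                    /usr/lib/methods/sddpcmrtl; do
        entry="$sdd_path $sdd_path $fileset"
        # -x matches whole lines, -F treats the entry as a fixed string.
        grep -qxF "$entry" "$list_file" 2>/dev/null ||
            printf '%s\n' "$entry" >> "$list_file" || return 1
    done
}
```

Running it twice leaves exactly three entries, which makes it safe to repeat while you iterate on the media build.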

What each file does:

  • sddpcmke — the SDD PCM kernel extension (the multipath driver in the kernel).
  • sdduserke — the user-mode kernel extension component used by SDD PCM.
  • sddpcmrtl — the run-time loadable method (the platform-specific libcfg piece cfgscsidisk loads). This is the exact module whose absence triggers 0514-069.

Confirm the fileset name and level on your source system before editing, so the list entry matches reality:

  1. lslpp -l 'devices.sddpcm.*' — list the installed SDD PCM fileset and its version.
  2. ls -l /usr/lib/drivers/sddpcmke /usr/lib/drivers/sdduserke /usr/lib/methods/sddpcmrtl — verify the three files exist at those paths.

If your SDD PCM is a different release (for example a 5.3 build), substitute the matching fileset name (such as devices.sddpcm.53.rte) in the third field.
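
A quick consistency check between the list file and the fileset name you confirmed might look like this. It is a sketch under stated assumptions: check_sddpcm_fileset is a hypothetical helper, and it assumes the three-field entry layout described above.

```shell
# Hypothetical helper: print any SDD entry whose third field differs from
# the fileset you expect. Exit status is 1 if a mismatch is found.
# Usage:
#   check_sddpcm_fileset /usr/lpp/bosinst/cdfs.optional.list devices.sddpcm.53.rte
check_sddpcm_fileset() {
    list_file=$1
    expected_fileset=$2
    awk -v want="$expected_fileset" '
        # Only look at entries whose source path names an SDD file.
        $1 ~ /sdd/ && $3 != want { print "mismatch: " $0; bad = 1 }
        END { exit bad + 0 }
    ' "$list_file"
}
```

Pass the fileset name exactly as lslpp reports it; any entry still carrying a different release suffix is printed for correction.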

Step-by-step: rebuild the mksysb DVD and reinstall

Work on the source LPAR whose image you are capturing. Back up the list files before editing so you can roll back cleanly.

  1. Save a copy of the install list: cp /usr/lpp/bosinst/cdfs.optional.list /usr/lpp/bosinst/cdfs.optional.list.bak
  2. If you previously tried the early-boot workaround, remove the temporary proto file you created (for example cd.proto.ext.fcp.disk.sddpcm.rte) so the only change in effect is the clean SPOT-based fix.
  3. Add the three SDD PCM lines above to /usr/lpp/bosinst/cdfs.optional.list.
  4. Verify the edit: grep sddpcm /usr/lpp/bosinst/cdfs.optional.list should return all three entries.
  5. Create the new bootable mksysb DVD. Using SMIT: smitty mkcd (choose mksysb on the CD/DVD device), or run mkdvd with your device, for example mkdvd -d /dev/cd0 to write a bootable system backup.
  6. Set the LPAR bootlist to the optical drive, or use SMS to boot from the DVD: bootlist -m normal cd0 (or pick the optical device in the SMS multiboot menu).
  7. Boot the target LPAR from the new DVD and run a normal BOS install / restore. The SAN hdisk should now be offered as a valid rootvg destination, and cfgscsidisk should complete without 0514-069.
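
Before burning a disc, a small pre-flight check can confirm that the backup exists and that all three files are listed. This is a sketch only: preflight_sddpcm_list is a hypothetical helper, and it prints the next command instead of running it so nothing is written to the drive by accident.

```shell
# Hypothetical pre-flight check before rebuilding the media.
# Succeeds only if a .bak copy of the list exists and all three
# SDD PCM files appear in the list.
preflight_sddpcm_list() {
    list_file=$1
    [ -f "$list_file.bak" ] || { echo "no backup: $list_file.bak" >&2; return 1; }
    for sdd_name in sddpcmke sdduserke sddpcmrtl; do
        # Match the file name at the end of the source path field.
        grep -q "/$sdd_name " "$list_file" || {
            echo "missing entry for $sdd_name" >&2
            return 1
        }
    done
    echo "list looks complete; next step: mkdvd -d /dev/cd0"
}
```

Call it as preflight_sddpcm_list /usr/lpp/bosinst/cdfs.optional.list and only start the media build when it succeeds.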

How to capture a debug boot log (HMC vterm)

If the restore still fails, capture a debug boot so you can confirm exactly which phase and method break. Work from any workstation with SSH access to the HMC that manages the system (an AIX client works):

  1. Start logging with script, then connect to the HMC: ssh -l hscroot <HMC>
  2. Find the managed system name: lssyscfg -r sys -F name
  3. Open a virtual terminal to the LPAR: mkvterm -m <managed_system> -p <partition_name> (everything on screen is now logged to the script session).
  4. Ensure the bootlist points at the device you want to debug (disk, optical, or tape) via bootlist or SMS.
  5. Power on and stop at the Open Firmware prompt by pressing 8 during the window of opportunity (after the beep, while the keyword banner shows), or via SMS multiboot → OK prompt.
  6. At the 0> prompt, start a debug boot. For disk or tape: boot -s trap. For optical: boot cdrom:\ppc\chrp\bootfile.exe -s trap
  7. The system stops at the kdb> debugger. Type the following, pressing Enter after each: mw enter_dbg, then 42, then . (a single period), then g to continue. Debug output now streams to the screen and into your log.
  8. When you have captured the failure, end the vterm with ~. and stop logging with exit. Review the captured typescript file for the failing phase and method.
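
A debug boot log is long, so a filter for the two message shapes quoted earlier can save time. This is a minimal sketch: find_cfg_failures is a hypothetical helper, and typescript is simply the default file name that script writes to.

```shell
# Hypothetical helper: show numbered lines from a captured console log
# that mention a method error or a 0514-nnn configuration message.
# Usage: find_cfg_failures typescript
find_cfg_failures() {
    grep -n -E 'Method error|0514-[0-9]+' "$1"
}
```

The line numbers in the output tell you where to open the log and read the surrounding phase messages.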

Common pitfalls when fixing the SAN-boot restore

  • Patching only the boot image, not the SPOT. Forcing SDD PCM into Phase 1 via a custom proto file makes the early boot look fine but does nothing for Phase 3, so the disk still fails. The SPOT (the cdfs.optional.list route) is the correct place.
  • Wrong fileset name. If the third field does not match your actually-installed fileset (for example using .52.rte when you run a 5.3 build), the files may not be staged. Confirm with lslpp -l 'devices.sddpcm.*' first.
  • Editing the wrong list file. Putting the entries in cdfs.required.list or the wrong variant list can cause inconsistent media. Use cdfs.optional.list as described.
  • Stale media. You must rebuild the mksysb DVD after editing the list — an existing disc still has the old SPOT and will keep failing.
  • Fibre Channel / zoning not ready. Even a perfect image cannot install if the LUN is not zoned and mapped to the target LPAR's WWPNs. Verify SAN zoning and LUN masking before blaming the image.
  • Single path only. A SAN-boot LPAR should see the boot LUN on at least two paths for resilience; confirm both fabrics present the LUN once the OS is up.
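
For the single-path pitfall, counting enabled paths once the OS is up can be scripted. The sketch below assumes the default three-column lspath output (status, device, parent adapter) and uses a hypothetical helper name, count_enabled_paths.

```shell
# Hypothetical helper: count Enabled paths in lspath output read from stdin.
# Assumes the default layout: status, device name, parent adapter.
# Usage on a running LPAR: lspath -l hdisk0 | count_enabled_paths
count_enabled_paths() {
    awk '$1 == "Enabled" { n++ } END { print n + 0 }'
}
```

A result below 2 for the boot disk suggests that one fabric is not presenting the LUN.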

Verification: confirm the SAN root disk configures

After a successful restore from the rebuilt DVD, boot the LPAR from its SAN disk and verify the multipath stack is healthy:

  1. bootinfo -b — shows the device the system booted from; it should be the SAN hdisk.
  2. lsdev -Cc disk — the boot disk should appear and be Available.
  3. lspv — confirm rootvg is on the expected SAN disk.
  4. pcmpath query device — SDD PCM should report the device with multiple paths in OPEN/NORMAL state.
  5. pcmpath query adapter — verify each Fibre Channel adapter shows active paths.
  6. lslpp -l 'devices.sddpcm.*' — confirm the SDD PCM fileset restored into the new system.

If cfgscsidisk ran cleanly and pcmpath query device shows multiple normal paths, the fix worked: the SDD PCM run-time module was present in Phase 3 and the SAN root disk configured successfully.
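
The first three checks can be tied together by confirming that the boot device is one of the rootvg disks. This is a sketch, not a definitive test: rootvg_disks is a hypothetical helper, and it assumes the usual lspv column order (disk, PVID, volume group, state).

```shell
# Hypothetical helper: list the disks that lspv reports as belonging to rootvg.
# Reads lspv output on stdin (columns: disk, PVID, volume group, state).
# Usage on the restored LPAR:
#   lspv | rootvg_disks | grep -x "$(bootinfo -b)"
rootvg_disks() {
    awk '$3 == "rootvg" { print $1 }'
}
```

If the grep in the usage line prints the boot disk name, the LPAR booted from a disk that holds rootvg.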

Modern equivalent (since p595, SDD PCM and AIX 5.x are EOL)

The IBM Power 595, SDD/SDD PCM, and AIX 5.2/5.3 are all long out of support. On current AIX (7.2/7.3) and modern Power servers, the same class of problem is handled differently:

  • Native MPIO replaces SDD PCM for most storage. AIX ships with its own default PCM (AIXPCM), and storage vendors supply a PCM or ODM definitions only where needed, so you rarely stage a third-party driver into the installer by hand.
  • NIM (Network Installation Management) is the standard way to clone and restore SAN-boot LPARs at scale. A NIM master holds the SPOT and mksysb resources, and the device support needed to configure the SAN disk is built into the SPOT — the network-based equivalent of getting the right files into the optical SPOT.
  • alt_disk_mksysb and cloud/PowerVC provisioning have largely replaced DVD-RAM master images for duplicating LPARs, and Live Partition Mobility covers relocating a running LPAR (it moves the partition rather than copying it).

The durable principle is unchanged: when you restore a system that boots from SAN, the multipath device support must be present in the exact install phase that configures the root disk. Whether that is a SPOT on a DVD, a NIM SPOT, or a vendor PCM bundled in the image, get the driver into the right place and the restore succeeds.

Key Takeaways

  • A SAN-boot mksysb restore fails with 0514-069 because cfgscsidisk can't load the SDD PCM module in Phase 3.
  • The correct fix puts the SDD PCM files in the install SPOT via /usr/lpp/bosinst/cdfs.optional.list, not just the early boot image.
  • Add sddpcmke, sdduserke, and sddpcmrtl with the matching devices.sddpcm fileset, then rebuild the DVD with mkdvd/smitty mkcd.
  • Use an HMC vterm debug boot (boot -s trap) to pinpoint the failing phase and method when the cause is unclear.
  • Verify with bootinfo -b, lspv, and pcmpath query device; on modern AIX, NIM + native MPIO handle the same need.

Frequently Asked Questions

What does error 0514-069 mean during an AIX install?

0514-069 ("Failure loading the platform specific libcfg module") means a device configuration method ran but could not load the platform-specific library it depends on. During a SAN-boot mksysb restore it almost always means the multipath driver (SDD PCM run-time module) is missing from the install phase that configures the disk.

Why does the SAN disk appear in Phase 1 but fail in Phase 3?

AIX configures devices across phases against different environments. The SDD PCM extensions may be reachable in the early boot image but absent from the SPOT used later, so the early phase looks healthy and cfgscsidisk, which only runs once the SPOT is linked in, fails in Phase 3 when it needs the SPOT-based library.

What is the SPOT and why does editing cdfs.optional.list fix it?

The SPOT (Shared Product Object Tree) is the /usr tree placed on the install media and linked into the running installer. cdfs.optional.list controls what optional content is copied into that SPOT, so adding the SDD PCM files there guarantees they are available when the SAN disk is configured.

Is this still relevant on modern AIX and Power servers?

The exact files and the p595 are EOL, but the concept applies on AIX 7.x with native MPIO and NIM: any time you restore a SAN-boot system, the multipath device support must be present in the SPOT (or NIM SPOT) so the root disk can be configured.

For more AIX, Power, and storage troubleshooting walkthroughs, subscribe on YouTube @explorenystream.