LVM Partition on a RedHat Cluster | How to Configure a Linux Cluster with 2 Nodes on RedHat and CentOS
— ny_wk
Adding an LVM partition on a RedHat or CentOS two-node cluster means safely presenting a new SAN LUN to both nodes, building the LVM stack on top of it, and laying down a cluster-aware GFS2 file system so either node can mount it. This guide walks the full workflow end to end — storage scan, backups, physical/volume/logical volume creation, the clustered file system, and a complete RedHat cluster build with fencing and failover — then closes with corrected sysadmin interview answers.
The commands below reflect the classic RHEL 6 stack (cman, ccs, ricci, luci, EMC PowerPath). That stack is end-of-life. If you are building new, use the modern equivalent described at the end (Pacemaker, Corosync and the pcs tool on RHEL 7/8/9). The legacy steps remain valuable for maintaining existing systems and for interviews that still test them.
The problem: growing storage on a clustered LVM partition
On a shared-storage cluster, you cannot just fdisk a disk and reboot. The same physical LUN is visible to two nodes at once, multipathing sits between the kernel and the array, and the file system must coordinate locks across both nodes. Get the ordering wrong and you risk corruption or a node fence. The safe sequence is always the same: back up state, scan the new LUN, build the LVM stack once, format with a cluster file system, then mount on both nodes.
Step 1: Take a pre-change backup of disk state
Before touching anything, capture the current view of disks, multipath, and LVM so you can diff before and after. Run these on the active node and store the output in a working directory.
- Record EMC PowerPath devices:
powermt display dev=all > powermt.pre
- Record raw disks (excluding device-mapper):
fdisk -l 2>/dev/null | egrep "^Disk" | egrep -v "dm-" > fdisk.pre
- Capture mounts and LVM metadata:
df -Ph > df.pre
pvs > pvs.pre
vgs > vgs.pre
lvs > lvs.pre
- Save the mount table:
cat /etc/fstab > fstab.pre
Why this matters: after the scan you compare fdisk.pre to fdisk.post to identify exactly which device is new, instead of guessing. Never extend a volume group onto the wrong device.
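The pre/post comparison can be scripted so the new device is read off directly rather than picked out of a diff by eye. The following is a minimal sketch, assuming the fdisk.pre and fdisk.post files captured in this guide; the function name new_disks is hypothetical and not part of any tool.

```shell
#!/bin/sh
# new_disks PRE POST: print the lines that appear in POST but not in PRE.
# Hypothetical helper; PRE and POST are the fdisk.pre / fdisk.post snapshots.
new_disks() {
    pre=$1
    post=$2
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from PRE.
    # An empty PRE file contains no patterns, so every POST line is printed.
    grep -Fxv -f "$pre" "$post"
}
```

Run it as new_disks fdisk.pre fdisk.post once the post-scan snapshot exists. Anything it prints is a candidate new device, which should still be cross-checked against the PowerPath output before running pvcreate.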
Step 2: Scan the server for the newly allocated LUN
Once the storage team has zoned and masked the new LUN to your host’s HBAs, the operating system still needs to rescan. Do this without rebooting.
- List the SCSI host adapters:
ls /sys/class/scsi_host/
- Inspect the Fibre Channel HBAs and confirm the link is up:
systool -c fc_host -v (provided by the sysfsutils package)
- In a second window, watch the kernel log for newly detected disks:
tail -f /var/log/messages
- Trigger a rescan on each SCSI host (replace host0 with each real host number, run separately):
echo "- - -" > /sys/class/scsi_host/host0/scan
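Repeating the rescan for every host by hand is easy to get wrong on servers with many adapters. The following is a minimal sketch of a loop over all SCSI hosts; the function name rescan_scsi_hosts and its optional directory argument are assumptions added here so the loop can be tried against a scratch directory first.

```shell
#!/bin/sh
# rescan_scsi_hosts [SYSFS_DIR]: write the wildcard "- - -" (channel, target, LUN)
# to the scan file of every SCSI host found under SYSFS_DIR.
# SYSFS_DIR defaults to the real sysfs location; the parameter is only there
# so the loop can be exercised safely against a test directory.
rescan_scsi_hosts() {
    dir=${1:-/sys/class/scsi_host}
    for host in "$dir"/host*; do
        # Skip the unexpanded glob when no host directories exist.
        [ -e "$host/scan" ] || continue
        echo "rescanning ${host##*/}"
        printf '%s\n' '- - -' > "$host/scan"
    done
}
```

On a real server, run rescan_scsi_hosts as root with no argument while tail -f /var/log/messages is open in the other window.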
If the LUN still does not appear, force a loop initialization on the FC host to make it re-login to the fabric:
echo "1" > /sys/class/fc_host/host0/issue_lip
Then take the post-change snapshots and reconcile with PowerPath:
fdisk -l 2>/dev/null | egrep "^Disk" | egrep -v "dm-" > fdisk.post
powermt display dev=all > powermt.post
- Build the new pseudo-device and persist it:
powermt config
powermt save
- Confirm the delta:
diff fdisk.pre fdisk.post
diff powermt.pre powermt.post
The new device typically shows up as a PowerPath pseudo-name such as /dev/emcpowere. Always build LVM on the multipath/pseudo device, never on a single path like /dev/sde — using a single path defeats failover and risks I/O errors when a path drops.
Step 3: Build the LVM stack on the new device
With the multipathed device identified, create the LVM layers once, from a single node. LVM metadata lives on the disk itself, so the other node will see it after a rescan.
- Initialize the physical volume:
pvcreate /dev/emcpowere
- Create the volume group:
vgcreate ny01vg /dev/emcpowere
(To grow an existing VG instead, use vgextend ny01vg /dev/emcpowerf.)
- Create the logical volume:
lvcreate -L 980G -n ny01vol ny01vg
For a clustered volume group on the legacy stack, the volume group must carry the cluster flag and clvmd must be running on both nodes. Mark it clustered with vgchange -cy ny01vg and confirm with vgs -o vg_name,vg_attr (the attribute string shows a c for clustered).
Step 4: Create the GFS2 cluster file system
A standard ext4 or xfs file system cannot be mounted read-write on two nodes at once — doing so destroys the data. For shared concurrent access you need a cluster file system. GFS2 with the distributed lock manager is the RedHat answer.
mkfs.gfs2 -p lock_dlm -t nylinux-clus1:nyata -j 8 /dev/ny01vg/ny01vol
Decode the flags so you do not get them wrong:
- -p lock_dlm: use the distributed lock manager for a real cluster. (lock_nolock is only for a single, non-clustered host.)
- -t nylinux-clus1:nyata: the lock table name in the form ClusterName:FSName. The cluster name must match /etc/cluster/cluster.conf exactly, or the mount fails.
- -j 8: the number of journals. You need at least one journal per node that will mount the file system; provisioning extra journals up front saves a later gfs2_jadd.
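Because a wrong lock table or journal count only shows up at mount time, it can help to assemble the command from its parts and check it before formatting. The following is a minimal sketch; the function name gfs2_mkfs_cmd and its argument order are hypothetical, and it only prints the command without running mkfs.gfs2.

```shell
#!/bin/sh
# gfs2_mkfs_cmd CLUSTER FSNAME JOURNALS NODES DEVICE
# Print the mkfs.gfs2 command line after basic sanity checks.
# Nothing is formatted; the output is meant to be reviewed, then run by hand.
gfs2_mkfs_cmd() {
    cluster=$1 fsname=$2 journals=$3 nodes=$4 device=$5
    # The lock table is ClusterName:FSName, so neither part may be empty
    # or contain a colon of its own.
    case $cluster in ''|*:*) echo "bad cluster name" >&2; return 1 ;; esac
    case $fsname in ''|*:*) echo "bad file system name" >&2; return 1 ;; esac
    # Every node that mounts the file system needs its own journal.
    if [ "$journals" -lt "$nodes" ]; then
        echo "need at least $nodes journals, got $journals" >&2
        return 1
    fi
    echo "mkfs.gfs2 -p lock_dlm -t ${cluster}:${fsname} -j $journals $device"
}
```

For the values used in this guide, gfs2_mkfs_cmd nylinux-clus1 nyata 8 2 /dev/ny01vg/ny01vol prints the same command shown in Step 4. The function does not verify that the cluster name matches cluster.conf; that still has to be checked on the node.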
Step 5: Mount the file system on both nodes
Create the mount point and mount the device on the first node.
mkdir -p /oracle/ny01
mount /dev/ny01vg/ny01vol /oracle/ny01
- Confirm the active mount and copy its line for persistence:
cat /proc/mounts | grep ny01
Add a matching entry to /etc/fstab on both nodes so it survives reboot. For GFS2 the file system must be brought up by the cluster’s gfs2 service, so do not rely on the default boot order:
/dev/mapper/ny01vg-ny01vol /oracle/ny01 gfs2 defaults,noatime 0 0
On the second node (ny02), make LVM and the file system visible without rebuilding anything:
- Rescan LVM metadata:
vgscan
lvscan
- Activate if needed:
vgchange -ay ny01vg
- Mount the same device and verify:
mount /dev/ny01vg/ny01vol /oracle/ny01
df -h
Configuring a RedHat 2-node cluster (active-standby)
The LVM and GFS2 work above assumes a working cluster underneath. In an active-standby RedHat cluster, critical resources — a floating IP, a file system, and a service such as Apache — fail over from one node to the other automatically. Below is the legacy RHEL 6 build using ccs. The high-level flow is: install packages, start ricci, create the cluster, add nodes, add fencing, define a failover domain, add resources, sync, and start.
1. Install and verify cluster packages
On both nodes, confirm the cluster stack is present:
rpm -qa | egrep -i "ricci|luci|cluster|ccs|cman"
If anything is missing, install it with yum install ccs ricci cman (and luci on the management node only). cman is the cluster manager, ccs is the command-line config tool, ricci is the per-node agent, and luci is the optional web UI.
2. Start ricci and set its password
On both nodes, start the agent and give the ricci user a password — ccs authenticates to each node as this user.
service ricci start (and chkconfig ricci on)
passwd ricci
If iptables is active, open the cluster ports on both nodes (ricci on TCP 11111, plus the cman/dlm ports) so the nodes can talk.
3. Create the cluster and add nodes
Run these only from the active node. ccs writes /etc/cluster/cluster.conf.
- Create the cluster:
ccs -h rh1.mydomain.net --createcluster mycluster
- Add the first node:
ccs -h rh1.mydomain.net --addnode rh1.mydomain.net
- Add the second node:
ccs -h rh1.mydomain.net --addnode rh2.mydomain.net
- List nodes and IDs:
ccs -h rh1 --lsnodes
4. Add fencing — do not skip this
Fencing isolates a misbehaving node from shared storage so it cannot corrupt data. It can power the node off via a remote switch, disable a Fibre Channel switch port, or revoke the node’s SCSI-3 reservations. A fence agent is the software that drives a fence device. A cluster without working fencing is not supported and will eventually corrupt your GFS2 data.
| Action | Command |
| --- | --- |
| Set fence daemon delays | ccs -h rh1 --setfencedaemon post_fail_delay=0 post_join_delay=25 |
| Add a fence device (VM lab) | ccs -h rh1 --addfencedev myfence agent=fence_virt |
| Create a fence method per node | ccs -h rh1 --addmethod mthd1 rh1.mydomain.net |
| Bind device to method | ccs -h rh1 --addfenceinst myfence rh1.mydomain.net mthd1 |
For real hardware use the correct agent — fence_ipmilan for IPMI/iDRAC/iLO, fence_apc for an APC PDU, or a SAN-based agent. fence_virt is only for KVM lab clusters.
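The table shows the method and instance commands for rh1 only, but each node needs its own pair. The following is a minimal sketch that prints the pair for every node so none is left unfenced; the function name fence_commands is hypothetical, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# fence_commands CCS_HOST FENCE_DEV METHOD NODE...
# Print the --addmethod / --addfenceinst pair for each node given.
# Output only; pipe it to sh once it has been reviewed.
fence_commands() {
    ccs_host=$1 fence_dev=$2 method=$3
    shift 3
    for node in "$@"; do
        echo "ccs -h $ccs_host --addmethod $method $node"
        echo "ccs -h $ccs_host --addfenceinst $fence_dev $node $method"
    done
}
```

For the two nodes in this guide, fence_commands rh1 myfence mthd1 rh1.mydomain.net rh2.mydomain.net prints four lines. Agent-specific instance options are not covered here and depend on the fence agent in use.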
5. Configure the failover domain
A failover domain is the ordered subset of nodes a service may run on. The four behaviors you must know:
- Restricted — the service may run only on members of the domain; if none are available it stays stopped.
- Unrestricted — the service may run anywhere but prefers domain members when one is online.
- Ordered — nodes get a priority (1 = highest); the service runs on the highest-priority online node and migrates back when it returns.
- Unordered — no preference among domain members.
Create an ordered domain and add both nodes with priorities:
ccs -h rh1 --addfailoverdomain webserverdomain ordered
ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net 1
ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net 2
The priority is passed as a plain number after the node name.
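The ordered behavior is easier to reason about with a small model. The following is an illustrative sketch of the selection rule described above: among the domain members that are online, the lowest priority number wins. It is not how rgmanager is implemented, and the function name pick_node is hypothetical.

```shell
#!/bin/sh
# pick_node ONLINE_LIST NODE:PRIORITY...
# Print the online domain member with the lowest priority number (1 = highest).
# ONLINE_LIST is a space-separated list of node names.
# Prints nothing and returns non-zero when no domain member is online.
pick_node() {
    online=" $1 "
    shift
    best_node=
    best_prio=
    for entry in "$@"; do
        node=${entry%%:*}
        prio=${entry##*:}
        # Ignore domain members that are not currently online.
        case $online in *" $node "*) ;; *) continue ;; esac
        if [ -z "$best_prio" ] || [ "$prio" -lt "$best_prio" ]; then
            best_node=$node
            best_prio=$prio
        fi
    done
    [ -n "$best_node" ] && echo "$best_node"
}
```

With both nodes online the function selects rh1.mydomain.net. With only rh2.mydomain.net online it selects rh2.mydomain.net, and once rh1 is back in the online list the choice returns to rh1, which mirrors the migrate-back behavior of an ordered domain.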
6. Add resources, sync, and start
Add a shared file system as a global resource, then bundle it with an IP and a service into a service group:
- Global file system resource:
ccs -h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
- Create the service:
ccs -h rh1 --addservice webservice1 domain=webserverdomain recovery=relocate autostart=1
- Push the config to every node:
ccs -h rh1 --sync --activate
- Start the cluster everywhere:
ccs -h rh1 --startall
Common pitfalls
- Building LVM on a single path instead of the PowerPath/multipath device: failover breaks the moment a path drops.
- Formatting shared storage with ext4/xfs and mounting on two nodes: guaranteed corruption. Use GFS2 with lock_dlm.
- Wrong cluster name in the GFS2 lock table (-t Cluster:FS): the mount fails.
- Too few GFS2 journals: a node cannot mount if there is no journal for it; size -j for current and future nodes.
- No fencing or untested fencing: an unfenced split brain corrupts data; always test it.
- Shrinking a file system in the wrong order: for ext, resize2fs first, then lvreduce; reversing them destroys data.
Verification
- Confirm the LVM stack: pvs, vgs, lvs show the new PV/VG/LV.
- Confirm the mount on both nodes: df -hT /oracle/ny01 reports type gfs2.
- Confirm cluster health: clustat shows both nodes Online and the service started.
- Test failover: shut the active node down and confirm the floating IP, file system, and service relocate. This is the only proof the cluster actually works.
- Compare pre/post snapshots: df -Ph > df.post and diff against the pre files to document the change.
LVM interview questions and answers
These are the LVM and storage questions that come up most in Linux system admin interviews, with corrected, accurate answers.
What is the difference between LVM1 and LVM2, and the maximum LV size?
LVM1 shipped with 2.4-series kernels; LVM2 uses the kernel device-mapper driver in 2.6 and later and is what every modern distribution uses. Maximum logical volume size depends on kernel and architecture: roughly 2 TB on old 2.4 kernels, up to 16 TB for 32-bit on 2.6, and up to 8 EB for 64-bit on 2.6+. On current 64-bit systems the practical ceiling is enormous, so the file system and array limits matter more than LVM itself.
What are the steps to create an LVM volume?
On a local disk: partition with fdisk /dev/sdb, create a partition, set its type to 8e (Linux LVM) with the t command, and write with w. Then run, in order: pvcreate /dev/sdb1, vgcreate vg0 /dev/sdb1, lvcreate -L 1G -n lv1 vg0, create a file system with mkfs.ext4 /dev/vg0/lv1, and mount it with mkdir /test && mount /dev/vg0/lv1 /test. (On modern systems prefer mkfs.ext4 or mkfs.xfs over the obsolete mke2fs -j.)
How do you extend and reduce a file system?
To extend (online, safe): check free space with vgdisplay -v vg1, grow the LV with lvextend -L +1G /dev/vg1/lvol1, then grow the file system — resize2fs /dev/vg1/lvol1 for ext or xfs_growfs /mountpoint for XFS. To reduce (offline, risky — XFS cannot shrink at all): for ext, shrink the file system first with resize2fs /dev/vg1/lvol1 3G, then shrink the LV with lvreduce -L 3G /dev/vg1/lvol1. Doing it in the wrong order truncates live data.
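Since the shrink order is where data gets lost, it can be useful to print the steps in sequence before running any of them. The following is a minimal sketch, assuming an ext file system that can be unmounted; the function name shrink_plan and the /data mount point in the usage line are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# shrink_plan FSTYPE LV SIZE MOUNTPOINT
# Print the ordered steps for a shrink: file system first, logical volume second.
# Output only. XFS is rejected because it cannot be shrunk.
shrink_plan() {
    fstype=$1 lv=$2 size=$3 mnt=$4
    case $fstype in
        ext2|ext3|ext4)
            echo "umount $mnt"
            echo "e2fsck -f $lv"
            echo "resize2fs $lv $size"
            echo "lvreduce -L $size $lv"
            echo "mount $lv $mnt"
            ;;
        xfs)
            echo "xfs cannot be shrunk; back up, recreate smaller, restore" >&2
            return 1
            ;;
        *)
            echo "unsupported file system type: $fstype" >&2
            return 1
            ;;
    esac
}
```

For example, shrink_plan ext4 /dev/vg1/lvol1 3G /data lists the five steps with resize2fs ahead of lvreduce. Take a backup before running the printed commands, because a mistyped size at the lvreduce step is not recoverable.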
How do you add a new LUN from storage to a Linux server?
List the HBAs and current disks with ls /sys/class/fc_host and fdisk -l 2>/dev/null | egrep "^Disk" | egrep -v "dm-" | wc -l. Rescan each HBA: echo "1" > /sys/class/fc_host/host0/issue_lip and echo "- - -" > /sys/class/scsi_host/host0/scan. Confirm the count grew, then add the device to LVM with pvcreate /dev/<path> and vgextend vg1 /dev/<path>, verifying with vgs.
How do you resize the root file system on RHEL 6?
Shrinking / must be done offline. Boot into rescue mode and choose to skip mounting, then: activate LVM with lvm vgchange -ay, check the file system with e2fsck -f /dev/vg00/lv_root, shrink it with resize2fs -f /dev/vg00/lv_root 20G, shrink the LV with lvreduce -L 20G /dev/vg00/lv_root, re-run e2fsck -f, then exit and reboot. Order is everything: file system before LV.
How do you tell whether a server uses PowerPath, multipath, or LVM RAID?
EMC PowerPath: rpm -qa | grep -i emc, status via powermt display dev=all; VG disks look like /dev/emcpowera. Native multipath: ls -l /dev/mapper, multipath -ll, daemon check ps -ef | grep multipathd; VG disks look like /dev/mapper/mpath0. LVM on software RAID: cat /proc/mdstat or mdadm --detail /dev/md0, and vgdisplay -v vg01 shows members like /dev/md1.
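The device-name patterns above can be turned into a quick first-pass check. The following is a rough sketch that guesses the storage layer from a physical volume's name; the function name classify_pv is hypothetical, and the patterns assume the default naming shown in this answer, so sites with custom multipath aliases will need different patterns.

```shell
#!/bin/sh
# classify_pv DEV: guess the storage layer behind a physical volume from its name.
# Name patterns follow the defaults mentioned in this guide and are assumptions.
classify_pv() {
    case $1 in
        /dev/emcpower*)     echo "powerpath" ;;
        /dev/mapper/mpath*) echo "multipath" ;;
        /dev/md*)           echo "mdraid" ;;
        *)                  echo "unknown" ;;
    esac
}
```

On a live system the function can be fed from LVM itself, for instance pvs --noheadings -o pv_name | while read -r pv; do echo "$pv $(classify_pv "$pv")"; done. Treat the result as a hint and confirm with powermt, multipath -ll, or mdadm as described above.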
The modern equivalent (RHEL 7, 8 and 9)
The cman/ccs/ricci/luci stack above is end-of-life: RHEL 6 reached end of maintenance in 2020. On RHEL 7 and later the cluster is built on Pacemaker (resource manager) and Corosync (messaging), driven by the pcs command and the pcsd daemon, with the web UI on port 2224. The conceptual model is the same (nodes, fencing, now called STONITH, resource agents, and constraints in place of failover domains), but the commands change to pcs cluster setup, pcs stonith create, pcs resource create, and pcs status. GFS2 still exists for shared storage; on RHEL 8 and later it pairs with LVM’s lvmlockd rather than clvmd, while RHEL 7 still uses clvmd. EMC PowerPath has largely given way to the kernel’s native device-mapper-multipath on current arrays. Learn the legacy stack for existing systems and interviews, but build anything new on Pacemaker.
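The pcs syntax for the first steps differs between RHEL 7 and RHEL 8 or later, which trips up people moving between releases. The following is a minimal sketch that prints the authentication, setup, and start commands for a given major release; the function name pcs_setup_cmds is hypothetical, the cluster and node names are the ones used earlier in this guide, and the output should be checked against the pcs man page on the target system before use.

```shell
#!/bin/sh
# pcs_setup_cmds RHEL_MAJOR CLUSTER NODE...
# Print the node authentication, cluster setup, and start commands.
# RHEL 7 uses "pcs cluster auth" and "setup --name"; RHEL 8 and later use
# "pcs host auth" and take the cluster name as a positional argument.
pcs_setup_cmds() {
    major=$1 cluster=$2
    shift 2
    if [ "$major" -ge 8 ]; then
        echo "pcs host auth $*"
        echo "pcs cluster setup $cluster $*"
    else
        echo "pcs cluster auth $*"
        echo "pcs cluster setup --name $cluster $*"
    fi
    echo "pcs cluster start --all"
}
```

For example, pcs_setup_cmds 8 mycluster rh1.mydomain.net rh2.mydomain.net prints the RHEL 8 form. Fencing and resources still have to be added afterwards with pcs stonith create and pcs resource create.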
Key Takeaways
- Always bracket a storage change with pre/post backups (fdisk, powermt, pvs/vgs/lvs) and confirm the new device with diff.
- Build the LVM stack on the multipath/PowerPath pseudo-device, never on a single path.
- Shared concurrent storage needs a cluster file system: use GFS2 with lock_dlm, the correct Cluster:FS lock table, and one journal per node.
- A RedHat cluster without working, tested fencing (STONITH) is unsupported and will corrupt data.
- For new builds use Pacemaker/Corosync with pcs on RHEL 7+; the legacy cman/ccs stack is end-of-life.
Frequently Asked Questions
Can I mount the same LVM volume on two cluster nodes at once?
Only with a cluster-aware file system. A normal ext4 or XFS volume mounted read-write on two nodes will corrupt instantly. Use GFS2 with the distributed lock manager (-p lock_dlm) and a clustered volume group so both nodes coordinate locks.
Why does my GFS2 mount fail with a lock table error?
The lock table name in mkfs.gfs2 -t Cluster:FS must match your actual cluster name in /etc/cluster/cluster.conf (or the Pacemaker cluster name) exactly, and the cluster and DLM must be running before you mount. A mismatch or a stopped cluster is the usual cause.
How do I extend an LVM file system without downtime?
Extending is online and safe. Grow the logical volume with lvextend, then grow the file system in place — resize2fs for ext4 or xfs_growfs for XFS. You can combine both with lvextend -r, which resizes the file system automatically.
Is EMC PowerPath still required for multipathing?
No. PowerPath is an EMC/Dell product, but the Linux kernel’s native device-mapper-multipath handles multipathing on virtually all modern arrays. New deployments typically standardize on native multipath unless the array vendor specifically requires PowerPath.