Increasing the size of an LVM Physical Volume (PV) while running multipathd — without rebooting

If you’re using the Linux Logical Volume Manager (LVM) to manage your disk space it’s easy to enlarge a logical volume while a server is up and running. It’s also easy to add new drives to an existing volume group.

But if you’re using a SAN the underlying physical drives can have different performance characteristics because they’re assigned to different QOS bands on the SAN. If you want to keep performance optimized it’s important to know what physical volume a logical volume is assigned to — otherwise you can split a single logical volume across multiple physical volumes and end up degrading system performance. If you run out of space on a physical volume and then enlarge a logical volume you will split the LV across two or more PVs. To prevent this from happening you need to enlarge the LUN, tell multipathd about the change, then enlarge the PV, then enlarge the LV, and finally enlarge the file system.

I have three SANs at the company where I work (two Pillar Axioms and a Xyratex) which are attached two two fibrechannel switches and several racks of blade servers. Each blade is running an Oracle database with multiple physical volumes (PVs) grouped together into a single LVM. The PVs are tagged and as logical volumes (LVs) are added they’re assigned to the base physical volume with the same tag name as the logical volume. That way we can assign the PV to a higher or lower performance band on the SAN and optimize the database’s performance. Oracle tablespaces that contain frequently-accessed data get assigned to a PV with a higher QOS band on the SAN. Archival data gets put on a PV with a lower QOS band.

We run OpenSUSE 11.x using multipathd to deal with the multiple fiber paths available between each blade and a SAN. Since each blade has 2 fiber ports for redundancy, which are attached to two fiber switches, each of which is cross-connected to 2 ports on 2 different controllers on the SAN, so there are 4 different fiber paths that data can take between the blade and the SAN. If any path fails, or one port on a fiber card fails, or one fiber switch fails, multipathd re-routes the data using the remaining data paths and everything keeps working. If a blade fails we switch to another blade.

If we run out of space on a PV I can log into the SAN’s administrative interface and enlarge the size of the underlying LUN, but getting the operating system on the blade to recognize the fact that more physical disk space was available is tricky. LVM’s pvresize command would claim that it was enlarging the PV, but nothing would happen unless the server was rebooted and then pvresize was run again. I wanted to be able to enlarge physical volumes without taking a database off-line and rebooting its server. Here’s how I did it:

  • First log into the SAN’s administrative interface and enlarge the LUN in question.
  • Open two xterm windows on the host as root
  • Gather information – you will need the physical device name, the multipath block device names, and the multipath map name. (Since our setup gives us 4 data paths for each LUN there are 4 multipath block device names.)
  • List the physical volumes and their associated tags with pvs -o +tags:
    # pvs -o +tags
      PV         VG     Fmt  Attr PSize   PFree   PV Tags                
      /dev/dm-1  switch lvm2 a-   500.38G 280.38G db024-lindx,lindx      
      /dev/dm-10 switch lvm2 a-     1.95T 801.00G db024-ldata,ldata      
      /dev/dm-11 switch lvm2 a-    81.50G      0  db024-mindx,mindx      
      /dev/dm-12 switch lvm2 a-   650.00G 100.00G db024-reports,reports  
      /dev/dm-13 switch lvm2 a-    51.25G  31.25G db024-log,log          
      /dev/dm-14 switch lvm2 a-   450.12G  50.12G db024-home,home        
      /dev/dm-15 switch lvm2 a-     1.76T 342.00G db024-q_backup,q_backup
      /dev/dm-16 switch lvm2 a-     1.00G 640.00M db024-control,control  
      /dev/dm-2  switch lvm2 a-   301.38G 120.38G db024-dbs,dbs          
      /dev/dm-3  switch lvm2 a-   401.88G 101.88G db024-cdr_data,cdr_data
      /dev/dm-5  switch lvm2 a-   450.62G 290.62G db024-archlogs,archlogs
      /dev/dm-6  switch lvm2 a-    40.88G  22.50G db024-boot,boot        
      /dev/dm-7  switch lvm2 a-    51.25G   1.25G db024-rbs,rbs          
      /dev/dm-8  switch lvm2 a-    51.25G  27.25G db024-temp,temp        
      /dev/dm-9  switch lvm2 a-   201.38G 161.38G db024-summary,summary
  • Find the device that corresponds to the LUN you just enlarged, e.g. /dev/dm-11
  • Run multipath -ll, find the device name in the listing. The large hex number at the start of the line is the multipath map name and the sdX block devices after the device name are the multipath block devices. So in this example the map name is 2000b080112002142 and the block devices are sdy, sdan, sdj, and sdbc:
    2000b080112002142 dm-11 Pillar,Axiom 500                 
    [size=82G][features=1 queue_if_no_path][hwhandler=0][rw] 
    \_ round-robin 0 [prio=100][active]                      
     \_ 0:0:5:9  sdy        65:128 [active][ready]           
     \_ 1:0:4:9  sdan       66:112 [active][ready]           
    \_ round-robin 0 [prio=20][enabled]                      
     \_ 0:0:4:9  sdj        8:144  [active][ready]           
     \_ 1:0:5:9  sdbc       67:96  [active][ready]
  • Next get multipath to recognize that the device is larger:
    • For each block device do echo 1 > /sys/block/sdX/device/rescan:
      # echo 1 > /sys/block/sdy/device/rescan
      # echo 1 > /sys/block/sdan/device/rescan
      # echo 1 > /sys/block/sdj/device/rescan
      # echo 1 > /sys/block/sdbc/device/rescan
    • In the second root window, pull up a multipath command line with multipathd -k
    • Delete and re-add the first block device from each group. Since multipathd provides multiple paths to the underlying SAN, the device will remain up and on-line during this process. Make sure that you get an ‘ok’ after each command. If you see ‘fail’ or anything else besides ‘ok’, STOP WHAT YOU’RE DOING and go to the next step.
      multipathd> del path sdy                                             
      ok                                                                   
      multipathd> add path sdy                                             
      ok                                                                   
      multipathd> del path sdj                                             
      ok                                                                   
      multipathd> add path sdj                                             
      ok
    • If you got a ‘fail’ response:
      • Type exit to get back to a command line.
      • Type multipath -r on the command line. This should recover/rebuild all block device paths.
      • Type multipath -ll | less again and verify that the block devices were re-added.
      • At this point multipath may actually recognize the new device size (you can see the size in the multipath -ll output). If everything looks good, skip ahead to the pvresize step.
    • In the first root window run multipath -ll again and verify that the block devices were re-added:
      2000b080112002142 dm-11 Pillar,Axiom 500                 
      [size=82G][features=1 queue_if_no_path][hwhandler=0][rw] 
      \_ round-robin 0 [prio=100][active]                      
       \_ 1:0:4:9  sdan       66:112 [active][ready]           
       \_ 0:0:5:9  sdy        65:128 [active][ready]           
      \_ round-robin 0 [prio=20][enabled]                      
       \_ 1:0:5:9  sdbc       67:96  [active][ready]           
       \_ 0:0:4:9  sdj        8:144  [active][ready]
    • Delete and re-add the remaining two block devices in the second root window:
      multipathd> del path sdan
      ok                       
      multipathd> add path sdan
      ok                       
      multipathd> del path sdbc
      ok                       
      multipathd> add path sdbc
      ok
    • In the first root window run multipath -ll again and verify that the block devices were re-added.
    • Tell multipathd to resize the block device map using the map name:
      multipathd> resize map 2000b080112002142
      ok
    • Press Ctrl-D to exit multipathd command line.
  • In the first root window run multipath -llagain to verify that multipath sees the new physical device size. The device below went from 82G to 142G:
    2000b080112002142 dm-11 Pillar,Axiom 500
    [size=142G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=100][active]
     \_ 0:0:5:9  sdy        65:128 [active][ready]
     \_ 1:0:4:9  sdan       66:112 [active][ready]
    \_ round-robin 0 [prio=20][enabled]
     \_ 0:0:4:9  sdj        8:144  [active][ready]
     \_ 1:0:5:9  sdbc       67:96  [active][ready]
  • Finally, get the LVM volume group to recognize that the physical volume is larger using pvresize:
    # pvresize /dev/dm-11
      Physical volume "/dev/dm-11" changed
      1 physical volume(s) resized / 0 physical volume(s) not resized
    # pvs -o +tags
      PV         VG     Fmt  Attr PSize   PFree   PV Tags
      /dev/dm-1  switch lvm2 a-   500.38G 280.38G db024-lindx,lindx
      /dev/dm-10 switch lvm2 a-     1.95T 801.00G db024-ldata,ldata
      /dev/dm-11 switch lvm2 a-   141.50G  60.00G db024-mindx,mindx
      /dev/dm-12 switch lvm2 a-   650.00G 100.00G db024-reports,reports
      /dev/dm-13 switch lvm2 a-    51.25G  31.25G db024-log,log
      /dev/dm-14 switch lvm2 a-   450.12G  50.12G db024-home,home
      /dev/dm-15 switch lvm2 a-     1.76T 342.00G db024-q_backup,q_backup
      /dev/dm-16 switch lvm2 a-     1.00G 640.00M db024-control,control
      /dev/dm-2  switch lvm2 a-   301.38G 120.38G db024-dbs,dbs
      /dev/dm-3  switch lvm2 a-   401.88G 101.88G db024-cdr_data,cdr_data
      /dev/dm-5  switch lvm2 a-   450.62G 290.62G db024-archlogs,archlogs
      /dev/dm-6  switch lvm2 a-    40.88G  22.50G db024-boot,boot
      /dev/dm-7  switch lvm2 a-    51.25G   1.25G db024-rbs,rbs
      /dev/dm-8  switch lvm2 a-    51.25G  27.25G db024-temp,temp
      /dev/dm-9  switch lvm2 a-   201.38G 161.38G db024-summary,summary

    pvs shows that /dev/dm-11 is now 141.5G.

At this point you can enlarge any logical volumes residing on the underlying physical volume without splitting the logical volume across multiple (non-contiguous) physical volumes using lvresize and enlarge the file system using the file system tools, e.g. resize2fs.

If you ran out of space, your LVs were split across multiple PVs, and you need to coalesce a PV onto a single LV use pvmove to move the physical volume to a single device.

Hope you find this useful.

Share Button

Adding an external encrypted drive with LVM to Ubuntu Linux

I recently added an external eSATA drive to my home computer so I could back up critical data from my home network to one drive. I bought a Western Digital 1TB “green” drive and a Thermaltake external hard drive enclosure with eSATA and USB connectors.

Since my internal hard drives are encrypted it didn’t make sense to back up all of that data to an unencrypted external drive. I’d read Uwe Hermann’s excellent how-to article on disk encryption, but he didn’t cover setting up an LVM partition, which I always use so I can change drive volume sizes on the fly.

This is what I did to set up an external encrypted drive with LVM on an Ubuntu system:

  1. Open a terminal
  2. Get a root prompt:
    sudo /bin/bash
  3. Watch the system log:
    tail -f /var/log/messages
  4. Attach the external drive. The system log tells me that it was detected as /dev/sdc.
  5. Check the drive for bad blocks (takes a couple of hours):
    badblocks -c 10240 -s -w -t random -v /dev/sdc
  6. Write random data to the entire drive. This step takes all night, but it ensures that never-written drive space can’t be differentiated from encrypted data if someone ever tries to crack the drive. (If you’re going to do this, you might as well do it right.)
    shred -v -n 1 /dev/sdc
  7. Create one big LVM partition on the drive using fdisk. Set up one big primary partition /dev/sdc1, set the tag to system id “8e” LVM, and write the changes to disk:
    > fdisk /dev/sdc                                                                                                                                              
    Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel                                                                                                  
    Building a new DOS disklabel with disk identifier 0xa6846916.                                                                                                                       
    Changes will remain in memory only, until you decide to write them.                                                                                                                 
    After that, of course, the previous content won't be recoverable.                                                                                                                   
    
    
    The number of cylinders for this disk is set to 121575.
    There is nothing wrong with that, but this is larger than 1024,
    and could in certain setups cause problems with:               
    1) software that runs at boot time (e.g., old versions of LILO)
    2) booting and partitioning software from other OSs                                                                                                                                 
       (e.g., DOS FDISK, OS/2 FDISK)                                                                                                                                                    
    Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)                                                                                                      
                                                                                                                                                                                        
    Command (m for help): p                                                                                                                                            
                                                                                                                                                                                        
    Disk /dev/sdc: 999.9 GB, 999989182464 bytes                                                                                                                                         
    255 heads, 63 sectors/track, 121575 cylinders                                                                                                                                       
    Units = cylinders of 16065 * 512 = 8225280 bytes                                                                                                                                    
    Disk identifier: 0xa6846916                                                                                                                                                         
                                                                                                                                                                                        
       Device Boot      Start         End      Blocks   Id  System                                                                                                                      
                                                                                                                                                                                        
    Command (m for help): n                                                                                                                                            
    Command action                                                                                                                                                                      
       e   extended                                                                                                                                                                     
       p   primary partition (1-4)                                                                                                                                                      
    p                                                                                                                                                                  
    Partition number (1-4): 1                                                                                                                                          
    First cylinder (1-121575, default 1): [ENTER]                                                                                                                      
    Using default value 1
    Last cylinder, +cylinders or +size{K,M,G} (1-121575, default 121575): [ENTER]
    Using default value 121575
    
    Command (m for help): t
    Selected partition 1
    Hex code (type L to list codes): 8e
    Changed system type of partition 1 to 8e (Linux LVM)
    
    Command (m for help): p
    
    Disk /dev/sdc: 999.9 GB, 999989182464 bytes
    255 heads, 63 sectors/track, 121575 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Disk identifier: 0xa6846916
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1               1      121575   976551156   8e  Linux LVM
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
  8. Use cryptsetup to encrypt the drive:
    cryptsetup --verbose --verify-passphrase luksFormat /dev/sdc1
  9. Unlock the drive:
    cryptsetup luksOpen /dev/sdc1 backupexternal
  10. Create the LVM physical volume:
    pvcreate /dev/mapper/backupexternal
  11. Create the LVM volume group:
    vgcreate xbackup /dev/mapper/backupexternal
  12. Create a logical volume within the volume group:
    lvcreate -L 500G -n backupvol /dev/xbackup
  13. At this point you have a device named /dev/xbackup/backupvol, so create a filesystem on the logical volume:
    mkfs.ext4 /dev/xbackup/backupvol
  14. Mount the volume:
    mount /dev/xbackup/backupvol /mnt/backup
  15. To get the volume to mount automatically at boot time add this line to your /etc/fstab file:
    /dev/xbackup/backupvol      /mnt/backup     ext4    defaults        0 5
  16. To be prompted for the decryption key / passphrase at boot time first get the drive’s UUID:
    ls -l /dev/disk/by-uuid

    (In my example I use the UUID for /dev/sdc1)

  17. Then add this line to the /etc/crypttab file:
    backupexternal UUID=[the UUID of the drive] none luks

That’s it. You now have an external, encrypted hard drive with LVM installed. You’ve created one 500GB volume that uses half the disk, leaving 500GB free for other volumes, or for expanding the first volume.

Hope you find this useful.

Share Button

Samsung SSD drives achieve 2GB/s write speeds

I spend a fair amount of my time administering large databases and trying to squeeze the maximum speed out of computer hardware. The types of databases I work on are typically in the 1-2TB range and the disks that they run on have to support a mix of bulk writes, bulk deletes, random writes and random reads. Some of the systems I work on can burst at up to 500MB/s and sustain random write speeds of 250MB/s. Your typical SATA2 drive for home use supports a maximum sequential write speed of around 60MB/s, and they run slower still when doing lots of random IO, so getting 250MB/s sustained random write speeds requires some special hardware.

This is why I was so impressed with a demo I saw of the new Samsung SSD (solid state drive). Since these drives are solid state — no moving parts — they have 220MB/s sequential read and 200MB/s sequential write speeds. If you tie a bunch of them together in a RAID configuration you can speed writes up even more, which is what the guy in the video below did. The end result was a desktop server that could read and write sequential data at 2GB/s.

Usually when I see demos like this the caveat is that the total amount of space available is 80MB or so. Not the case here. These new drives are 256GB each, and the RAID array built below has a total capacity of 6TB.

Check this out. If you’re into high-speed hardware using off-the-shelf parts, this is just incredible.

One thing to note, the demo does not show random read/write speeds. All of the guy’s demonstrations are of sequential IO speed. This is significant, because in the past SSD drives slow down significantly if they’re asked to do random IO, and you want lots of speed for random IO if you want to use these drives for database applications.

Share Button