Adding a LUKS-encrypted iSCSI volume to Synology DS414 NAS and Ubuntu 15.04

I have an Ubuntu 15.04 “Vivid” workstation already set up with LUKS full disk encryption, and I have a Synology DS414 NAS with 12TB raw storage on my home network. I wanted to add a disk volume on the Synology DS414 that I could mount on the Ubuntu server, but NFS doesn’t support “at rest” encrypted file systems, and using EncFS over NFS seemed like the wrong way to go about it, so I decided to try setting up an iSCSI volume and encrypting it with LUKS. Using this type of setup, all data is encrypted both “on the wire” and “at rest”.

Log into the Synology Admin Panel and select Main Menu > Storage Manager:

  • Add an iSCSI LUN
    • Set Thin Provisioning = No
    • Advanced LUN Features = No
    • Make the volume as big as you need
  • Add an iSCSI Target
    • Use CHAP authentication
    • Write down the login name and password you choose

On your Ubuntu box switch over to a root prompt:

sudo /bin/bash

Install the open-iscsi initiator. (Since I’m already running LUKS on my Ubuntu box, cryptsetup is already installed; if you’re not, install the cryptsetup package as well.)

apt-get install open-iscsi

Edit the conf file

vi /etc/iscsi/iscsid.conf

Edit these lines:

node.startup = automatic
node.session.auth.username = [CHAP user name on Synology box]
node.session.auth.password = [CHAP password on Synology box]

Restart the open-iscsi service:

service open-iscsi restart
service open-iscsi status

Start open-iscsi at boot time:

systemctl enable open-iscsi

Now find the name of the iSCSI target on the Synology box:

iscsiadm -m discovery -t st -p $SYNOLOGY_IP
iscsiadm -m node

The target name should look something like “iqn.2000-01.com.synology:boxname.target-1.62332311”

Still on the Ubuntu workstation, log into the iSCSI target:

iscsiadm -m node --targetname "$TARGET_NAME" --portal "$SYNOLOGY_IP:3260" --login

Look for new devices:

fdisk -l

At this point fdisk should show you a new block device which is the iSCSI disk volume on the Synology box. In my case it was /dev/sdd.
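If several disks are attached and you’re not sure which one is the iSCSI LUN, the by-path symlinks that udev creates make it obvious, since the device path includes the target IQN:

ls -l /dev/disk/by-path/ | grep iscsi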

Partition the device. I made one big /dev/sdd1 partition, type 8e (Linux LVM):

fdisk /dev/sdd
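At the fdisk prompts the keystrokes are roughly as follows (exact prompts vary a little between fdisk versions, so treat this as a sketch):

n        # new partition
p        # primary
1        # partition number 1
<Enter>  # accept the default first sector
<Enter>  # accept the default last sector (use the whole disk)
t        # change the partition type
8e       # Linux LVM
w        # write the partition table and exit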

Set up the device as a LUKS-encrypted device:

cryptsetup --verbose --verify-passphrase luksFormat /dev/sdd1

Open the LUKS volume:

cryptsetup luksOpen /dev/sdd1 backupiscsi

Create a physical volume from the LUKS volume:

pvcreate /dev/mapper/backupiscsi

Add that to a new volume group:

vgcreate ibackup /dev/mapper/backupiscsi

Create a logical volume within the volume group:

lvcreate -L 1800G -n backupvol /dev/ibackup

Put a file system on the logical volume:

mkfs.ext4 /dev/ibackup/backupvol

Add the logical volume to /etc/fstab to mount it on startup:

# Synology iSCSI target LUN-1
/dev/ibackup/backupvol /mnt/backup ext4 defaults,nofail 0 2
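Create the mount point and make sure the fstab entry mounts cleanly (running mount with just the mount point makes it read the options from /etc/fstab):

mkdir -p /mnt/backup
mount /mnt/backup
df -h /mnt/backup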

Get the UUID of the iSCSI drive:

ls -l /dev/disk/by-uuid | grep sdd1

Add the UUID to /etc/crypttab to be automatically prompted for the decrypt passphrase when you boot up Ubuntu:

backupiscsi UUID=693568ca-9334-4c19-8b01-881f2247ae0d none luks
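To double-check that the UUID you grabbed is the LUKS container’s UUID (and not a file system UUID), you can ask cryptsetup for it directly:

cryptsetup luksUUID /dev/sdd1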

If you found this interesting, you might want to check out my article Adding an external encrypted drive with LVM to Ubuntu Linux.

Hope you found this useful.

2014 HPCwire Awards

The StratoStor project I’ve been working on for the past 10 months just got a “Top 5 New Products or Technologies to Watch” award from HPCwire announced at this week’s SuperComputing 2014 (SC14) conference in New Orleans.

HPC = High Performance Computing. HPCwire is a news bureau for all things High Performance Computing, and SC14 is where every major vendor of HPC equipment and products shows off their wares, so getting this bit of recognition from the readers of HPCwire is really nice.

So THANK YOU HPCwire readers, for this award.

https://www.hpcwire.com/2014-hpcwire-readers-choice-awards/23/

Validating Distributed Application Workloads

This is the talk I gave at RICON this year on Validating Distributed Application Workloads. It’s about how we set up test environments at Seagate for validating storage system performance at the petabyte scale. This talk centers around the testing done to validate performance of a 2PB rack running Riak CS.

Creating differential backups with hard links and rsync

You can use a hard link in Linux to create two file names that both point to the same underlying file (the same data on disk). For instance, if I type:

> echo xxxx > a
> cp -l a b
> cat a
xxxx
> cat b
xxxx

I create a file named “a” that contains the string “xxxx”. Then I create a hard link “b” that points to the same data on the disk. Now if I write to file “a”, whatever I write also appears in file “b”, and vice versa:

> echo yyyy > b
> cat b
yyyy
> cat a
yyyy
> echo zzzz > a
> cat a
zzzz
> cat b
zzzz
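If you want to confirm that two names really are the same file, compare inode numbers: hard links show the same inode and a link count of 2. (The inode number and timestamps below are just illustrative.)

> ls -li a b
524289 -rw-rw-r-- 2 earl earl 5 Nov 20 10:15 a
524289 -rw-rw-r-- 2 earl earl 5 Nov 20 10:15 b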

Copying to a hard link updates the data on the disk that each hard link points to:

> rm -f a b c
> echo xxxx > a
> echo yyyy > c
> cp -l a b
> cat a b c
xxxx
xxxx
yyyy

“a” and “b” point to the same file on disk; “c” is a separate file. If I copy file “c” to “b”, that also updates “a”:

> cp c b 
> cat a b c
yyyy
yyyy
yyyy
> echo zzzz > c
> cat a b c
yyyy
yyyy
zzzz 

What most people don’t know is that rsync is an exception to this rule. By default rsync writes its changes to a temporary file and then renames that file over the target, so if you use rsync to sync two files and the target is a hard link, rsync will replace the target with a new, separate file, but only if the contents of the two files are not the same:

> rm a
> rm b
> echo xxxx > a
> cp -l a b
> cat a
xxxx
> cat b
xxxx
> echo yyyy > c
> cat c
yyyy
> rsync -av c b
sending incremental file list
c
sent 87 bytes  received 31 bytes  236.00 bytes/sec
total size is 5  speedup is 0.04
> cat b
yyyy
> cat c
yyyy
> cat a
xxxx

File “b” is no longer a hard link of “a”, it’s a new file. If I update “a” it no longer updates “b”:

> echo zzzz > a
> cat a b c
zzzz
yyyy
yyyy

However, if the file that I’m rsync-ing has the same contents and timestamp as “b”, then rsync does NOT break the hard link; it leaves the file alone:

> rm a
> rm b
> rm c
> echo xxxx > a
> cp -al a b
> cp -p a c
> cat a b c
xxxx
xxxx
xxxx

At this point “a” and “b” both point to the same file on the disk, which contains the string “xxxx”. “c” is a separate file that also contains the string “xxxx” and has the same permissions and timestamp as “a”.

> rsync -av c b
sending incremental file list
sent 39 bytes  received 12 bytes  102.00 bytes/sec
total size is 5  speedup is 0.10
> cat a b c
xxxx
xxxx
xxxx

At this point I’ve rsynced file “c” to “b”, but since “c” has the same contents and timestamp as “a” and “b”, rsync does nothing at all. It doesn’t break the hard link. If I change “b” it still updates “a”:

> echo yyyy > b
> cat a b c
yyyy
yyyy
xxxx

This is how many modern file system backup programs work. On day 1 you make an rsync copy of your entire file system:

backup@backup_server> DAY1=`date +%Y%m%d%H%M%S`
backup@backup_server> rsync -av -e ssh earl@192.168.1.20:/home/earl/ /var/backups/$DAY1/

On day 2 you make a hard link copy of the backup, then a fresh rsync:

backup@backup_server> DAY2=`date +%Y%m%d%H%M%S`
backup@backup_server> cp -al /var/backups/$DAY1 /var/backups/$DAY2
backup@backup_server> rsync -av -e ssh --delete earl@192.168.1.20:/home/earl/ /var/backups/$DAY2/

“cp -al” makes a hard link copy of the previous day’s entire backup tree, then rsync runs against that copy. If a file is unchanged, rsync does nothing and the file remains a hard link. If a file’s contents changed, rsync replaces the hard link with a new, separate copy of the file in the target directory. If a file was deleted from /home/earl, rsync deletes the hard link from that day’s copy.

In this way, the $DAY1 directory has a snapshot of the /home/earl tree as it existed on day 1, and the $DAY2 directory has a snapshot of the /home/earl tree as it existed on day 2, but only the files that changed take up additional disk space. If you need to find a file as it existed at some point in time you can look at that day’s tree. If you need to restore yesterday’s backup you can rsync the tree from yesterday, but you don’t have to store a copy of all of the data from each day, you only use additional disk space for files that changed or were added.

I use this technique to keep 90 daily backups of a 500GB file system on a 1TB drive.
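Here’s a minimal sketch of a nightly script along those lines. The paths, source host and 90-day retention are just the values from the examples above; adjust them to your setup:

#!/bin/bash
# Nightly hard-link snapshot backup (sketch)
set -e
BACKUP_ROOT=/var/backups
TODAY=$(date +%Y%m%d%H%M%S)

# Most recent snapshot, if any (directory names are timestamps, so sort order works)
PREV=$(ls -1d "$BACKUP_ROOT"/2* 2>/dev/null | sort | tail -n 1)

# Start today's snapshot as a hard link copy of the previous one
if [ -n "$PREV" ]; then
    cp -al "$PREV" "$BACKUP_ROOT/$TODAY"
fi

# Sync against the hard link copy: unchanged files stay hard links,
# changed files become new copies, deleted files are removed
rsync -av -e ssh --delete earl@192.168.1.20:/home/earl/ "$BACKUP_ROOT/$TODAY/"

# Keep the 90 most recent snapshots, prune the rest
ls -1d "$BACKUP_ROOT"/2* | sort | head -n -90 | xargs -r rm -rf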

One caveat: this scheme chews through inodes. The hard links themselves are just extra directory entries, but every daily snapshot recreates the full directory tree (directories can’t be hard linked) and every changed file becomes a new inode. If you’re using ext3 or ext4, which have a fixed number of inodes set when the file system is created, you should allocate extra inodes on the backup volume when you create it. File systems that allocate inodes dynamically, such as XFS, ZFS or Btrfs, don’t have this problem.
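For example, you can keep an eye on inode usage with df -i, and on ext3/ext4 you can raise the inode count when you create the backup file system. (The device name below is just a placeholder; -i 4096 allocates one inode per 4KB of space, four times the usual default.)

backup@backup_server> df -i /var/backups
backup@backup_server> mkfs.ext4 -i 4096 /dev/sdX1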

Increasing the size of an LVM Physical Volume (PV) while running multipathd — without rebooting

If you’re using the Linux Logical Volume Manager (LVM) to manage your disk space it’s easy to enlarge a logical volume while a server is up and running. It’s also easy to add new drives to an existing volume group.

But if you’re using a SAN the underlying physical drives can have different performance characteristics because they’re assigned to different QOS bands on the SAN. If you want to keep performance optimized it’s important to know what physical volume a logical volume is assigned to — otherwise you can split a single logical volume across multiple physical volumes and end up degrading system performance. If you run out of space on a physical volume and then enlarge a logical volume you will split the LV across two or more PVs. To prevent this from happening you need to enlarge the LUN, tell multipathd about the change, then enlarge the PV, then enlarge the LV, and finally enlarge the file system.

I have three SANs at the company where I work (two Pillar Axioms and a Xyratex) which are attached to two Fibre Channel switches and several racks of blade servers. Each blade is running an Oracle database with multiple physical volumes (PVs) grouped into a single LVM volume group. The PVs are tagged, and as logical volumes (LVs) are added they’re assigned to the physical volume whose tag matches the logical volume’s name. That way we can assign the PV to a higher or lower performance band on the SAN and optimize the database’s performance. Oracle tablespaces that contain frequently-accessed data get assigned to a PV with a higher QOS band on the SAN. Archival data gets put on a PV with a lower QOS band.

We run OpenSUSE 11.x and use multipathd to deal with the multiple fiber paths available between each blade and a SAN. Each blade has 2 fiber ports for redundancy, attached to two fiber switches, each of which is cross-connected to 2 ports on 2 different controllers on the SAN, so there are 4 different fiber paths that data can take between a blade and the SAN. If any path fails, or one port on a fiber card fails, or one fiber switch fails, multipathd re-routes the data using the remaining paths and everything keeps working. If a blade fails we switch to another blade.

If we run out of space on a PV I can log into the SAN’s administrative interface and enlarge the size of the underlying LUN, but getting the operating system on the blade to recognize the fact that more physical disk space was available is tricky. LVM’s pvresize command would claim that it was enlarging the PV, but nothing would happen unless the server was rebooted and then pvresize was run again. I wanted to be able to enlarge physical volumes without taking a database off-line and rebooting its server. Here’s how I did it:

  • First log into the SAN’s administrative interface and enlarge the LUN in question.
  • Open two xterm windows on the host as root
  • Gather information – you will need the physical device name, the multipath block device names, and the multipath map name. (Since our setup gives us 4 data paths for each LUN there are 4 multipath block device names.)
  • List the physical volumes and their associated tags with pvs -o +tags:
    # pvs -o +tags
      PV         VG     Fmt  Attr PSize   PFree   PV Tags                
      /dev/dm-1  switch lvm2 a-   500.38G 280.38G db024-lindx,lindx      
      /dev/dm-10 switch lvm2 a-     1.95T 801.00G db024-ldata,ldata      
      /dev/dm-11 switch lvm2 a-    81.50G      0  db024-mindx,mindx      
      /dev/dm-12 switch lvm2 a-   650.00G 100.00G db024-reports,reports  
      /dev/dm-13 switch lvm2 a-    51.25G  31.25G db024-log,log          
      /dev/dm-14 switch lvm2 a-   450.12G  50.12G db024-home,home        
      /dev/dm-15 switch lvm2 a-     1.76T 342.00G db024-q_backup,q_backup
      /dev/dm-16 switch lvm2 a-     1.00G 640.00M db024-control,control  
      /dev/dm-2  switch lvm2 a-   301.38G 120.38G db024-dbs,dbs          
      /dev/dm-3  switch lvm2 a-   401.88G 101.88G db024-cdr_data,cdr_data
      /dev/dm-5  switch lvm2 a-   450.62G 290.62G db024-archlogs,archlogs
      /dev/dm-6  switch lvm2 a-    40.88G  22.50G db024-boot,boot        
      /dev/dm-7  switch lvm2 a-    51.25G   1.25G db024-rbs,rbs          
      /dev/dm-8  switch lvm2 a-    51.25G  27.25G db024-temp,temp        
      /dev/dm-9  switch lvm2 a-   201.38G 161.38G db024-summary,summary
  • Find the device that corresponds to the LUN you just enlarged, e.g. /dev/dm-11
  • Run multipath -ll, find the device name in the listing. The large hex number at the start of the line is the multipath map name and the sdX block devices after the device name are the multipath block devices. So in this example the map name is 2000b080112002142 and the block devices are sdy, sdan, sdj, and sdbc:
    2000b080112002142 dm-11 Pillar,Axiom 500                 
    [size=82G][features=1 queue_if_no_path][hwhandler=0][rw] 
    \_ round-robin 0 [prio=100][active]                      
     \_ 0:0:5:9  sdy        65:128 [active][ready]           
     \_ 1:0:4:9  sdan       66:112 [active][ready]           
    \_ round-robin 0 [prio=20][enabled]                      
     \_ 0:0:4:9  sdj        8:144  [active][ready]           
     \_ 1:0:5:9  sdbc       67:96  [active][ready]
  • Next get multipath to recognize that the device is larger:
    • For each block device do echo 1 > /sys/block/sdX/device/rescan:
      # echo 1 > /sys/block/sdy/device/rescan
      # echo 1 > /sys/block/sdan/device/rescan
      # echo 1 > /sys/block/sdj/device/rescan
      # echo 1 > /sys/block/sdbc/device/rescan
    • In the second root window, pull up a multipath command line with multipathd -k
    • Delete and re-add the first block device from each group. Since multipathd provides multiple paths to the underlying SAN, the device will remain up and on-line during this process. Make sure that you get an ‘ok’ after each command. If you see ‘fail’ or anything else besides ‘ok’, STOP WHAT YOU’RE DOING and go to the next step.
      multipathd> del path sdy                                             
      ok                                                                   
      multipathd> add path sdy                                             
      ok                                                                   
      multipathd> del path sdj                                             
      ok                                                                   
      multipathd> add path sdj                                             
      ok
    • If you got a ‘fail’ response:
      • Type exit to get back to a command line.
      • Type multipath -r on the command line. This should recover/rebuild all block device paths.
      • Type multipath -ll | less again and verify that the block devices were re-added.
      • At this point multipath may actually recognize the new device size (you can see the size in the multipath -ll output). If everything looks good, skip ahead to the pvresize step.
    • In the first root window run multipath -ll again and verify that the block devices were re-added:
      2000b080112002142 dm-11 Pillar,Axiom 500                 
      [size=82G][features=1 queue_if_no_path][hwhandler=0][rw] 
      \_ round-robin 0 [prio=100][active]                      
       \_ 1:0:4:9  sdan       66:112 [active][ready]           
       \_ 0:0:5:9  sdy        65:128 [active][ready]           
      \_ round-robin 0 [prio=20][enabled]                      
       \_ 1:0:5:9  sdbc       67:96  [active][ready]           
       \_ 0:0:4:9  sdj        8:144  [active][ready]
    • Delete and re-add the remaining two block devices in the second root window:
      multipathd> del path sdan
      ok                       
      multipathd> add path sdan
      ok                       
      multipathd> del path sdbc
      ok                       
      multipathd> add path sdbc
      ok
    • In the first root window run multipath -ll again and verify that the block devices were re-added.
    • Tell multipathd to resize the block device map using the map name:
      multipathd> resize map 2000b080112002142
      ok
    • Press Ctrl-D to exit multipathd command line.
  • In the first root window run multipath -ll again to verify that multipath sees the new physical device size. The device below went from 82G to 142G:
    2000b080112002142 dm-11 Pillar,Axiom 500
    [size=142G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=100][active]
     \_ 0:0:5:9  sdy        65:128 [active][ready]
     \_ 1:0:4:9  sdan       66:112 [active][ready]
    \_ round-robin 0 [prio=20][enabled]
     \_ 0:0:4:9  sdj        8:144  [active][ready]
     \_ 1:0:5:9  sdbc       67:96  [active][ready]
  • Finally, get the LVM volume group to recognize that the physical volume is larger using pvresize:
    # pvresize /dev/dm-11
      Physical volume "/dev/dm-11" changed
      1 physical volume(s) resized / 0 physical volume(s) not resized
    # pvs -o +tags
      PV         VG     Fmt  Attr PSize   PFree   PV Tags
      /dev/dm-1  switch lvm2 a-   500.38G 280.38G db024-lindx,lindx
      /dev/dm-10 switch lvm2 a-     1.95T 801.00G db024-ldata,ldata
      /dev/dm-11 switch lvm2 a-   141.50G  60.00G db024-mindx,mindx
      /dev/dm-12 switch lvm2 a-   650.00G 100.00G db024-reports,reports
      /dev/dm-13 switch lvm2 a-    51.25G  31.25G db024-log,log
      /dev/dm-14 switch lvm2 a-   450.12G  50.12G db024-home,home
      /dev/dm-15 switch lvm2 a-     1.76T 342.00G db024-q_backup,q_backup
      /dev/dm-16 switch lvm2 a-     1.00G 640.00M db024-control,control
      /dev/dm-2  switch lvm2 a-   301.38G 120.38G db024-dbs,dbs
      /dev/dm-3  switch lvm2 a-   401.88G 101.88G db024-cdr_data,cdr_data
      /dev/dm-5  switch lvm2 a-   450.62G 290.62G db024-archlogs,archlogs
      /dev/dm-6  switch lvm2 a-    40.88G  22.50G db024-boot,boot
      /dev/dm-7  switch lvm2 a-    51.25G   1.25G db024-rbs,rbs
      /dev/dm-8  switch lvm2 a-    51.25G  27.25G db024-temp,temp
      /dev/dm-9  switch lvm2 a-   201.38G 161.38G db024-summary,summary

    pvs shows that /dev/dm-11 is now 141.5G.

At this point you can use lvresize to enlarge any logical volume residing on the physical volume without splitting it across multiple (non-contiguous) physical volumes, then enlarge the file system with the usual file system tools, e.g. resize2fs.
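As a sketch, assuming a logical volume named mindx in the switch volume group that should stay on the PV we just enlarged, you grow the LV while restricting the new extents to /dev/dm-11, then grow the file system online:

# lvresize -L +60G /dev/switch/mindx /dev/dm-11
# resize2fs /dev/switch/mindx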

If you did run out of space and a logical volume got split across multiple PVs, you can use pvmove to move that LV’s extents back onto a single physical volume.
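For example, if a logical volume named mindx had spilled from /dev/dm-11 onto /dev/dm-9, something like this moves its extents back (device and LV names here are purely illustrative):

# pvmove -n mindx /dev/dm-9 /dev/dm-11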

Hope you find this useful.