
Copy entire file directories from a Linux host to Box

I had about 2TB of files on a cloud-based Linux host that I needed to back up to cloud storage. I had a Box Enterprise storage account with a 30PB limit on storage and a maximum file size of 150GB, so I decided to connect from Linux to Box and store all of the backup data there. You can check your own limits at the bottom of the Box “Account Settings” page.

The most difficult part of getting Box to work on a headless, cloud-based Linux host is getting authorization to work. Box wants to use OAuth2 web-based authentication, and I need to set up Box access on a remote host where I’m connecting via ssh and there is no web browser or desktop. The easiest way that I’ve found to do this is to generate an OAuth2 bearer token on my laptop, which rclone prints as a small JSON object, and then copy that to the Linux host.

I used rclone for the backup. I first installed rclone on the Ubuntu Linux host:

sudo apt-get install rclone

Then I installed rclone on my Mac laptop:

brew install rclone
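The apt and Homebrew packages can be different rclone releases, and rclone's own setup prompt recommends running the same version on both machines. A minimal sketch for comparing them follows; the helper name rclone_version_of and the two version numbers are made up for illustration, and on real machines you would paste in the first line of `rclone version` from each host.

```shell
# Hypothetical helper: pull the version string out of the first line of
# `rclone version` output, which looks like "rclone v1.53.3".
rclone_version_of() {
    printf '%s\n' "$1" | sed -n '1s/^rclone \(v[0-9][^ ]*\).*/\1/p'
}

# Sample strings standing in for real output (made-up version numbers).
# On each machine, capture the real value with: rclone version
linux_out='rclone v1.53.3'
mac_out='rclone v1.67.0'

linux_ver=$(rclone_version_of "$linux_out")
mac_ver=$(rclone_version_of "$mac_out")

if [ "$linux_ver" = "$mac_ver" ]; then
    echo "versions match: $linux_ver"
else
    echo "versions differ: $linux_ver (Linux) vs $mac_ver (Mac)"
fi
```

A mismatch is not necessarily fatal, but if the pasted token is rejected later it is one of the first things worth checking.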

I pulled up a terminal on the Mac and configured rclone so that my Box account was authorized:

rclone authorize box

This will cause a browser window to pop up and ask you to log into Box. Once you’ve logged in and authorized rclone to read and write files in your Box drive the command will finish up and spit out a bearer token:

$ rclone authorize box
2024/09/12 08:57:15 NOTICE: Config file "/Users/eruby/.config/rclone/rclone.conf" not found - using defaults
2024/09/12 08:57:15 NOTICE: If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=qqf7swGNZ8pH4iJksvR3xA
2024/09/12 08:57:15 NOTICE: Log in and authorize rclone for access
2024/09/12 08:57:15 NOTICE: Waiting for code…
2024/09/12 08:57:45 NOTICE: Got code
Paste the following into your remote machine --->
{"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
<---End paste

The bearer token contains an access token, a refresh token, and an expiration date. The access token is good for an hour. After that it expires and the application (rclone) will use the refresh token to get a new access token. This can keep happening until the user that generated the token (you) is no longer allowed to access Box or your password changes. If you change your password you’ll need to generate a new bearer token. The refresh token may expire before your password changes, depending on the security policy of the organization issuing the refresh token, so at some point you may need to regenerate the bearer token even if you don’t change your password.
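If you want to see when the current access token runs out, the expiry field can be pulled out of the pasted token with standard tools. This is only a sketch: token_expiry is a hypothetical helper name, the token is the fake one from the output above (with the refresh token shortened), and the sed pattern assumes the compact one-line JSON that rclone prints.

```shell
# Hypothetical helper: print the "expiry" field of an rclone token blob.
# Assumes the compact, single-line JSON shown in the rclone output.
token_expiry() {
    printf '%s\n' "$1" | sed -n 's/.*"expiry":"\([^"]*\)".*/\1/p'
}

# The fake token from this post, not a real credential.
token='{"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaL","expiry":"2024-09-12T10:02:31.314087-07:00"}'

expiry=$(token_expiry "$token")
echo "access token expires at: $expiry"
```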

Now you just have to paste the bearer token into the Linux host’s rclone config, so log into the Linux host and run rclone config. Here’s the entire interaction with the config command:

$ rclone config
2024/09/12 18:19:19 NOTICE: Config file "/home/eruby/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Box
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / 1Fichier
   \ "fichier"
 2 / Alias for an existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
 5 / Backblaze B2
   \ "b2"
 6 / Box
   \ "box"
 7 / Cache a remote
   \ "cache"
 8 / Citrix Sharefile
   \ "sharefile"
 9 / Dropbox
   \ "dropbox"
10 / Encrypt/Decrypt a remote
   \ "crypt"
11 / FTP Connection
   \ "ftp"
12 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
13 / Google Drive
   \ "drive"
14 / Google Photos
   \ "google photos"
15 / Hubic
   \ "hubic"
16 / In memory object storage system.
   \ "memory"
17 / Jottacloud
   \ "jottacloud"
18 / Koofr
   \ "koofr"
19 / Local Disk
   \ "local"
20 / Mail.ru Cloud
   \ "mailru"
21 / Microsoft Azure Blob Storage
   \ "azureblob"
22 / Microsoft OneDrive
   \ "onedrive"
23 / OpenDrive
   \ "opendrive"
24 / OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
25 / Pcloud
   \ "pcloud"
26 / Put.io
   \ "putio"
27 / SSH/SFTP Connection
   \ "sftp"
28 / Sugarsync
   \ "sugarsync"
29 / Transparently chunk/split large files
   \ "chunker"
30 / Union merges the contents of several upstream fs
   \ "union"
31 / Webdav
   \ "webdav"
32 / Yandex Disk
   \ "yandex"
33 / http Connection
   \ "http"
34 / premiumize.me
   \ "premiumizeme"
35 / seafile
   \ "seafile"
Storage> 6
** See help for box backend at: https://rclone.org/box/ **

OAuth Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Box App config.json location
Leave blank normally.

Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.

Enter a string value. Press Enter for the default ("").
box_config_file>
Box App Primary Access Token
Leave blank normally.
Enter a string value. Press Enter for the default ("").
access_token>

Enter a string value. Press Enter for the default ("user").
Choose a number from below, or type in your own value
 1 / Rclone should act on behalf of a user
   \ "user"
 2 / Rclone should act on behalf of a service account
   \ "enterprise"
box_sub_type>
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n
For this to work, you will need rclone available on a machine that has
a web browser available.

For more help and alternate methods see: https://rclone.org/remote_setup/

Execute the following on the machine with the web browser (same rclone
version recommended):

	rclone authorize "box"

Then paste the result below:
result> {"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
--------------------
[Box]
token = {"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
Box                  box

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

To explain:

  • I named the remote connection “Box”.
  • I made the type of remote connection “box” (choice 6 from the list of supported remote storage types).
  • I didn’t edit any options, just answered “n” when asked if I wanted to make edits.
  • When I got to the Use auto config? question I answered “n”.
  • I copied the entire bearer token I got from my Mac into the result> field on the Linux box. (This is the entire contents inside the curly braces “{ … }” that rclone tells you to “Paste the following into your remote machine”.)
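Before starting a long copy it can be worth confirming that the remote was saved. A small sketch, assuming the remote is named “Box” as above: has_remote is a hypothetical helper that reads `rclone listremotes` output on stdin, and the printf line stands in for that output so the snippet runs without rclone installed.

```shell
# Hypothetical helper: succeed if the named remote appears in the output of
# `rclone listremotes` (one "Name:" per line), read from stdin.
has_remote() {
    grep -qxF "$1:"
}

# Sample input standing in for real output. On the Linux host you would run:
#   rclone listremotes | has_remote Box && rclone lsd Box:
if printf 'Box:\n' | has_remote Box; then
    echo "remote Box is configured"
else
    echo "remote Box is missing, re-run rclone config"
fi
```

On the real host, `rclone lsd Box:` lists the top-level folders in the Box account, which also shows that the pasted token works.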

At this point rclone should be configured, so all that’s left is to copy the data into Box. All of the data is in the /testdata directory, so I ran:

rclone copy --copy-links \
    --create-empty-src-dirs \
    /testdata Box:backup-testdata \
    > ~/box-backup-testdata.log 2>&1 &

  • There are symlinks in the directory, so --copy-links is needed.
  • If there are empty directories on the source, I want to copy them over, so --create-empty-src-dirs is needed.
  • /testdata is the root directory that I’m copying over.
  • “Box” is the name I used for the remote connection.
  • backup-testdata is going to be the location of the data in my Box account.
  • Since this is going to take many hours to copy, I redirected all output to a log file and ran the process in the background. I can come back and check the log file later to see if everything worked.
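When the copy finishes, a quick scan of the log shows whether anything failed. This is a sketch under one assumption: that rclone marks failures with “ERROR :” after the timestamp, in the same style as the NOTICE lines shown earlier. count_errors is a hypothetical helper, and the two log lines are invented samples written to a temporary file.

```shell
# Hypothetical helper: count lines rclone logged at ERROR level.
# grep -c exits non-zero when the count is 0, so "|| true" keeps that quiet.
count_errors() {
    grep -c ' ERROR : ' "$1" || true
}

# Invented sample log; on the host you would point this at
# ~/box-backup-testdata.log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
2024/09/12 18:30:01 NOTICE: sample notice line
2024/09/12 18:31:07 ERROR : dir/file.bin: Failed to copy: sample error
EOF

errors=$(count_errors "$log")
echo "errors logged: $errors"
rm -f "$log"
```

For a stronger check than reading the log, `rclone check` can compare the source directory against the Box folder once the copy is done.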

Just to make sure everything was working, I logged into Box from my laptop. I could see the new backup-testdata directory being populated with data, so everything appears to be working.

Hope you find this useful.

Thank you Adam Ru for the suggestion to try rclone.

Policy-based Cloud Storage

This is a talk I gave last week at the SF Microservices Meetup titled Policy-based Cloud Storage, Persisting Data in a Multi-Site, Multi-Cloud World. In it I cover Apcera’s approach to storage for containers and how to use policy to manage very large scale application deployments.

Adding a LUKS-encrypted iSCSI volume to Synology DS414 NAS and Ubuntu 15.04

I have an Ubuntu 15.04 “Vivid” workstation already set up with LUKS full disk encryption, and I have a Synology DS414 NAS with 12TB raw storage on my home network. I wanted to add a disk volume on the Synology DS414 that I could mount on the Ubuntu workstation, but NFS doesn’t support “at rest” encrypted file systems, and using EncFS over NFS seemed like the wrong way to go about it, so I decided to try setting up an iSCSI volume and encrypting it with LUKS. Using this type of setup, all data is encrypted both “on the wire” and “at rest”.

Log into the Synology Admin Panel and select Main Menu > Storage Manager:

  • Add an iSCSI LUN
    • Set Thin Provisioning = No
    • Advanced LUN Features = No
    • Make the volume as big as you need
  • Add an iSCSI Target
    • Use CHAP authentication
    • Write down the login name and password you choose

On your Ubuntu box switch over to a root prompt:

sudo /bin/bash

Install the open-iscsi drivers. (Since I’m already running LUKS on my Ubuntu box I don’t need to install LUKS.)

apt-get install open-iscsi

Edit the conf file:

vi /etc/iscsi/iscsid.conf

Edit these lines, uncommenting them if necessary:

node.startup = automatic
node.session.auth.authmethod = CHAP
node.session.auth.username = [CHAP user name on Synology box]
node.session.auth.password = [CHAP password on Synology box]

Restart the open-iscsi service:

service open-iscsi restart
service open-iscsi status

Start open-iscsi at boot time:

systemctl enable open-iscsi

Now find the name of the iSCSI target on the Synology box:

iscsiadm -m discovery -t st -p $SYNOLOGY_IP
iscsiadm -m node

The target name should look something like “iqn.2000-01.com.synology:boxname.target-1.62332311”
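The commands here use $SYNOLOGY_IP and $TARGET_NAME without showing how they get set. One way to fill in the target name, sketched below, is to take the second field of the discovery output, which prints one “address:port,group target-name” line per target. first_target is a hypothetical helper and the IP address in the sample line is made up.

```shell
# Hypothetical helper: print the target name (second field) from the first
# line of `iscsiadm -m discovery` output read on stdin.
first_target() {
    awk 'NR == 1 { print $2 }'
}

# Sample line standing in for real discovery output (made-up IP address).
# On the workstation you would run something like:
#   SYNOLOGY_IP=192.168.1.50
#   TARGET_NAME=$(iscsiadm -m discovery -t st -p "$SYNOLOGY_IP" | first_target)
sample='192.168.1.50:3260,1 iqn.2000-01.com.synology:boxname.target-1.62332311'

TARGET_NAME=$(printf '%s\n' "$sample" | first_target)
echo "$TARGET_NAME"
```

If the NAS exposes more than one target, pick the right line by hand rather than trusting the first one.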

Still on the Ubuntu workstation, log into the iSCSI target:

iscsiadm -m node --targetname "$TARGET_NAME" --portal "$SYNOLOGY_IP:3260" --login

Look for new devices:

fdisk -l

At this point fdisk should show you a new block device which is the iSCSI disk volume on the Synology box. In my case it was /dev/sdd.

Partition the device. I made one big /dev/sdd1 partition, type 8e (Linux LVM):

fdisk /dev/sdd

Set up the device as a LUKS-encrypted device:

cryptsetup --verbose --verify-passphrase luksFormat /dev/sdd1

Open the LUKS volume:

cryptsetup luksOpen /dev/sdd1 backupiscsi

Create a physical volume from the LUKS volume:

pvcreate /dev/mapper/backupiscsi

Add that to a new volume group:

vgcreate ibackup /dev/mapper/backupiscsi

Create a logical volume within the volume group:

lvcreate -L 1800GB -n backupvol /dev/ibackup

Put a file system on the logical volume:

mkfs.ext4 /dev/ibackup/backupvol

Add the logical volume to /etc/fstab to mount it on startup:

# Synology iSCSI target LUN-1
/dev/ibackup/backupvol /mnt/backup ext4 defaults,nofail,nobootwait 0 6

Get the UUID of the iSCSI drive:

ls -l /dev/disk/by-uuid | grep sdd1

Add the UUID to /etc/crypttab to be automatically prompted for the decrypt passphrase when you boot up Ubuntu:

backupiscsi UUID=693568ca-9334-4c19-8b01-881f2247ae0d none luks
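As a sketch of how the two steps above fit together, the crypttab line can be generated from the UUID instead of typed by hand. crypttab_entry is a hypothetical helper; the UUID is the one from my volume, so substitute your own, for example from `blkid -s UUID -o value /dev/sdd1`.

```shell
# Hypothetical helper: build a crypttab line from a mapper name and a UUID.
# "none" means prompt for the passphrase at boot.
crypttab_entry() {
    printf '%s UUID=%s none luks\n' "$1" "$2"
}

# UUID copied from the example above. On the real host you could use:
#   uuid=$(blkid -s UUID -o value /dev/sdd1)
uuid='693568ca-9334-4c19-8b01-881f2247ae0d'

entry=$(crypttab_entry backupiscsi "$uuid")
echo "$entry"
```

Review the printed line before appending it to /etc/crypttab, since a bad entry there can stall the boot.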

If you found this interesting, you might want to check out my article Adding an external encrypted drive with LVM to Ubuntu Linux.

Hope you found this useful.

2014 HPCwire Awards

The StratoStor project I’ve been working on for the past 10 months just got a “Top 5 New Products or Technologies to Watch” award from HPCwire announced at this week’s SuperComputing 2014 (SC14) conference in New Orleans.

HPC = High Performance Computing, HPCwire is a news bureau for all things regarding High Performance Computing, and SC14 is where every major vendor of HPC equipment and products shows off their wares, so getting this bit of recognition from the readers of HPCwire is really nice.

So THANK YOU HPCwire readers, for this award.

https://www.hpcwire.com/2014-hpcwire-readers-choice-awards/23/
