
Copy entire file directories from a Linux host to Box

I had about 2TB of files on a cloud-based Linux host that I needed to back up to cloud storage. I had a Box Enterprise storage account with a 30PB limit on storage and a maximum file size of 150GB, so I decided to try to connect from Linux to Box and store all of the backup data in Box. You can check your own limits at the bottom of the Box “Account Settings” page.

The most difficult part of getting Box to work on a headless, cloud-based Linux host is getting authorization to work. Box wants to use OAuth2 web-based authentication, and I need to set up Box access on a remote host where I’m connecting via ssh and there is no web browser or desktop. The easiest way that I’ve found to do this is to generate an OAuth2 bearer token on my laptop, which rclone prints as a small JSON object, and then copy that to the Linux host.

I used rclone for the backup. I first installed rclone on the Ubuntu Linux host:

sudo apt-get install rclone

Then I installed rclone on my Mac laptop:

brew install rclone

I pulled up a terminal on the Mac and configured rclone so that my Box account was authorized:

rclone authorize box

This will cause a browser window to pop up and ask you to log into Box. Once you’ve logged in and authorized rclone to read and write files in your Box drive the command will finish up and spit out a bearer token:

$ rclone authorize box
2024/09/12 08:57:15 NOTICE: Config file "/Users/eruby/.config/rclone/rclone.conf" not found - using defaults
2024/09/12 08:57:15 NOTICE: If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=qqf7swGNZ8pH4iJksvR3xA
2024/09/12 08:57:15 NOTICE: Log in and authorize rclone for access
2024/09/12 08:57:15 NOTICE: Waiting for code…
2024/09/12 08:57:45 NOTICE: Got code
Paste the following into your remote machine --->
{"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
<---End paste

The bearer token contains an access token, a refresh token, and an expiration date. The access token is good for about an hour. After that it expires and the application (rclone) will use the refresh token to get a new access token. This can keep happening until the user that generated the token (you) is no longer allowed to access Box or your password changes. If you change your password you’ll need to generate a new bearer token. The refresh token may expire before your password changes, depending on the security policy of the organization issuing the refresh token, so at some point you may need to regenerate the bearer token even if you don’t change your password.
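If you want to look inside the token yourself, here is a minimal Python sketch (not part of rclone) that parses the JSON and checks whether the access token has expired. The function names are my own, and it assumes the expiry timestamp has the same shape as the placeholder token printed above.

```python
import json
from datetime import datetime, timezone

# Placeholder token in the shape shown above; these are not real credentials.
TOKEN_JSON = (
    '{"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer",'
    '"refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl",'
    '"expiry":"2024-09-12T10:02:31.314087-07:00"}'
)


def parse_token(raw):
    """Parse the JSON blob and add a timezone-aware 'expiry_dt' field."""
    token = json.loads(raw)
    token["expiry_dt"] = datetime.fromisoformat(token["expiry"])
    return token


def access_token_expired(token, now=None):
    """Return True if the access token's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= token["expiry_dt"]


if __name__ == "__main__":
    token = parse_token(TOKEN_JSON)
    print("expires:", token["expiry_dt"].isoformat())
    print("expired:", access_token_expired(token))
```

An expired access token is not a problem as long as the refresh token is still valid, since rclone uses it to request a new one.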

Now you just have to paste the bearer token into the Linux host’s rclone config, so log into the Linux host and run rclone config. Here’s the entire interaction with the config command:

$ rclone config
2024/09/12 18:19:19 NOTICE: Config file "/home/eruby/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Box
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / 1Fichier
   \ "fichier"
 2 / Alias for an existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
 5 / Backblaze B2
   \ "b2"
 6 / Box
   \ "box"
 7 / Cache a remote
   \ "cache"
 8 / Citrix Sharefile
   \ "sharefile"
 9 / Dropbox
   \ "dropbox"
10 / Encrypt/Decrypt a remote
   \ "crypt"
11 / FTP Connection
   \ "ftp"
12 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
13 / Google Drive
   \ "drive"
14 / Google Photos
   \ "google photos"
15 / Hubic
   \ "hubic"
16 / In memory object storage system.
   \ "memory"
17 / Jottacloud
   \ "jottacloud"
18 / Koofr
   \ "koofr"
19 / Local Disk
   \ "local"
20 / Mail.ru Cloud
   \ "mailru"
21 / Microsoft Azure Blob Storage
   \ "azureblob"
22 / Microsoft OneDrive
   \ "onedrive"
23 / OpenDrive
   \ "opendrive"
24 / OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
25 / Pcloud
   \ "pcloud"
26 / Put.io
   \ "putio"
27 / SSH/SFTP Connection
   \ "sftp"
28 / Sugarsync
   \ "sugarsync"
29 / Transparently chunk/split large files
   \ "chunker"
30 / Union merges the contents of several upstream fs
   \ "union"
31 / Webdav
   \ "webdav"
32 / Yandex Disk
   \ "yandex"
33 / http Connection
   \ "http"
34 / premiumize.me
   \ "premiumizeme"
35 / seafile
   \ "seafile"
Storage> 6
** See help for box backend at: https://rclone.org/box/ **

OAuth Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Box App config.json location
Leave blank normally.

Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.

Enter a string value. Press Enter for the default ("").
box_config_file>
Box App Primary Access Token
Leave blank normally.
Enter a string value. Press Enter for the default ("").
access_token>

Enter a string value. Press Enter for the default ("user").
Choose a number from below, or type in your own value
 1 / Rclone should act on behalf of a user
   \ "user"
 2 / Rclone should act on behalf of a service account
   \ "enterprise"
box_sub_type>
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n
For this to work, you will need rclone available on a machine that has
a web browser available.

For more help and alternate methods see: https://rclone.org/remote_setup/

Execute the following on the machine with the web browser (same rclone
version recommended):

	rclone authorize "box"

Then paste the result below:
result> {"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
--------------------
[Box]
token = {"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
Box                  box

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

To explain:

  • I named the remote connection “Box”.
  • I made the type of remote connection “box” (choice 6 from the list of supported remote storage types).
  • I didn’t edit any options, just answered “n” when asked if I wanted to make edits.
  • When I got to the Use auto config? question I answered “n”.
  • I copied the entire bearer token I got from my Mac into the result> field on the Linux box. (This is the entire contents inside the curly braces “{ … }” that rclone tells you to “Paste the following into your remote machine”.)
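The result of all this is an INI-style section in rclone.conf. Here is a minimal Python sketch for reading that section back, for example to confirm the token was pasted intact. It assumes the section holds a type line plus the token line shown in the transcript, and the sample text and the read_remote name are my own illustration.

```python
import configparser
import json

# Illustrative config text; the token values are the placeholders from above.
SAMPLE_CONF = """[Box]
type = box
token = {"access_token":"nOtaReaLacCessT0k3NsuCkas","token_type":"bearer","refresh_token":"nOtaReaLb3aR3rT0k3NKxRrg2J1rB7DKzKg6svazAlwAwHWKl","expiry":"2024-09-12T10:02:31.314087-07:00"}
"""


def read_remote(conf_text, remote):
    """Return one remote's settings as a dict, with the token decoded from JSON."""
    # interpolation=None so a '%' inside a value is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = dict(parser[remote])
    if "token" in section:
        section["token"] = json.loads(section["token"])
    return section


if __name__ == "__main__":
    remote = read_remote(SAMPLE_CONF, "Box")
    print(remote["type"], remote["token"]["expiry"])
```

On a real host you would pass in the contents of the config file named in the NOTICE line of the transcript instead of SAMPLE_CONF. Keep that file private, since the refresh token in it grants access to your Box account.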

At this point rclone should be configured, so now I just have to copy the data into Box. All of the data is in the /testdata directory, so I ran:

rclone copy --copy-links \
    --create-empty-src-dirs \
    /testdata Box:backup-testdata \
    > ~/box-backup-testdata.log 2>&1 &
  • There are symlinks in the directory, so --copy-links is needed.
  • If there are empty directories on the source, I want to copy them over, so
    --create-empty-src-dirs is needed.
  • /testdata is the root directory that I’m copying over.
  • “Box” is the name I used for the remote connection.
  • backup-testdata is going to be the location of the data in my Box account.
  • Since this is going to take many hours to copy, I redirected all output to a log file and ran the process in the background. I can come back and check the log file later to see if everything worked.
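If you would rather launch the copy from a script than from the shell, the same command can be sketched in Python. This is only an illustration under my own naming (build_copy_command and start_backup are not rclone features). It uses the same two flags as above and sends stdout and stderr to one log file.

```python
import subprocess
from pathlib import Path


def build_copy_command(source, remote, dest_dir):
    """Build the rclone copy command used above as an argument list."""
    return [
        "rclone", "copy",
        "--copy-links",             # follow symlinks and copy what they point to
        "--create-empty-src-dirs",  # recreate empty source directories
        source,
        f"{remote}:{dest_dir}",
    ]


def start_backup(command, log_path):
    """Start the command in the background, sending all output to log_path."""
    log_file = open(Path(log_path).expanduser(), "w")
    return subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    cmd = build_copy_command("/testdata", "Box", "backup-testdata")
    proc = start_backup(cmd, "~/box-backup-testdata.log")
    print("started rclone with PID", proc.pid)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.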

Just to make sure everything was working I logged into Box from my laptop, and I could see the new backup-testdata directory being populated with data, so everything appears to be working.
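When the copy finishes, the log file is the other place to look. Here is a rough Python sketch that counts log lines by level and collects any errors. It assumes every line starts with the timestamp and level layout seen in the NOTICE lines above. The ERROR line in the sample is hypothetical and written by me, not taken from a real run.

```python
import re
from collections import Counter

# Timestamp, level, then message, e.g. "2024/09/12 08:57:15 NOTICE: Got code".
# Optional spaces before the colon are tolerated in case levels are padded.
LOG_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s*: ?(.*)$")

# Hypothetical sample lines for illustration only.
SAMPLE = [
    "2024/09/12 18:30:00 NOTICE: hypothetical notice line",
    "2024/09/12 18:31:00 ERROR : some/file.bin: hypothetical failure",
    "a line without a timestamp is skipped",
]


def summarize_log(lines):
    """Return (count of lines per level, list of ERROR messages)."""
    levels = Counter()
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue
        _, level, message = match.groups()
        levels[level] += 1
        if level == "ERROR":
            errors.append(message)
    return levels, errors


if __name__ == "__main__":
    levels, errors = summarize_log(SAMPLE)
    print(dict(levels))
    for message in errors:
        print("error:", message)
```

To check a real run, pass in the open log file (~/box-backup-testdata.log in my case) instead of SAMPLE.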

Hope you find this useful.

Thank you Adam Ru for the suggestion to try rclone.


AI Without GPUs: Using Your Existing CPU Resources to Run AI Workloads

This is a talk Keith Bradley and I gave at VMware Explore 2024 Las Vegas called AI Without GPUs: Using Your Existing CPU Resources to Run AI Workloads. Keith Bradley is the Vice President of IT and Security at Nature Fresh Farms. Nature Fresh Farms uses AI to control every aspect of their agricultural operations and they’re using CPUs to process those AI workloads.

Graphics processing units (GPUs) are expensive, hard to acquire and extremely powerful, but there are many AI/ML applications that can run just fine without GPUs. This session covers how to use your existing central processing unit (CPU) resources to run AI workloads, what you can do, what you shouldn’t do and what types of problems you can solve without using any GPUs at all.

Covered in this talk:

  • Nature Fresh Farms use case
  • The AI/ML software stack
  • Introduction to Intel Xeon 4th Gen CPUs w/ AMX
  • Requirements for AMX on vSphere 8
  • Getting started with oneAPI and OpenVINO
  • Demo of OpenVINO for computer vision
  • Demo of LLM optimized for AMX
  • Demo of tuned LLM on AMX
  • Takeaways

I hope you enjoy the talk.


Fix bouncing mail from a GNU Mailman server on Dreamhost

GNU Mailman is free software for managing electronic mail discussion and e-newsletter lists. I started using it back in 1998 for managing internal email lists at a company I worked for. I’ve used it many times over the years, but stopped when email lists fell out of fashion. I liked it because it’s pretty easy to set up an actual discussion list, where replies go to the list (not the sender), which results in actual discussion.

I recently set one up again, using my Dreamhost account and their automated web panel to deploy a discussion list for a volunteer group I manage. I was having a problem though: some of the people on the list weren’t getting all of the mail.

Mostly it was anyone with an @gmail.com mailing address. The odd part was that they were getting some messages, just not all messages. I had people check their spam folders but that wasn’t it.

Since messages weren’t ending up in spam folders, that usually means that either (a) the recipient’s email server is bouncing (refusing) the message or (b) something is wrong with Mailman’s settings.

I did some Googling today and found that many other people were reporting similar problems, but no one had a good solution other than to turn on bounced message troubleshooting, so I did that.

I logged into the list’s mailing list administration page, selected the “Bounce processing” setup option, and made sure that all notifications were turned ON.

After I did that I sent a message to the mailing list. Almost immediately I got back a bounce message from sbcglobal.net:

<listsubscriber@sbcglobal.net>: host ff-ip4-mx-vip1.prodigy.net[144.160.159.21]
    said: 553 5.3.0 flpd577 DNSBL:RBL 521< [Mailman's IP] >_is_blocked.For
    assistance forward this error to abuse_rbl@abuse-att.net (in reply to MAIL FROM command)

DNSBL:RBL is a realtime DNS blacklist designed to block spam. I went to DNSBL.info and checked my Mailman server’s IP address. It wasn’t listed.
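For the curious, a DNS blacklist lookup is just a DNS query: the IP’s octets are reversed, the blacklist’s zone name is appended, and the resulting name resolves only if the IP is listed. Here is a minimal Python sketch of that idea. The zone dnsbl.example.org and the address 192.0.2.10 are placeholders, not a real blacklist or my server.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an entry for this IP.

    A failed lookup is treated as "not listed", so a DNS outage would
    also show up as False here.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder values: swap in a real blacklist zone and your server's IP.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Web tools like the one I used run this same kind of query against many blacklists at once.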

Next I went to check the DNS SPF record for the mailing list’s domain name. I had assumed that since I’d used Dreamhost’s web panel to install the Mailman service that Dreamhost would automatically take care of the SPF record.

I was wrong, there was no SPF record.

Well that explains a lot.

When a mail server (technically a “mail exchanger” or “MX” server) receives mail from another mail server one of the things that it will do is ask two questions:

  • What domain did this email come from?
  • Is the server that sent this mail allowed to send mail for that domain?

The second question is answered by an SPF record. The receiving mail server looks up the DNS SPF record for the domain that sent the mail. If the SPF record says that the server sending the mail is allowed to send mail for the domain, the SPF check passes and all is well. If the SPF record doesn’t exist, or doesn’t list the server that the mail came from, the check doesn’t pass and the receiving server may bounce the mail or mark it as spam.
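To make that check concrete, here is a deliberately simplified Python sketch of what the receiving server does. It only understands ip4, ip6, and all terms and does no DNS lookups, so terms such as a, mx, and include are skipped. The record and addresses in it are made-up examples. In SPF terms a missing record gives the result “none” rather than “fail”, and it is then up to the receiving server to decide how to treat the mail.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate a simplified SPF record against the sending server's IP."""
    terms = (record or "").split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record published
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a term with no prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
        # Anything else (a, mx, include, ...) needs DNS and is skipped here.
    return "neutral"  # nothing matched and there was no "all" term


if __name__ == "__main__":
    example = "v=spf1 ip4:192.0.2.0/24 ~all"  # made-up record
    print(check_spf(example, "192.0.2.25"))
    print(check_spf(example, "198.51.100.7"))
```

The first term that matches decides the result, which is why the all term goes at the end of a record.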

Dreamhost installs Mailman on a subdomain. My Mailman subdomain name didn’t have an SPF record. I was somewhat surprised that any mail was getting through. Usually a missing SPF record will cause all mail coming from a domain to be bounced.

So I added an SPF record for my subdomain. In my case I allow-listed the following:

  • Any IP with an A record for my subdomain. The mailing list is on a subdomain with one A record that points to the VM running the Mailman server.
  • Any IP with an MX record for my subdomain, so any assigned mail exchangers.
  • netblocks.dreamhost.com and relay.mailchannels.net – Suggested by Dreamhost tech support. I’m guessing “all netblocks assigned to Dreamhost” and “all mail relays operated by Dreamhost.”

The subdomain’s DNS entry is a type “TXT” record with the contents:

"v=spf1 a mx include:netblocks.dreamhost.com include:relay.mailchannels.net ~all"

The ~all at the end says that mail sent from my domain by a server that isn’t in the list will “soft fail” the SPF test, which most mail exchange servers interpret as “mark it as spam if it doesn’t come from one of the listed hosts.” If you want the MX server to “hard fail” (bounce) the message, use -all instead.

I tend to use soft fail just in case the list subscriber’s server is misconfigured or there’s some other failure. In that case the MX server will send list messages to spam (so the list subscriber will still see it) rather than bounce the message.
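If it helps to see the record taken apart, here is a small Python sketch that splits my SPF record into its terms and names the result each one produces when it matches. The describe_spf function is my own illustration, and it only parses the text without doing any DNS lookups.

```python
QUALIFIER_NAMES = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# The record from this post, as a single string.
RECORD = "v=spf1 a mx include:netblocks.dreamhost.com include:relay.mailchannels.net ~all"


def describe_spf(record):
    """Return a list of (result_if_matched, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    described = []
    for term in terms[1:]:
        qualifier = "+"  # no prefix means "pass"
        if term[0] in QUALIFIER_NAMES:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        described.append((QUALIFIER_NAMES[qualifier], mechanism, value or None))
    return described


if __name__ == "__main__":
    for result, mechanism, value in describe_spf(RECORD):
        print(f"{mechanism:8} {value or '':28} -> {result}")
```

Changing ~all to -all in RECORD turns the last entry from softfail into fail, which is the whole difference between the two policies.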

If you need to set this up for yourself make sure that you list all hosts that send mail for your domain. There are a number of web tools available to help you create an SPF record with the correct parameters, just Google “create an spf record” and you’ll find half a dozen.

Hope you find this useful.


AI without GPUs: Using Intel AMX CPUs on VMware vSphere with Tanzu Kubernetes

I was invited to AI Field Day 4 in Santa Clara last week to present a couple of talks on running AI workloads on Intel AMX CPUs. This is a recording of the talk I did on setting up Tanzu Kubernetes for running workloads that use Intel AMX CPUs.

Hope you find this useful.