post

Fix bouncing mail from a GNU Mailman server on Dreamhost

GNU Mailman is free software for managing electronic mail discussion and e-newsletter lists. I started using it back in 1998 for managing internal email lists at a company I worked for. I’ve used it many times over the years, but stopped when email lists fell out of fashion. I liked it because it’s pretty easy to set up an actual discussion list, where replies go to the list (not the sender), which results in actual discussion.

I recently set one up again, using my Dreamhost account and their automated web panel to deploy a discussion list for a volunteer group I manage. I was having a problem, though: some of the people on the list weren’t getting all of the mail.

Mostly it was anyone with an @gmail.com mailing address. The odd part was that they were getting some messages, just not all messages. I had people check their spam folders but that wasn’t it.

Since messages weren’t ending up in spam folders, that usually means either (a) the recipient’s email server is bouncing (refusing) the message or (b) something is wrong with Mailman’s settings.

I did some Googling today and found that many other people were reporting similar problems, but no one had a good solution other than to turn on bounced message troubleshooting, so I did that.

I logged into the list’s mailing list administration page, selected the “Bounce processing” setup option, and made sure that all notifications were turned ON.

After I did that I sent a message to the mailing list. Almost immediately I got back a bounce message from sbcglobal.net:

<listsubscriber@sbcglobal.net>: host ff-ip4-mx-vip1.prodigy.net[144.160.159.21]
    said: 553 5.3.0 flpd577 DNSBL:RBL 521< [Mailman's IP] >_is_blocked.For
    assistance forward this error to abuse_rbl@abuse-att.net (in reply to MAIL FROM command)

DNSBL:RBL is a realtime DNS blacklist designed to block spam. I went to DNSBL.info and checked my Mailman server’s IP address. It wasn’t listed.
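For anyone curious what a blacklist check does under the hood: a DNSBL lookup is an ordinary DNS query for the IP address with its octets reversed, placed under the blacklist's zone. The sketch below only builds that query name. The zone dnsbl.example.org is a placeholder, not a real list, and 192.0.2.10 is a documentation address standing in for the Mailman server's IP.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that a DNSBL lookup queries for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Placeholder values: a documentation IP and a made-up blacklist zone.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

If that name resolves (typically to an address in 127.0.0.0/8) the IP is listed. If the lookup comes back as a nonexistent domain, it isn't.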

Next I went to check the DNS SPF record for the mailing list’s domain name. I had assumed that since I’d used Dreamhost’s web panel to install the Mailman service that Dreamhost would automatically take care of the SPF record.

I was wrong, there was no SPF record.

Well that explains a lot.

When a mail server (technically a “mail exchanger” or “MX” server) receives mail from another mail server one of the things that it will do is ask two questions:

  • What domain did this email come from?
  • Is the server that sent this mail allowed to send mail for that domain?

The second question is answered by an SPF record. The receiving mail server looks up the SPF record (a DNS TXT record) for the domain that sent the mail. If the record says that the sending server is allowed to send mail for the domain, the SPF check passes and all is well. If the record doesn’t list the server that the mail came from, the check fails. If there’s no SPF record at all, strictly speaking the result is “none” rather than a failure, but many receiving servers treat that mail with suspicion. Either way, what happens next (deliver, mark as spam, or bounce) is up to the receiving server’s policy.
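Here is a rough sketch of that decision from the receiving server's side. It's a toy, not a real SPF implementation: it understands only ip4: mechanisms and the final all term, whereas real SPF (RFC 7208) also resolves a, mx, and include through DNS. The function name, records, and addresses are made up for illustration. Note that a missing record produces the result "none" rather than a failure; what the receiver does with each result is its own policy.

```python
import ipaddress
from typing import Optional

# Qualifier prefixes and the SPF result each one produces on a match.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: Optional[str], sender_ip: str) -> str:
    """Toy SPF check: understands only ip4: mechanisms and all."""
    if record is None:
        return "none"  # the domain publishes no SPF record
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # not an SPF version 1 record
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a bare mechanism implies "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        elif term == "all":
            return QUALIFIERS[qualifier]  # all matches every sender
    return "neutral"  # nothing matched and there was no all term


print(check_spf("v=spf1 ip4:192.0.2.0/24 ~all", "192.0.2.10"))
print(check_spf("v=spf1 ip4:192.0.2.0/24 ~all", "198.51.100.7"))
print(check_spf(None, "192.0.2.10"))
```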

Dreamhost installs Mailman on a subdomain. My Mailman subdomain name didn’t have an SPF record. I was somewhat surprised that any mail was getting through. Usually I’d expect a missing SPF record to cause all mail coming from a domain to be bounced.

So I added an SPF record for my subdomain. In my case I allow-listed the following:

  • Any IP with an A record for my subdomain. The mailing list is on a subdomain with one A record that points to the VM running the Mailman server.
  • Any IP with an MX record for my subdomain, so any assigned mail exchangers.
  • netblocks.dreamhost.com and relay.mailchannels.net – Suggested by Dreamhost tech support. I’m guessing “all netblocks assigned to Dreamhost” and “all mail relays operated by Dreamhost.”

The subdomain’s DNS entry is a type “TXT” record with the contents:

"v=spf1 a mx include:netblocks.dreamhost.com include:relay.mailchannels.net ~all"

The ~all at the end says that mail sent from my domain by a server that isn’t in the list will “soft fail” the SPF test, which most mail exchange servers interpret as “accept it, but mark it as spam.” If you want the MX server to “hard fail” (bounce) the message, use -all instead.
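To make the qualifiers concrete, here is a small sketch that splits the record into its terms and labels each one with the result it produces when it matches. It does string handling only, with no DNS lookups, and the function name is just for illustration.

```python
from typing import List, Tuple

# Qualifier prefixes and the SPF result each one produces on a match.
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

RECORD = (
    "v=spf1 a mx include:netblocks.dreamhost.com "
    "include:relay.mailchannels.net ~all"
)


def describe_spf(record: str) -> List[Tuple[str, str]]:
    """Return (mechanism, result-if-matched) pairs for an SPF record."""
    version, *terms = record.split()
    if version != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    described = []
    for term in terms:
        # A mechanism with no prefix has an implicit "+" (pass) qualifier.
        qualifier = term[0] if term[0] in QUALIFIER_RESULTS else "+"
        mechanism = term.lstrip("+-~?")
        described.append((mechanism, QUALIFIER_RESULTS[qualifier]))
    return described


for mechanism, result in describe_spf(RECORD):
    print(f"{mechanism:40} -> {result}")
```

Swapping ~all for -all in RECORD changes only the last line of the output, from softfail to fail.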

I tend to use soft fail just in case the list subscriber’s server is misconfigured or there’s some other failure. In that case the MX server will send list messages to spam (so the list subscriber will still see it) rather than bounce the message.

If you need to set this up for yourself, make sure that you list all hosts that send mail for your domain. There are a number of web tools available to help you create an SPF record with the correct parameters. Just Google “create an spf record” and you’ll find half a dozen.

Hope you find this useful.

post

AI without GPUs: Using Intel AMX CPUs on VMware vSphere with Tanzu Kubernetes

I was invited to AI Field Day 4 in Santa Clara last week to present a couple of talks on running AI workloads on Intel AMX CPUs. This is a recording of the talk I did on setting up Tanzu Kubernetes for running workloads that use Intel AMX CPUs.

Hope you find this useful.

post

Upgrading vCenter 7 via the command line

Updated on 2021-10-26.

I have vCenter 7.0.0.10700 installed and I want to update to 7.0.1.00200. When I run Update Planner > Interoperability it reports that all of my ESXi hosts are running ESXi 7.0.1. If I run the pre-update checks I get “No issues found”. When I go to the appliance to do the upgrade, both “Stage Only” and “Stage and Install” are greyed-out and unselectable.

vCenter 7 Appliance Available Updates screen

I tried a dozen different tricks, including ssh-ing into the appliance as root and editing the /etc/applmgmt/appliance/software_update_state.conf file, but nothing could enable the “Stage Only” and “Stage and Install” buttons.

Use the command line

I finally decided to try upgrading via the command line. I have backups going back 30 days. I even double-checked and yes, my NFS server has files in the backup directory for each of the past 30 days and they have data in them. There’s probably even a way to restore one of those backups if something goes horribly wrong. Onwards!

I was already logged into the vCenter appliance shell as root. The next thing I needed to do was to figure out where the command line tools were hidden. I found them in /usr/lib/applmgmt/support/scripts.

Disclaimer: I work at VMware, but I have no idea if the following is an “acceptable practice” or not. If your production vCenter is broken and you have a support contract, call support. If you’re messing around on a home or test system and you don’t care how badly you screw it up, feel free to try the command line tools.

root@vcenter [ ~ ]# cd /usr/lib/applmgmt/support/scripts
root@vcenter [ /usr/lib/applmgmt/support/scripts ]# ls -al
total 108
drwxr-xr-x 4 root root  4096 Aug 30 18:18 .
drwxr-xr-x 4 root root  4096 Aug 30 18:18 ..
-r-xr-xr-x 1 root root   205 Aug 15 07:16 autogrow.sh
-r-xr-xr-x 1 root root   633 Aug 15 07:16 manifest-verification
-r-xr-xr-x 1 root root   286 Aug 15 07:16 mapping.sh
-r-xr-xr-x 1 root root  2056 Aug 15 07:16 pgtop.py
-r-xr-xr-x 1 root root  3396 Aug 15 07:16 port-accessible.py
drwxr-xr-x 2 root root  4096 Aug 30 18:18 postinstallscripts
-r-xr-xr-x 1 root root  5207 Aug 15 07:16 prestart-applmgmt.sh
-r-xr-xr-x 1 root root  4171 Aug 15 07:16 resize-root.py
-r-xr-xr-x 1 root root   251 Aug 15 07:16 setup-env.sh
-r-xr-xr-x 1 root root  4001 Aug 15 07:16 showlog.py
-r-xr-xr-x 1 root root  3910 Aug 15 07:16 shutdown.py
-r-xr-xr-x 1 root root 35773 Aug 15 07:16 software-packages.py
-r-xr-xr-x 1 root root  8085 Aug 15 07:16 support-bundle.py
drwxr-xr-x 2 root root  4096 Aug 30 18:18 tests

These are the Python scripts that are linked to the Command shell. I’m actually in the root shell. I can run these directly from the root shell, or exit back to the Command shell and use them in the “official” way. In case I need to pull in support let’s do this the official way.

The software-packages.py script is what does the upgrade. Let’s exit back to the Command shell and see what it says it supports.

root@vcenter [ /usr/lib/applmgmt/support/scripts ]# exit
Command> software-packages
usage: software-packages [-h] {stage,unstage,validate,install,list} ...

optional arguments:
  -h, --help            show this help message and exit

sub-commands:
  {stage,unstage,validate,install,list}
    stage               Stage software update packages
    unstage             Purge staged software update packages
    validate            Validate software update packages
    install             Install software update packages
    list                List details of software update packages

Stage the packages for the update

Since the appliance wasn’t letting me upgrade, I thought I’d first check to see if I already have upgrades staged.

Command> software-packages list --staged
 [2021-01-22T21:45:41.022] : Packages not staged

OK. Nothing staged. How do I stage packages?

Command> software-packages stage --help
usage: software-packages stage [-h] [--url [URL]] [--iso] [--acceptEulas] [--thirdParty]

optional arguments:
  -h, --help     show this help message and exit
  --url [URL]    Download software update package from URL. If no url is specified, https://vapp-updates.vmware.com/vai-
                 catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/6.7.0.42000.latest/ is used.
  --iso          Load software update packages from CD/DVD drive attached to the appliance
  --acceptEulas  accept all Eulas
  --thirdParty   Stage third party packages.--thirdParty should only be usedwith --url.

Sounds clear enough. I’ll try that:

Command> software-packages stage --url --acceptEulas
 [2021-01-22T21:46:28.022] : Latest updates already installed on VCSA, Nothing to stage

Well that’s not correct. There’s definitely an update available. Re-reading help again I notice that the default URL looks something like:

https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/6.7.0.42000.latest/

I’ve obfuscated the actual URL, but that’s a vCenter 6.7.0 URL. I’m using 7.0.0 and I want 7.0.1.

I go back to the appliance web UI and click the Update > Settings button.

vCenter 7 Appliance Update screen

Settings shows a different URL for 7.0.1, so I copy and paste that into the command line:

Command> software-packages stage --acceptEulas --url https://vapp-updates.vmware.com/vai-catalog/valm/vmw/......
 [2021-01-22T21:48:28.022] : Target VCSA version = 7.0.1.00200
 [2021-01-22 21:48:28,781] : Running requirements script.....

Update as of 2021-09-21: I just found out about the update.get and update.set commands, used to find and set the default URL used for downloading updates on the command line.

If you type:

update.get

… you’ll get the Currenturl (set when you first installed vCenter) and the Defaulturl (what you should be using to update vCenter). If you then type:

update.set --currentURL default

The Currenturl gets set to the Defaulturl. After that you can type:

software-packages stage --url --acceptEulas

… and the software gets staged from the Currenturl, which is the same URL used by the vCenter GUI.

Installing a specific version of vCenter

Update as of 2021-10-26: The steps shown above are fine if you want to stage the latest update, but what if you want a specific version of vCenter, not the latest?

Right now I’ve got a vCenter 7.0.2.00500 and there are two updates available, 7.0.3.00000 and 7.0.3.00100. If I run update.get:

Command> update.get
Config:
Currenturl: https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.2.00500.latest/
Defaulturl: https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.2.00500.latest/
Checkupdates: disabled
Time: 00:00:00
Day: Everyday
Latestupdateinstalltime: 2021-09-23T00:03:48.493Z
Latestupdatequerytime: ''
Username: ''
Password: ''

(License number obfuscated in the above URLs, use your own.)
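If you ever want to script the comparison, the update.get output is simple enough to parse. This is a minimal sketch that assumes the "Key: value" layout shown above; whether that layout stays the same across releases is an assumption. The function name is hypothetical and the sample text is an abbreviated copy of the output above.

```python
SAMPLE = """\
Config:
Currenturl: https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.2.00500.latest/
Defaulturl: https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.2.00500.latest/
Checkupdates: disabled
Time: 00:00:00
"""


def parse_update_get(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, skipping headers like 'Config:'."""
    settings = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and times stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


settings = parse_update_get(SAMPLE)
if settings["Currenturl"] == settings["Defaulturl"]:
    print("Currenturl already matches Defaulturl")
else:
    print("Currenturl differs; consider: update.set --currentURL default")
```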

Note the “.latest” at the end of the URLs. If I use that URL for staging, but change the version to the specific version that I want (without the .latest extension):

software-packages stage --url https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8d167796-34d5-4899-be0a-6daade4005a3/7.0.3.00000/

I’ve just staged 7.0.3.00000 for install, and that’s the version that will be installed, even though there’s a later 7.0.3.00100 version available.
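The URL rewrite is mechanical, so it can be sketched in a few lines. This assumes the URL always ends in "<version>.latest/" as in the update.get output above. The function name is hypothetical and the GUID is the obfuscated placeholder, not a real one.

```python
import re


def pin_version(latest_url: str, version: str) -> str:
    """Swap the trailing '<version>.latest/' segment for a specific version."""
    pinned, count = re.subn(
        r"/[0-9.]+\.latest/?$",
        lambda match: f"/{version}/",  # callable avoids escape surprises
        latest_url,
    )
    if count != 1:
        raise ValueError("URL does not end in '<version>.latest/'")
    return pinned


# Obfuscated placeholder URL; substitute the one update.get reports for you.
LATEST = (
    "https://vapp-updates.vmware.com/vai-catalog/valm/vmw/"
    "8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.2.00500.latest/"
)
print(pin_version(LATEST, "7.0.3.00000"))
```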

Trust but verify

A little while later everything was staged. I decided to validate everything.

Command> software-packages validate
 [2021-01-22T21:50:11.022] : For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
Enter Single Sign-On administrator password:

 [2021-01-22T21:50:22.022] : Validating software update payload
 [2021-01-22 21:50:22,327] : Running validate script.....
 [2021-01-22T21:50:26.022] : Validation successful
 [2021-01-22T21:50:26.022] : Validation process completed successfully

Then I check to see what’s staged:

Command> software-packages list --staged
 [2021-01-22T21:50:45.022] :
        category: Bugfix
        kb: https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-vcenter-server-70u1c-release-notes.html
        leaf_services: ['vmware-pod', 'vsphere-ui', 'wcp']
        vendor: VMware, Inc.
        name: VC-7.0U1c
        size in MB: 5107
        tags: []
        version_supported: []
        productname: VMware vCenter Server
        releasedate: December 17, 2020
        executeurl: https://my.vmware.com/group/vmware/get-download?downloadGroup=VC70U1C
        version: 7.0.1.00200
        updateversion: True
        allowedSourceVersions: [7.0.0.0,]
        buildnumber: 17327517
        rebootrequired: False
        summary: {'id': 'patch.summary', 'translatable': 'In-place upgrade for vCenter appliances.', 'localized': 'In-place upgrade for vCenter appliances.'}
        type: Update
        severity: Critical
        TPP_ISO: False
        url: https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/7.0.0.10700.latest/
        thirdPartyAvailable: False
        nonThirdPartyAvailable: True
        thirdPartyInstallation: False
        timeToInstall: 0
        requiredDiskSpace: {'/storage/core': 30.353511543273928, '/storage/seat': 32.21015625}
        eulaAcceptTime: 2021-01-22 21:48:37 UTC

Well, that shows:

version: 7.0.1.00200

Which is the version I’ve been trying to upgrade to, so that looks good.
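Eyeballing the output works fine for one vCenter. For several, a check like this could be scripted. The sketch below assumes the indented "key: value" layout shown above and pulls out the version and the requiredDiskSpace field. The function name is hypothetical, the sample is a shortened copy of the real output, and the units of the disk-space numbers are not stated in the output, so the code makes no claim about them.

```python
import ast

SAMPLE = """\
 [2021-01-22T21:50:45.022] :
        category: Bugfix
        name: VC-7.0U1c
        size in MB: 5107
        version: 7.0.1.00200
        rebootrequired: False
        requiredDiskSpace: {'/storage/core': 30.353511543273928, '/storage/seat': 32.21015625}
"""


def parse_staged(text: str) -> dict:
    """Parse 'key: value' lines from 'software-packages list --staged'."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' so values containing colons stay whole.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


fields = parse_staged(SAMPLE)
wanted = "7.0.1.00200"
print("staged version:", fields["version"], "(matches)" if fields["version"] == wanted else "(MISMATCH)")

# The disk-space field is printed as a Python-style dict literal.
for mount, amount in ast.literal_eval(fields["requiredDiskSpace"]).items():
    print(f"{mount}: {amount:.1f} required")
```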

Did I mention that I have backup copies of vCenter going back 30 days? Well I do. If this goes really sideways I’m going to have to restore one of them.

Let’s do the update!

Command> software-packages install --staged
 [2021-01-22T21:51:23.022] : For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
Enter Single Sign-On administrator password:

 [2021-01-22T21:51:43.022] : Validating software update payload
 [2021-01-22 21:51:43,716] : Running validate script.....
 [2021-01-22T21:51:47.022] : Validation successful
 [2021-01-22 21:51:47,730] : Copying software packages 251/251
 [2021-01-22 21:55:37,642] : Running system-prepare script.....
 [2021-01-22 21:55:42,661] : Running test transaction ....
 [2021-01-22 21:55:44,678] : Running prepatch script...
....
 [2021-01-22 21:58:27,896] : Upgrading software packages ....
 [2021-01-22T22:02:10.022] : Setting appliance version to 7.0.1.00200 build 17327517
 [2021-01-22 22:02:10,242] : Running patch script.....
 [2021-01-22 22:11:34,245] : Starting all services ....
 [2021-01-22T22:11:35.022] : Services started.
 [2021-01-22T22:11:35.022] : Installation process completed successfully

That was it. The actual update took about 20 minutes, and although the UI said no reboot was necessary vCenter did reboot during the update. When it was done vCenter was running version 7.0.1.00200.
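The "about 20 minutes" figure comes straight from the log timestamps. As a small worked example, this sketch parses the first and last lines of the install log above and subtracts them. The log mixes two timestamp styles (a "T" separator and a space separator), so the pattern accepts both and ignores the fractional part.

```python
import re
from datetime import datetime, timedelta

# Matches both "[2021-01-22T21:51:43.022]" and "[2021-01-22 21:51:43,716]".
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})")


def log_time(line: str) -> datetime:
    """Extract the timestamp (to the second) from one log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(" ".join(match.groups()), "%Y-%m-%d %H:%M:%S")


def elapsed(first_line: str, last_line: str) -> timedelta:
    """Time between two log lines."""
    return log_time(last_line) - log_time(first_line)


START = " [2021-01-22T21:51:43.022] : Validating software update payload"
END = " [2021-01-22T22:11:35.022] : Installation process completed successfully"
print(elapsed(START, END))
```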

The vCenter appliance Update “Stage Only” and “Stage and Install” buttons are still greyed-out and unselectable, but right now there are no updates available so that’s how they should be. I’ll have to wait for the next update to see if they’re working again. If the buttons are still broken, at least now I know how to use the command line to install an update.

Hope you find this useful.

“Package discrepency error, Cannot resume!”

Update as of 2021-06-30: I have successfully upgraded a couple of times since I wrote this article using the GUI and the “Stage Only” and “Stage and Install” buttons are no longer greyed out when an update is available.

I did run into an issue upgrading from 7.0.2.00000 to 7.0.2.00100, where I got the error “Package discrepency error, Cannot resume!” [sic] when I tried to stage the update. I hit the same error when upgrading from 7.0.2.00100 to 7.0.2.00200. Both times I resolved the error and got the upgrades to install by following the steps in William Lam’s article “Stage Only & Stage and Install buttons disabled when updating to vSphere 7.0 Update 2a”. According to William these steps will need to be repeated until 7.0.3 is released:

Command> shell
rm -rf /storage/core/software-update/updates
rm -rf /storage/updatemgr/software-*
rm /etc/applmgmt/appliance/software_update_state.conf
rm /storage/db/patching.db*
rm -r /storage/core/software-update/*

Update as of 2021-10-26: I tried the UI today to upgrade from vCenter 7.0.2.00500 to 7.0.3.00000, and the UI still failed, so I used the command line to upgrade to 7.0.3.00000.

Once 7.0.3.00000 was installed I was able to upgrade to 7.0.3.00100 using the UI, so it looks like the UI problem has been resolved in 7.0.3 as William said it would be.

“Test transaction failed to update packages”

Update as of 2021-09-21: I was upgrading a couple of vCenter instances today to the latest 7.0.2.00500 release and on one vCenter I got the error:

 [2021-09-21T17:35:56.264] : Validating software update payload
 [2021-09-21T17:35:56.264] : UpdateInfo: Using product version 7.0.2.00200 and build 17958471
 [2021-09-21 17:35:56,064] : Running validate script.....
 [2021-09-21T17:36:00.264] : Validation successful
 [2021-09-21 17:36:00,084] : Copying software packages 152/152
 [2021-09-21 17:55:01,033] : Running system-prepare script.....
 [2021-09-21 17:55:06,053] : Running test transaction ....
 [2021-09-21T17:55:07.264] : Installation process failed
 [2021-09-21T17:55:07.264] : Test transaction failed to update packages

“Test transaction failed to update packages” means something failed with the package install, so I read through /var/log/vmware/applmgmt/software-packages.log and looked for lines with ERR in them. It turned out that I had run out of space in /storage/log. Once I freed up some space I re-ran the update and it installed fine.
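The same two checks (find the ERR lines, then look at free space) can be sketched in Python. The sample log lines below are invented for illustration and do not reproduce the real format of software-packages.log. Both function names are hypothetical. On the appliance you would point free_mib at /storage/log; the demo uses the current directory so it runs anywhere.

```python
import shutil
from typing import List


def error_lines(log_text: str) -> List[str]:
    """Return the lines containing 'ERR' (this also catches 'ERROR')."""
    return [line for line in log_text.splitlines() if "ERR" in line]


def free_mib(path: str) -> int:
    """Free space on the filesystem holding path, in whole MiB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


# Invented sample lines; the real log's layout may differ.
SAMPLE_LOG = """\
2021-09-21 17:55:06 INFO Running test transaction
2021-09-21 17:55:07 ERROR No space left on device
2021-09-21 17:55:07 INFO Installation process failed
"""

for line in error_lines(SAMPLE_LOG):
    print(line)
print("free MiB here:", free_mib("."))
```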

post

Updating the vCenter appliance root password

If you’re like me, you rarely ssh into your vCenter appliance as “root”. However, the time comes when you need to update vCenter. You run the “Pre-Update Checks” and, because you never log into the appliance, you get the message that your root password needs to be updated before you can install the update.

So… log into the vCenter Server Management Interface (https://your-vcenter:5480), click Access and then Edit. Make sure that SSH Login, DCLI, Console CLI, and BASH access are all enabled. Set the BASH timeout to 15 minutes so it gets disabled automatically when you’re done.

Once you’ve done that, ssh to the appliance.

$ ssh root@vcenter.labs.earlruby.org

VMware vCenter Server 7.0.0.10700

Type: vCenter Server with an embedded Platform Services Controller

Received disconnect from 192.168.200.11 port 22:2: Too many authentication failures
Disconnected from 192.168.200.11 port 22

Did you get a “Received disconnect … Too many authentication failures” message? Don’t worry, no one is hacking into your vCenter, it’s just that you have more than one ssh key on your keyring and for some reason someone at VMware thought that it would be a great idea to set the vCenter ssh setting MaxAuthTries = 2. Your first ssh key counts as one try, your second ssh key counts as attempt number 2, and… you’re done. vCenter won’t let you log in.
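A simplified model of that counting, as a sketch: it assumes every key the client offers is rejected and that each rejection counts as one failed attempt, with the server disconnecting once failures reach MaxAuthTries. It illustrates the behavior described above and is not a description of the sshd source. The function name is hypothetical.

```python
def reaches_password_prompt(keys_offered: int, max_auth_tries: int) -> bool:
    """Model: does the client still get to try a password after its keys fail?"""
    failures = 0
    for _ in range(keys_offered):
        failures += 1  # each rejected key is one failed attempt
        if failures >= max_auth_tries:
            return False  # server disconnects: too many authentication failures
    return True


for keys in range(4):
    verdict = "password prompt" if reaches_password_prompt(keys, 2) else "disconnected"
    print(f"{keys} key(s) offered, MaxAuthTries=2 -> {verdict}")
```

In this model, offering no keys at all always reaches the password prompt.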

To bypass public key authentication checks entirely use the -o PubkeyAuthentication=no parameter for ssh:

$ ssh -o PubkeyAuthentication=no root@vcenter.labs.earlruby.org

VMware vCenter Server 7.0.0.10700

Type: vCenter Server with an embedded Platform Services Controller

root@vcenter.labs.earlruby.org's password:
Connected to service

    * List APIs: "help api list"
    * List Plugins: "help pi list"
    * Launch BASH: "shell"

Command>

Now get to the bash shell by typing shell, then run passwd to set the new root password:

Command> shell
Shell access is granted to root
root@vcenter [ ~ ]# passwd
New password:
Retype new password:
passwd: password updated successfully
root@vcenter [ ~ ]# exit
Command> exit
Connection to vcenter.labs.earlruby.org closed.

Before you log out, run the Pre-Update Check again to verify that vCenter sees that the password has been updated. This time you should get the message “No issues found. Pre-update checks have passed.”

Hope you find this useful.