AI without GPUs: Accessing Sapphire Rapids AMX instructions on vSphere

Full disclosure: I used to work for a startup called Bitfusion, and that startup was bought by VMware, so I now work for VMware. At Bitfusion we developed a technology for accessing hardware accelerators, such as NVIDIA GPUs, remotely across networks using TCP/IP, Infiniband, and PVRDMA. Although I still do some work on the Bitfusion product at VMware, I spend most of my time these days seeing what I can do on the vSphere platform using the latest AI/ML accelerator hardware from NVIDIA, Intel, and AMD.

Although I work at VMware, this is my own personal blog, and any views, opinions, or mistakes I publish here are purely my own and are not official views or recommendations from VMware.

This specific article is based on a talk I just gave at VMware Explore Las Vegas.

Everyone wants the latest, greatest GPUs for AI/ML training and inference workloads. As I’m sure most of you know, GPUs are just specialized matrix processors. They can quickly perform mathematical operations — in parallel — on matrices of numbers. Although GPUs were originally designed for graphics, it turns out that being able to do matrix math is extremely useful for AI/ML.

Unfortunately, every GPU vendor on the planet seems to have about a one-year order backlog for datacenter-class GPUs. If you’re having a hard time buying GPUs, one thing you can do to increase the performance of your AI/ML workloads is to let the CPU’s AMX instructions do some of that AI/ML work, lessening the need for expensive and hard-to-procure GPUs.

Advanced Matrix Extensions (AMX) are a new set of x86 instructions, introduced with Intel’s 4th Gen Xeon Scalable (Sapphire Rapids) CPUs. These instructions are designed to operate on matrices to accelerate artificial intelligence and machine learning workloads, and they are beginning to blur the lines between CPUs and GPUs when it comes to machine learning applications.

When I started hearing that Intel Sapphire Rapids CPUs were embedding matrix operations in the CPU’s instruction set, I started wondering what I could do with those instructions using AI/ML tools.

“We can do good inference on Skylake, we added instructions in Cooper Lake, Ice Lake, and Cascade Lake. But AMX is a big leap, including for training.”

— Bob Valentine, the processor architect for Sapphire Rapids


As you replace older hosts with Sapphire Rapids-based hosts, you not only get performance improvements for traditional computing, you also get AMX capabilities for AI/ML workloads. You can execute diverse AI and non-AI multi-tenant workloads side by side in a virtualized environment. You have the flexibility to repurpose the IT infrastructure for AI and non-AI use cases as demand changes, without additional capex. The ubiquity of Intel Xeon and vSphere in on-premises and cloud environments, combined with an optimized AI software stack, allows you to quickly scale compute in hybrid environments. You can run your entire end-to-end AI pipeline (data prep, training, optimization, and inference) using CPUs with built-in AI acceleration.

Does this really work? What kind of workloads can I run?

Here’s a demo I did using an llm-foundry LLM with a 7B-parameter model from Hugging Face. The code is installed in a container and the model is loaded in a Kubernetes volume. I first start the LLM in a Tanzu cluster on an Ice Lake CPU-based system with no GPUs. As you can see, it takes a while just to load the model into memory, and once it starts, the output is pretty jerky and slow.

Then I start the exact same container on a Tanzu cluster running on a Sapphire Rapids CPU-based system with no GPUs. The hardware is roughly equivalent (both are what would have been considered mid-range servers at the time they were purchased), and the VMs have the same memory and vCPUs, but the Sapphire Rapids system runs much faster than the previous-generation Ice Lake system.

LLM running on Sapphire Rapids with AMX

In addition to the above side-by-side comparison of an LLM running on Ice Lake vs Sapphire Rapids, we also fine-tuned an LLM using just Sapphire Rapids CPUs. Starting with an off-the-shelf LLAMA2-7B model, we fine-tuned it with the “Finance-Alpaca” dataset of about 17,500 queries. We used to manage the AI pipeline and PyTorch distributed fine-tuning. It took about 3.5 hours to complete on a 4-VM Tanzu cluster with Sapphire Rapids (4th Gen Xeon) hardware.

Once the model was fine-tuned with financial data, we ran 3 chatbots on a single host. We could now ask questions such as “What is IRR?”, “What is NPV?”, and “What is the difference between IRR and NPV?” and get correct, detailed answers back from the LLM.

3 Finance Chatbots running on Sapphire Rapids with AMX

We just took an off-the-shelf LLM, fine-tuned it with financial services information in about 3.5 hours, and now we have a chatbot that can answer basic questions about finance and financial terms. No GPUs were used to do any of this.

You may not want to run every ML workload you have on just CPUs, but there are a lot of them that you can run on just CPUs. Workloads will run even faster with GPUs, but you may not want to pay for GPUs for every workload you run if the speed of a CPU is good enough.

vSphere Requirements for using AMX

If you want to try this in your vSphere environment this is what you’ll need:

  • Hardware with Sapphire Rapids CPUs.
  • Guest VMs running Linux kernel 5.16 or later. Kernel 5.19 or later recommended.
  • Guest VMs using HW version 20 (ESXi 8.0u1, vCenter 8.0u1).
  • If you’re running Kubernetes, your worker nodes will also need to run Linux kernel 5.16 or later.


Obviously you need hardware that supports AMX if you want to use AMX. I’m using Intel Sapphire Rapids (4th Gen Xeon Scalable) CPUs. The hosts have motherboards that support DDR5 memory and PCIe 5.0. In my lab I’m currently testing with Dell R760, Dell R660, and Supermicro SYS-421GE-TNRT servers.

Linux Kernel 5.19 or later

Support for AMX was added to the Linux 5.16 kernel, so if you want to use AMX you’ll need to use 5.16 or a later kernel. In my tests for guest VMs I tried Ubuntu 22.04 images with the 5.19 kernel and images using 6.2 kernels, both of which worked fine. Although Ubuntu 22.04 ships with a 5.15 kernel, the 6.2 kernel is available using the hardware enablement (HWE) kernel package that comes with 22.04. The HWE kernel can be installed with apt:

sudo apt update
sudo apt install \
    --install-recommends \
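Remember to reboot into the new kernel after installing it. To confirm what a VM (or, later, a Tanzu worker node) is actually running, here’s a minimal Python sketch that compares the running kernel release with the 5.16 minimum. It assumes Linux-style release strings such as 6.2.0-26-generic, and the function names are my own, not part of any tool.

```python
import platform
import re

MIN_AMX_KERNEL = (5, 16)  # AMX support was added in Linux 5.16


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Return (major, minor) from a release string such as '6.2.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_amx(release: str) -> bool:
    """True if the kernel release is 5.16 or later."""
    return parse_kernel_release(release) >= MIN_AMX_KERNEL


if __name__ == "__main__":
    running = platform.release()  # same string `uname -r` prints on Linux
    try:
        verdict = "OK for AMX" if kernel_supports_amx(running) else "too old for AMX"
    except ValueError:
        verdict = "not a Linux-style release string"
    print(f"kernel {running}: {verdict}")
```

Comparing (major, minor) tuples avoids the classic string-comparison bug where "6.10" sorts before "6.2".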

vSphere 8.0u1 and Hardware Version 20

The hardware version (HW version) of a guest VM determines which capabilities of the underlying hardware vSphere virtualizes for it. The AMX instructions are virtualized in HW version 20, so if you want to access AMX instructions in vSphere, your VMs need to use HW version 20.

To find out what HW version a VM is using, in vCenter go to the VM, click the Updates tab, and click the CHECK STATUS button.
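If you’d rather check from a script than click through vCenter, the HW version is also recorded in the VM’s .vmx file as virtualHW.version. Here’s a minimal Python sketch that reads it from the file’s text. How you fetch the .vmx (datastore browser, ESXi shell) is up to you, and the function names and sample excerpt are illustrative.

```python
import re

AMX_MIN_HW_VERSION = 20  # HW version 20 is the one that virtualizes AMX


def vmx_hardware_version(vmx_text: str) -> int:
    """Pull the virtualHW.version value out of a .vmx file's contents."""
    match = re.search(r'^\s*virtualHW\.version\s*=\s*"(\d+)"', vmx_text, re.MULTILINE)
    if match is None:
        raise ValueError("virtualHW.version not found in .vmx text")
    return int(match.group(1))


def vm_can_use_amx(vmx_text: str) -> bool:
    return vmx_hardware_version(vmx_text) >= AMX_MIN_HW_VERSION


# Illustrative .vmx excerpt, not taken from a real VM
sample_vmx = '.encoding = "UTF-8"\nvirtualHW.version = "20"\nmemSize = "65536"\n'
print(vmx_hardware_version(sample_vmx), vm_can_use_amx(sample_vmx))  # 20 True
```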

HW version 20 is supported on ESXi 8.0u1. To run ESXi 8.0u1 you’ll need vCenter 8.0u1. If you’re still running vCenter 7 and want to try this technology out, I suggest upgrading to vCenter 8 as soon as you can, then upgrading your ESXi hosts to ESXi 8.

Once you have a Linux VM with a 5.19 kernel (or later) running HW version 20, any AI/ML framework that you run on that VM will have access to the hardware’s AMX instructions. If you run Docker on the VM, any AI/ML containers that you run will be running on the VM’s kernel and will have access to the hardware’s AMX instructions too. If the tools you’re using were built to use AMX, they’ll now run faster using the matrix math capabilities of the Sapphire Rapids CPU, with no GPUs necessary.

Tanzu Requirements for using AMX

The kernel requirement also applies to Tanzu worker nodes. Whatever kernel is installed on your worker nodes is the kernel that your Kubernetes pods use. To use AMX your Tanzu worker nodes need to be running kernel 5.16 or later.

Tanzu comes with a set of pre-built, automatically-updated node images called Tanzu Kubernetes Releases (TKRs). Each image is an OVA file that deploys a Kubernetes control node or a worker node. A node is just a Linux VM with a specific version of Kubernetes installed on it and a specific Linux kernel.

When installing Tanzu one of the steps is to set up a Content Library where TKRs are stored. The TKRs are automatically downloaded from VMware into the Content Library whenever new TKRs are released.

When you upgrade a Tanzu Kubernetes cluster, say from Kubernetes 1.23 to 1.24, the Tanzu Supervisor Cluster creates a new VM from the 1.24 TKR image, waits for it to join the cluster, then evacuates, shuts down, and deletes one of your 1.23 nodes. The Supervisor Cluster repeats this, first replacing your cluster’s control nodes and then its worker nodes, until all of the nodes in the cluster are running Kubernetes 1.24.
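To make the order of operations concrete, here’s a toy Python model of the rolling replacement described above. It is not Supervisor Cluster code: the node names and the replacement naming scheme are made up, and it only produces the ordered list of steps.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str     # "control" or "worker"
    version: str  # Kubernetes minor version, e.g. "1.23"


def rolling_upgrade_plan(nodes: list[Node], target: str) -> list[str]:
    """Model the one-node-at-a-time replacement: every out-of-date
    control node first, then every out-of-date worker node."""
    steps: list[str] = []
    for role in ("control", "worker"):
        for node in nodes:
            if node.role != role or node.version == target:
                continue  # wrong phase, or already on the target version
            replacement = f"{node.name}-v{target}"  # hypothetical naming
            steps.append(f"create {replacement} from the {target} TKR")
            steps.append(f"wait for {replacement} to join the cluster")
            steps.append(f"evacuate, shut down, and delete {node.name}")
    return steps


cluster = [Node("cp-1", "control", "1.23"), Node("w-1", "worker", "1.23"),
           Node("cp-2", "control", "1.23")]
for step in rolling_upgrade_plan(cluster, "1.24"):
    print(step)
```

Because a new node joins before an old one is removed, the cluster never drops below its configured size during the upgrade.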

Note: Kubernetes should only be upgraded from one minor release to the next minor release. If you have a cluster running Kubernetes 1.20 and you want to upgrade to 1.24, you have to first upgrade to 1.21, then 1.22, then 1.23, and finally to 1.24. Skipping a minor version is not recommended and may break your cluster.
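The one-minor-version-at-a-time rule is easy to get wrong when you’re several versions behind. This small Python helper (hypothetical, not part of Tanzu) lists each intermediate version to target in turn, assuming plain "major.minor" or "major.minor.patch" version strings.

```python
def minor_upgrade_path(current: str, target: str) -> list[str]:
    """List every Kubernetes minor version to step through, in order.
    For example, '1.20' -> '1.24' gives ['1.21', '1.22', '1.23', '1.24']."""
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("only same-major upgrades are handled here")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(minor_upgrade_path("1.20", "1.24"))  # ['1.21', '1.22', '1.23', '1.24']
```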

VMware publishes two different TKR images for each version of Kubernetes, one based on PhotonOS and one based on Ubuntu.

At this time VMware has not yet published a TKR with a 5.19 (or later) kernel. If you want to start using Sapphire Rapids AMX instructions and you want to use Tanzu Kubernetes, you have two choices:

  • Wait for the official TKR from VMware with a 5.19 (or later) kernel.
  • Build your own TKR using the Bring Your Own Image (BYOI) process.

Bring Your Own Image (BYOI)

To build an image, follow the instructions on the GitHub page for vSphere Tanzu Kubernetes Grid Image Builder. The process is fairly straightforward. The steps I followed were:

I cloned the repo with git clone:

$ git clone

I edited the packer-variables/vsphere.j2 file so it contained information about my vSphere environment. I also created a folder called “BYOI” under my cluster in vCenter and specified that folder in the config, so any “work in progress” images or VMs generated by the BYOI tool would be created in one place.

Make sure you put the correct values for your vSphere environment in the packer-variables/vsphere.j2 file. The first time I tried this I was using another group’s environment to build a TKR, I used the wrong network name, and I spent about 2 hours trying to figure out why the image was erroring out.

I ran make list-versions to get a list of the available versions:

$ make list-versions
            Kubernetes Version  |  Supported OS
              v1.24.9+vmware.1  |  [photon-3,ubuntu-2004-efi]
       v1.25.7+vmware.3-fips.1  |  [photon-3,ubuntu-2004-efi]
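If you script your builds, you can pick a version that supports your OS target straight from that listing. Here’s a minimal Python sketch; it assumes the two-column, pipe-separated layout shown above, which isn’t a documented, stable interface, and the function name is my own.

```python
def parse_list_versions(output: str) -> dict[str, list[str]]:
    """Map each Kubernetes version in `make list-versions` output to its OS targets."""
    versions: dict[str, list[str]] = {}
    for line in output.splitlines():
        if "|" not in line:
            continue
        version, targets = (part.strip() for part in line.split("|", 1))
        if version == "Kubernetes Version":
            continue  # skip the header row
        versions[version] = [t.strip() for t in targets.strip("[]").split(",")]
    return versions


LISTING = """\
            Kubernetes Version  |  Supported OS
              v1.24.9+vmware.1  |  [photon-3,ubuntu-2004-efi]
       v1.25.7+vmware.3-fips.1  |  [photon-3,ubuntu-2004-efi]
"""
ubuntu_versions = [v for v, targets in parse_list_versions(LISTING).items()
                   if "ubuntu-2004-efi" in targets]
print(ubuntu_versions)
```

In practice you’d feed it the captured output of make list-versions, for example via subprocess.run(..., capture_output=True, text=True).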

I am going to use v1.24.9+vmware.1, so I ran this to download the artifacts:

$ make run-artifacts-container KUBERNETES_VERSION=v1.24.9+vmware.1
Using default port for artifacts container 8081
Error: No such container: v1.24.9---vmware.1-artifacts-server
Unable to find image '' locally
v1.24.9_vmware.1-tkg.1: Pulling from tkg/tkg-vsphere-linux-resource-bundle
2731d8df91a4: Pull complete
73c864854baf: Pull complete
08eb7dea6abf: Pull complete
52654f918c81: Pull complete
da27b4bff06e: Pull complete
797512e2c717: Pull complete
0a994466e4a6: Pull complete
31d1a74dbc07: Pull complete
b3444fea81b1: Pull complete
193c65bff1b1: Pull complete
Digest: sha256:9dcec246657fa7cf5ece1feab6164e200c9bc82b359471bbdec197d028b8e577
Status: Downloaded newer image for

Customize the TKR OVA Image

The last step is to build the TKR OVA file, but before I build it I want to add two customizations. I need to use VM hardware version (aka “VMX version”) 20 for the OVA, and I need to make sure that we build an Ubuntu OVA with a kernel >= 5.16.

The GitHub README docs have examples of how to customize the OVA. The first example shows how to change the HW version, and the second one shows how to add new OS packages. Reading those two examples tells me what I need to do.

Use HW Version 20 for the Image

I edit the packer-variables/default-args.j2 file and change the vmx_version:

    "vmx_version": "20",

Install a Kernel >= 5.16 on the Image

Earlier when I ran make list-versions I noticed that the v1.24.9+vmware.1 Kubernetes version supports Ubuntu 20.04. However, the only way to get a packaged kernel >= 5.16 installed is to install the Ubuntu 22.04 linux-image-generic-hwe-22.04 package, and vsphere-tanzu-kubernetes-grid-image-builder does not currently have a base image for 22.04.

Since I need 22.04, and 20.04 is the only version available, I’m going to force Packer to do a release upgrade before generating the OVA. To do that, I’m going to add the 22.04 (jammy) repositories, including jammy-updates, to the image. With those repositories in place, the vSphere Tanzu Kubernetes Grid Image Builder causes Packer to upgrade the image to Ubuntu 22.04, and I can then install the Ubuntu 22.04 linux-image-generic-hwe-22.04 package.

Following the instructions from Adding new OS packages and configuring the repositories or sources:

I create a directory repos under ansible/files/

I create a file ansible/files/repos/ubuntu.list which contains the lines:

deb jammy-updates main restricted
deb jammy-security main restricted
deb jammy main restricted

I create the file packer-variables/repos.j2 which contains:

    {% if os_type == "photon-3" %}
    "extra_repos": "/image-builder/images/capi/image/ansible/files/repos/photon.repo"
    {% elif os_type == "ubuntu-2004-efi" %}
    "extra_repos": "/image-builder/images/capi/image/ansible/files/repos/ubuntu.list"
    {% endif %}

Doing all of that adds the jammy-updates repo to the TKR image. To add the kernel package, I go back to the packer-variables/default-args.j2 file we edited earlier, find the extra_debs line, and add the HWE kernel package for Ubuntu 22.04, linux-image-generic-hwe-22.04:

"extra_debs": "unzip iptables-persistent nfs-common linux-image-generic-hwe-22.04",

Now that I’ve made those changes I can build the TKR OVA.

Build the Image

The main GitHub README page says I can run make build-node-image to build the OVA, but I want to use a specific version of Kubernetes and I want to use Ubuntu 20.04, so I assume I need to pass some extra parameters to make. Typing make help gives me all of the information I need to construct the right build command:

IP=[my VM's IP address, where the artifact container is running]
make build-node-image \
    OS_TARGET=ubuntu-2004-efi \
    KUBERNETES_VERSION=v1.24.9+vmware.1 \
    TKR_SUFFIX=spr \
    HOST_IP=$IP \

This takes a while to run and will create and configure a VM on your vSphere cluster that will be used to create the TKR OVA image. If you want to watch the build, run the docker logs command that make build-node-image spits out:

docker logs -f v1.24.9---vmware.1-ubuntu-2004-efi-image-builder

When the process is done you should have an image file named ${HOME}/image/ovas/ubuntu-2004-amd64-v1.24.9---vmware.1-spr.ova

Add the Image to a local Content Library

In order for Tanzu to be able to use the image it has to be added to a local content library. If you don’t have a local content library create one by going to vSphere Client > Content Libraries > Create.

Once you’ve created the library click the library name to pull it up on the screen and click Actions > Import Item. Upload the ubuntu-2004-amd64-v1.24.9---vmware.1-spr.ova file.

Associate the Content Library with the Cluster Namespace

Go to vSphere Client > Workload Management > “your cluster namespace”, then click MANAGE CONTENT LIBRARIES on the VM Service tile. Make sure that the local library, and any other libraries used by your Cluster Namespace, are checked.

Deploy Your Own Image

To create a Kubernetes cluster, you create a YAML file and run kubectl apply on it. The following YAML file builds a cluster based on the ubuntu-2004-amd64-v1.24.9---vmware.1-spr.ova TKR image, which was built from the Ubuntu 20.04 base image and contains Kubernetes 1.24.9 and a Linux HWE kernel (currently kernel 6.2).

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: my-tanzu-kubernetes-cluster-name
  namespace: my-tanzu-kubernetes-cluster-namespace
  annotations:
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-small
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.24.9---vmware.1-spr
    nodePools:
    - name: worker
      replicas: 3
      vmClass: guaranteed-8xlarge
      storageClass: vsan-default-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 160Gi
      tkr:
        reference:
          name: v1.24.9---vmware.1-spr

A couple of notes on this YAML file:

  • For a stable, easily-upgradable cluster I recommend a minimum of 3 control plane nodes and 3 worker nodes.
  • The metadata section’s annotations line must be present to use an Ubuntu TKR as the base image.
  • The TKR reference just refers to the first part of the TKR’s file name. You can see the TKR file names by looking in the vCenter Content Library you set up for Tanzu. To get a list of valid reference names:
    kubectl config use-context my-tanzu-kubernetes-cluster-namespace
    kubectl get tanzukubernetesreleases

    Only the names that have READY=True and COMPATIBLE=True can be used to deploy a cluster.
  • In order to allocate a separate, larger volume for storing container images on the worker nodes, I added a volumes section. I have a storage class named vsan-default-storage-policy, and the volumes section allocates a 160GiB volume using that storage policy and mounts it on each worker node at /var/lib/containerd, which is where container images are stored. If you want this to work on your system, change vsan-default-storage-policy to the name of a storage policy defined for your Tanzu Kubernetes cluster namespace.
  • The containerd volume is destroyed when its worker node is destroyed, and it is destroyed and recreated (empty) when a worker node is upgraded. That’s fine, because images are downloaded again as needed.
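If you want to pick out the usable TKR names from a script, here’s a minimal Python sketch that filters the table printed by kubectl get tanzukubernetesreleases. It assumes the default table layout, where NAME and VERSION are single tokens and READY and COMPATIBLE are single-word columns; the sample rows (including the VERSION strings) are made up. For anything serious, -o json output is sturdier than parsing a table.

```python
def usable_tkrs(table: str) -> list[str]:
    """Return TKR names whose READY and COMPATIBLE columns are both True."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    header = rows[0]
    ready_col = header.index("READY")
    compatible_col = header.index("COMPATIBLE")
    return [row[0] for row in rows[1:]
            if row[ready_col] == "True" and row[compatible_col] == "True"]


# Made-up sample in the default table layout (NAME VERSION READY COMPATIBLE ...)
SAMPLE = """\
NAME                        VERSION                     READY   COMPATIBLE   CREATED
v1.23.8---vmware.3-tkg.1    v1.23.8+vmware.3-tkg.1      True    True         30d
v1.24.9---vmware.1-spr      v1.24.9+vmware.1-spr        True    True         2d
v1.25.7---vmware.3-fips.1   v1.25.7+vmware.3-fips.1     False   False        1d
"""
print(usable_tkrs(SAMPLE))
```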

I recommend deploying a fresh cluster using this YAML file just so you can try it out and see how it works. Once you’ve deployed a new cluster, any AI/ML containers that you run will be running on a 6.2 kernel and will have access to the hardware’s AMX instructions. If the tools you’re using were built to use AMX, they’ll now run faster using the matrix math capabilities of the Sapphire Rapids CPU, with no GPUs necessary.

Upgrading an existing Tanzu Kubernetes cluster to the new TKR image

To upgrade an existing Tanzu Kubernetes 1.23 cluster to 1.24 using the new TKR image:

  • Modify the existing 1.23 cluster’s YAML file to refer to the v1.24.9---vmware.1-spr TKR image.
  • Make sure that the YAML file has the annotations line so the Supervisor will deploy an Ubuntu-based TKR.

Then run:

kubectl config use-context my-tanzu-kubernetes-cluster-namespace
kubectl apply -f my-yaml-filename.yaml

If you can’t find your cluster’s YAML file you can also do this:

kubectl config use-context my-tanzu-kubernetes-cluster-namespace
kubectl edit tanzukubernetescluster/my-tanzu-kubernetes-cluster-name

This will pull up a system editor (vim on my system) containing the cluster’s freshly-generated current YAML file. Make the changes and save the file. Any changes you make will be applied immediately when you save the file.

Check the deployed cluster VMs

You can ssh into a cluster’s VMs and check the kernel version running and verify that you can see the amx and avx flags for the CPUs, indicating that the extra instructions are accessible. In vCenter find one of the cluster’s VMs and get the IP address. To get the ssh password:

kubectl config use-context my-tanzu-kubernetes-cluster-namespace
kubectl get secret \
    my-tanzu-kubernetes-cluster-name-ssh-password \
    -o jsonpath='{.data.ssh-passwordkey}' \
    -n my-tanzu-kubernetes-cluster-namespace | base64 -d
ssh -o PubkeyAuthentication=no vmware-system-user@vm-ip-address

$ uname -a
Linux my-tanzu-kubernetes-cluster-name-02-twk2c-wzsjc 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ grep VERSION_ID /etc/os-release

$ grep avx /proc/cpuinfo | head -1
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm md_clear serialize amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
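Eyeballing that flags line for amx_tile, amx_bf16, and amx_int8 gets old quickly across many nodes. Here’s a minimal Python sketch that does the check. It assumes the x86 /proc/cpuinfo format shown above, and the function names are my own. Seeing the flags confirms the instructions are exposed to the guest; whether a given framework actually uses them still depends on how it was built.

```python
REQUIRED_AMX_FLAGS = frozenset({"amx_tile", "amx_bf16", "amx_int8"})


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_amx_flags(cpuinfo_text: str) -> set[str]:
    return set(REQUIRED_AMX_FLAGS - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            text = cpuinfo.read()
    except OSError:  # not Linux, or /proc not available
        text = ""
    missing = missing_amx_flags(text)
    print("all AMX flags present" if not missing else f"missing: {sorted(missing)}")
```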

With these instructions you should now be able to create VMs and Kubernetes clusters that can access Sapphire Rapids AMX instructions. Any AI/ML framework that you run will have access to them, and tools built to use AMX will run faster using the matrix math capabilities of the Sapphire Rapids CPU, with no GPUs necessary.

Hope you find this useful.


Calculating the value for 64bitMMIOSizeGB

When adding a GPU to a vSphere VM using PCI passthrough there are a couple of additional settings that you need to make or your VM won’t boot.

When creating the VM, go to Actions > Edit > VM Options > Boot Options > Firmware and select “EFI”. You need to do this before you install the operating system on the VM. If you don’t, the GPUs won’t work and the VM won’t boot.

To add a GPU, in vCenter go to the VM, select Actions > Edit > Add New Device. Any GPUs set up as PCI passthrough devices should appear in a pick list. Add one or more GPUs to your VM.

Note that after adding one device, when you add additional GPUs the first GPU you selected still appears in the pick list. If you add the same GPU more than once your VM will not boot. If you add a GPU that’s being used by another running VM your VM will not boot. Pay attention to the PCI bus addresses displayed and make sure that the GPUs you pick are unique and not in use on another VM.

Finally, you have to set up memory-mapped I/O (MMIO) so that the GPU’s framebuffer memory is mapped into the VM’s address space and the CPU can pass data to the GPU. In vCenter go to the VM, select Actions > Edit > VM Options > Advanced > Edit configuration.

Once you’re on the Configuration parameters screen, add two more parameters:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = ????
Actions > Edit > VM Options > Advanced > Edit configuration

To calculate the 64bitMMIOSizeGB value, start by adding up the total GB of framebuffer memory on all GPUs attached to the VM. If the total GPU framebuffer memory is exactly a power of 2, set pciPassthru.64bitMMIOSizeGB to the next power of 2.

If the total GPU framebuffer memory falls between two powers-of-2, round up to the next power of 2, then round up again, to get a working setting.

Powers of 2 are 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 …

For example, two NVIDIA A100 cards with 40GB each = 80GB (in between 64GB and 128GB), so round up to the next power of 2 (128GB), then round up again to the next power of 2 after that (256GB) to get the correct setting. If you set it too low the VM won’t boot, but it won’t give you an error message telling you what the issue is either.

Here are some configurations that I’ve tested and verified:

  • 2 x 16GB NVIDIA V100 = 32GB, 32 is a power of 2, so round up to the next power of 2 which is 64, set pciPassthru.64bitMMIOSizeGB = 64 to boot.
  • 2 x 24GB NVIDIA P40 = 48GB, which is in-between 32 and 64, round up to 64 and again to 128, requires pciPassthru.64bitMMIOSizeGB = 128 to boot.
  • 8 x 16GB NVIDIA V100 = 128GB, 128 is a power of 2, so round up to the next power of 2 which is 256, set pciPassthru.64bitMMIOSizeGB = 256 to boot.
  • 10 x 16GB NVIDIA V100 = 160GB, which is in-between 128 and 256, round up to 256 and again to 512, set pciPassthru.64bitMMIOSizeGB = 512 to boot.
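Both cases above collapse into a single rule: take the smallest power of 2 that is greater than or equal to the total framebuffer memory, then double it. Here’s a Python sketch of that rule of thumb that also prints the two configuration parameters. It encodes the heuristic from this article (not a formula published by VMware or NVIDIA) and assumes whole-GB framebuffer sizes.

```python
def mmio_size_gb(framebuffer_gb: list[int]) -> int:
    """Value for pciPassthru.64bitMMIOSizeGB using this article's rule of thumb."""
    total = sum(framebuffer_gb)
    if total <= 0:
        raise ValueError("need at least one GPU with framebuffer memory")
    smallest_pow2 = 1 << (total - 1).bit_length()  # smallest power of 2 >= total
    return smallest_pow2 * 2                       # then one more power of 2


def passthru_config(framebuffer_gb: list[int]) -> list[str]:
    return [
        "pciPassthru.use64bitMMIO = TRUE",
        f"pciPassthru.64bitMMIOSizeGB = {mmio_size_gb(framebuffer_gb)}",
    ]


for line in passthru_config([40, 40]):  # two 40GB A100s
    print(line)
```

Running it for two 40GB A100s prints 256 for the size, matching the worked example, and it reproduces all four tested configurations in the list.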

Hope you find this useful.


Updating ESXi root passwords and authorized ssh keys with Ansible

I manage a number of vCenter instances and a lot of ESXi hosts. Some of the hosts are production, and some are for test and development. Sometimes an ESXi host needs to be used by a different group, or temporarily moved to a new cluster and then back again afterwards.

To automate the configuration of these systems and the VMs running on them, I use Ansible. For a freshly imaged, new installation of ESXi, one of the first things I do is run an Ansible playbook that sets up the ESXi host. The first thing it does is install the ssh keys of the people who need to log in as root; then it updates the root password.

I have ssh public keys for every user that needs root access. A short bash script combines those keys and my Ansible management public key into authorized_keys files for the ESXi hosts in each vCenter instance. In my Ansible group_vars/ directory is a file for each group of ESXi hosts, so all of the ESXi hosts in a group get the same root password and ssh keys. This also makes it easy to change root passwords and add and remove ssh keys of users as they are added to or leave different groups.
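I do this with a short bash script, but the logic is simple enough to sketch in Python. This version assumes a hypothetical layout with one <user>.pub file per person in a single directory; it concatenates the selected users’ keys plus the Ansible management key, one per line, and drops duplicates. You’d write the result to a file such as authkeys-ops for the Ansible variable to pick up. The key text in the demo is placeholder text, not real keys.

```python
import tempfile
from pathlib import Path


def build_authorized_keys(key_dir: Path, users: list[str], ansible_key: Path) -> str:
    """Combine each user's <user>.pub file plus the Ansible management key
    into authorized_keys text, one key per line, skipping duplicates."""
    paths = [key_dir / f"{user}.pub" for user in users] + [ansible_key]
    seen: set[str] = set()
    keys: list[str] = []
    for path in paths:
        for line in path.read_text().splitlines():
            key = line.strip()
            if key and not key.startswith("#") and key not in seen:
                seen.add(key)
                keys.append(key)
    return "\n".join(keys) + "\n"


# Demo with placeholder key text in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    key_dir = Path(tmp)
    (key_dir / "alice.pub").write_text("ssh-ed25519 AAAA-placeholder-alice alice@example\n")
    (key_dir / "bob.pub").write_text("ssh-ed25519 AAAA-placeholder-bob bob@example\n")
    (key_dir / "ansible.pub").write_text("ssh-ed25519 AAAA-placeholder-ansible ansible@example\n")
    authkeys_ops = build_authorized_keys(key_dir, ["alice", "bob", "alice"], key_dir / "ansible.pub")
print(authkeys_ops, end="")
```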

Here’s a portion of a group_vars/esxi_hosts_cicd/credentials.yml file for a production CICD cluster:

# ESXI Hosts (only Ops can ssh in)
esxi_root_authorized_keys_file: authkeys-ops

esxi_username: 'root'
esxi_password: !vault |

The password is encrypted using Ansible Vault.

In my main.yml file I call the esxi_host role for all of the hosts in the esxi_hosts inventory group. Since I use a different user to manage non-ESXi hosts, the play that calls the role tells Ansible to use the root user only when logging into ESXi hosts.

- name: Setup esxi_hosts
  gather_facts: False
  user: root
  hosts: esxi_hosts
  roles:
    - esxi_host

The esxi_host role has an esxi_host/tasks/main.yml tasks file. The two tasks that update the authorized_keys file and the root password look like this:

- name: Set the authorized ssh keys for the root user
  copy:
    src: "{{ esxi_root_authorized_keys_file }}"
    dest: /etc/ssh/keys-root/authorized_keys
    owner: root
    group: root
    mode: '0600'

- name: Set the root password for ESXI Hosts
  shell: "echo '{{ esxi_password }}' | passwd -s"
  no_log: True

The first time I run this the password is set to some other value, so I start Ansible with:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of new esxi hosts] \
    --ask-pass \

This will prompt me for the current root ssh password. Once I enter that it logs into each ESXi host, installs the new authorized_keys file, uses the vault private key to decrypt the password, then updates the root password.

After I’ve done this once, since the Ansible ssh key is also part of the authorized_keys file, subsequent Ansible updates just use the ssh key to login, and I don’t have to use --ask-pass or --ask-become-pass parameters.

This is also handy when switching a host from one cluster to another. As long as the ssh keys are installed I no longer need the current root password to update the root password.

Hope you find this useful.


Setting up a 100GbE PVRDMA Network on vCenter 7

After writing my last article on Getting NVIDIA NGC containers to work with VMware PVRDMA networks I had a couple of people ask me “How do I set up PVRDMA networking on vCenter?” These are the steps that I took to set up PVRDMA networking in my lab.

RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. It works by encapsulating an Infiniband (IB) transport packet and sending it over Ethernet. If you’re working with network applications that require high bandwidth and low latency, RDMA will give you lower latency, higher bandwidth, and a lower CPU load than an API such as Berkeley sockets.

Full disclosure: I used to work for a startup called Bitfusion, and that startup was bought by VMware, so I now work for VMware. At Bitfusion we developed a technology for accessing hardware accelerators, such as NVIDIA GPUs, remotely across networks using TCP/IP, Infiniband, and PVRDMA. I still work on the Bitfusion product at VMware, and spend a lot of my time getting AI and ML workloads to work across networks on virtualized GPUs.

In my lab I’m using Mellanox ConnectX-5 and ConnectX-6 cards on hosts that are running ESXi 7.0.2 and vCenter 7.0.2. The cards are connected to a Mellanox Onyx MSN2700 100GbE switch.

Since I’m working with Ubuntu 18.04 and 20.04 virtual machines (VMs) in a vCenter environment, I have a couple of options for high-speed networking:

  • I can use PCI passthrough to pass the PCI network card directly through to the VM and use the network card’s native drivers on the VM to set up a networking stack. However this means that my network card is only available to a single VM on the host, and can’t be shared between VMs. It also breaks vMotion (the ability to live-migrate the VM to another host) since the VM is tied to a specific piece of hardware on a specific host. I’ve set this up in my lab but stopped doing this because of the lack of flexibility and because we couldn’t identify any performance difference compared to SR-IOV networking.
  • I can use SR-IOV and Network Virtual Functions (NVFs) to make the single card appear as if it’s multiple network cards with multiple PCI addresses, pass those through to the VM, and use the network card’s native drivers on the VM to set up a networking stack. I’ve set this up in my lab as well. I can share a single card between multiple VMs and the performance is similar to PCI passthrough. The disadvantage is that setting up SR-IOV and configuring the NVFs is specific to a card’s model and manufacturer, so what works in my lab might not work in someone else’s environment.
  • I can set up PVRDMA networking and use the PVRDMA driver that comes with Ubuntu. This is what I’m going to show how to do in this article.

Set up your physical switch

First, make sure that your switch is set up correctly. On my Mellanox Onyx MSN2700 100GbE switch that means:

  • Enable the ports you’re connecting to.
  • Set the speed of each port to 100G.
  • Set auto-negotiation for each link.
  • MTU: 9000
  • Flowcontrol Mode: Global
  • LAG/MLAG: No
  • LAG Mode: On

Set up your virtual switch

vCenter supports Paravirtual RDMA (PVRDMA) networking using Distributed Virtual Switches (DVS). This means you’re setting up a virtual switch in vCenter and you’ll connect your VMs to this virtual switch.

In vCenter navigate to Hosts and Clusters, then click the DataCenter icon (looks like a sphere or globe with a line under it). Find the cluster you want to add the virtual switch to, right click on the cluster and select Distributed Switch > New Distributed Switch.

  • Name: “rdma-dvs”
  • Version: 7.0.2 – ESXi 7.0.2 and later
  • Number of uplinks: 4
  • Network I/O control: Disabled
  • Default port group: Create
  • Port Group Name: “VM 100GbE Network”

Figure out which NIC is the right NIC

  • Go to Hosts and Clusters
  • Select the host
  • Click the Configure tab, then Networking > Physical adapters
  • Note which NIC is the 100GbE NIC for each host

Add Hosts to the Distributed Virtual Switch

  • Go to Hosts and Clusters
  • Click the DataCenter icon
  • Select the Networks top tab and the Distributed Switches sub-tab
  • Right click “rdma-dvs”
  • Click “Add and Manage Hosts”
  • Select “Add Hosts”
  • Select the hosts. Use “auto” for uplinks.
  • Select the physical adapters based on the list you created in the previous step, or find the Mellanox card in the list and add it. If more than one is listed, look for the card that’s “connected”.
  • Manage VMkernel adapters (accept defaults)
  • Migrate virtual machine networking (none)
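With govc, adding hosts to the switch is a single `dvs.add` call. This sketch assumes the 100GbE adapter has the same vmnic name on every host you list; if it doesn’t, call it once per host. The helper name, vmnic name, and host names are placeholders.

```shell
# Hypothetical helper: add ESXi hosts to rdma-dvs, using the given
# physical adapter (e.g. the 100GbE vmnic found earlier) as the uplink.
add_hosts_to_dvs() {
    pnic="$1"    # e.g. vmnic4 (placeholder)
    shift
    govc dvs.add -dvs rdma-dvs -pnic "$pnic" "$@"
}

# Usage (host names are placeholders):
#   add_hosts_to_dvs vmnic4 esx01.example.com esx02.example.com
```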

Tag a vmknic for PVRDMA

  • Select an ESXi host and go to the Configure tab
  • Go to System > Advanced System Settings
  • Click Edit
  • Filter on “PVRDMA”
  • Set Net.PVRDMAVmknic = "vmk0"

Repeat for each ESXi host.
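If you have SSH enabled on the hosts, the same advanced setting can be changed with esxcli instead of clicking through each host. A minimal sketch; the function name is hypothetical and vmk0 is just the default used above.

```shell
# Hypothetical helper, run in the ESXi shell on each host: tag a VMkernel
# NIC for PVRDMA (Net.PVRDMAVmknic in the UI), then read the value back.
tag_pvrdma_vmknic() {
    vmknic="${1:-vmk0}"
    esxcli system settings advanced set -o /Net/PVRDMAVmknic -s "$vmknic" || return 1
    esxcli system settings advanced list -o /Net/PVRDMAVmknic
}

# Usage: tag_pvrdma_vmknic vmk0
```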

Set up the firewall for PVRDMA

  • Select an ESXi host and go to the Configure tab
  • Go to System > Firewall
  • Click Edit
  • Scroll down to find pvrdma and check the box to allow PVRDMA traffic through the firewall.

Repeat for each ESXi host.
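The firewall checkbox corresponds to the pvrdma ruleset, which can also be enabled from the ESXi shell. A sketch, with a hypothetical function name:

```shell
# Hypothetical helper, run in the ESXi shell on each host.
enable_pvrdma_firewall() {
    # Same as ticking the "pvrdma" box under System > Firewall
    esxcli network firewall ruleset set -e true -r pvrdma || return 1
    # Show the pvrdma ruleset so you can confirm it is now enabled
    esxcli network firewall ruleset list | grep -i pvrdma
}

# Usage: enable_pvrdma_firewall
```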

Set up Jumbo Frames for PVRDMA

To enable jumbo frames in a vCenter cluster that uses distributed virtual switches, you have to set the MTU to 9000 on the Distributed Virtual Switch.

  • Click the Data Center icon.
  • Click the Distributed Virtual Switch that you want to set up, “rdma-dvs” in this example.
  • Go to the Configure tab.
  • Select Settings > Properties.
  • Look at Properties > Advanced > MTU. This should be set to 9000. If it’s not, click Edit.
  • Click Advanced.
  • Set MTU to 9000.
  • Click OK.

Add a PVRDMA NIC to a VM

  • Edit the VM settings
  • Add a new device
  • Select “Network Adapter”
  • Pick “VM 100GbE Network” for the network.
  • Connect at Power On (checked)
  • Adapter type PVRDMA (very important!)
  • Device Protocol: RoCE v2

Configure the VM

For Ubuntu:

sudo apt-get install rdma-core infiniband-diags ibverbs-utils

Tweak the module load order

For RDMA to work, the vmw_pvrdma module has to be loaded after several other modules. Maybe someone else knows a better way to do this, but the method I got to work was adding a script in /usr/local/sbin/ that loads the InfiniBand modules, then calling that script from /etc/rc.local so it runs at boot time.

#!/bin/sh
# Modules that need to be loaded for PVRDMA to work
/sbin/modprobe mlx4_ib
/sbin/modprobe ib_umad
/sbin/modprobe rdma_cm
/sbin/modprobe rdma_ucm

# Once those are loaded, reload the vmw_pvrdma module
/sbin/modprobe -r vmw_pvrdma
/sbin/modprobe vmw_pvrdma
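On recent Ubuntu releases /etc/rc.local doesn’t exist by default, but systemd still runs it at boot if the file exists and is executable. Here’s a minimal sketch that creates it. The name load-pvrdma-modules.sh is a placeholder for whatever you called the module-loading script (which also needs to be executable), the function name is hypothetical, and it overwrites any existing /etc/rc.local, so merge by hand if you already have one.

```shell
# Hypothetical helper: write an rc.local that calls the module-loading
# script at boot, and make it executable (systemd only runs rc.local
# when the file is executable).
install_rc_local() {
    rc_local="${1:-/etc/rc.local}"
    module_script="${2:-/usr/local/sbin/load-pvrdma-modules.sh}"   # placeholder name

    cat > "$rc_local" <<EOF
#!/bin/sh -e
# Load the InfiniBand/RDMA modules, then reload vmw_pvrdma
$module_script
exit 0
EOF
    chmod 755 "$rc_local"
}

# Usage (as root): install_rc_local
```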

Once that’s done, set up the PVRDMA network interface the same way as any other network interface.
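On Ubuntu that usually means a netplan file. A minimal sketch, assuming the PVRDMA adapter shows up as ens224 and using a made-up 192.168.100.11/24 address; check `ip link` for your interface name and use your own addressing. The MTU of 9000 matches the switch and DVS settings, and the function and file names are hypothetical.

```shell
# Hypothetical helper: write a netplan file for the PVRDMA interface.
# Interface name, address, and file name are placeholders.
write_pvrdma_netplan() {
    iface="${1:-ens224}"
    addr="${2:-192.168.100.11/24}"
    out="${3:-/etc/netplan/60-pvrdma.yaml}"

    cat > "$out" <<EOF
network:
  version: 2
  ethernets:
    $iface:
      addresses: [$addr]
      mtu: 9000
EOF
    chmod 600 "$out"   # newer netplan versions warn about world-readable files
}

# Usage (as root):
#   write_pvrdma_netplan ens224 192.168.100.11/24 && netplan apply
```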

Testing the network

To verify that I’m getting something close to 100Gbps on the network, I use the perftest package (on Ubuntu, sudo apt-get install perftest).

To test bandwidth, I pick two VMs on different hosts. On one VM I run:
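Before running perftest, it can be worth confirming that jumbo frames actually make it end to end. With a 9000-byte MTU, the largest ICMP payload that fits in one unfragmented packet is 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes. Sending that with the Don’t Fragment bit set fails if anything along the path is still at 1500. A sketch using Linux ping; the function name and peer address are placeholders.

```shell
# Hypothetical helper: ping a peer VM with Don't Fragment set and the
# largest payload that fits in the given MTU (MTU - 28 bytes of headers).
check_jumbo_path() {
    peer="$1"             # IP of the other VM's PVRDMA interface
    mtu="${2:-9000}"
    payload=$((mtu - 28))
    ping -c 3 -M do -s "$payload" "$peer"
}

# Usage: check_jumbo_path 192.168.100.12
```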

$ ib_send_bw --report_gbits

On the other VM I run the same command, adding the IP address of the first VM’s PVRDMA interface:

$ ib_send_bw --report_gbits

That sends a bunch of data across the network and reports back:

So I’m getting an average of 96.31Gbps over the network connection.

I can also check the latency using ib_send_lat, which is started the same way (no address on the first VM, the first VM’s address on the second).
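Since both tests follow the same server/client pattern, here’s a small sketch that wraps them. The function name is hypothetical, and `-d vmw_pvrdma0` assumes that’s the name `ibv_devices` reports for the PVRDMA device in your VM (override it with PVRDMA_DEV if not).

```shell
# Hypothetical helper around perftest. With no peer address it runs as the
# server and waits; with the first VM's PVRDMA IP it runs as the client.
#   $1 = bw | lat, $2 = peer IP (omit on the server side)
run_perftest() {
    mode="$1"
    peer="$2"
    dev="${PVRDMA_DEV:-vmw_pvrdma0}"   # check the name with ibv_devices

    case "$mode" in
        bw)  set -- ib_send_bw -d "$dev" --report_gbits ;;
        lat) set -- ib_send_lat -d "$dev" ;;
        *)   echo "usage: run_perftest bw|lat [peer-ip]" >&2; return 2 ;;
    esac

    # The client appends the server's address; the server omits it
    if [ -n "$peer" ]; then
        set -- "$@" "$peer"
    fi
    "$@"
}

# Usage: run_perftest bw on the first VM, then
#        run_perftest bw 192.168.100.11 on the second (placeholder IP)
```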

Hope you find this useful.


Upgrading vCenter 7 via the command line

Updated on 2021-10-26.

I have vCenter installed and I want to update to When I run Update Planner > Interoperability it reports that all of my ESXi hosts are running ESXi 7.0.1. If I run the pre-update checks I get “No issues found”. When I go to the appliance to do the upgrade, both “Stage Only” and “Stage and Install” are greyed-out and unselectable.

vCenter 7 Appliance Available Updates screen

I tried a dozen different tricks, including ssh-ing into the appliance as root and editing the /etc/applmgmt/appliance/software_update_state.conf file, but nothing could enable the “Stage Only” and “Stage and Install” buttons.

Use the command line

I finally decided to try upgrading via the command line. I have backups going back 30 days. I even double-checked and yes, my NFS server has files in the backup directory for each of the past 30 days and they have data in them. There’s probably even a way to restore one of those backups if something goes horribly wrong. Onwards!
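The same sanity check can be done from a shell. This is a sketch under some assumptions: the NFS backup share is mounted somewhere you can reach (/mnt/vcenter-backups is a placeholder), you point it at the directory that directly holds one folder per backup, and the function name is hypothetical. It counts backup folders modified in the last 30 days that contain at least one non-empty file.

```shell
# Hypothetical helper: print how many backup directories directly under $1
# were modified in the last $2 days and contain at least one non-empty file.
count_recent_backups() {
    root="${1:-/mnt/vcenter-backups}"   # placeholder mount point
    days="${2:-30}"

    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime -"$days" |
    while IFS= read -r dir; do
        # -size +0c matches files larger than zero bytes
        if [ -n "$(find "$dir" -type f -size +0c | head -n 1)" ]; then
            echo "$dir"
        fi
    done | wc -l | tr -d ' '
}

# Usage: count_recent_backups /mnt/vcenter-backups 30
```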

I was already logged into the vCenter appliance shell as root. The next thing I needed to do was to figure out where the command line tools were hidden. I found them in /usr/lib/applmgmt/support/scripts.

Disclaimer: I work at VMware, but I have no idea if the following is an “acceptable practice” or not. If your production vCenter is broken and you have a support contract, call support. If you’re messing around on a home or test system and you don’t care how badly you screw it up, feel free to try the command line tools.

root@vcenter [ ~ ]# cd /usr/lib/applmgmt/support/scripts
root@vcenter [ /usr/lib/applmgmt/support/scripts ]# ls -al
total 108
drwxr-xr-x 4 root root  4096 Aug 30 18:18 .
drwxr-xr-x 4 root root  4096 Aug 30 18:18 ..
-r-xr-xr-x 1 root root   205 Aug 15 07:16
-r-xr-xr-x 1 root root   633 Aug 15 07:16 manifest-verification
-r-xr-xr-x 1 root root   286 Aug 15 07:16
-r-xr-xr-x 1 root root  2056 Aug 15 07:16
-r-xr-xr-x 1 root root  3396 Aug 15 07:16
drwxr-xr-x 2 root root  4096 Aug 30 18:18 postinstallscripts
-r-xr-xr-x 1 root root  5207 Aug 15 07:16
-r-xr-xr-x 1 root root  4171 Aug 15 07:16
-r-xr-xr-x 1 root root   251 Aug 15 07:16
-r-xr-xr-x 1 root root  4001 Aug 15 07:16
-r-xr-xr-x 1 root root  3910 Aug 15 07:16
-r-xr-xr-x 1 root root 35773 Aug 15 07:16
-r-xr-xr-x 1 root root  8085 Aug 15 07:16
drwxr-xr-x 2 root root  4096 Aug 30 18:18 tests

These are the Python scripts that are linked to the Command shell. I’m actually in the root shell. I can run these directly from the root shell, or exit back to the Command shell and use them the “official” way. In case I need to pull in support, let’s do this the official way.

The script is what does the upgrade. Let’s exit back to the Command shell and see what it says it supports.

root@vcenter [ /usr/lib/applmgmt/support/scripts ]# exit
Command> software-packages
usage: software-packages [-h] {stage,unstage,validate,install,list} ...

optional arguments:
  -h, --help            show this help message and exit

    stage               Stage software update packages
    unstage             Purge staged software update packages
    validate            Validate software update packages
    install             Install software update packages
    list                List details of software update packages

Stage the packages for the update

Since the appliance wasn’t letting me upgrade, I thought I’d first check whether I already had upgrades staged.

Command> software-packages list --staged
 [2021-01-22T21:45:41.022] : Packages not staged

OK. Nothing staged. How do I stage packages?

Command> software-packages stage --help
usage: software-packages stage [-h] [--url [URL]] [--iso] [--acceptEulas] [--thirdParty]

optional arguments:
  -h, --help     show this help message and exit
  --url [URL]    Download software update package from URL. If no url is specified,
                 catalog/valm/vmw/8dc0de9a-feedl-1337-be0a-6ddeadbeefa3/ is used.
  --iso          Load software update packages from CD/DVD drive attached to the appliance
  --acceptEulas  accept all Eulas
  --thirdParty   Stage third party packages.--thirdParty should only be usedwith --url.

Sounds clear enough. I’ll try that:

Command> software-packages stage --url --acceptEulas
 [2021-01-22T21:46:28.022] : Latest updates already installed on VCSA, Nothing to stage

Well, that’s not correct. There’s definitely an update available. Re-reading the help, I notice that the default URL looks something like:

I’ve obfuscated the actual URL, but that’s a vCenter 6.7.0 URL, I’m using 7.0.0, and I want 7.0.1.

I go back to the appliance web UI and click the Update > Settings button.

vCenter 7 Appliance Update screen

Settings shows a different URL for 7.0.1, so I copy and paste that into the command line:

Command> software-packages stage --acceptEulas --url
 [2021-01-22T21:48:28.022] : Target VCSA version =
 [2021-01-22 21:48:28,781] : Running requirements script.....

Update as of 2021-09-21: I just found out about the update.get and update.set commands, used to find and set the default URL used for downloading updates on the command line.

If you type:


… you’ll get the Currenturl (set when you first installed vCenter) and the Defaulturl (what you should be using to update vCenter). If you then type:

update.set --currentURL default

The Currenturl gets set to the Defaulturl. After that you can type:

software-packages stage --url --acceptEulas

… and the software gets staged from the Currenturl, which is the same URL used by the vCenter GUI.

Installing a specific version of vCenter

Update as of 2021-10-26: The steps shown above are fine if you want to stage the latest update, but what if you want a specific version of vCenter, not the latest?

Right now I’ve got a vCenter and there are two updates available, and if I run update.get:

Command> update.get
Checkupdates: disabled
Time: 00:00:00
Day: Everyday
Latestupdateinstalltime: 2021-09-23T00:03:48.493Z
Latestupdatequerytime: ''
Username: ''
Password: ''

(License number obfuscated in the above URLs, use your own.)

Note the “.latest” at the end of the URLs. If I use that URL for staging, but change the version to the specific version that I want (without the .latest extension):

software-packages stage --url

I’ve just staged for install, and that’s the version that will be installed, even though there’s a later version available.

Trust but verify

A little while later everything was staged. I decided to validate everything.

Command> software-packages validate
 [2021-01-22T21:50:11.022] : For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
Enter Single Sign-On administrator password:

 [2021-01-22T21:50:22.022] : Validating software update payload
 [2021-01-22 21:50:22,327] : Running validate script.....
 [2021-01-22T21:50:26.022] : Validation successful
 [2021-01-22T21:50:26.022] : Validation process completed successfully

Then I check to see what’s staged:

Command> software-packages list --staged
 [2021-01-22T21:50:45.022] :
        category: Bugfix
        leaf_services: ['vmware-pod', 'vsphere-ui', 'wcp']
        vendor: VMware, Inc.
        name: VC-7.0U1c
        size in MB: 5107
        tags: []
        version_supported: []
        productname: VMware vCenter Server
        releasedate: December 17, 2020
        updateversion: True
        allowedSourceVersions: [,]
        buildnumber: 17327517
        rebootrequired: False
        summary: {'id': 'patch.summary', 'translatable': 'In-place upgrade for vCenter appliances.', 'localized': 'In-place upgrade for vCenter appliances.'}
        type: Update
        severity: Critical
        TPP_ISO: False
        thirdPartyAvailable: False
        nonThirdPartyAvailable: True
        thirdPartyInstallation: False
        timeToInstall: 0
        requiredDiskSpace: {'/storage/core': 30.353511543273928, '/storage/seat': 32.21015625}
        eulaAcceptTime: 2021-01-22 21:48:37 UTC

Well, that shows:


Which is the version I’ve been trying to upgrade to, so that looks good.

Did I mention that I have backup copies of vCenter going back 30 days? Well I do. If this goes really sideways I’m going to have to restore one of them.

Let’s do the update!

Command> software-packages install --staged
 [2021-01-22T21:51:23.022] : For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
Enter Single Sign-On administrator password:

 [2021-01-22T21:51:43.022] : Validating software update payload
 [2021-01-22 21:51:43,716] : Running validate script.....
 [2021-01-22T21:51:47.022] : Validation successful
 [2021-01-22 21:51:47,730] : Copying software packages 251/251
 [2021-01-22 21:55:37,642] : Running system-prepare script.....
 [2021-01-22 21:55:42,661] : Running test transaction ....
 [2021-01-22 21:55:44,678] : Running prepatch script...
 [2021-01-22 21:58:27,896] : Upgrading software packages ....
 [2021-01-22T22:02:10.022] : Setting appliance version to build 17327517
 [2021-01-22 22:02:10,242] : Running patch script.....
 [2021-01-22 22:11:34,245] : Starting all services ....
 [2021-01-22T22:11:35.022] : Services started.
 [2021-01-22T22:11:35.022] : Installation process completed successfully

That was it. The actual update took about 20 minutes, and although the UI said no reboot was necessary, vCenter did reboot during the update. When it was done vCenter was running version

The vCenter appliance Update “Stage Only” and “Stage and Install” buttons are still greyed-out and unselectable, but right now there are no updates available so that’s how they should be. I’ll have to wait for the next update to see if they’re working again. If the buttons are still broken, at least now I know how to use the command line to install an update.

Hope you find this useful.

“Package discrepency error, Cannot resume!”

Update as of 2021-06-30: I have successfully upgraded a couple of times since I wrote this article using the GUI and the “Stage Only” and “Stage and Install” buttons are no longer greyed out when an update is available.

I did run into an issue upgrading from to where I got the error “Package discrepency error, Cannot resume!” [sic] when I tried to stage the update. Also when upgrading from to Both times I resolved the error and got the upgrades to install by following the steps in William Lam’s article Stage Only & Stage and Install buttons disabled when updating to vSphere 7.0 Update 2a. According to William these steps will need to be repeated until 7.0.3 is released:

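Command> shell
# Clear out the staged update packages and Update Manager downloads
rm -rf /storage/core/software-update/updates
rm -rf /storage/updatemgr/software-*
# Reset the appliance's saved software-update state and patching database
rm /etc/applmgmt/appliance/software_update_state.conf
rm /storage/db/patching.db*
# Remove anything else left in the software-update directory
rm -r /storage/core/software-update/*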

Update as of 2021-10-26: I tried the UI today to upgrade from vCenter to, and the UI still failed, so I used the command line to upgrade to

Once was installed I was able to upgrade to using the UI, so it looks like the UI problem has been resolved in 7.0.3 as William said it would be.

“Test transaction failed to update packages”

Update as of 2021-09-21: I was upgrading a couple of vCenter instances today to the latest release and on one vCenter I got the error:

 [2021-09-21T17:35:56.264] : Validating software update payload
 [2021-09-21T17:35:56.264] : UpdateInfo: Using product version and build 17958471
 [2021-09-21 17:35:56,064] : Running validate script.....
 [2021-09-21T17:36:00.264] : Validation successful
 [2021-09-21 17:36:00,084] : Copying software packages 152/152
 [2021-09-21 17:55:01,033] : Running system-prepare script.....
 [2021-09-21 17:55:06,053] : Running test transaction ....
 [2021-09-21T17:55:07.264] : Installation process failed
 [2021-09-21T17:55:07.264] : Test transaction failed to update packages

“Test transaction failed to update packages” means something failed during the package install, so I read through /var/log/vmware/applmgmt/software-packages.log and looked for lines containing ERR. It turned out that I had run out of log space in /storage/log. Once I freed up some space, I re-ran the update and it installed fine.
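Those checks can be bundled into a small sketch for the appliance’s root shell (type shell at the Command> prompt). The function name is hypothetical; the log path and /storage/log are the ones from this failure.

```shell
# Hypothetical helper: show error lines from the update log and how full
# the log partition is, plus its largest items.
diagnose_update_failure() {
    log="${1:-/var/log/vmware/applmgmt/software-packages.log}"
    logfs="${2:-/storage/log}"

    echo "== Lines containing ERR in $log =="
    grep ERR "$log"

    echo "== Free space on $logfs =="
    df -h "$logfs"

    echo "== Largest items under $logfs =="
    du -xsh "$logfs"/* 2>/dev/null | sort -h | tail -n 5
}

# Usage: diagnose_update_failure
```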