I really like VSCode, and I use the ssh plugin to edit code on remote machines, but recently the ssh connection has been dropping all of the time, even when I’m editing code on another machine that’s on the same local network.
I’ve updated both my OS and VSCode multiple times recently, so I thought some bug had slipped into one of the updates and was causing the problem. I was somewhat correct. It seems that VSCode keeps a cache of data and code on the remote machine, and after an update VSCode was trying to use bits of the old cached data that were no longer supported.
To fix the problem I just removed the cache as follows:
Exit completely out of VSCode so that no VSCode processes are running. Force quit if you have to.
ssh to the remote machine(s) and delete the ~/.vscode-server directory with rm -Rf ~/.vscode-server/
If you get any “cannot remove [file]: Device or resource busy” errors, look for stuck processes with lsof | grep $HOME/.vscode-server | awk '{ print $2 }' | sort -u, kill those processes, then try removing the directory again.
Restart VSCode.
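The cleanup steps above can be sketched as a small script. This is only a sketch: it assumes `lsof` is installed on the remote machine, and you should review the PIDs it finds before killing anything.

```shell
#!/bin/sh
# clean-vscode-server.sh -- remove the VSCode server cache on a remote machine.
# Review the PIDs first: this signals every process holding a file open
# under ~/.vscode-server.
pids=$(lsof 2>/dev/null | grep "$HOME/.vscode-server" | awk '{ print $2 }' | sort -u)
for pid in $pids; do
    kill "$pid" 2>/dev/null   # polite SIGTERM first; use kill -9 only as a last resort
done
sleep 2
rm -Rf "$HOME/.vscode-server/"
```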
Once I did this all of my connection problems disappeared.
RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. It works by encapsulating an Infiniband (IB) transport packet and sending it over Ethernet. If you’re working with network applications that require high bandwidth and low latency, RDMA will give you lower latency, higher bandwidth, and a lower CPU load than an API such as Berkeley sockets.
Full disclosure: I used to work for a startup called Bitfusion, and that startup was bought by VMware, so I now work for VMware. At Bitfusion we developed a technology for accessing hardware accelerators, such as NVIDIA GPUs, remotely across networks using TCP/IP, Infiniband, and PVRDMA. I still work on the Bitfusion product at VMware, and spend a lot of my time getting AI and ML workloads to work across networks on virtualized GPUs.
In my lab I’m using Mellanox ConnectX-5 and ConnectX-6 cards on hosts that are running ESXi 7.0.2 and vCenter 7.0.2. The cards are connected to a Mellanox Onyx MSN2700 100GbE switch.
Since I’m working with Ubuntu 18.04 and 20.04 virtual machines (VMs) in a vCenter environment, I have a few options for high-speed networking:
I can use PCI passthrough to pass the PCI network card directly through to the VM and use the network card’s native drivers on the VM to set up a networking stack. However, this means that my network card is only available to a single VM on the host and can’t be shared between VMs. It also breaks vMotion (the ability to live-migrate the VM to another host) since the VM is tied to a specific piece of hardware on a specific host. I’ve set this up in my lab but stopped doing this because of the lack of flexibility and because we couldn’t identify any performance difference compared to SR-IOV networking.
I can use SR-IOV virtual functions (VFs) to make the single card appear as if it’s multiple network cards with multiple PCI addresses, pass those through to the VM, and use the network card’s native drivers on the VM to set up a networking stack. I’ve set this up in my lab as well. I can share a single card between multiple VMs and the performance is similar to PCI passthrough. The disadvantage is that setting up SR-IOV and configuring the VFs is specific to a card’s model and manufacturer, so what works in my lab might not work in someone else’s environment.
I can set up PVRDMA networking and use the PVRDMA driver that comes with Ubuntu. This is what I’m going to show how to do in this article.
Set up your physical switch
First, make sure that your switch is set up correctly. On my Mellanox Onyx MSN2700 100GbE switch that means:
Enable the ports you’re connecting to.
Set the speed of each port to 100G.
Set auto-negotiation for each link.
Set the MTU to 9000.
Set the flow control mode to Global.
LAG/MLAG: no
LAG mode: on
Set up your virtual switch
vCenter supports Paravirtual RDMA (PVRDMA) networking using Distributed Virtual Switches (DVS). This means you set up a virtual switch in vCenter and connect your VMs to it.
In vCenter navigate to Hosts and Clusters, then click the Data Center icon (it looks like a sphere or globe with a line under it). Find the cluster you want to add the virtual switch to, right-click it, and select Distributed Switch > New Distributed Switch.
Name: “rdma-dvs”
Version: 7.0.2 – ESXi 7.0.2 and later
Number of uplinks: 4
Network I/O control: Disabled
Default port group: Create
Port Group Name: “VM 100GbE Network”
VLAN Type: VLAN (If you are using a VLAN)
VLAN ID: (the VLAN ID associated with the subnet you’re using for this network)
Figure out which NIC is the right NIC
Go to Hosts and Clusters
Select the host
Click the Configure tab, then Networking > Physical adapters
Note which NIC is the 100GbE NIC for each host
Add Hosts to the Distributed Virtual Switch
Go to Hosts and Clusters
Click the DataCenter icon
Select the Networks top tab and the Distributed Switches sub-tab
Right click “rdma-dvs”
Click “Add and Manage Hosts”
Select “Add Hosts”
Select the hosts. Use “auto” for uplinks.
Select the physical adapters based on the list you created in the previous step, or find the Mellanox card in the list and add it. If more than one is listed, look for the card that’s “connected”.
Manage VMkernel adapters (accept defaults)
Migrate virtual machine networking (none)
Tag a vmknic for PVRDMA
PVRDMA requires an out-of-band (OOB) communication channel (outside of the RDMA protocol) to exchange information that enables virtualization to work. The ESXi Net.PVRDMAVmknic setting determines which vmknic the OOB communication happens on. It has no effect on the data path or on other vmk services (vMotion, vSAN, Provisioning, Management, etc.); those are turned on or off on a per-vmk basis.
Select an ESXi host and go to the Configure tab
Go to System > Advanced System Settings
Click Edit
Filter on “PVRDMA”
Set Net.PVRDMAVmknic = "vmk0"
Repeat for each ESXi host.
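The same setting can be made from an ESXi shell or SSH session instead of the GUI. This is a sketch: it assumes shell access is enabled on the host and that your ESXi build exposes the setting as /Net/PVRDMAVmknic.

```shell
# Tag vmk0 as the PVRDMA vmknic from the ESXi command line (run on each host).
esxcli system settings advanced set -o /Net/PVRDMAVmknic -s vmk0

# Confirm the value took:
esxcli system settings advanced list -o /Net/PVRDMAVmknic
```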
Set up the firewall for PVRDMA
Select an ESXi host and go to the Configure tab
Go to System > Firewall
Click Edit
Scroll down to find pvrdma and check the box to allow PVRDMA traffic through the firewall.
Repeat for each ESXi host.
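From an ESXi shell the equivalent is a one-liner (a sketch; it assumes the firewall ruleset is named pvrdma on your build, which you can confirm with esxcli network firewall ruleset list):

```shell
# Allow PVRDMA traffic through the ESXi firewall (run on each host).
esxcli network firewall ruleset set --ruleset-id pvrdma --enabled true
```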
Set up Jumbo Frames for PVRDMA
To enable jumbo frames in a vCenter cluster that uses virtual switches, you have to set an MTU of 9000 on the Distributed Virtual Switch.
Click the Data Center icon.
Click the Distributed Virtual Switch that you want to set up, “rdma-dvs” in this example.
Go to the Configure tab.
Select Settings > Properties.
Look at Properties > Advanced > MTU. It should be set to 9000. If it’s not, click Edit and change it.
Set up the VMs

In order for RDMA to work, the vmw_pvrdma module has to be loaded after several other modules. Maybe someone else knows a better way to do this, but the method that I got to work was adding a script, /usr/local/sbin/rdma-modules.sh, to ensure that the Infiniband modules are loaded on boot, then calling that from /etc/rc.local so it gets executed at boot time.
#!/bin/bash
# rdma-modules.sh
# modules that need to be loaded for PVRDMA to work
/sbin/modprobe mlx4_ib
/sbin/modprobe ib_umad
/sbin/modprobe rdma_cm
/sbin/modprobe rdma_ucm
# Once those are loaded, reload the vmw_pvrdma module
/sbin/modprobe -r vmw_pvrdma
/sbin/modprobe vmw_pvrdma
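The /etc/rc.local that calls it can be as minimal as the sketch below; on Ubuntu 18.04/20.04, systemd runs /etc/rc.local at boot if it exists and is executable, so both files need the execute bit set (chmod +x).

```shell
#!/bin/sh
# /etc/rc.local -- runs at boot; load the RDMA modules in the right order
/usr/local/sbin/rdma-modules.sh
exit 0
```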
Once that’s done just set up the PVRDMA network interface the same as any other network interface.
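For a quick test you can bring the interface up with the ip tool. This is a sketch: the interface name ens192 is an assumption (check yours with ip link), and the address matches the example subnet used later in this article. For a persistent configuration on Ubuntu you’d put the equivalent settings in netplan.

```shell
# Bring up the PVRDMA interface with a jumbo MTU to match the switch and DVS.
ip link set ens192 mtu 9000
ip addr add 192.168.128.39/24 dev ens192
ip link set ens192 up
```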
Testing the network
To verify that I’m getting something close to 100Gbps on the network I use the perftest package.
To test bandwidth I pick two VMs on different hosts. On one VM I run:
$ ib_send_bw --report_gbits
On the other VM I run the same command, adding the IP address of the PVRDMA interface on the first machine:
$ ib_send_bw --report_gbits 192.168.128.39
That sends a bunch of data across the network and reports the measured bandwidth. I’m getting an average of 96.31 Gbps over the network connection.
I can also check the latency using ib_send_lat.
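As with the bandwidth test, ib_send_lat runs as a server on one VM and a client on the other (using the same example address as above):

```shell
# On the first VM (server side):
ib_send_lat

# On the second VM, pointing at the first VM's PVRDMA address:
ib_send_lat 192.168.128.39
```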
Hope you find this useful.