Running firewalls on vSphere
Everyone with a homelab runs some sort of firewall in their environment, either as a physical box (Ubiquiti Dream Machine, Sophos SG/XG appliances, pfSense appliances, whitebox firewall) or as a virtual machine on vSphere, Hyper-V, Proxmox etc. As a long-time fan of Sophos UTM (formerly known as Astaro), I ran it on a Sophos appliance and then moved it to a Dell R220, see blog post here. Unfortunately Sophos recently announced on their Partner news portal that Sophos UTM will be EOL (End-of-Life) by June 30, 2026. You can find this information here, here and here. Although the EOL is still more than 3 years away, I have already planned the migration to my next firewall. Based on my recent challenges I wanted to write this post about running firewalls on vSphere.
History of Sophos firewall
Sophos has 2 firewall products:
- Sophos UTM
- Sophos XG
Sophos UTM came from the Astaro acquisition in 2011, whereas Sophos XG is the next-generation firewall Sophos built from scratch. For a long-time Astaro UTM / Sophos UTM fan, the most obvious choice would be to move to Sophos XG. But when Sophos XG was first released in November 2015 (v15), it was a horrible successor to Sophos UTM. Everything had changed: many features UTM had were not implemented, the interface was completely new etc. So I decided at that point in time not to move to XG.
Sophos also offers a home user license that differs a little bit between these 2 products. For the UTM you have a limitation of 50 IP addresses that you can protect (I had a 100 IP version as I did beta testing for Astaro in the past). For the XG you have a hardware limitation, 4 Cores and 6GB memory. Although you can have more cores and memory installed, XG will only use the limit.
Sophos UTM to Sophos XG migration preparation
Almost 8 years after the initial release I decided to give Sophos XG another shot and see how it has evolved. I wanted minimal downtime for my internet connection, so I decided to deploy a virtual XG in my homelab, manually migrate all the configuration I have in UTM, then export the config and import it into the new firewall. Unfortunately some configurations do not work in XG the way they did in UTM. Also, the use of predefined objects is not as good as it is in UTM.
Migration to Sophos XG on bare metal
After the initial migration of the most important configuration I went for the bare metal installation of XG. For the current Sophos UTM I’m running a Dell R220 (OEM) with the following specs:
- Intel Xeon E3-1231 v3 (4 Cores + HT)
- 8GB UDIMM ECC Memory
- 2 onboard NICs (Broadcom BCM5720)
- Quad port NIC card (Broadcom BCM5719, HP Ethernet 1Gb 4-port 331T Adapter)
The network configuration is as follows:
- Onboard NIC 1 -> eth0 in UTM -> Internal
- Onboard NIC 2 -> eth1 in UTM -> External -> Cable Modem
- Quad NIC 1 (first from the left) -> eth2 in UTM -> Transfer LAN to Cisco SG300
- Quad NIC 2 (second from the left) -> eth3 in UTM -> VLAN for IOT
- Quad NIC 3/4 -> eth4/5 in UTM -> currently unused or for network testing
Because I had some SSDs lying around, I used a new one for the XG installation to be sure that I could go back to UTM if something happened and the migration did not go as expected. Unfortunately the first and most important thing that changed with the installation of Sophos XG was the enumeration of the physical NICs. Now eth0-3 was the quad port NIC and eth4-5 the onboard one. As a little Monk, I can’t stand this. I tried to see if there is a way to overcome this by changing the order within Sophos XG, but I haven’t found a way to do this.
Migration to Sophos XG on vSphere
I thought about some alternatives and decided to install ESXi on the Dell R220 in the hope that with a virtual machine I won’t have this problem. After installing ESXi 7.0 U3i from the Dell custom ISO and updating to the latest version, all of the NICs showed up in the correct order.
I decided to create a vSwitch for every vmnic and create the required port groups there. When I first deployed XG from the OVF, it came with 3 network adapters, and these 3 adapters corresponded to the numbering scheme in vSphere: network card 1 was Port A, network card 2 was Port B and network card 3 was Port C.
When I then added 3 additional network cards, it started to get messy. Now the ordering of the NICs inside the VM was mixed up: network card 1 was not PortA, instead it was PortB etc. To be sure, I also installed UTM as a virtual machine and encountered the same problem there.
PCI Slot numbering in vSphere
It took some time to figure out why this was happening. In the beginning I thought the ordering was somehow based on the MAC address, but that was wrong. I also tried to re-order the NICs within the OS, but for both XG and UTM there is nothing you can do from an OS perspective. By accident I stumbled across PCI slot numbering in VMs, so I checked the numbers in the advanced configuration of the VMs.
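If you prefer to read these values from the VM's .vmx file instead of clicking through the advanced configuration, a minimal sketch could look like the one below. It assumes the usual `key = "value"` layout of a .vmx file and that ethernet0 corresponds to network card 1 in the vSphere UI. The sample text and the helper name `read_nic_slots` are purely illustrative and not taken from my actual VM.

```python
import re

# Matches lines such as: ethernet0.pciSlotNumber = "192"
# (assumed .vmx layout; the key is matched case-insensitively)
_SLOT_RE = re.compile(
    r'^\s*ethernet(\d+)\.pciSlotNumber\s*=\s*"(\d+)"\s*$', re.IGNORECASE
)


def read_nic_slots(vmx_text: str) -> dict[int, int]:
    """Return {adapter index: pciSlotNumber} for every ethernetN entry."""
    slots = {}
    for line in vmx_text.splitlines():
        match = _SLOT_RE.match(line)
        if match:
            slots[int(match.group(1))] = int(match.group(2))
    return slots


# Illustrative sample only: a VM with a SCSI controller and 3 network cards.
SAMPLE_VMX = """\
scsi0.pciSlotNumber = "160"
ethernet0.pciSlotNumber = "192"
ethernet1.pciSlotNumber = "224"
ethernet2.pciSlotNumber = "256"
"""

if __name__ == "__main__":
    for index, slot in sorted(read_nic_slots(SAMPLE_VMX).items()):
        # ethernet0 is assumed to be "Network card 1" in the vSphere UI
        print(f"Network card {index + 1}: pciSlotNumber {slot}")
```

The SCSI controller line is skipped on purpose, because only the ethernetN entries matter for the NIC ordering.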
When a VM is created and powered on with 3 network adapters, the ethernetX.pcislotnumber values in the advanced configuration of this VM are 192, 224 and 256 (the same values as in the table below).
For PCIe devices the numbering starts with 160. In this case 160 is used for either the LSI SAS or the Paravirtual SCSI controller of the VM, so the network cards start at 192. After adding 3 additional cards, the new adapters get the values 1184, 1216 and 1248.
As you can see, the pcislotnumber values increase continuously with the network card number. Unfortunately, from an OS perspective it looks like this now:
| VM network card | pcislotnumber | OS network card |
|---|---|---|
| Network card 1 | 192 | eth1 |
| Network card 2 | 224 | eth3 |
| Network card 3 | 256 | eth5 |
| Network card 4 | 1184 | eth0 |
| Network card 5 | 1216 | eth2 |
| Network card 6 | 1248 | eth4 |
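From what I observed, the Linux OS seems to ignore the leading digit of the four-digit values and order the NICs by the last 3 digits (184, 192, 216, 224, 248, 256). To fix this issue I just had to shut down the VM and change the pcislotnumber values so that the resulting order matches the order of the network cards.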
After this change the network cards in the Linux OS were in the correct order and I could complete the UTM/XG installation without any further issues.
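To double-check the "last 3 digits" observation, here is a small sketch that applies it to the values from the table and then derives one possible reassignment. It only models what I observed, not the real PCI enumeration logic. The suggested values are simply a permutation of the existing ones that satisfies the rule, so treat them as an illustration and not necessarily the exact values I ended up with. The function names are made up for this example.

```python
def predicted_os_order(slots: dict[str, int]) -> dict[str, str]:
    """Map each VM network card to the ethN name the observed rule predicts."""
    # Observed rule: ignore the leading digit of four-digit values,
    # i.e. compare only the last three digits (value modulo 1000).
    ranked = sorted(slots, key=lambda card: slots[card] % 1000)
    return {card: f"eth{position}" for position, card in enumerate(ranked)}


def suggest_reassignment(slots: dict[str, int]) -> dict[str, int]:
    """Reuse the existing slot values so that card N ends up as eth(N-1)."""
    values = sorted(slots.values(), key=lambda value: value % 1000)
    # Relies on the dict being listed in network card order (card 1 first).
    return dict(zip(slots, values))


# Values from the table above.
observed = {
    "Network card 1": 192,
    "Network card 2": 224,
    "Network card 3": 256,
    "Network card 4": 1184,
    "Network card 5": 1216,
    "Network card 6": 1248,
}

if __name__ == "__main__":
    print("Before:", predicted_os_order(observed))
    fixed = suggest_reassignment(observed)
    print("Suggested pcislotnumber values:", fixed)
    print("After:", predicted_os_order(fixed))
```

With the table values, the "Before" mapping reproduces the mixed-up order (network card 4 becomes eth0, network card 1 becomes eth1 and so on), while the suggested values put network card 1 to 6 on eth0 to eth5.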
After multiple hours and a long night I finally figured out what happened here and how to fix it. I will follow up with a “PCI Slot numbering: TL;DR” version, where I will explain in more detail what is happening in the backend. Based on what I know now, and without wasting too much time, I could have also just changed the virtual uplink to the right one. But as I said in the beginning, I’m a little Monk and therefore this solution doesn’t look good/pretty!
I also took the time to check another firewall solution (OPNsense) and the results were the same.
If you have further questions please contact me by mail or leave a comment below.