Deploy OpenStack Ussuri on Focal with MAAS 2.8 and Juju

David van der Spek
11 min read · Sep 3, 2020

Due to the global pandemic, my internship for my Master’s Degree in Bio-Pharmaceutical Sciences was abruptly interrupted. In an effort to find a project compatible with today’s remote working environment, I turned to my lifelong hobby of computers and IT. The journey started with Kubeflow, a platform for “making deployments of machine learning workflows on Kubernetes simple, portable and scalable” that centers around the Jupyter Notebooks used by the computational scientists at the faculty where I was doing my internship. However, due to the limited memory capacity of my home server, I ran into issues trying to run a somewhat production-ready Kubernetes cluster for a multi-user Kubeflow deployment. After buying 3 used Dell R620 servers for a very good price, I investigated which platform would let me get the most out of my new servers in terms of deployment flexibility while still supporting a production-ready Kubernetes cluster for Kubeflow. XCP-ng and Proxmox came to mind; however, I stumbled on a blog by 2stacks which encouraged me to look into OpenStack. While following his guides I decided to do things slightly differently from the start, choosing the latest official bundle of charmed OpenStack and Ubuntu 20.04 (Focal Fossa). Needless to say, this came with its own difficulties and learning curve, especially once I took a look at the reference OpenStack architecture by Dell and Canonical and decided I wanted to achieve a similar type of deployment, albeit without the high availability. This led me to Dimiter Naydenov’s excellent blog, where he goes into great detail on the more advanced VLAN setup seen in that reference architecture. After countless hours of digging through documentation and bug reports while referencing the previously mentioned blogs and the reference architecture, I’ve reached the point where I would like to give back to the community with my findings and an updated, expanded guide for deploying OpenStack using MAAS 2.8.1 and Juju.

Objective

Provide an easy-to-follow and complete guide for deploying a more production-ready version of charmed OpenStack while continuing to use the official OpenStack bundles. Just like the reference OpenStack architecture by Dell and Canonical, the MAAS node will double as a KVM host managed by MAAS. This enables us to deploy the Juju controller needed for deploying OpenStack to a VM on the MAAS node. At a later point we will also look at deploying logging, monitoring and alerting applications to one or more VMs on the MAAS node, again similar to the reference architecture.

Hardware requirements

  • 1 x Managed switch (switch with VLAN support), a second switch can simplify the configuration but is not required
  • 1 x Physical host with two network interfaces for MAAS (this can also be a VM if your host supports nested virtualization)
  • 3 x Physical hosts for OpenStack, each with two network interfaces and two storage disks
  • 1 x Router or Firewall that can provide internet access and routing between VLANs.

Networking

In this deployment the network layout by Dimiter Naydenov will be used, which is similar to the Dell and Canonical reference OpenStack architecture. Without going into too much detail on the Juju/MAAS network model, as the details can be found in the linked blog post and the official documentation, a few terms are of particular importance.
Spaces: “A space is a logical grouping of subnets that can communicate with one another.”
VLANs: “A common way to create logically separate networks using the same physical infrastructure.”
Fabrics: A means of connecting VLANs and allowing communication between them.
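
Once MAAS is installed later on, these concepts map directly onto MAAS CLI objects. As a small sketch of what that looks like (the profile name admin, the fabric id 0 and the VID 100 are placeholders for your environment):

# Create a space, then assign an existing VLAN to it.
maas admin spaces create name=internal-api
maas admin vlan update 0 100 space=internal-api
# List fabrics with their VLANs to find the right ids.
maas admin fabrics read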

OpenStack network layout

The following overview of the typical networks in an OpenStack deployment has been adapted from Dimiter Naydenov’s blog. While it might be slightly outdated, it is still very relevant.

The 7 different networks typically seen in an OpenStack deployment
  • admin — used for admin-level access to services, including for automating administrative tasks.
  • internal — used for internal endpoints and communications between most of the services.
  • public — used for public service endpoints, e.g. using the OpenStack CLI to upload images to glance.
  • external — used by neutron to provide outbound access for tenant networks.
  • data — used mostly for guest compute traffic between VMs and between VMs and OpenStack services.
  • storage(data) — used by clients of the Ceph/Swift storage backend to consume block and object storage contents.
  • storage(cluster) — used for replicating persistent storage data between units of Ceph/Swift.

The admin network is not shown in the above diagram to keep it readable. However, each application is also connected to the admin network (with the possible exception of the storage cluster units).

These separate networks will be mapped to various spaces in MAAS and Juju for the OpenStack deployment. An overview of the mapping of OpenStack applications to MAAS/Juju spaces, adapted from Dimiter Naydenov’s blog, can be found below.

OpenStack applications mapped to MAAS/Juju spaces
  • default space is used for MAAS PXE booting and Juju API servers.
  • admin-api space represents the OpenStack admin network.
  • internal-api space represents the OpenStack internal network.
  • public-api space represents the OpenStack public network.
  • storage-data space represents the OpenStack storage client network.
  • storage-cluster space represents the OpenStack storage cluster network.
  • compute-data space represents the OpenStack data network.
  • compute-external space represents the OpenStack external network.

Each network will be created using a VLAN and have a unique subnet assigned to it. In my personal deployment I have two additional VLANs, one for IPMI for MAAS to manage power to the servers and one for the default space to negate the need for a second switch.
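
To make the space mapping concrete, this is what it eventually looks like in a Juju bundle, where each application’s endpoints are bound to spaces. A minimal sketch (standard Juju bundle binding syntax; the public/admin/internal endpoint names follow the OpenStack API charms such as keystone, so adjust to the charms you deploy):

applications:
  keystone:
    bindings:
      "": internal-api        # default space for any unlisted endpoint
      public: public-api      # public API endpoint
      admin: admin-api        # admin API endpoint
      internal: internal-api  # internal API endpoint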

Physical network layout

In this setup, the MAAS node functions as a router/firewall and provides DNS and DHCP for all VLANs except the public and external VLANs. The MAAS node will be set up to route traffic from its secondary network interface, connected to the VLANs, to the router through its primary interface. MAAS must provide internet connectivity to the VLANs it manages in order to properly deploy OpenStack. To avoid the need for a second switch, the default network is set to VLAN 2 on ports 3 to 9. This means that on these ports VLAN 2 will need to be set as the untagged network, and thus these ports do not have access to the existing network.

Physical network connections and VLAN allocations

The following VLANs with their accompanying CIDRs will be configured in the switch. The above physical network diagram does not show the out-of-band management (OOBM) ports as these can differ between devices. For my setup, I have iDRAC7 Enterprise, which has a dedicated NIC and has been set up to use VLAN 11. To my knowledge, older versions of iDRAC do not support this setting, in which case the switch ports connecting to each server’s iDRAC controller would need their untagged (default) network set to VLAN 11.

  • default (for PXE boot): 10.14.0.0/20 (VID: 2)
  • public: 10.50.0.0/20 (VID: 50)
  • internal: 10.100.0.0/20 (VID: 100)
  • admin: 10.150.0.0/20 (VID: 150)
  • storage (for client data): 10.200.0.0/20 (VID: 200)
  • compute (for guest VM data): 10.250.0.0/20 (VID: 250)
  • external (for guest VM outbound): 10.99.0.0/20 (VID: 99)
  • cluster (for storage replication): 10.30.0.0/20 (VID: 30)
  • OOBM/IPMI (for power control): 192.168.11.0/24 (VID: 11)

MAAS node installation and configuration

The first step in setting up the MAAS node is to install the latest version of Ubuntu Server 20.04. The networking settings can be left at their defaults as long as the server has a working internet connection for updates. After installing Ubuntu Server and rebooting, the first thing we need to do is stop cloud-init from overwriting the netplan network configuration, so that our changes persist across reboots:

echo "network: {config: disabled}" | sudo tee /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg

Next, a few packages need to be installed so we can configure the VLANs and the routing/firewall.

sudo apt-get update
sudo apt install openssh-server vlan
sudo snap install ufw
sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
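
Before continuing, it is worth checking that the virtualization stack is usable (optional; kvm-ok comes from the cpu-checker package, which is not part of the list above):

sudo apt install cpu-checker
sudo kvm-ok              # should report that KVM acceleration can be used
sudo virsh list --all    # libvirt should respond with an (empty) list of domains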

Networking

Apart from the VLAN configuration, to allow the MAAS node to also be a KVM host managed by MAAS, a large number of bridges need to be configured in netplan. In the configuration file, each VLAN is linked to a network interface and each bridge is linked to a VLAN (except br0 and br1, which link directly to a network interface and should be connected to the existing network and the default network (10.14.0.0/20), respectively). To edit the netplan configuration use the command sudo nano /etc/netplan/00-installer-config.yaml. Below is the configuration file I use:

# This is the network config written by 'subiquity'
network:
  version: 2
  ethernets:
    enp5s0:
      dhcp4: false
      dhcp6: false
    enp6s0:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      interfaces: [enp5s0]
      dhcp4: no
      dhcp6: no
      addresses: [192.168.178.223/24]
      gateway4: 192.168.178.1
      nameservers:
        addresses: [192.168.178.1]
    br1:
      interfaces: [enp6s0]
      dhcp4: no
      dhcp6: no
      addresses: [10.14.0.1/20]
    # IPMI
    br11:
      interfaces: [vlan11]
      dhcp4: no
      dhcp6: no
      addresses: ["192.168.11.99/24"]
    # public
    br50:
      interfaces: [vlan50]
      dhcp4: no
      dhcp6: no
      addresses: ["10.50.0.2/20"]
    # internal
    br100:
      interfaces: [vlan100]
      dhcp4: no
      dhcp6: no
      addresses: ["10.100.0.1/20"]
    # admin
    br150:
      interfaces: [vlan150]
      dhcp4: no
      dhcp6: no
      addresses: ["10.150.0.1/20"]
    # storage
    br200:
      interfaces: [vlan200]
      dhcp4: no
      dhcp6: no
      addresses: ["10.200.0.1/20"]
    # compute
    br250:
      interfaces: [vlan250]
      dhcp4: no
      dhcp6: no
      addresses: ["10.250.0.1/20"]
    # external
    br99:
      interfaces: [vlan99]
      dhcp4: no
      dhcp6: no
      addresses: ["10.99.0.2/20"]
    # cluster
    br30:
      interfaces: [vlan30]
      dhcp4: no
      dhcp6: no
      addresses: ["10.30.0.1/20"]
  vlans:
    # IPMI
    vlan11:
      id: 11
      link: enp6s0
    # public
    vlan50:
      id: 50
      link: enp6s0
    # internal
    vlan100:
      id: 100
      link: enp6s0
    # admin
    vlan150:
      id: 150
      link: enp6s0
    # storage
    vlan200:
      id: 200
      link: enp6s0
    # compute
    vlan250:
      id: 250
      link: enp6s0
    # external
    vlan99:
      id: 99
      link: enp6s0
    # cluster
    vlan30:
      id: 30
      link: enp6s0

In my case, the primary network interface is named enp5s0 and the secondary network interface is named enp6s0. These values will need to be changed according to your hardware, along with the IP address on your existing network, the default gateway and the DNS address for the primary network interface of the MAAS node. The command ip addr can be used to display the available network interfaces of the system. Once the necessary changes have been made, execute sudo netplan apply to apply them and reboot for good measure. After rebooting it is a good idea to list the network interfaces using ip addr to ensure everything was applied correctly.
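
When configuring the node over SSH it can be safer to use netplan try first, which applies the configuration but rolls it back automatically unless it is confirmed:

sudo netplan try    # reverts after 120 seconds unless confirmed with ENTER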

Next, ufw will be used to set up MAAS to route traffic from the OpenStack nodes connected to the VLANs on its secondary network interface to the router through its primary interface. First, forwarding must be enabled in ufw. This can be done with sudo nano /etc/default/ufw, changing the forward policy to DEFAULT_FORWARD_POLICY="ACCEPT". Next, edit /etc/ufw/sysctl.conf and uncomment the line net/ipv4/ip_forward=1. To configure NAT for the VLAN interfaces: sudo nano /etc/ufw/before.rules. Below is the configuration file I use. In the NAT table rules, the -F flushes the table when ufw is loaded; without it, duplicate rules would be added on every restart of ufw, which might cause issues.
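
As an aside, the two edits above can also be made non-interactively; a sketch, assuming the stock Ubuntu 20.04 contents of both files:

# Switch the default forward policy from DROP to ACCEPT.
sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
# Uncomment the IPv4 forwarding sysctl.
sudo sed -i 's|^#net/ipv4/ip_forward=1|net/ipv4/ip_forward=1|' /etc/ufw/sysctl.conf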

#
# rules.before
#
# Rules that should be run before the ufw command line added rules. Custom
# rules should be added to one of these chains:
# ufw-before-input
# ufw-before-output
# ufw-before-forward
#

# NAT table rules
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]

# Forward traffic through br0 - change to match your out-interface
-F
-A POSTROUTING -s 10.14.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -s 10.30.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -s 10.100.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -s 10.150.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -s 10.200.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -s 10.250.0.0/20 -o br0 -j MASQUERADE
-A POSTROUTING -o br0 -j SNAT --to-source 192.168.178.223
#-D POSTROUTING -o br0 -j SNAT --to-source 192.168.178.223
#-A FORWARD -i br0 -o br1 -m state \
# --state RELATED,ESTABLISHED -j ACCEPT
#-A FORWARD -i br1 -o br0 -j ACCEPT

# don't delete the 'COMMIT' line or these nat table rules won't
# be processed
COMMIT

# Don't delete these required lines, otherwise there will be errors
*filter
:ufw-before-input - [0:0]
:ufw-before-output - [0:0]
:ufw-before-forward - [0:0]
:ufw-not-local - [0:0]
# End required lines


# allow all on loopback
-A ufw-before-input -i lo -j ACCEPT
-A ufw-before-output -o lo -j ACCEPT

# quickly process packets for which we already have a connection
-A ufw-before-input -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A ufw-before-output -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A ufw-before-forward -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# drop INVALID packets (logs these in loglevel medium and higher)
-A ufw-before-input -m conntrack --ctstate INVALID -j ufw-logging-deny
-A ufw-before-input -m conntrack --ctstate INVALID -j DROP

# ok icmp codes for INPUT
-A ufw-before-input -p icmp --icmp-type destination-unreachable -j ACCEPT
-A ufw-before-input -p icmp --icmp-type time-exceeded -j ACCEPT
-A ufw-before-input -p icmp --icmp-type parameter-problem -j ACCEPT
-A ufw-before-input -p icmp --icmp-type echo-request -j ACCEPT

# ok icmp code for FORWARD
-A ufw-before-forward -p icmp --icmp-type destination-unreachable -j ACCEPT
-A ufw-before-forward -p icmp --icmp-type time-exceeded -j ACCEPT
-A ufw-before-forward -p icmp --icmp-type parameter-problem -j ACCEPT
-A ufw-before-forward -p icmp --icmp-type echo-request -j ACCEPT

# allow dhcp client to work
-A ufw-before-input -p udp --sport 67 --dport 68 -j ACCEPT

#
# ufw-not-local
#
-A ufw-before-input -j ufw-not-local

# if LOCAL, RETURN
-A ufw-not-local -m addrtype --dst-type LOCAL -j RETURN

# if MULTICAST, RETURN
-A ufw-not-local -m addrtype --dst-type MULTICAST -j RETURN

# if BROADCAST, RETURN
-A ufw-not-local -m addrtype --dst-type BROADCAST -j RETURN

# all other non-local packets are dropped
-A ufw-not-local -m limit --limit 3/min --limit-burst 10 -j ufw-logging-deny
-A ufw-not-local -j DROP

# allow MULTICAST mDNS for service discovery (be sure the MULTICAST line above
# is uncommented)
-A ufw-before-input -p udp -d 224.0.0.251 --dport 5353 -j ACCEPT

# allow MULTICAST UPnP for service discovery (be sure the MULTICAST line above
# is uncommented)
-A ufw-before-input -p udp -d 239.255.255.250 --dport 1900 -j ACCEPT

# don't delete the 'COMMIT' line or these rules won't be processed
COMMIT

As ufw is not just routing here but also acting as a firewall, it is important to set the following rules for MAAS to be able to function properly.

sudo ufw allow 5240
sudo ufw allow 5248
sudo ufw allow 5241:5247/tcp
sudo ufw allow 5241:5247/udp
sudo ufw allow 5250:5270/tcp
sudo ufw allow 5250:5270/udp
sudo ufw allow 22
sudo ufw allow 5900
sudo ufw allow 7911
sudo ufw allow 53
sudo ufw allow 8000/tcp
sudo ufw allow 3128/tcp
sudo ufw allow bind9
sudo ufw allow ntp
sudo ufw allow tftp
sudo ufw allow 67:68/udp
sudo ufw allow 5787/udp
sudo ufw route allow in on br1 out on br0 from 10.14.0.0/20

Finally, restart ufw: sudo ufw disable && sudo ufw enable. The following commands can be useful for debugging iptables issues:
sudo iptables -t nat -v -x -n -L
sudo iptables -t nat -L -v
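
To confirm the allow rules and the route rule are in place after the restart:

sudo ufw status verbose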

MAAS installation and configuration

For this deployment the maas-test-db snap is used. However, for production deployments it is recommended to use a dedicated PostgreSQL installation. Start off by installing maas-test-db and maas from snap.
sudo snap install maas-test-db
sudo snap install maas

Next, initialize MAAS and create an admin user. Ensure that the MAAS URL shown during initialization uses the IP address of the primary (external) network interface of the MAAS node and change it if needed. In my case it shows: MAAS URL [default=http://192.168.178.223:5240/MAAS/]:

sudo maas init region+rack --database-uri maas-test-db:///
sudo maas createadmin
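
With the admin user created, it is also handy to log in to the MAAS CLI for later configuration work. A sketch (the profile name admin is arbitrary; substitute your own MAAS URL):

maas login admin http://192.168.178.223:5240/MAAS/ "$(sudo maas apikey --username=admin)"
maas admin subnets read    # quick check that the API responds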

For those reading this story, I have since moved on from OpenStack to Kubernetes on bare-metal. However, at some point in the future I do plan on finishing this article.
