Improving your VM performance in OpenStack: NUMA and CPU Pinning

Memory has a large impact on the performance of your workload. This is especially true if your workload runs on a VM, so you must be careful with your memory settings and with NUMA if your machine supports it.

But wait! What is NUMA?

In the past, processors were designed as Symmetric Multi-Processing (SMP) or Uniform Memory Access (UMA) machines, which means that all processors shared the same access to all the memory available in the system.

UMA architecture example. Source: Intel.com

However, that changed a few years ago with the AMD Opteron and Intel Nehalem processors. They implemented a new architecture called Non-Uniform Memory Access (NUMA), or more precisely Cache-Coherent NUMA (ccNUMA). In this architecture each processor has a “local” bank of memory, to which it has much closer (lower latency) access. A processor can still access the whole memory available in the system, but at a potentially higher latency and with lower performance.

NUMA architecture example. Source: Intel.com

As you can see in the diagram, each processor has a “local” bank of memory. If data resides in local memory, access is very fast; however, if your data resides in remote memory, access is slower and you take a performance hit.
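If the numactl package is installed, you can also query how the kernel weighs local versus remote access: numactl --hardware prints a distance table where 10 means local and larger values mean remote, slower access. The figures below are only an illustration; your machine will report its own values.

numactl --hardware | grep -A 3 distances
node distances:
node   0   1
  0:  10  21
  1:  21  10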

CPU pinning and OpenStack

In OpenStack, if you create a VM using a flavor with 2 or more vCPUs, your vCPUs could be mapped to different physical memory zones (NUMA node0 and NUMA node1), which would mean that your vCPUs need to access two different memory zones. This is a major problem if you want to squeeze out performance. Let’s see how we can deal with this problem.

First of all, you should check whether your machine supports NUMA. You can check it by issuing the following command:

lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71

As you can see in the example, my machine has 2 NUMA nodes. Cores 0 to 17 belong to the first NUMA node and cores 18 to 35 belong to the second one. Don’t take cores 36 to 71 into consideration, since they are the additional threads that HyperThreading provides. For CPU pinning it’s very important to know which cores are physical and which are HyperThreading siblings; use the lscpu tool to tell them apart:

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
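The summary above doesn’t show which logical CPU is the HyperThreading sibling of which physical core. Assuming a reasonably recent util-linux and kernel, a per-CPU listing (entries sharing the same CORE value are siblings) or the sysfs topology files make the pairing explicit:

lscpu -e=CPU,NODE,SOCKET,CORE
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list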

Lastly, check that your hypervisor is aware of NUMA topology.

virsh nodeinfo
CPU model:           x86_64
CPU(s):              72
CPU frequency:       2099 MHz
CPU socket(s):       1
Core(s) per socket:  18
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         150582772 KiB
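virsh nodeinfo only reports how many NUMA cells exist. If you want libvirt’s full view of which CPUs and how much memory belong to each cell, run the command below and look at the <topology> element inside <host>; on recent libvirt versions each <cpu> entry there also carries a siblings attribute that confirms the HyperThreading pairs.

virsh capabilities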

On each compute node where pinning of virtual machines will be allowed, we need to edit the nova.conf file and set the following option:

vcpu_pin_set=0-30

This option specifies a list or range of physical CPU cores to reserve for VMs. OpenStack will ensure that your VMs are pinned to these CPU cores.
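For reference, a minimal sketch of how this looks in /etc/nova/nova.conf, assuming the option sits in the [DEFAULT] section (where vcpu_pin_set is defined):

[DEFAULT]
vcpu_pin_set=0-30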

Now, restart nova-compute on each compute node (the service name may be different if you’re using Ubuntu):

systemctl restart openstack-nova-compute

We have configured our VMs to be pinned to cores 0 to 30. We also need to ensure that host processes do not run on these cores. We can achieve that with the isolcpus kernel argument.

On Red Hat 7 and derivatives you can edit boot options using grubby:

grubby --update-kernel=ALL --args="isolcpus=0-30"

Update your boot record after that:

grub2-install your_boot_device

And of course reboot your machine.
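After the reboot, a quick sanity check that the argument actually made it onto the running kernel’s command line:

grep -o "isolcpus=[0-9,-]*" /proc/cmdline
isolcpus=0-30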

NUMA and OpenStack Scheduler: How to?

We’re very close to being able to launch VMs pinned to our physical cores. However, we need to set up a few more things.

First of all, we need to edit the nova-scheduler filters. We need to add the AggregateInstanceExtraSpecsFilter and NUMATopologyFilter values to the list of scheduler_default_filters. We’ll be using these filters to segregate compute nodes that can be used for CPU pinning from those that cannot, and to apply NUMA-aware scheduling rules when launching instances.

Your scheduler_default_filters should look similar to this one:

scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,
ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,
NUMATopologyFilter,AggregateInstanceExtraSpecsFilter
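These filters are read from nova.conf on the node running nova-scheduler (normally the controller), so restart that service after editing. As with nova-compute, this is the Red Hat service name and may differ on Ubuntu:

systemctl restart openstack-nova-scheduler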

Now, create the “numa_nodes” host aggregate. These hosts will host our pinned VMs:

nova aggregate-create numa_nodes

We need to set some metadata on the “numa_nodes” aggregate. This metadata will be matched against the flavors used to instantiate our VMs pinned to physical cores:

nova aggregate-set-metadata 1 pinned=true

We’re also going to create another host aggregate for hosts which will not host pinned VMs:

nova aggregate-create normal
nova aggregate-set-metadata 2 pinned=false

Update the existing flavors so that their extra specs match them to the compute hosts in the “normal” aggregate:

for FLAVOR in `nova flavor-list | cut -f 2 -d ' ' | grep -o '[0-9]*'`; \
 do nova flavor-key ${FLAVOR} set \
 "aggregate_instance_extra_specs:pinned"="false"; \
done

Create a new flavor for our pinned VMs:

nova flavor-create m1.small.numa 6 2048 20 2

We need to set the hw:cpu_policy flavor extra specification to dedicated. This option specifies that all instances created using this flavor will require dedicated compute resources and will be pinned to physical cores accordingly:

nova flavor-key flavor_id set hw:cpu_policy=dedicated

Set the aggregate_instance_extra_specs:pinned flavor extra specification to true:

nova flavor-key flavor_id set aggregate_instance_extra_specs:pinned=true
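Putting both together for the m1.small.numa flavor we created above (its ID is 6 in the flavor-create command):

nova flavor-key 6 set hw:cpu_policy=dedicated
nova flavor-key 6 set aggregate_instance_extra_specs:pinned=true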

Lastly, add some compute hosts to our “numa_nodes” aggregate. Compute nodes which are not intended to be targets for pinned instances should be added to our “normal” aggregate:

nova aggregate-add-host 1 compute-node-1
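A compute node that should keep running ordinary, unpinned instances would go into the “normal” aggregate instead, and once everything is in place you can boot an instance with the new flavor and confirm the pinning with virsh on the compute node that received it. The host name, image name, and libvirt domain name below are placeholders; virsh list on the compute node will show the real domain name.

nova aggregate-add-host 2 compute-node-2
nova boot --image your_image --flavor m1.small.numa pinned-test
virsh vcpupin instance-00000001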

Happy “OpenStacking”!

 