Saturday, December 20, 2014

Configuring DVR in OpenStack Juno

Before Juno, when we deploy Openstack in production, there was always a painful point about the single l3-agent node which caused two issues: a performance bottleneck and a single point of failure (albeit there were some non-standard ways around this issue).   Now Juno comes with new Neutron features to provide HA L3-agent and Distributed Virtual Router (DVR).

DVR distributes East-West traffic via virtual routers running on compute nodes. Also virtual routers on compute nodes handle North-South floating IP traffic locally for VM running on the same node. However if floating IP is not in use, VM originated external SNAT traffic is still handled centrally by virtual router in controller/network node.  These aspects spread the load of network traffic across your compute nodes and your network controller nodes thus distributing network performance.

HA L3 Agent provides virtual router HA by VRRP. A virtual gateway IP is always available from one of controller/network nodes thus eliminating the single point of failure.

The following blog will discuss how to configure DVR in Juno in a complete configuration aspect.   In this example we used RHEL7 on Redhat’s RDO for Juno.

The host configuration is 3 nodes, one management node, and two compute nodes.   Each node has a data interface for access to the node itself and a bridge interface for the floating-ip network that allows instances access outside of their private subnet to the physical network.

I ran through a standard packstack install specifying GRE tunnels for my connectivity between my management and compute nodes.  Be aware that the current version of DVR only supports GRE or VXLAN tunnels as VLANS are not yet supported.    I then configured the setup as if I was using standard neutron networking for a multi-tenant setup, that is all my instances would route traffic through the l3-agent running on the management node (similar behavior in Icehouse and Havana).  Once I confirmed this legacy setup was working then moved on to changing it to use DVR on the compute nodes.


On the management node where the neutron server runs edit the following files: neutron.conf, l3_agent.ini, ml2_conf.ini and ovs_neutron_plugin.ini

In /etc/neutron/neutron.conf

Edit the lines to state the following by either adding or uncommenting them:

router_distributed = True
dvr_base_mac = fa:16:3f:00:00:00

Note:  When creating a network as admin, one can override the distributed router by using the following flag:  "--distributed False"

In /etc/neutron/l3_agent.ini

Edit the line to state the following:

agent_mode = dvr_snat

Note:  This will provide the SNAT translation for any instances that do not get assigned a floating-ip.  Therefore they will route through the central l3-agent on the management node if they need outside access but will not have a floating-ip associated.  Given the l3-agent at the management node can be HA in Juno, this will still not ne a single point of failure.  However we are not covering that topic in this article.

In /etc/neutron/plugins/ml2/ml2_conf.ini

Edit the line to state the following:

mechanism_drivers = openvswitch, l2population

In /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

Edit or add the lines to state the following:

l2_population = True
enable_distributed_routing = True

One each of the compute nodes do the following steps:

Make the ml2 plugin directory. copy over the ml2_conf.ini from neutron node and setup softlink:

mkdir /etc/neutron/plugins/ml2
rsync -av root@ctl1:/etc/neutron/plugins/ml2 /etc/neutron/plugins
cd /etc/neutron
ln -s plugins/ml2/ml2_conf.ini plugin.ini

Copy over the metadata_agent.ini from the neutron server node:

rsync -av root@ctl1:/etc/neutron/metadata_agent.ini /etc/neutron

In /etc/neutron/ l3_agent.ini

Edit the line to state the following:

agent_mode = dvr

In /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

Edit or add the lines to state the following:

l2_population = True

enable_distributed_routing = True

One final step on the compute node is to associate the br-ex interface with the physical interface on the compute node that will bridge the floating-ip’s to the physical vlan.

ovs-vsctl add-port br-ex

Restart the openstack services on the management node.

Restart the openstack services on the compute node as well.  Also ensure you start the l3-agent and metadata service on the compute node.

If you plan on using Horizon to spin up instances and associate floating-ip’s, you will need to make the following edit in the Horizon code as there is a bug:  https://bugs.launchpad.net/horizon/+bug/1388305.   Without the code update, you will not see a list of valid ports to associate the floating-ip to on the instance.  This association does work from the cli however without modification.

Edit the following file:  /usr/share/openstack-dashboard/openstack_dashboard/api/neutron.py

Find the line:

p.device_owner == 'network:router_interface'

And replace it with:

p.device_owner == 'network:router_interface'   or p.device_owner == 'network:router_interface_distributed'

Restart the httpd service.

Once you have followed the steps above you should be able to spin up an instance and associate a floating-ip to it and that instance will be accessible via the compute node l3-agent.   You can confirm a proper namespace is setup by running the following on the compute node:

ip netns

fip-4a7697ba-c29c-4a19-9b92-2a9194e1d6de
qrouter-6b4a2758-3aa7-4603-9fcd-f86f05d0c62

The fip is the floating-ip namespace and the qrouter is just like the namespaces previously seen on a network management node.  You can use ip netns exec commands to explore those namespaces and further troubleshoot should the configuration not be working.

Another way to confirm traffic is coming to your instance directly on the compute node is to use tcpdump and sniff on the physical network interface that is bridging to the physical network for the floating-ip network.  Then while running tcpdump you can ping your instance from another host somewhere on your network and you will see the packets in the tcpdump.

DVR promising to provide a convenient way of distributing network traffic loads to the compute nodes of the instances on them and helps to alleviate the bottleneck of the neutron management node.

3 comments:

Chandra Sekhar Vejendla said...

I am trying to enable DVR in kilo and followed all the steps you mentioned, but am facing some issues. I have a 4 node setup with 1 controller 1 network node and 2 compute nodes.

When i enable DVR the vxlan ports in br-tun bridge on all the nodes disappear. I have enabled l2_population and arp_responder but that does not help. Can you please help me out here.

2a04a23f-2ac4-4c07-93f7-8c3c88cf6d36
Bridge br-ex
Port br-ex
Interface br-ex
type: internal
Port "enp2s0f1"
Interface "enp2s0f1"
Bridge br-tun
fail_mode: secure
Port br-tun
Interface br-tun
type: internal
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Bridge br-int
fail_mode: secure
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
ovs_version: "2.3.1"

Boris Derzhavets said...
This comment has been removed by the author.
Boris Derzhavets said...

I believe you replicated ml2_conf.ini across landscape. Just the same file like on Juno.
Add following two lines to ml2_conf.ini on each Compute Node :-
[agent]
l2_population = True
and restart nodes