Floating IP not working for vxlan neutron-openvswitch

Bug #1512407 reported by Larry Michel
This bug affects 1 person
Affects: neutron-api (Juju Charms Collection)
Status: Invalid
Importance: High
Assigned to: David Ames

Bug Description

I am deploying OpenStack with neutron-openvswitch and finding that we can't access an instance through its floating IP:

# OIL CI test-catalog pipeline parameters
export OPENSTACK_RELEASE=kilo
export COMPUTE=nova-kvm
export BLOCK_STORAGE=cinder-iscsi
export IMAGE_STORAGE=glance-swift
export PIPELINE_ID=e65c6f41-1122-43a4-9358-076e8bb70001
export BACKEND_DATABASE=mysql
export NETWORKING=neutron-openvswitch-vxlan
export UBUNTU_RELEASE=trusty

The exact same bundle file with GRE works.

This is what the bridges look like on one of the compute nodes:

ubuntu@maero:~$ sudo ovs-vsctl show
0a226e99-bff5-42db-9ed0-6dbf96bcba7c
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-data
        Port br-data
            Interface br-data
                type: internal
        Port phy-br-data
            Interface phy-br-data
                type: patch
                options: {peer=int-br-data}
    Bridge br-tun
        fail_mode: secure
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-0af500e0"
            Interface "vxlan-0af500e0"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="10.245.0.207", out_key=flow, remote_ip="10.245.0.224"}
    Bridge br-int
        fail_mode: secure
        Port int-br-data
            Interface int-br-data
                type: patch
                options: {peer=phy-br-data}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
        Port "qvo103fe969-77"
            tag: 1
            Interface "qvo103fe969-77"
    ovs_version: "2.3.2"
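
In the output above, the hex digits in the tunnel port name vxlan-0af500e0 line up with the remote_ip value 10.245.0.224 in its options. The sketch below decodes a port name on that assumption; `tunnel_port_to_ip` is a hypothetical helper, and the naming pattern is inferred from this output rather than taken from any documented interface.

```python
import ipaddress


def tunnel_port_to_ip(port_name):
    """Decode the remote IPv4 address from a name like 'vxlan-0af500e0'.

    Assumes the part after the dash is the remote IP as 8 hex digits,
    which matches the port and remote_ip shown in the output above.
    """
    _, _, hex_part = port_name.partition("-")
    if len(hex_part) != 8:
        raise ValueError("expected 8 hex digits after the dash: %r" % port_name)
    return str(ipaddress.IPv4Address(int(hex_part, 16)))


remote = tunnel_port_to_ip("vxlan-0af500e0")
print(remote)
```

This can help when matching tunnel ports to hosts without reading every options line.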

Here's the bundle file: https://pastebin.canonical.com/143217/

I also noticed that vxlan offloading would not get enabled (syslog) even though I was able to get it to work when creating the bridges manually.

Tags: oil
Larry Michel (lmic)
description: updated
James Page (james-page)
Changed in neutron-api (Juju Charms Collection):
assignee: nobody → David Ames (thedac)
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

This doesn't work for Juno either - it's not kilo specific.

Revision history for this message
David Ames (thedac) wrote :

The key issue is "Network type value 'gre' not supported"

2015-11-09 21:59:43.727 18222 ERROR oslo_messaging.rpc.dispatcher [req-5218b942-5b63-4931-ba6e-edfb1b38278b ] Exception during message handling: Invalid input for operation: Network type value 'gre' not supported.
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/type_tunnel.py", line 265, in tunnel_sync
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher raise exc.InvalidInput(error_message=msg)
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher InvalidInput: Invalid input for operation: Network type value 'gre' not supported.
2015-11-09 21:59:43.727 18222 TRACE oslo_messaging.rpc.dispatcher
2015-11-09 21:59:43.730 18222 ERROR oslo_messaging._drivers.common [req-5218b942-5b63-4931-ba6e-edfb1b38278b ] Returning exception Invalid input for operation: Network type value 'gre' not supported. to caller
2015-11-09 21:59:43.730 18222 ERROR oslo_messaging._drivers.common [req-5218b942-5b63-4931-ba6e-edfb1b38278b ] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n executor_callback))\n', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n executor_callback)\n', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch\n result = func(ctxt, **new_args)\n', ' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/type_tunnel.py", line 265, in tunnel_sync\n raise exc.InvalidInput(error_message=msg)\n', "InvalidInput: Invalid input for operation: Network type value 'gre' not supported.\n"]
2015-11-09 21:59:45.729 18223 ERROR oslo_messaging.rpc.dispatcher [req-5218b942-5b63-4931-ba6e-edfb1b38278b ] Exception during message handling: Invalid input for operation: Network type value 'gre' not supported.

Our templates do not allow gre when vxlan is set:
[ml2]
type_drivers = {{ overlay_network_type }},vlan,flat
tenant_network_types = {{ overlay_network_type }},vlan,flat
mechanism_drivers = openvswitch,hyperv,l2population
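
To make the effect of the template concrete, here is a minimal sketch that fills in `overlay_network_type` with plain string replacement. It is not the charm's rendering code, and `render_type_drivers` is a hypothetical helper. With vxlan set, gre is absent from the resulting driver list.

```python
# Hypothetical helper: imitates the template line with simple string
# substitution. The charm renders its real template by other means.
TEMPLATE_LINE = "type_drivers = {{ overlay_network_type }},vlan,flat"


def render_type_drivers(overlay_network_type):
    """Return the type drivers the rendered line would enable."""
    rendered = TEMPLATE_LINE.replace(
        "{{ overlay_network_type }}", overlay_network_type
    )
    _, _, value = rendered.partition("=")
    return [driver.strip() for driver in value.split(",")]


vxlan_drivers = render_type_drivers("vxlan")
gre_drivers = render_type_drivers("gre")
print(vxlan_drivers)
print("gre" in vxlan_drivers)
```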

Changed in neutron-api (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
milestone: none → 16.01
Revision history for this message
David Ames (thedac) wrote :

My previous comment turns out to be a red herring. Those errors in the logs occur during setup, before everything is completed.

The problem is a missing relation. Neutron-gateway requires a relation with neutron-api, because it learns the overlay network type from neutron-api. Neutron-gateway defaults to GRE, so a GRE deployment can appear to "work" without the relation. With vxlan it breaks, because compute and api are using vxlan while the gateway falls back to GRE.

Existing bundle:
  - - neutron-api
    - mysql
  - - neutron-api
    - rabbitmq-server
  - - neutron-api
    - nova-cloud-controller
  - - neutron-api
    - neutron-openvswitch
  - - neutron-api
    - keystone

Add:
  - - neutron-api
    - neutron-gateway

I have successfully deployed this with the added relation. Please test. Let me know the results.
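
As a quick way to catch this before deploying, the following sketch checks a list of relation pairs for the neutron-api / neutron-gateway relation. The pairs are copied from the bundle excerpt above; `has_relation` and `missing_relations` are hypothetical helpers and not part of Juju or the charms.

```python
# Relation pairs copied from the existing bundle excerpt.
EXISTING_RELATIONS = [
    ("neutron-api", "mysql"),
    ("neutron-api", "rabbitmq-server"),
    ("neutron-api", "nova-cloud-controller"),
    ("neutron-api", "neutron-openvswitch"),
    ("neutron-api", "keystone"),
]

# The relation this bug turned out to need.
REQUIRED_RELATIONS = [("neutron-api", "neutron-gateway")]


def has_relation(relations, first, second):
    """Return True if the two services are related, in either order."""
    wanted = frozenset((first, second))
    return any(frozenset(pair) == wanted for pair in relations)


def missing_relations(relations, required):
    """Return the required pairs that are not present in relations."""
    return [pair for pair in required if not has_relation(relations, *pair)]


before = missing_relations(EXISTING_RELATIONS, REQUIRED_RELATIONS)
after = missing_relations(
    EXISTING_RELATIONS + REQUIRED_RELATIONS, REQUIRED_RELATIONS
)
print(before)
print(after)
```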

Changed in neutron-api (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
Larry Michel (lmic) wrote :

Great. Thanks for looking into this and finding the missing relation. I will test and update the bug with the test result.

Revision history for this message
James Page (james-page) wrote : Re: [Bug 1512407] Re: Floating IP not working for vxlan neutron-openvswitch

Can you also ensure that you're creating networks with the correct
segmentation type? I think you may get a default of gre (though I may be
wrong), which would not work if the agents are configured to do vxlan only.

On Wed, Nov 11, 2015 at 2:36 PM, Larry Michel <email address hidden>
wrote:

> Great. Thanks for looking into and find the missing relation. I will
> test and update bug with test result.

Revision history for this message
Larry Michel (lmic) wrote :

The network appears to be created with vxlan correctly, since it is assigned segmentation IDs of 1001 and 1002, which are within the correct range.

For example:

Network Overview
Name: admin-net
ID: 860d9bb2-815e-44b2-9e0b-fac37a2eaa0e
Project ID: c3cdec75250647d39af3529c04d60204
Status: ACTIVE
Admin State: UP
Shared: No
External Network: No

Provider Network
Network Type: vxlan
Physical Network: -
Segmentation ID: 1001

But in testing with the correct relations, it looks like I hit a separate issue. On the neutron-api node, I see lots of errors, and some of them are consistent with bug 1429739. Here's a small excerpt; I am attaching the entire server.log file.

2015-11-11 20:20:40.230 9354 ERROR neutron.service [-] Unrecoverable error: please check log for details.
2015-11-11 20:20:40.230 9354 TRACE neutron.service Traceback (most recent call last):
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/neutron/service.py", line 102, in serve_wsgi
2015-11-11 20:20:40.230 9354 TRACE neutron.service service.start()
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/neutron/service.py", line 73, in start
2015-11-11 20:20:40.230 9354 TRACE neutron.service self.wsgi_app = _run_wsgi(self.app_name)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/neutron/service.py", line 168, in _run_wsgi
2015-11-11 20:20:40.230 9354 TRACE neutron.service app = config.load_paste_app(app_name)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/neutron/common/config.py", line 183, in load_paste_app
2015-11-11 20:20:40.230 9354 TRACE neutron.service app = deploy.loadapp("config:%s" % config_path, name=app_name)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
2015-11-11 20:20:40.230 9354 TRACE neutron.service return loadobj(APP, uri, name=name, **kw)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
2015-11-11 20:20:40.230 9354 TRACE neutron.service return context.create()
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2015-11-11 20:20:40.230 9354 TRACE neutron.service return self.object_type.invoke(self)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2015-11-11 20:20:40.230 9354 TRACE neutron.service **context.local_conf)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2015-11-11 20:20:40.230 9354 TRACE neutron.service val = callable(*args, **kw)
2015-11-11 20:20:40.230 9354 TRACE neutron.service File "/usr/lib/python2.7/dist-packages/paste/urlmap.py", line 28, in urlmap_factory
2015-11-11 20:20:40.230 9354 TRACE neutron.service app = loader.get_app(app_name, global_conf=global_conf)

...
2015-11-11 20:20:40.230 9354 TRACE neutron.service for...


Revision history for this message
David Ames (thedac) wrote :

Larry,

With respect to the neutron-api errors, are you actually experiencing failure of functionality? I can re-create those errors which happen during the deploy when all the relations are not yet complete. But once everything is up and settled the errors stop and everything functions as expected.

Revision history for this message
Larry Michel (lmic) wrote :

David,
Yes, the expected functionality did not appear to be there. I did not see any of the vxlan-xxxxxxx ports being created after setting up the network.

Revision history for this message
David Ames (thedac) wrote :

Larry,

I have double checked our charms and there is no code that specifically "turns on" VXLAN offloading. The charm code only interacts with openvswitch mainly by setting values in /etc/neutron/plugins/ml2/ml2_plugin.ini.

Is there a Mellanox (assuming this is Mellanox) subordinate charm that is not represented in the bundle? Maybe there needs to be.

According to the Mellanox documentation [1], enabling VXLAN offloading requires a modprobe of mlx4_core (Step 2). (If this is not Mellanox, there is likely a similar step.) This may need to be its own subordinate charm.

What steps are you taking to manually turn on VXLAN offloading?

[1] https://community.mellanox.com/docs/DOC-1446

Revision history for this message
Larry Michel (lmic) wrote :

David, to clarify, the adapters that we are using are Emulex cards, not Mellanox. For these cards, there are no steps to enable VXLAN offloading; it's enabled by default in their driver. AFAICT, the vxlan offloading bit is only secondary, so vxlan should work even when offloading is disabled. But if vxlan is configured, the driver will optimize by enabling offloading. I currently have an environment deployed and I will email you information to access it. I am also trying to recreate this on a system without these cards and I will update the bug with those results.

Revision history for this message
Larry Michel (lmic) wrote :

Found an issue with the MTU size being too small. I've increased it in neutron-gateway and neutron-api to 1600 from 1300, though it seems like I probably don't need to set it for neutron-api. The floating IP is now working with the larger MTU size; however, the vxlan offloading still does not get enabled.

Revision history for this message
Larry Michel (lmic) wrote :

It turns out the vxlan offloading does get enabled. One host did not show it getting enabled in the syslog. But when I went to try a test on the 2nd compute host, which I had not checked, I could see the expected log message about offloading getting enabled:

Nov 23 20:19:59 maero kernel: [18302.839361] be2net 0000:08:00.1: Only one UDP port supported for VxLAN offloads
Nov 23 20:19:59 maero kernel: [18302.839367] be2net 0000:08:00.1: Disabling VxLAN offloads

https://pastebin.canonical.com/144654/

But I am still a bit confused about the MTU size change. I now see that the MTU size for GRE was done specifically to address the possibility of fragmentation and AFAICT the document states that instance-mtu ought to be less than network-device-mtu. Also, I would expect a small difference between the GRE and VXLAN headers... so need to dig into this further.
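
For reference, the expected header difference can be worked out from the standard header sizes. This is a rough sketch, not the charm's or neutron's calculation; it assumes an IPv4 underlay, a GRE header carrying the 4-byte key field, and an untagged inner Ethernet frame. `max_instance_mtu` is a hypothetical helper.

```python
# Header sizes in bytes (assumptions: IPv4 without options, keyed GRE,
# inner Ethernet frame without a VLAN tag).
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
GRE_HEADER_WITH_KEY = 8  # 4-byte base header plus 4-byte key
INNER_ETHERNET = 14

OVERHEAD = {
    "vxlan": IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET,
    "gre": IPV4_HEADER + GRE_HEADER_WITH_KEY + INNER_ETHERNET,
}


def max_instance_mtu(underlay_mtu, tunnel_type):
    """Largest instance MTU that fits in the underlay without fragmenting."""
    return underlay_mtu - OVERHEAD[tunnel_type]


for tunnel_type in ("gre", "vxlan"):
    print(tunnel_type, OVERHEAD[tunnel_type], max_instance_mtu(1500, tunnel_type))
```

Under these assumptions VXLAN costs 8 bytes more per packet than GRE, which is consistent with expecting only a small difference between the two.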

Revision history for this message
Larry Michel (lmic) wrote :

Wrong paste in the previous comment. Here's the correct one:

Nov 23 15:59:51 biesel ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --may-exist add-port br-tun vxlan-0af50108 -- set Interface vxlan-0af50108 type=vxlan options:df_default=true options:remote_ip=10.245.1.8 options:local_ip=10.245.1.18 options:in_key=flow options:out_key=flow
Nov 23 15:59:51 biesel kernel: [ 2651.907275] be2net 0000:08:00.0: Enabled VxLAN offloads for UDP port 4789
Nov 23 15:59:51 biesel kernel: [ 2651.913688] be2net 0000:08:00.1: Enabled VxLAN offloads for UDP port 4789
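
Since the state differed between hosts, a small script can report the last offload message seen per device. This is only a sketch built around the message wording in the be2net lines quoted in this bug; `offload_state` is a hypothetical helper, and other drivers or kernel versions may log different text.

```python
import re

# Matches the be2net messages quoted in this bug report.
LINE_RE = re.compile(
    r"be2net (?P<device>[0-9a-f:.]+): "
    r"(?P<action>Enabled VxLAN offloads|Disabling VxLAN offloads)"
)


def offload_state(log_lines):
    """Map each PCI device to True (enabled) or False (disabled).

    Later lines override earlier ones, so the result is the last
    state logged for each device.
    """
    state = {}
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            enabled = match.group("action").startswith("Enabled")
            state[match.group("device")] = enabled
    return state


SAMPLE = [
    "Nov 23 15:59:51 biesel kernel: [ 2651.907275] be2net 0000:08:00.0: "
    "Enabled VxLAN offloads for UDP port 4789",
    "Nov 23 15:59:51 biesel kernel: [ 2651.913688] be2net 0000:08:00.1: "
    "Enabled VxLAN offloads for UDP port 4789",
]
states = offload_state(SAMPLE)
print(states)
```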
