ovs-vswitchd missing argument 'remote_ip'

Bug #1199003 reported by Shannon McFarland
This bug affects 3 people
Affects            Status    Importance  Assigned to  Milestone
Cisco Openstack    Triaged   Low         Unassigned
  Grizzly          Triaged   Low         Unassigned

Bug Description

On the controller node when the puppet agent runs there is an error of:

Jul 3 16:27:47 control-server ovs-vswitchd: 00109|netdev_vport|ERR|gre-2: gre type requires valid 'remote_ip' argument

A full log of the event:
Jul 3 16:27:47 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 add-port br-tun gre-2
Jul 3 16:27:47 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Interface gre-2 type=gre
Jul 3 16:27:47 control-server ovs-vswitchd: 00109|netdev_vport|ERR|gre-2: gre type requires valid 'remote_ip' argument
Jul 3 16:27:47 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Interface gre-2 options:remote_ip=10.121.13.51
Jul 3 16:27:47 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Interface gre-2 options:in_key=flow
Jul 3 16:27:48 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Interface gre-2 options:out_key=flow

It looks like it may be expecting 'remote_ip', but in our site.pp file we use 'tunnel_ip'.

This is running on g.0

Revision history for this message
Chris Ricker (chris-ricker) wrote :

Do you see this after every puppet agent run, or only in the logs from the initial run?

Revision history for this message
Shannon McFarland (shmcfarl) wrote :

I see it after each puppet agent run.

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

I don't think this is happening because we use tunnel_ip in our manifests--the OVS source code actually has the string 'remote_ip' hardcoded into that warning (and just to note: I can't find anywhere that remote_ip is used in our modules either). 'remote_ip' is a tunnel option in the ovs-vswitchd Interface table, so I suspect something might not be getting populated correctly (though I'm not sure how that would be puppet's fault).
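
For reference, that option can be checked straight from the Interface table with something like this (just a sketch; gre-2 is the port name taken from the log above):

  # show the whole Interface record for the port
  sudo ovs-vsctl list Interface gre-2

  # or just the single tunnel option the warning complains about
  sudo ovs-vsctl get Interface gre-2 options:remote_ip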

I'm also not seeing this in my setup at the moment, but I'll poke around some more tomorrow. To help me reproduce:

Did you have to have multiple instances running (or indeed any) in order for this to manifest? Or instances connected to multiple networks?

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

Also, could you attach your site.pp to this bug (as an attachment, not a copy/paste into a comment)?

Revision history for this message
Shannon McFarland (shmcfarl) wrote :

Site.pp attached. Sometimes I do have instances running and other times I don't. It happens on every puppet agent run on the controller. I am in the midst of doing a rebuild (again) for some of the swift testing and the changes I am making for the puppet/swift stuff, and I will confirm whether this is still going on.

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

Ok, thanks. How close are these errors to the puppet runs--e.g. do they appear immediately after the catalog run completes, immediately after it starts, or somewhere in between? I'm having a hard time figuring out how Puppet might be triggering this, so I'm trying to pin down the timing to see whether it might just be coincidental to a Puppet run.

If you're doing a rebuild, it would also be interesting to see:

1.) Does the error appear before any instances have launched?

2.) Does it appear after starting an instance or two?

3.) Does it appear after terminating one of those instances?

4.) Does it appear if you get an instance into an error state (say, by telling nova to boot an instance while your compute nodes are rebooting)?

I'm basically wondering if the interface table is somehow getting entries for ports that don't have an endpoint, due to the instance having been terminated and the IP returned to the pool. I've seen only one occurrence of this error in my logs, and I know I had an instance in an error state around that time (due to my having tried to launch an instance while my compute nodes were down... oops). I'll try to test that theory later today.
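
If that theory pans out, a rough way to check for (and clean up) a tunnel port whose endpoint has gone away would be something like the following; this is just a sketch, not something our manifests currently do:

  # list the tunnel ports hanging off br-tun
  sudo ovs-vsctl list-ports br-tun

  # drop a suspect port if it's stale; --if-exists keeps this from
  # erroring out when the port is already gone
  sudo ovs-vsctl --if-exists del-port br-tun gre-2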

Revision history for this message
Shannon McFarland (shmcfarl) wrote :

Update: I rebuilt my environment again, and this time I only saw the error on the first puppet run. I repeated the puppet run over and over and did not see it again. It looks like it is an order-of-operations issue anyhow.

It errors:
control-server ovs-vswitchd: 00156|netdev_vport|ERR|gre-2: gre type requires valid 'remote_ip' argument

Sets the remote_ip in the very next line:
Jul 3 16:27:47 control-server ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Interface gre-2 options:remote_ip=10.121.13.51

And then it does not appear again in that run. I have no idea what changed in this last build that would cause it to only appear on the first run, but that is good news. I saw some stuff come in on my 'git pull' but none of it looked related to this. Should we close this?
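
For what it's worth, the warning window only exists because the port is created and typed in one ovs-vsctl call and only given its remote_ip in a later one. If whatever issues these commands chained them into a single transaction with '--', the interface would never sit in the "gre with no remote_ip" state. Roughly (a sketch using the values from the log above, not what the agent actually runs):

  sudo ovs-vsctl add-port br-tun gre-2 -- \
       set Interface gre-2 type=gre \
       options:remote_ip=10.121.13.51 options:in_key=flow options:out_key=flow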

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

I was just able to replicate this on a new build. However, I've also verified that the appearances of the error are not related to Puppet runs. A snippet:

Jul 9 19:12:53 control01 ovs-vswitchd: 00267|netdev_vport|ERR|gre-2: gre type requires valid 'remote_ip' argument
Jul 9 19:13:08 control01 ovs-vswitchd: 00289|netdev_vport|ERR|gre-3: gre type requires valid 'remote_ip' argument
Jul 9 19:13:24 control01 ovs-vswitchd: 00351|netdev_vport|ERR|gre-4: gre type requires valid 'remote_ip' argument

Those are a *lot* closer together than my puppet agent runs (which default to 30-minute intervals). Also note that this was before I'd fired up any instances. The ovs ports in question (gre-2, gre-3, gre-4) do show up in "ovs-vsctl show" as having a remote_ip argument, though:

localadmin@control01:~$ sudo ovs-vsctl show
95a0d870-14f4-4eae-97dd-3ba3a40b74e5
    Bridge br-tun
        Port "gre-4"
            Interface "gre-4"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="2.6.1.5"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="2.6.1.3"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-3"
            Interface "gre-3"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="2.6.1.4"}
[SNIP]

Also, the errors don't recur...after that one appearance for each port, I'm not seeing any further errors (even after reloading the puppet agent).

Given that we don't see this on every build and it appears to self-correct, I'm leaving it open but setting the severity to low.

Revision history for this message
Mark T. Voelker (mvoelker) wrote :

Incidentally, in case it wasn't clear: the remote_ips on the "gre-X" ports in question are the IPs of my compute nodes. I'm actually now wondering if they might not have finished their initial puppet runs yet...
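
If that's the case, it should be visible from the compute side: once a node's OVS agent is up, the matching gre port should appear on that node too. Something like the following on a compute node would confirm it (the service name here is what the Ubuntu grizzly packages use; adjust for your distro):

  sudo service quantum-plugin-openvswitch-agent status
  sudo ovs-vsctl show | grep -A3 'Port "gre-'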
