Tap interface does not automatically get an IP address upon a hypervisor reboot

Bug #1084355 reported by Salman Baset
This bug affects 8 people
Affects: neutron
Status: Invalid
Importance: Medium
Assigned to: dan wendlandt

Bug Description

A very simple configuration: one network with one subnet, and no floating IPs.

The tap interface gets an IP address when a subnet is created. For instance, for the 172.16.10.0/24 subnet, the tap interface gets an IP address of 172.16.10.2.

However, upon a reboot, the tap interface does not always show up and does not automatically get an IP address. As a result, IP assignment to a new instance fails.

Tags: ovs
Revision history for this message
Gary Kotton (garyk) wrote :

Hi,
Can you please clarify a few things:
1. Are you working from packages?
2. The tap interface that you are referring to is from the DHCP agent. Can you please check if this is running after reboot (please also check the log file)
Thanks
Gary

Changed in quantum:
status: New → Incomplete
Revision history for this message
Salman Baset (salman-h) wrote :

I did an install from Folsom packages. The DHCP agent is running (verified from the log file and service status).

Revision history for this message
Sumit Naiksatam (snaiksat) wrote :

Can you please provide the output of: ip link show

I am guessing that the tap is not set to UP by the dhcp agent after reboot. "ip link show" will tell us if that's the case.
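A minimal manual check along those lines (the tap device name below is purely illustrative; substitute the one your own "ip link show" output reports):

```shell
# List links and look for tap devices; "state DOWN" on a tap line would
# confirm the agent did not bring the interface up after the reboot.
ip link show | grep 'tap' || echo "no tap devices present"

# If a tap is DOWN, bringing it up by hand (as root) is the manual
# workaround; the device name here is only an example:
#   ip link set tapaacc2584-09 up
```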

Revision history for this message
Phil Hopkins (phil-hopkins-a) wrote :
Download full text (11.4 KiB)

I have run into the same problem. Here is what I found comparing RHEL 6.3, Fedora 17, and Ubuntu 12.10.

Using an "all-in-one" install.
In all three scenarios nova is configured with:
start_guests_on_host_boot=true # (this seems to cause problems on RHEL 6.3; it is set to false there)
resume_guests_state_on_host_boot=true

Quantum is configured with one network and a minimum of one subnet. In this case, the output of quantum subnet-list:

quantum subnet-list
+--------------------------------------+-----------------+-------------+--------------------------------------------+
| id | name | cidr | allocation_pools |
+--------------------------------------+-----------------+-------------+--------------------------------------------+
| 1fff853c-6949-483b-8bdc-3d3aa0fdc23b | private-subnet2 | 10.0.0.8/29 | {"start": "10.0.0.10", "end": "10.0.0.14"} |
| c14577fe-24ed-4af8-9bdd-0a7b976ca20b | private-subnet1 | 10.0.0.0/29 | {"start": "10.0.0.2", "end": "10.0.0.6"} |
+--------------------------------------+-----------------+-------------+--------------------------------------------+

One or more instances are running and are accessible through their network interfaces.

After issuing a reboot on the "all-in-one" node, the system reboots and the instance(s) are restarted; however, network access to the instance(s) does not work. It can be restored using the following process:
For RHEL 6.3:
after a reboot:
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:99:14:71 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.58/24 brd 192.168.122.255 scope global eth0
    inet6 fe80::5054:ff:fe99:1471/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 52:54:00:09:7b:24 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 4e:6f:72:f9:e3:ed brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 5e:61:a5:f5:f0:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7438:28ff:feee:e5a6/64 scope link
       valid_lft forever preferred_lft forever
6: tapaacc2584-09: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 0a:89:f4:70:5a:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/29 brd 10.0.0.7 scope global tapaacc2584-09
    inet6 fe80::889:f4ff:fe70:5ae5/64 scope link
       valid_lft forever preferred_lft forever
8: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 52:fa:95:eb:a9:4d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ce2:5cff:fe82:9c5e/64 scope link
       valid_lft forever preferred_lft forever
10: tap6de885fb-d0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500
    link/ether 8e:2d:05:45:cd:7b brd ff:ff:ff:ff:ff:ff
11: br-tun: <BROADCA...

Revision history for this message
dan wendlandt (danwent) wrote :

Thanks for the very detailed report Phil!

This behavior probably depends on what vif-plugging mechanism you are using in Nova (and hence is likely a Nova change, not a Quantum change, but the quantum team is probably best placed to debug it, so I'd keep this issue also filed against Quantum).

Based on the fact that you're using OVS and you are seeing tap devices, it is correct to assume you are using

libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

It may be that this line should not be underneath the if-check that checks if the device already exists. I wonder if libvirt somehow saves the fact that the tap device exists, and thus it exists when plug() is called, which prevents us from setting it up. If that is the case, we should move this line out from under the if-check that tests if the device exists.

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L159

Are you able to directly edit the code and check?

It would also be very interesting to understand what happens in this same scenario with other vif-drivers.

In particular, if you are using libvirt 0.9.11 or newer, the preferred vif-driver is actually:

libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchVirtualPortDriver

In this case, libvirt automatically manages the devices connected to br-int, and thus I would expect that you wouldn't see this problem (but can't say for sure...)

If libvirt behaves as described above when "resuming" VMs, there may also be negative complications for the hybrid driver, which is used when a plugin like the OVS plugin also needs to use iptables rules (e.g., for security groups).

libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtHybridOVSBridgeDriver

Changed in quantum:
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → dan wendlandt (danwent)
Changed in nova:
assignee: nobody → dan wendlandt (danwent)
status: New → Confirmed
Revision history for this message
Phil Hopkins (phil-hopkins-a) wrote :

First, I am using all of the standard packages from EPEL for RHEL (http://repos.fedorapeople.org/repos/openstack/openstack-folsom/epel-6) on:

RHEL 6.3

libvirt-0.9.10-21.el6.x86_64

also

/etc/nova/nova.conf:libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

changing line 159 in /usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py from:

        if not linux_net._device_exists(dev):
            # Older version of the command 'ip' from the iproute2 package
            # don't have support for the tuntap option (lp:882568). If it
            # turns out we're on an old version we work around this by using
            # tunctl.
            try:
                # First, try with 'ip'
                utils.execute('ip', 'tuntap', 'add', dev, 'mode', 'tap',
                          run_as_root=True)
            except exception.ProcessExecutionError:
                # Second option: tunctl
                utils.execute('tunctl', '-b', '-t', dev, run_as_root=True)
                utils.execute('ip', 'link', 'set', dev, 'up', run_as_root=True)

to:
        if not linux_net._device_exists(dev):
            # Older version of the command 'ip' from the iproute2 package
            # don't have support for the tuntap option (lp:882568). If it
            # turns out we're on an old version we work around this by using
            # tunctl.
            try:
                # First, try with 'ip'
                utils.execute('ip', 'tuntap', 'add', dev, 'mode', 'tap',
                          run_as_root=True)
            except exception.ProcessExecutionError:
                # Second option: tunctl
                utils.execute('tunctl', '-b', '-t', dev, run_as_root=True)
        utils.execute('ip', 'link', 'set', dev, 'up', run_as_root=True)

This change seems to fix the problem on RHEL 6.3: the tap interface is in the UP state after a reboot.

Making that change did not affect either the Ubuntu or Fedora systems. I suspect that their packaging systems appearing to track different points of the OpenStack release will have some effect. All three of these systems are virtual machines that I run using KVM on a Fedora workstation, which allows for quick comparison between them. I also had to set start_guests_on_host_boot=false on the RHEL system; it was causing very bizarre behaviour, which I will be documenting next.

That change did fix the RHEL system.

Do you need anything else? I may try other VIF drivers if I get a chance. If you think that is essential for this bug, let me know and I will give it some priority.

Phil

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
Revision history for this message
Akihiro Motoki (amotoki) wrote :

This issue still exists in nova master (after libvirt-vif-driver refactoring)
https://github.com/openstack/nova/blob/3fd1c63e37436eaf4621df62f112ae1886d238cc/nova/network/linux_net.py#L1170

As Phil's testing showed, the fix is to bring the tap interface up even if it already exists.
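A minimal, hypothetical sketch of that fix (ensure_tap_up, device_exists, and execute are illustrative stand-ins for nova's create-and-plug path, linux_net._device_exists, and utils.execute; this is not nova's actual code):

```python
# Sketch of the fix under discussion: the link-up call must run
# unconditionally, not only in the branch that creates the tap device.

def ensure_tap_up(dev, device_exists, execute):
    """Create the tap device if it is missing, then always bring it up.

    device_exists(dev) -> bool and execute(*cmd) are injected stand-ins
    for linux_net._device_exists and utils.execute.
    """
    if not device_exists(dev):
        try:
            # Newer iproute2 can create the tap directly.
            execute('ip', 'tuntap', 'add', dev, 'mode', 'tap')
        except OSError:
            # Older 'ip' lacks the tuntap option (lp:882568); fall
            # back to tunctl.
            execute('tunctl', '-b', '-t', dev)
    # The fix: this runs outside the if-check, so a tap that survived
    # the reboot in the DOWN state still gets linked up.
    execute('ip', 'link', 'set', dev, 'up')
```

The key design point is simply that the final `ip link set ... up` sits outside the existence check, matching the diff Phil applied above.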

Revision history for this message
Gary Kotton (garyk) wrote :

Hi,
I think that there are two problems here, and we have addressed them both:
1. The first is that when the host was rebooting, the OVS tap devices were being saved by OVS. We introduced the quantum-ovs-cleanup utility; when it is invoked on reboot, it enables the DHCP agent to receive the necessary IP address.
2. The resync interval of the DHCP agent was 30 seconds (bug 1128180). After a reboot it could take up to 2 minutes for the tap device to get an IP address. This too has been addressed upstream and in stable Folsom.
I think that the above-mentioned problems have been addressed. We just need to make sure that they are included in the latest stable Folsom packages.
Thanks
Gary

Revision history for this message
YunQiang Su (wzssyqa) wrote :

We just upgraded our quantum version to 2012.2.3 with a custom-built package based on the cloud archive, but we're still seeing the issues described by Phil Hopkins on Ubuntu 12.04.

After a reboot the instance is not able to get an IP, but if we launch a new instance after the reboot and then reboot the instance that didn't get an IP, it is able to get its IP.

If we execute the following commands, the instances are also able to get an IP again:
ip netns exec qdhcp-338a57f5-aa60-4b3e-b519-0683d26467e9 bash
ip link set tap98eb6fb8-e4 up
service openvswitch-switch restart

Changing the following flag also fixed the issue:
libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtHybridOVSBridgeDriver
to
libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchVirtualPortDriver

So I think this issue is not solved in the latest stable Folsom packages.

Revision history for this message
YunQiang Su (wzssyqa) wrote :

Edit/correction: changing the flag to nova.virt.libvirt.vif.LibvirtOpenVswitchVirtualPortDriver does not fix the problem. Sorry for the confusion.

Revision history for this message
Gary Kotton (garyk) wrote :

Hi,
Can you please check if the quantum-ovs-cleanup script is running on boot?
Thanks
Gary

Revision history for this message
YunQiang Su (wzssyqa) wrote :

Ohhh.

Why add such a binary and make us call it manually?
Why not call it directly from the quantum-ovs service?

Revision history for this message
Gary Kotton (garyk) wrote :

There are a number of reasons:
1. Some plugins make use of openvswitch but do not use the openvswitch-agent.
2. It complicates the boot process to have this in the agent: if the agent restarts, we would need to know whether or not to invoke it, and you would not want it to delete a tap device of the DHCP agent.
Hence we added the binary, which enables the packages to run this prior to all other quantum services, and the user to run it if and when they choose.
We need to try and ensure that it is added to the Ubuntu startup scripts.
Thanks
Gary

Revision history for this message
YunQiang Su (wzssyqa) wrote :

I ran quantum-ovs-cleanup in the upstart configuration of quantum-plugin-openvswitch-agent, at either pre-start or post-start.

Neither of them works.

Revision history for this message
YunQiang Su (wzssyqa) wrote :

This is my current workaround

1. /etc/init/{quantum-dhcp-agent,quantum-l3-agent}.conf
replace
start on runlevel [2345]
with
start on starting nova-compute

2. edit /etc/init/quantum-plugin-openvswitch-agent.conf to

start on starting nova-compute
stop on stopped openvswitch-switch

chdir /var/run

pre-start script
        mkdir -p /var/run/quantum
        chown quantum:root /var/run/quantum
        service openvswitch-switch restart
        quantum-ovs-cleanup
end script

exec start-stop-daemon --start --chuid quantum --exec /usr/bin/quantum-openvswitch-agent -- --config-file=/etc/quantum/quantum.conf --config-file=/etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini --log-file=/var/log/quantum/openvswitch-agent.log

post-start script
        service openvswitch-switch restart
end script

Revision history for this message
YunQiang Su (wzssyqa) wrote :

With the cleanup, the suspend function becomes unusable.

Revision history for this message
Gary Kotton (garyk) wrote :

Sorry, I do not understand the part about the suspend function. Can you please clarify?
Thanks
Gary

Revision history for this message
YunQiang Su (wzssyqa) wrote :

It is not caused by the cleanup; it is a bug in quantum itself.

When suspend is used, the instance cannot wake up again.

dan wendlandt (danwent)
Changed in quantum:
milestone: none → grizzly-rc1
dan wendlandt (danwent)
summary: - Tap interface does not automatically get an IP address upon a reboot
+ Tap interface does not automatically get an IP address upon a hypervisor
+ reboot
Revision history for this message
dan wendlandt (danwent) wrote :

Ok, we need to figure out what to do with this bug.

My understanding is that when a hypervisor (or a network node?) is rebooted, in some cases, devices do not seem to get IPs.

When I worked with Phil on this thread earlier, it seems like at least part of the problem was that we were only if-up'ing a device if it also needed to be added to ovs. He said that doing the if-up outside of the check if the device already exists helped on RHEL 6.3, but not on Ubuntu. I'm now confused about why that would help at all though, as tap devices should not persist across a reboot of the physical box (I had originally thought this bug was about the reboot or suspend of a VM).

I suspect that garyk is correct that a combination of the resync interval changing and the quantum-cleanup script are a viable explanation. If anyone is able to still repro this, please update this bug.

Changed in quantum:
milestone: grizzly-rc1 → none
no longer affects: nova
Changed in quantum:
status: Confirmed → Incomplete
importance: High → Medium
tags: added: ovs
Revision history for this message
chetandiwani (chetandiwani) wrote :

I have consolidated a multi-node setup into a single node and was facing the same problem: when the physical node was rebooted, the guest was not able to get an IP address.

Setup Details : Ubuntu 12.04.2 LTS : Quantum Version
ii python-quantum 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - Python library
ii python-quantumclient 1:2.2.0-0ubuntu1~cloud0 client - Quantum is a virtual network service for Openstack
ii quantum-common 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - common
ii quantum-dhcp-agent 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - DHCP agent
ii quantum-l3-agent 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - l3 agent
ii quantum-metadata-agent 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - metadata agent
ii quantum-plugin-openvswitch 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - Open vSwitch plugin
ii quantum-plugin-openvswitch-agent 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - Open vSwitch plugin agent
ii quantum-server 1:2013.1.2-0ubuntu1~cloud0 Quantum is a virtual network service for Openstack - server

For me, putting "quantum-ovs-cleanup -v &> /root/cleanupovs.log" in /etc/rc.local allows the VM to get the IP address.
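As a boot-configuration sketch, that rc.local approach amounts to the following (note that rc.local typically runs under /bin/sh, so the portable "> file 2>&1" redirection is used here instead of bash's "&>"; paths are as in the comment above):

```shell
#!/bin/sh -e
# /etc/rc.local: run the OVS cleanup once at boot, before instances need
# DHCP, keeping its verbose output for later inspection.
quantum-ovs-cleanup -v > /root/cleanupovs.log 2>&1
exit 0
```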

Revision history for this message
Marios Andreou (marios-b) wrote :

Is this still a reproducible bug? From the discussion it seems it may be fixed now. Can we mark this bug as done?

Revision history for this message
Ryan Moats (rmoats) wrote :

Marking as Invalid since the bug was Incomplete and hasn't been updated in over a year.

Changed in neutron:
status: Incomplete → Invalid