No dhcp lease after shelve unshelve

Bug #1445569 reported by Clark Boylan on 2015-04-17
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: High
Assigned to: Unassigned

Bug Description

This may be related to bug 1290635, but I am not familiar enough with Nova's DHCP and shelve implementations to know for sure. Also, the behavior I am seeing seems to be slightly different.

In the multinode nova-net job (http://logs.openstack.org/88/174288/1/check/check-tempest-dsvm-multinode-full/3e3be58/), during the tempest test_shelve_instance test, DHCP fails when the shelved instance is unshelved:

http://logs.openstack.org/88/174288/1/check/check-tempest-dsvm-multinode-full/3e3be58/console.html#_2015-04-16_11_26_00_029
2015-04-16 11:26:00.029 | Starting network...
2015-04-16 11:26:00.029 | udhcpc (v1.20.1) started
2015-04-16 11:26:00.029 | Sending discover...
2015-04-16 11:26:00.029 | Sending discover...
2015-04-16 11:26:00.029 | Sending discover...
2015-04-16 11:26:00.029 | No lease, failing
2015-04-16 11:26:00.029 | WARN: /etc/rc3.d/S40-network failed
2015-04-16 11:26:00.029 | cirros-ds 'net' up at 187.20

Looking at the tempest logs, we find that node's MAC address (fa:16:3e:fb:3e:3e):

http://logs.openstack.org/88/174288/1/check/check-tempest-dsvm-multinode-full/3e3be58/console.html#_2015-04-16_11_25_59_976
2015-04-16 11:25:59.976 | Body: {"server": {"status": "ACTIVE", "updated": "2015-04-16T11:16:44Z", "hostId": "d0dc2083935df1bf05cadea5c75358ee9d9e0406887667ea4bb582de", "addresses": {"private": [{"OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:fb:3e:3e", "version": 4, "addr": "10.1.0.6", "OS-EXT-IPS:type": "fixed"}, {"OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:fb:3e:3e", "version": 4, "addr": "172.24.5.6", "OS-EXT-IPS:type": "floating"}]}, "links": [{"href": "http://10.208.224.113:8774/v2/b7b633c0117148628342ab9162d7885e/servers/0e9a79cd-96d5-4fcd-a0db-994638967291", "rel": "self"}, {"href": "http://10.208.224.113:8774/b7b633c0117148628342ab9162d7885e/servers/0e9a79cd-96d5-4fcd-a0db-994638967291", "rel": "bookmark"}], "key_name": "TestShelveInstance-494463835", "image": {"id": "18e4f345-a147-4d0a-922c-46b72b9497e9", "links": [{"href": "http://10.208.224.113:8774/b7b633c0117148628342ab9162d7885e/images/18e4f345-a147-4d0a-922c-46b72b9497e9", "rel": "bookmark"}]}, "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "active", "OS-SRV-USG:launched_at": "2015-04-16T11:16:44.000000", "flavor": {"id": "42", "links": [{"href": "http://10.208.224.113:8774/b7b633c0117148628342ab9162d7885e/flavors/42", "rel": "bookmark"}]}, "id": "0e9a79cd-96d5-4fcd-a0db-994638967291", "security_groups": [{"name": "TestShelveInstance-909686184"}], "OS-SRV-USG:terminated_at": null, "OS-EXT-AZ:availability_zone": "nova", "user_id": "a02a8bd6d7734cd1a7aebbfdb4a3eb16", "name": "TestShelveInstance-512771961", "created": "2015-04-16T11:15:34Z", "tenant_id": "b7b633c0117148628342ab9162d7885e", "OS-DCF:diskConfig": "MANUAL", "os-extended-volumes:volumes_attached": [], "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 1, "config_drive": "", "metadata": {}}}
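
The MAC-to-IP mapping can be pulled out of a response body like the one above mechanically. This is only a minimal sketch: the helper name `fixed_addresses` is hypothetical, and `sample` is a trimmed copy of the body quoted above, keeping just the "addresses" section.

```python
import json


def fixed_addresses(body):
    """Return {mac: ip} for the fixed addresses in a 'show server' JSON body."""
    server = json.loads(body)["server"]
    result = {}
    # "addresses" maps a network name (here "private") to a list of entries.
    for entries in server["addresses"].values():
        for entry in entries:
            if entry.get("OS-EXT-IPS:type") == "fixed":
                result[entry["OS-EXT-IPS-MAC:mac_addr"]] = entry["addr"]
    return result


# Trimmed copy of the response body from the tempest log (addresses only).
sample = (
    '{"server": {"addresses": {"private": ['
    '{"OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:fb:3e:3e", "version": 4, '
    '"addr": "10.1.0.6", "OS-EXT-IPS:type": "fixed"}, '
    '{"OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:fb:3e:3e", "version": 4, '
    '"addr": "172.24.5.6", "OS-EXT-IPS:type": "floating"}]}}}'
)

print(fixed_addresses(sample))
```

The floating address is skipped on purpose, since only the fixed address is handed out over DHCP.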

According to the log above, MAC address fa:16:3e:fb:3e:3e should get IP 10.1.0.6, but syslog shows:

http://logs.openstack.org/88/174288/1/check/check-tempest-dsvm-multinode-full/3e3be58/logs/10.176.200.184-subnode/syslog.txt.gz#_Apr_16_11_16_00
devstack-trusty-2-node-rax-iad-2194251-1687 dnsmasq-dhcp[22310]: DHCPDISCOVER(br100) fa:16:3e:fb:3e:3e no address available
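
To check other runs for the same symptom, syslog can be scanned for dnsmasq discover messages that end in "no address available". The sketch below assumes the line format matches the single line quoted above; the function name `unanswered_discovers` and the regular expression are illustrative and not part of Nova or dnsmasq.

```python
import re

# Pattern modeled on the one syslog line quoted above (an assumption).
NO_ADDR = re.compile(
    r"dnsmasq-dhcp\[\d+\]: DHCPDISCOVER\((?P<iface>[^)]+)\) "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) no address available",
    re.IGNORECASE,
)


def unanswered_discovers(lines, mac=None):
    """Return (interface, mac) pairs for discovers dnsmasq could not answer.

    If mac is given, only entries for that MAC address are returned.
    """
    hits = []
    for line in lines:
        match = NO_ADDR.search(line)
        if not match:
            continue
        found = match.group("mac").lower()
        if mac is None or found == mac.lower():
            hits.append((match.group("iface"), found))
    return hits


# The syslog line from this report.
sample_line = (
    "devstack-trusty-2-node-rax-iad-2194251-1687 dnsmasq-dhcp[22310]: "
    "DHCPDISCOVER(br100) fa:16:3e:fb:3e:3e no address available"
)

print(unanswered_discovers([sample_line], mac="fa:16:3e:fb:3e:3e"))
```

To use it on a downloaded log, pass the open file object (for example from `gzip.open(path, "rt")`) as `lines`.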

I think this points at one of two problems: either there is a race between booting a node and nova-net setting its dnsmasq config, or nova-net is never setting the dnsmasq config in the first place.

If it is a race, then it likely affects other operations too. If the config is never set at all, that may be a shelve/unshelve-specific issue with restoring instance state.

Joe Gordon (jogo) on 2015-04-23
Changed in nova:
status: New → Confirmed
Joe Gordon (jogo) wrote :

This doesn't happen every time an instance is unshelved into a different node.

Changed in nova:
importance: Undecided → High
Joe Gordon (jogo) wrote :

It looks like tempest doesn't test ssh connectivity most of the time, which explains how we are seeing "no address available" in syslog for successful multinode runs (http://logs.openstack.org/55/138255/17/check/check-tempest-dsvm-multinode-full/878eb11/logs/syslog.txt.gz).

We are not seeing "no address available" in single-node runs, so maybe this is a general nova-net multihost or migration issue.

tags: added: network shelve
Sean Dague (sdague) wrote :

I believe that https://review.openstack.org/#/c/273042/4 largely solves this issue.

Changed in nova:
status: Confirmed → Fix Released