CirrOS cannot get DHCP lease in cloud when CentOS based VM can

Bug #1224618 reported by Chris Lehy
This bug affects 2 people
Affects: CirrOS
Status: Incomplete
Importance: Medium
Assigned to: Unassigned

Bug Description

I am running an all-in-one Grizzly cloud built with Red Hat Packstack on a single CentOS 6.4 box. The CentOS box itself is a VMware virtual machine (version 9).

I can successfully deploy VMs using a CentOS image. The VMs receive their IP addresses as expected.
If I try to deploy a VM using the CirrOS image, the VM does not receive an IP address. The console log shows the following:

Starting network...
udhcpc (v1.18.5) started
Sending discover...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
No lease, failing
WARN: /etc/rc3.d/S40-network failed

I have seen a number of people reporting this problem (an IP address is received but then cannot be acknowledged for some reason).

The fact that the problem does not appear when booting a CentOS-based VM leads me to think that this is specific to CirrOS.
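
For anyone triaging several instances at once, the failure signature in the console log above can be spotted mechanically. The following is only a minimal sketch: the function name `summarize_udhcpc` and the verdict strings are made up for illustration, and the matched lines are simply the udhcpc messages quoted in this report. It relies on udhcpc printing "Sending select" only after it has received an offer.

```python
import re

# Patterns taken from the udhcpc messages quoted in this bug report.
SELECT_RE = re.compile(r"Sending select for (\d+(?:\.\d+){3})\.\.\.")
LEASE_RE = re.compile(r"Lease of (\d+(?:\.\d+){3}) obtained")

SAMPLE = """\
Starting network...
udhcpc (v1.18.5) started
Sending discover...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
No lease, failing
WARN: /etc/rc3.d/S40-network failed
"""


def summarize_udhcpc(console_log: str) -> dict:
    """Summarize a udhcpc console log (hypothetical helper, not part of CirrOS)."""
    discovers = 0
    selects = []
    leased_ip = None
    failed = False
    for raw in console_log.splitlines():
        line = raw.strip()
        select = SELECT_RE.match(line)
        lease = LEASE_RE.match(line)
        if line.startswith("Sending discover"):
            discovers += 1
        elif select:
            # A select (DHCPREQUEST) is only sent after an offer arrived.
            selects.append(select.group(1))
        elif lease:
            leased_ip = lease.group(1)
        elif line.startswith("No lease"):
            failed = True

    if leased_ip:
        verdict = "lease obtained"
    elif selects and failed:
        verdict = "offer received, but no ACK was accepted"
    elif failed:
        verdict = "no offer received"
    else:
        verdict = "inconclusive"
    return {
        "discovers": discovers,
        "selects": len(selects),
        "offered_ip": selects[0] if selects else None,
        "leased_ip": leased_ip,
        "verdict": verdict,
    }


if __name__ == "__main__":
    print(summarize_udhcpc(SAMPLE))
```

A verdict of "offer received, but no ACK was accepted" matches the symptom described here: the server side is answering, so the next place to look is what happens to the ACK on its way to the guest.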

Tags: dhcp grizzly
Harm Weites (harmw) wrote :

Please configure dnsmasq to log its activity so you can at least verify what it thinks about the DHCP traffic.

# openstack-config --set /etc/quantum/dhcp_agent.ini DEFAULT dnsmasq_config_file /etc/quantum/dnsmasq.conf

And this goes in the config file:
log-facility = /data/log/quantum/dnsmasq.log
log-dhcp

You should now be able to see whether the instance is offered an IP and what actually happens with the offer.

Running tshark on the bridge or the namespaced interface could be interesting as well.

Stephan Renatus (s-renatus) wrote :

Not sure if this is the same underlying issue, but I just noticed the same problem with CirrOS in OpenStack: dhclient (Ubuntu, but CentOS uses that, too) picks up a lease, udhcpc doesn't.

The exchange is as follows:

> DISCOVER
< OFFER
> REQUEST
< ACK
> REQUEST
< ACK
...

Using tcpdump, I noticed the following: the DHCPACK has a "bad udp checksum":

> REQUEST

12:19:28.035500 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 324)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from fa:16:3e:76:a8:2a (oui Unknown), length 296, xid 0x9c5cbe30, secs 6, Flags [none] (0x0000)
   Client-Ethernet-Address fa:16:3e:76:a8:2a (oui Unknown)
   Vendor-rfc1048 Extensions
     Magic Cookie 0x63825363
     DHCP-Message Option 53, length 1: Request
     Client-ID Option 61, length 7: ether fa:16:3e:76:a8:2a
     Requested-IP Option 50, length 4: 192.168.0.11
     Server-ID Option 54, length 4: 192.168.0.2
     MSZ Option 57, length 2: 576
     Parameter-Request Option 55, length 7:
       Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
       Domain-Name, BR, NTP
     Vendor-Class Option 60, length 12: "udhcp 1.18.5"
     Hostname Option 12, length 2: "-v"

< ACK

12:19:28.036401 IP (tos 0xc0, ttl 64, id 46530, offset 0, flags [none], proto UDP (17), length 350)
    192.168.0.2.bootps > 192.168.0.11.bootpc: [bad udp cksum 0x82b9 -> 0x445e!] BOOTP/DHCP, Reply, length 322, xid 0x9c5cbe30, secs 6, Flags [none] (0x0000)
   Your-IP 192.168.0.11
   Server-IP 192.168.0.2
   Client-Ethernet-Address fa:16:3e:76:a8:2a (oui Unknown)
   Vendor-rfc1048 Extensions
     Magic Cookie 0x63825363
     DHCP-Message Option 53, length 1: ACK
     Server-ID Option 54, length 4: 192.168.0.2
     Lease-Time Option 51, length 4: 120
     RN Option 58, length 4: 60
     RB Option 59, length 4: 105
     Subnet-Mask Option 1, length 4: 255.255.255.0
     BR Option 28, length 4: 192.168.0.255
     Domain-Name Option 15, length 14: "openstacklocal"
     Hostname Option 12, length 12: "192-168-0-11"
     Default-Gateway Option 3, length 4: 192.168.0.1
     Domain-Name-Server Option 6, length 4: google-public-dns-a.google.com
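
For reference, the "bad udp cksum" flag in the tcpdump output above means that the checksum stored in the UDP header does not match the one recomputed over the IPv4 pseudo-header plus the UDP header and payload. The sketch below shows that computation in Python; the helper names (`internet_checksum`, `udp_checksum`, `build_segment`, `is_valid`) are illustrative only, and the payload is a placeholder rather than the real DHCPACK captured here. Whether udhcpc 1.18.5 drops the ACK for exactly this reason is an assumption based on the capture, not something the sketch proves.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (IP/UDP/TCP style)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus UDP header and payload."""
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, len(segment))
    )
    return internet_checksum(pseudo + segment)


def build_segment(src_ip: str, dst_ip: str, sport: int, dport: int,
                  payload: bytes) -> bytes:
    """Build a UDP segment with a correctly filled checksum field."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    # A computed value of 0 is transmitted as 0xFFFF; 0 means "no checksum".
    csum = udp_checksum(src_ip, dst_ip, header + payload) or 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """True if the stored checksum matches (or the sender opted out with 0)."""
    stored = struct.unpack("!H", segment[6:8])[0]
    if stored == 0:
        return True
    # With the stored checksum included, a correct segment folds to zero.
    return udp_checksum(src_ip, dst_ip, segment) == 0


if __name__ == "__main__":
    # Addresses and ports mirror the ACK above; the payload is a placeholder.
    seg = build_segment("192.168.0.2", "192.168.0.11", 67, 68, b"dhcp-ack-payload")
    print(is_valid("192.168.0.2", "192.168.0.11", seg))
```

A receiver that validates the checksum strictly will discard a datagram for which this check fails, while one that skips validation will accept it.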

On irc://irc.freenode.net/busybox I was told to try a current git checkout, and ta-daaaa, it just works (I compiled and tried it on Ubuntu; the problem with CirrOS' udhcpc was reproducible there):

root@ubuntu:~/busybox# ./busybox udhcpc -v -i eth1
Adapter index 3
MAC fa:16:3e:76:a8:2a
udhcpc (v1.22.0.git) started
Executing /usr/share/udhcpc/default.script deconfig
Entering listen mode: raw
Opening raw socket on ifindex 3
Got raw socket fd
Attached filter to raw socket fd
Created raw socket
Adapter index 3
MAC fa:16:3e:76:a8:2a
Sending discover...
Waiting on select 3 seconds
Received a packet
Adapter index 3
MAC fa:16:3e:76:a8:2a
Sending select for 192.168.0.11...
Waiting on select 3 seconds
Received a packet
Lease of 192.168.0.11 obtained, lease time 120
Executing /usr/share/udhcpc/default.script bound
/usr/share/udhcpc/default.script: Resetting default routes
SIOCDELRT: No such process

So.... updating busybox might be a solution... (not sure which commit brought the fix, though...)

Scott Moser (smoser) wrote :

Perhaps a newer busybox will fix this, but the bad checksum comments seem to point to bug 930962 (Red Hat bug https://bugzilla.redhat.com/show_bug.cgi?id=910619). Do you have a new enough nova with the fix from https://review.openstack.org/#/c/18336/ ?

I.e., I *think* this can be fixed on the host.

Scott Moser (smoser)
Changed in cirros:
status: New → Incomplete
importance: Undecided → Medium
Ian Pilcher (arequipeno) wrote :

Neutron is now running the dnsmasq DHCP agent in authoritative mode.

  https://bugs.launchpad.net/neutron/+bug/1417057

With this change, CirrOS instances do not successfully get an IP address, or even boot to a console login prompt, in a high-availability setup with multiple DHCP agents on a tenant network.

I'm thinking you probably want to up the importance of this issue. :-/
