CirrOS

CirrOS cannot get a DHCP lease in a cloud where a CentOS-based VM can

Bug #1224618 reported by Chris Lehy on 2013-09-12

This bug affects 2 people

Affects: CirrOS
Status: Incomplete
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am running an all-in-one Grizzly cloud built with Red Hat Packstack on a single CentOS 6.4 box. The CentOS box itself is a VMware virtual machine (version 9).

I can successfully deploy VMs using a CentOS image, and those VMs receive their IP addresses as expected.
If I deploy a VM using the CirrOS image, however, the VM does not receive an IP address, and the console log shows the following:

Starting network...
udhcpc (v1.18.5) started
Sending discover...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
Sending select for 10.5.63.4...
No lease, failing
WARN: /etc/rc3.d/S40-network failed

I have seen a number of people reporting this problem (an IP address is offered, but the lease then cannot be acknowledged for some reason).

Since the problem does not occur when booting a CentOS-based VM, I think it is specific to CirrOS.

Chris Lehy (clehy) wrote on 2013-09-12:

console log (18.8 KiB, text/plain)

Harm Weites (harmw) wrote on 2013-09-20:

Please configure dnsmasq to log its activity so that you can at least verify what dnsmasq itself thinks of the DHCP traffic.

# openstack-config --set /etc/quantum/dhcp_agent.ini DEFAULT dnsmasq_config_file /etc/quantum/dnsmasq.conf

Then add the following to /etc/quantum/dnsmasq.conf:
log-facility = /data/log/quantum/dnsmasq.log
log-dhcp

You should then be able to see whether the instance is offered an IP address and what actually happens to that offer.

Capturing the traffic with tshark on the bridge or on the namespaced interface could be interesting as well.
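With log-dhcp enabled, dnsmasq logs each DHCP message it handles, so the per-client sequence (DISCOVER, OFFER, REQUEST, ACK) can be pulled out of the log. Below is a minimal Python sketch. It assumes log lines of the form "dnsmasq-dhcp[PID]: [XID] DHCPACK(IFACE) IP MAC ...". That layout is an assumption, not something shown in this report, so adapt the regular expression to what your dnsmasq actually writes. The name dhcp_timeline is hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: dnsmasq DHCP log lines look roughly like
#   "... dnsmasq-dhcp[1234]: 567 DHCPACK(tap0) 192.168.0.11 fa:16:3e:76:a8:2a host"
# The numeric transaction id (567) may or may not be present, and
# DHCPDISCOVER lines normally carry a MAC address but no IP address.
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]:\s+(?:\d+\s+)?"
    r"(?P<msg>DHCP[A-Z]+)\((?P<iface>[^)]*)\)\s+"
    r"(?:(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+)?"
    r"(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)


def dhcp_timeline(lines):
    """Hypothetical helper: map each client MAC to the ordered list of
    (message, ip) pairs dnsmasq logged for it. Non-matching lines are skipped."""
    timeline = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            timeline[m["mac"].lower()].append((m["msg"].upper(), m["ip"]))
    return dict(timeline)
```

For example, dhcp_timeline(open("/data/log/quantum/dnsmasq.log")) returns a dict keyed by MAC address. A client stuck in a request loop would show REQUEST/ACK pairs only seconds apart. A healthy client also repeats REQUEST when it renews, so check the timestamps before drawing conclusions.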

Stephan Renatus (s-renatus) wrote on 2013-10-01:

Not sure if this is the same underlying issue, but I just noticed the same problem with CirrOS in OpenStack: dhclient (on Ubuntu; CentOS uses it too) picks up a lease, but udhcpc doesn't.

The exchange is as follows:

> DISCOVER
< OFFER
> REQUEST
< ACK
> REQUEST
< ACK
...
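One way to read this pattern is that dnsmasq answers every REQUEST, but the client discards each ACK and asks again until it gives up. That would also explain the repeated "Sending select" lines and the final "No lease, failing" in the original console log. Below is a minimal sketch of such a retry loop, with the network I/O abstracted into callables. request_lease, send_request, receive and is_acceptable are hypothetical names. The default of three attempts only mirrors the three "Sending select" lines and is not taken from udhcpc's source.

```python
from typing import Callable, Optional


def request_lease(
    send_request: Callable[[], None],
    receive: Callable[[], Optional[bytes]],
    is_acceptable: Callable[[bytes], bool],
    attempts: int = 3,
) -> Optional[bytes]:
    """Hypothetical client loop: send REQUEST up to `attempts` times and
    return the first reply the client accepts, or None ("No lease, failing")."""
    for _ in range(attempts):
        send_request()
        reply = receive()  # None models a timeout with nothing received
        if reply is not None and is_acceptable(reply):
            return reply
        # A reply that fails validation is treated exactly like a timeout,
        # so from the server's point of view the client simply asks again.
    return None
```

If is_acceptable rejects every ACK (for instance, because of a checksum check), the server sees REQUEST, ACK, REQUEST, ACK... and the client still ends up with no lease.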

Using tcpdump, I noticed that the DHCPACK has a bad UDP checksum:

> REQUEST

12:19:28.035500 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 324)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from fa:16:3e:76:a8:2a (oui Unknown), length 296, xid 0x9c5cbe30, secs 6, Flags [none] (0x0000)
   Client-Ethernet-Address fa:16:3e:76:a8:2a (oui Unknown)
   Vendor-rfc1048 Extensions
     Magic Cookie 0x63825363
     DHCP-Message Option 53, length 1: Request
     Client-ID Option 61, length 7: ether fa:16:3e:76:a8:2a
     Requested-IP Option 50, length 4: 192.168.0.11
     Server-ID Option 54, length 4: 192.168.0.2
     MSZ Option 57, length 2: 576
     Parameter-Request Option 55, length 7:
       Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
       Domain-Name, BR, NTP
     Vendor-Class Option 60, length 12: "udhcp 1.18.5"
     Hostname Option 12, length 2: "-v"
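The "Vendor-rfc1048 Extensions" that tcpdump decodes are the DHCP options field (RFC 2132). It consists of the magic cookie 0x63825363 followed by type-length-value entries: a one-byte code, a one-byte length, then the value. Code 0 is a one-byte pad and code 255 ends the list. Below is a minimal decoder sketch, exercised on three of the options from the REQUEST above: 53 (message type), 50 (requested IP) and 57 (maximum message size). parse_dhcp_options and describe are hypothetical helper names.

```python
import ipaddress
import struct
from typing import Dict

MAGIC_COOKIE = bytes.fromhex("63825363")
PAD, END = 0, 255
# Values of option 53 (DHCP message type) from RFC 2132.
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_dhcp_options(field: bytes) -> Dict[int, bytes]:
    """Decode the options field (starting at the magic cookie) into {code: raw value}."""
    if field[:4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    opts: Dict[int, bytes] = {}
    i = 4
    while i < len(field):
        code = field[i]
        if code == PAD:        # single padding byte, no length byte follows
            i += 1
            continue
        if code == END:        # end-of-options marker
            break
        if i + 1 >= len(field):
            raise ValueError("truncated option header")
        length = field[i + 1]
        value = field[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError(f"truncated option {code}")
        opts[code] = value
        i += 2 + length
    return opts


def describe(code: int, value: bytes) -> str:
    """Render a few well-known options roughly the way tcpdump shows them."""
    if code == 53:
        return MESSAGE_TYPES.get(value[0], str(value[0]))
    if code in (1, 3, 28, 50, 54):   # address-valued (first address only)
        return str(ipaddress.IPv4Address(value[:4]))
    if code in (51, 58, 59):         # lease time, T1, T2 in seconds
        return str(struct.unpack("!I", value)[0])
    if code == 57:                   # maximum DHCP message size
        return str(struct.unpack("!H", value)[0])
    return value.hex()
```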

< ACK

12:19:28.036401 IP (tos 0xc0, ttl 64, id 46530, offset 0, flags [none], proto UDP (17), length 350)
    192.168.0.2.bootps > 192.168.0.11.bootpc: [bad udp cksum 0x82b9 -> 0x445e!] BOOTP/DHCP, Reply, length 322, xid 0x9c5cbe30, secs 6, Flags [none] (0x0000)
   Your-IP 192.168.0.11
   Server-IP 192.168.0.2
   Client-Ethernet-Address fa:16:3e:76:a8:2a (oui Unknown)
   Vendor-rfc1048 Extensions
     Magic Cookie 0x63825363
     DHCP-Message Option 53, length 1: ACK
     Server-ID Option 54, length 4: 192.168.0.2
     Lease-Time Option 51, length 4: 120
     RN Option 58, length 4: 60
     RB Option 59, length 4: 105
     Subnet-Mask Option 1, length 4: 255.255.255.0
     BR Option 28, length 4: 192.168.0.255
     Domain-Name Option 15, length 14: "openstacklocal"
     Hostname Option 12, length 12: "192-168-0-11"
     Default-Gateway Option 3, length 4: 192.168.0.1
     Domain-Name-Server Option 6, length 4: google-public-dns-a.google.com
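In tcpdump's "[bad udp cksum 0x82b9 -> 0x445e!]", the first value is the checksum carried in the packet and the second is the value tcpdump computed it should be. The UDP checksum (RFC 768) is the 16-bit one's complement of the one's-complement sum over an IPv4 pseudo-header (source address, destination address, a zero byte, protocol 17, UDP length), the UDP header and the payload.

A client reading frames from a raw socket, as udhcpc does ("Entering listen mode: raw" in the busybox output below), does not get the kernel's UDP checksum check. If it verifies the checksum itself, it will drop a datagram like this ACK. Below is a minimal sketch of computing and verifying the checksum. The function names are hypothetical, and the addresses and ports in the test come from the capture above.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words, carries folded (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17 (UDP), UDP length."""
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Build a UDP header + payload with a correct checksum filled in."""
    length = 8 + len(payload)
    unsummed = struct.pack("!HHHH", sport, dport, length, 0) + payload
    csum = ~ones_complement_sum(pseudo_header(src, dst, length) + unsummed) & 0xFFFF
    csum = csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return unsummed[:6] + struct.pack("!H", csum) + unsummed[8:]


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver-side check: the sum over pseudo-header + segment must be 0xFFFF."""
    if struct.unpack_from("!H", segment, 6)[0] == 0:
        return True  # sender did not compute a checksum (allowed over IPv4)
    udp_len = struct.unpack_from("!H", segment, 4)[0]
    data = pseudo_header(src, dst, udp_len) + segment[:udp_len]
    return ones_complement_sum(data) == 0xFFFF
```

Over IPv4 a checksum field of zero means "not computed". This is why a computed value of 0 is sent as 0xFFFF and why the check accepts a zero field.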

On irc://irc.freenode.net/busybox I was told to try a current git checkout, and ta-da, it just works (I compiled and tried it on Ubuntu, where the problem with CirrOS's udhcpc was reproducible):

root@ubuntu:~/busybox# ./busybox udhcpc -v -i eth1
Adapter index 3
MAC fa:16:3e:76:a8:2a
udhcpc (v1.22.0.git) started
Executing /usr/share/udhcpc/default.script deconfig
Entering listen mode: raw
Opening raw socket on ifindex 3
Got raw socket fd
Attached filter to raw socket fd
Created raw socket
Adapter index 3
MAC fa:16:3e:76:a8:2a
Sending discover...
Waiting on select 3 seconds
Received a packet
Adapter index 3
MAC fa:16:3e:76:a8:2a
Sending select for 192.168.0.11...
Waiting on select 3 seconds
Received a packet
Lease of 192.168.0.11 obtained, lease time 120
Executing /usr/share/udhcpc/default.script bound
/usr/share/udhcpc/default.script: Resetting default routes
SIOCDELRT: No such process

So updating BusyBox might be a solution (though I'm not sure which commit brought the fix).

Scott Moser (smoser) wrote on 2013-11-19:

Perhaps a newer BusyBox will fix this, but the bad-checksum comments seem to point to bug 930962 (Red Hat bug https://bugzilla.redhat.com/show_bug.cgi?id=910619). Do you have a nova new enough to include the fix from https://review.openstack.org/#/c/18336/ ?

That is, I *think* this can be fixed on the host.

Scott Moser (smoser) on 2013-12-05

Changed in cirros:
status:	New → Incomplete
importance:	Undecided → Medium

Ian Pilcher (arequipeno) wrote on 2015-05-15:

Neutron's DHCP agent now runs dnsmasq in authoritative mode.

https://bugs.launchpad.net/neutron/+bug/1417057

With this change, CirrOS instances do not get an IP address, or even boot to a console login prompt, in a high-availability setup with multiple DHCP agents on a tenant network.

I'm thinking you probably want to up the importance of this issue. :-/