Some instances can't connect to metadata due to ARP failure

Bug #719798 reported by Hyunsun Moon
This bug affects 1 person
Affects: OpenStack Compute (nova) · Status: Fix Released · Importance: High · Assigned to: Vish Ishaya

Bug Description

Instances that have a local route (for example, due to installing NetworkManager in Ubuntu) cannot contact the metadata server: they send out an ARP who-has for 169.254.169.254 and never get a response. This issue also affects Windows VMs. It can be worked around by giving the IP address to the host that is running nova-network, along the lines of:

ip addr add 169.254.169.254/32 scope link dev eth1

This causes the network host to respond to the ARP, so the metadata request succeeds. Nova should add this address automatically to avoid the failure.
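The check-and-add step nova could perform can be sketched in POSIX sh. This is only an illustration, not the actual fix that landed: the `needs_metadata_ip` helper and the eth1 interface name are made up for the example, and the command is printed rather than executed, since adding an address requires root.

```shell
# Decide, from `ip addr show dev <iface>` output, whether the metadata
# address still needs to be added (helper name is hypothetical).
needs_metadata_ip() {
    case "$1" in
        *"169.254.169.254/32"*) return 1 ;;  # already configured
        *) return 0 ;;                       # missing
    esac
}

# On a real host this would be: addrs=$(ip addr show dev eth1)
addrs="inet 10.0.0.1/24 scope global eth1"
if needs_metadata_ip "$addrs"; then
    # Run the printed command as root on the nova-network host:
    echo "ip addr add 169.254.169.254/32 scope link dev eth1"
fi
```

Because the check matches the existing address list first, running it on every nova-network start is safe.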

Example error messages are below:

The instance fails to reach the metadata server at launch, so it cannot complete its initial boot process, including sshd startup.

Logs from a ttylinux image.
=========================================

Lease of 10.0.0.5 obtained, lease time 120
starting DHCP for Ethernet interface eth0 [ OK ]
cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id
cloud-setup: failed 1/30: up 7.97. iid had 1.0
cloud-setup: failed 2/30: up 9.18. iid had 1.0
cloud-setup: failed 3/30: up 10.35. iid had 1.0
cloud-setup: failed 4/30: up 11.52. iid had 1.0
cloud-setup: failed 5/30: up 12.70. iid had 1.0
cloud-setup: failed 6/30: up 13.88. iid had 1.0
cloud-setup: failed 7/30: up 15.06. iid had 1.0
cloud-setup: failed 8/30: up 16.24. iid had 1.0
cloud-setup: failed 9/30: up 17.43. iid had 1.0
cloud-setup: failed 10/30: up 18.62. iid had 1.0
cloud-setup: failed 11/30: up 19.81. iid had 1.0
cloud-setup: failed 12/30: up 21.00. iid had 1.0
cloud-setup: failed 13/30: up 22.20. iid had 1.0
cloud-setup: failed 14/30: up 23.40. iid had 1.0
cloud-setup: failed 15/30: up 24.60. iid had 1.0
cloud-setup: failed 16/30: up 25.80. iid had 1.0
cloud-setup: failed 17/30: up 27.01. iid had 1.0
cloud-setup: failed 18/30: up 28.22. iid had 1.0
cloud-setup: failed 19/30: up 29.43. iid had 1.0
cloud-setup: failed 20/30: up 30.65. iid had 1.0
cloud-setup: failed 21/30: up 31.86. iid had 1.0
cloud-setup: failed 22/30: up 33.08. iid had 1.0
cloud-setup: failed 23/30: up 34.30. iid had 1.0
cloud-setup: failed 24/30: up 35.60. iid had 1.0
cloud-setup: failed 25/30: up 36.89. iid had 1.0
cloud-setup: failed 26/30: up 38.11. iid had 1.0
cloud-setup: failed 27/30: up 39.34. iid had 1.0
cloud-setup: failed 28/30: up 40.56. iid had 1.0
cloud-setup: failed 29/30: up 41.82. iid had 1.0
cloud-setup: failed 30/30: up 43.05. iid had 1.0
cloud-setup: after 30 fails, debugging
cloud-setup: running debug (30 tries reached)
############ debug start ##############
### /etc/rc.d/init.d/sshd start
stty: /dev/console
generating DSS host key [ OK ]
generating RSA host key [ OK ]
startup dropbear [ OK ]
### ifconfig -a
eth0 Link encap:Ethernet HWaddr 02:16:3E:57:D3:B5
=========================================

Logs from UEC image.
=========================================
init: plymouth-splash main process (263) terminated with status 2
init: plymouth main process (48) killed by SEGV signal
cloud-init running: Tue, 15 Feb 2011 09:55:54 +0000. up 30.11 seconds
consuming user data failed!
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 103, in <module>
    main()
  File "/usr/bin/cloud-init", line 60, in main
    cloud.consume_userdata,[],False)
  File "/usr/lib/python2.6/dist-packages/cloudinit/__init__.py", line 215, in sem_and_run
    if self.sem_has_run(semname,freq): return
  File "/usr/lib/python2.6/dist-packages/cloudinit/__init__.py", line 173, in sem_has_run
    semfile = self.sem_getpath(name,freq)
  File "/usr/lib/python2.6/dist-packages/cloudinit/__init__.py", line 167, in sem_getpath
    freqtok = self.datasource.get_instance_id()
  File "/usr/lib/python2.6/dist-packages/cloudinit/DataSourceEc2.py", line 65, in get_instance_id
    return(self.metadata['instance-id'])
KeyError: 'instance-id'
init: cloud-init main process (334) terminated with status 1
mountall: Event failed
mountall: Plymouth command failed
mountall: Plymouth command failed
mountall: Plymouth command failed
mountall: Plymouth command failed
mountall: Disconnected from Plymouth
init: plymouth-log main process (364) terminated with status 1
 * Starting AppArmor profiles [ OK ]
Traceback (most recent call last):
  File "/usr/bin/cloud-init-cfg", line 56, in <module>
    main()
  File "/usr/bin/cloud-init-cfg", line 43, in main
    cc = cloudinit.CloudConfig.CloudConfig(cfg_path)
  File "/usr/lib/python2.6/dist-packages/cloudinit/CloudConfig.py", line 42, in __init__
    self.cfg = self.get_config_obj(cfgfile)
  File "/usr/lib/python2.6/dist-packages/cloudinit/CloudConfig.py", line 53, in get_config_obj
    f=file(cfgfile)
IOError: [Errno 2] No such file or directory: '/var/lib/cloud/data/cloud-config.txt'
[the same cloud-init-cfg traceback is printed several more times, with output from concurrent processes interleaved]
landscape-client is not configured, please run landscape-config.

Revision history for this message
Thierry Carrez (ttx) wrote :

What network mode are you using? Modes other than VlanManager require specific routing for the metadata server to work.

Changed in nova:
status: New → Incomplete
Revision history for this message
Hyunsun Moon (hyunsun-moon) wrote :

It was default VLAN mode.

Revision history for this message
Wayne A. Walls (wayne-walls) wrote :

Greetings!

I've messed around quite a bit with the UEC images, and I found that adding an iptables NAT rule wherever your nova-api service runs fixes the boot problems. You'd want something like this...

iptables -t nat -A PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination <NOVA-API-SERVER-IP>:8773

Thierry is right though, VlanManager /usually/ doesn't need this, but it's worth a shot. Lastly, to be honest, it looks like there is more going on with your UEC image than the metadata server not being reachable. If that were the only problem, you'd likely see something more along the lines of 'Cannot reach metadata server ... trying again 1/100.' The image will try 100 times to contact the metadata server; if it can't, it will sometimes continue booting, but in my experience it just loops :(

Give it a try, and let us know how it goes!

Cheers

Revision history for this message
Hyunsun Moon (hyunsun-moon) wrote :

I've already tried the iptables command and it didn't work for me.
The reason I need to access the metadata server is for a cloudpipe instance, to get 'autorun.sh' from the server.

Here's my 'iptables -L' result; "cloud02" is the hostname of the API server.
Anything wrong?

Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT udp -- anywhere anywhere udp dpt:domain
ACCEPT tcp -- anywhere anywhere tcp dpt:domain
ACCEPT udp -- anywhere anywhere udp dpt:bootps
ACCEPT tcp -- anywhere anywhere tcp dpt:bootps
ACCEPT all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere icmp any
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ftp
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:telnet
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:smtp
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:www
DROP all -- anywhere anywhere state INVALID
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere cloud02 tcp dpt:ssh
ACCEPT udp -- anywhere anywhere udp dpt:ntp
nova_input all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere
REJECT tcp -- anywhere anywhere reject-with tcp-reset
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable

Chain FORWARD (policy DROP)
target prot opt source destination
nova-local all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT udp -- anywhere 10.0.0.2 udp dpt:openvpn
ACCEPT all -- anywhere 192.168.122.0/24 state RELATED,ESTABLISHED
ACCEPT all -- 192.168.122.0/24 anywhere
ACCEPT all -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
DROP all -- anywhere anywhere state INVALID
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
TCPMSS tcp -- anywhere anywhere tcp flags:SYN,RST/SYN TCPMSS clamp to PMTU
nova_forward all -- anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
nova-local all -- anywhere anywhere
DROP all -- anywhere anywhere state INVALID
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
nova_output all -- anywhere anywhere

Chain nova-fallback (1 references)
target prot opt source de...


Revision history for this message
Thierry Carrez (ttx) wrote :

I don't really know where this fails, but the plan is to simplify metadata access by serving metadata from the local compute node rather than from the API node. That should work in every network mode and not rely on a fancy routes/rules/bridges setup.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

If this is a desktop image, you may have to give the 169.254 address to the network host:
something like:
ip addr add 169.254.169.254/32 scope link dev eth1
This will allow it to answer ARP for the address. The eth device you add the address to isn't particularly important; however, if you decide to add it to br100 you should probably use scope global instead of scope link, or the ordering of IP addresses can sometimes mess up DHCP.
If this is not a desktop image, then you may be having issues with your forwarding rules. Check:
iptables -L -n -v
for the 169.254 rule. Make sure that the rule has the proper IP for your API server and that the rule is actually being hit.

The ip addr add command really needs to be done automatically; I consider this a must-have for cactus. The workaround of adding it manually is too much to expect users to do.
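The two variants described above, side by side. Interface names eth1 and br100 are the examples from this thread; the commands are echoed here so the sketch is safe to run unprivileged — on a real host, execute them as root instead:

```shell
# Plain ethernet device on the network host: link scope works.
echo "ip addr add 169.254.169.254/32 scope link dev eth1"
# On the bridge, prefer global scope, or the ordering of IP
# addresses can mess up DHCP.
echo "ip addr add 169.254.169.254/32 scope global dev br100"
```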

Changed in nova:
status: Incomplete → Triaged
importance: Undecided → High
summary: - Instance fails to access metadata server
+ Some instances can't connect to metadata due to ARP failure
description: updated
Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
milestone: none → cactus-gamma
status: Triaged → In Progress
Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: cactus-gamma → none
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released