Quantum: Instances/Router cannot reach meta_ip service

Bug #1089055 reported by Shannon McFarland
This bug affects 1 person
Affects                   Status         Importance  Assigned to
Cisco Openstack           Fix Committed  Critical    Daneyon Hansen
Cisco Openstack (Folsom)  Fix Released   Critical    Daneyon Hansen

Bug Description

The Quantum configuration used in the COE does not seem to work: instances (and the Quantum router itself) cannot reach the meta_ip service.

1) We need precise documentation describing how the eth1 interface should be configured, so that when Quantum runs the bridges are associated correctly, and so that manual route commands and Quantum network/subnet/router configurations are associated properly with eth1 when required.
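As an illustration of the kind of documentation requested, here is a minimal sketch of wiring eth1 into an external OVS bridge. The bridge name br-ex and the use of the OVS plugin are assumptions on my part, not something the COE manifests confirm:

```shell
# Assumed setup: quantum-l3-agent uses an external OVS bridge (br-ex).
# eth1 carries the external network, so it becomes a port on br-ex and
# carries no IP address of its own.
ovs-vsctl --may-exist add-br br-ex
ovs-vsctl --may-exist add-port br-ex eth1
ip link set eth1 up
# Any host-side IP on the external subnet would then live on br-ex, e.g.:
# ip addr add 192.168.238.2/24 dev br-ex
```

If eth1 is never enslaved to the external bridge, traffic from the qrouter namespace has no path off the node, which would match the symptoms below.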

2) Steps to produce issue:
root@control-01:~# quantum router-list
+--------------------------------------+---------+--------------------------------------------------------+
| id | name | external_gateway_info |
+--------------------------------------+---------+--------------------------------------------------------+
| 57891ecb-2a3c-4d7a-8649-ce87ddd18985 | router1 | {"network_id": "f327fa57-cf99-4218-8e50-d0018ffb2b7d"} |
+--------------------------------------+---------+--------------------------------------------------------+

root@control-01:~# quantum port-list -- --device_id 57891ecb-2a3c-4d7a-8649-ce87ddd18985 --device_owner network:router_gateway
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| 78df5a76-b82c-471c-a6a6-a39348df17ed | | fa:16:3e:00:25:1c | {"subnet_id": "876bd637-e6f8-4154-8c25-1150da4c6e5d", "ip_address": "192.168.238.3"} |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+

root@control-01:~# route add -net 10.10.10.0/24 gw 192.168.238.3

root@control-01:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.121.13.1 0.0.0.0 UG 100 0 0 eth0
10.10.10.0 192.168.238.3 255.255.255.0 UG 0 0 0 eth1
10.121.13.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.238.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1

Now, create keys, launch an instance, and try to ssh into it, watching the console log for the instance. After a while it will time out, unable to reach the meta service:

root@control-01:~# nova console-log --length=25 vm1
ci-info: eth0 : 1 10.10.10.3 255.255.255.0 fa:16:3e:64:21:92
ci-info: route-0: 0.0.0.0 10.10.10.1 0.0.0.0 eth0 UG
ci-info: route-1: 10.10.10.0 0.0.0.0 255.255.255.0 eth0 U
cloud-init start running: Tue, 11 Dec 2012 17:32:29 +0000. up 29.07 seconds
2012-12-11 17:33:20,300 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [50/120s]: url error [timed out]
2012-12-11 17:34:11,352 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]: url error [timed out]
2012-12-11 17:34:29,371 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]: url error [timed out]
2012-12-11 17:34:30,373 - DataSourceEc2.py[CRITICAL]: giving up on md after 120 seconds

no instance data found in start
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
 * Starting AppArmor profiles [ OK ]
landscape-client is not configured, please run landscape-config.
 * Stopping System V initialisation compatibility [ OK ]
 * Starting System V runlevel compatibility [ OK ]
 * Starting ACPI daemon [ OK ]
 * Starting save kernel messages [ OK ]
 * Starting regular background program processing daemon [ OK ]
 * Starting deferred execution scheduler [ OK ]
 * Starting automatic crash report generation [ OK ]
 * Starting CPU interrupts balancing daemon [ OK ]
 * Stopping save kernel messages [ OK ]
 * Stopping System V runlevel compatibility [ OK ]
 * Starting execute cloud user/final scripts [ OK ]
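The same metadata URL that cloud-init is timing out on can also be probed by hand from inside the router namespace (namespace name taken from the router-list output above; curl with a short timeout stands in for cloud-init's request):

```shell
# Probe the EC2 metadata endpoint directly from the qrouter namespace.
# A timeout here confirms the router itself has no path to the service.
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 \
    curl -s --max-time 5 http://169.254.169.254/2009-04-04/meta-data/instance-id
```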

Pinging works from qrouter to instance:
root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_req=1 ttl=64 time=0.633 ms
64 bytes from 10.10.10.3: icmp_req=2 ttl=64 time=0.526 ms
64 bytes from 10.10.10.3: icmp_req=3 ttl=64 time=0.634 ms
64 bytes from 10.10.10.3: icmp_req=4 ttl=64 time=0.413 ms
^C
--- 10.10.10.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.413/0.551/0.634/0.094 ms

SSH does not work (no keys injected to instance because of connectivity issue to meta service):

root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ssh ubuntu@10.10.10.3
Read from socket failed: Connection reset by peer

Qrouter cannot reach the IP address of the meta service either (the COE uses the eth0 IP address for this service):
root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ping 10.121.13.50
PING 10.121.13.50 (10.121.13.50) 56(84) bytes of data.
^C
--- 10.121.13.50 ping statistics ---
11 packets transmitted, 0 received, 100% packet loss, time 10031ms

So, one or more of the following: the node interface configuration is broken (the COE does not set up eth1 at all), and therefore any bridges on it are hosed for access; something is broken in the Quantum services; and/or a route is needed (and should be documented) to allow access from the qrouter/instances to the meta_ip service.
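One way to test the missing-route theory is to inspect and, for testing only, extend the routing table inside the qrouter namespace. A sketch follows; the external gateway address 192.168.238.1 is an assumption about the external subnet, not taken from the outputs above:

```shell
# Does the qrouter namespace have any route toward 10.121.13.0/24,
# the network where the meta_ip service (10.121.13.50) lives?
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ip route

# If not, a test route via the assumed external gateway could be added:
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 \
    ip route add 10.121.13.0/24 via 192.168.238.1
```

If pings to 10.121.13.50 succeed after this, the fix is a routing/documentation issue rather than a broken Quantum service.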

Tags: 2012.2.2
Changed in openstack-cisco:
importance: Undecided → Critical
Changed in openstack-cisco:
status: New → Confirmed
assignee: nobody → Edgar Magana (emagana)
Changed in openstack-cisco:
assignee: Edgar Magana (emagana) → Daneyon Hansen (danehans)
Revision history for this message
Chris Ricker (chris-ricker) wrote :

Remaining need is to document the configuration of the upstream router
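For the record, the upstream-router piece amounts to a static route pointing the tenant subnet at the Quantum router's external gateway port. A sketch for a Linux upstream router (addresses come from the outputs above; the choice of a Linux router is my assumption, and the equivalent on other platforms is a static route with the same next hop):

```shell
# On the upstream router: send traffic for the tenant subnet to the
# Quantum router's external gateway port (192.168.238.3, from port-list).
ip route add 10.10.10.0/24 via 192.168.238.3
```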

Changed in openstack-cisco:
status: Confirmed → In Progress
Revision history for this message
Mark T. Voelker (mvoelker) wrote :

> Remaining need is to document the configuration of the upstream router

And say what pulls fixed the problem. =)

Revision history for this message
Robert Starmer (starmer) wrote :
Changed in openstack-cisco:
status: In Progress → Fix Committed
Revision history for this message
Robert Starmer (starmer) wrote :

This has been validated with the current manifests (as of 23-010-2012, pull 31).

An alternate fix is in investigation, but doesn't negate the current solution (with external router).

Changed in openstack-cisco:
milestone: none → 2012.2.2
Changed in openstack-cisco:
milestone: 2012.2.2 → none