Quantum: Instances/Router cannot reach meta_ip service

Bug #1089055 reported by Shannon McFarland
This bug affects 1 person
Affects                   Status         Importance  Assigned to
Cisco Openstack           Fix Committed  Critical    Daneyon Hansen
Cisco Openstack (Folsom)  Fix Released   Critical    Daneyon Hansen

Bug Description

The Quantum configuration used in the COE does not seem to work: instances (and the Quantum router itself) cannot reach the meta_ip service.

1) We need precise documentation describing how the eth1 interface should be configured, so that when Quantum runs the bridges are associated correctly, and so that manual route commands and Quantum network/subnet/router configurations are associated properly with eth1 when required.
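As an illustration of the kind of documentation requested, here is a minimal sketch of wiring eth1 into an external OVS bridge. The bridge name br-ex and the use of the OVS plugin are assumptions on my part, not something the COE manifests confirm:

```shell
# Assumed setup: quantum-l3-agent uses an external OVS bridge (br-ex).
# eth1 carries the external network, so it becomes a port on br-ex and
# carries no IP address of its own.
ovs-vsctl --may-exist add-br br-ex
ovs-vsctl --may-exist add-port br-ex eth1
ip link set eth1 up
# Any host-side IP on the external subnet would then live on br-ex, e.g.:
# ip addr add 192.168.238.2/24 dev br-ex
```

If eth1 is never enslaved to the external bridge, traffic from the qrouter namespace has no path off the node, which would match the symptoms below.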

2) Steps to produce issue:
root@control-01:~# quantum router-list
+--------------------------------------+---------+--------------------------------------------------------+
| id | name | external_gateway_info |
+--------------------------------------+---------+--------------------------------------------------------+
| 57891ecb-2a3c-4d7a-8649-ce87ddd18985 | router1 | {"network_id": "f327fa57-cf99-4218-8e50-d0018ffb2b7d"} |
+--------------------------------------+---------+--------------------------------------------------------+

root@control-01:~# quantum port-list -- --device_id 57891ecb-2a3c-4d7a-8649-ce87ddd18985 --device_owner network:router_gateway
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| 78df5a76-b82c-471c-a6a6-a39348df17ed | | fa:16:3e:00:25:1c | {"subnet_id": "876bd637-e6f8-4154-8c25-1150da4c6e5d", "ip_address": "192.168.238.3"} |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+

root@control-01:~# route add -net 10.10.10.0/24 gw 192.168.238.3

root@control-01:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.121.13.1 0.0.0.0 UG 100 0 0 eth0
10.10.10.0 192.168.238.3 255.255.255.0 UG 0 0 0 eth1
10.121.13.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.238.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1

Now, create keys, launch an instance, and try to ssh into it, watching the console log for the instance. After a while it will time out, unable to reach the meta service:

root@control-01:~# nova console-log --length=25 vm1
ci-info: eth0 : 1 10.10.10.3 255.255.255.0 fa:16:3e:64:21:92
ci-info: route-0: 0.0.0.0 10.10.10.1 0.0.0.0 eth0 UG
ci-info: route-1: 10.10.10.0 0.0.0.0 255.255.255.0 eth0 U
cloud-init start running: Tue, 11 Dec 2012 17:32:29 +0000. up 29.07 seconds
2012-12-11 17:33:20,300 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [50/120s]: url error [timed out]
2012-12-11 17:34:11,352 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]: url error [timed out]
2012-12-11 17:34:29,371 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]: url error [timed out]
2012-12-11 17:34:30,373 - DataSourceEc2.py[CRITICAL]: giving up on md after 120 seconds

no instance data found in start
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
 * Starting AppArmor profiles [ OK ]
landscape-client is not configured, please run landscape-config.
 * Stopping System V initialisation compatibility [ OK ]
 * Starting System V runlevel compatibility [ OK ]
 * Starting ACPI daemon [ OK ]
 * Starting save kernel messages [ OK ]
 * Starting regular background program processing daemon [ OK ]
 * Starting deferred execution scheduler [ OK ]
 * Starting automatic crash report generation [ OK ]
 * Starting CPU interrupts balancing daemon [ OK ]
 * Stopping save kernel messages [ OK ]
 * Stopping System V runlevel compatibility [ OK ]
 * Starting execute cloud user/final scripts [ OK ]
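The same metadata URL that cloud-init is timing out on can also be probed by hand from inside the router namespace (namespace name taken from the router-list output above; curl with a short timeout stands in for cloud-init's request):

```shell
# Probe the EC2 metadata endpoint directly from the qrouter namespace.
# A timeout here confirms the router itself has no path to the service.
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 \
    curl -s --max-time 5 http://169.254.169.254/2009-04-04/meta-data/instance-id
```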

Pinging works from qrouter to instance:
root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_req=1 ttl=64 time=0.633 ms
64 bytes from 10.10.10.3: icmp_req=2 ttl=64 time=0.526 ms
64 bytes from 10.10.10.3: icmp_req=3 ttl=64 time=0.634 ms
64 bytes from 10.10.10.3: icmp_req=4 ttl=64 time=0.413 ms
^C
--- 10.10.10.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.413/0.551/0.634/0.094 ms

SSH does not work (no keys injected to instance because of connectivity issue to meta service):

root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ssh ubuntu@10.10.10.3
Read from socket failed: Connection reset by peer

Qrouter cannot reach the IP address of the meta service either (the COE uses the eth0 IP address for this service):
root@control-01:~# ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ping 10.121.13.50
PING 10.121.13.50 (10.121.13.50) 56(84) bytes of data.
^C
--- 10.121.13.50 ping statistics ---
11 packets transmitted, 0 received, 100% packet loss, time 10031ms

So, one or more of the following: the node interface configuration is broken (the COE does not set up eth1 at all), and therefore any bridges on it are hosed for access; something is broken in the Quantum services; and/or a route is needed (and should be documented) to allow access from the qrouter/instances to the meta_ip service.
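One way to test the missing-route theory is to inspect and, for testing only, extend the routing table inside the qrouter namespace. A sketch follows; the external gateway address 192.168.238.1 is an assumption about the external subnet, not taken from the outputs above:

```shell
# Does the qrouter namespace have any route toward 10.121.13.0/24,
# the network where the meta_ip service (10.121.13.50) lives?
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 ip route

# If not, a test route via the assumed external gateway could be added:
ip netns exec qrouter-57891ecb-2a3c-4d7a-8649-ce87ddd18985 \
    ip route add 10.121.13.0/24 via 192.168.238.1
```

If pings to 10.121.13.50 succeed after this, the fix is a routing/documentation issue rather than a broken Quantum service.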

Tags: 2012.2.2
Changed in openstack-cisco:
importance: Undecided → Critical
Changed in openstack-cisco:
status: New → Confirmed
assignee: nobody → Edgar Magana (emagana)
Changed in openstack-cisco:
assignee: Edgar Magana (emagana) → Daneyon Hansen (danehans)
Revision history for this message
Chris Ricker (chris-ricker) wrote :

Remaining need is to document the configuration of the upstream router
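For the record, the upstream-router piece amounts to a static route pointing the tenant subnet at the Quantum router's external gateway port. A sketch for a Linux upstream router (addresses come from the outputs above; the choice of a Linux router is my assumption, and the equivalent on other platforms is a static route with the same next hop):

```shell
# On the upstream router: send traffic for the tenant subnet to the
# Quantum router's external gateway port (192.168.238.3, from port-list).
ip route add 10.10.10.0/24 via 192.168.238.3
```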

Changed in openstack-cisco:
status: Confirmed → In Progress
Revision history for this message
Mark T. Voelker (mvoelker) wrote :

> Remaining need is to document the configuration of the upstream router

And say what pulls fixed the problem. =)

Revision history for this message
Robert Starmer (starmer) wrote :
Changed in openstack-cisco:
status: In Progress → Fix Committed
Revision history for this message
Robert Starmer (starmer) wrote :

This has been validated with the current manifests (as of 23-010-2012, pull 31).

An alternate fix is in investigation, but doesn't negate the current solution (with external router).

Changed in openstack-cisco:
milestone: none → 2012.2.2
Changed in openstack-cisco:
milestone: 2012.2.2 → none