VM's cannot have ssh key injected because can't access metadata service

Bug #1702257 reported by Arx Cruz on 2017-07-04
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Critical
Unassigned

Bug Description

The periodic tempest job is failing in the following tests:

tempest.scenario.test_network_v6.TestGettingAddress.test_dhcp6_stateless_from_os
tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_dhcp6_stateless_from_os
tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_multi_prefix_dhcpv6_stateless
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_multi_prefix_slaac
tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac

The reason is that the VM's cannot reach the service metadata at 169.254.169.254 and so don't get the ssh key injected, which is used to access the VM, as you can see in the nova console-log below

$ nova console-log edd925b0-c89f-40b9-bc77-15cbf2c065cc
info: initramfs: up at 3.26
NOCHANGE: partition 1 is size 64260. it cannot be grown
info: initramfs loading root from /dev/vda1
info: /etc/init.d/rc.sysinit: up at 4.46
info: container: none
Starting logging: OK
modprobe: module virtio_blk not found in modules.dep
modprobe: module virtio_net not found in modules.dep
WARN: /etc/rc3.d/S10-load-modules failed
Initializing random number generator... done.
Starting acpid: OK
cirros-ds 'local' up at 6.78
no results found for mode=local. up 7.35. searched: nocloud configdrive ec2
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending select for 10.100.0.5...
Lease of 10.100.0.5 obtained, lease time 86400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "10.100.0.1"
cirros-ds 'net' up at 8.87
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 9.12. request failed
failed 2/20: up 12.53. request failed
failed 3/20: up 15.38. request failed
failed 4/20: up 18.20. request failed
failed 5/20: up 20.95. request failed
failed 6/20: up 24.38. request failed
failed 7/20: up 27.11. request failed
failed 8/20: up 29.96. request failed
failed 9/20: up 32.69. request failed
failed 10/20: up 35.98. request failed
failed 11/20: up 38.53. request failed
failed 12/20: up 41.74. request failed
failed 13/20: up 44.32. request failed
failed 14/20: up 47.37. request failed
failed 15/20: up 49.91. request failed
failed 16/20: up 53.04. request failed
failed 17/20: up 55.78. request failed
failed 18/20: up 58.82. request failed

The tests consist in create a vm using a keypair and try to access the vm with that keypair. If you take a look in the test traceback the test can reach the vm, but cannot access with an authentication error

[stack@undercloud tempest]$ python -m testtools.run tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac
Tests running...
======================================================================
ERROR: tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac[compute,id-dec222b1-180c-4098-b8c5-cc1b8342d611,network,slow]
----------------------------------------------------------------------

...

2017-07-04 07:56:55,117 10972 INFO [tempest.lib.common.ssh] Creating ssh connection to '192.168.24.153:22' as 'cirros' with public key authentication
2017-07-04 07:57:01,134 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 1. Retry after 2 seconds.
2017-07-04 07:57:04,140 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 2. Retry after 3 seconds.
2017-07-04 07:57:07,670 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 3. Retry after 4 seconds.
2017-07-04 07:57:12,183 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 4. Retry after 5 seconds.
2017-07-04 07:57:17,696 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 5. Retry after 6 seconds.
2017-07-04 07:57:24,210 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 6. Retry after 7 seconds.
2017-07-04 07:57:31,721 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 7. Retry after 8 seconds.
2017-07-04 07:57:40,238 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 8. Retry after 9 seconds.
2017-07-04 07:57:49,752 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 9. Retry after 10 seconds.
2017-07-04 07:58:00,267 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 ([Errno None] Unable to connect to port 22 on 192.168.24.153). Number attempts: 10. Retry after 11 seconds.
2017-07-04 07:58:11,839 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:58:12,019 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:58:12,146 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 11. Retry after 12 seconds.
2017-07-04 07:58:24,670 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:58:24,815 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:58:24,938 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 12. Retry after 13 seconds.
2017-07-04 07:58:38,463 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:58:38,598 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:58:38,719 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 13. Retry after 14 seconds.
2017-07-04 07:58:53,251 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:58:53,381 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:58:53,503 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 14. Retry after 15 seconds.
2017-07-04 07:59:09,032 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:59:09,134 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:59:09,256 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 15. Retry after 16 seconds.
2017-07-04 07:59:25,784 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:59:25,914 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:59:26,036 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 16. Retry after 17 seconds.
2017-07-04 07:59:43,571 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 07:59:43,700 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 07:59:43,821 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 17. Retry after 18 seconds.
2017-07-04 08:00:02,358 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:00:02,489 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:00:02,612 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 18. Retry after 19 seconds.
2017-07-04 08:00:22,151 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:00:22,283 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:00:22,404 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 19. Retry after 20 seconds.
2017-07-04 08:00:42,959 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:00:43,103 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:00:43,220 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 20. Retry after 21 seconds.
2017-07-04 08:01:04,753 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:01:04,894 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:01:05,016 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 21. Retry after 22 seconds.
2017-07-04 08:01:27,555 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:01:27,695 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:01:27,815 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 22. Retry after 23 seconds.
2017-07-04 08:01:51,354 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:01:51,484 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:01:51,605 10972 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 (Authentication failed.). Number attempts: 23. Retry after 24 seconds.
2017-07-04 08:02:16,149 10972 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2017-07-04 08:02:16,278 10972 INFO [paramiko.transport] Authentication (publickey) failed.
2017-07-04 08:02:16,401 10972 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.153 after 23 attempts

...

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 96, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 234, in test_multi_prefix_slaac
    self._prepare_and_test(address6_mode='slaac', n_subnets6=2)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 166, in _prepare_and_test
    sshv4_1, ips_from_api_1, sid1 = self.prepare_server(networks=net_list)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 134, in prepare_server
    username=username, server=srv)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 342, in get_remote_client
    linux_client.validate_authentication()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 57, in wrapper
    six.reraise(*original_exception)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 30, in wrapper
    return function(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 113, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 207, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.153 via SSH timed out.
User: cirros, Password: None

It's possible to access the vm through username cirros and password cubswin:), but the test was designed to access the vm through ssh keys.

(overcloud) [stack@undercloud ~]$ nova list --all-tenant
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+---------------------------------------------------------------------------------------------------------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+---------------------------------------------------------------------------------------------------------------------+
| edd925b0-c89f-40b9-bc77-15cbf2c065cc | tempest-TestGettingAddress-server-1670975734 | 3f4003b3f3964b288d5b43ceb141b2f7 | ACTIVE | - | Running | tempest-network-smoke--180109034=2003::f816:3eff:fed7:68a7, 10.100.0.5, 2003::1:f816:3eff:fed7:68a7, 192.168.24.153 |
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+---------------------------------------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud ~]$ ssh cirros@192.168.24.153
The authenticity of host '192.168.24.153 (192.168.24.153)' can't be established.
RSA key fingerprint is 3f:1c:28:97:30:2e:65:23:97:ef:a4:b2:6e:6d:aa:a9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.24.153' (RSA) to the list of known hosts.
cirros@192.168.24.153's password:
Permission denied, please try again.
cirros@192.168.24.153's password:
$ ls
$ ls -lah
total 5
drwxr-xr-x 2 cirros cirros 1.0K Jul 4 09:00 .
drwxrwxr-x 4 root root 1.0K May 7 2015 ..
-rw------- 1 cirros cirros 11 Jul 4 09:00 .ash_history
-rwxr-xr-x 1 cirros cirros 43 May 7 2015 .profile
-rwxr-xr-x 1 cirros cirros 66 May 7 2015 .shrc

Notice that one of the tests failing is the test_minimum_basic_scenario and this is a very important test, so please increase the importance of this bug.

Here's the route from the undercloud, and the controller in overcloud

[stack@undercloud tempest]$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.23.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.24.0 0.0.0.0 255.255.255.0 U 0 0 0 br-ctlplane

[heat-admin@overcloud-controller-0 ~]$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 0 0 0 br-ex
169.254.169.254 gateway 255.255.255.255 UGH 0 0 0 br-ex
192.168.24.0 0.0.0.0 255.255.255.0 U 0 0 0 br-ex

Arx Cruz (arxcruz) on 2017-07-04
description: updated
wes hayutin (weshayutin) on 2017-07-04
Changed in tripleo:
importance: Medium → High
milestone: none → pike-3
status: New → Triaged
tags: added: alert ci promotion-blocker
Changed in tripleo:
importance: High → Critical
Ben Nemec (bnemec) wrote :

Closing per the last comment.

Changed in tripleo:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers