Failed to establish authenticated ssh connection to cirros - error: [Errno 113] No route to host

Bug #1370496 reported by Matt Riedemann
This bug affects 4 people
Affects                   Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Augustina Ragwitz
neutron                   Expired       Undecided   Unassigned

Bug Description

Saw this in the gate today. I think it's separate from bug 1349617; at least the error is different and the logstash hits are different, though the root cause might be the same.

This is in both nova-network and neutron jobs.

http://logs.openstack.org/01/116001/3/gate/gate-grenade-dsvm-partial-ncpu/3dce5d7/logs/tempest.txt.gz#_2014-09-16_22_32_14_888

2014-09-16 22:32:14.888 6636 ERROR tempest.common.ssh [-] Failed to establish authenticated ssh connection to cirros@172.24.4.4 after 15 attempts
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh Traceback (most recent call last):
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh File "tempest/common/ssh.py", line 76, in _get_ssh_connection
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh timeout=self.channel_timeout, pkey=self.pkey)
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 236, in connect
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh retry_on_signal(lambda: sock.connect(addr))
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh File "/usr/local/lib/python2.7/dist-packages/paramiko/util.py", line 278, in retry_on_signal
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh return function()
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 236, in <lambda>
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh retry_on_signal(lambda: sock.connect(addr))
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh File "/usr/lib/python2.7/socket.py", line 224, in meth
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh return getattr(self._sock,name)(*args)
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh error: [Errno 113] No route to host
2014-09-16 22:32:14.888 6636 TRACE tempest.common.ssh
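For reference, the failure above comes out of a retry loop: the test repeatedly attempts a TCP connection to the guest and only raises once every attempt has failed. A minimal sketch of that pattern (illustrative only; the attempt count, timeout and helper name are assumptions, not tempest's actual code):

import socket
import time

import paramiko


def connect_with_retries(host, username, pkey, attempts=15, timeout=10):
    # Illustrative retry-until-connected loop; not tempest's real implementation.
    for attempt in range(1, attempts + 1):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            # paramiko eventually calls socket.connect(), which is where
            # [Errno 113] No route to host (EHOSTUNREACH) is raised.
            client.connect(host, username=username, pkey=pkey,
                           timeout=timeout)
            return client
        except (socket.error, paramiko.SSHException):
            client.close()
            if attempt == attempts:
                raise  # surfaces as the traceback above
            time.sleep(1)

Errno 113 (EHOSTUNREACH) means the test node had no route to 172.24.4.4 at connect time, so the failure is below the SSH layer rather than an authentication problem.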

message:"_get_ssh_connection" AND message:"error: [Errno 113] No route to host" AND tags:"tempest.txt"

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiX2dldF9zc2hfY29ubmVjdGlvblwiIEFORCBtZXNzYWdlOlwiZXJyb3I6IFtFcnJubyAxMTNdIE5vIHJvdXRlIHRvIGhvc3RcIiBBTkQgdGFnczpcInRlbXBlc3QudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNC0wOS0xMFQxMjo0ODowMyswMDowMCIsInRvIjoiMjAxNC0wOS0xN1QxMjo0ODowMyswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDEwOTU4MTU3MTY3LCJtb2RlIjoiIiwiYW5hbHl6ZV9maWVsZCI6IiJ9

63 hits in 10 days, in both the check and gate queues, all failures.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Sean Dague (sdague) wrote :

This needs deeper debugging before it can usefully be addressed.

Changed in nova:
status: New → Incomplete
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

The console log is garbled because it's all printed on one line (it's printed nicely for other scenario tests, but I guess this depends on the OfficialClientTest vs ScenarioTest base class, so it's probably not worth addressing since the former is being transitioned to the latter).

Anyway, the VM never boots correctly - at least in all the failure instances I've looked at.
These are the last lines of a console log:

Freeing unused kernel memory: 1188k freed
info: initramfs: up at 2.44

The failure has been observed at both the 2nd and 3rd steps, which means it does not depend on the snapshot process but rather on boot from volume. I have not seen the failure at the 1st step, but that does not automatically mean it can't happen there too.

This could be just a red herring, since this message is very common: http://logs.openstack.org/89/119589/24/check/check-tempest-dsvm-neutron-full/159af17/logs/screen-n-cpu.txt.gz#_2014-09-10_21_43_18_577

It seems, however, that the instance was stopped early, and this could explain the failed connection.

It looks like the volume does not get correctly attached to the instance; however, no anomaly is reported in the cinder logs:
http://logs.openstack.org/89/119589/24/check/check-tempest-dsvm-neutron-full/159af17/logs/screen-c-vol.txt.gz#_2014-09-10_21_42_44_822

Finally, the libvirt logs appear to report no errors. Perhaps something could be found in the qemu logs, which need to be searched for the Nova instance UUID.
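If it helps, a quick way to pull those entries is to grep the qemu logs on the compute node for the instance UUID. A rough sketch; the log path is the usual libvirt default and the UUID is a placeholder, both depend on the environment:

import glob

# Placeholder; substitute the failing instance's UUID from the nova logs.
INSTANCE_UUID = "00000000-0000-0000-0000-000000000000"

# /var/log/libvirt/qemu/ is the typical libvirt log location on devstack nodes.
for path in glob.glob("/var/log/libvirt/qemu/*.log"):
    with open(path) as log:
        for line in log:
            if INSTANCE_UUID in line:
                print("%s: %s" % (path, line.rstrip()))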

Revision history for this message
Sean Dague (sdague) wrote :

Currently one of the top gate hits

Changed in nova:
importance: High → Critical
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Changing to incomplete for neutron per Salvatore's analysis

Changed in neutron:
status: New → Incomplete
Changed in neutron:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
tags: added: juno-rc-potential
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Unfortunately I can't find any hint about the root cause of this bug, so I'll have to spend more time trying to reproduce it locally.
To me it seems like the VM is not able to boot from volume in some circumstances; it could be something strictly related to cirros.

Thierry Carrez (ttx)
tags: added: juno-backport-potential
removed: juno-rc-potential
Changed in neutron:
assignee: Salvatore Orlando (salvatore-orlando) → nobody
Revision history for this message
Nell Jerram (neil-jerram) wrote :

I'm seeing a failure that elastic-recheck thinks maps to this bug: http://logs.openstack.org/14/146914/1/check/check-tempest-dsvm-neutron-full/e1fee15/, for the very small Nova code change at https://review.openstack.org/#/c/146914/. If that's correct, it's a bit disheartening that this bug hasn't seen any change since last October.

My interpretation of the logs at http://logs.openstack.org/14/146914/1/check/check-tempest-dsvm-neutron-full/e1fee15/ is that the instance isn't getting its IPv4 address; the instance's console output includes:

Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
[...]
=== network info ===
if-info: lo,up,127.0.0.1,8,::1
if-info: eth0,up,,8,fe80::f816:3eff:fefb:eaa7

So, I wonder if anyone can throw light on two questions:

1. Assuming that DHCP _should_ be working in the failing test, and that the instance _should_ be getting its IPv4 address from DHCP, can anyone see why that is now not working?

2. Is elastic-recheck correct to map the test failure that I'm seeing to this bug?
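One way to tell this "no DHCP lease" symptom apart from the many other causes that end in an SSH failure is to scan the guest console output for the markers above. A rough sketch, where the strings checked are simply the ones from the console log quoted here, not an established elastic-recheck fingerprint:

import re


def missing_ipv4_lease(console_output):
    # True if udhcpc gave up and eth0 ended up with no IPv4 address.
    no_lease = "No lease, failing" in console_output
    # "if-info: eth0,up,,..." has an empty address field when eth0 got no
    # IPv4 address (only the link-local IPv6 address is listed).
    eth0_no_ipv4 = re.search(r"^if-info: eth0,up,,", console_output,
                             re.MULTILINE) is not None
    return no_lease and eth0_no_ipv4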

Revision history for this message
Robert Collins (lifeless) wrote :

@Neil: Boot from volume has worked if the instance console got that far. Failing to ssh in is a fairly blanket failure; lots of different actual causes end up with the same symptom.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Changed in nova:
importance: Critical → Undecided
Alan Pevec (apevec)
tags: removed: juno-backport-potential
Revision history for this message
Augustina Ragwitz (auggy) wrote :

This may have been fixed by: https://review.openstack.org/#/c/273042/

I checked the above logstash query and got zero results for the past 30 days.

Closing this as Fix Released. If this issue pops back up, please open a new bug.

Changed in nova:
status: Incomplete → Fix Released
assignee: nobody → Augustina Ragwitz (auggy)