SSH Auth fails in AdvancedNetworkOps scenario

Bug #1360011 reported by Salvatore Orlando
This bug affects 1 person
Affects                    Status   Importance   Assigned to         Milestone
OpenStack Compute (nova)   New      Undecided    Unassigned
neutron                    New      High         Salvatore Orlando

Bug Description

Affects all neutron full jobs and check-grenade-dsvm-partial-ncpu.
The latter runs nova network.

In the past 7 days:
105 hits (12 in gate)
grenade: 30
neutron-standard: 1
neutron-full: 74

In the past 36 hours:
72 hits (8 in gate)
grenade: 0
neutron-standard: 1
neutron-full: 71

Something has apparently fixed the issue in the grenade test but broken the neutron tests.
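The shift between jobs can be checked by subtracting the two sets of counts. This is a minimal sketch using only the numbers quoted above; it assumes the 36-hour window lies entirely inside the 7-day window, and the function name is made up for illustration.

```python
# Hit counts copied from the report. The subtraction assumes the 36-hour
# window is fully contained in the 7-day window.
last_7_days = {"grenade": 30, "neutron-standard": 1, "neutron-full": 74}
last_36_hours = {"grenade": 0, "neutron-standard": 1, "neutron-full": 71}


def earlier_hits(total, recent):
    """Hits per job that fall in the 7-day window but before the last 36 hours."""
    return {job: total[job] - recent[job] for job in total}


earlier = earlier_hits(last_7_days, last_36_hours)
print(earlier)  # {'grenade': 30, 'neutron-standard': 0, 'neutron-full': 3}
print(sum(last_7_days.values()), sum(last_36_hours.values()))  # 105 72
```

Under that assumption, all 30 grenade hits predate the last 36 hours, while 71 of the 74 neutron-full hits fall inside them.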

The logstash query (run against console output, as there is no clue in the logs) is available at [1].

The issue manifests as a failure to authenticate to the server (the SSH server does respond).
Paramiko then starts returning errors like [2] until the timeout expires.
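For readers unfamiliar with this failure pattern, the following is a rough sketch of a connect-and-retry loop of the kind that produces it: each attempt raises, the caller sleeps, and the last error surfaces once the deadline passes. It is not tempest's actual implementation. `wait_for_ssh` and `paramiko_connect` are hypothetical names, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_ssh(connect, timeout=60.0, interval=1.0,
                 retry_on=(socket.error,), clock=time.monotonic,
                 sleep=time.sleep):
    """Call connect() until it succeeds or `timeout` seconds have passed.

    Returns (result, attempts). Re-raises the last error after the deadline.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except retry_on:
            if clock() >= deadline:
                raise
            sleep(interval)


def paramiko_connect(host, username, pkey):
    """Hypothetical helper: build a connect() callable backed by paramiko."""
    import paramiko  # imported lazily so the retry loop has no hard dependency

    def connect():
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=username, pkey=pkey, timeout=10)
        return client

    return connect
```

With paramiko installed, passing `retry_on=(socket.error, paramiko.SSHException)` would make the loop retry on banner and authentication errors as well as on connection resets.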

[1] http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiVFJBQ0VcIiBBTkQgbWVzc2FnZTpcIlNTSEV4Y2VwdGlvbjogRXJyb3IgcmVhZGluZyBTU0ggcHJvdG9jb2wgYmFubmVyW0Vycm5vIDEwNF0gQ29ubmVjdGlvbiByZXNldCBieSBwZWVyXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNC0wOC0yMFQxMTo1NDoyMCswMDowMCIsInRvIjoiMjAxNC0wOC0yMVQyMzo1NDoyMCswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDA4NjY1MjkzODA2LCJtb2RlIjoiIiwiYW5hbHl6ZV9maWVsZCI6IiJ9
[2] http://logs.openstack.org/10/98010/5/gate/gate-tempest-dsvm-neutron-full/aca3f89/console.html#_2014-08-21_08_36_14_931
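The query behind [1] is not readable from the URL itself. The fragment after `#` appears to be base64-encoded JSON (this is an assumption based on its shape, not something the report states), so a sketch like the following should recover the search string and time range. The function name is hypothetical.

```python
import base64
import json
from urllib.parse import urlsplit


def decode_logstash_fragment(url):
    """Decode a URL fragment that is assumed to be base64-encoded JSON."""
    fragment = urlsplit(url).fragment
    # Restore any '=' padding that was stripped from the encoded string.
    padded = fragment + "=" * (-len(fragment) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))
```

Calling `decode_logstash_fragment()` on the full URL in [1] and reading the `"search"` key of the result would show the query text, if the assumption about the encoding holds.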

Salvatore Orlando (salvatore-orlando) wrote :

This bug is the same as https://bugs.launchpad.net/neutron/+bug/1265495, which was closed because it disappeared from the gate, although we never had confirmation that anything fixed it.

Salvatore Orlando (salvatore-orlando) wrote :

This is the only commit that might have something to do with the failure, IMHO: http://git.openstack.org/cgit/openstack/nova/commit/?id=41f6e4afc91a2454940abff947bf07973f229ea8

Matt Riedemann (mriedem) wrote :

This sounds like a duplicate of bug 1349617.

Salvatore Orlando (salvatore-orlando) wrote :

I can't say whether it is a duplicate: the other bug discusses little beyond the manifestation as an SSH protocol banner error.
For the bug I am investigating here, I see a 100% correlation with resize events. I'm not sure about the other one, but I would not be surprised if the root causes are different.

Salvatore Orlando (salvatore-orlando) wrote :

This failure occurs only in two tests:

test_server_connectivity_resize - after going to ACTIVE from VERIFY_RESIZE
test_server_connectivity_start_stop - after going to ACTIVE from SHUTOFF

The mysql job has 2.5 times more failures than the postgres job. This is probably not down to the DB backend, but to the fact that postgres jobs do not use config drive.

Recent changes in the shutoff process might be the cause of this failure. I am therefore adding nova to the affected projects.
