Several Tempest tests fail because SSH connections time out

Bug #1651077 reported by Sofiia Andriichenko
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: High
Assigned to: Sofiia Andriichenko

Bug Description

Configuration:
    ISO: 9.2 snapshot #638
Settings:
Compute - QEMU.
Network - Neutron with VLAN segmentation.
Storage Backends - LVM
Additional services - Install Ironic, Install Sahara

In the Settings->Compute tab, enable Nova quotas.
In the Settings->OpenStack Services tab, enable Install Ceilometer and Aodh.
In the Networks->Other tab, enable Neutron DVR.

Nodes: controller, compute, ironic, cinder, Telemetry - MongoDB

Trace test_server_connectivity_pause_unpause:
http://paste.openstack.org/show/592766/

Trace test_snapshot_pattern:
http://paste.openstack.org/show/592765/

Failed tests:
test_server_connectivity_pause_unpause[compute,id-2b2642db-6568-4b35-b812-eceed3fa20ce,network]
test_server_connectivity_resize[compute,id-719eb59d-2f42-4b66-b8b1-bb1254473967,network]
test_server_connectivity_stop_start[compute,id-61f1aa9a-1573-410e-9054-afa557cab021,network]
test_mtu_sized_frames[compute,id-b158ea55-472e-4086-8fa9-c64ac0c6c1d0,network]
test_network_basic_ops[compute,id-f323b3ba-82f8-4db7-8ea6-6a895869ec49,network,smoke]
test_port_security_macspoofing_port[compute,id-7c0bb1a2-d053-49a4-98f9-ca1a1d849f63,network]
test_subnet_details[compute,id-d8bb918e-e2df-48b2-97cd-b73c95450980,network]
test_multi_prefix_slaac[compute,id-dec222b1-180c-4098-b8c5-cc1b8342d611,network,slow]
test_slaac_from_os[compute,id-2c92df61-29f0-4eaa-bee3-7c65bef62a43,network,slow]
test_cross_tenant_traffic[compute,id-e79f879e-debb-440c-a7e4-efeda05b6848,network]
test_port_update_new_security_group[compute,id-f4d556d7-1526-42ad-bafb-6bebf48568f6,network]
test_shelve_volume_backed_instance[compute,id-c1b6318c-b9da-490b-9c67-9339b627271f,image,network,volume]
test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,smoke,volume]
test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,smoke,volume]
test_port_security_disable_security_group[compute,id-7c811dcc-263b-49a3-92d2-1b4d8405f50c,network]
test_server_basic_ops[compute,id-7fff3fb3-91d8-4fd0-bd7d-0204f1f180ba,network,smoke]
test_shelve_instance[compute,id-1164e700-0af0-4a4c-8792-35909a88743c,image,network]
test_snapshot_pattern[compute,id-608e604b-1d63-4a82-8e3e-91bc665c90b4,image,network]

snapshot: https://drive.google.com/a/mirantis.com/file/d/0BxPLDs6wcpbDajk2NXhxdkl1OEk/view?usp=sharing

Tags: area-neutron
Changed in mos:
milestone: none → 9.2
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Neutron (mos-neutron)
tags: added: area-neutron
Kevin Benton (kevinbenton) wrote :

I examined the floating IP status update failure in http://paste.openstack.org/show/592766/ . The floating IP was assigned to a port on node-3 at 2016-12-16 00:40:38. However, in the snapshot, node-3 doesn't seem to have any logs during that timeframe for the l3 agent.

Also, you can see these log entries in the server log indicating that the l3 agent was dead:

<164>Dec 16 00:40:08 node-1 neutron-server: 2016-12-16 00:40:08.962 1615 WARNING neutron.db.agents_db [req-b4c998a5-e260-43e7-bfa0-c3fdcde9a310 - - - - -] Agent healthcheck: found 1 dead agents out of 18:
                Type Last heartbeat host
            L3 agent 2016-12-16 00:32:30 node-3.test.domain.local

So, it seems the L3 agent wasn't running on that node.
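
As a rough illustration of the check described above, the following Python sketch pulls the dead-agent rows out of the healthcheck warning and measures how long the agent had been silent when the floating IP was assigned. The log excerpt and both timestamps are copied from this comment; the function name `dead_agents`, the regular expression, and the assumed table layout are illustrative only and are not part of Neutron or Tempest.

```python
import re
from datetime import datetime

# Excerpt copied from the neutron-server log quoted in this comment.
LOG = """\
<164>Dec 16 00:40:08 node-1 neutron-server: 2016-12-16 00:40:08.962 1615 WARNING neutron.db.agents_db [req-b4c998a5-e260-43e7-bfa0-c3fdcde9a310 - - - - -] Agent healthcheck: found 1 dead agents out of 18:
                Type Last heartbeat host
            L3 agent 2016-12-16 00:32:30 node-3.test.domain.local
"""

# Assumed row layout: "<agent type> <YYYY-MM-DD HH:MM:SS> <host>".
ROW = re.compile(
    r"^\s*(?P<type>.+?)\s+"
    r"(?P<beat>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s*$"
)


def dead_agents(log_text):
    """Return (agent_type, last_heartbeat, host) for each row that
    follows an 'Agent healthcheck' warning (hypothetical helper)."""
    agents = []
    in_table = False
    for line in log_text.splitlines():
        if "Agent healthcheck: found" in line:
            in_table = True  # the table starts on the next line
            continue
        if not in_table:
            continue
        match = ROW.match(line)
        if match:
            beat = datetime.strptime(match["beat"], "%Y-%m-%d %H:%M:%S")
            agents.append((match["type"], beat, match["host"]))
        elif "Last heartbeat" not in line:
            in_table = False  # any other line ends the table
    return agents


# Time at which the floating IP was assigned, as quoted above.
FLOATING_IP_ASSIGNED = datetime(2016, 12, 16, 0, 40, 38)

for agent_type, last_beat, host in dead_agents(LOG):
    gap = (FLOATING_IP_ASSIGNED - last_beat).total_seconds()
    print(f"{agent_type} on {host}: silent for {gap:.0f}s "
          "before the floating IP was assigned")
```

With the excerpt above, the L3 agent on node-3 had been silent for 488 seconds (about eight minutes) when the floating IP was assigned, which is consistent with the agent not running at that point.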

Oleg Bondarev (obondarev) wrote :

Indeed, the L3 agent on compute node-3 stopped at 00:32 and stayed dead until the end of the tests. This caused all VMs on that node to lose L3 connectivity, and hence some tests failed. We need to find the reason the L3 agent stopped.

How often does this reproduce? Can we get access to the env with the issue?

Changed in mos:
status: Confirmed → Incomplete
assignee: MOS Neutron (mos-neutron) → Sofiia Andriichenko (sandriichenko)
Sofiia Andriichenko (sandriichenko) wrote :

Cannot reproduce manually or on CI.

Changed in mos:
status: Incomplete → Invalid