[nailgun] Node not returned to online state after graceful shutdown and power on back

Bug #1512370 reported by Vladimir Khlyunev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Stanislav Makar
Milestone: (none listed)

Bug Description

ISO #105 Liberty

System test failed - https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.thread_3/33/console

Test steps:
1. Create cluster in ha mode with 1 controller
2. Add 1 node with controller and ceph OSD roles
3. Add 1 node with compute role
4. Add 2 nodes with cinder and ceph OSD roles
5. Deploy the cluster
6. Warm restart (run '/sbin/shutdown -Ph now' on the slave, wait until the node is marked as offline, then power on the VM) < failed here
7. Check ceph status

The slave was still marked as offline after 15 min (Ubuntu had booted and was reachable via VNC and via SSH).
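The wait in step 6 (node goes offline, then should come back online within a timeout) can be sketched as a generic polling helper. This is only an illustration, not the code used by the system test: `wait_until` and the `node_is_online` callable mentioned in the comment are hypothetical names.

```python
import time


def wait_until(predicate, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true in time, False otherwise.
    `clock` and `sleep` are injectable so the helper can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage: node_is_online would have to query the node
# status reported by the master node; it is not defined here.
# came_back = wait_until(lambda: node_is_online(node_id), timeout=15 * 60,
#                        interval=10)
```

With a helper like this, the failure above corresponds to the predicate never becoming true within the 15 minute window.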

Snapshot https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.thread_3/33/artifact/logs/fail_error_ceph_ha_one_controller_with_cinder_restart-fuel-snapshot-2015-11-02_01-32-18.tar.xz

description: updated
Dmitry Klenov (dklenov)
Changed in fuel:
status: New → Confirmed
assignee: Fuel Python Team (fuel-python) → Arthur Svechnikov (asvechnikov)
Arthur Svechnikov (asvechnikov) wrote :

There seems to be a problem on the library side. The nodes are offline because all bridge interfaces except br-storage are down. The deployment network information sets the provider to lnx [0], but the nodes' network scripts (for br-fw-admin, br-mgmt, etc.) use the ovs provider type.

[0] Excerpt from astute.log: https://paste.mirantis.net/show/1364/
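To check which bridges are down on a booted slave, one option is to read the `operstate` files that Linux exposes under /sys/class/net. The sketch below is an assumption about how one might verify the observation, not what was actually run; the function name `bridges_with_state` and the "br-" prefix filter are chosen here for illustration.

```python
import os


def bridges_with_state(state="down", sysfs_net="/sys/class/net", prefix="br-"):
    """Return sorted interface names starting with `prefix` whose
    operstate file contains exactly `state`.

    `sysfs_net` is a parameter so the function can be pointed at a
    fake directory tree in tests.
    """
    result = []
    for name in sorted(os.listdir(sysfs_net)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(sysfs_net, name, "operstate")
        try:
            with open(path) as f:
                current = f.read().strip()
        except OSError:
            # Interface vanished or has no operstate file; skip it.
            continue
        if current == state:
            result.append(name)
    return result


if __name__ == "__main__":
    print(bridges_with_state())
```

Note that an interface missing entirely from /sys/class/net is not reported by this check, so it complements rather than replaces looking at the interface list.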

Changed in fuel:
assignee: Arthur Svechnikov (asvechnikov) → slava valyavskiy (slava-val-al)
slava valyavskiy (slava-val-al) wrote :

The test passed successfully with the same ISO on another slave (cz7130.bud.mirantis.net):
http://jenkins-product.srt.mirantis.net:8080/view/cgmos_16/job/cgmos_7.0.custom_system_test/125/

Changed in fuel:
assignee: slava valyavskiy (slava-val-al) → Fuel QA Team (fuel-qa)
Dmitry Pyzhov (dpyzhov)
tags: added: area-qa
Tatyanka (tatyana-leontovich) wrote :

This looks like an intermittent issue: the same problem happened on BVT, for example here: https://product-ci.infra.mirantis.net/job/8.0.ubuntu.smoke_neutron/118/testReport/junit/%28root%29/deploy_neutron_tun/deploy_neutron_tun/. In the logs we can see that some nodes are marked as offline. After a revert, for these logs we can see the following data in nailgun-agent (see attached image), while at the same time everything works fine on the controller.

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Python Team (fuel-python)
tags: added: area-python
removed: area-qa
Tatyanka (tatyana-leontovich) wrote :

Will update the snapshot soon.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Alexander Kislitsky (akislitsky)
Alexander Kislitsky (akislitsky) wrote :

After reboot there is no br-fw-admin network interface.
The bridge configuration seems to be wrong. In /etc/network/interfaces.d/ifcfg-br-fw-admin:

ovs_type: OVSIntPort

is used for the bridge.

I guess we should use:

ovs_type: OVSBridge
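A quick way to spot this pattern across config files is to extract the `ovs_type` value and flag bridge-named interfaces that are declared as `OVSIntPort`. This is a rough sketch built on the guess above: whether `OVSIntPort` is really wrong here is unconfirmed, the function names are hypothetical, and the parser accepts both `ovs_type VALUE` and `ovs_type: VALUE` because the exact on-disk syntax is not shown in this report.

```python
import re

# Matches "ovs_type VALUE" or "ovs_type: VALUE" at the start of a line,
# allowing leading indentation. The on-disk syntax is an assumption.
_OVS_TYPE_RE = re.compile(r"^[ \t]*ovs_type[ \t]*[: \t][ \t]*(\S+)",
                          re.MULTILINE)


def ovs_type_of(config_text):
    """Return the first ovs_type value found in config_text, or None."""
    match = _OVS_TYPE_RE.search(config_text)
    return match.group(1) if match else None


def looks_like_misconfigured_bridge(iface_name, config_text):
    """Flag interfaces named like bridges ("br-...") that are declared
    as OVSIntPort, which is the pattern suspected in this comment."""
    return (iface_name.startswith("br-")
            and ovs_type_of(config_text) == "OVSIntPort")
```

For example, `looks_like_misconfigured_bridge("br-fw-admin", text)` could be called with `text` read from /etc/network/interfaces.d/ifcfg-br-fw-admin on the affected slave.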

Changed in fuel:
assignee: Alexander Kislitsky (akislitsky) → Fuel Library Team (fuel-library)
tags: added: area-library
removed: area-python
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
Dmitry Pyzhov (dpyzhov)
tags: added: regression-8.0
Stanislav Makar (smakar)
Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Stanislav Makar (smakar)