'nova-compute' service is down right after deployment

Bug #1421114 reported by Artem Panchenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Confirmed
Medium
Fuel Library (Deprecated)

Bug Description

Fuel 6.1 iso # 114, version info: http://paste.openstack.org/show/171864/

Sometimes Health check 'Check that required services are running' fails, because 'nova-compute' state is down. Here is an example of OSTF logs:

http://paste.openstack.org/show/171875/

Failed system test on CI:

http://jenkins-product.srt.mirantis.net:8080/job/6.1.system_test.centos.bonding_ha_one_controller/10/testReport/(root)/deploy_bonding_balance_slb/deploy_bonding_balance_slb/?

Steps to reproduce:

1. Create new cluster (CentOS, HA, NeutronVlan)
2. Add 1 controller and 1 compute nodes
3. Combine 4 network interfaces to bond (balance-slb) and assign all networks (except 'admin/pxe') to it. All assigned networks (except 'public') should have vlan tag.
4. Deploy changes.
5. Run OSTF right after the deployment is completed.

Expected result:

 - all checks are passed

Actual result:

 - 'Check that required services are running' failed

After some time (in my case after I reverted devops environment from snapshot) health check begins to pass. Diagnostic snapshot is attached.

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The OSTF test behavior looks correct, the issue with 'flipping' status should be investigated. Could be related to Oslo.messaging heartbeats issue.

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
assignee: Fuel Python Team (fuel-python) → Fuel Library Team (fuel-library)
status: New → Confirmed
summary: - OSTF reports that 'nova-compute' service is down right after deployment
+ 'nova-compute' service is down right after deployment
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

@Bogdan,

I missed it when investigated OSTF logs first time, but now I've found that time on compute node was out of sync:

Binary Host Zone Status State Updated_At
nova-consoleauth node-1.test.domain.local internal enabled :-) 2015-02-12 04:44:58
nova-scheduler node-1.test.domain.local internal enabled :-) 2015-02-12 04:44:50
nova-conductor node-1.test.domain.local internal enabled :-) 2015-02-12 04:44:56
nova-cert node-1.test.domain.local internal enabled :-) 2015-02-12 04:44:49
nova-compute node-2.test.domain.local nova enabled XXX 2015-02-12 04:43:58

It seems that time on slaves was not synchronized before deployment (on first boot after provisioning).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.