[ostf] Can't boot instance after Compute reboot

Bug #1606551 reported by Andrey Lavrentyev
This bug affects 1 person
Affects             Status   Importance  Assigned to   Milestone
Fuel for OpenStack  Invalid  High        Fuel QA Team
Mitaka              Invalid  High        Fuel QA Team

Bug Description

Detailed bug description:
Can't boot instance after Compute reboot.
OSTF test failure: Create volume and boot instance from it (failure)

Details: http://paste.openstack.org/show/542046/

Swarm failures: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/7/testReport/%28root%29/deploy_cluster_with_dpdk/
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/4/testReport/(root)/deploy_cluster_with_dpdk/

Similar, probably related issues:
- https://bugs.launchpad.net/fuel/+bug/1575091
- https://bugs.launchpad.net/fuel/+bug/1575853
- https://bugs.launchpad.net/fuel/+bug/1580680

Steps to reproduce:
1. Create new environment with VLAN segmentation for Neutron
2. Set KVM as Hypervisor
3. Add controller and compute nodes
4. Configure HugePages for compute nodes
5. Configure private network in DPDK mode
6. Run network verification
7. Deploy environment
8. Run network verification
9. Run OSTF
10. Reboot compute
11. Run OSTF
12. Run instance on compute with DPDK and check its availability via floating IP

Expected results:
All OSTF tests pass

Actual result:
OSTF test 'Create volume and boot instance from it' failed

Description of the environment:
9.1 snapshot #52
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8460.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8460.noarch
 fuel_plugin_example_v4_hotpluggable-4.0-4.0.0-1.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-openstack-metadata-9.0.0-1.mos8743.noarch
 fuel-notify-9.0.0-1.mos8460.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8460.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8743.noarch
 fuel-library9.0-9.0.0-1.mos8460.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos936.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-nailgun-9.0.0-1.mos8743.noarch

FUEL_QA_COMMIT: dc43af1a4143da75e695cf0f7612f23845dee58c
MOS_CENTOS_OS_MIRROR_ID: os-2016-06-23-135731
MOS_CENTOS_PROPOSED_MIRROR_ID: proposed-2016-07-20-120320
MOS_CENTOS_UPDATES_MIRROR_ID: updates-2016-06-23-135916
MOS_CENTOS_SECURITY_MIRROR_ID: security-2016-06-23-140002
MOS_CENTOS_HOLDBACK_MIRROR_ID: holdback-2016-06-23-140047
MOS_CENTOS_HOTFIX_MIRROR_ID: hotfix-2016-07-18-162958
UBUNTU_MIRROR_ID: ubuntu-2016-07-24-170707
CENTOS_MIRROR_ID: centos-7.2.1511-2016-05-31-083834

Saved logs: https://drive.google.com/open?id=0B5HPBFb7K7gXUkg5aV95dWJxNkU

Tags: swarm-fail
summary: - [ostf] Can't boot instance after Nova reboot
+ [ostf] Can't boot instance after Compute reboot
description: updated
Changed in fuel:
assignee: nobody → Fuel QA Team (fuel-qa)
milestone: none → 9.1
Nastya Urlapova (aurlapova) wrote :

The issue is in the product: the nova logs contain the following traceback:
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 150, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
importance: Undecided → High
tags: added: swarm-fail
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.1 → 10.0
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → MOS Nova (mos-nova)
status: New → Confirmed
Roman Podoliaka (rpodolyaka) wrote :

I've checked the logs and see that scheduling failed because there were no compute nodes available:

http://paste.openstack.org/show/549211/

^ ComputeFilter failed, which basically means we haven't heard from the compute node (we only have one in this environment) for a while. Are you sure you are giving the node enough time to come back online after the reboot?

It looks like you are not:

http://paste.openstack.org/show/549213/

^ the nova-compute process just started when the scheduling request failed.

IMO, we should change the test case so that, after the node is rebooted, it waits until the corresponding nova-compute service is marked `up` again in the `nova service-list` output.
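
The wait suggested above could be sketched roughly as follows. This is only an illustration, not the actual fuel-qa code: `wait_for_compute_up` and the `fetch_services` callable are hypothetical names, and the dict keys `binary`, `host` and `state` are assumed to mirror the Binary, Host and State columns of `nova service-list`. How the service list is actually fetched (CLI or API client) is left to the caller.

```python
import time


def wait_for_compute_up(fetch_services, host, timeout=300.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until nova-compute on `host` is reported as up, or time out.

    fetch_services is a hypothetical callable returning a list of dicts
    with 'binary', 'host' and 'state' keys (an assumed shape, not a real
    client API). Returns True if the service came up, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        for svc in fetch_services():
            if (svc.get("binary") == "nova-compute"
                    and svc.get("host") == host
                    and svc.get("state") == "up"):
                return True
        # Give up only after at least one full check has been made.
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the test scenario, a call like this would sit between the "Reboot compute" and the second "Run OSTF" steps, and the test would fail early with a clear message if it returns False. The timeout and interval defaults here are arbitrary placeholders.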

Changed in fuel:
assignee: MOS Nova (mos-nova) → Andrey Lavrentyev (alavrentyev)
status: Confirmed → Incomplete
Changed in fuel:
assignee: Andrey Lavrentyev (alavrentyev) → Fuel QA Team (fuel-qa)
Nastya Urlapova (aurlapova) wrote :
Changed in fuel:
status: Incomplete → Invalid