Deploy of HA cluster with Cinder, Neutron and network template fails

Bug #1605845 reported by ElenaRossokhina
Affects             Status    Importance  Assigned to        Milestone
Fuel for OpenStack  Invalid   High        Oleksiy Molchanov
Mitaka              Invalid   High        Oleksiy Molchanov
Newton              Invalid   High        Oleksiy Molchanov

Bug Description

Detailed bug description:
Found on CI: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.network_templates/3/testReport/(root)/network_config_consistency_on_reboot/
9.1 snapshot #36
Steps to reproduce:
Deploy HA environment with Cinder, Neutron and network template
        Scenario:
            1. Revert snapshot with 5 slaves
            2. Create cluster (HA) with Neutron VLAN
            3. Add 3 controller and 1 compute + cinder nodes
            4. Upload 'default_ovs' network template
            5. Create custom network groups basing
               on template endpoints assignments
            6. Run network verification
            7. Deploy cluster and run basic health checks
            8. Run network verification
            9. Check L3 network configuration on slaves
            10. Check that services are listening on their networks only
            11. Reboot a node
            12. Run network verification
            13. Check L3 network configuration on slaves
            14. Check that services are listening on their networks only
            15. Run OSTF
Expected results:
Cluster is deployed, all checks passed
Actual result:
Deployment has failed. All nodes are finished. Failed tasks: Task[openstack-haproxy-aodh/1], Task[netconfig/3], Task[openstack-haproxy-keystone/2], Task[openstack-haproxy-keystone/5] Stopping the deployment process!
 id 5
 uuid 7977839c-cb0c-4e86-bb76-67e7973ef5bd
2016-07-22 06:47:16,697 - ERROR __init__.py:66 -- assert_task_success raised: AssertionError("Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. All nodes are finished. Failed tasks: Task[openstack-haproxy-aodh/1], Task[netconfig/3], Task[openstack-haproxy-keystone/2], Task[openstack-haproxy-keystone/5] Stopping the deployment process!'",)
Traceback: Traceback (most recent call last):
  File "/home/jenkins/workspace/9.x.system_test.ubuntu.network_templates/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.x.system_test.ubuntu.network_templates/fuelweb_test/models/fuel_web_client.py", line 327, in assert_task_success
    task["name"], task['status'], 'ready', _message(task)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. All nodes are finished. Failed tasks: Task[openstack-haproxy-aodh/1], Task[netconfig/3], Task[openstack-haproxy-keystone/2], Task[openstack-haproxy-keystone/5] Stopping the deployment process!'
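The failure above boils down to an equality assertion on the deployment task's status: the test expects 'ready' and gets 'error'. A minimal sketch of that pattern (a simplified stand-in for illustration, not the actual fuelweb_test implementation):

```python
# Simplified sketch of the status check seen in the traceback above.
# assert_task_success and the task dict shape are illustrative only.

def assert_task_success(task):
    """Raise AssertionError unless the task reached the 'ready' status."""
    if task["status"] != "ready":
        raise AssertionError(
            "Task '{0}' has incorrect status. {1} != ready, '{2}'".format(
                task["name"], task["status"], task.get("message", "")))

failed = {"name": "deploy", "status": "error",
          "message": "Deployment has failed."}
try:
    assert_task_success(failed)
except AssertionError as exc:
    print(exc)
```

Running this prints the same "error != ready" style message that the CI job recorded.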

Diagnostic snapshot is available at https://drive.google.com/open?id=0B2ag_Bf-ShtTWmtrbG13SDJybUU

Changed in fuel:
assignee: nobody → l23network (l23network)
milestone: none → 9.1
Changed in fuel:
assignee: l23network (l23network) → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Alex Schultz (alex-schultz) wrote :

Tasks marked as failed because:

2016-07-22 06:47:09 WARNING [30060] Puppet agent 3 didn't respond within the allotted time
2016-07-22 06:47:09 WARNING [30060] Puppet agent 2 didn't respond within the allotted time
2016-07-22 06:47:09 WARNING [30060] Puppet agent 5 didn't respond within the allotted time
2016-07-22 06:47:09 WARNING [30060] Puppet agent 1 didn't respond within the allotted time
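So the tasks were not marked failed by puppet itself; the orchestrator gave up on agents that stopped reporting within the allotted time. The general mechanism can be sketched as a simple last-report timeout check (a hypothetical simplification; the agent ids, timestamps, and timeout value are illustrative, not Fuel's actual code or settings):

```python
# Hypothetical sketch: flag agents whose last report is older than the
# allotted time, the way the warnings above flag agents 1, 2, 3 and 5.
ALLOTTED_TIME = 300  # seconds; illustrative value, not Fuel's real setting


def unresponsive_agents(last_seen, now, allotted=ALLOTTED_TIME):
    """Return sorted ids of agents whose last report is older than `allotted`."""
    return sorted(aid for aid, ts in last_seen.items() if now - ts > allotted)


# Example: agents 2 and 5 stopped reporting long before `now`.
reports = {1: 1000.0, 2: 400.0, 3: 950.0, 5: 100.0}
print(unresponsive_agents(reports, now=1010.0))  # -> [2, 5]
```

Any task still running on a flagged agent would then be marked as failed, which matches the symptom described in the next comment: tasks errored with no corresponding error in the puppet logs.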

tags: added: swarm-fail
tags: added: area-library
Revision history for this message
Alex Schultz (alex-schultz) wrote :

I took a look at some of the newer failures, and those also seem to be related to some sort of resource contention: tasks are getting marked as errored, but no actual errors show up in the puppet logs. For example, the glance-keystone task was marked as errored (puppet claims it completed in ~26 seconds) and was then restarted. The latest run was successful, so this one might have been a CI-related issue, as the first couple of runs did not have the proper package updates applied. See https://review.fuel-infra.org/#/c/23680/

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Oleksiy Molchanov (omolchanov)
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Actually, node-3 went offline during deployment; that was the root cause. We need an environment to investigate, but the last 2 test runs are green. So I am marking this as Incomplete; please reopen as soon as it fails again.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Roman Prykhodchenko (romcheg) wrote :

Marking as invalid since there was no update for this incomplete bug for longer than one month.
