VIPs are in 'stopped' state after starting deployment

Bug #1427211 reported by Roman Podoliaka
This bug affects 1 person

Affects: Fuel for OpenStack
Status: Invalid
Importance: Undecided
Assigned to: Fuel Library (Deprecated)
Milestone: —

Bug Description

http://jenkins-product.srt.mirantis.net:8080/job/5.1.2.staging.ubuntu.bvt_2/64/console

Deployment fails with:

======================================================================
FAIL: Deploy cluster in HA mode with VLAN Manager
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 52, in wrapper
    return func(*args, **kwagrs)
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/tests/test_ha.py", line 76, in deploy_ha_vlan
    self.fuel_web.deploy_cluster_wait(cluster_id)
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 209, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 406, in deploy_cluster_wait
    self.assert_task_success(task, interval=interval)
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/5.1.2.staging.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 239, in assert_task_success
    task['status'], 'ready', name=task["name"]
AssertionError: Task 'deploy' has incorrect status. error != ready

----------------------------------------------------------------------

[root@nailgun astute]# fuel node
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|---------------------|---------|--------------|-------------------|------------|---------------|-------
4 | provisioned | slave-03_controller | 1 | 10.108.155.6 | 64:e1:3e:fe:82:09 | controller | | True
2 | provisioned | slave-01_controller | 1 | 10.108.155.4 | 64:f8:e9:69:03:f1 | controller | | True
3 | provisioned | slave-05_compute | 1 | 10.108.155.5 | 64:b7:f9:e0:9a:ce | compute | | True
1 | error | slave-02_controller | 1 | 10.108.155.3 | 64:eb:7f:47:c8:25 | controller | | True
5 | error | slave-04_compute | 1 | 10.108.155.7 | 64:f3:db:a8:d1:df | compute | | True

root@node-1:/var/log# crm resource list
 vip__management_old (ocf::mirantis:ns_IPaddr2): Stopped
 vip__public_old (ocf::mirantis:ns_IPaddr2): Stopped
 Clone Set: clone_ping_vip__public_old [ping_vip__public_old]
     Stopped: [ node-1 ]

puppet.log contains the following error:

Sat Feb 28 07:21:36 +0000 2015 /Stage[corosync_setup]/Osnailyfacter::Cluster_ha::Virtual_ips/Cluster::Virtual_ips[public_old]/Cluster::Virtual_ip[public_old]/Service[ping_vip__public_old]/ensure (err): change from stopped to running failed: execution expired
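The "execution expired" error is a start timeout: the provider polls the cluster for the resource to reach the running state and gives up when the deadline passes, so a ping clone that can never reach its gateway surfaces as a timeout rather than an explicit failure. A minimal sketch of that kind of wait loop (illustrative only, with a hypothetical `is_running` predicate; this is not the actual provider code):

```python
import time

def wait_until_running(is_running, timeout=60.0, interval=2.0):
    """Poll is_running() until it returns True or the deadline expires.

    Mirrors the observed behaviour: a resource that never starts
    (e.g. a ping clone whose gateway is unreachable) shows up as a
    timeout ("execution expired"), not as a distinct error.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_running():
            return True
        time.sleep(interval)
    raise TimeoutError(
        "execution expired: resource did not start within %.0fs" % timeout)
```

In this sketch `is_running` would be something like a lambda that queries `crm resource status ping_vip__public_old` (hypothetical usage).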

Tags: staging
Revision history for this message
slava valyavskiy (slava-val-al) wrote :

Do you have pingable public gateway?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Sorry, what does "kicking the deployment off" mean?

Changed in fuel:
assignee: Fuel Dev (fuel-dev) → Fuel Library Team (fuel-library)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

These are the staging jobs for 5.1.2 on our regular Jenkins slaves. At least after I revert the snapshot, the public GWs are pingable.

summary: - VIPs are in 'stopped' state after kicking the deployment off
+ VIPs are in 'stopped' state after starting deployment
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The last message in the puppet log is:
2015-02-28T07:13:34.692262+00:00 debug: (Service[vip__management_old](provider=pacemaker)) Not starting to wait for the service to start. Simple resource is started elsewhere.

Changed in fuel:
status: New → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Note that the previous failed BVT #63 shows the same issue, with a sudden interruption in the puppet logs:
"2015-03-01T07:08:45.532647+00:00 notice: (/Stage[corosync_setup]/Osnailyfacter::Cluster_ha::Virtual_ips/Cluster::Virtual_ips[public_old]/Cluster::Virtual_ip[public_old]/Cs_resource[vip__public_old]/ensure) created"

This is probably some devops-environment-related issue.

The stopped state of the resources after the failed deployment is not an issue in itself; it is ensured by the astute orchestrator. There are 3 controller nodes in the cluster but no quorum in Corosync, so no-quorum-policy=stopped stops all resources, as expected.
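Quorum requires a strict majority of the expected votes, i.e. floor(n/2) + 1. With 3 expected controller votes, at least 2 members are needed; with only node-1 deployed (the other controllers were still in the 'provisioned' state), the cluster cannot reach quorum, so the no-quorum policy stops every resource. A small sketch of that arithmetic (illustrative only, not Corosync code):

```python
def quorum_threshold(expected_votes: int) -> int:
    """Minimum votes for a strict majority: floor(n/2) + 1."""
    return expected_votes // 2 + 1

def has_quorum(online_votes: int, expected_votes: int) -> bool:
    """True when the online members hold a majority of expected votes."""
    return online_votes >= quorum_threshold(expected_votes)

# 3 expected controller votes, only node-1 actually in the cluster:
print(quorum_threshold(3))   # 2
print(has_quorum(1, 3))      # False -> resources stopped, as observed
```

This matches the observed state: with one controller out of three, the VIPs stay Stopped until quorum is restored.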

Changed in fuel:
status: Confirmed → Invalid