nova-compute is marked as down after management VIP migration

Bug #1452632 reported by Artem Panchenko
This bug affects 1 person
Affects              Status         Importance   Assigned to        Milestone
Mirantis OpenStack   Fix Released   High         Roman Podoliaka
6.0.x                Invalid        High         Alexey Stupnikov
6.1.x                Fix Released   High         Roman Podoliaka
7.0.x                Fix Released   High         Roman Podoliaka

Bug Description

Fuel version info (6.1 build #386): http://paste.openstack.org/show/215959/

After the management VIP was deleted on one of the controllers, it migrated to another controller, but 'nova-compute' on one compute node was marked as down and the OSTF test 'Check that required services are running' failed:

http://jenkins-product.srt.mirantis.net:8080/job/6.1.system_test.centos.thread_5/115/testReport/(root)/ha_nova_delete_vips/ha_nova_delete_vips/?

Here is the relevant part of the nova-scheduler logs and the output of the service-list command:

http://paste.openstack.org/show/215960/
http://paste.openstack.org/show/215962/

It seems nova-compute on node-4 was unable to reconnect to RabbitMQ after the VIP migration:

http://paste.openstack.org/show/215974/

Steps to reproduce:

1. Deploy environment: CentOS, NovaDHCP, CinderLVM, 3 controllers, 2 computes
2. Delete the public and management VIPs 10 times ('ip netns exec {0} ip addr del {1} dev {2}'.format(namespace, ip, interface))
3. Wait until they are restored
4. Verify they are restored
5. Run OSTF
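
A minimal sketch of steps 2-4, assuming the command runner and the "VIP is restored" check are supplied by the test framework. Only the command template comes from this report; the function names, the `run` and `is_restored` callables, and the timeout and poll values are hypothetical.

```python
import time

# Command template quoted in step 2 of this report.
DELETE_VIP = "ip netns exec {0} ip addr del {1} dev {2}"


def delete_vip_command(namespace, ip, interface):
    """Build the shell command that removes a VIP inside a namespace."""
    return DELETE_VIP.format(namespace, ip, interface)


def delete_and_wait(run, is_restored, namespace, ip, interface,
                    attempts=10, timeout=300, poll=5, sleep=time.sleep):
    """Delete the VIP `attempts` times, waiting for it to come back each time.

    `run` executes a shell command on a controller and `is_restored`
    reports whether the VIP is present again; both are hypothetical
    hooks, and the timeout/poll values are assumptions.
    """
    for _ in range(attempts):
        run(delete_vip_command(namespace, ip, interface))
        waited = 0
        while not is_restored():
            if waited >= timeout:
                raise RuntimeError("VIP was not restored in time")
            sleep(poll)
            waited += poll
```

With a real runner, `run` would execute the command over SSH on the controller that currently holds the VIP, and `is_restored` would check that the address is configured again on some controller.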

Expected result:

- all health checks are green

Actual:

 - 'Check that required services are running' test failed

After 'nova-compute' was restarted on node-4, it became 'up' in the services list. A diagnostic snapshot is attached.

Tags: nova messaging
Artem Panchenko (apanchenko-8) wrote :
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
tags: added: messaging nova
no longer affects: fuel
summary: - Nova-compute is marked as down after management VIP migration
+ nova-compute is marked as down after management VIP migration
Roman Podoliaka (rpodolyaka) wrote :

A known, hard-to-reproduce issue. I once tried to push a workaround upstream (https://review.openstack.org/#/c/122471/1), but it wasn't accepted.

I propose we merge it to 6.1 and debug this thoroughly from the oslo.messaging perspective in 7.0. It looks like messaging may occasionally return None instead of raising a proper exception (maybe it has something to do with the order of message delivery and the ending=True hacks in oslo.messaging).

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/6473

Viktor Serhieiev (vsergeyev) wrote :

As for the ending=True hacks in oslo.messaging: in the community we have a blueprint [1] to remove them altogether. Unfortunately, this breaks backward compatibility :(

Anyway, it would be useful to debug this thoroughly to find the root cause of this bug.

[1] https://blueprints.launchpad.net/oslo.messaging/+spec/remove-double-reply

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/6473
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 993a018acb0c4ec0c1fed9e23f5eb5b08c0894aa
Author: Roman Podoliaka <email address hidden>
Date: Thu May 7 14:13:07 2015

Fix a rare issue in service state reporting

Occasionally, conductor_api.service_update() will return None, which
will break all further state reports and the service will become
unavailable from nova-api/nova-conductor point of view. Should this
happen, just skip the current update and try again next time.

Closes-Bug: #1452632

Change-Id: I155b304d57a6babcf56f5984dc703e899fdf8648
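
The behaviour described in this commit message can be illustrated with a small self-contained sketch. This is not the actual nova patch: `FlakyConductor`, `Service` and `report_state` are hypothetical stand-ins, and only the idea (skip the update when `service_update()` returns None and retry on the next report) is taken from the commit message.

```python
class FlakyConductor:
    """Hypothetical stand-in for conductor_api; returns None on chosen calls."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0

    def service_update(self, service_ref, values):
        self.calls += 1
        if self.calls in self.fail_on:
            return None  # simulates the unexpected empty reply
        updated = dict(service_ref)
        updated.update(values)
        return updated


class Service:
    """Hypothetical service that periodically reports its state."""

    def __init__(self, conductor):
        self.conductor = conductor
        self.service_ref = {"host": "node-4", "report_count": 0}

    def report_state(self):
        values = {"report_count": self.service_ref["report_count"] + 1}
        new_ref = self.conductor.service_update(self.service_ref, values)
        if new_ref is None:
            # Skip this update and keep the last good reference, so the
            # next periodic report can still read 'report_count'.
            return False
        self.service_ref = new_ref
        return True
```

Without the None check, `self.service_ref` would be overwritten with None and every later report would fail when reading `report_count`, which matches the "break all further state reports" symptom described above.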

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.0.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0.1/2014.2
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/6489

tags: added: on-verification
Sergey Novikov (snovikov) wrote :

Verified on fuel-6.1-437-2015-05-19_10-05-51.iso.

Steps to verify:
    1. Deploy environment: CentOS, NovaDHCP, CinderLVM, 3 controllers, 2 computes
    2. Delete the public and management VIPs 10 times ('ip netns exec {0} ip addr del {1} dev {2}'.format(namespace, ip, interface))
    3. Wait until they are restored
    4. Verify they are restored
    5. Run OSTF

tags: removed: on-verification
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/8262

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/8262
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: dc05635667712f3c8f2a00099c15a1597bf05516
Author: Roman Podoliaka <email address hidden>
Date: Wed Jul 15 13:57:21 2015

Fix a rare issue in service state reporting

Occasionally, conductor_api.service_update() will return None, which
will break all further state reports and the service will become
unavailable from nova-api/nova-conductor point of view. Should this
happen, just skip the current update and try again next time.

Closes-Bug: #1452632

Change-Id: I155b304d57a6babcf56f5984dc703e899fdf8648

tags: added: on-verification
Alexander Arzhanov (aarzhanov) wrote :

Verified on ISO #286:

api: '1.0'
astute_sha: 8283dc2932c24caab852ae9de15f94605cc350c6
auth_required: true
build_id: '286'
build_number: '286'
feature_groups:
- mirantis
fuel-agent_sha: 082a47bf014002e515001be05f99040437281a2d
fuel-library_sha: ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589
fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
fuel-ostf_sha: 1f08e6e71021179b9881a824d9c999957fcc7045
fuelmain_sha: 9ab01caf960013dc882825dc9b0e11ccf0b81cb0
nailgun_sha: 5c33995a2e6d9b1b8cdddfa2630689da5084506f
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 1ce8ecd8beb640f2f62f73435f4e18d1469979ac
release: '7.0'
release_versions:
  2015.1.0-7.0:
    VERSION:
      api: '1.0'
      astute_sha: 8283dc2932c24caab852ae9de15f94605cc350c6
      build_id: '286'
      build_number: '286'
      feature_groups:
      - mirantis
      fuel-agent_sha: 082a47bf014002e515001be05f99040437281a2d
      fuel-library_sha: ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589
      fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
      fuel-ostf_sha: 1f08e6e71021179b9881a824d9c999957fcc7045
      fuelmain_sha: 9ab01caf960013dc882825dc9b0e11ccf0b81cb0
      nailgun_sha: 5c33995a2e6d9b1b8cdddfa2630689da5084506f
      openstack_version: 2015.1.0-7.0
      production: docker
      python-fuelclient_sha: 1ce8ecd8beb640f2f62f73435f4e18d1469979ac
      release: '7.0'

tags: removed: on-verification
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-6.0.1/2014.2)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-6.0.1/2014.2
Review: https://review.fuel-infra.org/6489

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/13282

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change restored on openstack/nova (openstack-ci/fuel-6.0.1/2014.2)

Change restored by Alexey Stupnikov <email address hidden> on branch: openstack-ci/fuel-6.0.1/2014.2
Review: https://review.fuel-infra.org/6489

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-6.0.1/2014.2)

Change abandoned by Alexey Stupnikov <email address hidden> on branch: openstack-ci/fuel-6.0.1/2014.2
Review: https://review.fuel-infra.org/6489

Alexey Stupnikov (astupnikov) wrote :

I have set the status to Invalid for MOS 6.0, since I couldn't reproduce this bug after 20 retries and A. Panchenko said that he hasn't checked this issue on MOS 6.0.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-8.0/liberty)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13282
Reason: The issue does not reproduce on MOS 8.0, so this ugly workaround is not needed anymore.
