Create volume and boot instance from it failed on step server deletion

Bug #1532163 reported by Tatyanka
This bug affects 1 person
Affects               Status        Importance   Assigned to            Milestone
Mirantis OpenStack    In Progress   High         Dmitry Mescheryakov
  8.0.x               In Progress   High         Dmitry Mescheryakov
  9.x                 In Progress   High         Dmitry Mescheryakov

Bug Description

Destroy two controllers and check that the pacemaker status is correct

Scenario:
1. Destroy first controller
2. Check pacemaker status
3. Run OSTF
4. Revert environment
5. Destroy second controller
6. Check pacemaker status
7. Run OSTF

Actual Result:
OSTF failed on step 7:
Create volume and boot instance from it (failure)
The instance does not become active, so deletion starts and then fails by timeout:
fuel_health.test: DEBUG: Waiting for <Server: ost1_test-boot-volume-instance1099375625> to get to ACTIVE status. Currently in build status
fuel_health.test: DEBUG: Sleeping for 10 seconds
fuel_health.common.test_mixins: INFO: STEP:5, verify action: 'server deletion'
fuel_health.nmanager: DEBUG: Deleting server.
fuel_health.test: DEBUG: Sleeping for 10 seconds
fuel_health.test: DEBUG: Sleeping for 10 seconds
fuel_health.common.test_mixins: INFO: Timeout 30s exceeded for server deletion
fuel_health.common.test_mixins: DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 177, in verify
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 223, in __exit__
    raise AssertionError(msg)
AssertionError: Time limit exceeded while waiting for server deletion to finish.

So it looks like instance creation after destructive actions takes a little more time, so we may need to increase the timeout for instance creation.
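
For reference, a minimal sketch (not the actual fuel_health code; the helper name, defaults, and error handling are assumptions) of the kind of polling loop the log above reflects, with the timeout pulled out as a parameter so it can be raised for destructive scenarios:

# Sketch only: generic "wait for status with timeout" loop, mirroring the
# OSTF log pattern above (poll, sleep 10 seconds, give up after a deadline).
import time


def wait_for_status(get_status, expected='ACTIVE',
                    timeout=180, poll_interval=10):
    """Poll get_status() until it returns `expected` or `timeout` expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status == expected:
            return True
        if status == 'ERROR':
            raise AssertionError('Resource went into ERROR state')
        time.sleep(poll_interval)
    raise AssertionError('Time limit (%ss) exceeded while waiting for %s '
                         'status' % (timeout, expected))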

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "408"
  build_id: "408"
  fuel-nailgun_sha: "9ebbaa0473effafa5adee40270da96acf9c7d58a"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "df16d41cd7a9445cf82ad9fd8f0d53824711fcd8"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "7ef751bdc0e4601310e85b8bf713a62ed4aee305"
  fuel-ostf_sha: "214e794835acc7aa0c1c5de936e93696a90bb57a"
  fuel-mirror_sha: "8bb8c70efc61bcf633e02d6054dbf5ec8dcf6699"
  fuelmenu_sha: "2a0def56276f0fc30fd949605eeefc43e5d7cc6c"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "62573cb2a8aa54845de9303b4a30935a90e1db61"

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
status: New → Confirmed
Egor Kotko (ykotko)
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Egor Kotko (ykotko)
Revision history for this message
Egor Kotko (ykotko) wrote :

It seems the root of the issue is not only in timeouts. We expect a dictionary as the response, but instead got an error message:
ERROR: Gateway Time-out (HTTP 504)

Steps to reproduce:
1. Deploy a cluster: 3 controllers, 2 computes, 1 Cinder
2. Destroy (force shutoff) one controller
3. Create a Cinder volume

Expected:
The volume is created in ~2 min.
CLI response similar to:
http://paste.openstack.org/show/483768/
In the REST API - a dictionary.

Actual:
The response is an error message:
http://paste.openstack.org/show/483769/
but the volume was eventually created.
Approximate time of volume creation: ~5 min (a rough way to measure this is sketched below).

https://drive.google.com/file/d/0BzWDM1PONYEub0FpdXlMdzQ3S1U/view?usp=sharing
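
To make the timing above easy to re-check, here is a minimal sketch (not the OSTF test itself; the credentials and endpoint are placeholders) that creates a volume with python-cinderclient and measures how long it takes to reach 'available':

# Sketch only: create a volume and time how long it takes to become
# 'available'. Replace the placeholder credentials/endpoint before use.
import time

from cinderclient import client as cinder_client

cinder = cinder_client.Client('2', 'admin', 'admin_password',
                              'admin', 'http://<keystone-vip>:5000/v2.0')

start = time.time()
volume = cinder.volumes.create(size=1, name='ost1_test-volume')

while True:
    volume = cinder.volumes.get(volume.id)
    if volume.status in ('available', 'error'):
        break
    time.sleep(10)

print('Volume %s reached status %s in %.0f seconds'
      % (volume.id, volume.status, time.time() - start))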

Changed in fuel:
assignee: Egor Kotko (ykotko) → Fuel Library Team (fuel-library)
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

This is incorrect behaviour of cinder. Moving to mos-cinder team.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → MOS Cinder (mos-cinder)
tags: added: area-cinder
Revision history for this message
Yuriy Nesenenko (ynesenenko) wrote :

It looks like the environment used is slow, judging by the volume creation time (the above-mentioned ~2-5 min), so I don't think this is incorrect behaviour of Cinder.

Changed in fuel:
status: Confirmed → Incomplete
assignee: MOS Cinder (mos-cinder) → nobody
Changed in fuel:
assignee: nobody → Yuriy Nesenenko (ynesenenko)
assignee: Yuriy Nesenenko (ynesenenko) → nobody
Changed in fuel:
assignee: nobody → Yuriy Nesenenko (ynesenenko)
assignee: Yuriy Nesenenko (ynesenenko) → nobody
assignee: nobody → MOS Cinder (mos-cinder)
Ivan Kolodyazhny (e0ne)
Changed in fuel:
assignee: MOS Cinder (mos-cinder) → Tatyanka (tatyana-leontovich)
Revision history for this message
Yuriy Nesenenko (ynesenenko) wrote :

I think the environment used is slow, judging by the volume creation time (the above-mentioned ~2-5 min). Please check it out on a faster environment. The expected volume creation time should be < 1 min.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

@Ivan - please use the snapshots and the described steps to reproduce; also ping mos-qa for help here.

Changed in fuel:
assignee: Tatyanka (tatyana-leontovich) → MOS Cinder (mos-cinder)
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Note: according to TestRail, the same issue happens on a baremetal environment https://mirantis.testrail.com/index.php?/tests/view/2465860&group_by=tests:status_id&group_order=asc&group_id=8 - so I do not think the problem is a slow BM environment; it appears only after destructive actions.

Changed in fuel:
status: Incomplete → Confirmed
Ivan Kolodyazhny (e0ne)
Changed in fuel:
assignee: MOS Cinder (mos-cinder) → Yuriy Nesenenko (ynesenenko)
no longer affects: fuel
no longer affects: fuel/8.0.x
Changed in mos:
assignee: nobody → Yuriy Nesenenko (ynesenenko)
status: New → Confirmed
importance: Undecided → High
milestone: none → 8.0
Revision history for this message
Egor Kotko (ykotko) wrote :

Reproduced again on the baremetal environment; sometimes the problem is not reproduced on the first attempt.
If it is not reproduced after destroying a controller and starting the test: start the destroyed controller, wait until it is online, destroy another controller, and then start the test again.

https://drive.google.com/file/d/0BzWDM1PONYEuM0pYOUNwTGt0aG8/view?usp=sharing

Changed in mos:
status: Confirmed → In Progress
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :
summary: - Create volume and boot instance from it failed on step server deleteion
+ Create volume and boot instance from it failed on step server deletion
Ivan Kolodyazhny (e0ne)
Changed in mos:
assignee: Yuriy Nesenenko (ynesenenko) → Ivan Kolodyazhny (e0ne)
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/cinder (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Ivan Kolodyazhny <email address hidden>
Review: https://review.fuel-infra.org/16539

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Do we have a stable repro? Also, why is RabbitMQ down?

The last thing I want to do here is introduce a last-minute fix, which can break other things. Maybe it makes sense to move this to -updates?

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/cinder (openstack-ci/fuel-8.0/liberty)

Change abandoned by Ivan Kolodyazhny <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/16539

tags: added: move-to-mu
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

We haven't seen this in the field yet, only on CI (rarely). The proposed fix is too risky to be merged at this stage of the release cycle. We believe it's safe to move this to an MU and continue investigating this issue.

Changed in mos:
milestone: 8.0 → 8.0-updates
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

We've got a hacky workaround. A proper fix [1] requires a change in oslo.messaging to support 'retry' and 'timeout' params per client. For now, these params are global.

[1] https://review.openstack.org/274148 - PoC
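
For context on the 'retry'/'timeout' remark above, a minimal sketch (assumptions only, not the Cinder patch or the PoC): oslo.messaging lets a caller set a per-call timeout via prepare(), while the retry behaviour at the time was governed by global configuration, which is what the proper fix needed to change. The topic and method name below are illustrative.

# Sketch only: per-call RPC timeout with oslo.messaging; the topic/method
# names are assumptions and the commented-out call is hypothetical.
import oslo_messaging as messaging
from oslo_config import cfg

transport = messaging.get_transport(cfg.CONF)
target = messaging.Target(topic='cinder-volume', version='2.0')
client = messaging.RPCClient(transport, target)

# Longer timeout for one heavyweight call only; everything else keeps the
# globally configured default.
cctxt = client.prepare(timeout=120)
# cctxt.call(context, 'create_volume', ...)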

Changed in mos:
assignee: Ivan Kolodyazhny (e0ne) → Dmitry Mescheryakov (dmitrymex)
tags: added: release-notes
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Given that we haven't actually seen this in the field, I suggest we skip the release notes part.

tags: removed: release-notes
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The original issue is definitely the same as #1560097. Since we are already tracking all the work there, I am marking the current bug as a duplicate.

Re the issue that Egor found in comment #2 - it is at least a very different issue, but it might be that that one is a flawed test as well. For instance, it is not clear whether Egor waited for RabbitMQ to recover after the destructive action. Please file a separate issue if you experience it again, and also please post snippets of the logs with timestamps indicating when the issue occurred, because it is completely unclear from the comment.
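
On the 'waited for RabbitMQ to recover' point - one hedged way a test run could check this before starting OSTF (a sketch using kombu; the broker URL and retry budget are placeholders, and this is not part of fuel_health):

# Sketch only: block until the AMQP endpoint is reachable again after the
# destructive action. Replace the placeholder broker URL before use.
import kombu

connection = kombu.Connection('amqp://nova:password@<management-vip>:5673//')
# ensure_connection() retries with backoff and raises if the broker stays down.
connection.ensure_connection(max_retries=30, interval_start=2, interval_step=2)
connection.release()
print('RabbitMQ is reachable again')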
