pause/resume failing (workload status races)

Bug #1581171 reported by Ryan Beisner
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Landscape Server | | High | Andreas Hasenack | 16.06
 | | High | Andreas Hasenack |
ceilometer-agent (Juju Charms Collection) | | High | Ryan Beisner |
cinder (Juju Charms Collection) | | High | David Ames |
glance (Juju Charms Collection) | | High | David Ames |
keystone (Juju Charms Collection) | | High | David Ames |

Bug Description

Cinder, keystone, glance, and swift-proxy have been failing the pause/resume amulet full tests @ master since around May 10-13; all of them were passing before that.

To date, I've only observed this failure on Trusty targets (i.e. upstart). I haven't pinned it to any one cause; it could be an init/image revision, or an existing race that is simply exacerbated by slower performance under higher load.

## Timing observation in one repro, cinder@master
20:11:38 Test issues pause action
20:11:49 Test declares failure (based on juju action fetch returning non-zero).
20:12:22 Services on units get SIGTERM
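
As a quick check of the gaps in the timeline above, here is a small sketch that computes the seconds between the three observed events. The event labels are paraphrased from the lines above; the variable and function names are only for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the timing observation above.
events = [
    ("pause action issued", "20:11:38"),
    ("test declares failure", "20:11:49"),
    ("services get SIGTERM", "20:12:22"),
]


def seconds_between(start, end):
    """Whole seconds elapsed between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Pair each event with the one that follows it and compute the gap.
gaps = [
    (first[0], second[0], seconds_between(first[1], second[1]))
    for first, second in zip(events, events[1:])
]

for first, second, gap in gaps:
    print("{} -> {}: {} s".format(first, second, gap))
```

In this repro the test gave up 11 seconds after issuing the pause, and the services only received SIGTERM 33 seconds after that.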

## Juju status
Shows the charm reporting that it did not pause successfully, which is what the Amulet test goes by:
http://pastebin.ubuntu.com/16380414/

=== original bug description ===
cinder@master amulet test often fails the pause action test

Observed and reproduced on trusty-liberty, trusty-mitaka test combos.

## Unit status
cinder/0 blocked idle 1.25.5 1 10.5.1.105 Services should be paused but these services running: cinder-api, cinder-volume, cinder-scheduler, Paused. Use 'resume' action to resume normal service.

## Amulet failure (trusty-mitaka, cinder@master 2016 May 12)
2016-05-12 18:11:56,004 test_910_pause_and_resume DEBUG: Checking pause and resume actions...
Running command: juju action do --format=json cinder/0 pause

Traceback (most recent call last):
  File "/tmp/bundletester-QVbeQC/cinder/tests/gate-basic-trusty-mitaka", line 11, in <module>
    deployment.run_tests()
  File "/tmp/bundletester-QVbeQC/cinder/tests/charmhelpers/contrib/amulet/deployment.py", line 95, in run_tests
    getattr(self, test)()
  File "/tmp/bundletester-QVbeQC/cinder/tests/basic_deployment.py", line 773, in test_910_pause_and_resume
    assert self._wait_on_action(action_id), "Pause action failed."
AssertionError: Pause action failed.

PASS: 1 ERROR: 1 Total: 2 (869.865811 sec)

## Juju status
http://pastebin.ubuntu.com/16380414/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

I think the workload status bits have some sort of race in the charm. According to the running processes on the unit, cinder-* were actually NOT running, despite the extended workload status message to the contrary:

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-cinder/314773/5/2016-05-12_14-12-06/test_charm_amulet_full/logs/0-processes.bz2

Ryan Beisner (1chb1n) wrote : Re: cinder@master amulet test often fails the pause action test

In another reproduction (with additional timestamp debug bits added to the test) of this errored pause test, here is some interesting timing info:

## Timestamps from tests
2016-05-12 20:11:24,131 test_910_pause_and_resume DEBUG: Checking pause and resume actions...
2016-05-12 20:11:24,134 test_910_pause_and_resume INFO: Waiting on extended status checks...
2016-05-12 20:11:24,135 _auto_wait_for_status INFO: Waiting for extended status on units...
2016-05-12 20:11:24,137 _auto_wait_for_status DEBUG: Default extended status wait match: contains READY (case-insensitive)
2016-05-12 20:11:24,137 _auto_wait_for_status DEBUG: Excluding services from extended status match: ['mysql']
2016-05-12 20:11:24,138 _auto_wait_for_status DEBUG: Waiting up to 1800s for extended status on services: ['cinder', 'glance', 'keystone', 'rabbitmq-server']
2016-05-12 20:11:24,727 _auto_wait_for_status INFO: OK
2016-05-12 20:11:24,728 test_910_pause_and_resume DEBUG: Unit name: cinder/6
2016-05-12 20:11:24,729 test_910_pause_and_resume DEBUG: Checking for active status on cinder/6
2016-05-12 20:11:38,170 test_910_pause_and_resume DEBUG: Running pause action on cinder/6
Running command: juju action do --format=json cinder/6 pause

2016-05-12 20:11:38,998 test_910_pause_and_resume DEBUG: Waiting on action 1f3e7042-bbb7-4025-82d8-1ca3f6254653
Traceback (most recent call last):
  File "/tmp/bundletester-V7qqmq/cinder/tests/gate-basic-trusty-liberty", line 11, in <module>
    deployment.run_tests()
  File "/tmp/bundletester-V7qqmq/cinder/tests/charmhelpers/contrib/amulet/deployment.py", line 95, in run_tests
    getattr(self, test)()
  File "/tmp/bundletester-V7qqmq/cinder/tests/basic_deployment.py", line 780, in test_910_pause_and_resume
    "Pause action failed at {}".format(datetime.datetime.now())
AssertionError: Pause action failed at 2016-05-12 20:11:49.244020

## SIGTERM Timestamps from cinder logs
ubuntu@juju-beis0-machine-31:/var/log/cinder$ grep SIG *
cinder-api.log:2016-05-12 20:03:07.062 14524 INFO oslo_service.service [-] Caught SIGTERM, stopping children
cinder-api.log:2016-05-12 20:12:21.951 32034 INFO oslo_service.service [-] Caught SIGTERM, stopping children
cinder-scheduler.log:2016-05-12 20:03:07.454 14576 INFO oslo_service.service [req-42bbdc02-be8c-437c-a087-5006710eef11 - - - - -] Caught SIGTERM, exiting
cinder-scheduler.log:2016-05-12 20:12:22.396 32089 INFO oslo_service.service [req-70ae4130-3d76-46a9-98a6-91a14fa47afa - - - - -] Caught SIGTERM, exiting
cinder-volume.log:2016-05-12 20:03:07.216 14630 INFO oslo_service.service [req-9f77de11-e5a6-42d5-a436-7cb2c900fd29 - - - - -] Caught SIGTERM, stopping children
cinder-volume.log:2016-05-12 20:12:22.118 32058 INFO oslo_service.service [req-c43b791d-e062-4148-b23c-5cdfe55ebb79 - - - - -] Caught SIGTERM, stopping children

ubuntu@juju-beis0-machine-31:/var/log$ ps aux | grep cinder
root 3396 0.2 1.5 349744 24344 ? Ssl 20:01 0:02 /var/lib/juju/tools/unit-cinder-6/jujud unit --data-dir /var/lib/juju --unit-name cinder/6 --debug
ubuntu 4517 0.0 0.0 10460 936 pts/0 S+ 20:21 0:00 grep --color=auto cinder
ubuntu@juju-beis0-machine-31:/var/log$ ps aux | grep oslo
ubuntu 4519...


summary: - cinder@master trusty-mitaka amulet test often fails the pause action test
+ cinder@master amulet test often fails the pause action test
description: updated
Ryan Beisner (1chb1n) wrote :

After the cinder unit is in that state, a resume brings the juju service status back to a state where the charm believes it is healthy, but tgtd isn't running (haproxy is) and naturally the cinder api calls fail.

juju action do --format=json cinder/6 resume

juju action fetch abd1eea7-3a05-4d5b-8a52-4b9f81006162
status: completed
timing:
  completed: 2016-05-12 20:31:09 +0000 UTC
  enqueued: 2016-05-12 20:30:58 +0000 UTC
  started: 2016-05-12 20:31:02 +0000 UTC

cinder/6 active idle 1.25.5 31 10.5.1.135 Unit is ready

summary: - cinder@master amulet test often fails the pause action test
+ pause/resume failing
Ryan Beisner (1chb1n)
description: updated
David Ames (thedac)
Changed in cinder (Juju Charms Collection):
status: New → In Progress
Changed in glance (Juju Charms Collection):
status: New → In Progress
Changed in keystone (Juju Charms Collection):
status: New → In Progress
Changed in cinder (Juju Charms Collection):
importance: Undecided → High
Changed in glance (Juju Charms Collection):
importance: Undecided → High
Changed in keystone (Juju Charms Collection):
importance: Undecided → High
Changed in cinder (Juju Charms Collection):
assignee: nobody → David Ames (thedac)
Changed in glance (Juju Charms Collection):
assignee: nobody → David Ames (thedac)
Changed in keystone (Juju Charms Collection):
assignee: nobody → David Ames (thedac)
Changed in cinder (Juju Charms Collection):
milestone: none → 16.07
Changed in glance (Juju Charms Collection):
milestone: none → 16.07
Changed in keystone (Juju Charms Collection):
milestone: none → 16.07
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/316192

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/316194

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/316195

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-glance (master)

Reviewed: https://review.openstack.org/316192
Committed: https://git.openstack.org/cgit/openstack/charm-glance/commit/?id=421e3ee1cdc0db6f39629a24457a93601519cea2
Submitter: Jenkins
Branch: master

commit 421e3ee1cdc0db6f39629a24457a93601519cea2
Author: David Ames <email address hidden>
Date: Fri May 13 09:48:45 2016 -0700

    Charm-helpers sync to pull in service_running fix

    Change-Id: I45f6d54cc1c3271d08314ee7c6e1bbd0237fc5cf
    Closes-Bug: 1581171

Changed in glance (Juju Charms Collection):
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (master)

Reviewed: https://review.openstack.org/316194
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=13cf8d93500c8bd148641aba0627cce65c0a2a07
Submitter: Jenkins
Branch: master

commit 13cf8d93500c8bd148641aba0627cce65c0a2a07
Author: David Ames <email address hidden>
Date: Fri May 13 09:43:20 2016 -0700

    Charm-helpers sync to pull in service_running fix

    Change-Id: If0ccd1a3c4156fedb45fe5bdec8cc74bc7e0257a
    Closes-Bug: 1581171

Changed in cinder (Juju Charms Collection):
status: In Progress → Fix Committed
Ryan Beisner (1chb1n)
summary: - pause/resume failing
+ pause/resume failing (workload status races)
Ryan Beisner (1chb1n)
Changed in ceilometer-agent (Juju Charms Collection):
importance: Undecided → High
status: New → In Progress
milestone: none → 16.07
assignee: nobody → Ryan Beisner (1chb1n)
David Britton (dpb)
tags: added: kanban-cross-team landscape maintenance-mode
tags: removed: kanban-cross-team
James Page (james-page) wrote :

Associated bug: 1582813

A change to lsb-base makes init.d scripts behave like upstart configurations, which confuses "--status-all"

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/317913

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-cinder (master)

Change abandoned by James Page (<email address hidden>) on branch: master
Review: https://review.openstack.org/317913

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceilometer-agent (master)

Change abandoned by Ryan Beisner (<email address hidden>) on branch: master
Review: https://review.openstack.org/317470

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceilometer-agent (master)

Reviewed: https://review.openstack.org/318055
Committed: https://git.openstack.org/cgit/openstack/charm-ceilometer-agent/commit/?id=854c5184eeb9ee217b5a0b395753486ce6782ad0
Submitter: Jenkins
Branch: master

commit 854c5184eeb9ee217b5a0b395753486ce6782ad0
Author: James Page <email address hidden>
Date: Wed May 18 14:00:57 2016 +0100

    Resync charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: Ib5f252cc3fff2269a818b7b7856bbe8925f5c12b
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674
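
The detection approach described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the charm-helpers code: the function names, the injectable `exists` and `run` parameters, and the `/etc/init/<name>.conf` and `/etc/init.d/<name>` paths are all choices made for the example. It checks how a daemon is managed first, then asks upstart or the init script directly instead of parsing `service --status-all`.

```python
import os
import subprocess

# Assumed locations of upstart job files and SysV init scripts.
UPSTART_CONF = "/etc/init/{}.conf"
INIT_D_SCRIPT = "/etc/init.d/{}"


def upstart_output_is_running(output):
    """Interpret the text printed by `status <job>` for an upstart job."""
    return "start/running" in output


def service_is_running(name, exists=os.path.exists, run=subprocess.run):
    """Return True if the named daemon appears to be running.

    `exists` and `run` are injectable (hypothetical parameters) so the
    logic can be exercised without touching the real system.
    """
    if exists(UPSTART_CONF.format(name)):
        # Upstart-managed daemon: ask upstart and read its state text.
        result = run(["status", name],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     universal_newlines=True)
        return (result.returncode == 0
                and upstart_output_is_running(result.stdout))
    if exists(INIT_D_SCRIPT.format(name)):
        # init.d-managed daemon: rely on the exit code of `service X status`.
        result = run(["service", name, "status"],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
        return result.returncode == 0
    # Neither an upstart job nor an init script was found.
    return False
```

On a unit, `service_is_running("cinder-api")` would be called with the defaults; passing fake `exists` and `run` callables makes the branch logic easy to check in isolation.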

Changed in ceilometer-agent (Juju Charms Collection):
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (master)

Reviewed: https://review.openstack.org/318060
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=d9a6cb7606137fed36bff97c2eb21a569bb869fd
Submitter: Jenkins
Branch: master

commit d9a6cb7606137fed36bff97c2eb21a569bb869fd
Author: James Page <email address hidden>
Date: Wed May 18 14:03:33 2016 +0100

    Resync charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: Ic565bf17f13315c9511636c7ee9ede560c37e91f
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-glance (master)

Reviewed: https://review.openstack.org/318063
Committed: https://git.openstack.org/cgit/openstack/charm-glance/commit/?id=f9167bafc7dd81541d5dd5ea02b075b85f9b7319
Submitter: Jenkins
Branch: master

commit f9167bafc7dd81541d5dd5ea02b075b85f9b7319
Author: James Page <email address hidden>
Date: Wed May 18 14:05:02 2016 +0100

    Resync charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: I2b6ade54c8955247d29f7a45a0a8b71a66a672a5
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceilometer-agent (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/318218

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-glance (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/318224

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/318225

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/318239

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceilometer-agent (stable/16.04)

Reviewed: https://review.openstack.org/318218
Committed: https://git.openstack.org/cgit/openstack/charm-ceilometer-agent/commit/?id=e35c11225f9b6a566bddfcafcd95546a2e580425
Submitter: Jenkins
Branch: stable/16.04

commit e35c11225f9b6a566bddfcafcd95546a2e580425
Author: James Page <email address hidden>
Date: Wed May 18 17:40:35 2016 +0100

    Resync stable charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: I7263c762582a75813c5f0b0f0e14b34af8c2a7fd
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (stable/16.04)

Reviewed: https://review.openstack.org/318239
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=442a5adcdeec840a2ff01d147fa8dc6990d47c65
Submitter: Jenkins
Branch: stable/16.04

commit 442a5adcdeec840a2ff01d147fa8dc6990d47c65
Author: James Page <email address hidden>
Date: Wed May 18 17:56:11 2016 +0100

    Resync stable charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: I08e0dcebaf7019625fa8b48e27981b999cc04709
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-glance (stable/16.04)

Reviewed: https://review.openstack.org/318224
Committed: https://git.openstack.org/cgit/openstack/charm-glance/commit/?id=6956376e65095b00346a5829e8652df73b0cbc58
Submitter: Jenkins
Branch: stable/16.04

commit 6956376e65095b00346a5829e8652df73b0cbc58
Author: James Page <email address hidden>
Date: Wed May 18 17:43:51 2016 +0100

    Resync stable charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: Ia7c03c9f38468326718f21db0aed9ad157ae1bed
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (stable/16.04)

Reviewed: https://review.openstack.org/318225
Committed: https://git.openstack.org/cgit/openstack/charm-keystone/commit/?id=f7a35ddd9bcea287d5cffa18014e8e42a497f506
Submitter: Jenkins
Branch: stable/16.04

commit f7a35ddd9bcea287d5cffa18014e8e42a497f506
Author: James Page <email address hidden>
Date: Wed May 18 17:44:19 2016 +0100

    Resync stable charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: I5b412a5f75985a5183444e3e8666534cce718563
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

Changed in keystone (Juju Charms Collection):
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (master)

Reviewed: https://review.openstack.org/318065
Committed: https://git.openstack.org/cgit/openstack/charm-keystone/commit/?id=a651e239a612a442fcc3aa882390036bb87cb815
Submitter: Jenkins
Branch: master

commit a651e239a612a442fcc3aa882390036bb87cb815
Author: James Page <email address hidden>
Date: Wed May 18 14:06:12 2016 +0100

    Resync charm-helpers

    Avoid use of 'service --status-all' which is currently
    broken on trusty for upstart managed daemons; the change
    moves to detecting how the daemon is managed, and then
    using upstart status XXX or the return code of service XXX
    status to determine whether a process is running.

    Fixes for IPv6 network address detection under Ubuntu
    16.04 which changes the output format of the ip commands
    slightly.

    Update the version map to include 8.1.x as a Neutron
    version for Mitaka.

    Change-Id: I6c002c7671600bff53785da7bf9a73b4ddd6348b
    Closes-Bug: 1581171
    Closes-Bug: 1581598
    Closes-Bug: 1580674

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-keystone (master)

Change abandoned by James Page (<email address hidden>) on branch: master
Review: https://review.openstack.org/316195
Reason: Superseded

Changed in landscape:
assignee: nobody → Andreas Hasenack (ahasenack)
importance: Undecided → High
status: New → Fix Committed
milestone: none → 16.06
Liam Young (gnuoy)
Changed in cinder (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in glance (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in keystone (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in ceilometer-agent (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in landscape:
status: Fix Committed → Fix Released