series upgrade fails with a hook error in the prepare step

Bug #1877937 reported by Chris MacNaughton
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Fix Released
Importance: Undecided
Assigned to: Liam Young

Bug Description

All referenced files in the following will be attached to this bug

## Do a series upgrade!

juju set-series keystone bionic

# Pause the non-leader hacluster units
juju run-action keystone-hacluster/1 pause
juju run-action keystone-hacluster/2 pause

# Pause the non-leader keystone units
juju run-action keystone/0 pause
juju run-action keystone/1 pause

juju show-action-status > action-status.txt
# All of the action statuses should be "completed"

juju status keystone > juju-status-after-pause.txt

# The juju status should now show "Paused" for the non-leader keystone units as well as their hacluster subordinates

# Call upgrade-series - note that I'm doing the non-leader hacluster units (1 and 2, on machines 6 and 7) in parallel, and then I wait to do the leader (0, on machine 8)
juju upgrade-series 6 prepare bionic -y &
juju upgrade-series 7 prepare bionic -y &
# machine-6 started upgrade series from "xenial" to "bionic"
# machine-7 started upgrade series from "xenial" to "bionic"
#
# keystone/1 pre-series-upgrade hook running
# keystone/0 pre-series-upgrade hook running
#
# keystone/1 pre-series-upgrade completed
# keystone-hacluster/2 pre-series-upgrade hook running
# keystone/0 pre-series-upgrade completed
# keystone-hacluster/1 pre-series-upgrade hook running
# keystone-hacluster/2 pre-series-upgrade completed
# keystone-hacluster/1 pre-series-upgrade completed
# machine-7 binaries and service files written
#
# Juju is now ready for the series to be updated.
# Perform any manual steps required along with "do-release-upgrade".
# When ready, run the following to complete the upgrade series process:
#
# juju upgrade-series 7 complete
# machine-6 binaries and service files written
#
# Juju is now ready for the series to be updated.
# Perform any manual steps required along with "do-release-upgrade".
# When ready, run the following to complete the upgrade series process:
#
# juju upgrade-series 6 complete

juju status keystone > juju-mid-prepare.txt
# Notice that keystone-hacluster/0 is in executing state:
# keystone-hacluster/0* active executing ...
# Wait until it's idle again
juju upgrade-series 8 prepare bionic -y
# This looks like it gets part way in and then hangs:
# machine-8 started upgrade series from "xenial" to "bionic"
# keystone/2 pre-series-upgrade hook running
# keystone/2 pre-series-upgrade completed
# keystone-hacluster/0 pre-series-upgrade hook running

If I leave it running (there is no reason to stop it) and switch to another terminal, I can
see that there is a hook error on this hacluster unit:

juju status keystone-hacluster/0

Model  Controller        Cloud/Region             Version  SLA          Timestamp
icey   icey-serverstack  serverstack/serverstack  2.7.6    unsupported  07:07:33Z

App                 Version  Status   Scale  Charm      Store       Rev  OS      Notes
keystone            13.0.2   blocked      1  keystone   jujucharms  494  ubuntu
keystone-hacluster           error        1  hacluster  jujucharms  131  ubuntu

Unit                     Workload  Agent  Machine  Public address  Ports     Message
keystone/2*              blocked   idle   8        10.5.0.19       5000/tcp  Ready for do-release-upgrade and reboot. Set complete when finished., Unit paused.
  keystone-hacluster/0*  error     idle            10.5.0.19                 hook failed: "pre-series-upgrade"

Machine  State    DNS        Inst id                               Series  AZ    Message
8        started  10.5.0.19  50685221-656f-44ba-bebc-3980753b8f08  xenial  nova  ACTIVE

On that unit, the agent log shows:

2020-05-11 06:53:30 DEBUG pre-series-upgrade ERROR: running cibadmin -Ql: Signon to CIB failed: Transport endpoint is not connected
2020-05-11 06:53:30 DEBUG pre-series-upgrade Init failed, could not perform requested operations
2020-05-11 06:53:30 DEBUG pre-series-upgrade ERROR: juju-467145-icey-8: node name not recognized
2020-05-11 06:53:30 DEBUG pre-series-upgrade Traceback (most recent call last):
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/var/lib/juju/agents/unit-keystone-hacluster-0/charm/hooks/pre-series-upgrade", line 658, in <module>
2020-05-11 06:53:30 DEBUG pre-series-upgrade hooks.execute(sys.argv)
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/var/lib/juju/agents/unit-keystone-hacluster-0/charm/charmhelpers/core/hookenv.py", line 943, in execute
2020-05-11 06:53:30 DEBUG pre-series-upgrade self._hooks[hook_name]()
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/var/lib/juju/agents/unit-keystone-hacluster-0/charm/hooks/pre-series-upgrade", line 619, in series_upgrade_prepare
2020-05-11 06:53:30 DEBUG pre-series-upgrade pause_unit()
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/var/lib/juju/agents/unit-keystone-hacluster-0/charm/hooks/utils.py", line 1141, in pause_unit
2020-05-11 06:53:30 DEBUG pre-series-upgrade enter_standby_mode(node_name)
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/var/lib/juju/agents/unit-keystone-hacluster-0/charm/hooks/utils.py", line 1082, in enter_standby_mode
2020-05-11 06:53:30 DEBUG pre-series-upgrade subprocess.check_call(['crm', 'node', 'standby', node_name, duration])
2020-05-11 06:53:30 DEBUG pre-series-upgrade File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2020-05-11 06:53:30 DEBUG pre-series-upgrade raise CalledProcessError(retcode, cmd)
2020-05-11 06:53:30 DEBUG pre-series-upgrade subprocess.CalledProcessError: Command '['crm', 'node', 'standby', 'juju-467145-icey-8', 'forever']' returned non-zero exit status 1
2020-05-11 06:53:30 ERROR juju.worker.uniter.operation runhook.go:132 hook "pre-series-upgrade" failed: exit status 1
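
The traceback ends in `subprocess.check_call`, which raises `CalledProcessError` whenever the child exits non-zero, so a failing `crm node standby` aborts the whole hook. The following is only a minimal sketch of that failure mode, not the charm's code: `try_standby` is a hypothetical helper, and a child Python process that exits 1 stands in for the `crm` call on a node whose cluster stack is already down.

```python
import subprocess
import sys


def try_standby(cmd):
    """Run a command with check_call, reporting failure instead of crashing.

    Returns True when the command exits 0 and False otherwise.
    (Hypothetical helper for illustration; the charm lets the exception
    propagate, which is what fails the hook.)
    """
    try:
        subprocess.check_call(cmd)
        return True
    except subprocess.CalledProcessError as exc:
        # check_call raises as soon as the exit status is non-zero.
        print("command failed with exit status {}".format(exc.returncode))
        return False


# Stand-ins for `crm node standby <node> forever`: one child process that
# exits 1 (cluster unreachable) and one that exits 0 (cluster healthy).
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
succeeding_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]

print(try_standby(failing_cmd))
print(try_standby(succeeding_cmd))
```

Catching the exception like this would only hide the symptom; the underlying issue in the log is that the CIB is not reachable when the hook tries to put the node into standby.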

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

This bundle should be sufficient to reproduce this on serverstack

Liam Young (gnuoy)
Changed in charm-hacluster:
assignee: nobody → Liam Young (gnuoy)
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

The linked bundle is broken because I tried to cut down a test bundle. I'm now running a test with the following bundle:

series: xenial
machines:
  0: {}
  1: {}
  2: {}
  3: {}
  4: {}
  5: {}
applications:
  mysql:
    charm: cs:~openstack-charmers-next/percona-cluster
    num_units: 3
    options:
      vip: 10.5.0.240
      min-cluster-size: 3
    to:
      - 0
      - 1
      - 2
  mysql-hacluster:
    charm: cs:~openstack-charmers-next/hacluster
  keystone:
    charm: cs:~openstack-charmers-next/keystone
    constraints: mem=1G
    num_units: 3
    options:
      vip: 10.5.0.241
      openstack-origin: cloud:xenial-queens
    to:
      - 3
      - 4
      - 5
  keystone-hacluster:
    charm: cs:~openstack-charmers-next/hacluster
relations:
  - [ keystone, mysql ]
  - [ mysql, mysql-hacluster ]
  - [ keystone, keystone-hacluster ]

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

And a smaller version with a non-HA mysql:

series: xenial
machines:
  0: {}
  1: {}
  2: {}
  3: {}
applications:
  mysql:
    charm: cs:~openstack-charmers-next/percona-cluster
    num_units: 1
    to:
      - 0
  mysql-hacluster:
    charm: cs:~openstack-charmers-next/hacluster
  keystone:
    charm: cs:~openstack-charmers-next/keystone
    constraints: mem=1G
    num_units: 3
    options:
      vip: 10.5.0.241
      openstack-origin: cloud:xenial-queens
    to:
      - 1
      - 2
      - 3
  keystone-hacluster:
    charm: cs:~openstack-charmers-next/hacluster
relations:
  - [ keystone, mysql ]
  - [ keystone, keystone-hacluster ]

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

It seems to work just fine if the prepare step is run on the leader unit before the non-leaders.
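
As a rough illustration of that ordering, the sketch below builds the `juju upgrade-series <machine> prepare bionic -y` commands with the leader's machine first. The machine numbers (6, 7, and 8, with the leader on 8) come from the reproduction above; `prepare_order` and the `leader` flag are hypothetical names, and the commands are only printed, not executed.

```python
def prepare_order(machines, series="bionic"):
    """Return upgrade-series prepare commands, leader machine first.

    `machines` is a list of dicts with a machine number and a flag saying
    whether the machine hosts the leader unit (hypothetical structure).
    """
    # sorted() is stable, so non-leader machines keep their given order.
    ordered = sorted(machines, key=lambda m: not m["leader"])
    return [
        ["juju", "upgrade-series", str(m["machine"]), "prepare", series, "-y"]
        for m in ordered
    ]


machines = [
    {"machine": 6, "leader": False},
    {"machine": 7, "leader": False},
    {"machine": 8, "leader": True},
]

commands = prepare_order(machines)
for cmd in commands:
    print(" ".join(cmd))
```

This is a workaround for the unfixed charm only; it says nothing about whether the non-leader machines can then be prepared in parallel.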

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-hacluster (master)

Fix proposed to branch: master
Review: https://review.opendev.org/726775

Changed in charm-hacluster:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-hacluster (master)

Reviewed: https://review.opendev.org/726775
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=d860f3406cab0c044066bde5ef86db69409d0dae
Submitter: Zuul
Branch: master

commit d860f3406cab0c044066bde5ef86db69409d0dae
Author: Liam Young <email address hidden>
Date: Mon May 11 10:56:22 2020 +0000

    Check for peer series upgrade in pause and status

    Check whether peers have sent series upgrade notifications before
    pausing a unit. If notifications have been sent then HA services
    will have been shutdown and pausing will fail.

    Similarly, if series upgrade notifications have been sent then
    do not try and issue crm commands when assessing status.

    Change-Id: I4de0ffe5d5e24578db614c2e8640ebd32b8cd469
    Closes-Bug: #1877937
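
The check described in the commit message can be pictured roughly as follows. This is a hedged sketch and not the merged code: the relation key `series-upgrade-notified`, the function names, and the dict layout of the peer data are all assumptions made for illustration.

```python
def peers_upgrading(peer_settings, key="series-upgrade-notified"):
    """Return True if any peer has announced a series upgrade.

    `peer_settings` maps a peer unit name to its relation data (a dict of
    strings). The key name is hypothetical.
    """
    return any(
        str(settings.get(key, "")).lower() == "true"
        for settings in peer_settings.values()
    )


def should_run_crm(peer_settings):
    """Only issue crm commands (pause, status checks) if no peer is upgrading.

    Once a peer has sent the notification, HA services may already be shut
    down, so crm commands would fail as in the traceback in this report.
    """
    return not peers_upgrading(peer_settings)


no_upgrade = {
    "keystone-hacluster/1": {},
    "keystone-hacluster/2": {},
}
upgrade_started = {
    "keystone-hacluster/1": {"series-upgrade-notified": "True"},
    "keystone-hacluster/2": {},
}

print(should_run_crm(no_upgrade))
print(should_run_crm(upgrade_started))
```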

Changed in charm-hacluster:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-hacluster:
milestone: none → 20.05
David Ames (thedac)
Changed in charm-hacluster:
status: Fix Committed → Fix Released