[19.04][Queens -> Rocky] Upgrading to Rocky resulted in "Services not running that should be: designate-producer"

Bug #1828534 reported by Dmitrii Shcherbakov
This bug affects 3 people
Affects                            Importance  Assigned to
Designate                          Undecided   Corey Bryant
OpenStack Charms Deployment Guide  High        Unassigned
OpenStack Designate Charm          Undecided   Chris MacNaughton
Ubuntu Cloud Archive               High        Unassigned
  Rocky                            High        Unassigned
  Stein                            High        Unassigned
  Train                            High        Unassigned
memcached-charm                    High        Unassigned
designate (Ubuntu)                 High        Unassigned
  Disco                            High        Unassigned
  Eoan                             High        Unassigned

Bug Description

As of Rocky, Designate has to use the worker and producer services, because support for the zone manager and pool manager was up for removal in Rocky. This was addressed in https://bugs.launchpad.net/charm-designate/+bug/1773190

During a Queens to Rocky upgrade I ran into an issue with designate-producer being down (see the details below).

I was able to successfully start designate-producer by hand. It looks like there were several attempts to restart the service by systemd itself ("Scheduled restart job, restart counter is at 5.") which eventually failed when the failcount became higher than 5. Debugging mode was disabled so I did not see anything in the producer log.

designate/0* blocked idle 2/lxd/2 10.232.46.153 9001/tcp Services not running that should be: designate-producer
  hacluster-designate/0* active idle 10.232.46.153 Unit is ready and clustered

systemctl list-unit-files | grep designate-
designate-agent.service enabled
designate-api.service enabled
designate-central.service enabled
designate-mdns.service enabled
designate-producer.service enabled
designate-sink.service enabled
designate-worker.service enabled

ubuntu@juju-eeda89-2-lxd-2:~$ journalctl -u designate-producer.service
https://paste.ubuntu.com/p/WRXYvfynnd/

ubuntu@juju-eeda89-2-lxd-2:~$ pgrep -af designate
172877 bash /lib/systemd/system/jujud-unit-hacluster-designate-0/exec-start.sh
172878 bash /lib/systemd/system/jujud-unit-designate-0/exec-start.sh
172884 /var/lib/juju/tools/unit-designate-0/jujud unit --data-dir /var/lib/juju --unit-name designate/0 --debug
172886 /var/lib/juju/tools/unit-hacluster-designate-0/jujud unit --data-dir /var/lib/juju --unit-name hacluster-designate/0 --debug
392388 /usr/bin/python3.6 /usr/bin/designate-mdns --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-mdns.log
392399 /usr/bin/python3.6 /usr/bin/designate-agent --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-agent.log
392405 /usr/bin/python3.6 /usr/bin/designate-worker --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-worker.log
392411 /usr/bin/python3.6 /usr/bin/designate-central --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-central.log
392415 /usr/bin/python3.6 /usr/bin/designate-sink --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-sink.log
392423 /usr/bin/python3.6 /usr/bin/designate-api --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-api.log
392481 /usr/bin/python3.6 /usr/bin/designate-central --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-central.log
392482 /usr/bin/python3.6 /usr/bin/designate-central --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-central.log
392483 /usr/bin/python3.6 /usr/bin/designate-central --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-central.log
392484 /usr/bin/python3.6 /usr/bin/designate-central --config-file=/etc/designate/designate.conf --log-file=/var/log/designate/designate-central.log

----------

designate/0 unit log:

2019-05-09 00:20:38 DEBUG openstack-upgrade Setting up designate-producer (1:7.0.0-0ubuntu1~cloud0) ...
2019-05-09 00:20:38 DEBUG openstack-upgrade Created symlink /etc/systemd/system/multi-user.target.wants/designate-producer.service → /lib/systemd/system/designate-producer.service.

# journalctl -u designate-producer
journalctl -u designate-producer | grep start
May 09 00:20:40 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Service hold-off time over, scheduling restart.
May 09 00:20:40 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 1.
May 09 00:20:41 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Service hold-off time over, scheduling restart.
May 09 00:20:41 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 2.
May 09 00:20:43 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Service hold-off time over, scheduling restart.
May 09 00:20:43 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 3.
May 09 00:20:44 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Service hold-off time over, scheduling restart.
May 09 00:20:44 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 4.
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Service hold-off time over, scheduling restart.
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 5.
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: Failed to start OpenStack Designate DNSaaS producer.
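
To quantify the restart loop visible in the journal excerpt above, a short Python sketch can pull the restart counter out of each line. This is only an illustration: the helper name `max_restart_counter` and the regular expression are assumptions based on the message format shown in this report, not part of systemd or of the charm.

```python
import re

# Assumed pattern, derived from the systemd messages quoted in this report.
RESTART_RE = re.compile(r"Scheduled restart job, restart counter is at (\d+)\.")


def max_restart_counter(lines):
    """Return the highest restart counter found in the lines, or 0 if none."""
    highest = 0
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest


# Sample lines copied from the journal output above.
sample = [
    "May 09 00:20:40 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 1.",
    "May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: designate-producer.service: Scheduled restart job, restart counter is at 5.",
    "May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: Failed to start OpenStack Designate DNSaaS producer.",
]
print(max_restart_counter(sample))
```

In practice the lines would come from `journalctl -u designate-producer` rather than a hard-coded list.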

designate/0 unit log (continued):

2019-05-09 00:20:58 INFO juju-log Purging ['designate-pool-manager', 'designate-zone-manager', 'python-designate', 'python-memcache']

2019-05-09 00:20:59 DEBUG openstack-upgrade Removing designate-pool-manager (1:7.0.0-0ubuntu1~cloud0) ...
2019-05-09 00:21:15 DEBUG openstack-upgrade Removing designate-zone-manager (1:7.0.0-0ubuntu1~cloud0) ...
2019-05-09 00:21:31 DEBUG openstack-upgrade Removing python-designate (1:7.0.0-0ubuntu1~cloud0) ...

2019-05-09 00:21:31 DEBUG openstack-upgrade update-alternatives: using /usr/bin/python3-designate-producer to provide /usr/bin/designate-producer (designate-producer) in auto mode

-------------------

grep producer /var/log/juju/unit-designate-0.log
2019-05-09 00:18:43 DEBUG openstack-upgrade update-alternatives: using /usr/bin/python2-designate-producer to provide /usr/bin/designate-producer (designate-producer) in auto mode
2019-05-09 00:19:58 INFO juju-log Installing ['designate-agent', 'designate-api', 'designate-central', 'designate-common', 'designate-mdns', 'designate-worker', 'designate-sink', 'designate-producer', 'bind9utils', 'python3-designate', 'python-apt', 'memcached', 'python3-memcache', 'haproxy', 'apache2'] with options: ['--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef']
2019-05-09 00:19:58 DEBUG openstack-upgrade designate-producer designate-worker python3-amqp python3-anyjson
2019-05-09 00:20:09 DEBUG openstack-upgrade Get:112 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 designate-producer all 1:7.0.0-0ubuntu1~cloud0 [10.6 kB]
2019-05-09 00:20:20 DEBUG openstack-upgrade Selecting previously unselected package designate-producer.
2019-05-09 00:20:20 DEBUG openstack-upgrade Preparing to unpack .../105-designate-producer_1%3a7.0.0-0ubuntu1~cloud0_all.deb ...
2019-05-09 00:20:20 DEBUG openstack-upgrade Unpacking designate-producer (1:7.0.0-0ubuntu1~cloud0) ...
2019-05-09 00:20:38 DEBUG openstack-upgrade Setting up designate-producer (1:7.0.0-0ubuntu1~cloud0) ...
2019-05-09 00:20:38 DEBUG openstack-upgrade Created symlink /etc/systemd/system/multi-user.target.wants/designate-producer.service → /lib/systemd/system/designate-producer.service.
2019-05-09 00:21:31 DEBUG openstack-upgrade update-alternatives: using /usr/bin/python3-designate-producer to provide /usr/bin/designate-producer (designate-producer) in auto mode

description: updated
Revision history for this message
David Ames (thedac) wrote :

TRIAGE:

Seems we have a race such that designate-producer is not ready during the upgrade.
Guarantee designate-producer has everything it needs or is restarted at the end of the upgrade process.

Changed in charm-designate:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 19.07
James Page (james-page)
Changed in charm-designate:
assignee: nobody → Liam Young (gnuoy)
Revision history for this message
Liam Young (gnuoy) wrote :

I think this is a packaging bug

Changed in charm-designate:
status: Triaged → Invalid
assignee: Liam Young (gnuoy) → nobody
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Any tips on how to reproduce this? I just upgraded designate from Queens to Rocky and didn't hit it.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I'm going to do another upgrade attempt.

https://pastebin.canonical.com/p/72XWqsjMdr/ (bundle)

https://paste.ubuntu.com/p/JSSFZvp3bB/ (upgrade script: os-upgrade.py with a `juju --wait` and service list modification)

~/bundles/os-upgrade-queens-rocky.py -p

Let's see if I can reproduce it again.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Managed to reproduce it (only on one unit though), this time with debug=true - looks like it is a py2 - py3 issue:

designate/0 active idle 2/lxd/2 10.232.46.209 9001/tcp Unit is ready
  hacluster-designate/0* active idle 10.232.46.209 Unit is ready and clustered
designate/1* blocked idle 3/lxd/1 10.232.46.208 9001/tcp Services not running that should be: designate-producer
  hacluster-designate/1 active idle 10.232.46.208 Unit is ready and clustered

/var/log/designate/designate-producer.log
https://paste.ubuntu.com/p/pMbt2q6KTF/

2019-07-03 23:43:48.226 37919 DEBUG designate.service [-] Starting RPC server on topic 'producer' start /usr/lib/python3/dist-packages/designate/service.py:171
2019-07-03 23:43:48.252 37919 DEBUG designate.coordination [-] Starting partitioner start /usr/lib/python3/dist-packages/designate/coordination.py:213
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service [-] Error starting thread.: TypeError: '<' not supported between instances of 'str' and 'bytes'
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service Traceback (most recent call last):
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 794, in run_service
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service service.start()
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/designate/producer/service.py", line 72, in start
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service self._partitioner.start()
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/designate/coordination.py", line 223, in start
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service self._my_partitions = self._update_partitions()[1]
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/designate/coordination.py", line 200, in _update_partitions
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service members = sorted(list(self._get_members(self._group_id)))
2019-07-03 23:43:48.257 37919 ERROR oslo_service.service TypeError: '<' not supported between instances of 'str' and 'bytes'

systemctl status designate-producer
● designate-producer.service - OpenStack Designate DNSaaS producer
   Loaded: loaded (/lib/systemd/system/designate-producer.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2019-07-03 23:43:49 UTC; 16h ago
 Main PID: 37919 (code=exited, status=0/SUCCESS)

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Attached logs from a healthy unit as well.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

The traceback looks like an upstream designate issue with py3 support:

[-] Error starting thread.: TypeError: '<' not supported between instances of 'str' and 'bytes'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 794, in run_service
    service.start()
  File "/usr/lib/python3/dist-packages/designate/producer/service.py", line 72, in start
    self._partitioner.start()
  File "/usr/lib/python3/dist-packages/designate/coordination.py", line 223, in start
    self._my_partitions = self._update_partitions()[1]
  File "/usr/lib/python3/dist-packages/designate/coordination.py", line 200, in _update_partitions
    members = sorted(list(self._get_members(self._group_id)))
TypeError: '<' not supported between instances of 'str' and 'bytes'

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I was able to reproduce this. It's specific to coordination backend code that is used for HA. For charms, the following bundle config was used (snipped for brevity to just show designate bits):

    hacluster-designate:
      charm: cs:hacluster
      options:
        cluster_count: 2
    designate:
      charm: cs:~openstack-charmers-next/designate
      constraints: mem=1G
      num_units: 2
      options:
        action-managed-upgrade: true
        debug: true
        nameservers: 'ns1.ubuntu.com'
        nova-domain: 'serverstack.ubuntu.com.'
        neutron-domain: 'serverstack.ubuntu.com.'
        nova-domain-email: '<email address hidden>'
        neutron-domain-email: '<email address hidden>'
        vip: 10.5.20.1
    designate-bind:
      charm: cs:~openstack-charmers-next/designate-bind
  relations:
    - [ designate, keystone ]
    - [ designate, mysql ]
    - [ designate, rabbitmq-server ]
    - [ designate, designate-bind ]
    - [ designate, memcached ]
    - [ designate, hacluster-designate ]
    - - designate:dnsaas
      - neutron-api:external-dns

Changed in designate (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in charm-designate:
importance: Critical → Undecided
Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is reproducible in a python3 shell. The problem is that sorted() is called on a list that includes both bytes and str types:

$ python3
Python 3.7.4 (default, Jul 11 2019, 10:43:21)
[GCC 9.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> blist=['1'.encode(), '3', '2'.encode()]
>>> sorted(blist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'
>>>
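
For comparison, a minimal sketch of one way to make such a list sortable is to normalize every element to a single type first. The helper names `normalize_member` and `sorted_members` are hypothetical, and decoding as UTF-8 is an assumption; this is not what designate does, just an illustration of the type mismatch.

```python
def normalize_member(member, encoding="utf-8"):
    """Return the member as str, decoding it first if it is bytes."""
    if isinstance(member, bytes):
        return member.decode(encoding)
    return member


def sorted_members(members):
    """Sort a list that may mix bytes and str by comparing everything as str."""
    return sorted(normalize_member(m) for m in members)


# Same mixed list as in the shell session above.
blist = ['1'.encode(), '3', '2'.encode()]
print(sorted_members(blist))
```

Normalizing in the other direction (encoding every str to bytes) would work equally well; what matters is that all elements share one type before `sorted()` compares them.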

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Reproducing with designate, the problem appears after we have one designate unit that is at bionic-rocky (py3) and one that is at bionic-queens (py2). It is resolved once all units are at bionic-rocky. More details on that:

# upgrade one unit to bionic-rocky
juju config designate openstack-origin=cloud:bionic-rocky
juju run-action designate/0 openstack-upgrade

juju status
Unit Workload Agent Machine Public address Ports Message
designate/0 error idle 7 10.5.0.198 9001/tcp Services not running that should be: designate-producer
  hacluster-designate/0 active idle 10.5.0.198 Unit is ready and clustered
designate/1* active idle 8 10.5.0.6 9001/tcp Unit is ready
  hacluster-designate/1* active idle 10.5.0.6 Unit is ready and clustered

juju ssh designate/0

2019-07-25 18:04:27.452 24807 DEBUG designate.coordination [-] CCB: list(self._get_members(self._group_id))=[b'juju-3d28eb-coreycb2-8:86c49114-a37a-4e1a-8654-72bf1ccbde1f', 'juju-3d28eb-coreycb2-7:d64cde68-d9f0-4910-9f1a-6d9724ea77b9'] _update_partitions /usr/lib/python3/dist-packages/designate/coordination.py:202

where:
- machine 8 (b'juju-3d28eb-coreycb2-8:86c49114-a37a-4e1a-8654-72bf1ccbde1f') hasn't been upgraded yet and is still bionic-queens
- machine 7 ('juju-3d28eb-coreycb2-7:d64cde68-d9f0-4910-9f1a-6d9724ea77b9') is the machine that was upgraded to bionic-rocky

Note: designate coordinator backend gets the juju strings by calling get_members() from the tooz memcached backend. /usr/lib/python3/dist-packages/tooz/drivers/memcached.py

# now try to upgrade the other unit to bionic-rocky
juju run-action designate/1 openstack-upgrade

# restart memcache and designate-producer on designate/0 and resolve
juju resolved designate/0

# all better

Unit Workload Agent Machine Public address Ports Message
designate/0 active idle 7 10.5.0.198 9001/tcp Unit is ready
  hacluster-designate/0 active idle 10.5.0.198 Unit is ready and clustered
designate/1* active idle 8 10.5.0.6 9001/tcp Unit is ready
  hacluster-designate/1* active idle 10.5.0.6 Unit is ready and clustered

juju ssh designate/0

2019-07-25 19:31:11.365 12217 DEBUG designate.coordination [-] CCB: list(self._get_members(self._group_id))=['juju-3d28eb-coreycb2-7:0ceca5d8-9eee-492d-bead-f99ee990a21b', 'juju-3d28eb-coreycb2-8:134e6142-4364-43a5-96fc-b9f125ba87a0'] _update_partitions /usr/lib/python3/dist-packages/designate/coordination.py:202

Both strings are str type now and able to be sorted.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

From /usr/lib/python3/dist-packages/designate/coordination.py:

from oslo_config import cfg
CONF = cfg.CONF
class CoordinationMixin(object):
    def __init__(self, *args, **kwargs):
        super(CoordinationMixin, self).__init__(*args, **kwargs)

        self._coordinator = None

    def start(self):
        self._coordination_id = ":".join([CONF.host, generate_uuid()]) # <------ this line seems to match up with the logs above where CONF.host is bytes and str type

It looks like that might be defined in one of these places:

ubuntu@juju-3d28eb-coreycb2-7:/usr/lib/python3/dist-packages/designate$ grep -r StrOpt | grep \'host\'
__init__.py: cfg.StrOpt('host', default=socket.gethostname(),
pool_manager/__init__.py: cfg.StrOpt('host'),

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I have a fix coming for this. It is very similar to the example at: https://docs.openstack.org/tooz/latest/user/tutorial/group_membership.html , where host-1 is binary encoded.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.opendev.org/673360

Changed in designate:
assignee: nobody → Corey Bryant (corey.bryant)
status: New → In Progress
tags: added: py3
Frode Nordahl (fnordahl)
Changed in charm-designate:
milestone: 19.07 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.opendev.org/673360
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=556a27e4e9c9c4e21fb0ea46d8d8832d28c85314
Submitter: Zuul
Branch: master

commit 556a27e4e9c9c4e21fb0ea46d8d8832d28c85314
Author: Corey Bryant <email address hidden>
Date: Mon Jul 29 15:44:48 2019 -0400

    Ensure coordination IDs are encoded

    Ensure coordination IDs are encoded when working with coordination
    backend. This fixes an issue when upgrading to Python 3 (where bytes
    and str are different types) and _update_partitions() attempts to
    sort types of 'str' and 'bytes', causing designate-producer to crash.

    Change-Id: Id8206ee5285d3a73e00ef21b7d3961a29c23ab4b
    Closes-Bug: #1828534
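
The idea in the commit message can be sketched as follows. This is not the merged patch: `make_coordination_id` is a hypothetical helper, and `socket.gethostname()` and `uuid.uuid4()` stand in for designate's `CONF.host` and `generate_uuid()`. UTF-8 is an assumed encoding.

```python
import socket
import uuid


def make_coordination_id(host=None):
    """Build a "<host>:<uuid>" member ID and always return it as bytes."""
    if host is None:
        host = socket.gethostname()
    member_id = ":".join([host, str(uuid.uuid4())])
    # Encoding here means every member registers the same type (bytes),
    # so a later sorted() over the member list cannot mix str and bytes.
    return member_id.encode("utf-8")


ids = [make_coordination_id("host-2"), make_coordination_id("host-1")]
print(sorted(ids)[0].startswith(b"host-1:"))
```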

Changed in designate:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/designate 9.0.0.0rc1

This issue was fixed in the openstack/designate 9.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/697346

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/697347

Changed in designate (Ubuntu Eoan):
status: New → Fix Released
importance: Undecided → High
Changed in designate (Ubuntu Disco):
importance: Undecided → High
status: New → Triaged
Changed in designate (Ubuntu):
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
Changed in cloud-archive:
status: New → Fix Released
importance: Undecided → High
Revision history for this message
Corey Bryant (corey.bryant) wrote :

I don't think we'll be fixing this in stable/rocky or stable/stein as it would break existing py3 deployments in a stable release. The trade off is that it would fix py2 deployments that upgrade to py3. For now I'm going to mark those releases as "won't fix".

Changed in designate (Ubuntu Disco):
status: Triaged → Won't Fix
Revision history for this message
Dincer Celik (osmanlicilegi) wrote :

@Corey

Is http://paste.openstack.org/show/787404/ related to this issue? I'm using Stein on Ubuntu.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

@Dincer, that doesn't look the same

Revision history for this message
James Troup (elmo) wrote :

Hi Corey, if we can't fix this, we need to at least document it as a known issue in the charm OpenStack upgrade documentation, surely?

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Yes let's update the charm deployment guide, specifically this section should call this upgrade issue out: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html#known-openstack-upgrade-issues

Changed in charm-deployment-guide:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Also we could consider cherry-picking the fixes back to rocky and stein packages. They've just not landed upstream. It would fix py2->py3 upgrades, which this bug reported. But I believe the fix would break py3->py3 upgrades. The charm deploys py3 as of rocky. The package was py2 by default in rocky (with py3 available) and py3 by default in stein.

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

This issue can be resolved by restarting the memcached service that designate is using for coordination. After this, the designate-producer service seems to run normally.
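
The workaround above could be scripted roughly as follows. Assumptions: the systemd unit names are `memcached` and `designate-producer`, memcached is restarted first, and each command is run on the host where that service actually lives (memcached may be on a different unit than designate). The helpers `restart_commands` and `run` are hypothetical, and by default nothing is executed.

```python
import subprocess


def restart_commands():
    """Return the restart commands in the order they should be applied."""
    return [
        ["systemctl", "restart", "memcached"],
        ["systemctl", "restart", "designate-producer"],
    ]


def run(commands, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    handled = []
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the restart fails.
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled


run(restart_commands())
```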

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Adding project charm-memcached per @chris.macnaughton's comment #24.

That charm could grow a hook for cache-relation-changed to respond to requests for encoding updates/service recycle from other charms.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Chris' doc update has been merged: https://review.opendev.org/#/c/729300

Changed in charm-deployment-guide:
status: Triaged → Fix Committed
Changed in charm-designate:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Diko Parvanov (dparv)
Changed in charm-memcached:
importance: Undecided → High
status: New → Triaged
Alvaro Uria (aluria)
Changed in charm-memcached:
milestone: none → 20.08
status: Triaged → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-designate (master)

Reviewed: https://review.opendev.org/733121
Committed: https://git.openstack.org/cgit/openstack/charm-designate/commit/?id=5c929b3ba950257cd612c1ba94d57c145ec03526
Submitter: Zuul
Branch: master

commit 5c929b3ba950257cd612c1ba94d57c145ec03526
Author: Chris MacNaughton <email address hidden>
Date: Wed Jun 3 14:02:57 2020 +0200

    Request a restart of memcached after a service upgrade after queens

    Change-Id: I54b235de947e63e3d7b86ccfdba9f1b968b75650
    Closes-Bug: #1828534

Changed in charm-designate:
status: In Progress → Fix Committed
Revision history for this message
Andrea Ieri (aieri) wrote :

released as cs:~llama-charmers-next/memcached-4

Changed in charm-memcached:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-designate (stable/20.05)

Fix proposed to branch: stable/20.05
Review: https://review.opendev.org/740009

James Page (james-page)
Changed in charm-designate:
milestone: none → 20.08
Changed in charm-designate:
status: Fix Committed → Fix Released
Changed in charm-deployment-guide:
status: Fix Committed → Fix Released
Revision history for this message
Frédéric MOSSER (fmosser93) wrote :

Hi,

While upgrading our cloud from Stein to Train, we experienced the exact same problem: one of our 3 HA designate units reported "Services not running that should be: designate-producer". I tried to "pause" then "resume" the unit, but nothing changed, and a reboot did not help either. When I restarted memcached.service and designate-producer.service, the latter was still "active/dead".

After checking the upgrade process, I found that the blocked unit had no reference to the Train repository in its sources.list and apparently had not been able to perform the upgrade: no designate-* packages at version 9.0.1 (for Train) were found.

So I ran the upgrade procedure once again: I paused hacluster-designate on the unit and the designate unit itself, then launched 'juju run-action designate/3 --wait openstack-upgrade' again, and this time it did the job!

When I resumed the unit and its subordinate charm, juju reported it as "active/idle". On the unit, "designate-producer.service" is "Active: active (running)", no problem remains, and all packages are up to date.

I hope this helps somebody!

Best regards.

Frédéric MOSSER.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

I also experienced this on 2 of 3 units of designate running 21.01 charms upgrading from cloud:bionic-stein to cloud:bionic-train with action-managed-upgrade=false.

Interestingly, the unit that did not exhibit this race was not the leader.

I've got an SOSreport available if interested.
