[19.04][Queens -> Rocky] Upgrading to Rocky resulted in "Services not running that should be: designate-producer"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Designate | | Undecided | Corey Bryant |
OpenStack Charms Deployment Guide | | High | Unassigned |
OpenStack Designate Charm | | Undecided | Chris MacNaughton |
Ubuntu Cloud Archive | | High | Unassigned |
Rocky | | High | Unassigned |
Stein | | High | Unassigned |
Train | | High | Unassigned |
memcached-charm | | High | Unassigned |
designate (Ubuntu) | | High | Unassigned |
Disco | | High | Unassigned |
Eoan | | High | Unassigned |
Bug Description
Designate has to use Worker and Producer as of Rocky as the support for zone manager and pool manager was up for removal in Rocky. This was addressed in https:/
During a Queens to Rocky upgrade I ran into an issue with designate-producer being down (see the details below).
I was able to successfully start designate-producer by hand. It looks like there were several attempts to restart the service by systemd itself ("Scheduled restart job, restart counter is at 5.") which eventually failed when the failcount became higher than 5. Debugging mode was disabled so I did not see anything in the producer log.
designate/0* blocked idle 2/lxd/2 10.232.46.153 9001/tcp Services not running that should be: designate-producer
hacluster-
systemctl list-unit-files | grep designate-
designate-
designate-
designate-
designate-
designate-
designate-
designate-
ubuntu@
https:/
ubuntu@
172877 bash /lib/systemd/
172878 bash /lib/systemd/
172884 /var/lib/
172886 /var/lib/
392388 /usr/bin/python3.6 /usr/bin/
392399 /usr/bin/python3.6 /usr/bin/
392405 /usr/bin/python3.6 /usr/bin/
392411 /usr/bin/python3.6 /usr/bin/
392415 /usr/bin/python3.6 /usr/bin/
392423 /usr/bin/python3.6 /usr/bin/
392481 /usr/bin/python3.6 /usr/bin/
392482 /usr/bin/python3.6 /usr/bin/
392483 /usr/bin/python3.6 /usr/bin/
392484 /usr/bin/python3.6 /usr/bin/
----------
designate/0 unit log:
2019-05-09 00:20:38 DEBUG openstack-upgrade Setting up designate-producer (1:7.0.
2019-05-09 00:20:38 DEBUG openstack-upgrade Created symlink /etc/systemd/
# journalctl -u designate-producer
journalctl -u designate-producer | grep start
May 09 00:20:40 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:40 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:41 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:41 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:43 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:43 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:44 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:44 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: designate-
May 09 00:20:46 juju-eeda89-2-lxd-2 systemd[1]: Failed to start OpenStack Designate DNSaaS producer.
designate/0 unit log (continued):
2019-05-09 00:20:58 INFO juju-log Purging ['designate-
2019-05-09 00:20:59 DEBUG openstack-upgrade Removing designate-
2019-05-09 00:21:15 DEBUG openstack-upgrade Removing designate-
2019-05-09 00:21:31 DEBUG openstack-upgrade Removing python-designate (1:7.0.
2019-05-09 00:21:31 DEBUG openstack-upgrade update-
-------------------
grep producer /var/log/
2019-05-09 00:18:43 DEBUG openstack-upgrade update-
2019-05-09 00:19:58 INFO juju-log Installing ['designate-agent', 'designate-api', 'designate-
2019-05-09 00:19:58 DEBUG openstack-upgrade designate-producer designate-worker python3-amqp python3-anyjson
2019-05-09 00:20:09 DEBUG openstack-upgrade Get:112 http://
2019-05-09 00:20:20 DEBUG openstack-upgrade Selecting previously unselected package designate-producer.
2019-05-09 00:20:20 DEBUG openstack-upgrade Preparing to unpack .../105-
2019-05-09 00:20:20 DEBUG openstack-upgrade Unpacking designate-producer (1:7.0.
2019-05-09 00:20:38 DEBUG openstack-upgrade Setting up designate-producer (1:7.0.
2019-05-09 00:20:38 DEBUG openstack-upgrade Created symlink /etc/systemd/
2019-05-09 00:21:31 DEBUG openstack-upgrade update-
Related branches
- Alvaro Uria: Approve on 2020-06-10
Diff: 101 lines (+46/-1), 4 files modified
hooks/cache-relation-changed (+1/-0)
hooks/memcached_hooks.py (+10/-1)
templates/memcached.conf (+4/-0)
unit_tests/test_memcached_hooks.py (+31/-0)
description: | updated |
David Ames (thedac) wrote : | #1 |
Changed in charm-designate: | |
status: | New → Triaged |
importance: | Undecided → Critical |
milestone: | none → 19.07 |
Changed in charm-designate: | |
assignee: | nobody → Liam Young (gnuoy) |
Liam Young (gnuoy) wrote : | #2 |
I think this is a packaging bug
Changed in charm-designate: | |
status: | Triaged → Invalid |
assignee: | Liam Young (gnuoy) → nobody |
Corey Bryant (corey.bryant) wrote : | #3 |
Any tips on how to reproduce this? I just upgraded designate from queens to rocky and didn't hit it.
Dmitrii Shcherbakov (dmitriis) wrote : | #4 |
I'm going to do another upgrade attempt.
https:/
https:/
~/bundles/
Let's see if I can reproduce it again.
Dmitrii Shcherbakov (dmitriis) wrote : | #5 |
Managed to reproduce it (only on one unit though), this time with debug=true - looks like it is a py2 - py3 issue:
designate/0 active idle 2/lxd/2 10.232.46.209 9001/tcp Unit is ready
hacluster-
designate/1* blocked idle 3/lxd/1 10.232.46.208 9001/tcp Services not running that should be: designate-producer
hacluster-
/var/log/
https:/
2019-07-03 23:43:48.226 37919 DEBUG designate.service [-] Starting RPC server on topic 'producer' start /usr/lib/
2019-07-03 23:43:48.252 37919 DEBUG designate.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
2019-07-03 23:43:48.257 37919 ERROR oslo_service.
systemctl status designate-producer
● designate-
Loaded: loaded (/lib/systemd/
Active: inactive (dead) since Wed 2019-07-03 23:43:49 UTC; 16h ago
Main PID: 37919 (code=exited, status=0/SUCCESS)
Dmitrii Shcherbakov (dmitriis) wrote : | #6 |
Attached logs from a healthy unit as well.
Corey Bryant (corey.bryant) wrote : | #7 |
The traceback looks like an upstream designate issue with py3 support:
[-] Error starting thread.: TypeError: '<' not supported between instances of 'str' and 'bytes'
Traceback (most recent call last):
File "/usr/lib/
service.start()
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
members = sorted(
TypeError: '<' not supported between instances of 'str' and 'bytes'
Corey Bryant (corey.bryant) wrote : | #8 |
I was able to reproduce this. It's specific to coordination backend code that is used for HA. For charms, the following bundle config was used (snipped for brevity to just show designate bits):
hacluster-
charm: cs:hacluster
options:
designate:
charm: cs:~openstack-
constraints: mem=1G
num_units: 2
options:
debug: true
vip: 10.5.20.1
designate-bind:
charm: cs:~openstack-
relations:
- [ designate, keystone ]
- [ designate, mysql ]
- [ designate, rabbitmq-server ]
- [ designate, designate-bind ]
- [ designate, memcached ]
- [ designate, hacluster-designate ]
- - designate:dnsaas
- neutron-
Changed in designate (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in charm-designate: | |
importance: | Critical → Undecided |
Corey Bryant (corey.bryant) wrote : | #9 |
This is reproducible in a python3 shell. The problem is that sorted() is called on a list that includes both bytes and str types:
$ python3
Python 3.7.4 (default, Jul 11 2019, 10:43:21)
[GCC 9.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> blist=[
>>> sorted(blist)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'
>>>
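The same failure can be triggered outside designate by any list that mixes the two types. The following is a minimal sketch; the member names are hypothetical placeholders, not the values seen on the affected units.

```python
# Hypothetical member IDs: one held as bytes and one as str, mirroring the
# mixed list described above. The real IDs are elided in this report.
members = [b"member-a", "member-b"]

try:
    sorted(members)
    mixed_sort_failed = False
except TypeError as exc:
    # Python 3 refuses to order str against bytes.
    mixed_sort_failed = True
    print(f"sorted() failed: {exc}")

# Lists holding a single type sort without error.
only_str = sorted(["member-b", "member-a"])
only_bytes = sorted([b"member-b", b"member-a"])
print(only_str, only_bytes)
```

Python 2 did not raise here because its `str` type was already a byte string, which is why the problem only shows up once a Python 3 process reads the list.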
Corey Bryant (corey.bryant) wrote : | #10 |
Reproducing with designate, the problem appears after we have one designate unit that is at bionic-rocky (py3) and one that is at bionic-queens (py2). It is resolved once all units are at bionic-rocky. More details on that:
# upgrade one unit to bionic-rocky
juju config designate openstack-
juju run-action designate/0 openstack-upgrade
juju status
Unit Workload Agent Machine Public address Ports Message
designate/0 error idle 7 10.5.0.198 9001/tcp Services not running that should be: designate-producer
hacluster-
designate/1* active idle 8 10.5.0.6 9001/tcp Unit is ready
hacluster-
juju ssh designate/0
2019-07-25 18:04:27.452 24807 DEBUG designate.
where:
- machine 8 (b'juju-
- machine 7 ('juju-
Note: designate coordinator backend gets the juju strings by calling get_members() from the tooz memcached backend. /usr/lib/
# now try to upgrade the other unit to bionic-rocky
juju run-action designate/1 openstack-upgrade
# restart memcached and designate-producer on designate/0 and resolve
juju resolved designate/0
# all better
Unit Workload Agent Machine Public address Ports Message
designate/0 active idle 7 10.5.0.198 9001/tcp Unit is ready
hacluster-
designate/1* active idle 8 10.5.0.6 9001/tcp Unit is ready
hacluster-
juju ssh designate/0
2019-07-25 19:31:11.365 12217 DEBUG designate.
Both strings are str type now and able to be sorted.
Corey Bryant (corey.bryant) wrote : | #11 |
From /usr/lib/
from oslo_config import cfg
CONF = cfg.CONF
class CoordinationMix
def __init__(self, *args, **kwargs):
def start(self):
It looks like that might be defined in one of these places:
ubuntu@
__init__.py: cfg.StrOpt('host', default=
pool_manager/
Corey Bryant (corey.bryant) wrote : | #12 |
I have a fix coming for this. It is very similar to the example at: https:/
Fix proposed to branch: master
Review: https:/
Changed in designate: | |
assignee: | nobody → Corey Bryant (corey.bryant) |
status: | New → In Progress |
tags: | added: py3 |
Changed in charm-designate: | |
milestone: | 19.07 → none |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 556a27e4e9c9c4e
Author: Corey Bryant <email address hidden>
Date: Mon Jul 29 15:44:48 2019 -0400
Ensure coordination IDs are encoded
Ensure coordination IDs are encoded when working with coordination
backend. This fixes an issue when upgrading to Python 3 (where bytes
and str are different types) and _update_
sort types of 'str' and 'bytes', causing designate-producer to crash.
Change-Id: Id8206ee5285d3a
Closes-Bug: #1828534
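
The idea described in the commit message, bringing every coordination ID to a single type before it is compared, can be sketched as follows. This is an illustration under stated assumptions and not the merged patch: `to_bytes` and `sorted_members` are hypothetical helper names, UTF-8 is an assumed encoding, and the IDs are placeholders.

```python
def to_bytes(member_id, encoding="utf-8"):
    """Return member_id as bytes, encoding it first if it is a str."""
    if isinstance(member_id, str):
        return member_id.encode(encoding)
    return member_id


def sorted_members(members):
    """Sort member IDs after normalizing all of them to bytes."""
    return sorted(to_bytes(m) for m in members)


# Hypothetical mixed list: one str ID and one bytes ID.
mixed = ["member-b", b"member-a"]
print(sorted_members(mixed))
```

Normalizing at the point of comparison keeps the sort working regardless of which type each unit originally stored.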
Changed in designate: | |
status: | In Progress → Fix Released |
This issue was fixed in the openstack/designate 9.0.0.0rc1 release candidate.
Fix proposed to branch: stable/stein
Review: https:/
Fix proposed to branch: stable/rocky
Review: https:/
Changed in designate (Ubuntu Eoan): | |
status: | New → Fix Released |
importance: | Undecided → High |
Changed in designate (Ubuntu Disco): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in designate (Ubuntu): | |
status: | Triaged → Fix Committed |
status: | Fix Committed → Fix Released |
Changed in cloud-archive: | |
status: | New → Fix Released |
importance: | Undecided → High |
Corey Bryant (corey.bryant) wrote : | #18 |
I don't think we'll be fixing this in stable/rocky or stable/stein, as it would break existing py3 deployments in a stable release. The trade-off is that it would fix py2 deployments that upgrade to py3. For now I'm going to mark those releases as "won't fix".
Changed in designate (Ubuntu Disco): | |
status: | Triaged → Won't Fix |
Dincer Celik (osmanlicilegi) wrote : | #19 |
@Corey
Is http://
Corey Bryant (corey.bryant) wrote : | #20 |
@Dincer, that doesn't look the same
James Troup (elmo) wrote : | #21 |
Hi Corey, if we can't fix this, we need to at least document it as a known issue in the charm OpenStack upgrade documentation, surely?
Corey Bryant (corey.bryant) wrote : | #22 |
Yes, let's update the charm deployment guide. Specifically, this section should call out this upgrade issue: https:/
Changed in charm-deployment-guide: | |
importance: | Undecided → High |
status: | New → Triaged |
Corey Bryant (corey.bryant) wrote : | #23 |
Also we could consider cherry-picking the fixes back to rocky and stein packages. They've just not landed upstream. It would fix py2->py3 upgrades, which this bug reported. But I believe the fix would break py3->py3 upgrades. The charm deploys py3 as of rocky. The package was py2 by default in rocky (with py3 available) and py3 by default in stein.
Chris MacNaughton (chris.macnaughton) wrote : | #24 |
This issue can be resolved by restarting the memcached service that designate is using for coordination. After this, the designate-producer service seems to run normally.
Drew Freiberger (afreiberger) wrote : | #25 |
adding project charm-memcached per @chris.macnaughton comment #24.
That charm could grow a hook for cache-relation-
Corey Bryant (corey.bryant) wrote : | #26 |
Chris' doc update has been merged: https:/
Changed in charm-deployment-guide: | |
status: | Triaged → Fix Committed |
Changed in charm-designate: | |
assignee: | nobody → Chris MacNaughton (chris.macnaughton) |
Changed in charm-memcached: | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in charm-memcached: | |
milestone: | none → 20.08 |
status: | Triaged → Fix Committed |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 5c929b3ba950257
Author: Chris MacNaughton <email address hidden>
Date: Wed Jun 3 14:02:57 2020 +0200
Request a restart of memcached after a service upgrade after queens
Change-Id: I54b235de947e63
Closes-Bug: #1828534
Changed in charm-designate: | |
status: | In Progress → Fix Committed |
Andrea Ieri (aieri) wrote : | #28 |
released as cs:~llama-
Changed in charm-memcached: | |
status: | Fix Committed → Fix Released |
Fix proposed to branch: stable/20.05
Review: https:/
Changed in charm-designate: | |
milestone: | none → 20.08 |
Changed in charm-designate: | |
status: | Fix Committed → Fix Released |
Changed in charm-deployment-guide: | |
status: | Fix Committed → Fix Released |
TRIAGE:
Seems we have a race such that designate-producer is not ready during the upgrade.
Guarantee designate-producer has everything it needs or is restarted at the end of the upgrade process.