Deployment failed retrieving domain id

Bug #1729231 reported by Liam Young
This bug affects 2 people
Affects: OpenStack Designate Charm
Status: In Progress
Importance: High
Assigned to: Liam Young
Milestone: none

Bug Description

HA xenial/mitaka deployment failed on a subprocess call:

2017-10-31 19:52:34 DEBUG identity-service-relation-changed subprocess.CalledProcessError: Command '['reactive/designate_utils.py', 'domain-get', '--domain-name', 'mojo.serverstack.com.']' returned non-zero exit status 1
2017-10-31 19:52:34 ERROR juju.worker.uniter.operation runhook.go:107 hook "identity-service-relation-changed" failed: exit status 1

The corresponding message in the designate-api logs:

Oct 31 19:52:33 juju-f9e7e2-designate-9 designate-api[25593]: 2017-10-31 19:52:33.802 25593 ERROR designate.api.middleware MessagingTimeout: Timed out waiting for a reply to message ID cb0b0db5fc114430b253d
Oct 31 19:52:33 juju-f9e7e2-designate-9 designate-api[25593]: 2017-10-31 19:52:33.802 25593 ERROR designate.api.middleware

This looks like a race: the preceding call ensures the API is up and running, so something could have gone down between the API check and the domain-get. The only other thing that stands out is this message from the designate-central service, which looks like it has hung:

journalctl -u designate-central.service
Oct 31 17:51:16 juju-f9e7e2-designate-9 designate-central[13381]: 2017-10-31 17:51:16.118 13381 WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete
Oct 31 19:43:22 juju-f9e7e2-designate-9 systemd[1]: Stopping OpenStack Designate DNSaaS central...

http://lists.openstack.org/pipermail/openstack-dev/2017-August/120386.html

However, it looks like retrieving the domain id does not rely on the central service.

The rabbit units appear to have been up and functioning fine when the error occurred.

Liam Young (gnuoy) wrote :

Another failure on an HA xenial/mitaka deployment. This time it looks like an issue talking to keystone:

2017-11-22 12:09:09 INFO juju-log identity-service:80: Retrying 'ensure_api_responding' 1 more times (delay=50)
2017-11-22 12:09:59 WARNING juju-log identity-service:80: Checking API service is responding
2017-11-22 12:10:07 DEBUG identity-service-relation-changed Traceback (most recent call last):
2017-11-22 12:10:07 DEBUG identity-service-relation-changed File "reactive/designate_utils.py", line 169, in <module>
2017-11-22 12:10:07 DEBUG identity-service-relation-changed commands[args.command]()
2017-11-22 12:10:07 DEBUG identity-service-relation-changed File "reactive/designate_utils.py", line 137, in display_servers
2017-11-22 12:10:07 DEBUG identity-service-relation-changed for server in get_servers():
2017-11-22 12:10:07 DEBUG identity-service-relation-changed File "reactive/designate_utils.py", line 121, in get_servers
2017-11-22 12:10:07 DEBUG identity-service-relation-changed out, err = run_command(cmd)
2017-11-22 12:10:07 DEBUG identity-service-relation-changed File "reactive/designate_utils.py", line 34, in run_command
2017-11-22 12:10:07 DEBUG identity-service-relation-changed cmd, p.returncode, out, err))
2017-11-22 12:10:07 DEBUG identity-service-relation-changed RuntimeError: ['designate', 'server-list', '-f', 'value'] failed, status code 1 stdout b'' stderr b'/usr/lib/python2.7/dist-packages/designateclient/cli/base.py:38: DeprecationWarning: The "designate" CLI is being deprecated in favour of the "openstack" CLI plugin. All designate API v2 commands are implemented there. When the v1 API is removed this CLI will stop functioning\n DeprecationWarning)\nERROR: Unable to establish connection to http://10.5.100.2:35357/v2.0/tokens: HTTPConnectionPool(host=\'10.5.100.2\', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError(\'<requests.packages.urllib3.connection.HTTPConnection object at 0x7fbf7eae8d90>: Failed to establish a new connection: [Errno 113] No route to host\',))\n'
<snip>Command '['reactive/designate_utils.py', 'server-list']' returned non-zero exit status 1
2017-11-22 12:10:07 ERROR juju.worker.uniter.operation runhook.go:107 hook "identity-service-relation-changed" failed: exit status 1
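
The traceback above ends in "[Errno 113] No route to host" when the client tries to reach keystone on port 35357. One way to tell a network-level failure like this apart from an API-level one is a plain TCP probe before invoking the CLI. The following is a minimal sketch of that idea, not code from the charm; the helper name `endpoint_reachable` and its `timeout` parameter are hypothetical.

```python
import socket


def endpoint_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. It checks network
    reachability, not whether the service behind the port is healthy.
    """
    try:
        # create_connection raises OSError (including "No route to host",
        # "Connection refused" and timeouts) when the connection fails.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the failure above, a call such as `endpoint_reachable("10.5.100.2", 35357)` would presumably have returned False at the moment the hook ran, which points at networking or keystone availability rather than at designate itself.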

Liam Young (gnuoy)
Changed in charm-designate:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Liam Young (gnuoy)
milestone: none → 17.11
Changed in charm-designate:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-designate (master)

Reviewed: https://review.openstack.org/523379
Committed: https://git.openstack.org/cgit/openstack/charm-designate/commit/?id=9b7a81664a28383db839f993027498c20f307808
Submitter: Zuul
Branch: master

commit 9b7a81664a28383db839f993027498c20f307808
Author: Liam Young <email address hidden>
Date: Tue Nov 28 10:17:35 2017 +0000

    Extend retry times and decorated funcs for retry

    The designate charm uses the designate client to create server and
    domain objects during unit setup. This relies on the local designate
    service and keystone being up and responding. This has proved racey
    and this change extends the functions which have the retry decorator
    and extends the retry count.

    Partial-Bug: #1729231

    Change-Id: I35162817afe3ff770f5a8781a29f3be1642687ba

James Page (james-page)
Changed in charm-designate:
milestone: 17.11 → 18.02
Ryan Beisner (1chb1n)
Changed in charm-designate:
milestone: 18.02 → 18.05
David Ames (thedac)
Changed in charm-designate:
milestone: 18.05 → 18.08
James Page (james-page)
Changed in charm-designate:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-designate:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-designate:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-designate:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-designate:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-designate:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-designate:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-designate:
milestone: 20.08 → none