Deployment of Keystone cluster often fails due to haproxy service not listening on ports that it should

Bug #1648396 reported by Frode Nordahl
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Keystone Charm
Fix Released
Medium
Frode Nordahl
keystone (Juju Charms Collection)
Invalid
Medium
Frode Nordahl

Bug Description

$ juju status --format=tabular
keystone/0 active idle 1.25.8 188 5000/tcp 10.5.14.180 Unit is ready
keystone/1 blocked idle 1.25.8 189 5000/tcp 10.5.14.181 Services not running that should be: haproxy; Ports which should be open, but are not: 5000, 35357
keystone/2 blocked idle 1.25.8 190 5000/tcp 10.5.14.182 Services not running that should be: haproxy; Ports which should be open, but are not: 5000, 35357

Log excerpt from working node /var/log/juju/unit-keystone-0.log:
2016-12-08 10:33:49 INFO juju-log shared-db:332: Wrote template /etc/haproxy/haproxy.cfg.
2016-12-08 10:33:54 INFO shared-db-relation-changed haproxy is running.
2016-12-08 10:36:32 INFO juju-log Registered config file: /etc/haproxy/haproxy.cfg

Log excerpt from non-working node /var/log/juju/unit-keystone-1.log:
2016-12-08 10:34:02 INFO juju-log Wrote template /etc/haproxy/haproxy.cfg.
2016-12-08 10:34:02 INFO leader-settings-changed haproxy not running.

On both nodes there are various log messages with this error:
error: no relation id specified

This might be related to haproxy steps (install, leader-settings-changed, config-changed, start, update-status)

Tags: openstack sts
Revision history for this message
Frode Nordahl (fnordahl) wrote :

No log evidence as of why haproxy does not start.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

In simple one-shot trial run we have the following stats for observation:
trusty-icehouse Observed
trusty-kilo Observed
trusty-liberty Not observed
trusty-mitaka Not observed
xenial-mitaka Not observed
xenial-newton Not observed

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Second run of trials show same observations.

dpkg used for trusty-icehouse: haproxy 1.4.24-2ubuntu0.4 amd64
dpkg used for trusty-kilo: haproxy 1.4.24-2ubuntu0.4 amd64
dpkg used for trusty-liberty: haproxy 1.5.14-1ubuntu0.15.10.1~cloud0 amd64

I have a working theory that the root cause for this issue comes from the haproxy package in trusty. For Ubuntu OpenStack versions >= trusty-liberty we include haproxy in the Ubuntu Cloud Archive.

Frode Nordahl (fnordahl)
Changed in keystone (Juju Charms Collection):
assignee: nobody → Frode Nordahl (fnordahl)
Frode Nordahl (fnordahl)
tags: added: openstack sts
Revision history for this message
Frode Nordahl (fnordahl) wrote :
Revision history for this message
Frode Nordahl (fnordahl) wrote :
Revision history for this message
Frode Nordahl (fnordahl) wrote :
Revision history for this message
James Page (james-page) wrote :

Frode - looks like you might have a smoking gun with the older haproxy version.

Changed in keystone (Juju Charms Collection):
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

Hmm although one of those latest test failures is xenial mitaka - where it looks like the haproxy.cfg was updated, but haproxy was not actually restarted.

Revision history for this message
James Page (james-page) wrote :

Message is:

message: 'Ports which should be open, but are not: 5000, 35357'

haproxy is running, but not listening on either port.

Frode Nordahl (fnordahl)
summary: Deployment of Keystone cluster often fails due to haproxy service not
- being started
+ listening on ports that it should
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote : Re: [Bug 1648396] Re: Deployment of Keystone cluster often fails due to haproxy service not being started

I've seen this and then it clears on the next update-status hook. Maybe the
internal status update code is not being executed for the relevant hook?

On 5 Jan 2017 10:31 am, "James Page" <email address hidden> wrote:

> Message is:
>
> message: 'Ports which should be open, but are not: 5000, 35357'
>
> haproxy is running, but not listening on either port.
>
> --
> You received this bug notification because you are a member of OpenStack
> Charmers, which is subscribed to keystone in Juju Charms Collection.
> https://bugs.launchpad.net/bugs/1648396
>
> Title:
> Deployment of Keystone cluster often fails due to haproxy service not
> being started
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/charms/+source/keystone/+bug/
> 1648396/+subscriptions
>

Revision history for this message
Frode Nordahl (fnordahl) wrote :

I indeed have a sample of haproxy not being restarted even though the configuration is changed:

$ ps axu | grep haproxy
root 19997 0.0 0.0 4228 652 ? Ss 13:49 0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy 20001 0.0 0.3 37692 5000 ? S 13:49 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy 20003 0.0 0.0 37692 936 ? Ss 13:49 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

$ ls -l /etc/haproxy/haproxy.cfg
-rw-r--r-- 1 root root 1224 Jan 11 13:59 haproxy.cfg

As you say, after some time it solves itself, but not always.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Keystone full amulet tests are failing on Trusty-Mitaka, where one of the 3 keystone units has no haproxy service running and is not listening on 5000.

keystone/7 blocked idle 1.25.10 18 5000/tcp 172.17.106.63 Ports which should be open, but are not: 5000, 35357

Revision history for this message
Ryan Beisner (1chb1n) wrote :
James Page (james-page)
Changed in charm-keystone:
assignee: nobody → Frode Nordahl (fnordahl)
importance: Undecided → Medium
Changed in keystone (Juju Charms Collection):
status: New → Invalid
Revision history for this message
Frode Nordahl (fnordahl) wrote :

2017-02-27 15:05:00 INFO cluster-relation-changed Adding user juju_keystone to group juju_keystone
2017-02-27 15:05:00 INFO cluster-relation-changed # 10.5.26.147 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
2017-02-27 15:05:00 INFO cluster-relation-changed # 10.5.26.147 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
2017-02-27 15:05:02 INFO cluster-relation-changed * Stopping haproxy haproxy
2017-02-27 15:05:02 INFO cluster-relation-changed ...done.
2017-02-27 15:05:02 INFO cluster-relation-changed * Starting haproxy haproxy
2017-02-27 15:05:02 INFO cluster-relation-changed Available polling systems :
2017-02-27 15:05:02 INFO cluster-relation-changed sepoll : pref=400, test result OK
2017-02-27 15:05:02 INFO cluster-relation-changed epoll : pref=300, test result OK
2017-02-27 15:05:02 INFO cluster-relation-changed poll : pref=200, test result OK
2017-02-27 15:05:02 INFO cluster-relation-changed select : pref=150, test result FAILED
2017-02-27 15:05:02 INFO cluster-relation-changed Total: 4 (3 usable), will use sepoll.
2017-02-27 15:05:02 INFO cluster-relation-changed [ALERT] 057/150502 (14281) : Starting frontend tcp-in_admin-port: cannot bind socket
2017-02-27 15:05:02 INFO cluster-relation-changed [ALERT] 057/150502 (14281) : Starting frontend tcp-in_public-port: cannot bind socket
2017-02-27 15:05:02 INFO cluster-relation-changed Using sepoll() as the polling mechanism.
2017-02-27 15:05:02 INFO cluster-relation-changed ...fail!

Revision history for this message
Frode Nordahl (fnordahl) wrote :

There is a few issues with haproxy logging adding to the complexity of finding the root cause of this bug.

1) The rsyslog daemon is not restarted after configuration is injected in /etc/rsyslog.d by the the haproxy package. This leads to /var/lib/haproxy/dev/log socket not being created.

2) The charm-helpers haproxy.cfg template configures haproxy to log to 127.0.0.1 UDP/514, but rsyslog is by default not configured to receive on UDP/514

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/438617

Changed in charm-keystone:
status: New → In Progress
Revision history for this message
Frode Nordahl (fnordahl) wrote :

For the record, updates for findings in comment #17:
1) Proposed fix for rsyslog here:
  https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1668639
2) Proposed fix for charm-helpers here:
  https://code.launchpad.net/~fnordahl/charm-helpers/haproxy-logging/+merge/318476

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (master)

Reviewed: https://review.openstack.org/438617
Committed: https://git.openstack.org/cgit/openstack/charm-keystone/commit/?id=01816c846616ceb73717aff48e53da98cb661ce0
Submitter: Jenkins
Branch: master

commit 01816c846616ceb73717aff48e53da98cb661ce0
Author: Frode Nordahl <email address hidden>
Date: Mon Feb 27 17:33:04 2017 +0100

    Make sure haproxy runs post-install

    Unconfigured keystone service listening on ports destined for haproxy
    race with start of haproxy service.

    Change-Id: I9f601344e72bd67738429f82151f9683f5ecf8e4
    Closes-Bug: #1648396

Changed in charm-keystone:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-keystone:
milestone: none → 17.08
James Page (james-page)
Changed in charm-keystone:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.