haproxy unable to start; None referenced in haproxy.cfg

Bug #1849901 reported by Ryan Farrell
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
Canonical Juju           Triaged       Low         Unassigned
OpenStack Octavia Charm  Fix Released  Medium      James Page

Bug Description

A server was taken down for maintenance. Its old units were removed from the juju model, then re-added when the host was back in commission. After re-adding, octavia fails to start on one of the two original octavia units.

octavia/2 /etc/haproxy/haproxy.cfg:

backend octavia-api_admin_10.246.65.140
  balance leastconn
  server octavia-2 10.246.65.140:9866 check
  server octavia-0 10.246.65.169:9866 check
  server octavia-1 None:9866 check
  server octavia-3 10.246.65.110:9866 check

The config references an old unit, octavia/1. /var/log/haproxy.log is filled with "could not resolve address 'None'" errors:

#/var/log/haproxy.log
Oct 25 20:23:15 juju-27a90a-26-lxd-5 haproxy[1135775]: [ALERT] 297/202302 (1135775) : Failed to initialize server(s) addr.
Oct 25 20:27:28 juju-27a90a-26-lxd-5 haproxy[1136425]: [ALERT] 297/202716 (1136425) : parsing [/etc/haproxy/haproxy.cfg:52] : 'server octavia-1' : could not resolve address 'None'
Oct 25 20:27:28 juju-27a90a-26-lxd-5 haproxy[1136425]: [ALERT] 297/202716 (1136425) : Failed to initialize server(s) addr.
Oct 25 20:30:42 juju-27a90a-26-lxd-5 haproxy[1144795]: [ALERT] 297/203030 (1144795) : parsing [/etc/haproxy/haproxy.cfg:52] : 'server octavia-1' : could not resolve address 'None'.
Oct 25 20:30:42 juju-27a90a-26-lxd-5 haproxy[1144795]: [ALERT] 297/203030 (1144795) : Failed to initialize server(s) addr.
Oct 25 20:33:56 juju-27a90a-26-lxd-5 haproxy[1149966]: [ALERT] 297/203343 (1149966) : parsing [/etc/haproxy/haproxy.cfg:52] : 'server octavia-1' : could not resolve address 'None'.
Oct 25 20:33:56 juju-27a90a-26-lxd-5 haproxy[1149966]: [ALERT] 297/203343 (1149966) : Failed to initialize server(s) addr.

The other two units have different values for backend octavia-api_admin in the haproxy.cfg file.

octavia/0 - just right
backend octavia-api_admin_10.246.65.169
    balance leastconn
    server octavia-0 10.246.65.169:9866 check
    server octavia-2 10.246.65.140:9866 check
    server octavia-3 10.246.65.110:9866 check

octavia/3 - too few
backend octavia-api_admin_10.246.65.110
    balance leastconn
    server octavia-3 10.246.65.110:9866 check
    server octavia-0 10.246.65.169:9866 check

Revision history for this message
Ryan Farrell (whereisrysmind) wrote :

Sosreports for the octavia/2 unit have been uploaded and reference SF ticket #246626.

tags: added: scaleback
Changed in charm-octavia:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Dorina Timbur (dorina-t) wrote :

Hello,
Is it possible to have an update on this issue, please? Our customer requires a status update.
Thank you.

Felipe Reyes (freyes)
tags: added: sts
Revision history for this message
James Page (james-page) wrote :

I'm fairly sure this is related to bug 1858132

Revision history for this message
James Page (james-page) wrote :

Fixes related to the bug referenced above were included in the 20.05 charm release, so it's quite possible this issue has been resolved.

Revision history for this message
James Page (james-page) wrote :

Ryan - I looked at SF #246626 but it seemed unrelated to this bug report - please either provide the correct reference or attach the crashdump directly to this bug.

Changed in charm-octavia:
status: Triaged → Incomplete
Revision history for this message
James Page (james-page) wrote :

backend octavia-api_admin_10.246.65.140
  balance leastconn
  server octavia-2 10.246.65.140:9866 check
  server octavia-0 10.246.65.169:9866 check
  server octavia-1 None:9866 check
  server octavia-3 10.246.65.110:9866 check

looks odd - it would appear that octavia/1 is still in the related unit list, but is not presenting any data about addresses.
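
A plausible mechanism (a sketch with hypothetical data, not the charm's actual template code) is that the template simply interpolates whatever address value each peer presented, so a missing address is rendered as Python's str(None):

# Sketch: a peer that is still in the relation but presents no address
# ends up as the literal string "None" in the rendered backend stanza.
peers = {
    'octavia-2': '10.246.65.140',
    'octavia-0': '10.246.65.169',
    'octavia-1': None,  # still related, but published no address data
    'octavia-3': '10.246.65.110',
}

for name, addr in peers.items():
    # str.format() renders None as 'None', which haproxy cannot resolve
    print('  server {} {}:9866 check'.format(name, addr))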

Revision history for this message
James Page (james-page) wrote :

Why that occurs I don't know, but we can make the peer list build-out more defensive:

https://review.opendev.org/741882
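
In outline, the defensive build-out just drops peers that present no usable address; a minimal sketch (hypothetical helper name, not the code under review):

def backend_servers(peers, port=9866):
    """Build haproxy server lines, skipping peers without a usable address.

    Hypothetical helper: `peers` maps unit name -> address string or None.
    """
    lines = []
    for name, addr in sorted(peers.items()):
        # Guard against peers that joined the relation but have not yet
        # published an address (None, or its stringified form 'None').
        if not addr or addr == 'None':
            continue
        lines.append('  server {} {}:{} check'.format(name, addr, port))
    return lines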

Changed in charm-octavia:
assignee: nobody → James Page (james-page)
status: Incomplete → In Progress
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

There was a similar bug for K8s charms:

https://github.com/johnsca/juju-relation-mysql/issues/5

Looking at the Juju code, entries in mongodb for relation settings are created earlier than the scope doc, so the charms only know about relations for which relation settings are already present.

https://github.com/juju/juju/blob/juju-2.7.8/apiserver/facades/agent/uniter/uniter.go#L1461-L1483 (ingress-address and private-address get placed into relation settings)
https://github.com/juju/juju/blob/juju-2.7.8/state/relationunit.go#L112-L150 (transaction ordering: settings go first then goes the scope doc)

However, looking more closely, an inability to retrieve address information is not an error condition (merely a warning):

https://github.com/juju/juju/blob/juju-2.7.8/apiserver/facades/agent/uniter/uniter.go#L1476-L1479

There seem to be conditions in NetworksForRelation where an address may not be available:

https://github.com/juju/juju/blob/juju-2.7.8/apiserver/facades/agent/uniter/networkinfo.go#L281-L308

And these two Juju bugs are related to this:

https://bugs.launchpad.net/juju/+bug/1830252
https://bugs.launchpad.net/juju/+bug/1848628

I am going to add Juju to this bug for comment - if ingress-address is not guaranteed to be present in relation settings during the handling of <relname>-relation-joined, then it would be best to document that for both IAAS and CAAS models. I couldn't find any reference to this behavior in the current docs.
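
Until that's documented, charms have to treat the address as optional at -joined time; a minimal defensive sketch (assuming charmhelpers; the helper name and fallback order are illustrative):

from charmhelpers.core.hookenv import log, relation_get

def peer_address(rid, unit):
    """Return the peer's address, or None if it has not been published yet."""
    data = relation_get(rid=rid, unit=unit) or {}
    # ingress-address is not guaranteed at -relation-joined time, so fall
    # back to private-address and tolerate both being absent.
    addr = data.get('ingress-address') or data.get('private-address')
    if not addr:
        log('no address published yet for {} on {}; deferring'.format(unit, rid))
    return addr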

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-octavia (master)

Fix proposed to branch: master
Review: https://review.opendev.org/742413

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-octavia (master)

Change abandoned by James Page (<email address hidden>) on branch: master
Review: https://review.opendev.org/742413
Reason: Rebuild ahead of this one picked up the interface fixes.

Revision history for this message
James Page (james-page) wrote :

Another review landed ahead of mine, picking up the required interface update in openstack-ha.

Changed in charm-octavia:
status: In Progress → Fix Committed
milestone: none → 20.08
Revision history for this message
Joseph Phillips (manadart) wrote :

If I read it correctly, for IAAS models we should always have an ingress-address by the point where a unit is running hooks.

The machine agent sets (machine) addresses when it starts, so before units land on a machine it will have a value for its preferred public and private addresses.

Unit addresses (requested in NetworksForRelation mentioned above) are derived from the machine they are running on.
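
If that holds, a unit should always be able to resolve its own address from inside a hook; e.g. (assuming charmhelpers, sketch only):

from charmhelpers.core.hookenv import unit_get

# On IAAS models the machine agent has already recorded addresses by the
# time hooks run, so this should always return a value.
print(unit_get('private-address'))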

Revision history for this message
Pen Gale (pengale) wrote :

Marked as invalid for Juju, because I don't think that we have concrete evidence that the bug is in Juju, and the charm has worked around it. Feel free to post more evidence and reopen if necessary :-)

Changed in juju:
status: New → Invalid
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

As manadart observed a few days ago, for IAAS models the ingress-address should always be present. Given that we are seeing IAAS models where ingress-address is not set, it seems to be an acknowledged bug in Juju, rather than invalid.

Revision history for this message
Pen Gale (pengale) wrote :

Re-opening as I have been convinced that this might be a Juju issue after all.

The next steps here are to try to reproduce it so that we can prove that it is a Juju bug and fix it.

Changed in juju:
status: Invalid → New
Revision history for this message
John A Meinel (jameinel) wrote :

According to this:
octavia/2 /etc/haproxy/haproxy.cfg:

backend octavia-api_admin_10.246.65.140
  balance leastconn
  server octavia-2 10.246.65.140:9866 check
  server octavia-0 10.246.65.169:9866 check
  server octavia-1 None:9866 check
  server octavia-3 10.246.65.110:9866 check

vs
octavia/0 - just right
backend octavia-api_admin_10.246.65.169
    balance leastconn
    server octavia-0 10.246.65.169:9866 check
    server octavia-2 10.246.65.140:9866 check
    server octavia-3 10.246.65.110:9866 check

And the original statement:
A server was taken down for maintenance. Its old units were removed from the juju model, then re-added when the host was back in commission.

It sounds like octavia-1 was removed, and then when the machine was back up, we introduced octavia-3.

Which means the fact that octavia-2 is still thinking about octavia-1 is the problem, as octavia-1 definitely doesn't have an IP address anymore.

Now, I don't know why octavia-3 doesn't know about octavia-2:
octavia/3 - too few
backend octavia-api_admin_10.246.65.110
    balance leastconn
    server octavia-3 10.246.65.110:9866 check
    server octavia-0 10.246.65.169:9866 check

Potentially, the fact that octavia-2 is still tracking octavia-1 is breaking the charm's handling of relation-joined from octavia-3, so octavia-3 never sees octavia-2. If that were happening, I'd definitely expect to see charms in error state and errors in the Juju unit logs.

Changed in charm-octavia:
status: Fix Committed → Fix Released
Pen Gale (pengale)
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot