haproxy in bionic can get stuck
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| haproxy (Ubuntu) |
High
|
Unassigned | ||
| Bionic |
High
|
Unassigned |
Bug Description
[Impact]
* The master process will exit with the status of the last worker.
When the worker is killed with SIGTERM, it is expected to get 143 as an
exit status. Therefore, we consider this exit status as normal from a
systemd point of view. If it happens when not stopping, the systemd
unit is configured to always restart, so it has no adverse effect.
* Backport upstream fix - adding another accepted RC to the systemd
service
[Test Case]
* You want to install haproxy and have it running. Then sigterm it a lot.
With the fix it would restart the service all the time, well except
restart limit. But in the bad case it will just stay down and didn't
even try to restart it.
$ apt install haproxy
$ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
$ systemctl status haproxy
The above is a hacky way to trigger some A/B behavior on the fix.
It isn't perfect as systemd restart counters will kick in and you
essentially check a secondary symptom.
I'd recommend to in addition run the following:
$ apt install haproxy
$ for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001 systemctl
reset-failed haproxy.service; done
$ systemctl status haproxy
You can do so with even smaller sleeps, that should keep the service up
and running (this isn't changing with the fix, but should work with the new code).
[Regression Potential]
* This eventually is a conffile modification, so if there are other
modifications done by the user they will get a prompt. But that isn't a
regression. I checked the code and I can't think of another RC=143 that
would due to that "no more" detected as error. I really think other
than the update itself triggering a restart (as usual for services)
there is no further regression potential to this.
[Other Info]
* Fix already active in IS hosted cloud without issues since a while
* Also reports (comment #5) show that others use this in production as
well
---
On a Bionic/Stein cloud, after a network partition, we saw several units (glance, swift-proxy and cinder) fail to start haproxy, like so:
root@juju-
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/
Active: failed (Result: exit-code) since Sun 2019-10-20 00:23:18 UTC; 1h 35min ago
Docs: man:haproxy(1)
Process: 2002655 ExecStart=
Process: 2002649 ExecStartPre=
Main PID: 2002655 (code=exited, status=143)
Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Starting HAProxy Load Balancer...
Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Started HAProxy Load Balancer.
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopping HAProxy Load Balancer...
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : Exiting Master process...
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [ALERT] 292/001652 (2002655) : Current worker 2002661 exited with code 143
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : All workers exited. Exiting... (143)
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Main process exited, code=exited, status=143/n/a
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Failed with result 'exit-code'.
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopped HAProxy Load Balancer.
root@juju-
The Debian maintainer came up with the following patch for this:
https://<email address hidden>
Which was added to the 1.8.10-1 Debian upload and merged into upstream 1.8.13.
Unfortunately Bionic is on 1.8.8-1ubuntu0.4 and doesn't have this patch.
Please consider pulling this patch into an SRU for Bionic.
Related branches
- Bryce Harrington: Needs Fixing on 2019-11-16
- Canonical Server Team: Pending requested 2019-11-12
- Canonical Server packageset reviewers: Pending requested 2019-11-12
-
Diff: 91 lines (+69/-0)3 files modifieddebian/changelog (+7/-0)
debian/patches/lp-1848902-MINOR-systemd-consider-exit-status-143-as-successful.patch (+61/-0)
debian/patches/series (+1/-0)
Paul Collins (pjdc) wrote : | #2 |
It looks like this is the patch that was merged: https://<email address hidden>
2018/07/30 : 1.8.13
- MINOR: systemd: consider exit status 143 as successful
I've built a package with this patch and deployed it to the problematic cloud.
Here's the PPA: https:/
Paul Collins (pjdc) wrote : | #3 |
This seems to fix our haproxy problem, although the HA stack still needs help to recover from the situation. But at least haproxy isn't getting in the way anymore.
Paul Collins (pjdc) wrote : | #4 |
To reproduce:
apt install haproxy
for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
systemctl status haproxy
Observe that the unit has failed due to exit-code: "Active: failed (Result: exit-code)"
To test the fix:
Repeat the steps above with the fixed package installed.
Observe that although the unit may still have failed, it is not due to exit-code, e.g.: "Active: failed (Result: start-limit-hit)".
If you perform upgrade testing, note that simply installing the fixed package seems to revive haproxy without needing systemctl reset-failed haproxy. Confirm this by checking systemctl status haproxy for "Active: active (running)" after the upgrade.
tags: | added: server-next |
Simon Déziel (sdeziel) wrote : | #5 |
In our HAProxy deployments, we always add the following drop-in on Bionic:
[Service]
# XXX: returns 143 when SIGTERM'ed by systemd
SuccessExit
It would be great to have the default unit fixed, so thanks!
Changed in haproxy (Ubuntu): | |
importance: | Undecided → High |
status: | Confirmed → Triaged |
Christian Ehrhardt (paelzer) wrote : | #6 |
This was also backported to the 2.x and 1.8 series.
According to git this is included in:
1.8.13
fdc6c62dbebf4b6
Anything >=1.9
3b479bd5f5f50ce
Thereby all >Bionic are fixed already.
Consider pulling the fix into the Bionic version.
Changed in haproxy (Ubuntu Bionic): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in haproxy (Ubuntu): | |
status: | Triaged → Fix Released |
Christian Ehrhardt (paelzer) wrote : | #7 |
Note it was not ported to the 1.6.x series and I'd also want to keep Xenial out of this in general.
Christian Ehrhardt (paelzer) wrote : | #9 |
Confirmed the test case, waiting for MP review now
Hello James, or anyone else affected,
Accepted haproxy into bionic-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in haproxy (Ubuntu Bionic): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed verification-needed-bionic |
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (haproxy/1.8.8-1ubuntu0.8) | #12 |
All autopkgtests for the newly accepted haproxy (1.8.8-1ubuntu0.8) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:
haproxy/
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUp
https:/
[1] https:/
Thank you!
Simon Déziel (sdeziel) wrote : | #13 |
Verified to be working on Bionic using the provided test case and another simpler one (simply stopping haproxy resulted in the error/143 status).
Preparing to unpack .../haproxy_
Unpacking haproxy (1.8.8-1ubuntu0.8) over (1.8.8-1ubuntu0.7) ...
Setting up haproxy (1.8.8-1ubuntu0.8) ...
Processing triggers for systemd (237-3ubuntu10.33) ...
Processing triggers for rsyslog (8.32.0-1ubuntu4) ...
root@hap1:~# for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001; systemctl reset-failed haproxy.service; done
root@hap1:~# systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/
Active: active (running) since Thu 2019-11-28 15:20:01 UTC; 1s ago
Docs: man:haproxy(1)
Process: 8076 ExecStartPre=
Main PID: 8077 (haproxy)
Tasks: 2 (limit: 4915)
CGroup: /system.
├─8077 /usr/sbin/haproxy -Ws -f /etc/haproxy/
└─8078 /usr/sbin/haproxy -Ws -f /etc/haproxy/
Nov 28 15:20:01 hap1 systemd[1]: Starting HAProxy Load Balancer...
Nov 28 15:20:01 hap1 systemd[1]: Started HAProxy Load Balancer.
tags: |
added: verification-done verification-done-bionic removed: verification-needed verification-needed-bionic |
The verification of the Stable Release Update for haproxy has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #15 |
This bug was fixed in the package haproxy - 1.8.8-1ubuntu0.8
---------------
haproxy (1.8.8-1ubuntu0.8) bionic; urgency=medium
* d/p/lp-
fix potential hang in haproxy (LP: #1848902)
-- Christian Ehrhardt <email address hidden> Tue, 12 Nov 2019 13:16:22 +0100
Changed in haproxy (Ubuntu Bionic): | |
status: | Fix Committed → Fix Released |
Status changed to 'Confirmed' because the bug affects multiple users.