[SRU] Apply Bash 4.4.20 to fix cpu spinning on built-in wait

Bug #1822776 reported by halfgaar on 2019-04-02
This bug affects 1 person
Affects          Importance   Assigned to
bash (Ubuntu)    High         Bryce Harrington
Bionic           High         Bryce Harrington
Cosmic           High         Bryce Harrington

Bug Description

[Impact]

Long-running bash loops that create and reap processes can eventually hang, spinning at 100% CPU.

[Test Case]

A PPA with the proposed fix included is at:

  https://launchpad.net/~bryce/+archive/ubuntu/bash-sru-19-010-1

Install the PPA with the fix via:

  sudo add-apt-repository ppa:bryce/bash-sru-19-010-1
  sudo apt-get update
  sudo apt-get install bash

Run this loop for a few days/weeks:

  #!/bin/bash
  while true; do
    sleep 0.5 &
    wait
  done

Reproducer script: https://bugs.launchpad.net/ubuntu/+source/bash/+bug/1822776/+attachment/5275112/+files/bash-crash-test.sh

It will eventually cause the 'wait' statement to hang, consuming 100% CPU, after some indeterminate amount of time that depends on how fast PIDs are cycled on the machine.

The Bash bug report mentions longer running loops, but it seems hash collisions are the cause, meaning it's just a matter of chance, influenced by how fast PIDs are cycled on the machine.
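
For unattended runs, a bounded variant of the loop that records its progress may be easier to monitor. The following is only a sketch: the function name spawn_wait_loop and the heartbeat file are assumptions made for illustration and are not part of the attached reproducer script.

```shell
#!/bin/bash
# Hypothetical helper (not from the attached reproducer): run the
# spawn-and-wait loop a fixed number of times and write the number of the
# last completed iteration to a heartbeat file after each successful 'wait'.
spawn_wait_loop() {
  local iterations=$1 delay=$2 heartbeat=$3
  local i=1
  while [ "$i" -le "$iterations" ]; do
    sleep "$delay" &
    wait                                # the built-in that spins when the bug hits
    printf '%s\n' "$i" > "$heartbeat"
    i=$((i + 1))
  done
}

# Example: spawn_wait_loop 1000000 0.5 /tmp/bash-wait-heartbeat
```

If the heartbeat file stops changing while the script is still running and using CPU, the 'wait' has most likely hung.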

[Regression Potential]

The fix has been reviewed and accepted upstream. At the time a pid is determined, the patch adds a test for whether that pid is already in use; if so, it skips it and picks a different one. This changes behavior slightly, in that different pid numbers will be generated in rare cases, but nothing should depend on how pids are generated, as the behavior is not specified to be anything but random.

The patch adds a new warning message, "bgp_delete: LOOP: psi (%d) == storage[psi].bucket_next", but this only shows when the original bug would have been triggered.

Using 'apt-get source bash' to get the original source version, I created a deb that includes the 4.4.20 patch and have been running it since April 2nd. The 100% CPU spinning is solved, and no other regressions have been observed.

Ubuntu 18.04 is already at 4.4.19, one patch level behind, so this is a linear step to the next version with no patches skipped.

[Fix]

Official patch to fix, and to bump to 4.4.20:

http://ftp.gnu.org/gnu/bash/bash-4.4-patches/bash44-020

The newest Ubuntu tar.xz with patches I could find at:

http://archive.ubuntu.com/ubuntu/pool/main/b/bash/

also didn't have the 4.4.20 patch, so it seems no Ubuntu release has the fix yet.

Although I'm not completely sure, this problem seems to have been introduced in the 4.4 version of Bash, so in terms of LTS versions, 18.04 and up are affected.

[Original Report]
Bash before 4.4.20 has a bug in its PID hash table that causes spin-loops when spawning subprocesses and waiting for them. There is a fix:

https://ftp.gnu.org/gnu/bash/bash-4.4-patches/bash44-020

Our application started being affected (locking up) by this after we migrated from Ubuntu 14.04 to 18.04. Ubuntu 14.04 has bash 4.3.11(1); Ubuntu 18.04 has bash 4.4.19 according to 'bash --version', although apt shows it as 4.4.18-2ubuntu1 because bash ships its patch levels as separate patches.
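
A quick way to check whether a given shell falls in the affected range is to compare its version string. This is a minimal sketch: the function name bash_wait_bug_affected is made up here, and treating every 4.4 release below patch level 20 as affected is an assumption based on this report, not something confirmed upstream.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the version string names a bash 4.4
# release older than patch level 20, exit 1 if it does not, and exit 2
# if the string cannot be parsed.
bash_wait_bug_affected() {
  local version major minor patch rest
  version=${1%%[!0-9.]*}               # "4.4.19(1)-release" -> "4.4.19"
  IFS=. read -r major minor patch rest <<EOF
$version
EOF
  case "$major" in
    '' | *[!0-9]*) return 2 ;;
  esac
  minor=${minor:-0}
  patch=${patch:-0}
  [ "$major" -eq 4 ] && [ "$minor" -eq 4 ] && [ "$patch" -lt 20 ]
}

# Example: bash_wait_bug_affected "$BASH_VERSION" && echo "affected"
```

Note that this looks at the version reported by bash itself, not at the package version shown by apt.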

The 4.4-020 version needs to be included. I think it's actually quite critical.

A justification for including the fix would be that a standard language feature in a scripting language is broken, and that it's indeterminate when it breaks. Considering the widespread use of bash, I'm surprised more people haven't reported issues. A client and I started having issues, independently of each other, very soon after upgrading to an affected version.


halfgaar (wiebe-halfgaar) wrote :

I see this bug hasn't gotten attention, but isn't this quite critical?

Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

Please see https://wiki.ubuntu.com/StableReleaseUpdates for details of Ubuntu's stable release update process. Based on your report it sounds like this patch would qualify, but please could you update us on the status of the bug in newer releases of Ubuntu so we can note where work is needed?

I'm not sure how to prioritize this bug. Under what circumstances are users affected by this problem? Is this problem likely to affect a large number of Ubuntu users?

Either way, if it's a bug with a straightforward patch then you're welcome to follow https://wiki.ubuntu.com/StableReleaseUpdates#Procedure, and if you can provide us with patches ready for inclusion in Ubuntu under that procedure (and it turns out to qualify under our policy), I'd be happy to help you get the fix landed.

halfgaar (wiebe-halfgaar) wrote :

I added more info (edited the original description) hoping it covers everything the SRU requires.

description: updated
Changed in bash (Ubuntu):
importance: Undecided → High
tags: added: rls-bb-incoming
Robie Basak (racb) on 2019-05-29
tags: added: bitesize server-next
Changed in bash (Ubuntu):
assignee: nobody → Bryce Harrington (bryce)
Bryce Harrington (bryce) on 2019-05-29
Changed in bash (Ubuntu Bionic):
importance: Undecided → High
assignee: nobody → Bryce Harrington (bryce)
Bryce Harrington (bryce) on 2019-05-29
description: updated
Bryce Harrington (bryce) wrote :

Hi halfgaar,

Robie asked me to help you with this bug, thanks for reporting it. I may not be able to get full attention on this until next week due to a project deadline, but I've had a quick look at the patch and your problem description, and it looks pretty straightforward. Thanks also for the test case, I'll run it and see if I can repro the bug myself.

It looks like both bionic and cosmic are running 4.4.18-x, so I'm gathering cosmic will need the fix as well. disco and eoan have moved to bash 5.0, and I've verified the upstream source code includes the fix, so no changes are needed for those distro releases.

halfgaar (wiebe-halfgaar) wrote :

Thanks; it's not about getting it done yesterday so to speak, so I'll patiently await next week.

I suspect you'll be able to reproduce it faster if you lower sysctl kernel.pid_max. With 32k PIDs, one of my apps, which spawns about one process every two seconds, needed about a week of running to hit it.
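
To make the effect of kernel.pid_max concrete, here is a rough back-of-the-envelope sketch. The function name pid_wrap_seconds is invented for this example, it only handles whole seconds, and it ignores every other process on the machine, so real wrap-around happens sooner.

```shell
#!/bin/bash
# Hypothetical helper: estimate how many seconds a single spawner needs to
# walk through the whole PID space once, given pid_max and the number of
# whole seconds between spawns. Other processes are ignored (assumption).
pid_wrap_seconds() {
  local pid_max=$1 seconds_per_spawn=$2
  echo $((pid_max * seconds_per_spawn))
}

# Current limit:          sysctl -n kernel.pid_max
# Lower it for a test:    sudo sysctl -w kernel.pid_max=4096
# (needs root and affects the whole system; 4096 is only an example value)
```

With the numbers above, pid_wrap_seconds 32768 2 gives 65536 seconds, roughly 18 hours per pass through the PID space, while a pid_max of 4096 would cut one pass to 8192 seconds.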

Bryce Harrington (bryce) on 2019-05-30
Changed in bash (Ubuntu Cosmic):
importance: Undecided → High
assignee: nobody → Bryce Harrington (bryce)
Changed in bash (Ubuntu):
status: New → Fix Released
tags: removed: rls-bb-incoming
Bryce Harrington (bryce) on 2019-06-07
Changed in bash (Ubuntu Bionic):
status: New → In Progress
Changed in bash (Ubuntu Cosmic):
status: New → In Progress
Bryce Harrington (bryce) on 2019-06-07
description: updated
Bryce Harrington (bryce) wrote :

Fixes have been pushed to cosmic-proposed and bionic-proposed.

summary: - Apply Bash 4.4.20 to fix cpu spinning on built-in wait
+ [SRU] Apply Bash 4.4.20 to fix cpu spinning on built-in wait

Hello halfgaar, or anyone else affected,

Accepted bash into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/bash/4.4.18-2ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in bash (Ubuntu Cosmic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Changed in bash (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Brian Murray (brian-murray) wrote :

Hello halfgaar, or anyone else affected,

Accepted bash into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/bash/4.4.18-2ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
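
When reporting verification results, it can help to confirm that the installed package is at least the fixed version. A minimal sketch, assuming dpkg is available; the function name bash_pkg_has_fix is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the installed package version is greater
# than or equal to the version that carries the fix, using dpkg's own
# version comparison rules.
bash_pkg_has_fix() {
  local installed=$1 fixed=$2
  dpkg --compare-versions "$installed" ge "$fixed"
}

# Example on bionic:
#   bash_pkg_has_fix "$(dpkg-query -W -f='${Version}' bash)" 4.4.18-2ubuntu1.2
```

On cosmic, the version to compare against would be 4.4.18-2ubuntu3.1.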

halfgaar (wiebe-halfgaar) wrote :

I installed bash from 'proposed' on a test server and am running some tests on it, and in parallel on normal 18.04 servers. I'll expand this soon and report back with the results.

Bryce Harrington (bryce) wrote :

Hi halfgaar, just checking in on how the testing has been coming along?

halfgaar (wiebe-halfgaar) wrote :

Sorry for the delay. I had to write a bash script that increased the hit-chance of the bug. I now have 13 spinning bash processes on a non-updated test server. The one with bash 4.4.20 is working fine.

Bash is also working without issues.

The new script is attached.

So, all good.

Bryce Harrington (bryce) wrote :

Thanks for testing so thoroughly! I've gone ahead and set the verification tags to done, to allow this to now go out. I've also added your script to the Test Case, for future reference.

description: updated
tags: added: verification-done verification-done-bionic verification-done-cosmic
removed: verification-needed verification-needed-bionic verification-needed-cosmic
Łukasz Zemczak (sil2100) wrote :

It's not clear to me if cosmic has been tested as part of the validation. Please make sure that's the case and then adjust the tag accordingly.
That being said, since cosmic is going EOL soon, I will not block the release of bionic on the lack of cosmic validation.

tags: added: verification-needed verification-needed-cosmic
removed: verification-done verification-done-cosmic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bash - 4.4.18-2ubuntu1.2

---------------
bash (4.4.18-2ubuntu1.2) bionic; urgency=medium

  * d/p/bash44-020.diff: Add fix for hang on 'wait' statement
    (LP: #1822776)

 -- Bryce Harrington <email address hidden> Thu, 06 Jun 2019 15:28:15 -0700

Changed in bash (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for bash has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
