ssh client spins if output fd closed

Bug #1986521 reported by Michael Rutter
This bug affects 1 person
Affects             Status         Importance   Assigned to        Milestone
portable OpenSSH    Unknown        Unknown
openssh (Ubuntu)    Fix Released   Undecided    Unassigned
  Jammy             Fix Released   Medium       Bryce Harrington

Bug Description

[Impact]
In certain edge cases where the terminal goes away while an ssh process is running, ssh can be left consuming 100% CPU. This increases processing costs for cloud users and wastes energy. While this is an uncommon error, googling indicates many people have run into it in several different ways. It seems important to get this fixed in stable releases.

This is a regression in jammy, presumably caused by the change from select() to poll() (see the OpenSSH 8.9 release announcement [1]). It is fixed by upstream commit d6556de1db0822c76ba2745cf5c097d9472adf7c "upstream: fix poll() spin when a channel's output fd closes..." [2].

[1] https://lwn.net/Articles/885886/
[2] https://github.com/openssh/openssh-portable/commit/d6556de1db0822c76ba2745cf5c097d9472adf7c
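
The condition named in the commit title, an output fd whose other end has closed, appears to be what the reproducer in the test case sets up: the process substitution exits almost immediately, so ssh is left with a stderr pipe that has no reader. A minimal bash sketch of that condition follows. The function name is hypothetical, and the snippet only illustrates the closed pipe, not the ssh main loop itself:

```bash
#!/usr/bin/env bash
# write_to_closed_pipe (hypothetical helper): attempt a write on a pipe
# whose reading process has already exited. Returns non-zero when the
# write fails, which is the expected outcome.
write_to_closed_pipe() {
    (
        trap '' PIPE     # ignore SIGPIPE so the write fails with an error instead of killing us
        sleep 0.5        # give the reader below time to exit
        echo probe >&2   # stderr is the pipe; its only reader is gone by now
    ) 2> >(:)            # ':' exits at once, closing the read end of the pipe
}
```

Calling `write_to_closed_pipe` should return a non-zero status, since the write cannot succeed once the reader has gone. A program that keeps polling such an fd without removing it from the poll set gets woken up immediately on every call, which is consistent with the spin described in this report.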

[Test Case]
$ lxc launch ubuntu-daily:jammy ssh-cpu
$ lxc shell ssh-cpu

# passwd -d root
# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa

# cat << EOF >>/etc/ssh/ssh_config
StrictHostKeyChecking accept-new
EOF

# sed -ri 's/^PasswordAuthentication/#PasswordAuthentication/' /etc/ssh/sshd_config
# cat << EOF >>/etc/ssh/sshd_config
PermitRootLogin yes
PubkeyAuthentication yes
PermitEmptyPasswords yes
PasswordAuthentication yes
ChallengeResponseAuthentication no
UsePAM no
EOF

# systemctl restart sshd

# ssh localhost 2> >({exec 1>&2})

You can shell into the container from a second terminal and use "htop"
to verify that ssh is using 100% of one of the CPU cores:

$ lxc shell ssh-cpu
# htop

This should show one CPU pegged at 100% due to the 'ssh localhost' process.
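
htop is interactive; for a scripted check, CPU use can be sampled from /proc instead. This is a minimal sketch, assuming Linux /proc and bash. The function names cpu_ticks and cpu_pct are hypothetical helpers, not part of any package:

```bash
#!/usr/bin/env bash
# cpu_ticks PID: print utime + stime (in clock ticks) from /proc/PID/stat.
cpu_ticks() {
    local stat rest
    stat=$(<"/proc/$1/stat") || return 1
    rest=${stat##*) }          # drop "pid (comm) "; comm may contain spaces
    set -- $rest               # $1 is now field 3 (state)
    echo $(( ${12} + ${13} ))  # fields 14 (utime) and 15 (stime)
}

# cpu_pct PID [SECONDS]: print average CPU use of PID over the sampling
# window, as a percentage of one core. SECONDS must be a whole number.
cpu_pct() {
    local pid=$1 secs=${2:-2} hz t0 t1
    hz=$(getconf CLK_TCK)
    t0=$(cpu_ticks "$pid") || return 1
    sleep "$secs"
    t1=$(cpu_ticks "$pid") || return 1
    echo $(( (t1 - t0) * 100 / (hz * secs) ))
}
```

For example, `cpu_pct "$(pgrep -n -x ssh)" 2` samples the newest ssh process for two seconds. A value near 100 matches the spin described above, while an idle client should report close to 0.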

Next, return to the first terminal, exit out of the sub-ssh session and
install the fix:

# logout
# add-apt-repository -yus ppa:bryce/openssh-sru-lp1986521
# apt-get full-upgrade -y

Now repeat the test in the first terminal window, while viewing htop in
the second terminal. With the fix installed, the ssh process should no
longer consume 100% of a CPU core:

# ssh localhost 2> >({exec 1>&2})

[Where Problems Could Occur]

While the patch in question is well tested upstream, it has a relatively high line count, which makes it difficult to assure correctness by visual inspection. However, it's not clear that the line count could be significantly reduced without risking loss of correctness. Thus this SRU relies more on testing than on code review to assure robustness.

The code involves polling behavior, so regressions would most likely show up in process handling, for example as problems with socket polling.

Beyond that, watch for the usual generic issues: build failures, dependency problems during build or on upgrade, and service restart problems.

[Original Report]
The OpenSSH package 8.9p1 as shipped with Ubuntu 22.04 (8.9p1-3) suffers from the bug described at
https://bugzilla.mindrot.org/show_bug.cgi?id=3411 and https://bugzilla.mindrot.org/show_bug.cgi?id=3405

A command such as "xterm -e 'ssh -f remote.host sleep 60'" will pop up an xterm, ask for whatever authentication is needed, close the xterm, and leave the ssh client spinning consuming CPU time for 60 seconds before it exits. It should leave the ssh client idle for 60 seconds. Many uses of ssh to launch graphical applications will be caught by this bug.

This is fixed in OpenSSH 9.0p1 as the first bugfix listed in its release notes at https://www.openssh.com/txt/release-9.0


Revision history for this message
Michael Rutter (mjr19) wrote :

And I can confirm that the patch at https://bugzilla.mindrot.org/attachment.cgi?id=3581 applies cleanly and fixes this issue.

Sergio Durigan Junior (sergiodj) wrote :

Thanks for taking the time to report the bug and make Ubuntu better.

I can reproduce the bug using your testcase, but that requires a VM with a graphical environment installed. Another way to reproduce the bug is (from the upstream bug report):

$ lxc launch ubuntu-daily:jammy ssh-cpu
$ lxc shell ssh-cpu
# ssh HOST 2> >({exec 1>&2})

You can shell into the container from another terminal and use "htop" to verify that ssh is using 100% of one of the CPU cores.

This seems to have been fixed upstream by the following commit:

https://github.com/openssh/openssh-portable/commit/d6556de1db0822c76ba2745cf5c097d9472adf7c

I confirmed that this only happens on Jammy. Focal and Kinetic are not affected.

Changed in openssh (Ubuntu):
status: New → Fix Released
Changed in openssh (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → Medium
Bryce Harrington (bryce)
tags: added: server-todo
Changed in openssh (Ubuntu Jammy):
assignee: nobody → Bryce Harrington (bryce)
Bryce Harrington (bryce) wrote :

I've posted a PPA with the patch to fix this issue here:

    https://launchpad.net/~bryce/+archive/ubuntu/openssh-sru-lp1986521

This can be installed via:

  $ sudo add-apt-repository -yus ppa:bryce/openssh-sru-lp1986521
  $ sudo apt-get install openssh-client

Can you please upgrade to this and verify it fixes the reported issue?

Bryce Harrington (bryce) wrote :

I've reproduced the issue and confirmed the PPA fixes it, as expected.

description: updated
Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce)
Changed in openssh (Ubuntu Jammy):
status: Triaged → In Progress
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted openssh into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openssh/1:8.9p1-3ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
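
When reporting verification results, it helps to confirm which version is actually installed. This is a minimal sketch, assuming a system with dpkg and using the fixed version named in this report. The function names are hypothetical helpers, and openssh-client is assumed to be the binary package that ships the ssh client:

```bash
#!/usr/bin/env bash
# Fixed jammy version, taken from this bug report.
fixed_version='1:8.9p1-3ubuntu0.1'

# has_fix VERSION: succeed if VERSION is at or above the fixed jammy version.
has_fix() {
    dpkg --compare-versions "$1" ge "$fixed_version"
}

# installed_version PACKAGE: print the installed version of PACKAGE,
# or nothing if it is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}
```

For example, `has_fix "$(installed_version openssh-client)" && echo "fix installed"` prints the message only when the installed client is at or above 1:8.9p1-3ubuntu0.1. The comparison is only meaningful on jammy: releases that ship an older OpenSSH but were never affected will compare as lower.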

Changed in openssh (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (openssh/1:8.9p1-3ubuntu0.1)

All autopkgtests for the newly accepted openssh (1:8.9p1-3ubuntu0.1) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

xen-tools/4.9.1-1 (arm64, ppc64el)
vorta/0.8.3-1 (armhf)
pkg-perl-tools/0.65 (armhf)
hg-git/0.10.4-3 (amd64, armhf, arm64, s390x, ppc64el)
gvfs/1.48.2-0ubuntu1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#openssh

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Michael Rutter (mjr19) wrote :

I can confirm that the updated package works: the real use case which triggered the original bug (a script setting up an ssh tunnel) no longer triggers it. Sorry for the delay in responding -- things have been busy, so I have also not tested the updated package much.

I have not updated the verification status as the VM on which I tested this is not entirely up-to-date and standard. I hope that someone is in a better position than I to test more thoroughly.

Bryce Harrington (bryce) wrote :

I've verified the test case as written. I reproduced the issue, enabled the -proposed package and did apt-get full-upgrade to pull in the new openssh from -proposed. The CPU usage dropped from 100% to <1% as soon as the operation concluded.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:8.9p1-3ubuntu0.1

---------------
openssh (1:8.9p1-3ubuntu0.1) jammy; urgency=medium

  * d/p/fix-poll-spin.patch: Fix poll(2) spin when a channel's output
    fd closes without data in the channel buffer.
    (LP: #1986521)

 -- Bryce Harrington <email address hidden> Tue, 22 Nov 2022 23:38:19 -0800

Changed in openssh (Ubuntu Jammy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for openssh has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
