tty hangup regression in 3.13 kernel (trusty LTS)

Bug #1397976 reported by Björn Ramberg on 2014-12-01
This bug affects 7 people
Affects                      Importance   Assigned to
linux (Ubuntu)               Undecided    Unassigned
  Precise                    Undecided    Unassigned
  Trusty                     Undecided    Tim Gardner
linux-lts-trusty (Ubuntu)    Undecided    Unassigned
  Precise                    Undecided    Unassigned
  Trusty                     Undecided    Unassigned

Bug Description

From https://lkml.org/lkml/2014/10/10/345

#####
Commit f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
introduces a race window where a pty master can be signalled that the pty slave was closed before all the data that the slave wrote is delivered.
Commit f8747d4a466a ("tty: Fix pty master read() after slave closes") fixed the problem in case of n_tty_read, but the problem still exists for n_tty_poll. This can be seen by running 'for ((i=0; i<100;i++));do ./test.py ;done' where test.py is:

import os, select, pty
(pid, pty_fd) = pty.fork()

if pid == 0:
   os.write(1, 'This string should be received by parent')
else:
   poller = select.epoll()
   poller.register( pty_fd, select.EPOLLIN )
   ready = poller.poll( 1 * 1000 )
   for fd, events in ready:
      if not events & select.EPOLLIN:
         print 'missed POLLIN event'
      else:
         print os.read(fd, 100)
   poller.close()

The string from the slave is missed several times.
This patch takes the same approach as the fix for read and special-cases this condition for poll.
Tested on 3.16.
#####
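
The quoted reproducer is written for Python 2. A minimal Python 3 port might look like the sketch below. The names run_once and MESSAGE are illustrative and not part of the original script, and the explicit os._exit() and waitpid() calls are additions so that the check can be wrapped in a function and looped within one process.

```python
import os
import pty
import select

MESSAGE = b"This string should be received by parent"


def run_once(timeout_s=1.0):
    """Fork a child on a fresh pty; return what the parent reads from the
    master, or None if no EPOLLIN event was reported for it."""
    pid, pty_fd = pty.fork()
    if pid == 0:
        # Child: fd 1 is the pty slave. Write, then exit at once, which
        # closes the slave and opens the race window described above.
        os.write(1, MESSAGE)
        os._exit(0)

    # Parent: wait on the pty master for the child's data.
    poller = select.epoll()
    try:
        poller.register(pty_fd, select.EPOLLIN)
        received = None
        for fd, events in poller.poll(timeout_s):
            if events & select.EPOLLIN:
                received = os.read(fd, 100)
        return received
    finally:
        poller.close()
        os.close(pty_fd)
        os.waitpid(pid, 0)  # reap the child


if __name__ == "__main__":
    # Same idea as the shell loop in the quoted message: 100 attempts.
    missed = sum(1 for _ in range(100) if run_once() is None)
    print("missed POLLIN events: %d / 100" % missed)
```

On an affected kernel one would expect the missed count to be non-zero on some runs; since this is a race, a single clean run does not prove the fix is present.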

This has been merged into Linus Torvalds' tree: https://github.com/torvalds/linux/commit/c4dc304677e8d566572c4738d95c48be150c6606

This fix needs to be backported to the 3.13 kernel too. Since both 12.04 and 14.04 currently run the Trusty LTS kernel, it affects both.

br,

Björn

description: updated
Changed in linux-lts-trusty (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux-lts-trusty (Ubuntu) because there has been no activity for 60 days.]

Changed in linux-lts-trusty (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Tim Gardner (timg-tpi) on 2015-08-28
affects: linux-lts-trusty (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Trusty):
status: New → Confirmed
Tim Gardner (timg-tpi) on 2015-08-28
Changed in linux (Ubuntu):
status: Expired → Fix Released
Changed in linux-lts-trusty (Ubuntu):
status: New → In Progress
Changed in linux-lts-trusty (Ubuntu Trusty):
status: New → In Progress
Changed in linux-lts-trusty (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Trusty):
status: Confirmed → In Progress
Brad Figg (brad-figg) on 2015-09-01
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Delmic BV (delmicbv) wrote :

Hi,
I've just installed the new -proposed kernel with that fix and hit a significant regression: it broke the tty connection to my devices that use FTDI chips.

I haven't fully checked yet, but I think that's because you've cherry-picked commit eafbe67f84761d787802e5113d895a316b6292fe "n_tty: Refactor input_available_p() by call site", without also taking commit a5934804a834f525c9e6289935ceef65b952b101 "n_tty: Fix poll() when TIME_CHAR and MIN_CHAR == 0" (which fixes that first commit).

Note: I'm not affected by the original bug, so cannot comment on the effect of the other commits.

How to reproduce the regression:
$ python
>>> import serial
>>> p = serial.Serial("/dev/ttyACM0", timeout=1)
>>> p.read()
SerialException: device reports readiness to read but returned no data (device disconnected?)

Expected:
p.read() waits one second and returns nothing (due to timeout).
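
For anyone without a serial device at hand, the symptom can probably be checked with a pty instead. The sketch below rests on two assumptions that are not confirmed in this report: that pyserial opens the port in non-canonical mode with VMIN = 0 and VTIME = 0, and that a pty slave using the n_tty line discipline shows the same poll() behaviour as a USB serial port. The function name poll_idle_tty is illustrative.

```python
import os
import pty
import select
import termios


def poll_idle_tty(timeout_s=0.2):
    """Return True if select() reports an idle tty as readable.

    Nothing is ever written to the pty, so True means poll()/select()
    claimed readiness with no data available, which is the condition
    behind the SerialException above.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave_fd)
        # attrs[3] is the local-mode flag word: leave canonical mode.
        attrs[3] &= ~(termios.ICANON | termios.ECHO)
        # attrs[6] is the control-character list: MIN == 0 and TIME == 0.
        attrs[6][termios.VMIN] = 0
        attrs[6][termios.VTIME] = 0
        termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

        readable, _, _ = select.select([slave_fd], [], [], timeout_s)
        return bool(readable)
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    if poll_idle_tty():
        print("idle tty reported readable: regression appears present")
    else:
        print("select() timed out as expected")
```

If those assumptions hold, a kernel carrying only the first of the two commits named above should report the idle tty as readable, while a kernel that also has the follow-up fix should time out.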

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Trusty):
assignee: nobody → Tim Gardner (timg-tpi)
status: Fix Committed → In Progress
Chris J Arges (arges) on 2015-09-15
Changed in linux (Ubuntu Precise):
status: New → Invalid
Changed in linux-lts-trusty (Ubuntu Trusty):
status: In Progress → Invalid
Changed in linux-lts-trusty (Ubuntu Precise):
status: New → In Progress
Brad Figg (brad-figg) on 2015-09-15
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Dimitrios Ntoulas (ntoulasd) wrote :

On 14.04 with kernel 3.13.0-64-generic (-proposed enabled), I get a SerialException:

>>> import serial
>>> p = serial.Serial("/dev/ttyUSB0", timeout=1)
>>> p.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/serial/serialposix.py", line 460, in read
    raise SerialException('device reports readiness to read but returned no data (device disconnected?)')
serial.serialutil.SerialException: device reports readiness to read but returned no data (device disconnected?)

tags: removed: 3.13 kernel poll timout tty
tags: added: verification-done-trusty
removed: verification-needed-trusty
Björn Ramberg (bjoern-ramberg) wrote :

Could not verify the issues noted in comment #6, but the original issue with the pty hangup was tested and confirmed fixed on 14.04 3.13.0-64-generic.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-65.105

---------------
linux (3.13.0-65.105) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1498108

  [ Upstream Kernel Changes ]

  * net: Fix skb_set_peeked use-after-free bug
      - LP: #1497184

linux (3.13.0-64.104) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1493803

  [ Chris J Arges ]

  * [Config] DEFAULT_IOSCHED="deadline" for ppc64el
    - LP: #1469829

  [ Upstream Kernel Changes ]

  * tcp: fix recv with flags MSG_WAITALL | MSG_PEEK
    - LP: #1486146
  * libceph: abstract out ceph_osd_request enqueue logic
    - LP: #1488035
  * libceph: resend lingering requests with a new tid
    - LP: #1488035
  * n_tty: Refactor input_available_p() by call site
    - LP: #1397976
  * tty: Fix pty master poll() after slave closes v2
    - LP: #1397976
  * md: use kzalloc() when bitmap is disabled
    - LP: #1493305
  * ata: pmp: add quirk for Marvell 4140 SATA PMP
    - LP: #1493305
  * libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for HP 250GB SATA disk
    VB0250EAVER
    - LP: #1493305
  * libata: add ATA_HORKAGE_NOTRIM
    - LP: #1493305
  * libata: force disable trim for SuperSSpeed S238
    - LP: #1493305
  * libata: increase the timeout when setting transfer mode
    - LP: #1493305
  * libata: Do not blacklist M510DC
    - LP: #1493305
  * mac80211: clear subdir_stations when removing debugfs
    - LP: #1493305
  * ALSA: hda - Add new GPU codec ID 0x10de007d to snd-hda
    - LP: #1493305
  * drm: Stop resetting connector state to unknown
    - LP: #1493305
  * usb: dwc3: Reset the transfer resource index on SET_INTERFACE
    - LP: #1493305
  * usb: xhci: Bugfix for NULL pointer deference in xhci_endpoint_init()
    function
    - LP: #1493305
  * xhci: Calculate old endpoints correctly on device reset
    - LP: #1493305
  * xhci: report U3 when link is in resume state
    - LP: #1493305
  * xhci: prevent bus_suspend if SS port resuming in phase 1
    - LP: #1493305
  * xhci: do not report PLC when link is in internal resume state
    - LP: #1493305
  * USB: OHCI: Fix race between ED unlink and URB submission
    - LP: #1493305
  * usb-storage: ignore ZTE MF 823 card reader in mode 0x1225
    - LP: #1493305
  * blkcg: fix gendisk reference leak in blkg_conf_prep()
    - LP: #1493305
  * tile: use free_bootmem_late() for initrd
    - LP: #1493305
  * Input: usbtouchscreen - avoid unresponsive TSC-30 touch screen
    - LP: #1493305
  * md/raid1: fix test for 'was read error from last working device'.
    - LP: #1493305
  * mmc: omap_hsmmc: Fix DTO and DCRC handling
    - LP: #1493305
  * isdn/gigaset: reset tty->receive_room when attaching ser_gigaset
    - LP: #1493305
  * mmc: sdhci-pxav3: fix platform_data is not initialized
    - LP: #1493305
  * mmc: block: Add missing mmc_blk_put() in power_ro_lock_show()
    - LP: #1493305
  * mmc: sdhci-esdhc: Make 8BIT bus work
    - LP: #1493305
  * bonding: correctly handle bonding type change on enslave failure
    - LP: #1493305
  * net: Clone skb before setting peeked flag
    - LP: #1493305
  * bridge: mdb: fix double add notification
    - LP: #1493305
  * usb: gadget: mv_udc_c...


Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lts-trusty - 3.13.0-65.105~precise1

---------------
linux-lts-trusty (3.13.0-65.105~precise1) precise; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1498291

  [ Upstream Kernel Changes ]

  * net: Fix skb_set_peeked use-after-free bug
      - LP: #1497184


Changed in linux-lts-trusty (Ubuntu Precise):
status: In Progress → Fix Released
Delmic BV (delmicbv) wrote :

As described in comment #4, the regression that the 'fix' introduced is still present in 3.13.0-65.105~precise1.

Should I reopen this bug report, or create a new one to track the regression?

Laz Karydas (lkary) wrote :

pyserial is totally broken on 3.13.0-65. Every time I call read() I get the same serial exception as comment #6.

Luis Henriques (henrix) wrote :

The fix pointed out in comment #4 is already queued for the next kernel SRU in the Trusty git tree (master-next branch). A kernel with this fix will be in the -proposed pocket very soon.

James Cameron (quozl) wrote :

The fix made in this bug broke production systems for me. Same symptom as in comment #4. Which bug is tracking the regression? Meanwhile, I am downgrading the kernel.

Delmic BV (delmicbv) wrote :

Yep, me too, about 10 production systems broke because of that regression. I had to push a special kernel on our own PPA to fix it :-/

Well, now there is a new kernel in -proposed with the additional patch that fixes that regression. That's probably the easiest option now: just add the -proposed repository temporarily to the package sources, and update the kernel.
