read() from pty doesn't finish.

Bug #1512815 reported by Eric Desrochers
This bug affects 1 person
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Medium       Joseph Salisbury
  Trusty         Fix Released   Medium       Joseph Salisbury
  Vivid          Fix Released   Medium       Joseph Salisbury
  Wily           Fix Released   Medium       Joseph Salisbury

Bug Description

It has been brought to my attention:

In the attached test program, pty, each pair of processes repeatedly writes and reads a single '\n': one process uses a master pseudoterminal device (/dev/ptmx) and the other uses a slave pseudoterminal device (/dev/pts/N). When we run the following command (30 pairs, 10,000 times), the two processes in one of the pairs never finish reading.

$ pty 30
Normally the following message is printed immediately:
#name copy num/sec usec/num
pty_switch 30 1541842 0.648575

When the message was no longer printed, we captured the attached dump file.
The kernel was 3.13.0-45.74.
The system was Ubuntu 14.04 LTS.

The same problem occurred with the 3.13.0-55.92 kernel. The following kernels have been tested as well:
2.6.32, 3.10: does not reproduce.
3.19, 4.0.0, 4.1.3: reproduces.
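
The exchange described above can be sketched roughly as follows. This is not the attached pty program, only a minimal illustration under stated assumptions: a single pair of processes, Linux-specific ioctls (TIOCSPTLCK, TIOCGPTN) to set up the pty, and raw mode so that every '\n' travels as exactly one byte. The helper name pty_pingpong is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper (not the attached pty.c): the child writes '\n' to the
 * slave and reads the reply; the parent reads from the master and writes the
 * byte back. Returns the round trips completed on the master side, or -1 if
 * the pty could not be set up. */
static int pty_pingpong(int iterations)
{
    assert(iterations >= 0);

    /* Linux-specific setup: open the master, unlock the slave, get its index. */
    int master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    int unlock = 0;
    unsigned int index = 0;
    if (master < 0)
        return -1;
    if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
        ioctl(master, TIOCGPTN, &index) < 0) {
        close(master);
        return -1;
    }

    char path[32];
    snprintf(path, sizeof path, "/dev/pts/%u", index);
    int slave = open(path, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }

    /* Raw mode: no echo and no newline translation, so each '\n' written on
     * one side is read as exactly one byte on the other side. */
    struct termios tio;
    if (tcgetattr(slave, &tio) < 0) {
        close(slave);
        close(master);
        return -1;
    }
    cfmakeraw(&tio);
    if (tcsetattr(slave, TCSANOW, &tio) < 0) {
        close(slave);
        close(master);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(slave);
        close(master);
        return -1;
    }
    if (pid == 0) {
        /* Slave side: write(/dev/pts/N) -> read(/dev/pts/N). */
        char c = '\n';
        close(master);
        for (int i = 0; i < iterations; i++)
            if (write(slave, &c, 1) != 1 || read(slave, &c, 1) != 1)
                _exit(1);
        _exit(0);
    }

    /* Master side: read(/dev/ptmx) -> write(/dev/ptmx). */
    int done = 0;
    char c;
    close(slave);
    for (int i = 0; i < iterations; i++) {
        if (read(master, &c, 1) != 1 || write(master, &c, 1) != 1)
            break;
        done++;
    }

    /* On an early error, hang up first so the child cannot block forever. */
    if (done < iterations) {
        close(master);
        master = -1;
    }
    waitpid(pid, NULL, 0);
    if (master >= 0)
        close(master);
    return done;
}
```

Calling pty_pingpong(10000) from main() should return 10000 on a healthy kernel. On an affected kernel a call could block in read() indefinitely, which is the symptom reported here, although a single pair is far less likely to trigger it than many pairs running in parallel.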

Revision history for this message
Eric Desrochers (slashd) wrote :

According to the dump, a pair of processes, PID 7347 and PID 7348, were still waiting in read() on /dev/ptmx (struct file ffff880c65357c00) and /dev/pts/18 (struct file ffff880c65357500) respectively.

Normal behavior is as follows:
PID 7347: read(/dev/ptmx) -> write(/dev/ptmx),
PID 7348: write(/dev/pts/18) -> read(/dev/pts/18).

In the dump, PID 7348 had finished writing to /dev/pts/18 and was waiting for its read to complete, while PID 7347 was still waiting to read from /dev/ptmx.

Although data had already arrived at /dev/ptmx, PID 7347 does not seem to have been woken up, even though it was still linked to the queue for data to be read at tty_struct 0xffff88085438ac00.

- PID: 7347 TASK: ffff880853111800 CPU: 15 COMMAND: "pty"
#0 [ffff88084477dc90] __schedule at ffffffff81724e19
#1 [ffff88084477dcf8] schedule at ffffffff817252d9
#2 [ffff88084477dd08] schedule_timeout at ffffffff81724529
#3 [ffff88084477ddb8] n_tty_read at ffffffff8144f6a4
#4 [ffff88084477dec0] tty_read at ffffffff8144a94d
#5 [ffff88084477df08] vfs_read at ffffffff811bda65
#6 [ffff88084477df40] sys_read at ffffffff811be579
#7 [ffff88084477df80] system_call_fastpath at ffffffff8173196d
RIP: 00007fb735a52290 RSP: 00007fff290b4ee8 RFLAGS: 00010212
RAX: 0000000000000000 RBX: ffffffff8173196d RCX: 0000000010a8b550
RDX: 0000000000000001 RSI: 00007fff290b51af RDI: 0000000000000023
RBP: 00007fff290b51b0 R8: 00007fff290b51c0 R9: 00007fff290b5128
R10: 00007fff290b4f60 R11: 0000000000000246 R12: 00007fff290b53d0
R13: 000000000000003b R14: 0000000000000020 R15: 0000000000000010
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

#5 [ffff88084477df08] vfs_read at ffffffff811bda65
ffff88084477df10: ffff880c65357c00 00007fff290b51af
ffff88084477df20: 0000000000000001 0000000000000000
ffff88084477df30: 000000000000001e ffff88084477df78
ffff88084477df40: ffffffff811be579

- PID: 7348 TASK: ffff88084354c800 CPU: 4 COMMAND: "pty"
#0 [ffff880844631c90] __schedule at ffffffff81724e19
#1 [ffff880844631cf8] schedule at ffffffff817252d9
#2 [ffff880844631d08] schedule_timeout at ffffffff81724529
#3 [ffff880844631db8] n_tty_read at ffffffff8144f6a4
#4 [ffff880844631ec0] tty_read at ffffffff8144a94d
#5 [ffff880844631f08] vfs_read at ffffffff811bda65
#6 [ffff880844631f40] sys_read at ffffffff811be579
#7 [ffff880844631f80] system_call_fastpath at ffffffff8173196d
RIP: 00007fb735a52290 RSP: 00007fff290b4ee8 RFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffffff8173196d RCX: 000000007c9d4d40
RDX: 0000000000000001 RSI: 00007fff290b51af RDI: 0000000000000024
RBP: 00007fff290b51b0 R8: 00007fff290b51c0 R9: 00007fff290b5128
R10: 00007fff290b4f60 R11: 0000000000000246 R12: 00007fff290b53d0
R13: 000000000000003b R14: 0000000000000021 R15: 0000000000000010
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

#5 [ffff880844631f08] vfs_read at ffffffff811bda65
ffff880844631f10: ffff880c65357500 00007fff290b51af
ffff880844631f20: 0000000000000001 0000000000000000
ffff880844631f30: 000000000000001e ffff880844631f78
ffff880844631f40: ffffffff811be579

- Files opened by PID 7347 and 7348:

FD FILE DENTRY INODE TYPE PATH tty_struct n_tty_data
....

35 ffff880c65357c00 ffff88085f5dbc80 ffff880c674e5fd8 CHR /dev/ptmx 0xffff88085438ac00 0xffffc9001edfb000
...


Revision history for this message
Eric Desrochers (slashd) wrote :

This is a reproducer for the stall problem in drivers/tty/n_tty.c

To reproduce the problem, save the program below as pty.c, compile it,
and run it in parallel.

# cc -o pty pty.c
# for i in {1..16}; do ./pty& done; wait

The problem can be reproduced on a multi-socket server with recent CPUs.
The program always stalled during the first run when I used a server
with the following CPU.
  Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
  2-sockets x 16-cores x 2-threads

Revision history for this message
Eric Desrochers (slashd) wrote :

LKML reference:
https://lkml.org/lkml/2015/9/28/849

A fix was applied upstream starting with v4.3-rc5:

---
commit e81107d4c6bd098878af9796b24edc8d4a9524fd
Author: Kosuke Tatsukawa <email address hidden>
Date: Fri Oct 2 08:27:05 2015 +0000

    tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
---
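
The commit title attributes the stall to a missing memory barrier. As a rough userspace analogy (this is not the kernel patch, and all names below are hypothetical), assume the classic pattern in which a reader marks itself as waiting and then re-checks for data, while a writer publishes data and then checks whether anyone is waiting. Without a full barrier between the store and the load on each side, both checks can miss the other side's store, so the reader sleeps and the writer skips the wakeup. A sketch with C11 fences:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: "input is available" and "wait queue is non-empty". */
static atomic_bool data_ready;
static atomic_bool reader_waiting;

static void reset_flags(void)
{
    atomic_store(&data_ready, false);
    atomic_store(&reader_waiting, false);
}

/* Reader side: announce the intent to sleep, then re-check for data.
 * Returns true if the reader may sleep, false if data is already there. */
static bool reader_should_sleep(void)
{
    atomic_store_explicit(&reader_waiting, true, memory_order_relaxed);
    /* Full barrier: the store above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&data_ready, memory_order_relaxed);
}

/* Writer side: publish data, then check whether a wakeup is needed.
 * Returns true if the writer must wake the reader. */
static bool writer_must_wake(void)
{
    atomic_store_explicit(&data_ready, true, memory_order_relaxed);
    /* Full barrier: the store above is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&reader_waiting, memory_order_relaxed);
}

/* The outcome the two fences rule out: the reader goes to sleep while the
 * writer decides that nobody needs to be woken. */
static void check_no_lost_wakeup(bool reader_sleeps, bool writer_wakes)
{
    assert(!(reader_sleeps && !writer_wakes));
}
```

With both fences in place, at least one side is guaranteed to observe the other's store, so check_no_lost_wakeup() holds for any interleaving of the two functions. Dropping either fence allows the lost-wakeup outcome on hardware that reorders a store with a later load, which is consistent with the stall being seen on large multi-socket machines.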

Eric Desrochers (slashd)
summary: - 14.04: read() from pty doesn't finish.
+ read() from pty doesn't finish.
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1512815

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key needs-bisect trusty vivid wily
Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Wily test kernel with a cherry pick of commit e81107d. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1512815/

Can you test this kernel and see if it resolves this bug?

Note, you need to install both the linux-image and linux-image-extra .deb packages.

Revision history for this message
Eric Desrochers (slashd) wrote :

I'm writing this comment on behalf of someone who experiences the problem and has tested the test kernel (Wily kernel with commit e81107d) built by Joseph Salisbury.

Here is the feedback:

---
Hello,

We examined whether the provided kernel works on Ubuntu 15.10 by running our test program.

4.2.0-16 kernel: The problem occurred 30 times in 630 runs.
4.2.0-17.21~lp1512815 kernel: 0 times in 100000 runs. (We didn't install the linux-cloud-tools packages.)

We think the fixed kernel works fine.

By the way, if you are using 3.18.23 as a reference, note that it has an extra
spin_unlock_irqrestore(), which should be removed as shown in
http://pastebin.com.keephe.com/L14RzUcF

We will wait for a kernel backported for Ubuntu 14.04 LTS (x86_64).
---

Revision history for this message
Eric Desrochers (slashd) wrote :

Here is a test kernel based on kernel v3.13.0-67 (Trusty).

Installation instructions:
----
$ sudo add-apt-repository ppa:eric-desrochers-z/lp1512815
$ sudo apt-get update
$ sudo apt-get install linux-image-3.13.0-67-generic=3.13.0-67.110hf85627v20151105b2
$ sudo apt-get install linux-image-extra-3.13.0-67-generic=3.13.0-67.110hf85627v20151105b2
$ sudo update-grub
----

Please let us know if it mitigates or solves the problem.

Revision history for this message
Masaki Tachibana (tachibana-5) wrote :

We tested
linux-image-3.13.0-67-generic_3.13.0-67.110hf85627v20151105b2_amd64.deb
and
linux-image-extra-3.13.0-67-generic_3.13.0-67.110hf85627v20151105b2_amd64.deb
on Ubuntu 14.04 LTS by running our test program.

3.13.0-45.74 kernel: The problem occurred 2 times in 3300 runs.
3.13.0-67.110hf85627v20151105b2 kernel: 0 times in 300000 runs.

We think the fixed kernel works fine.
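
To see why zero failures in so many runs is convincing, one can estimate how likely that result would be if the failure rate had not changed. The sketch below assumes that runs are independent and fail with a constant probability equal to the observed rate, which the report does not state. The helper name prob_zero_failures is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: probability of seeing zero failures in `new_runs`
 * independent runs if the true failure rate were still failures/runs. */
static double prob_zero_failures(long failures, long runs, long new_runs)
{
    assert(runs > 0 && failures >= 0 && failures <= runs && new_runs >= 0);

    double pass = 1.0 - (double)failures / (double)runs;
    double result = 1.0;

    /* Exponentiation by squaring: result = pass raised to new_runs. */
    for (long n = new_runs; n > 0; n >>= 1) {
        if (n & 1)
            result *= pass;
        pass *= pass;
    }
    return result;
}
```

For the figures above, prob_zero_failures(2, 3300, 300000) is vanishingly small, so under these assumptions the clean result on the test kernel is very unlikely to be luck.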

Changed in linux (Ubuntu Trusty):
status: New → In Progress
Changed in linux (Ubuntu Vivid):
status: New → In Progress
Changed in linux (Ubuntu Wily):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu Vivid):
importance: Undecided → Medium
Changed in linux (Ubuntu Wily):
importance: Undecided → Medium
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Wily):
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: removed: needs-bisect
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
Revision history for this message
Tom Zhou (zhouqt) wrote :

Hello,

One of our customers tested 3.19.0-41.46 on Ubuntu 15.04 by running the test program.

Proposed kernel (3.19.0-41.46):
The problem occurred 0 times in 100000 runs.

Conventional kernel:
The problem occurred 32 times in 4917 runs.

Eric Desrochers (slashd)
tags: added: verification-done-vivid
removed: verification-needed-vivid
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-41.46

---------------
linux (3.19.0-41.46) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1522918

  [ Upstream Kernel Changes ]

  * Revert "dm: fix AB-BA deadlock in __dm_destroy()"
    - LP: #1522766
  * dm: fix AB-BA deadlock in __dm_destroy()
    - LP: #1522766

linux (3.19.0-40.45) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1522786

  [ Andy Whitcroft ]

  * [Packaging] control -- prepare for new kernel-wedge semantics
    - LP: #1516686
  * [Debian] rebuild should only trigger for non-linux packages
    - LP: #1498862, #1516686
  * [Tests] gcc-multilib does not exist on ppc64el
    - LP: #1515541

  [ Joseph Salisbury ]

  * SAUCE: scsi_sysfs: protect against double execution of
    __scsi_remove_device()
    - LP: #1509029

  [ Luis Henriques ]

  * [Config] updateconfigs after 3.19.8-ckt10 stable update

  [ Upstream Kernel Changes ]

  * Revert "ARM64: unwind: Fix PC calculation"
    - LP: #1520309
  * Revert "md: allow a partially recovered device to be hot-added to an
    array."
    - LP: #1520309
  * tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
    - LP: #1512815
  * HID: rmi: Print the firmware id of the touchpad
    - LP: #1515503
  * HID: rmi: Add functions for writing to registers
    - LP: #1515503
  * HID: rmi: Disable scanning if the device is not a wake source
    - LP: #1515503
  * HID: rmi: Set F01 interrupt enable register when not set
    - LP: #1515503
  * be2net: log link status
    - LP: #1513980
  * xhci: Workaround to get Intel xHCI reset working more reliably
  * Drivers: hv: hv_balloon: refuse to balloon below the floor
    - LP: #1294283
  * Drivers: hv: hv_balloon: survive ballooning request with num_pages=0
    - LP: #1294283
  * Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case
    - LP: #1294283
  * Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
    - LP: #1294283
  * Drivers: hv: balloon: check if ha_region_mutex was acquired in
    MEM_CANCEL_ONLINE case
    - LP: #1294283
  * mm: meminit: make __early_pfn_to_nid SMP-safe and introduce
    meminit_pfn_in_nid
    - LP: #1294283
  * mm: meminit: inline some helper functions
    - LP: #1294283
  * mm, meminit: allow early_pfn_to_nid to be used during runtime
    - LP: #1294283
  * mm: initialize hotplugged pages as reserved
    - LP: #1294283
  * gut proc_register() a bit
    - LP: #1519106
  * arm: factor out mmap ASLR into mmap_rnd
    - LP: #1518483
  * x86: standardize mmap_rnd() usage
    - LP: #1518483
  * arm64: standardize mmap_rnd() usage
    - LP: #1518483
  * mips: extract logic for mmap_rnd()
    - LP: #1518483
  * powerpc: standardize mmap_rnd() usage
    - LP: #1518483
  * s390: standardize mmap_rnd() usage
    - LP: #1518483
  * mm: expose arch_mmap_rnd when available
    - LP: #1518483
  * s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    - LP: #1518483
  * mm: split ET_DYN ASLR from mmap ASLR
    - LP: #1518483
  * mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    - LP: #1518483
  * isdn_ppp: Add checks for allocation failure in isdn_ppp_open()
   ...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Andy Whitcroft (apw)
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Revision history for this message
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Revision history for this message
Eric Desrochers (slashd) wrote :

It has been brought to my attention:

"We tested the 3.13.0-75-generic #119-Ubuntu proposed kernel.
Our test program detected the problem 12 times in 7000 runs on the conventional kernel.
On the 2 systems with the proposed kernel, the problem didn't occur in 1000000 runs on each system.
It seems that the problem has been fixed."

tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Andy Whitcroft (apw) wrote :

Fix released in 3.13.0-77.121

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
tags: added: cscc