Kernel 3.13.0-77 crashes (can be triggered by Samba)

Bug #1543980 reported by Karolin Seeger on 2016-02-10
This bug affects 14 people
Affects          Importance   Assigned to
linux (Ubuntu)   High         Kamal Mostafa
Trusty           High         Kamal Mostafa

Bug Description

Ubuntu 14.04.3 LTS

After updating to kernel 3.13.0-77, the system crashes.
First the network dies, then the whole system.
The logs show several kernel crashes.

--- snip ---
ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Wed Feb 10 09:20:35 2016
Failure: oops
OopsText:
 BUG: soft lockup - CPU#1 stuck for 23s! [smbd:5908]
--- snap ---

Followed by kernel stack traces.
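
For anyone checking their own logs for the same symptom, a small parser for the soft lockup line quoted above might look like the following. It is only a sketch: the pattern is modeled on that single OopsText line, the names `SoftLockup` and `parse_soft_lockup` are made up for illustration, and other kernel versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the one OopsText line quoted in this report.
# Assumption: other kernels may format the message differently.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int      # CPU number that was stuck
    seconds: int  # how long the kernel reported it as stuck
    comm: str     # name of the task running on that CPU
    pid: int      # PID of that task


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


if __name__ == "__main__":
    sample = " BUG: soft lockup - CPU#1 stuck for 23s! [smbd:5908]"
    print(parse_soft_lockup(sample))
```

Filtering on `comm == "smbd"` would single out lockups like the one reported here.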

After some investigation, it turned out that the crash can be triggered by Samba.
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer
TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"

Downgrading to kernel 3.13.0-76 solves this problem.

Please let me know if we can provide more information or help testing.

Thanks!

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-trusty (Ubuntu):
status: New → Confirmed
Philipp Hahn (pmhahn) wrote :

We've seen similar reports of (always) smbd getting stuck with our own 4.1.16 kernel: <https://forge.univention.org/bugzilla/show_bug.cgi?id=40558>

The common factor in my research so far appears to be the patch "unix: properly account for FDs passed over unix sockets", which might have introduced some subtle locking problem. It has been back-ported from 4.4 to several kernels by now.

I'm not able to reproduce the bug with samba3.raw.composite in my 8 vCPU KVM environment.
Some time ago I had a similar bug which I could only trigger on real hardware with 8 CPUs.

Changed in linux-lts-trusty (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key regression-update trusty
affects: linux-lts-trusty (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
status: New → Confirmed
tags: added: needs-bisect
Philipp Hahn (pmhahn) wrote :

Reverting the patch "unix: avoid use-after-free in ep_remove_wait_queue" in 4.1 fixes my problem (for now). The original patch went into 4.4, but was back-ported to several stable trees:

v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c

See <https://lkml.org/lkml/2016/2/2/474>

Changed in linux (Ubuntu Trusty):
status: Confirmed → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
tags: removed: needs-bisect
Changed in linux (Ubuntu Trusty):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Stefan Metzmacher (metze) wrote :

https://forge.univention.org/bugzilla/show_bug.cgi?id=40558#c11

Indicates that commit
51cd3ed4c41b3895869925b99dd95a704bd2c91a unix: avoid use-after-free in ep_remove_wait_queue
should be reverted in order to avoid this regression.

Can we please get some progress on this?

Philipp Hahn (pmhahn) wrote :

Hello,

On 23.02.2016 at 08:34, Stefan Metzmacher wrote:
> https://forge.univention.org/bugzilla/show_bug.cgi?id=40558#c11
>
> Indicates that commit
> 51cd3ed4c41b3895869925b99dd95a704bd2c91a unix: avoid use-after-free in ep_remove_wait_queue
> should be reverted in order to avoid this regression.
>
> Can we please get some progress on this?

Rainer Weikusat sent a patch named
 [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
 <https://patchwork.ozlabs.org/patch/582017/>
which fixes the problem.

For our distribution we chose to revert the original patch, as
we needed a working kernel as fast as possible: several of our
customers were hit by that bug.

I tested the patch from Rainer and it also made the bug disappear.
David Miller also picked the patch for stable and we will do the same
when we next build a new kernel for our release.

Philipp
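
The patch title refers to the case where the destination socket ("other") of a datagram send is the sending socket ("sk") itself. One way this can presumably arise from user space is a Unix datagram socket sending to the address it is bound to. The sketch below only illustrates that situation. It is not a confirmed reproducer for the crash in this report (the Samba test above is), `send_to_self` is a hypothetical helper name, and it should not be run on a production machine with an affected kernel.

```python
import os
import socket
import tempfile


def send_to_self(payload: bytes = b"ping") -> bytes:
    """Bind an AF_UNIX datagram socket and send one datagram to its own address.

    In this situation the receiving socket is the same object as the sender,
    i.e. the "other == sk" case named in the patch title (illustrative only).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "self.sock")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sock.bind(path)
            sock.settimeout(2.0)        # do not block forever if something goes wrong
            sock.sendto(payload, path)  # destination is our own bound address
            return sock.recv(len(payload) + 16)
        finally:
            sock.close()


if __name__ == "__main__":
    print(send_to_self())
```

On a kernel without the regression the datagram is simply delivered back to the sender.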

Stefan Metzmacher (metze) wrote :

Ping!

Philipp Hahn (pmhahn) wrote :

This is the 4.5 commit: <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a5527dda344fff0514b7989ef7a755729769daa1>

For 4.4 it is in review right now for 4.4.4 as announced by greg k-h yesterday: <https://lkml.org/lkml/2016/3/1/828>

Philipp Hahn (pmhahn) wrote :

More broken kernel versions:
v3.14: 9d054f57adc981a5f503d5eb9b259aa450b90dc5
v3.12: 9964b4c4ee925b2910723e509abd7241cff1ef84
v3.10: da8db0830a2ce63f628150307a01a315f5081202
ckt/linux-3.13.y: 6505b15f7f7efde1853b5a7641e9ce675c2b1a96
v3.4: -
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a

Stefan Metzmacher (metze) wrote :

Thanks Philipp!

I just hope to trigger some reaction from the Ubuntu maintainers
in order to get a usable kernel, more than two weeks after it was broken.

Karolin Seeger (kseeger-b) wrote :

Any news on this one?

Kamal Mostafa (kamalmostafa) wrote :

This fix (a5527dd af_unix: Guard against other == sk in unix_dgram_sendmsg) has now been committed to Trusty, scheduled for the next kernel release (3.13.0-84.128).

Changed in linux (Ubuntu Trusty):
status: Triaged → Fix Committed
Stefan Metzmacher (metze) wrote :

Are the 3.13.0-84.128 packages available somewhere for testing?

Kamal Mostafa (kamalmostafa) wrote :

The 3.13.0-84.128 packages (including this fix) are currently building, here:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+packages?field.series_filter=trusty

After the builds finish (several hours), the packages will enter the -proposed archive and an announcement will be posted here (likely tomorrow).

Kamal Mostafa (kamalmostafa) wrote :

The Trusty 3.13.0-84.128 kernel packages (including this fix) are now available for testing in the -proposed archive. To test, select your architecture from the "Published Versions" here:

  https://launchpad.net/ubuntu/trusty/+package/linux-image-3.13.0-84-generic

Then download and install the "linux-image...deb" package under "Downloadable files".
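
After installing and rebooting, one quick way to see which side of the regression a machine is on is to compare the running release string against the versions named in this report. The sketch below is illustrative only: `classify` and its labels are hypothetical, and treating every ABI from 77 through 83 as suspect and everything at or below 76 as unaffected is an assumption. The report itself only confirms that -76 works, -77 crashes, and -84 carries the fix.

```python
import platform
import re
from typing import Optional, Tuple


def parse_release(release: str) -> Optional[Tuple[Tuple[int, int, int], int]]:
    """Split e.g. '3.13.0-77-generic' into ((3, 13, 0), 77)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        return None
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def classify(release: str) -> str:
    """Classify a Trusty 3.13.0 kernel relative to this bug (assumed ranges)."""
    parsed = parse_release(release)
    if parsed is None or parsed[0] != (3, 13, 0):
        return "unknown"            # not a Trusty 3.13.0 kernel
    abi = parsed[1]
    if abi <= 76:
        return "before-regression"  # -76 is reported as working
    if abi >= 84:
        return "fixed"              # -84.128 includes the fix
    return "suspect"                # -77 crashes; -78..-83 assumed affected


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", classify(running))
```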

Karolin Seeger (kseeger-b) wrote :

Could not reproduce with 3.13.0-84-generic :-)

Thanks a lot!

Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-done-trusty
removed: verification-needed-trusty
Kamal Mostafa (kamalmostafa) wrote :

Already confirmed; Thanks @Karolin.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-85.129

---------------
linux (3.13.0-85.129) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1558727

  [ Upstream Kernel Changes ]

  * Revert "Revert "af_unix: Revert 'lock_interruptible' in stream receive
    code""

linux (3.13.0-84.128) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1557596

  [ Upstream Kernel Changes ]

  * Revert "af_unix: Revert 'lock_interruptible' in stream receive code"
    - LP: #1540731
  * seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO
    - LP: #1496073
  * net/mlx4_en: Remove dependency between timestamping capability and
    service_task
    - LP: #1537859
  * net/mlx4_en: Fix HW timestamp init issue upon system startup
    - LP: #1537859
  * x86/mm: Fix slow_virt_to_phys() for X86_PAE again
    - LP: #1549601
  * iw_cxgb3: Fix incorrectly returning error on success
    - LP: #1557191
  * EVM: Use crypto_memneq() for digest comparisons
    - LP: #1557191
  * x86/entry/compat: Add missing CLAC to entry_INT80_32
    - LP: #1557191
  * iio: dac: mcp4725: set iio name property in sysfs
    - LP: #1557191
  * iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
    - LP: #1557191
  * PCI/AER: Flush workqueue on device remove to avoid use-after-free
    - LP: #1557191
  * libata: disable forced PORTS_IMPL for >= AHCI 1.3
    - LP: #1557191
  * mac80211: start_next_roc only if scan was actually running
    - LP: #1557191
  * mac80211: Requeue work after scan complete for all VIF types.
    - LP: #1557191
  * rfkill: fix rfkill_fop_read wait_event usage
    - LP: #1557191
  * crypto: shash - Fix has_key setting
    - LP: #1557191
  * drm/i915/dp: fall back to 18 bpp when sink capability is unknown
    - LP: #1557191
  * target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
    - LP: #1557191
  * crypto: algif_hash - wait for crypto_ahash_init() to complete
    - LP: #1557191
  * iio: inkern: fix a NULL dereference on error
    - LP: #1557191
  * intel_scu_ipcutil: underflow in scu_reg_access()
    - LP: #1557191
  * ALSA: seq: Fix race at closing in virmidi driver
    - LP: #1557191
  * ALSA: rawmidi: Remove kernel WARNING for NULL user-space buffer check
    - LP: #1557191
  * ALSA: pcm: Fix potential deadlock in OSS emulation
    - LP: #1557191
  * ALSA: seq: Fix yet another races among ALSA timer accesses
    - LP: #1557191
  * ALSA: timer: Fix link corruption due to double start or stop
    - LP: #1557191
  * libata: fix sff host state machine locking while polling
    - LP: #1557191
  * cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
    - LP: #1557191
  * ASoC: dpcm: fix the BE state on hw_free
    - LP: #1557191
  * module: wrapper for symbol name.
    - LP: #1557191
  * ALSA: hda - Add fixup for Mac Mini 7,1 model
    - LP: #1557191
  * ALSA: Move EXPORT_SYMBOL() in appropriate places
    - LP: #1557191
  * ALSA: rawmidi: Make snd_rawmidi_transmit() race-free
    - LP: #1557191
  * ALSA: rawmidi: Fix race at copying & updating the position
    - LP: #1557191
  * ALSA: seq: Fix lockdep warnings due to double mutex locks
    - LP: #1557191
  * drivers/scsi/sg.c: mark VMA as VM_IO...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released