[Xenial] net: better skb->sender_cpu and skb->napi_id cohabitation

Bug #1673303 reported by Leann Ogasawara
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Unassigned
Xenial
Medium
Leann Ogasawara
Yakkety
Undecided
Unassigned

Bug Description

== Xenial SRU ==
We've twice now tried to roll out new firewalls and twice had to
revert back when the new firewalls almost immediately hung after
cutover.

At first we thought it was hardware issues, but after we reproduced it
on 4 different firewalls, we realised it was more likely to be a
problem with the Xenial kernel.

We think we're running into something similar to:

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1579943

And Joel thinks the following patch might fix it:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb

Unfortunately, even when we mimic live production traffic on the new
firewalls with port mirroring, we only have a ~20% success rate at
reproducing the kernel hang and I'm keen not to have any more failed
migration attempts (and the corresponding downtime for many many
services).

== Fix ==
See http://kernel.ubuntu.com/~ogasawara/lp1579943/

== Testing ==
We've just successfully migrated four firewalls
that are running with the patched kernel. Previously two of them would
have survived for less than 2 minutes, both have now been running in
production for over an hour.

I'll provide another update tomorrow, however at this stage I'd suggest
that it makes sense to get this into an SRU.

CVE References

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

This is already applied to Yakkety and Zesty, marking those tasks Fix Released.

~/linux$ git describe --contains 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb
v4.5-rc1~128^2~276^2~13

~/linux$ git show 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb
commit 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb
Author: Eric Dumazet <email address hidden>
Date: Wed Nov 18 06:30:50 2015 -0800

    net: better skb->sender_cpu and skb->napi_id cohabitation

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Yakkety):
status: New → Fix Released
Changed in linux (Ubuntu Xenial):
assignee: nobody → Leann Ogasawara (leannogasawara)
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
Joel Sing (jsing) wrote :

The firewalls deployed with the patched kernel have now been running stably in production for more than 24 hours.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
James Troup (elmo)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (29.1 KiB)

This bug was fixed in the package linux - 4.4.0-75.96

---------------
linux (4.4.0-75.96) xenial; urgency=low

  * linux: 4.4.0-75.96 -proposed tracker (LP: #1684441)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.4.0-74.95) xenial; urgency=low

  * linux: 4.4.0-74.95 -proposed tracker (LP: #1682041)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.4.0-73.94) xenial; urgency=low

  * linux: 4.4.0-73.94 -proposed tracker (LP: #1680416)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null fi\ le
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * Xenial update to v4.4.59 stable release (LP: #1678960)
    - xfrm: policy: init locks early
    - virtio_balloon: init ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers