Bug #1650336 “NFS client : kernel 4.4.0-57 crash with nfsv4 enri...” : Bugs : linux package : Ubuntu

Thomas Fili (tfili69) wrote on 2016-12-15:

#1

kernel_4.4.0-57_crash.txt (213.1 KiB, text/plain)

Brad Figg (brad-figg) wrote on 2016-12-15: Missing required logs.

#2

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1650336

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Thomas Fili (tfili69) wrote on 2016-12-16:

#3

Yes,
in fact it is not possible to run the apport-collect command once the kernel has crashed.

If someone needs information about the hardware, I can boot the computer without the NFS fstab entry and then run the command.

But I can also report the same problem on other computers (for example Supermicro servers), and not only ones running Xenial: computers running Trusty with the linux-generic-lts-xenial kernel are also affected.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Joseph Salisbury (jsalisbury) on 2016-12-19

Changed in linux (Ubuntu):
importance:	Undecided → Medium
Changed in linux (Ubuntu Xenial):
importance:	Undecided → Medium
status:	New → Confirmed
Changed in linux (Ubuntu):
importance:	Medium → High
Changed in linux (Ubuntu Xenial):
importance:	Medium → High
tags:	added: performing-bisect xenial

Joseph Salisbury (jsalisbury) on 2016-12-19

Changed in linux (Ubuntu):
status:	Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
status:	Confirmed → In Progress
Changed in linux (Ubuntu):
assignee:	nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee:	nobody → Joseph Salisbury (jsalisbury)

Revision history for this message

Joseph Salisbury (jsalisbury) wrote on 2016-12-19:

#4

I started a kernel bisect between Ubuntu 4.4.0-54 and Ubuntu 4.4.0-57. The kernel bisect will require testing of about 4-5 test kernels.

I built the first test kernel, up to the following commit:
0cd611da7d4c01b178144bc17da8cd92cae2b1fa

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1650336

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
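
As a rough illustration of why a bisect needs only a handful of test kernels: each test halves the range of candidate commits, so the number of builds grows with the base-2 logarithm of the range. The sketch below is not the tooling used for this bug; the function names and the commit counts in the usage note are hypothetical, and it assumes every test result is reliable (a kernel that crashes only sometimes can mislead the search).

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of test builds needed to isolate the first bad
    commit among `num_commits` candidates, halving the range each time."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (num_commits - 1).bit_length()


def find_first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes the history is ordered oldest to newest, that the last commit
    is bad, and that every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1      # the first bad commit lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # first bad commit is mid or earlier
        else:
            lo = mid + 1              # first bad commit is after mid
    return commits[lo]
```

With a hypothetical range of 20 commits, `bisect_steps(20)` gives 5, in line with the "about 4-5 test kernels" estimate above; the real number of commits between the two Ubuntu releases is not stated in this report.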

Thomas Fili (tfili69) wrote on 2016-12-21:

#5

Thank you for this test kernel.

Unfortunately this one has the same problem as the official 4.4.0-57.

Without NFS entries in /etc/fstab, or with noauto entries for NFS, the kernel boots fine.
With the NFS entries the kernel crashes ... sorry

Joseph Salisbury (jsalisbury) wrote on 2017-01-05:

#6

I built the next test kernel, up to the following commit:
ed8c9a98e60fc731a9d83a7a137d5d84210967f5

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1650336

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Thomas Fili (tfili69) wrote on 2017-01-06:

#7

Unfortunately no change ... same behavior ...

But I noticed something strange ... if I boot one of the bad kernels that crash and then restart with Magic SysRq into a "good" kernel, this kernel crashes at the same point too.

Another Magic SysRq restart or a cold start lets the "good" kernel boot normally.

The affected kernels do not boot in any case ... neither after a cold start, nor a warm start / reboot, nor after minutes without power ...

Joseph Salisbury (jsalisbury) wrote on 2017-01-09:

#8

I built the next test kernel, up to the following commit:
764d47217b1f3881600e11c08f109b177e521b15

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1650336

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Thomas Fili (tfili69) wrote on 2017-01-10:

#9

Sorry, no change ... but I noticed something that could be useful.

To make sure there is no hardware problem with my computer, I installed a fresh Kubuntu 16.04.1 on a newer machine and configured it.

Same behaviour ... the base kernel from the installation, 4.4.0-31, works without problems ... but all newer kernels I tested there crash sometimes, including the 4.4 mainline kernels :(
Sometimes, after disconnecting power for some minutes, the mainline kernel boots successfully ...

In the boot logs I see that the mount of the first NFS entry seems to succeed; the kernel crashes when trying to mount the second entry ...

With or without Kerberos ... I tried several combinations ... with only one NFS entry in fstab the system boots without problems; adding the second one makes the kernel crash at boot time.

Mounting a second NFS share after boot is complete is no problem.

Thomas Fili (tfili69) wrote on 2017-01-10:

#10

I noticed an additional detail ...

The problem only occurs if the second NFS share is on the same NFS server as the first share.
I tried mounting a second share from another NFSv4 server, also running FreeBSD 11, without problems.
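
For anyone who wants to check whether a machine has the triggering configuration (two or more boot-time NFS mounts from the same server), the condition can be sketched as a small fstab scan. This is only an illustration: the function name is hypothetical, the parsing is deliberately simple, and it assumes the usual `server:/export` device syntax (no bracketed IPv6 literals).

```python
from collections import defaultdict


def nfs_servers_with_multiple_automounts(fstab_text: str) -> dict:
    """Map each NFS server to its boot-time mount points, keeping only
    servers that have two or more such entries."""
    mounts = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue                      # not a complete fstab entry
        device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue                      # only NFS entries are of interest
        if "noauto" in options.split(","):
            continue                      # not mounted at boot
        server = device.split(":", 1)[0]  # "server:/export" -> "server"
        mounts[server].append(mount_point)
    return {srv: pts for srv, pts in mounts.items() if len(pts) >= 2}
```

Calling it on the contents of /etc/fstab returns an empty dict when no server is used for more than one automatic NFS mount, which is the configuration reported here as booting fine.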

Thomas Fili (tfili69) wrote on 2017-01-10:

#11

The mainline kernel 4.8.17-040817 does not have the problem; 4.4.41-040441 has the problem.

Joseph Salisbury (jsalisbury) wrote on 2017-01-11:

#12

Thanks for finding out the bug does not exist in the upstream 4.8 kernel. There are only a couple more test kernels for the bisect, so we may as well finish it.

I built the next test kernel, up to the following commit:
4891ae8e5d0801f13739c26300ac4cd162c3e63c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1650336

It might also be worthwhile to test the latest upstream 4.4 kernel to see if the commit that fixes the bug in 4.8 was also cc'd to stable.

The latest 4.4 kernel is available from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.41/

Thomas Fili (tfili69) wrote on 2017-01-12:

#13

Thank you for the test kernel :)

But unfortunately it also crashes at the same position.
It begins with: BUG: unable to handle kernel paging request at ffffffff814121a8
...

4.4.41 also crashes, as I mentioned before ... On all computers I tested I see the same behavior: I can successfully boot such a kernel once when the computer had been without power for some minutes beforehand ... or sometimes when I booted a good kernel beforehand ... very strange

Joseph Salisbury (jsalisbury) wrote on 2017-01-12:

#14

I built the next test kernel, up to the following commit:
50f208e18014589971583a8495987194724d56e4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1650336

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Thomas Fili (tfili69) wrote on 2017-01-13:

#15

No, sorry, this kernel crashes too.

Joseph Salisbury (jsalisbury) wrote on 2017-01-13:

#16

The bisect reported commit 50f208e18014589971583a8495987194724d56e4 as the first bad commit. I built a Xenial test kernel with this commit reverted. It can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1650336/

Can you test this kernel and see if it resolves this bug?

Note: you need to install both the linux-image and linux-image-extra .deb packages for this kernel.

Thomas Fili (tfili69) wrote on 2017-01-16:

#17

Sorry, the same behavior.

But I found something else ...

It is very frustrating to be apparently the only one having this problem, so I tried to set up an NFSv4 server on Ubuntu (14.04.5 with the linux-generic-lts-wily kernel 4.2.0-42).

/etc/exports:

/home/exports *(rw,fsid=0,crossmnt,no_subtree_check,sync)
/home/exports/user *(rw,nohide,insecure,no_subtree_check,sync)
/home/exports/staff *(rw,nohide,insecure,no_subtree_check,sync)

Client (Ubuntu 16.04.1)

/etc/fstab

server:/user /home/server/user nfs _netdev,auto,rw,noatime,nfsvers=4,sync 0 0
server:/staff /home/server/staff nfs _netdev,auto,rw,noatime,nfsvers=4,sync 0 0

mount -a or a command-line mount works without problems ...

---

The behavior is not exactly the same as in our environment with a FreeBSD server, but similar, I think.

Default installation kernel 4.4.0-31-generic: OK, boots without problems

Latest kernel from the repo, 4.4.0-59, and all other kernels I tested, including the latest mainline kernel 4.9.4:

a. Booting with only one auto entry in /etc/fstab: no problem

b. Booting with both auto entries in /etc/fstab: the first share mounts quickly.

The boot log shows: "A start job is running for /home/server/staff (1min 14s / 1min 38s)"

After the timeout expires the computer finishes booting, but without mounting the second share.

After logging in I can mount the second share without problems.

So the kernel does not crash with an Ubuntu NFS server ... but maybe the root cause of the problem is the same ?!
---

Is there anyone who could confirm this behavior in their environment?

So that I do not feel so alone any longer ;)

Joseph Salisbury (jsalisbury) wrote on 2017-01-17:

#18

We may have provided an improper "Good" or "Bad" result to the bisect. We may have to test the previously posted kernels again to confirm test results.

However, can you first test the latest upstream 4.4 stable kernel and mainline kernel to see if this bug is already fixed upstream? They can be downloaded from:

4.4.43: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.43/

Mainline: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc4

Thomas Fili (tfili69) wrote on 2017-01-18:

#19

Thank you for your effort.

Unfortunately the tests are very confusing.

With the two shares from the FreeBSD 11 server the mainline kernel 4.10.0 has no problem !!!
The same holds if I try to mount two additional shares from another FreeBSD 11 server. No problem at all!

But if I try to mount the two shares from the Ubuntu 14.04.01 NFSv4 server, only one share gets mounted at boot time; after the timeout expires the system boots fully.

With the mainline kernel 4.9.4 the computer boots with every combination and number of shares without problems, just like the 4.8.17 mainline kernel and the 16.04.1 installation kernel 4.4.0-31.

Mainline kernel 4.4.43 crashes just like 4.4.42 and 4.4.41.

Joseph Salisbury (jsalisbury) wrote on 2017-01-18:

#20

So this bug does not happen with the 4.10-rc4 kernel from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc4

If that is the case, we can perform a "Reverse" bisect to identify the commit that resolves that bug upstream.

Thomas Fili (tfili69) wrote on 2017-01-18:

#21

If you are asking whether kernel 4.10-rc4 no longer crashes at boot time, then I can answer: yes, the kernel does not crash any longer.

But if you are asking whether kernel 4.10-rc4 works as expected with NFS,

then I have to answer: no, it does not work as expected with NFS shares.

IMHO the mainline kernel 4.9.4 works as expected ;)

But maybe you think different parts of the kernel are responsible for this behaviour?

Thomas Fili (tfili69) wrote on 2017-01-19:

#22

Very strange ...

I tested 4.10-rc4 again and found that the kernel boots without problems with all shares when using the plain IP address for the server in /etc/fstab ... it makes no difference whether a correct entry in /etc/hosts exists or DNS resolution works.

I dunno why 4.9.4 is working with dns names only ...

Seth Forshee (sforshee) wrote on 2017-01-19:

#23

@Thomas: I've been going back through the test results here and something isn't making sense. During the bisect you reported that the kernel built at commit 50f208e18014589971583a8495987194724d56e4 was bad. This commit has no code changes relative to 4.4.0-54, so they should behave the same. This makes me think that either the crash is intermittent, i.e. it might happen sometimes but not other times with the same kernel, or else that something is changing in your testing or environment. There's a slight chance that it could be some difference in the builds, but that's pretty unlikely.

The other question I have is whether or not you're always seeing the same problem each time you say a kernel is bad. By that I mean you see a crash with nearly identical messages in the kernel log. If there are multiple different issues going on it's best to try to focus on one at a time, if possible (it isn't always possible though if one problem is interfering with testing for another).

Thomas Fili (tfili69) wrote on 2017-01-19:

#24

@Seth : Yes, this problem is very confusing

Since posting #8 I have used a dedicated test system with a fresh Kubuntu 16.04.1 installation on well-tested hardware.
During the testing I modified /etc/fstab, switching between the noauto/auto options of the NFS shares and later changing the server name to an IP address or adding new entries from other servers.

And I installed the test kernels, of course ;)

Today I installed the kernel built at commit 50f208e18014589971583a8495987194724d56e4 again ... and it crashes as I said before.

Unfortunately the kernel 4.4.0-54 is no longer available from the official repos ... so I am not able to test this one again ... but maybe this kernel also had the problem ?!

At the beginning I was very confused because some kernels crashed not on the first try but only on the second or third try.

In fact I found only one old official kernel that does not have the problem, and this is the 16.04.1 installation kernel 4.4.0-31.

But I will try to find the last working kernel between 4.4.0-34 and 4.4.0-53 tomorrow.

Mainline kernels 4.10-rc4 and 4.9.4 both work since I changed the server name to an IP address in /etc/fstab.

> By that I mean you see a crash with nearly identical messages in the kernel log.

Yes, of course ... whenever I wrote that the kernel crashed, this always happened at the same place in the log with similar log entries.

> If there are multiple different issues going on it's best to try to focus on one at a time, if possible

That is clear ... for example, the problem of resolving the IP from the server name is secondary.
And there is also another bug with accessing an NFSv4 subshare.

Seth Forshee (sforshee) wrote on 2017-01-19: Re: [Bug 1650336] Re: NFS client : kernel 4.4.0-57 crash with nfsv4 enries in /etc/fstab

#25

On Thu, Jan 19, 2017 at 06:14:33PM -0000, Thomas Fili wrote:
> @Seth : Yes, this problem is very confusing

Thanks for the clarifications.

> Unfortunately the kernel 4.4.0-54 is no longer available from the
> official repos ... so I am not able to test this one again ... but maybe
> this kernel also had the problem ?!

Yeah, looks like this build never made it out of our PPA.

> But I will try to find the last working kernel between 4.4.0-34 and
> 4.4.0-53 tomorrow.

Please do. So far we've been working under the assumption that this is a
bug introduced after 4.4.0-54, so if it was introduced before that we
would never have found it. Honestly though if the build at commit
50f208e18014589971583a8495987194724d56e4 is bad then 4.4.0-54 is almost
certainly bad as well.

> And there is also another bug with accessing an NFSv4 subshare

Yes, I haven't forgotten about this one. I'm waiting on the upstream
developers right now, but if they don't come back with something by
early next week I'll pursue a temporary fix in x/y.

Thomas Fili (tfili69) wrote on 2017-01-20:

#26

@Seth

> Thanks for the clarifications.

Har Har

> Yeah, looks like this build never made it out of our PPA.

Unfortunately, I did not find this kernel version on any of our computers.
But I am fairly sure I have seen this kernel in the official repos.

Thomas Fili (tfili69) wrote on 2017-01-20:

#27

So, now I will try to formulate a new short bug description with all the facts I know at the moment:

Since kernel version 4.4.0-42 (official repo for 16.04.1) the boot process crashes when there are at least two NFSv4 entries for the same NFS server in /etc/fstab.

With only one share entry in /etc/fstab the boot process does not crash.

The last working kernel, not having this problem, is 4.4.0-38.

Mainline kernels 4.10-rc4 and 4.9.4 both work without problems.
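
The version boundaries in this summary can be turned into a quick check of a kernel release string. The sketch below is only an illustration built from numbers in this report: 4.4.0-38 as the last good kernel, 4.4.0-42 as the first bad one, and 4.4.0-65 as the release whose changelog further down this report lists the fix. The function names are hypothetical, and treating everything up to 4.4.0-38 as unaffected is an extrapolation from the few kernels actually tested.

```python
import re

LAST_GOOD_ABI = 38    # 4.4.0-38: last working kernel per the summary above
FIRST_BAD_ABI = 42    # 4.4.0-42: first crashing kernel per the summary above
FIRST_FIXED_ABI = 65  # 4.4.0-65.86: release whose changelog lists the fix


def xenial_abi(release: str):
    """Return N from an Ubuntu '4.4.0-N...' release string, or None if the
    string does not belong to the Ubuntu 4.4.0 series."""
    match = re.match(r"^4\.4\.0-(\d+)", release)
    return int(match.group(1)) if match else None


def reported_as_affected(release: str):
    """True/False according to this report, or None when the report says
    nothing about the release (mainline builds, untested ABI numbers)."""
    abi = xenial_abi(release)
    if abi is None:
        return None                      # not an Ubuntu 4.4.0 kernel
    if FIRST_BAD_ABI <= abi < FIRST_FIXED_ABI:
        return True
    if abi <= LAST_GOOD_ABI or abi >= FIRST_FIXED_ABI:
        return False
    return None                          # between last good and first bad
```

On a running system the release string could be taken from `platform.release()`, which on Linux returns the same value as `uname -r`.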

Thomas Fili (tfili69) wrote on 2017-01-30:

#28

Hm, sorry, I hope there was no misunderstanding ?!

As I mentioned a few times before, there was some strange behaviour when I tested the kernels the first time.
Sometimes a kernel booted successfully once or twice and did not crash until the third try ...

I could reproduce this behaviour on different computers, but at the beginning of the tests this led me to declare some kernels OK by mistake. Maybe someone else is able to explain such behaviour ...

When I tested the kernels the second time, I booted every kernel three times to make sure I got the correct tag.

Sorry again for the unnecessary work.
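
To illustrate why a single successful boot is weak evidence when the crash is intermittent, here is a small sketch. It rests on a simplifying assumption that is not established in this report: every boot of an affected kernel crashes independently with the same probability. The observations above (results depending on power cycles and on the previously booted kernel) suggest the real behaviour is less tidy. The function names and the 50% figure in the usage note are hypothetical.

```python
def chance_all_boots_succeed(crash_probability: float, boots: int) -> float:
    """Probability that an affected kernel survives `boots` consecutive
    boots, assuming independent boots with the same crash probability."""
    if not 0.0 <= crash_probability <= 1.0:
        raise ValueError("crash_probability must be between 0 and 1")
    if boots < 0:
        raise ValueError("boots must not be negative")
    return (1.0 - crash_probability) ** boots


def classify_kernel(boot_crashed) -> str:
    """Tag a kernel 'bad' if any test boot crashed, otherwise 'good'.
    `boot_crashed` is an iterable of booleans, True meaning a crash."""
    return "bad" if any(boot_crashed) else "good"
```

For a hypothetical kernel that crashes on half of its boots, `chance_all_boots_succeed(0.5, 1)` is 0.5 while `chance_all_boots_succeed(0.5, 3)` is 0.125, so booting three times makes a wrong "good" tag much less likely, though not impossible.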

Seth Forshee (sforshee) wrote on 2017-01-30:

#29

There were two commits to sunrpc between 4.4.0-38 and 4.4.0-42 which came from upstream 4.4 stable.

4bb0ea1f3289 SUNRPC: Handle EADDRNOTAVAIL on connection failures
8785a1d6c5b3 SUNRPC: allow for upcalls for same uid but different gss service

There's a later commit which says it fixes problems with the latter of these, and specifically mentions a NULL dereference in rpc_pipe_read:

1cded9d2974f SUNRPC: fix refcounting problems with auth_gss messages.

That one should be coming from upstream stable too, but it looks like we don't have it yet.

Joe, could you provide Thomas with a test kernel containing that fix that he can test? Thanks.

Seth Forshee (sforshee) wrote on 2017-02-15:

#30

Sorry for the delay. Please test the kernel below to see if it fixes the problem. It also includes the submount permission fix from the other bug.

http://people.canonical.com/~sforshee/lp1650336/linux-4.4.0-63.84+lp1650336v201702150737/

Thomas Fili (tfili69) wrote on 2017-02-16:

#31

OK, looks good. I tried several reboots with this kernel and all were successful :)

Seth Forshee (sforshee) on 2017-02-16

Changed in linux (Ubuntu):
status:	In Progress → Fix Released
Changed in linux (Ubuntu Xenial):
assignee:	Joseph Salisbury (jsalisbury) → Seth Forshee (sforshee)

Seth Forshee (sforshee) on 2017-02-16

description:

updated

Tim Gardner (timg-tpi) on 2017-02-16

Changed in linux (Ubuntu Xenial):
status:	In Progress → Fix Committed

Brad Figg (brad-figg) wrote on 2017-02-27:

#32

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:

added: verification-needed-xenial

Thomas Fili (tfili69) wrote on 2017-03-01:

#33

Looks good,

this bug seems to be solved with the -proposed kernel 4.4.0-65.86 ... good work ... thank you very much

I changed the tag 'verification-needed-xenial' to 'verification-done-xenial'

tags:

added: verification-done-xenial
removed: verification-needed-xenial

Launchpad Janitor (janitor) wrote on 2017-03-02:

#34

This bug was fixed in the package linux - 4.4.0-65.86

---------------
linux (4.4.0-65.86) xenial; urgency=low

* linux: 4.4.0-65.86 -proposed tracker (LP: #1667052)

[ Stefan Bader ]
* Upgrade Redpine RS9113 driver to support AP mode (LP: #1665211)
    - SAUCE: Redpine driver to support Host AP mode

* NFS client : permission denied when trying to access subshare, since kernel
    4.4.0-31 (LP: #1649292)
    - fs: Better permission checking for submounts

* [Hyper-V] SAUCE: pci-hyperv fixes for SR-IOV on Azure (LP: #1665097)
    - SAUCE: PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
    - SAUCE: pci-hyperv: properly handle pci bus remove
    - SAUCE: pci-hyperv: lock pci bus on device eject

* [Hyper-V/Azure] Please include Mellanox OFED drivers in Azure kernel and
    image (LP: #1650058)
    - net/mlx4_en: Fix bad WQE issue
    - net/mlx4_core: Fix racy CQ (Completion Queue) free
    - net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT
      transitions
    - net/mlx4_core: Avoid command timeouts during VF driver device shutdown

* Xenial update to v4.4.49 stable release (LP: #1664960)
    - ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup
    - selinux: fix off-by-one in setprocattr
    - Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback"
    - cpumask: use nr_cpumask_bits for parsing functions
    - hns: avoid stack overflow with CONFIG_KASAN
    - ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
    - target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
    - target: Use correct SCSI status during EXTENDED_COPY exception
    - target: Fix early transport_generic_handle_tmr abort scenario
    - target: Fix COMPARE_AND_WRITE ref leak for non GOOD status
    - ARM: 8642/1: LPAE: catch pending imprecise abort on unmask
    - mac80211: Fix adding of mesh vendor IEs
    - netvsc: Set maximum GSO size in the right place
    - scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed
      send
    - scsi: aacraid: Fix INTx/MSI-x issue with older controllers
    - scsi: mpt3sas: disable ASPM for MPI2 controllers
    - xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
    - ALSA: seq: Fix race at creating a queue
    - ALSA: seq: Don't handle loop timeout at snd_seq_pool_done()
    - drm/i915: fix use-after-free in page_flip_completed()
    - Linux 4.4.49

* NFS client : kernel 4.4.0-57 crash with nfsv4 enries in /etc/fstab
    (LP: #1650336)
    - SUNRPC: fix refcounting problems with auth_gss messages.

* [0bda:0328] Card reader failed after S3 (LP: #1664809)
    - usb: hub: Wait for connection to be reestablished after port reset

* linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

* ibmvscsis: Add SGL LIMIT (LP: #1662551)
    - ibmvscsis: Add SGL limit

* [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the restriction on max segment size
    - scsi: storvsc: Enable multi-queue support
    - scsi: storvsc: use tagged SRB requests if supported by the device
    - scsi: storvsc: properly handle SRB_ERROR when sense message is present
    - scsi: storvsc: properly set residual data length on errors

* ISST-LTE:pNV: ppc64_cpu command is hung w HDs, SSDs and NVMe (LP: #1662666)
    - blk-mq: Avoid memory reclaim when remapping queues
    - blk-mq: Fix failed allocation path when mapping queues

* Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1.bin for module
    i915_bpo (LP: #1624164)
    - SAUCE: i915_bpo: Remove MODULE_FIRMWARE statement for i915/kbl_dmc_ver1.bin

* Intel I210 ethernet does not work both after S3 (LP: #1662763)
    - igb: implement igb_ptp_suspend
    - igb: call igb_ptp_suspend during suspend/resume cycle

* [Hyper-V] Fix ring buffer handling to avoid host throttling (LP: #1661430)
    - Drivers: hv: vmbus: On write cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()

* brd module compiled as built-in (LP: #1593293)
    - [Config] CONFIG_BLK_DEV_RAM=m

* regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

* Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

* flock not mediated by 'k' (LP: #1658219)
    - SAUCE: apparmor: flock mediation is not being enforced on cache check

* unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

* apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

* tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

* apparmor_parser hangs indefinitely when called by multiple threads
    (LP: #1645037)
    - SAUCE: apparmor: fix lock ordering for mkdir

* apparmor leaking securityfs pin count (LP: #1660846)
    - SAUCE: apparmor: fix leak on securityfs pin count

* apparmor reference count leak when securityfs_setup_d_inode() fails
    (LP: #1660845)
    - SAUCE: apparmor: fix reference count leak when securityfs_setup_d_inode()
      fails

* apparmor not checking error if security_pin_fs() fails (LP: #1660842)
    - SAUCE: apparmor: fix not handling error case when securityfs_pin_fs() fails

* apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

* apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

* apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

* apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

* apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

* unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

* docker permission issues with overlay2 storage driver (LP: #1659417)
    - SAUCE: overlayfs: Replace ovl_prepare_creds() with ovl_override_creds()
    - Revert "UBUNTU: SAUCE: cred: Add clone_cred() interface"
    - ovl: check mounter creds on underlying lookup

* Enable CONFIG_NET_DROP_MONITOR=m in Ubuntu Kernel (LP: #1660634)
    - [Config] CONFIG_NET_DROP_MONITOR=m

* Xenial update to v4.4.48 stable release (LP: #1663657)
    - PCI/ASPM: Handle PCI-to-PCIe bridges as roots of PCIe hierarchies
    - ext4: validate s_first_meta_bg at mount time
    - drm/nouveau/disp/gt215: Fix HDA ELD handling (thus, HDMI audio) on gt215
    - drm/nouveau/nv1a,nv1f/disp: fix memory clock rate retrieval
    - crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
    - crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
    - perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory
    - ata: sata_mv:- Handle return value of devm_ioremap.
    - libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices
    - powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe()
    - powerpc: Add missing error check to prom_find_boot_cpu()
    - NFSD: Fix a null reference case in find_or_create_lock_stateid()
    - svcrpc: fix oops in absence of krb5 module
    - zswap: disable changing params if init fails
    - cifs: initialize file_info_lock
    - mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
    - mm, fs: check for fatal signals in do_generic_file_read()
    - can: bcm: fix hrtimer/tasklet termination in bcm op removal
    - mmc: sdhci: Ignore unexpected CARD_INT interrupts
    - percpu-refcount: fix reference leak during percpu-atomic transition
    - HID: wacom: Fix poor prox handling in 'wacom_pl_irq'
    - KVM: x86: do not save guest-unsupported XSAVE state
    - USB: serial: qcserial: add Dell DW5570 QDL
    - USB: serial: pl2303: add ATEN device ID
    - USB: Add quirk for WORLDE easykey.25 MIDI keyboard
    - usb: gadget: f_fs: Assorted buffer overflow checks.
    - USB: serial: option: add device ID for HP lt2523 (Novatel E371)
    - x86/irq: Make irq activate operations symmetric
    - base/memory, hotplug: fix a kernel oops in show_valid_zones()
    - Linux 4.4.48

* Xenial update to v4.4.47 stable release (LP: #1662507)
    - r8152: fix the sw rx checksum is unavailable
    - mlxsw: spectrum: Fix memory leak at skb reallocation
    - mlxsw: switchx2: Fix memory leak at skb reallocation
    - mlxsw: pci: Fix EQE structure definition
    - net: lwtunnel: Handle lwtunnel_fill_encap failure
    - net: ipv4: fix table id in getroute response
    - net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim
    - tcp: fix tcp_fastopen unaligned access complaints on sparc
    - openvswitch: maintain correct checksum state in conntrack actions
    - ravb: do not use zero-length alignment DMA descriptor
    - ax25: Fix segfault after sock connection timeout
    - net: fix harmonize_features() vs NETIF_F_HIGHDMA
    - net: phy: bcm63xx: Utilize correct config_intr function
    - ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock
    - tcp: initialize max window for a new fastopen socket
    - bridge: netlink: call br_changelink() during br_dev_newlink()
    - r8152: don't execute runtime suspend if the tx is not empty
    - af_unix: move unix_mknod() out of bindlock
    - qmi_wwan/cdc_ether: add device ID for HP lt2523 (Novatel E371) WWAN card
    - net: dsa: Bring back device detaching in dsa_slave_suspend()
    - Linux 4.4.47

* Xenial update to v4.4.46 stable release (LP: #1660994)
    - fbdev: color map copying bounds checking
    - tile/ptrace: Preserve previous registers for short regset write
    - drm: Fix broken VT switch with video=1366x768 option
    - mm/mempolicy.c: do not put mempolicy before using its nodemask
    - sysctl: fix proc_doulongvec_ms_jiffies_minmax()
    - ISDN: eicon: silence misleading array-bounds warning
    - RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
    - s390/ptrace: Preserve previous registers for short regset write
    - can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointer
    - can: ti_hecc: add missing prepare and unprepare of the clock
    - ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list
    - ARC: [arcompact] handle unaligned access delay slot corner case
    - parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header
    - nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
    - NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
    - SUNRPC: cleanup ida information when removing sunrpc module
    - drm/i915: Don't leak edid in intel_crt_detect_ddc()
    - IB/ipoib: move back IB LL address into the hard header
    - IB/umem: Release pid in error and ODP flow
    - s5k4ecgx: select CRC32 helper
    - pinctrl: broxton: Use correct PADCFGLOCK offset
    - platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT
    - mm, memcg: do not retry precharge charges
    - Linux 4.4.46

* Xenial update to v4.4.45 stable release (LP: #1660993)
    - ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to
      it
    - IB/mlx5: Wait for all async command completions to complete
    - IB/mlx4: Set traffic class in AH
    - IB/mlx4: Fix out-of-range array index in destroy qp flow
    - IB/mlx4: Fix port query for 56Gb Ethernet links
    - IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs
    - IB/IPoIB: Remove can't use GFP_NOIO warning
    - perf scripting: Avoid leaking the scripting_context variable
    - ARM: dts: imx31: fix clock control module interrupts description
    - ARM: dts: imx31: move CCM device node to AIPS2 bus devices
    - ARM: dts: imx31: fix AVIC base address
    - tmpfs: clear S_ISGID when setting posix ACLs
    - x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
    - svcrpc: don't leak contexts on PROC_DESTROY
    - fuse: clear FR_PENDING flag when moving requests out of pending queue
    - PCI: Enumerate switches below PCI-to-PCIe bridges
    - HID: corsair: fix DMA buffers on stack
    - HID: corsair: fix control-transfer error handling
    - mmc: mxs-mmc: Fix additional cycles after transmission stop
    - ieee802154: atusb: do not use the stack for buffers to make them DMA able
    - mtd: nand: xway: disable module support
    - x86/ioapic: Restore IO-APIC irq_chip retrigger callback
    - qla2xxx: Fix crash due to null pointer access
    - ubifs: Fix journal replay wrt. xattr nodes
    - clocksource/exynos_mct: Clear interrupt when cpu is shut down
    - svcrdma: avoid duplicate dma unmapping during error recovery
    - ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs
    - ceph: fix bad endianness handling in parse_reply_info_extra
    - ARM: dts: da850-evm: fix read access to SPI flash
    - arm64/ptrace: Preserve previous registers for short regset write
    - arm64/ptrace: Preserve previous registers for short regset write - 2
    - arm64/ptrace: Preserve previous registers for short regset write - 3
    - arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
    - arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
    - ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init
    - ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
    - ARM: 8613/1: Fix the uaccess crash on PB11MPCore
    - blackfin: check devm_pinctrl_get() for errors
    - ite-cir: initialize use_demodulator before using it
    - dmaengine: pl330: Fix runtime PM support for terminated transfers
    - selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
    - arm64: avoid returning from bad_mode
    - Linux 4.4.45

-- Thadeu Lima de Souza Cascardo <cascardo@canonical.com>  Thu, 23 Feb 2017 12:37:21 -0300

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released

Revision history for this message

urraca (urraca) wrote on 2017-03-28:

#35

Can whoever found the root cause in this case have a look at
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1466654
as well please? It sounds very much related, and is a major issue in our environment.

Revision history for this message

urraca (urraca) wrote on 2017-03-28:

#36

N.B.: according to the changelog of the 4.4.0-70 kernel package, the patch has only been applied to -67 (thus effectively to -70)!

Now, can we assess whether the aforementioned bug will be fixed by this as well?

Revision history for this message

Chris Mohler (evilbob) wrote on 2017-09-26:

#37

I'm getting this same issue in kernel 4.4.0-96 on Xenial. Any suggestions?

NFS client : kernel 4.4.0-57 crash with nfsv4 enries in /etc/fstab

Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   High         Joseph Salisbury
Xenial           Fix Released   High         Seth Forshee
