FSCache kernel panic on presumed CIFS share disconnect
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Expired
|
High
|
Unassigned |
Bug Description
What Happened
-------------
I have a setup with my home NAS server mounted via CIFS to an Ubuntu 18.04 VPS over OpenVPN. The CIFS share is served by Nginx as an open file directory, with fscache/cachefilesd being used to cache the files locally, to reduce the load on my home connection.
I was using this setup for web development by editing files directly on the CIFS share from another machine (mounted via LAN, independently of the OpenVPN server). After making changes to an HTML file, I noticed that the Nginx server was no longer responding.
I connected to the server via SSH to diagnose the issue and found that trying to perform a file listing (using ls) on the fileshare mount froze my shell, so I opened another SSH connection. I ran dmesg, expecting to find a CIFS timeout error, but instead there was an fscache error trace. Thinking it would help resolve the issue, I ran `systemd restart cachefilesd` to restart cachefilesd. My SSH connection froze again, but this time the server was no longer responding to SSH connections and pings. I was forced to perform a hard reset to restore operation.
Upon further inspection of the logs, I noticed that the NAS reconnected to the OpenVPN server between my first and second SSH connection, so it's possible that a network issue caused the initial fscache error. I have no way of confirming this, that's why I put "presumed" in the title.
I suspect the issue here might be a race condition when invalidating a cache object (the updated HTML file) during a network issue.
What I expected to happen
-------
Accessing an updated file shouldn't cause an fscache error, even when a network error occurs.
Restarting cachefilesd shouldn't cause a fatal panic, even when a network error occurs.
Machine information
-------------------
/proc/version_
lsb_release:
Description: Ubuntu 18.04.1 LTS
Release: 18.04
apt-cache policy linux-image-
linux-image-
Installed: 4.15.0-20.21
Candidate: 4.15.0-20.21
Version table:
*** 4.15.0-20.21 500
500 http://
100 /var/lib/
apt-cache policy cachefilesd cachefilesd:
cachefilesd:
Installed: 0.10.10-0.1
Candidate: 0.10.10-0.1
Version table:
*** 0.10.10-0.1 500
500 http://
100 /var/lib/
apt-cache policy libsmbclient:
libsmbclient:
Installed: (none)
Candidate: 2:4.7.6+
Version table:
2:
500 http://
2:
500 http://
2:
500 http://
apt-cache policy cifs-utils:
cifs-utils:
Installed: 2:6.8-1
Candidate: 2:6.8-1
Version table:
*** 2:6.8-1 500
500 http://
100 /var/lib/
The VPS provider is UK2, I believe they use KVM.
journalctl output
-----------------
Private and irrelevant information removed.
Nov 26 00:30:31 hostname kernel: [kworke] preemptive burial: OBJ66 [DROP_OBJECT] 000000004b2987a1
--- [snip] (CacheFiles error + kernel trace) ---
Nov 26 00:34:01 hostname kernel: CacheFiles: Error: Object already active
Nov 26 00:34:01 hostname kernel: CacheFiles: object: OBJ6d
Nov 26 00:34:01 hostname kernel: CacheFiles: objstate=
Nov 26 00:34:01 hostname kernel: CacheFiles: ops=0 inp=0 exc=0
Nov 26 00:34:01 hostname kernel: CacheFiles: parent=
Nov 26 00:34:01 hostname kernel: CacheFiles: cookie=
Nov 26 00:34:01 hostname kernel: CacheFiles: key=[8] 'dd05000000000000'
Nov 26 00:34:01 hostname kernel: ------------[ cut here ]------------
Nov 26 00:34:01 hostname kernel: kernel BUG at /build/
Nov 26 00:34:01 hostname kernel: invalid opcode: 0000 [#1] SMP PTI
Nov 26 00:34:01 hostname kernel: Modules linked in: binfmt_misc cmac arc4 md4 nls_utf8 cifs ccm cachefiles fscache ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_addrtype ipt_MASQUERADE nf_nat_
Nov 26 00:34:01 hostname kernel: async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid cirrus ttm psmouse drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk drm i2c_piix4 pata_acpi floppy
Nov 26 00:34:01 hostname kernel: CPU: 0 PID: 22679 Comm: kworker/u64:2 Not tainted 4.15.0-20-generic #21-Ubuntu
Nov 26 00:34:01 hostname kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Nov 26 00:34:01 hostname kernel: Workqueue: fscache_object fscache_
Nov 26 00:34:01 hostname kernel: RIP: 0010:cachefiles
Nov 26 00:34:01 hostname kernel: RSP: 0018:ffffa36bc5
Nov 26 00:34:01 hostname kernel: RAX: ffff96b739444801 RBX: ffff96b732f5e3a8 RCX: 0000000000847bda
Nov 26 00:34:01 hostname kernel: RDX: 0000000000847bd9 RSI: ffff96b73be27140 RDI: ffff96b73b807000
Nov 26 00:34:01 hostname kernel: RBP: ffffa36bc5a87de8 R08: 0000000000027140 R09: ffffffffc0709f4b
Nov 26 00:34:01 hostname kernel: R10: ffffd9cc40e51100 R11: 0000000000010101 R12: ffff96b732f5d200
Nov 26 00:34:01 hostname kernel: R13: ffff96b732f5d310 R14: ffff96b70730ec00 R15: ffff96b732f5e3a8
Nov 26 00:34:01 hostname kernel: FS: 000000000000000
Nov 26 00:34:01 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 00:34:01 hostname kernel: CR2: 00007ff8ad9d9f60 CR3: 000000002d522000 CR4: 00000000000006f0
Nov 26 00:34:01 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 26 00:34:01 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 26 00:34:01 hostname kernel: Call Trace:
Nov 26 00:34:01 hostname kernel: ? wait_woken+
Nov 26 00:34:01 hostname kernel: cachefiles_
Nov 26 00:34:01 hostname kernel: fscache_
Nov 26 00:34:01 hostname kernel: fscache_
Nov 26 00:34:01 hostname kernel: process_
Nov 26 00:34:01 hostname kernel: worker_
Nov 26 00:34:01 hostname kernel: kthread+0x121/0x140
Nov 26 00:34:01 hostname kernel: ? process_
Nov 26 00:34:01 hostname kernel: ? kthread_
Nov 26 00:34:01 hostname kernel: ? do_syscall_
Nov 26 00:34:01 hostname kernel: ? kthread_
Nov 26 00:34:01 hostname kernel: ret_from_
Nov 26 00:34:01 hostname kernel: Code: 1e ca 48 c7 c7 70 b1 70 c0 e8 ed 58 1e ca 0f 0b e8 d6 3a 18 ca 48 c7 c7 30 b2 70 c0 e8 da 58 1e ca 31 f6 4c 89 e7 e8 e8 2a 00 00 <0f> 0b 80 bb 1c 01 00 00 00 0f 85 6c f9 ff ff 3d 00 00 20 00 0f
Nov 26 00:34:01 hostname kernel: RIP: cachefiles_
Nov 26 00:34:01 hostname kernel: ---[ end trace 788b2ad4e37538d5 ]---
--- [snip] (fileshare server reconnects to OpenVPN) ---
Nov 26 00:34:43 hostname openvpn[15768]: [snip]:[snip] TLS: tls_process: killed expiring key
Nov 26 00:34:46 hostname openvpn[15768]: [snip]:[snip] TLS: soft reset sec=0
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY OK: depth=1, CN=[snip]
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY KU OK
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Validating certificate extended key usage
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] ++ Certificate has EKU (str) TLS Web Client Authentication, expects TLS Web Client Authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY EKU OK
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] VERIFY OK: depth=0, CN=[snip]
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_VER=2.3.11
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_PLAT=linux
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] peer info: IV_PROTO=2
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Outgoing Data Channel: Cipher 'AES-256-CBC' initialized with 256 bit key
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Outgoing Data Channel: Using 256 bit message hash 'SHA256' for HMAC authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Incoming Data Channel: Cipher 'AES-256-CBC' initialized with 256 bit key
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Incoming Data Channel: Using 256 bit message hash 'SHA256' for HMAC authentication
Nov 26 00:34:48 hostname openvpn[15768]: [snip]:[snip] Control Channel: TLSv1.2, cipher TLSv1.2 DHE-RSA-
--- [snip] (i connect via ssh) ---
Nov 26 00:35:25 hostname sshd[23061]: Connection from [snip] port [snip] on [snip] port [snip]
Nov 26 00:35:25 hostname sshd[23061]: Postponed publickey for tjg from [snip] port [snip] ssh2 [preauth]
Nov 26 00:35:26 hostname sshd[23061]: Accepted publickey for tjg from [snip] port [snip] ssh2: RSA SHA256:/[snip]
Nov 26 00:35:26 hostname sshd[23061]: pam_unix(
Nov 26 00:35:26 hostname systemd-
Nov 26 00:35:26 hostname systemd[1]: Started Session 214 of user tjg.
Nov 26 00:35:27 hostname sshd[23061]: User child is on pid 23141
Nov 26 00:35:27 hostname sshd[23141]: Starting session: shell on pts/1 for tjg from [snip] port [snip] id 0
-- [snip] (i try to restart cachefilesd) --
Nov 26 00:35:51 hostname sudo[23179]: tjg : TTY=pts/1 ; PWD=/home/tjg ; USER=root ; COMMAND=
Nov 26 00:35:51 hostname sudo[23179]: pam_unix(
Nov 26 00:35:51 hostname systemd[1]: Stopping LSB: CacheFiles daemon...
Nov 26 00:35:51 hostname cachefilesd[23183]: * Stopping FilesCache daemon cachefilesd
Nov 26 00:35:51 hostname cachefilesd[903]: Daemon Terminated
Nov 26 00:35:51 hostname kernel: CacheFiles: File cache on vda1 unregistering
Nov 26 00:35:51 hostname kernel: FS-Cache: Withdrawing cache "mycache"
--- [crash] (no more logs after this point until hard reset 5 minutes later) ---
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1805062
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.