cifs: Copying file to same directory results in page fault

Bug #2060919 reported by fprietog
This bug affects 9 people
Affects          Status         Importance  Assigned to      Milestone
linux (Ubuntu)   Invalid        Undecided   Unassigned
Mantic           Fix Committed  High        Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/2060919

[Impact]

Copying a file to, or modifying a file in, the same directory within a cifs mount results in a kernel page fault, and the process that initiated the operation is killed. This could be cp, nautilus, etc.

This results in the following oops:

BUG: unable to handle page fault for address: fffffffffffffffe
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic #28-Ubuntu
RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? show_regs+0x6d/0x80
? __die+0x24/0x80
? page_fault_oops+0x99/0x1b0
? kernelmode_fixup_or_oops+0xb2/0x140
? __bad_area_nosemaphore+0x1a5/0x2c0
? bad_area_nosemaphore+0x16/0x30
? do_kern_addr_fault+0x7b/0xa0
? exc_page_fault+0x1a4/0x1b0
? asm_exc_page_fault+0x27/0x30
? cifs_flush_folio+0x41/0xf0 [cifs]
? cifs_flush_folio+0x37/0xf0 [cifs]
cifs_remap_file_range+0x172/0x660 [cifs]
do_clone_file_range+0x101/0x2d0
vfs_clone_file_range+0x3f/0x150
ioctl_file_clone+0x52/0xc0
do_vfs_ioctl+0x68f/0x910
? __fget_light+0xa5/0x120
__x64_sys_ioctl+0x7d/0xf0
do_syscall_64+0x59/0x90
? kmem_cache_free+0x22/0x3e0
? putname+0x5b/0x80
? exit_to_user_mode_prepare+0x30/0xb0
? syscall_exit_to_user_mode+0x37/0x60
? do_syscall_64+0x68/0x90
? do_syscall_64+0x68/0x90
? do_syscall_64+0x68/0x90

There is no known workaround.
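
As a reading aid for the oops above: the faulting address fffffffffffffffe is what -2 looks like as a 64-bit pointer, and errno 2 is ENOENT on Linux. The sketch below is a userspace illustration, not kernel code. The helper name fault_addr_to_errno is made up for this example, and MAX_ERRNO is assumed to mirror the kernel constant of the same name.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the kernel's MAX_ERRNO: error pointers sit in the top 4095 addresses. */
#define MAX_ERRNO 4095

/*
 * Hypothetical helper (not a kernel API): if addr lies in the error-pointer
 * range, return the positive errno it encodes; otherwise return 0.
 */
int fault_addr_to_errno(uint64_t addr)
{
    if (addr >= (uint64_t)-MAX_ERRNO)
        return (int)(UINT64_C(0) - addr);  /* undo the two's-complement negation */
    return 0;
}

#ifdef DEMO_MAIN  /* build with: cc -DDEMO_MAIN decode.c */
int main(void)
{
    uint64_t cr2 = UINT64_C(0xfffffffffffffffe);  /* CR2 value from the oops */

    assert(fault_addr_to_errno(cr2) == ENOENT);
    printf("%016" PRIx64 " decodes to errno %d\n", cr2, fault_addr_to_errno(cr2));
    return 0;
}
#endif
```

Ordinary kernel addresses such as the RSP value in the same oops decode to 0 here, which suggests the bad pointer is an encoded error rather than random corruption.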

[Fix]

The stacktrace is very similar to a regression reported to upstream 6.1.y:

https://lore.kernel<email address hidden>/T/

The thread mentions that:

commit 7b2404a886f8b91250c31855d287e632123e1746
Author: David Howells <email address hidden>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746

introduced the issue to Debian's 6.1 kernel.

This got backported to Ubuntu in:

commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
Author: David Howells <email address hidden>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()

$ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
Ubuntu-6.5.0-20.20~107

We have been shipping this commit for some time now, and it is not the culprit.

Reading the regression mailing list thread, they mention that things work differently in 6.1:

> Yeah. __filemap_get_folio() works differently in v6.1.y. There it returns a
> folio or NULL. In 6.7 it returns a folio or a negative error code. The error
> check in cifs_flush_folio() needs to change to something like:
>
> folio = filemap_get_folio(inode->i_mapping, index);
> if (!folio)
> return -ENOMEM;
>
> David

6.1.y then got a specific patch to fix the issue in 6.1, which is:

commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
Author: Steve French <email address hidden>
Date: Fri Jan 12 23:08:51 2024 -0600
Subject: cifs: fix flushing folio regression for 6.1 backport
Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1

$ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
Ubuntu-6.5.0-27.28~162

Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y stable patches, this patch was applied to mantic's 6.5 kernel by mistake, and contains the wrong logic for how __filemap_get_folio() works in 6.5.
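
To make the mismatch concrete, here is a minimal userspace model of the two checking styles. It is a sketch under assumptions, not the kernel implementation: err_ptr() and is_err() are stand-ins for the kernel's ERR_PTR() and IS_ERR(), lookup_miss_new_style() is a hypothetical function, and the choice of -ENOENT for a cache miss is inferred from the faulting address in the oops rather than stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR() and IS_ERR() helpers. */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup modelling a "folio or error pointer" result on a miss. */
void *lookup_miss_new_style(void)
{
    return err_ptr(-ENOENT);  /* assumption: a miss is reported as -ENOENT */
}

/* The check written for kernels where a miss is reported as NULL. */
int null_check_catches(const void *folio)
{
    return folio == NULL;
}

/* The check written for kernels where a miss is reported as an error pointer. */
int is_err_check_catches(const void *folio)
{
    return is_err(folio);
}

#ifdef DEMO_MAIN  /* build with: cc -DDEMO_MAIN errptr.c */
int main(void)
{
    void *folio = lookup_miss_new_style();

    assert(!null_check_catches(folio));   /* slips through, would be dereferenced */
    assert(is_err_check_catches(folio));  /* caught, function can return early */
    printf("miss pointer: %p\n", folio);
    return 0;
}
#endif
```

In this model the NULL check lets the error pointer through, so the next access to the folio would dereference it, which matches the fault inside cifs_flush_folio().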

The fix is to revert "cifs: fix flushing folio regression for 6.1 backport" as a SAUCE patch.

[Testcase]

Start two VMs: a server and a client. Debian 12 is recommended for the server, since some users have reproduced the issue against it; the client can be mantic.

Server
------

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install samba
$ sudo vim /etc/samba/smb.conf
server min protocol = NT1
[sambashare]
    comment = Samba on Ubuntu
    path = /home/ubuntu/sambashare
    read only = no
    browsable = yes
$ mkdir ~/sambashare
$ sudo smbpasswd -a ubuntu

Client
------

$ sudo apt update
$ sudo apt install cifs-utils
$ mkdir ~/share
$ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
Password for ubuntu@//192.168.122.185/sambashare:
$ mount -l
...
//192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

$ ls
hallo.txt hello.txt sample.txt sample2.txt
$ sudo cp hello.txt hello.txt.1
Killed

If you install the test kernel available from the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test

The copy will work as expected.

[Where problems could occur]

Reverting the patch restores logic back to how it was between 6.5.0-20-generic through to 6.5.0-26-generic, which functions, and is well tested by the community.

If a regression were to occur, it would impact all writes to cifs mounts, particularly to the same destination directory as the origin file. There are no known workarounds.
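
For triage, the version boundaries reported in this bug can be written down as a small check. This is only a sketch: the function name generic_kernel_affected is hypothetical, it handles only the generic flavour of the 6.5 series, and it assumes that every ABI from 27 (where the bad backport first appears) up to but not including 35 (where the revert was verified) behaves like the tested 27 and 28 builds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: classify a `uname -r` string such as
 * "6.5.0-27-generic". Returns 1 if the kernel falls in the range reported
 * as affected, 0 if it falls outside it, and -1 if the string is not a
 * 6.5.0-<abi>-generic release (other flavours use different ABI numbers).
 */
int generic_kernel_affected(const char *release)
{
    int abi = 0;
    char flavour[32] = "";

    assert(release != NULL);
    if (sscanf(release, "6.5.0-%d-%31s", &abi, flavour) != 2)
        return -1;
    if (strcmp(flavour, "generic") != 0)
        return -1;
    /* Assumption: 27 introduced the regression, 35 carries the revert. */
    return abi >= 27 && abi < 35;
}

#ifdef DEMO_MAIN  /* build with: cc -DDEMO_MAIN affected.c */
int main(int argc, char **argv)
{
    const char *release = argc > 1 ? argv[1] : "6.5.0-27-generic";

    printf("%s -> %d\n", release, generic_kernel_affected(release));
    return 0;
}
#endif
```

The raspi and azure kernels mentioned in the comments are deliberately reported as unknown (-1), because their ABI numbering is separate and the thread does not give their fixed versions.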

fprietog (fprietog) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux-raspi (Ubuntu):
status: New → Confirmed
fprietog (fprietog) wrote :

==========================================================
Example for Kernel "6.5.0-1014-raspi #17-Ubuntu" (aarch64)
==========================================================
# lsb_release -rd
No LSB modules are available.
Description: Ubuntu 23.10
Release: 23.10

# uname -a
Linux fpgrpi 6.5.0-1014-raspi #17-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 21 11:24:03 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

# cat /proc/version_signature
Ubuntu 6.5.0-1014.17-raspi 6.5.13

----------------------------
How to reproduce the problem
----------------------------
For instance, I'm using KeePassXC (https://launchpad.net/ubuntu/+source/keepassxc) to update a database located on a CIFS filesystem. Any change made to that database triggers this kernel oops:

abr 11 09:11:02 fpgrpi kernel: Unable to handle kernel paging request at virtual address fffffffffffffffe
abr 11 09:11:02 fpgrpi kernel: Mem abort info:
abr 11 09:11:02 fpgrpi kernel: ESR = 0x0000000096000004
abr 11 09:11:02 fpgrpi kernel: EC = 0x25: DABT (current EL), IL = 32 bits
abr 11 09:11:02 fpgrpi kernel: SET = 0, FnV = 0
abr 11 09:11:02 fpgrpi kernel: EA = 0, S1PTW = 0
abr 11 09:11:02 fpgrpi kernel: FSC = 0x04: level 0 translation fault
abr 11 09:11:02 fpgrpi kernel: Data abort info:
abr 11 09:11:02 fpgrpi kernel: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
abr 11 09:11:02 fpgrpi kernel: CM = 0, WnR = 0, TnD = 0, TagAccess = 0
abr 11 09:11:02 fpgrpi kernel: GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
abr 11 09:11:02 fpgrpi kernel: swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000001498000
abr 11 09:11:02 fpgrpi kernel: [fffffffffffffffe] pgd=0000000000000000, p4d=0000000000000000
abr 11 09:11:02 fpgrpi kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
abr 11 09:11:02 fpgrpi kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs cmac algif_hash algif_skcipher af_alg bnep lz4 lz4_compress zram zsmalloc nft_chain_nat nf_nat sunrpc xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 binfmt_misc nft_compat nf_tables nfnetlink brcmfmac_wcc btsdio brcmfmac vc4 brcmutil snd_soc_hdmi_codec hci_uart cfg80211 drm_display_helper btqca btrtl raspberrypi_hwmon cec drm_dma_helper btbcm btintel bluetooth raspberrypi_gpiomem drm_kms_helper ecdh_generic rfkill ecc snd_soc_core bcm2835_v4l2(CE) bcm2835_codec(CE) bcm2835_isp(CE) bcm2835_mmal_vchiq(CE) snd_compress ac97_bus snd_pcm_dmaengine rpivid_hevc(CE) vc_sm_cma(CE) snd_bcm2835(CE) v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc snd_pcm snd_timer joydev nvmem_rmem input_leds uio_pdrv_genirq uio tcp_bbr sch_fq ecryptfs parport_pc ppdev lp parport fuse
abr 11 09:11:02 fpgrpi kernel: ip_tables x_tables autofs4 btrfs blake2b_generic raid10 hid_generic raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx usbhid xor xor_neon uas usb_storage raid6_pq libcrc32c raid1 raid0 multipath linear md_mod dm_mirror dm_region_hash dm_log dm_mod dax zstd z3fold spidev dwc2 v3d gpu_sched drm_shmem_helper drm roles crct10dif_ce drm_panel_...


summary: Remote filesystems mounted as CIFS not working after update to Kernel
- "6.5.0-27-generic #28-Ubuntu"
+ "6.5.0-27-generic #28-Ubuntu" (amd64) or Kernel "6.5.0-1014-raspi
+ #17-Ubuntu" (aarch64).
Chrescht (sekateur) wrote : Re: Remote filesystems mounted as CIFS not working after update to Kernel "6.5.0-27-generic #28-Ubuntu" (amd64) or Kernel "6.5.0-1014-raspi #17-Ubuntu" (aarch64).

Same issue on kernel 6.5.0-27-generic #28~22.04.1-Ubuntu

fprietog (fprietog) wrote :

> Same issue on kernel 6.5.0-27-generic #28~22.04.1-Ubuntu

That kernel is from the "linux-hwe-6.5" package. I'll try to link this bug to that package to avoid duplication.

Pgenest (genpas) wrote :

Same issue when I copy files from a Synology SMB share to another directory on the same SMB share.
The issue occurred after a system update.

$ lsb_release -rd
No LSB modules are available.
Description: Ubuntu 23.10
Release: 23.10

$ uname -a
Linux Panoramix 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 7 18:21:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version_signature
Ubuntu 6.5.0-27.28-generic 6.5.13

--------------------
/var/log/syslog
--------------------
2024-04-11T15:20:25.107768+02:00 Panoramix kernel: [ 5748.234124] BUG: unable to handle page fault for address: fffffffffffffffe
2024-04-11T15:20:25.107782+02:00 Panoramix kernel: [ 5748.234132] #PF: supervisor read access in kernel mode
2024-04-11T15:20:25.107784+02:00 Panoramix kernel: [ 5748.234136] #PF: error_code(0x0000) - not-present page
2024-04-11T15:20:25.107786+02:00 Panoramix kernel: [ 5748.234138] PGD 44d43f067 P4D 44d43f067 PUD 44d441067 PMD 0
2024-04-11T15:20:25.107788+02:00 Panoramix kernel: [ 5748.234146] Oops: 0000 [#1] PREEMPT SMP NOPTI
2024-04-11T15:20:25.107790+02:00 Panoramix kernel: [ 5748.234150] CPU: 10 PID: 326370 Comm: pool-nemo Tainted: G OE 6.5.0-27-generic #28-Ubuntu
2024-04-11T15:20:25.107792+02:00 Panoramix kernel: [ 5748.234155] Hardware name: Micro-Star International Co., Ltd. MS-7C56/MPG B550 GAMING PLUS (MS-7C56), BIOS 1.90 03/17/2022
2024-04-11T15:20:25.107794+02:00 Panoramix kernel: [ 5748.234158] RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
2024-04-11T15:20:25.107809+02:00 Panoramix kernel: [ 5748.234214] Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 23 f5 e0 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
2024-04-11T15:20:25.107811+02:00 Panoramix kernel: [ 5748.234217] RSP: 0018:ffffacb394a9fc30 EFLAGS: 00010282
2024-04-11T15:20:25.107813+02:00 Panoramix kernel: [ 5748.234221] RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
2024-04-11T15:20:25.107815+02:00 Panoramix kernel: [ 5748.234223] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2024-04-11T15:20:25.107817+02:00 Panoramix kernel: [ 5748.234226] RBP: ffffacb394a9fc60 R08: 0000000000000001 R09: 0000000000000000
2024-04-11T15:20:25.107819+02:00 Panoramix kernel: [ 5748.234228] R10: 0000000000003950 R11: 0000000000000000 R12: 0000000000000000
2024-04-11T15:20:25.107821+02:00 Panoramix kernel: [ 5748.234230] R13: ffffacb394a9fcb0 R14: ffff97b687327508 R15: ffffacb394a9fca8
2024-04-11T15:20:25.107823+02:00 Panoramix kernel: [ 5748.234233] FS: 000079a0b57f26c0(0000) GS:ffff97bd0dc80000(0000) knlGS:0000000000000000
2024-04-11T15:20:25.107825+02:00 Panoramix kernel: [ 5748.234236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-04-11T15:20:25.107826+02:00 Panoramix kernel: [ 5748.234239] CR2: fffffffffffffffe CR3: 000000018f466000 CR4: 0000000000750ee0
2024-04-11T15:20:25.107828+02:00 Panoramix kernel: [ 5748.234242] PKRU: 55555554
2024-04-11T15:20:25.107830+02:00 Panoramix kernel: [ 5748.234244] Call Trace:
2024-04-11T15:20:25.107831+02:00 Panoramix kernel: [ 5748.234247] <TASK>
2024-04-11T15:20:25.107833+02:00 Panoramix ke...

Chrescht (sekateur) wrote :

No issue in kernel 6.5.0-26-generic.

Changed in linux (Ubuntu):
assignee: nobody → Jose Ogando Justo (joseogando)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe-6.5 (Ubuntu):
status: New → Confirmed
Dean Attewell (eitsop) wrote :

Same issue on 6.5.0-1017-azure

Peter Mühlenpfordt (muehlenp) wrote :

I can mount cifs shares with kernel 6.5.0-27, but when copying files from the share to the same share the process is killed every time. The effect is the same using bash "cp" or KDE kio copy via Krusader.
With kernel 6.5.0-26 everything works perfectly.

Syslog says:
2024-04-14T16:06:01.997083+02:00 monster kernel: [ 42.667380] BUG: unable to handle page fault for address: fffffffffffffffe
2024-04-14T16:06:01.997099+02:00 monster kernel: [ 42.667391] #PF: supervisor read access in kernel mode
2024-04-14T16:06:01.997100+02:00 monster kernel: [ 42.667393] #PF: error_code(0x0000) - not-present page
2024-04-14T16:06:01.997101+02:00 monster kernel: [ 42.667396] PGD 1afa3f067 P4D 1afa3f067 PUD 1afa41067 PMD 0
2024-04-14T16:06:01.997101+02:00 monster kernel: [ 42.667411] Oops: 0000 [#1] PREEMPT SMP PTI
2024-04-14T16:06:01.997102+02:00 monster kernel: [ 42.667414] CPU: 0 PID: 4326 Comm: cp Tainted: P O 6.5.0-27-generic #28-Ubuntu
2024-04-14T16:06:01.997104+02:00 monster kernel: [ 42.667418] Hardware name: System manufacturer System Product Name/P8B75-V, BIOS 1608 03/18/2014
2024-04-14T16:06:01.997104+02:00 monster kernel: [ 42.667420] RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
2024-04-14T16:06:01.997105+02:00 monster kernel: [ 42.667495] Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 13 ed 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
2024-04-14T16:06:01.997107+02:00 monster kernel: [ 42.667498] RSP: 0018:ffff9f2c835bbcb0 EFLAGS: 00010282
2024-04-14T16:06:01.997107+02:00 monster kernel: [ 42.667501] RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
2024-04-14T16:06:01.997108+02:00 monster kernel: [ 42.667503] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2024-04-14T16:06:01.997109+02:00 monster kernel: [ 42.667505] RBP: ffff9f2c835bbce0 R08: 0000000000000001 R09: 0000000000000000
2024-04-14T16:06:01.997110+02:00 monster kernel: [ 42.667507] R10: 0000000000000028 R11: 0000000000000000 R12: 0000000000000000
2024-04-14T16:06:01.997110+02:00 monster kernel: [ 42.667509] R13: ffff9f2c835bbd30 R14: ffff89ca1f3d8d60 R15: ffff9f2c835bbd28
2024-04-14T16:06:01.997111+02:00 monster kernel: [ 42.667511] FS: 00007a893d461540(0000) GS:ffff89cb15c00000(0000) knlGS:0000000000000000
2024-04-14T16:06:01.997112+02:00 monster kernel: [ 42.667513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-04-14T16:06:01.997113+02:00 monster kernel: [ 42.667516] CR2: fffffffffffffffe CR3: 000000017f140006 CR4: 00000000001706f0
2024-04-14T16:06:01.997113+02:00 monster kernel: [ 42.667518] Call Trace:
2024-04-14T16:06:01.997114+02:00 monster kernel: [ 42.667521] <TASK>
2024-04-14T16:06:01.997115+02:00 monster kernel: [ 42.667524] ? show_regs+0x6d/0x80
2024-04-14T16:06:01.997116+02:00 monster kernel: [ 42.667532] ? __die+0x24/0x80
2024-04-14T16:06:01.997116+02:00 monster kernel: [ 42.667536] ? page_fault_oops+0x99/0x1b0
2024-04-14T16:06:01.997117+02:00 monster kernel: [ 42.667541] ? kernelmode_fixup_or_oops+0xb2/0x140
2024-04-14T16:06:01.997117+02:00 monster kernel: [ 42.667544] ...


Peter Mühlenpfordt (muehlenp) wrote :

Same problem on Debian 12 with kernel 6.1.0-17-amd64 (OK with kernel 6.1.0-16-amd64).

fprietog (fprietog) wrote :

Today's released kernels:
- 6.5.0-28-generic (amd64)
- 6.5.0-1015-raspi (arm64)

Still have this bug.

Matthew Ruffell (mruffell) wrote :

Hello,

I can't actually reproduce the issue, I seem to be missing something.

Server
------

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install samba
$ sudo vim /etc/samba/smb.conf
server min protocol = NT1
[sambashare]
    comment = Samba on Ubuntu
    path = /home/ubuntu/sambashare
    read only = no
    browsable = yes
$ mkdir ~/sambashare
$ sudo smbpasswd -a ubuntu

Client
------

$ sudo apt update
$ sudo apt install cifs-utils
$ mkdir ~/share
$ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
Password for ubuntu@//192.168.122.185/sambashare:
$ mount -l
...
//192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

$ ls
hallo.txt hello.txt sample.txt sample2.txt
$ sudo cp hello.txt hello.txt.1
$ ll
total 2097176
drwxr-xr-x 2 root root 0 Apr 19 04:46 ./
drwxr-x--- 5 ubuntu ubuntu 4096 Apr 19 03:57 ../
-rwxr-xr-x 1 root root 1960 Apr 19 03:55 hallo.txt*
-rwxr-xr-x 1 root root 1960 Apr 19 04:04 hello.txt*
-rwxr-xr-x 1 root root 1960 Apr 19 04:46 hello.txt.1*
-rwxr-xr-x 1 root root 1073741824 Apr 19 04:01 sample.txt*
-rwxr-xr-x 1 root root 1073741824 Apr 19 04:04 sample2.txt*

No oops in dmesg. I'm not sure what I'm doing wrong. Maybe you can help.

Anyway, I had a look at the stack trace, and it seems to be related to the regression reported here:

https://lore.kernel<email address hidden>/T/

But I think it's slightly different in the Ubuntu 6.5 kernel.

The thread mentions that:

commit 7b2404a886f8b91250c31855d287e632123e1746
Author: David Howells <email address hidden>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746

introduced the issue to Debian's 6.1 kernel.

This got backported to Ubuntu in:

commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
Author: David Howells <email address hidden>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()

$ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
Ubuntu-6.5.0-20.20~107

Which we have been using for some time now.

Reading the regression mailing list thread, they mention that things work differently in 6.1:

> Yeah. __filemap_get_folio() works differently in v6.1.y. There it returns a
> folio or NULL. In 6.7 it returns a folio or a negative error code. The error
> check in cifs_flush_folio() needs to change to something like:
>
> folio = filemap_get_folio(inode->i_mapping, index);
> if (!folio)
> return -ENOMEM;
>
> David

okay... so 6.1 is different from 6.5. Since we were okay with the patch, we probably didn't need to add anything further.

But they fixed 6.1 with:

commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
Author: Steve Fre...


tags: added: seg
Matthew Ruffell (mruffell) wrote :

Hi everyone,

As promised, I uploaded test kernels to the below ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test

They are based on 6.5.0-27-generic to keep it simple.

The only change is that "cifs: fix flushing folio regression for 6.1 backport" is reverted. That's it.
It's a pretty small patch:

commit 9dc02a5b7540d18a69bcbaf8f4fa428e32075b4b (HEAD -> lp2060919-test)
Author: Matthew Ruffell <email address hidden>
Date: Fri Apr 19 17:25:48 2024 +1200

    Revert "cifs: fix flushing folio regression for 6.1 backport"

    This reverts commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1.

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 55a6d0296ec8..82313b253463 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1245,7 +1245,7 @@ static int cifs_flush_folio(struct inode *inode, loff_t pos, loff_t *_fstart, lo
        int rc = 0;

        folio = filemap_get_folio(inode->i_mapping, index);
-       if (!folio)
+       if (IS_ERR(folio))
                return 0;

        size = folio_size(folio);

They are going to take about three hours to build, so check back later on. It's close to my EOD, and I can't watch these builds complete.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING
PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to Install (On a jammy or mantic system):
1) sudo add-apt-repository ppa:mruffell/lp2060919-test
2) sudo apt update
3) sudo apt install linux-image-unsigned-6.5.0-27-generic linux-modules-6.5.0-27-generic linux-modules-extra-6.5.0-27-generic linux-headers-6.5.0-27-generic
4) sudo reboot
5) uname -rv
6.5.0-27.28+TEST2060919v20240419b1
or
6.5.0-27.28~22.04.1+TEST2060919v20240419b1

Make sure the +TEST2060919v20240419b1 is present.

Can you let me know if it solves the problem?

Additionally, if you could help me create a reproducer that would help immensely.

Remember, the kernels still need to build. Check the ppa link to make sure they built successfully and are published before installing; that should be about 3 hours from now.

Thanks,
Matthew

fprietog (fprietog) wrote :

> I think this patch is probably not needed in 6.5, and causes the issue. How about I build
> you a test kernel with "cifs: fix flushing folio regression for 6.1 backport" reverted,
> and then we see if it fixes things?

Yep, fantastic. I can reproduce this problem anytime so I can test it.

> If you can help me make a reproducer, that would be great. I can test myself then.

Here is some info from a faulty system. Hope it helps:

***** fstab mount:
//192.168.1.10/FPGNAS /mnt/FPGNAS_CIFS cifs _netdev,x-systemd.requires=network-online.target,async,nosuid,nodev,noexec,iocharset=utf8,file_mode=0640,dir_mode=0750,uid=1026,gid=100,credentials=/etc/.cifspwd,rw 0 0

***** mount info:
//192.168.1.10/FPGNAS on /mnt/FPGNAS_CIFS type cifs (rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=strict,username=fprietog,uid=1026,noforceuid,gid=100,noforcegid,addr=192.168.1.10,file_mode=0640,dir_mode=0750,iocharset=utf8,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1,_netdev,x-systemd.requires=network-online.target)

***** cp from a file to a file both in same share:
# cd /mnt/FPGNAS_CIFS/tmp
/mnt/FPGNAS_CIFS/tmp# touch sample1.txt
/mnt/FPGNAS_CIFS/tmp# cp sample1.txt sample2.txt
Terminado (killed)

After that error the CIFS mount becomes unusable, as any operation on it just hangs.

***** The remote cifs server is a Synology DS213j NAS. Samba version is:
# smbd -V
Version 4.15.9
Synology Build 42934, Jul 5 2023 16:52:06

Please, let me know if I can give you any other info that helps to solve the problem.

fprietog (fprietog) wrote :

>As promised, I uploaded test kernels to the below ppa:
>
>https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test

OK. I'll test once the build ends. Thank you!

fprietog (fprietog) wrote :

Tested:

# uname -rv
6.5.0-27-generic #28+TEST2060919v20240419b1-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 19

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic

# cd /mnt/fpgnas/tmp
/mnt/fpgnas/tmp# touch sample1.txt
/mnt/fpgnas/tmp# cp sample1.txt sample2.txt

Success!

I've tested some other cases as well and CIFS works as expected for all of them. So I think you have found the culprit.

Thank you very much.

fprietog (fprietog) wrote :
johnfoss (johnfoss) wrote :

In my reproduction environment I am also testing against a Synology SMB server.

When I boot the test kernel:
$ uname -rv
6.5.0-27-generic #28~22.04.1+TEST2060919v20240419b1-Ubuntu SMP PREEMPT_DYNAMIC Fr
$ sudo mount -t cifs -o username=jan,uid=1001,gid=1001 //192.168.8.3/home testhome/
$ nautilus testhome
everything works in nautilus and with cp :-)

Thank you very much, Matthew Ruffell (mruffell).

With the current kernel, only the copy function of the nautilus file manager is affected here!
That is, I can use cp in a terminal without getting a stack trace with kernel linux-image-6.5.0-28-generic.

When I do this:
$ uname -rv
6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2
$ sudo mount -t cifs -o username=ubuntu,uid=$(id -u),gid=$(id -g) //192.168.8.3/home testhome/
$ nautilus testhome
and copy any file in testhome to testhome, I get the attached stack trace.

Peter Mühlenpfordt (muehlenp) wrote :

Many thanks for the successful investigation!
Running the ppa test kernel I can't reproduce the problem anymore. :-)

I use two cifs servers running Debian 11, with a couple of shares. With both standard kernels (6.5.0-27, 6.5.0-28) copying inside a share fails immediately on any server.
My fstab mounts look like this:
//server/store /mnt/store cifs credentials=/home/xxx/.smbpass,uid=1000,gid=1000,_netdev 0 0

fprietog (fprietog) wrote :

@mruffell

I've noticed that the users who have tested your kernel have something in common: the remote system mounted as cifs isn't an Ubuntu server, and its kernel seems to be older than the Ubuntu client's. Maybe that can help you reproduce the problem yourself.

Peter Mühlenpfordt (muehlenp) wrote :

I can reproduce the error with two freshly installed VMs.

Server
- Debian 12 installed from debian-12.5.0-amd64-netinst.iso with current updates
- kernel "Linux cifstest-server 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux"
- cifs share as in #14

Client
- Ubuntu 23.10 installed from ubuntu-23.10.1-desktop-amd64.iso with current updates
- kernel "Linux cifs-client 6.5.0-28-generic #29-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 28 23:46:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux"
- cifs mount as in #14
- cd ~/share; touch abc; cp abc xyz -> "Killed"
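
The call traces in this bug go through ioctl_file_clone() into cifs_remap_file_range(), so a direct clone request may be a more deterministic trigger than a file manager. The sketch below issues ioctl(FICLONE) between two paths; try_clone is a hypothetical helper name, and whether this alone reproduces the oops on an affected kernel is an assumption that has not been verified in this thread. Run it only in a disposable test VM.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/*
 * Hypothetical helper: ask the kernel to clone src into dst with
 * ioctl(FICLONE). Returns 0 on success or a negative errno value.
 */
int try_clone(const char *src, const char *dst)
{
    int rc = 0;

    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -errno;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        rc = -errno;
        close(in);
        return rc;
    }

    if (ioctl(out, FICLONE, in) < 0)
        rc = -errno;  /* e.g. EOPNOTSUPP where cloning is unsupported */

    close(out);
    close(in);
    return rc;
}

#ifdef DEMO_MAIN  /* build with: cc -DDEMO_MAIN clone.c -o clone */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
        return 2;
    }

    int rc = try_clone(argv[1], argv[2]);
    printf("FICLONE: %s\n", rc ? strerror(-rc) : "ok");
    return rc ? 1 : 0;
}
#endif
```

With the demo main built in, something like "./clone abc xyz" inside the cifs mount would mirror the "cp abc xyz" step above; on filesystems without clone support the ioctl is expected to fail with an error instead.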

no longer affects: linux-raspi (Ubuntu)
no longer affects: linux-hwe-6.5 (Ubuntu)
no longer affects: linux-azure (Ubuntu)
Changed in linux (Ubuntu):
assignee: Jose Ogando Justo (joseogando) → nobody
status: Confirmed → Invalid
Changed in linux (Ubuntu Mantic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Matthew Ruffell (mruffell)
summary: - Remote filesystems mounted as CIFS not working after update to Kernel
- "6.5.0-27-generic #28-Ubuntu" (amd64) or Kernel "6.5.0-1014-raspi
- #17-Ubuntu" (aarch64).
+ cifs: Copying file to same directory results in page fault
description: updated
Matthew Ruffell (mruffell) wrote :

Hi everyone,

Thanks for helping to test the test kernel, and I am glad that it fixes the issue.

I have written a SRU template and have set it as the description of the bug.

I have submitted the revert to the Kernel team mailing list for SRU:

Cover Letter:
https://lists.ubuntu.com/archives/kernel-team/2024-April/150433.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2024-April/150434.html

I'll go and have a talk to the kernel team now. This is a regression, yes, but I am not sure if they will be interested in respinning the 2024.04.01 SRU cycle to include the fix, due to it being quite late in the cycle. https://kernel.ubuntu.com/

We just did an emergency respin last week to fix bug 2060780, which was another cifs regression for the 5.15 series, with an unrelated cause.

I'll write back with what the kernel team says, and hopefully give you an estimated time of release, whether it's this SRU cycle or the next.

Thanks,
Matthew

PS: Peter, I tried Debian 12.5 and I still can't reproduce the issue.

Roxana Nicolescu (roxanan) wrote :

Hi all,

This fix should be released in ~2 weeks.

Changed in linux (Ubuntu Mantic):
status: In Progress → Fix Committed
windracer (windracer) wrote :

If I upgrade to 24.04 which just came out today with the 6.8 kernel, would that also fix this?

fprietog (fprietog) wrote :

> If I upgrade to 24.04 which just came out today with the 6.8 kernel, would that also fix this?

It seems that kernel doesn't have this problem, so yes, upgrading should fix it.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.5.0-35.35 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux' to 'verification-done-mantic-linux'. If the problem still exists, change the tag 'verification-needed-mantic-linux' to 'verification-failed-mantic-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-v2 verification-needed-mantic-linux
Matthew Ruffell (mruffell) wrote :

Hi everyone,

The kernel team has built the patch into the s2024.04.1 SRU cycle, as 6.5.0-35-generic.

An ETA for this is the week of 13th of May, as per https://kernel.ubuntu.com/

Would anyone be able to help test the kernel by installing 6.5.0-35-generic from -proposed and doing a cifs mount and copying files?

Currently only the Mantic kernel is built. The HWE kernel for Jammy hasn't been built yet, but I will write back once it is available.

The kernel is currently in the kernel team's proposed2 ppa, instead of the usual -proposed pocket, but you can have a look here:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed2

Instructions to install (On a mantic system):
1) sudo add-apt-repository ppa:canonical-kernel-team/proposed2
2) sudo apt update
3) sudo apt install linux-image-6.5.0-35-generic linux-modules-6.5.0-35-generic linux-modules-extra-6.5.0-35-generic linux-headers-6.5.0-35-generic
4) sudo rm /etc/apt/sources.list.d/canonical-kernel-team-ubuntu-proposed2-mantic.sources
5) sudo apt update
6) sudo reboot
7) uname -rv
6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024
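
The check in step 7 can also be scripted. The following is a minimal sketch, assuming the release string printed by `uname -r` has the form `6.5.0-<ABI>-<flavour>`; `is_fixed_65` is a hypothetical helper name, not part of any Ubuntu tooling. Kernels outside the 6.5.0 series (such as 6.8) are reported as "not applicable" rather than judged.

```shell
#!/bin/sh
# Sketch only: is_fixed_65 is a hypothetical helper, not an Ubuntu tool.
# Returns 0 if the release string is a 6.5.0 kernel with ABI >= 35,
# 1 if it is an older 6.5.0 kernel, and 2 if the check does not apply.
is_fixed_65() {
    release=$1
    case $release in
        6.5.0-*) ;;
        *) return 2 ;;            # not a 6.5.0 kernel
    esac
    abi=${release#6.5.0-}         # "35-generic"
    abi=${abi%%-*}                # "35"
    case $abi in
        ''|*[!0-9]*) return 2 ;;  # unexpected format
    esac
    [ "$abi" -ge 35 ]
}

rc=0
is_fixed_65 "$(uname -r)" || rc=$?
case $rc in
    0) echo "running kernel is 6.5.0-35 or later in the 6.5.0 series" ;;
    1) echo "running kernel is a 6.5.0 build older than 6.5.0-35" ;;
    *) echo "running kernel is not a 6.5.0 kernel; check not applicable" ;;
esac
```
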

Let me know if this kernel fixes the issue for you, and if it does, I'll mark the bug as verified.

Thanks,
Matthew

fprietog (fprietog) wrote :

Test done. Kernel 6.5.0-35-generic works as expected and solves the problem:

root@fpgmsi:~# uname -rv
6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024
root@fpgmsi:~# mount -t cifs -o username=fprietog //192.168.1.10/FPGNAS/ /mnt/share
Password for fprietog@//192.168.1.10/FPGNAS/:
root@fpgmsi:~# cd /mnt/share/tmp
root@fpgmsi:/mnt/share/tmp# touch file1.txt
root@fpgmsi:/mnt/share/tmp# cp file1.txt file2.txt
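
For anyone repeating this test, the same steps can be wrapped in a small function. This is only a sketch: `same_dir_copy_test` is a hypothetical name, and the directory passed to it is assumed to be on an already mounted cifs share (for example /mnt/share/tmp as above).

```shell
#!/bin/sh
# Sketch only: same_dir_copy_test is a hypothetical helper.
# It creates file1.txt in the given directory and copies it to
# file2.txt in that same directory, mirroring the manual test.
same_dir_copy_test() {
    dir=$1
    src="$dir/file1.txt"
    dst="$dir/file2.txt"
    touch "$src" || return 1
    cp "$src" "$dst" || return 1   # cp is the step that gets killed on affected kernels
    [ -f "$dst" ]                  # succeed only if the copy exists
}
```

Calling `same_dir_copy_test /mnt/share/tmp && echo "copy OK"` should print "copy OK" on a fixed kernel. On an affected kernel the cp process is killed, so the function would be expected to return non-zero.
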

Thank you very much!

tags: added: verification-done-mantic-linux
removed: verification-needed-mantic-linux
Peter Mühlenpfordt (muehlenp) wrote :

I tested two systems with different samba servers and can no longer reproduce the error.
Kernel 6.5.0-35 works for me.
Many thanks! :-)

schnebeck (thorsten-schnebeck) wrote :

On Jammy 22.04, linux-image-6.5.0-28-generic (6.5.0-28.29~22.04.1) is also affected by this error:

Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945190] BUG: unable to handle page fault for address: fffffffffffffffe
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945207] #PF: supervisor read access in kernel mode
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945215] #PF: error_code(0x0000) - not-present page
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945222] PGD 25923f067 P4D 25923f067 PUD 259241067 PMD 0
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945240] Oops: 0000 [#1] PREEMPT SMP PTI
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945252] CPU: 1 PID: 39052 Comm: KIO::WorkerThre Tainted: P OE 6.5.0-28-generic #29~22.04.1-Ubuntu
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945264] Hardware name: Dell Inc. Precision 3630 Tower/0NNNCT, BIOS 2.15.0 07/04/2022
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945271] RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945504] Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 c9 d1 4f e9 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945514] RSP: 0018:ffffadb0875e3cc8 EFLAGS: 00010282
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945524] RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945532] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945539] RBP: ffffadb0875e3cf8 R08: 0000000000000000 R09: 0000000000000000
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945545] R10: 000000000000d3b4 R11: 0000000000000000 R12: 000000000000d3b4
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945552] R13: ffffadb0875e3d40 R14: ffff937a81d2dda0 R15: ffffadb0875e3d38
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945559] FS: 00007b76137fe640(0000) GS:ffff937beba40000(0000) knlGS:0000000000000000
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945568] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945575] CR2: fffffffffffffffe CR3: 00000001683fa004 CR4: 00000000003706e0
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945589] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945596] Call Trace:
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945602] <TASK>
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945611] ? show_regs+0x6d/0x80
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945630] ? __die+0x24/0x80
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945645] ? page_fault_oops+0x99/0x1b0
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945662] ? kernelmode_fixup_or_oops+0xb2/0x140
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945679] ? __bad_area_nosemaphore+0x1a5/0x2c0
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945692] ? alloc_skb_with_frags+0x4a/0x280
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945714] ? bad_area_nosemaphore+0x16/0x30
Apr 30 08:18:25 pzh-pc-sk3 kernel: [73202.945723] ? do_kern_a...
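
To check whether a system has hit the same fault, the kernel log can be searched for the RIP line shown above. A minimal sketch, assuming the log text is supplied on standard input; `has_cifs_oops` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: has_cifs_oops is a hypothetical helper.
# Reads kernel log text on stdin and succeeds if it contains an
# oops whose RIP line points into cifs_flush_folio, as in this report.
has_cifs_oops() {
    grep -q 'RIP: .*cifs_flush_folio'
}
```

One possible use is `journalctl -k | has_cifs_oops && echo "cifs_flush_folio oops found"`. Reading the kernel journal may require elevated privileges on some systems.
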

Pgenest (genpas) wrote :

It's ok for me too.
Tested with the new kernel and everything works as expected.
Thank you very much.

$ uname -rv
6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024
