Mounting LVM snapshots with XFS can hit a kernel BUG in the nvme driver
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Xenial | Fix Released | Undecided | Heitor Alves de Siqueira | |
Bug Description
[Impact]
When mounting LVM snapshots using xfs, it's possible to hit a BUG_ON() in the nvme driver.
Upstream commit 729204ef49ec ("block: relax check on sg gap") introduced a way to merge bios if they are physically contiguous. This can lead to issues if a request starts with a non-aligned buffer, as the merged segment can then end on an unaligned virtual boundary. On some AWS instances, it's possible to craft such a request when attempting to mount LVM snapshots using xfs. This then triggers a kernel oops due to a BUG_ON() in nvme_setup_prps(), which checks that dma_len is aligned to the page size.
[Fix]
Upstream commit 5a8d75a1b8c9 ("block: fix bio_will_gap() for first bvec with offset") prevents requests that begin with an unaligned buffer from being merged.
[Test Case]
This has been verified on AWS with c5d.large instances:
1) Prepare the LVM device + snapshot
$ sudo vgcreate vg0 /dev/nvme1n1
$ sudo lvcreate -L5G -n data0 vg0
$ sudo mkfs.xfs /dev/vg0/data0
$ sudo mount /dev/vg0/data0 /mnt
$ sudo touch /mnt/test
$ sudo touch /mnt/test2
$ sudo ls /mnt
$ sudo umount /mnt
$ sudo lvcreate -l100%FREE -s /dev/vg0/data0 -n data0_snap
2) Attempting to mount the previously created snapshot results in the Oops:
$ sudo mount /dev/vg0/data0_snap /mnt
Segmentation fault (core dumped)
[Regression Potential]
The fix prevents some bios from being merged, so it can have a performance impact in certain scenarios. The patch only targets misaligned segments, so the impact should be less noticeable in the general case.
The commit is also present in mainline kernels since 4.13, and hasn't been changed significantly, so potential for other regressions should be low.
Changed in linux (Ubuntu):
assignee: Heitor Alves de Siqueira (halves) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
assignee: nobody → Heitor Alves de Siqueira (halves)
description: updated
Changed in linux (Ubuntu Xenial):
status: Confirmed → Fix Committed
For reference, the kernel spew of the BUG_ON:
[ 78.354129] kernel BUG at /home/ubuntu/xenial-aws/drivers/nvme/host/pci.c:619!
[ 78.357297] invalid opcode: 0000 [#1] SMP
[ 78.359613] Modules linked in: dm_snapshot dm_bufio xfs ppdev serio_raw parport_pc 8250_fintek parport i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ena
[ 78.387878] CPU: 0 PID: 1687 Comm: mount Not tainted 4.4.0-1105-aws #116
[ 78.390837] Hardware name: Amazon EC2 c5d.large/, BIOS 1.0 10/16/2017
[ 78.393692] task: ffff8800bb155400 ti: ffff8800b93bc000 task.ti: ffff8800b93bc000
[ 78.396973] RIP: 0010:[<ffffffff815dbd06>] [<ffffffff815dbd06>] nvme_queue_rq+0x8c6/0xa60
[ 78.400787] RSP: 0018:ffff8800b93bf7c8 EFLAGS: 00010286
[ 78.403151] RAX: 0000000000000078 RBX: 0000000000001000 RCX: 0000000000001000
[ 78.406276] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
[ 78.409390] RBP: ffff8800b93bf8a8 R08: ffff8800b916c700 R09: 0000000000001000
[ 78.412518] R10: 000000000001ec00 R11: ffff8800b8e30000 R12: 00000000fffffc00
[ 78.417056] R13: 0000000000000010 R14: 000000000000fc00 R15: 0000000035fd5000
[ 78.421581] FS: 00007f30fe043840(0000) GS:ffff880130a00000(0000) knlGS:0000000000000000
[ 78.427884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.431827] CR2: 00007f57d4057889 CR3: 0000000035974000 CR4: 0000000000360670
[ 78.436322] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 78.440821] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 78.445316] Stack:
[ 78.447706] ffff880036009480 ffff880036009700 ffff8800b7782800 0000000000000ff8
[ 78.454583] ffff8800b8e30420 ffff8800360a9400 ffff88000001fc00 ffff8800b7697b00
[ 78.461462] ffff880100001000 ffff8800b8e30000 ffff88003604c000 00000001ffc00400
[ 78.468332] Call Trace:
[ 78.470921] [<ffffffff813e6617>] blk_mq_make_request+0x407/0x550
[ 78.475001] [<ffffffff813d8f14>] generic_make_request+0x114/0x2d0
[ 78.479110] [<ffffffff813d0371>] ? bvec_alloc+0x91/0x100
[ 78.482936] [<ffffffff813d9146>] submit_bio+0x76/0x160
[ 78.486680] [<ffffffffc0347a14>] _xfs_buf_ioapply+0x2e4/0x4a0 [xfs]
[ 78.490866] [<ffffffff810b22e0>] ? wake_up_q+0x70/0x70
[ 78.494601] [<ffffffffc0349c94>] ? xfs_bwrite+0x24/0x60 [xfs]
[ 78.498583] [<ffffffffc034975d>] xfs_buf_submit_wait+0x5d/0x230 [xfs]
[ 78.502861] [<ffffffffc0349c94>] xfs_bwrite+0x24/0x60 [xfs]
[ 78.506785] [<ffffffffc037108f>] xlog_bwrite+0x7f/0x100 [xfs]
[ 78.510787] [<ffffffffc0371f34>] xlog_write_log_records+0x1a4/0x230 [xfs]
[ 78.515192] [<ffffffffc0372077>] xlog_clear_stale_blocks+0xb7/0x1b0 [xfs]
[ 78.519596] [<ffffffffc037198f>] ? xlog_bread+0x3f/0x50 [xfs]
[ 78.523588] [<ffffffffc03765eb>] xlog_find_tail+0x2db/0x3b0 [xfs]
[ 78.527705] [<ffffffffc03766ed>] xlog_recover+0x2d/0x160 [xfs]
[ 78.531720] [<ffffffff...