Activity log for bug #2003714

Date Who What changed Old value New value Message
2023-01-23 13:44:00 Tim Gardner bug added bug
2023-01-23 13:44:17 Tim Gardner affects linux (Ubuntu) linux-azure (Ubuntu)
2023-01-23 13:44:17 Tim Gardner linux-azure (Ubuntu): importance Undecided High
2023-01-23 13:44:17 Tim Gardner linux-azure (Ubuntu): status New In Progress
2023-01-23 13:44:17 Tim Gardner linux-azure (Ubuntu): assignee Tim Gardner (timg-tpi)
2023-01-23 13:49:37 Tim Gardner description
Old value: same text as the new value below, but without the [Test Plan] section.
New value:

SRU Justification

[Impact]

Microsoft TDX-enabled hypervisors cause a segfault due to an upstream glibc bug. This can be worked around with a kernel patch.

Issue Description:
When I start an Intel TDX Ubuntu 22.04 (or RHEL 9.0) guest on Hyper-V, the guest always hits segfaults and can’t boot up. Here the kernel running in the guest is the upstream kernel + my TDX patchset, or the 5.19.0-azure kernel + the same TDX patchset:

[Fix]

We confirmed the segfault also happens to TDX guests on the KVM hypervisor. After I checked with more Intel folks, it turns out this is indeed a glibc bug (https://sourceware.org/bugzilla/show_bug.cgi?id=28784), which has been fixed in upstream glibc, but Ubuntu 22.04 and newer haven’t picked up the glibc fix yet. I got a temporary kernel-side workaround from Intel: https://github.com/dcui/tdx/commit/16218cf73491e867fd39c16c9e4b8aa926cbda68, which is on the same existing branch “decui/upstream-kinetic-22.10/master-next/1209”.

    [ 21.081453] Run /inits init process
    [ 21.086896] with arguments:
    [ 21.095790] /init
    [ 21.100982] with environment:
    [ 21.106611] HOME=/
    [ 21.112463] TERM=linux
    [ 21.119850] BOOT_IMAGE=/boot/vmlinuz-6.1.0-rc7-decui+
    Loading, please wait...
    Starting version 249.11-0ubuntu3.6
    [ 21.253908] udevadm[144]: segfault at 56538d61e0c0 ip 00007f8f5899efeb sp 00007ffd08fb7648 error 6 in libc.so.6[7f8f58820000+195000] likely on CPU 0 (core 0, socket 0)
    [ 21.316549] Code: 07 62 e1 7d 48 e7 4f 01 62 e1 7d 48 e7 67 40 62 e1 7d 48 e7 6f 41 62 61 7d 48 e7 87 00 20 00 00 62 61 7d 48 e7 8f 40 20 00 00 <62> 61 7d 48 e7 a7 00 30 00 00 62 61 7d 48 e7 af 40 30 00 00 48 83
    Segmentation fault
    [ 22.499317] setfont[153]: segfault at 55ef3b91b000 ip 00007f5899899fa4 sp 00007ffc8008f628 error 4 in libc.so.6[7f589971b000+195000] likely on CPU 0 (core 0, socket 0)
    [ 22.602677] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83
    [ 22.732413] loadkeys[156]: segfault at 563ffe292000 ip 00007fbff957afa4 sp 00007ffe31453808 error 4 in libc.so.6[7fbff93fc000+195000] likely on CPU 0 (core 0, socket 0)
    [ 22.833061] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83

The segfault only happens with recent glibc versions (e.g. v2.35 in Ubuntu 22.04, and v2.34 in RHEL 9.0). It doesn’t happen with v2.31 in Ubuntu 20.04, or v2.32 in Ubuntu 20.10. So something in glibc must have changed between v2.32 (good) and v2.34+ (not working for TDX).

The oddity is: when I run the same Ubuntu 22.04/RHEL 9.0 image as a regular non-TDX guest, the segfault never happens.

If I boot up an Ubuntu 20.04 TDX guest (which works fine), mount an Ubuntu 22.04 VHD image (“mount /dev/sdd1 /mnt”) and try to run “chroot /mnt”, I hit the same segfault:

    [ 109.478556] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Quota mode: none.
    [ 129.224444] bash[2112]: segfault at 556987854000 ip 00007f88468c4ea4 sp 00007ffc22ecf158 error 6 in libc.so.6[7f8846828000+195000] likely on CPU 48 (core 0, socket 48)
    [ 129.242434] Code: e7 bf 30 10 00 00 66 44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10 20 00 00 66 44 0f e7 97 20 20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44 0f e7 a7 00 30 00 00 66 44 0f e7 af 10 30 00 00 66 44 0f e7

It looks like the application is referencing a memory location that somehow triggers a page fault, which is converted to a SIGSEGV signal, which terminates the application (I’m not sure where the “movntdq” instructions below come from):

    root@decui-u2004-u28:/opt/linus-0824# echo 'Code: e7 bf 30 10 00 00 66 44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10 20 00 00 66 44 0f e7 97 20 20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44 0f e7 a7 00 30 00 00 66 44 0f e7 af 10 30 00 00 66 44 0f e7' | scripts/decodecode
    Code: e7 bf 30 10 00 00 66 44 0f e7 87 00 20 00 00 66 44 0f e7 8f 10 20 00 00 66 44 0f e7 97 20 20 00 00 66 44 0f e7 9f 30 20 00 00 <66> 44 0f e7 a7 00 30 00 00 66 44 0f e7 af 10 30 00 00 66 44 0f e7
    All code
    ========
       0: e7 bf                    out %eax,$0xbf
       2: 30 10                    xor %dl,(%rax)
       4: 00 00                    add %al,(%rax)
       6: 66 44 0f e7 87 00 20     movntdq %xmm8,0x2000(%rdi)
       d: 00 00
       f: 66 44 0f e7 8f 10 20     movntdq %xmm9,0x2010(%rdi)
      16: 00 00
      18: 66 44 0f e7 97 20 20     movntdq %xmm10,0x2020(%rdi)
      1f: 00 00
      21: 66 44 0f e7 9f 30 20     movntdq %xmm11,0x2030(%rdi)
      28: 00 00
      2a:* 66 44 0f e7 a7 00 30    movntdq %xmm12,0x3000(%rdi) <-- trapping instruction
      31: 00 00
      33: 66 44 0f e7 af 10 30     movntdq %xmm13,0x3010(%rdi)
      3a: 00 00
      3c: 66                       data16
      3d: 44                       rex.R
      3e: 0f                       .byte 0xf
      3f: e7                       .byte 0xe7

    Code starting with the faulting instruction
    ===========================================
       0: 66 44 0f e7 a7 00 30     movntdq %xmm12,0x3000(%rdi)
       7: 00 00
       9: 66 44 0f e7 af 10 30     movntdq %xmm13,0x3010(%rdi)
      10: 00 00
      12: 66                       data16
      13: 44                       rex.R
      14: 0f                       .byte 0xf
      15: e7                       .byte 0xe7

After I add a delay of “sleep 2 minutes” in the kernel’s arch/x86/mm/fault.c: show_signal_msg(), it turns out that the application is trying to write to the end of the heap area (which doesn’t seem to be mapped in the process’s address space), and the segfault is triggered:

    [ 129.224444] bash[2112]: segfault at 556987854000 ip 00007f88468c4ea4 sp 00007ffc22ecf158 error 6 in libc.so.6[7f8846828000+195000] likely on CPU 48 (core 0, socket 48)

    root@decui-u2004-u28:/proc/2112# cat maps
    5569874a9000-5569874d8000 r--p 00000000 08:31 1582 /mnt/usr/bin/bash
    5569874d8000-5569875b7000 r-xp 0002f000 08:31 1582 /mnt/usr/bin/bash
    5569875b7000-5569875f1000 r--p 0010e000 08:31 1582 /mnt/usr/bin/bash
    5569875f2000-5569875f6000 r--p 00148000 08:31 1582 /mnt/usr/bin/bash
    5569875f6000-5569875ff000 rw-p 0014c000 08:31 1582 /mnt/usr/bin/bash
    5569875ff000-55698760a000 rw-p 00000000 00:00 0
    556987833000-556987854000 rw-p 00000000 00:00 0 [heap]
    7f8846400000-7f88466e9000 r--p 00000000 08:31 6124 /mnt/usr/lib/locale/locale-archive
    7f8846800000-7f8846828000 r--p 00000000 08:31 4966 /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
    7f8846828000-7f88469bd000 r-xp 00028000 08:31 4966 /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
    7f88469bd000-7f8846a15000 r--p 001bd000 08:31 4966 /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
    7f8846a15000-7f8846a19000 r--p 00214000 08:31 4966 /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
    7f8846a19000-7f8846a1b000 rw-p 00218000 08:31 4966 /mnt/usr/lib/x86_64-linux-gnu/libc.so.6
    7f8846a1b000-7f8846a28000 rw-p 00000000 00:00 0
    7f8846b09000-7f8846b10000 r--s 00000000 08:31 3841 /mnt/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
    7f8846b10000-7f8846b13000 rw-p 00000000 00:00 0
    7f8846b13000-7f8846b21000 r--p 00000000 08:31 4729 /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
    7f8846b21000-7f8846b32000 r-xp 0000e000 08:31 4729 /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
    7f8846b32000-7f8846b40000 r--p 0001f000 08:31 4729 /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
    7f8846b40000-7f8846b44000 r--p 0002c000 08:31 4729 /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
    7f8846b44000-7f8846b45000 rw-p 00030000 08:31 4729 /mnt/usr/lib/x86_64-linux-gnu/libtinfo.so.6.3
    7f8846b4b000-7f8846b4d000 rw-p 00000000 00:00 0
    7f8846b4d000-7f8846b4f000 r--p 00000000 08:31 4960 /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7f8846b4f000-7f8846b79000 r-xp 00002000 08:31 4960 /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7f8846b79000-7f8846b84000 r--p 0002c000 08:31 4960 /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7f8846b85000-7f8846b87000 r--p 00037000 08:31 4960 /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7f8846b87000-7f8846b89000 rw-p 00039000 08:31 4960 /mnt/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ffc22eb1000-7ffc22ed2000 rw-p 00000000 00:00 0 [stack]
    7ffc22fcd000-7ffc22fd1000 r--p 00000000 00:00 0 [vvar]
    7ffc22fd1000-7ffc22fd3000 r-xp 00000000 00:00 0 [vdso]

[Test Plan]

Microsoft tested

[Where things could go wrong]

TDX is a new feature and is unlikely to have regressions.
2023-01-23 13:52:00 Tim Gardner description Old value and new value both repeat the description text recorded in the 2023-01-23 13:49:37 entry, including the [Test Plan] section; no textual difference between the two is visible in this log.
2023-01-24 20:27:35 Dexuan Cui bug added subscriber Dexuan Cui
2023-01-24 20:30:03 Dexuan Cui bug watch added https://sourceware.org/bugzilla/show_bug.cgi?id=28784
2023-01-24 20:30:03 Dexuan Cui bug watch added https://sourceware.org/bugzilla/show_bug.cgi?id=30037
2023-01-31 22:38:12 Ubuntu Kernel Bot tags kernel-spammed-kinetic-linux-azure verification-needed-kinetic
2023-02-01 14:31:19 Tim Gardner tags kernel-spammed-kinetic-linux-azure verification-needed-kinetic kernel-spammed-kinetic-linux-azure verification-done-kinetic
2023-02-13 12:34:26 Launchpad Janitor linux-azure (Ubuntu): status In Progress Fix Released
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3524
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3564
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3565
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3566
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3567
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3594
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3621
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-3643
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-42896
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-4378
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-43945
2023-02-13 12:34:26 Launchpad Janitor cve linked 2022-45934
2023-07-19 13:12:19 Ubuntu Kernel Bot tags kernel-spammed-kinetic-linux-azure verification-done-kinetic kernel-spammed-kinetic-linux-azure kernel-spammed-lunar-linux-azure verification-done-kinetic verification-needed-lunar
2023-07-19 13:44:47 Tim Gardner tags kernel-spammed-kinetic-linux-azure kernel-spammed-lunar-linux-azure verification-done-kinetic verification-needed-lunar kernel-spammed-kinetic-linux-azure kernel-spammed-lunar-linux-azure verification-done-kinetic verification-done-lunar
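
The observations in the bug description (the fault address sits at the end of the heap, and the faults are in libc) can be re-derived from the quoted log lines alone. The following is a minimal sketch, not part of the original report: the helper names (parse_segfault, decode_error_code, find_mapping) are hypothetical, the input strings are copied from the description with the timestamp and the "likely on CPU" suffix dropped, and the error field is assumed to be hexadecimal. The three error-code bits used are the standard x86 page-fault bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# The four "segfault at" lines quoted in the description.
SEGFAULT_LINES = [
    "udevadm[144]: segfault at 56538d61e0c0 ip 00007f8f5899efeb sp 00007ffd08fb7648 error 6 in libc.so.6[7f8f58820000+195000]",
    "setfont[153]: segfault at 55ef3b91b000 ip 00007f5899899fa4 sp 00007ffc8008f628 error 4 in libc.so.6[7f589971b000+195000]",
    "loadkeys[156]: segfault at 563ffe292000 ip 00007fbff957afa4 sp 00007ffe31453808 error 4 in libc.so.6[7fbff93fc000+195000]",
    "bash[2112]: segfault at 556987854000 ip 00007f88468c4ea4 sp 00007ffc22ecf158 error 6 in libc.so.6[7f8846828000+195000]",
]

# Three consecutive lines from the quoted /proc/2112/maps dump, around [heap].
MAPS_EXCERPT = """\
5569875ff000-55698760a000 rw-p 00000000 00:00 0
556987833000-556987854000 rw-p 00000000 00:00 0 [heap]
7f8846400000-7f88466e9000 r--p 00000000 08:31 6124 /mnt/usr/lib/locale/locale-archive
"""

SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Split one segfault line into its fields; numeric fields become ints."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line: " + line)
    fields = match.groupdict()
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        fields[key] = int(fields[key], 16)  # assumption: all printed in hex
    fields["pid"] = int(fields["pid"])
    return fields


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # False: no page mapped at the address
        "write": bool(code & 0x2),         # True: write access, False: read
        "user_mode": bool(code & 0x4),     # True: fault raised from user mode
    }


def find_mapping(addr, maps_text):
    """Return the maps line whose [start, end) range holds addr, else None."""
    for line in maps_text.splitlines():
        start, end = (int(part, 16) for part in line.split()[0].split("-"))
        if start <= addr < end:
            return line
    return None


for text in SEGFAULT_LINES:
    fault = parse_segfault(text)
    # Offset of the faulting instruction inside the executable libc mapping.
    offset = fault["ip"] - fault["base"]
    print(fault["comm"], hex(offset), decode_error_code(fault["error"]))

bash_fault = parse_segfault(SEGFAULT_LINES[3])
print("bash fault address mapped:", find_mapping(bash_fault["addr"], MAPS_EXCERPT))
```

For the bash case, the fault address 556987854000 is exactly the exclusive end of the [heap] range and falls in no listed mapping, which matches the description's remark that the write lands at the end of the heap. Error 6 decodes to a user-mode write to a non-present page and error 4 to a user-mode read. The setfont and loadkeys faults share the same offset inside the libc mapping; resolving any of these offsets to a symbol would need the matching libc.so.6 binary, which is not part of this log.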