lvm2 throws segfault at 0 ip 00007f6372655981 sp 00007fffc5cf71e0 error 4 in libc.so.6

Bug #785124 reported by Thomas Schweikle on 2011-05-19
This bug affects 1 person
Affects: lvm2 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: lvm2

Here is what the kernel logs give back:
kern.log:May 16 10:51:32 vh01 kernel: [1055964.887657] general protection fault: 0000 [#1] SMP
kern.log:May 16 10:51:32 vh01 kernel: [1055965.157246] general protection fault: 0000 [#2] SMP
kern.log:May 16 10:51:32 vh01 kernel: [1055965.157419] Fixing recursive fault but reboot is needed!
kern.log:May 16 22:31:06 vh01 kernel: [ 0.000000] MTRR default type: uncachable
kern.log:May 16 22:31:06 vh01 kernel: [ 0.000009] pid_max: default: 32768 minimum: 301
kern.log:May 16 22:31:06 vh01 kernel: [ 3.871719] NetLabel: unlabeled traffic allowed by default
kern.log:May 16 22:31:06 vh01 kernel: [ 3.892425] PCI: CLS 64 bytes, default 64
kern.log:May 16 22:31:06 vh01 kernel: [ 3.911503] io scheduler deadline registered (default)
kern.log:May 16 22:31:06 vh01 kernel: [ 6.993155] lvm[318]: segfault at 0 ip 00007f6372655981 sp 00007fffc5cf71e0 error 4 in libc.so.6[7f63725ea000+18a000]
kern.log:May 18 23:43:06 vh01 kernel: [176856.402064] general protection fault: 0000 [#1] SMP
kern.log:May 19 07:48:01 vh01 kernel: [ 0.000000] MTRR default type: uncachable
kern.log:May 19 07:48:01 vh01 kernel: [ 0.000009] pid_max: default: 32768 minimum: 301
kern.log:May 19 07:48:01 vh01 kernel: [ 3.811901] NetLabel: unlabeled traffic allowed by default
kern.log:May 19 07:48:01 vh01 kernel: [ 3.832757] PCI: CLS 64 bytes, default 64
kern.log:May 19 07:48:01 vh01 kernel: [ 3.851843] io scheduler deadline registered (default)

Any idea what was going on here?
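
As a starting point, the lvm segfault line from the log can be decoded by hand. The sketch below is only an illustration (the function name and regular expression are my own, not part of any tool). It assumes the usual x86 page-fault error bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode) and that the kernel prints the error code in hex. It also computes the offset of the faulting instruction inside libc.so.6, which could be looked up in the library's symbols.

```python
import re

# Segfault line copied from the kernel log above.
LINE = ("lvm[318]: segfault at 0 ip 00007f6372655981 sp 00007fffc5cf71e0 "
        "error 4 in libc.so.6[7f63725ea000+18a000]")

# Hypothetical pattern for this message layout; all numbers are hex except the pid.
PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault message into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction relative to the mapping base.
        "offset_in_object": ip - base,
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(hex(info["offset_in_object"]), info)
```

Read this way, "segfault at 0 ... error 4" is a user-mode read of address 0, i.e. a NULL pointer dereference inside libc called from lvm, at offset 0x6b981 into the mapped library.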

I have found some additional irregularities:
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857030] sysfs: cannot create duplicate filename '/devices/platform/GHES.9'
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857031] Modules linked in:
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857034] Pid: 1, comm: swapper Not tainted 2.6.38-8-server #42-Ubuntu
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857036] Call Trace:
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857041] [<ffffffff81065d1f>] ? warn_slowpath_common+0x7f/0xc0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857044] [<ffffffff81065e16>] ? warn_slowpath_fmt+0x46/0x50
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857047] [<ffffffff811d50c8>] ? sysfs_add_one+0xc8/0xf0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857049] [<ffffffff811d516f>] ? create_dir+0x7f/0xd0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857051] [<ffffffff811d525d>] ? sysfs_create_dir+0x7d/0xc0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857056] [<ffffffff812dbef7>] ? kobject_add_internal+0xb7/0x240
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857059] [<ffffffff812dc30d>] ? kobject_add+0x6d/0xb0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857063] [<ffffffff813b8025>] ? device_add+0x115/0x410
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857066] [<ffffffff813bca58>] ? platform_device_add+0x118/0x1f0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857070] [<ffffffff81b161d5>] ? hest_parse_ghes+0x0/0x86
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857072] [<ffffffff81b1622f>] ? hest_parse_ghes+0x5a/0x86
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857076] [<ffffffff8136653f>] ? apei_hest_parse+0x9f/0x150
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857078] [<ffffffff81b155f4>] ? acpi_pci_root_init+0x0/0x2d
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857081] [<ffffffff81b1629e>] ? hest_ghes_dev_register+0x43/0x79
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857083] [<ffffffff81b1636c>] ? acpi_hest_init+0x98/0xb6
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857085] [<ffffffff81b155fd>] ? acpi_pci_root_init+0x9/0x2d
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857090] [<ffffffff81002175>] ? do_one_initcall+0x45/0x190
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857093] [<ffffffff81ae1dff>] ? kernel_init+0x169/0x1f3
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857097] [<ffffffff8100cde4>] ? kernel_thread_helper+0x4/0x10
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857099] [<ffffffff81ae1c96>] ? kernel_init+0x0/0x1f3
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857101] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857106] ---[ end trace 0abc8a486e654d83 ]---
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857109] kobject_add_internal failed for GHES.9 with -EEXIST, don't try to register things with the same name in the same directory.
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857111] Pid: 1, comm: swapper Tainted: G W 2.6.38-8-server #42-Ubuntu
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857113] Call Trace:
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857115] [<ffffffff812dbfe2>] ? kobject_add_internal+0x1a2/0x240
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857118] [<ffffffff812dc30d>] ? kobject_add+0x6d/0xb0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857121] [<ffffffff813b8025>] ? device_add+0x115/0x410
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857123] [<ffffffff813bca58>] ? platform_device_add+0x118/0x1f0
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857125] [<ffffffff81b161d5>] ? hest_parse_ghes+0x0/0x86
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857128] [<ffffffff81b1622f>] ? hest_parse_ghes+0x5a/0x86
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857130] [<ffffffff8136653f>] ? apei_hest_parse+0x9f/0x150
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857132] [<ffffffff81b155f4>] ? acpi_pci_root_init+0x0/0x2d
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857134] [<ffffffff81b1629e>] ? hest_ghes_dev_register+0x43/0x79
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857137] [<ffffffff81b1636c>] ? acpi_hest_init+0x98/0xb6
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857139] [<ffffffff81b155fd>] ? acpi_pci_root_init+0x9/0x2d
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857141] [<ffffffff81002175>] ? do_one_initcall+0x45/0x190
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857144] [<ffffffff81ae1dff>] ? kernel_init+0x169/0x1f3
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857146] [<ffffffff8100cde4>] ? kernel_thread_helper+0x4/0x10
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857148] [<ffffffff81ae1c96>] ? kernel_init+0x0/0x1f3
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857151] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
kern.log:May 16 22:31:06 vh01 kernel: [ 3.857164] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug

And three days later:
kern.log:May 18 23:43:06 vh01 kernel: [176856.402064] general protection fault: 0000 [#1] SMP
kern.log:May 18 23:43:06 vh01 kernel: [176856.404243] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
kern.log:May 18 23:43:06 vh01 kernel: [176856.407599] CPU 9
kern.log:May 18 23:43:06 vh01 kernel: [176856.408447] Modules linked in: drbd lru_cache nfs lockd fscache nfs_acl auth_rpcgss sunrpc ip6table_filter ip6_tables ipt_REJECT xt_CHECKSUM iptable_mangle xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables kvm_intel kvm autofs4 vesafb bridge stp lp parport power_meter i7core_edac edac_core usbhid hid ahci libahci igb dca megaraid_sas
kern.log:May 18 23:43:06 vh01 kernel: [176856.425668]
kern.log:May 18 23:43:06 vh01 kernel: [176856.426296] Pid: 10431, comm: kvm Tainted: G W 2.6.38-8-server #42-Ubuntu FUJITSU PRIMERGY RX300 S6 /D2619
kern.log:May 18 23:43:06 vh01 kernel: [176856.432268] RIP: 0010:[<ffffffffa0100471>] [<ffffffffa0100471>] kvm_set_pte_rmapp+0x51/0x130 [kvm]
kern.log:May 18 23:43:06 vh01 kernel: [176856.436151] RSP: 0018:ffff8802a0371778 EFLAGS: 00010202
kern.log:May 18 23:43:06 vh01 kernel: [176856.438400] RAX: 0000880af9a827f8 RBX: 0000880af9a827f8 RCX: ffffffffa0100420
kern.log:May 18 23:43:06 vh01 kernel: [176856.441428] RDX: ffff8802a0371838 RSI: 0000000000000000 RDI: 0000880af9a827f8
kern.log:May 18 23:43:06 vh01 kernel: [176856.441429] RBP: ffff8802a03717b8 R08: 0000000000000001 R09: 0000000000000100
kern.log:May 18 23:43:06 vh01 kernel: [176856.441430] R10: 0000000000000002 R11: 0000000000000000 R12: ffff
May 19 07:48:01 vh01 kernel: imklog 4.6.4, log source = /proc/kmsg started.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: lvm2 2.02.66-4ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-8.42-server 2.6.38.2
Uname: Linux 2.6.38-8-server x86_64
Architecture: amd64
Date: Thu May 19 13:32:02 2011
InstallationMedia: Ubuntu-Server 10.10 "Maverick Meerkat" - Release amd64 (20101007)
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=de_DE.UTF-8
SourcePackage: lvm2
UpgradeStatus: Upgraded to natty on 2011-05-04 (15 days ago)

Thomas Schweikle (tps) wrote :

After really digging deep into this issue, I found what caused the kernel panic:
it was the creation of an additional LVM volume within the pool I had created. This pool holds volumes for the system as well as volumes for virtual machines, which is nothing unusual. I'm wondering why it took about 24h for this action to lead to a crash.

Here is what was done before:
1. Created an additional LVM volume to be assigned to a virtual machine later on.
2. Copied data onto this volume (netcat+dd from another server reachable over the network).
3. Shut down the VM in question (named afs).
4. Assigned the volume and removed the old file-based volume.
5. Started the VM again.
6. VM started OK, no problems so far.
7. 8h later: restarted libvirtd for other reasons.
8. System crashed trying to access this volume and write data to it. The crash wasn't immediate: the VM slowed down until it was inaccessible, and after restarting the VM the whole system crashed.

Here is what was not done when assigning the volume to the virtual machine:
Normally libvirtd creates two files in /etc/apparmor.d/libvirt named "libvirt-{uuid-of-virtual-machine}" and "libvirt-{uuid-of-virtual-machine}.files". The first includes the second. After creating these files, AppArmor is reloaded.

These files were not created, and AppArmor was not reloaded after the volume was assigned to the VM. Nevertheless, the process starting the VM had the rights necessary to access and write to the volume.

For reasons I could not determine (does libvirtd reload AppArmor when it is restarted?), AppArmor was reloaded about 8h later. From that moment on the VM no longer had access to its volume and crashed. That alone wasn't really fatal, since it was only one VM among a whole bunch of them, and it would have had no further consequences had the VM not been restarted. But this crash led to an lvm crash, which turned out to be fatal for the running host: it stopped working with a kernel panic. This should not happen at all!
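
The missing-profile condition described above can be checked before starting a VM. This is a minimal sketch and not part of libvirt: the function name is made up, and it assumes the per-VM profiles live under /etc/apparmor.d/libvirt (passed as a parameter, so adjust it if your system differs). It only tests whether the two files exist; it does not verify their contents or whether AppArmor has loaded them.

```python
from pathlib import Path


def missing_profile_files(vm_uuid, profile_dir="/etc/apparmor.d/libvirt"):
    """Return the per-VM AppArmor profile files that do not exist.

    profile_dir is an assumption about where the profiles are stored.
    """
    base = Path(profile_dir)
    expected = [
        base / f"libvirt-{vm_uuid}",        # main profile, includes the .files one
        base / f"libvirt-{vm_uuid}.files",  # per-VM file access rules
    ]
    return [str(p) for p in expected if not p.is_file()]
```

An empty list means both files are present; anything else names the files that would have to be created (and AppArmor reloaded) before the VM can be expected to keep access to its volume.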

Thomas Schweikle (tps) wrote :

I could verify this bug with Ubuntu 10.04.2, 10.10, 11.04, and CentOS 6. In all cases the inaccessible volume crashed lvm, which in turn caused a host kernel panic.

With file-based disks, I could only crash the one virtual machine the volume was assigned to.
It looks like there is a bad bug in how libvirtd, lvm, and AppArmor work together.

Thomas Schweikle (tps) wrote :

This bug is fixed in newer versions of libvirt. AFAIS all supported Ubuntu versions out in the wild include the necessary fix.

Thomas Schweikle (tps) wrote :

Could we please close this report?
