dd on gfs2 causes kernel oops

Bug #276641 reported by Staff Unipg on 2008-10-01
This bug affects 1 person
Affects: linux (Ubuntu); Importance: Undecided; Assigned to: Andy Whitcroft
Affects: Hardy; Importance: High; Assigned to: Tim Gardner

Bug Description

Binary package hint: linux-image-xen

There seems to be a problem with the call to gfs2_bitfit(). Relevant kern.log excerpt:

[..]
Sep 30 13:49:53 hnode1 kernel: [101482.345634] Unable to handle kernel paging request at ffff8801b0a9d000 RIP:
Sep 30 13:49:53 hnode1 kernel: [101482.345666] [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.345780] PGD 2be5067 PUD 37ec067 PMD 3972067 PTE 0
Sep 30 13:49:53 hnode1 kernel: [101482.345844] Oops: 0000 [1] SMP
Sep 30 13:49:53 hnode1 kernel: [101482.345896] CPU 2
Sep 30 13:49:53 hnode1 kernel: [101482.345939] Modules linked in: xt_tcpudp xt_physdev bridge sctp ipv6 lock_dlm gfs2 dlm configfs iptable_filter ip_tables x_tables dm_round_robin crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_si ipmi_msghandler lp loop 8250_pnp evdev 8250 serial_core parport_pc parport iTCO_wdt iTCO_vendor_support serio_raw psmouse pcspkr container button i5000_edac edac_core shpchp pci_hotplug dm_multipath ext3 jbd mbcache sr_mod cdrom ata_generic pata_acpi sg sd_mod ehci_hcd uhci_hcd ata_piix bnx2 libata floppy usbcore megaraid_sas scsi_mod e1000 dm_mirror dm_snapshot dm_mod thermal processor fan fuse
Sep 30 13:49:53 hnode1 kernel: [101482.346708] Pid: 8560, comm: dd Not tainted 2.6.24-19-xen #1
Sep 30 13:49:53 hnode1 kernel: [101482.346760] RIP: e030:[gfs2:gfs2_bitfit+0x3d/0x90] [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.346859] RSP: e02b:ffff8801a6c8f928 EFLAGS: 00010246
Sep 30 13:49:53 hnode1 kernel: [101482.346910] RAX: 0000000000000000 RBX: 5555555555555555 RCX: 0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.346994] RDX: 0000000000003de4 RSI: 0000000000000000 RDI: ffff8801b0a9d000
Sep 30 13:49:53 hnode1 kernel: [101482.347076] RBP: 0000000000000000 R08: ffff8801b0a9cff9 R09: ffff8801b0a9cff9
Sep 30 13:49:53 hnode1 kernel: [101482.347159] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801ef59f410
Sep 30 13:49:53 hnode1 kernel: [101482.347245] R13: 0000000000000005 R14: 0000000000000000 R15: ffff8801ee5210c0
Sep 30 13:49:53 hnode1 kernel: [101482.347331] FS: 00007f5b54ce96e0(0000) GS:ffffffff805c6100(0000) knlGS:0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.347418] CS: e033 DS: 0000 ES: 0000
Sep 30 13:49:53 hnode1 kernel: [101482.347464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.347547] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 30 13:49:53 hnode1 kernel: [101482.347630] Process dd (pid: 8560, threadinfo ffff8801a6c8e000, task ffff8801ed98d800)
Sep 30 13:49:53 hnode1 kernel: [101482.347716] Stack: ffff8801ee522e80 ffffffff88343973 0000000000012f00 ffff8801ee5210c0
Sep 30 13:49:53 hnode1 kernel: [101482.347820] ffff8801e6550328 ffff8801ef59f410 ffff8801eeca8c00 ffff8801ebc6e000
Sep 30 13:49:53 hnode1 kernel: [101482.347919] ffff8800e2e85bc0 ffffffff8834590b ffff8801e6550328 ffff8800e2e85bc0
Sep 30 13:49:53 hnode1 kernel: [101482.347987] Call Trace:
Sep 30 13:49:53 hnode1 kernel: [101482.348071] [gfs2:rgblk_search+0xf3/0x180] :gfs2:rgblk_search+0xf3/0x180
Sep 30 13:49:53 hnode1 kernel: [101482.348136] [gfs2:gfs2_alloc_data+0x5b/0x140] :gfs2:gfs2_alloc_data+0x5b/0x140
Sep 30 13:49:53 hnode1 kernel: [101482.348199] [gfs2:lookup_block+0x137/0x140] :gfs2:lookup_block+0x137/0x140
Sep 30 13:49:53 hnode1 kernel: [101482.348260] [gfs2:gfs2_block_map+0x247/0x3a0] :gfs2:gfs2_block_map+0x247/0x3a0
Sep 30 13:49:53 hnode1 kernel: [101482.348319] [sctp:kmem_cache_alloc+0xaf/0x710] kmem_cache_alloc+0xaf/0x120
Sep 30 13:49:53 hnode1 kernel: [101482.348375] [gfs2:alloc_buffer_head+0x58/0x180] alloc_buffer_head+0x58/0x70
Sep 30 13:49:53 hnode1 kernel: [101482.348428] [alloc_page_buffers+0x60/0xe0] alloc_page_buffers+0x60/0xe0
Sep 30 13:49:53 hnode1 kernel: [101482.348484] [__block_prepare_write+0x252/0x450] __block_prepare_write+0x252/0x450
Sep 30 13:49:53 hnode1 kernel: [101482.348546] [gfs2:gfs2_block_map+0x0/0x3a0] :gfs2:gfs2_block_map+0x0/0x3a0
Sep 30 13:49:53 hnode1 kernel: [101482.348606] [ext3:block_prepare_write+0x1a/0x30] block_prepare_write+0x1a/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.348668] [gfs2:gfs2_write_begin+0x28c/0x2e0] :gfs2:gfs2_write_begin+0x28c/0x2e0
Sep 30 13:49:53 hnode1 kernel: [101482.348728] [generic_file_buffered_write+0x149/0x6e0] generic_file_buffered_write+0x149/0x6e0
Sep 30 13:49:53 hnode1 kernel: [101482.348792] [current_fs_time+0x1e/0x30] current_fs_time+0x1e/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.348847] [__generic_file_aio_write_nolock+0x24f/0x400] __generic_file_aio_write_nolock+0x24f/0x400
Sep 30 13:49:53 hnode1 kernel: [101482.348937] [gfs2:generic_file_aio_write+0x64/0x2f0] generic_file_aio_write+0x64/0xd0
Sep 30 13:49:53 hnode1 kernel: [101482.348995] [gfs2:do_sync_write+0xd9/0x120] do_sync_write+0xd9/0x120
Sep 30 13:49:53 hnode1 kernel: [101482.349051] [<ffffffff8024cc20>] autoremove_wake_function+0x0/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.349111] [__clear_user+0x35/0x70] __clear_user+0x35/0x70
Sep 30 13:49:53 hnode1 kernel: [101482.349166] [vfs_write+0xed/0x190] vfs_write+0xed/0x190
Sep 30 13:49:53 hnode1 kernel: [101482.349219] [sys_write+0x53/0x90] sys_write+0x53/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.349273] [system_call+0x68/0x6d] system_call+0x68/0x6d
Sep 30 13:49:53 hnode1 kernel: [101482.349325] [system_call+0x0/0x6d] system_call+0x0/0x6d
Sep 30 13:49:53 hnode1 kernel: [101482.349378]
Sep 30 13:49:53 hnode1 kernel: [101482.349415]
Sep 30 13:49:53 hnode1 kernel: [101482.349415] Code: 49 39 19 74 3e 41 0f b6 00 d3 f8 83 e0 03 44 39 d0 74 29 83
Sep 30 13:49:53 hnode1 kernel: [101482.349637] RIP [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.349701] RSP <ffff8801a6c8f928>
Sep 30 13:49:53 hnode1 kernel: [101482.349745] CR2: ffff8801b0a9d000
Sep 30 13:49:53 hnode1 kernel: [101482.350439] ---[ end trace aba7f6ec817decfc ]---
Sep 30 13:51:15 hnode1 kernel: [101522.359616] eth0: port 3(vif3.0) entering disabled state
Sep 30 13:51:15 hnode1 kernel: [101522.375716] eth0: port 3(vif3.0) entering disabled state
Sep 30 13:51:35 hnode1 kernel: [101581.950657] eth0: port 2(vif4.0) entering disabled state
Sep 30 13:51:35 hnode1 kernel: [101581.979825] eth0: port 2(vif4.0) entering disabled state
Sep 30 14:34:09 hnode1 kernel: Inspecting /boot/System.map-2.6.24-19-xen
[..]

OS is Ubuntu 8.04.1.

uname -a:
Linux hnode1 2.6.24-19-xen #1 SMP Wed Aug 20 21:08:51 UTC 2008 x86_64 GNU/Linux

dmesg output (after reboot) is attached.

Kind regards

--Sergio

Staff Unipg (staff-unipg) wrote :
Sergio Tosti (zeno979) on 2008-10-02
Changed in linux:
assignee: nobody → zeno979
status: New → Confirmed
status: Confirmed → New
assignee: zeno979 → nobody
Staff Unipg (staff-unipg) wrote :

The problem sometimes happens when moving large files (>8GB).

Sergio Tosti (zeno979) wrote :

Porting the gfs2_bitfit function definition from the latest vanilla kernel resolves the problem.
Here's the patch; please include it in the Hardy kernel source.

--Sergio

Changed in linux:
status: New → Confirmed
Bryce Harrington (bryce) wrote :

Am I understanding correctly that this issue doesn't exist in Intrepid? I'll leave only the Hardy task open; if I'm wrong, please correct me.

Changed in linux:
status: Confirmed → Fix Released
Sergio Tosti (zeno979) wrote :

Correct, it affects only Hardy. Please evaluate the patch; this is a serious problem.
Regards
--Sergio

Changed in linux:
status: New → Confirmed
Martin Pitt (pitti) wrote :

Looks ok for hardy SRU.

Sergio, do you happen to have a pointer to the original upstream git commit? I understand that we can't pull that directly since it's a port, but it would nevertheless be good to refer to it for documentation and comparison.

Changed in linux:
assignee: nobody → timg-tpi
Steve Conklin (sconklin) on 2008-11-13
Changed in linux:
assignee: nobody → sconklin
status: Fix Released → In Progress
Andy Whitcroft (apw) on 2008-11-13
Changed in linux:
assignee: timg-tpi → apw
assignee: sconklin → apw
Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: kernel can oops

Patch Description: replace gfs2_bitfit with upstream version to prevent oops

Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=c3400e846d2c4568529c537be3555ff6c6108e76

Test Case: See bug description

Changed in linux:
assignee: apw → timg-tpi
milestone: none → ubuntu-8.04.4
status: Confirmed → Fix Committed
importance: Undecided → High
Andy Whitcroft (apw) wrote :

This bug is only present in v2.6.24 and thus only applies to Hardy. The patch is a backport from a mainline commit which is included in the Intrepid and Jaunty kernels. Now that Hardy is Fix Committed, the Jaunty task can go. Moved to Invalid.

Changed in linux:
status: In Progress → Invalid
Martin Pitt (pitti) wrote :

Accepted linux into hardy-proposed; please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Sergio Tosti (zeno979) wrote :

Please add a Xen custom binary of this kernel version.

Sergio Tosti (zeno979) wrote :

Whoops, please ignore the previous comment, sorry.

Steve Beattie (sbeattie) wrote :

I suspect Tim meant to target this bugfix for the upcoming 8.04.2 release, not the far-off 8.04.4.

Changed in linux:
milestone: ubuntu-8.04.4 → ubuntu-8.04.2
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-23.46

---------------
linux (2.6.24-23.46) hardy-proposed; urgency=low

  [Alessio Igor Bogani]

  * rt: Updated PREEMPT_RT support to rt21
    - LP: #302138

  [Amit Kucheria]

  * SAUCE: Update lpia patches from moblin tree
    - LP: #291457

  [Andy Whitcroft]

  * SAUCE: replace gfs2_bitfit with upstream version to prevent oops
    - LP: #276641

  [Colin Ian King]

  * isdn: Do not validate ISDN net device address prior to interface-up
    - LP: #237306
  * hwmon: (coretemp) Add Penryn CPU to coretemp
    - LP: #235119
  * USB: add support for Motorola ROKR Z6 cellphone in mass storage mode
    - LP: #263217
  * md: fix an occasional deadlock in raid5
    - LP: #208551

  [Stefan Bader]

  * SAUCE: buildenv: Show CVE entries in printchanges
  * SAUCE: buildenv: Send git-ubuntu-log informational message to stderr
  * Xen: dma: avoid unnecessarily SWIOTLB bounce buffering
    - LP: #247148
  * Update openvz patchset to apply to latest stable tree.
    - LP: #301634
  * XEN: Fix FTBS with stable updates
    - LP: #301634

  [Steve Conklin]

  * Add HID quirk for dual USB gamepad
    - LP: #140608

  [Tim Gardner]

  * Enable CONFIG_AX25_DAMA_SLAVE=y
    - LP: #257684
  * SAUCE: Correctly blacklist Thinkpad r40e in ACPI
    - LP: #278794
  * SAUCE: ALPS touchpad for Dell Latitude E6500/E6400
    - LP: #270643

  [Upstream Kernel Changes]

  * Revert "[Bluetooth] Eliminate checks for impossible conditions in IRQ
    handler"
    - LP: #217659
  * KVM: VMX: Clear CR4.VMXE in hardware_disable
    - LP: #268981
  * iov_iter_advance() fix
    - LP: #231746
  * Fix off-by-one error in iov_iter_advance()
    - LP: #231746
  * USB: serial: ch341: New VID/PID for CH341 USB-serial
    - LP: #272485
  * x86: Fix 32-bit x86 MSI-X allocation leakage
    - LP: #273103
  * b43legacy: Fix failure in rate-adjustment mechanism
    - LP: #273143
  * x86: Reserve FIRST_DEVICE_VECTOR in used_vectors bitmap.
    - LP: #276334
  * openvz: merge missed fixes from vanilla 2.6.24 openvz branch
    - LP: #298059
  * openvz: some autofs related fixes
    - LP: #298059
  * openvz: fix ve stop deadlock after nfs connect
    - LP: #298059
  * openvz: fix netlink and rtnl inside container
    - LP: #298059
  * openvz: fix wrong size of ub0_percpu
    - LP: #298059
  * openvz: fix OOPS while stopping VE started before binfmt_misc.ko loaded
    - LP: #298059
  * x86-64: Fix "bytes left to copy" return value for copy_from_user()
  * NET: Fix race in dev_close(). (Bug 9750)
    - LP: #301608
  * IPV6: Fix IPsec datagram fragmentation
    - LP: #301608
  * IPV6: dst_entry leak in ip4ip6_err.
    - LP: #301608
  * IPV4: Remove IP_TOS setting privilege checks.
    - LP: #301608
  * IPCONFIG: The kernel gets no IP from some DHCP servers
    - LP: #301608
  * IPCOMP: Disable BH on output when using shared tfm
    - LP: #301608
  * IRQ_NOPROBE helper functions
    - LP: #301608
  * MIPS: Mark all but i8259 interrupts as no-probe.
    - LP: #301608
  * ub: fix up the conversion to sg_init_table()
    - LP: #301608
  * x86: adjust enable_NMI_through_LVT0()
    - LP: #301608
  * SCSI ips: handle scsi_add_host() failure, and other err cl...

Changed in linux:
status: Fix Committed → Fix Released

It would be great to get confirmation that this newer kernel does indeed resolve the oops reported here. Thanks.

Sergio Tosti (zeno979) wrote :

I've been using the proposed 2.6.24-23.46 kernel in a 4-node cluster for about 40 days, so I can confirm that the newer kernel resolves the oops.
Sergio

hans maurer (hjm-pmeonline) wrote :

This bug still exists in Intrepid. It happens during a backup with dd of a 35GB file.

Command:

dd if=$LVM_DIR$LVM_SNAP of=$STORAGE_DIR$LVM_DISK.img

Attached: output of /var/log/syslog.
