dd on gfs2 causes kernel oops

Bug #276641 reported by Staff Unipg
This bug affects 1 person
Affects          Status        Importance  Assigned to     Milestone
linux (Ubuntu)   Invalid       Undecided   Andy Whitcroft
  Hardy          Fix Released  High        Tim Gardner

Bug Description

Binary package hint: linux-image-xen

There seems to be a problem with the call to gfs2_bitfit(). Relevant kern.log:

[..]
Sep 30 13:49:53 hnode1 kernel: [101482.345634] Unable to handle kernel paging request at ffff8801b0a9d000 RIP:
Sep 30 13:49:53 hnode1 kernel: [101482.345666] [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.345780] PGD 2be5067 PUD 37ec067 PMD 3972067 PTE 0
Sep 30 13:49:53 hnode1 kernel: [101482.345844] Oops: 0000 [1] SMP
Sep 30 13:49:53 hnode1 kernel: [101482.345896] CPU 2
Sep 30 13:49:53 hnode1 kernel: [101482.345939] Modules linked in: xt_tcpudp xt_physdev bridge sctp ipv6 lock_dlm gfs2 dlm configfs iptable_filter ip_tables x_tables dm_round_robin crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_si ipmi_msghandler lp loop 8250_pnp evdev 8250 serial_core parport_pc parport iTCO_wdt iTCO_vendor_support serio_raw psmouse pcspkr container button i5000_edac edac_core shpchp pci_hotplug dm_multipath ext3 jbd mbcache sr_mod cdrom ata_generic pata_acpi sg sd_mod ehci_hcd uhci_hcd ata_piix bnx2 libata floppy usbcore megaraid_sas scsi_mod e1000 dm_mirror dm_snapshot dm_mod thermal processor fan fuse
Sep 30 13:49:53 hnode1 kernel: [101482.346708] Pid: 8560, comm: dd Not tainted 2.6.24-19-xen #1
Sep 30 13:49:53 hnode1 kernel: [101482.346760] RIP: e030:[gfs2:gfs2_bitfit+0x3d/0x90] [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.346859] RSP: e02b:ffff8801a6c8f928 EFLAGS: 00010246
Sep 30 13:49:53 hnode1 kernel: [101482.346910] RAX: 0000000000000000 RBX: 5555555555555555 RCX: 0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.346994] RDX: 0000000000003de4 RSI: 0000000000000000 RDI: ffff8801b0a9d000
Sep 30 13:49:53 hnode1 kernel: [101482.347076] RBP: 0000000000000000 R08: ffff8801b0a9cff9 R09: ffff8801b0a9cff9
Sep 30 13:49:53 hnode1 kernel: [101482.347159] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801ef59f410
Sep 30 13:49:53 hnode1 kernel: [101482.347245] R13: 0000000000000005 R14: 0000000000000000 R15: ffff8801ee5210c0
Sep 30 13:49:53 hnode1 kernel: [101482.347331] FS: 00007f5b54ce96e0(0000) GS:ffffffff805c6100(0000) knlGS:0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.347418] CS: e033 DS: 0000 ES: 0000
Sep 30 13:49:53 hnode1 kernel: [101482.347464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 30 13:49:53 hnode1 kernel: [101482.347547] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 30 13:49:53 hnode1 kernel: [101482.347630] Process dd (pid: 8560, threadinfo ffff8801a6c8e000, task ffff8801ed98d800)
Sep 30 13:49:53 hnode1 kernel: [101482.347716] Stack: ffff8801ee522e80 ffffffff88343973 0000000000012f00 ffff8801ee5210c0
Sep 30 13:49:53 hnode1 kernel: [101482.347820] ffff8801e6550328 ffff8801ef59f410 ffff8801eeca8c00 ffff8801ebc6e000
Sep 30 13:49:53 hnode1 kernel: [101482.347919] ffff8800e2e85bc0 ffffffff8834590b ffff8801e6550328 ffff8800e2e85bc0
Sep 30 13:49:53 hnode1 kernel: [101482.347987] Call Trace:
Sep 30 13:49:53 hnode1 kernel: [101482.348071] [gfs2:rgblk_search+0xf3/0x180] :gfs2:rgblk_search+0xf3/0x180
Sep 30 13:49:53 hnode1 kernel: [101482.348136] [gfs2:gfs2_alloc_data+0x5b/0x140] :gfs2:gfs2_alloc_data+0x5b/0x140
Sep 30 13:49:53 hnode1 kernel: [101482.348199] [gfs2:lookup_block+0x137/0x140] :gfs2:lookup_block+0x137/0x140
Sep 30 13:49:53 hnode1 kernel: [101482.348260] [gfs2:gfs2_block_map+0x247/0x3a0] :gfs2:gfs2_block_map+0x247/0x3a0
Sep 30 13:49:53 hnode1 kernel: [101482.348319] [sctp:kmem_cache_alloc+0xaf/0x710] kmem_cache_alloc+0xaf/0x120
Sep 30 13:49:53 hnode1 kernel: [101482.348375] [gfs2:alloc_buffer_head+0x58/0x180] alloc_buffer_head+0x58/0x70
Sep 30 13:49:53 hnode1 kernel: [101482.348428] [alloc_page_buffers+0x60/0xe0] alloc_page_buffers+0x60/0xe0
Sep 30 13:49:53 hnode1 kernel: [101482.348484] [__block_prepare_write+0x252/0x450] __block_prepare_write+0x252/0x450
Sep 30 13:49:53 hnode1 kernel: [101482.348546] [gfs2:gfs2_block_map+0x0/0x3a0] :gfs2:gfs2_block_map+0x0/0x3a0
Sep 30 13:49:53 hnode1 kernel: [101482.348606] [ext3:block_prepare_write+0x1a/0x30] block_prepare_write+0x1a/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.348668] [gfs2:gfs2_write_begin+0x28c/0x2e0] :gfs2:gfs2_write_begin+0x28c/0x2e0
Sep 30 13:49:53 hnode1 kernel: [101482.348728] [generic_file_buffered_write+0x149/0x6e0] generic_file_buffered_write+0x149/0x6e0
Sep 30 13:49:53 hnode1 kernel: [101482.348792] [current_fs_time+0x1e/0x30] current_fs_time+0x1e/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.348847] [__generic_file_aio_write_nolock+0x24f/0x400] __generic_file_aio_write_nolock+0x24f/0x400
Sep 30 13:49:53 hnode1 kernel: [101482.348937] [gfs2:generic_file_aio_write+0x64/0x2f0] generic_file_aio_write+0x64/0xd0
Sep 30 13:49:53 hnode1 kernel: [101482.348995] [gfs2:do_sync_write+0xd9/0x120] do_sync_write+0xd9/0x120
Sep 30 13:49:53 hnode1 kernel: [101482.349051] [<ffffffff8024cc20>] autoremove_wake_function+0x0/0x30
Sep 30 13:49:53 hnode1 kernel: [101482.349111] [__clear_user+0x35/0x70] __clear_user+0x35/0x70
Sep 30 13:49:53 hnode1 kernel: [101482.349166] [vfs_write+0xed/0x190] vfs_write+0xed/0x190
Sep 30 13:49:53 hnode1 kernel: [101482.349219] [sys_write+0x53/0x90] sys_write+0x53/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.349273] [system_call+0x68/0x6d] system_call+0x68/0x6d
Sep 30 13:49:53 hnode1 kernel: [101482.349325] [system_call+0x0/0x6d] system_call+0x0/0x6d
Sep 30 13:49:53 hnode1 kernel: [101482.349378]
Sep 30 13:49:53 hnode1 kernel: [101482.349415]
Sep 30 13:49:53 hnode1 kernel: [101482.349415] Code: 49 39 19 74 3e 41 0f b6 00 d3 f8 83 e0 03 44 39 d0 74 29 83
Sep 30 13:49:53 hnode1 kernel: [101482.349637] RIP [gfs2:gfs2_bitfit+0x3d/0x90] :gfs2:gfs2_bitfit+0x3d/0x90
Sep 30 13:49:53 hnode1 kernel: [101482.349701] RSP <ffff8801a6c8f928>
Sep 30 13:49:53 hnode1 kernel: [101482.349745] CR2: ffff8801b0a9d000
Sep 30 13:49:53 hnode1 kernel: [101482.350439] ---[ end trace aba7f6ec817decfc ]---
Sep 30 13:51:15 hnode1 kernel: [101522.359616] eth0: port 3(vif3.0) entering disabled state
Sep 30 13:51:15 hnode1 kernel: [101522.375716] eth0: port 3(vif3.0) entering disabled state
Sep 30 13:51:35 hnode1 kernel: [101581.950657] eth0: port 2(vif4.0) entering disabled state
Sep 30 13:51:35 hnode1 kernel: [101581.979825] eth0: port 2(vif4.0) entering disabled state
Sep 30 14:34:09 hnode1 kernel: Inspecting /boot/System.map-2.6.24-19-xen
[..]

The OS is Ubuntu 8.04.1.

uname -a:
Linux hnode1 2.6.24-19-xen #1 SMP Wed Aug 20 21:08:51 UTC 2008 x86_64 GNU/Linux

The dmesg output (after reboot) is attached.

Kind regards

--Sergio

Staff Unipg (staff-unipg) wrote :
Staff Unipg (staff-unipg) wrote :
Sergio Tosti (zeno979)
Changed in linux:
assignee: nobody → zeno979
status: New → Confirmed
status: Confirmed → New
assignee: zeno979 → nobody
Staff Unipg (staff-unipg) wrote :

The problem sometimes happens when moving large files (>8 GB).

Sergio Tosti (zeno979) wrote :

Porting the gfs2_bitfit() function definition from the latest vanilla kernel resolves the problem.
Here's the patch; please include it in the Hardy kernel source.

--Sergio

Changed in linux:
status: New → Confirmed
Bryce Harrington (bryce) wrote :

Am I understanding correctly that this issue doesn't exist in Intrepid? I'll leave only the Hardy task open; if I'm wrong, please correct me.

Changed in linux:
status: Confirmed → Fix Released
Sergio Tosti (zeno979) wrote :

Correct, it affects only Hardy. Please evaluate the patch; this is a serious problem.
Regards
--Sergio

Changed in linux:
status: New → Confirmed
Martin Pitt (pitti) wrote :

Looks ok for hardy SRU.

Sergio, do you happen to have a pointer to the original upstream git commit? I understand that we can't pull that directly since it's a port, but it would nevertheless be good to refer to it for documentation and comparison.

Changed in linux:
assignee: nobody → timg-tpi
Sergio Tosti (zeno979) wrote :
Steve Conklin (sconklin)
Changed in linux:
assignee: nobody → sconklin
status: Fix Released → In Progress
Andy Whitcroft (apw)
Changed in linux:
assignee: timg-tpi → apw
assignee: sconklin → apw
Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: kernel can oops

Patch Description: replace gfs2_bitfit with upstream version to prevent oops

Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=c3400e846d2c4568529c537be3555ff6c6108e76

Test Case: See bug description

Changed in linux:
assignee: apw → timg-tpi
milestone: none → ubuntu-8.04.4
status: Confirmed → Fix Committed
importance: Undecided → High
Andy Whitcroft (apw) wrote :

This bug is only present in v2.6.24 and thus only applies to Hardy. The patch is a backport from a mainline commit that is included in the Intrepid and Jaunty kernels. Now that Hardy is Fix Committed, the Jaunty task can go. Moved to Invalid.

Changed in linux:
status: In Progress → Invalid
Martin Pitt (pitti) wrote :

Accepted linux into hardy-proposed; please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Sergio Tosti (zeno979) wrote :

Please add the Xen custom binary for this kernel version.

Sergio Tosti (zeno979) wrote :

Whoops, please ignore the previous comment, sorry.

Steve Beattie (sbeattie) wrote :

I suspect Tim meant to target this bugfix for the upcoming 8.04.2 release, not the far-off 8.04.4.

Changed in linux:
milestone: ubuntu-8.04.4 → ubuntu-8.04.2
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-23.46

---------------
linux (2.6.24-23.46) hardy-proposed; urgency=low

  [Alessio Igor Bogani]

  * rt: Updated PREEMPT_RT support to rt21
    - LP: #302138

  [Amit Kucheria]

  * SAUCE: Update lpia patches from moblin tree
    - LP: #291457

  [Andy Whitcroft]

  * SAUCE: replace gfs2_bitfit with upstream version to prevent oops
    - LP: #276641

  [Colin Ian King]

  * isdn: Do not validate ISDN net device address prior to interface-up
    - LP: #237306
  * hwmon: (coretemp) Add Penryn CPU to coretemp
    - LP: #235119
  * USB: add support for Motorola ROKR Z6 cellphone in mass storage mode
    - LP: #263217
  * md: fix an occasional deadlock in raid5
    - LP: #208551

  [Stefan Bader]

  * SAUCE: buildenv: Show CVE entries in printchanges
  * SAUCE: buildenv: Send git-ubuntu-log informational message to stderr
  * Xen: dma: avoid unnecessarily SWIOTLB bounce buffering
    - LP: #247148
  * Update openvz patchset to apply to latest stable tree.
    - LP: #301634
  * XEN: Fix FTBS with stable updates
    - LP: #301634

  [Steve Conklin]

  * Add HID quirk for dual USB gamepad
    - LP: #140608

  [Tim Gardner]

  * Enable CONFIG_AX25_DAMA_SLAVE=y
    - LP: #257684
  * SAUCE: Correctly blacklist Thinkpad r40e in ACPI
    - LP: #278794
  * SAUCE: ALPS touchpad for Dell Latitude E6500/E6400
    - LP: #270643

  [Upstream Kernel Changes]

  * Revert "[Bluetooth] Eliminate checks for impossible conditions in IRQ
    handler"
    - LP: #217659
  * KVM: VMX: Clear CR4.VMXE in hardware_disable
    - LP: #268981
  * iov_iter_advance() fix
    - LP: #231746
  * Fix off-by-one error in iov_iter_advance()
    - LP: #231746
  * USB: serial: ch341: New VID/PID for CH341 USB-serial
    - LP: #272485
  * x86: Fix 32-bit x86 MSI-X allocation leakage
    - LP: #273103
  * b43legacy: Fix failure in rate-adjustment mechanism
    - LP: #273143
  * x86: Reserve FIRST_DEVICE_VECTOR in used_vectors bitmap.
    - LP: #276334
  * openvz: merge missed fixes from vanilla 2.6.24 openvz branch
    - LP: #298059
  * openvz: some autofs related fixes
    - LP: #298059
  * openvz: fix ve stop deadlock after nfs connect
    - LP: #298059
  * openvz: fix netlink and rtnl inside container
    - LP: #298059
  * openvz: fix wrong size of ub0_percpu
    - LP: #298059
  * openvz: fix OOPS while stopping VE started before binfmt_misc.ko loaded
    - LP: #298059
  * x86-64: Fix "bytes left to copy" return value for copy_from_user()
  * NET: Fix race in dev_close(). (Bug 9750)
    - LP: #301608
  * IPV6: Fix IPsec datagram fragmentation
    - LP: #301608
  * IPV6: dst_entry leak in ip4ip6_err.
    - LP: #301608
  * IPV4: Remove IP_TOS setting privilege checks.
    - LP: #301608
  * IPCONFIG: The kernel gets no IP from some DHCP servers
    - LP: #301608
  * IPCOMP: Disable BH on output when using shared tfm
    - LP: #301608
  * IRQ_NOPROBE helper functions
    - LP: #301608
  * MIPS: Mark all but i8259 interrupts as no-probe.
    - LP: #301608
  * ub: fix up the conversion to sg_init_table()
    - LP: #301608
  * x86: adjust enable_NMI_through_LVT0()
    - LP: #301608
  * SCSI ips: handle scsi_add_host() failure, and other err cl...

Changed in linux:
status: Fix Committed → Fix Released
Leann Ogasawara (leannogasawara) wrote :

It would be great to get confirmation that this newer kernel does indeed resolve the oops reported here. Thanks.

Sergio Tosti (zeno979) wrote :

I've been using the proposed 2.6.24-23.46 kernel in a 4-node cluster for about 40 days, so I can confirm that the newer kernel resolves the oops.
Sergio

hans maurer (hjm-pmeonline) wrote :

This bug still exists in Intrepid. It happens during a backup with dd of a 35 GB file.

Command:

dd if=$LVM_DIR$LVM_SNAP of=$STORAGE_DIR$LVM_DISK.img

The output of /var/log/syslog is attached.
