Ubuntu
linux package

Bug #1020207
Comment #6

Comment 6 for bug 1020207

Dave Spano (dspano) wrote on 2012-08-21: Re: [Bug 1020207] Re: gfs2 kernel oops when deleting file on other cluster node

I ended up using Ceph instead of GFS2 because of this error. I do still have the kernel source (Linux 3.2.0) that I was using at the time, and its gfs2_unlink routine is completely different from the one in the patch.

There are no lines with if(!rgd).

This is what I've got:

static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
{
	struct gfs2_inode *dip = GFS2_I(dir);
	struct gfs2_sbd *sdp = GFS2_SB(dir);
	struct inode *inode = dentry->d_inode;
	struct gfs2_inode *ip = GFS2_I(inode);
	struct buffer_head *bh;
	struct gfs2_holder ghs[3];
	struct gfs2_rgrpd *rgd;
	int error;

	gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs);
	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1);

	rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr);
	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);

	error = gfs2_glock_nq(ghs); /* parent */
	if (error)
		goto out_parent;

	error = gfs2_glock_nq(ghs + 1); /* child */
	if (error)
		goto out_child;

	error = -ENOENT;
	if (inode->i_nlink == 0)
		goto out_rgrp;

	if (S_ISDIR(inode->i_mode)) {
		error = -ENOTEMPTY;
		if (ip->i_entries > 2 || inode->i_nlink > 2)
			goto out_rgrp;
	}

	error = gfs2_glock_nq(ghs + 2); /* rgrp */
	if (error)
		goto out_rgrp;

	error = gfs2_unlink_ok(dip, &dentry->d_name, ip);
	if (error)
		goto out_gunlock;

	error = gfs2_trans_begin(sdp, 2*RES_DINODE + 3*RES_LEAF + RES_RG_BIT, 0);
	if (error)
		goto out_gunlock;

	error = gfs2_meta_inode_buffer(ip, &bh);
	if (error)
		goto out_end_trans;

	error = gfs2_unlink_inode(dip, dentry, bh);
	brelse(bh);

out_end_trans:
	gfs2_trans_end(sdp);
out_gunlock:
	gfs2_glock_dq(ghs + 2);
out_rgrp:
	gfs2_holder_uninit(ghs + 2);
	gfs2_glock_dq(ghs + 1);
out_child:
	gfs2_holder_uninit(ghs + 1);
	gfs2_glock_dq(ghs);
out_parent:
	gfs2_holder_uninit(ghs);
	return error;
}
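
For illustration only, here is a small userspace sketch of the kind of guard an if (!rgd) line would provide. In the routine above, the result of gfs2_blk2rgrpd() is dereferenced (rgd->rd_gl) with no check in between, which looks consistent with the NULL pointer dereference reported in gfs2_unlink. Everything in the sketch is a hypothetical stand-in: the mock_* types and functions are not the real GFS2 structures, and the -EIO return value is an arbitrary choice for the example, not something taken from the upstream patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real GFS2 layout. */
struct mock_glock { int held; };
struct mock_rgrpd { uint64_t start, len; struct mock_glock *gl; };
struct mock_sbd { struct mock_rgrpd *rgrps; size_t nr; };

/* Block-to-resource-group lookup: returns NULL when no group covers blk. */
static struct mock_rgrpd *mock_blk2rgrpd(struct mock_sbd *sdp, uint64_t blk)
{
	for (size_t i = 0; i < sdp->nr; i++) {
		struct mock_rgrpd *rgd = &sdp->rgrps[i];

		if (blk >= rgd->start && blk - rgd->start < rgd->len)
			return rgd;
	}
	return NULL;
}

/* Checks the lookup result before touching any field of it. */
static int mock_unlink(struct mock_sbd *sdp, uint64_t no_addr)
{
	struct mock_rgrpd *rgd;

	assert(sdp != NULL);

	rgd = mock_blk2rgrpd(sdp, no_addr);
	if (!rgd)
		return -EIO; /* arbitrary error code chosen for this sketch */

	rgd->gl->held = 1; /* safe: rgd is known to be non-NULL here */
	return 0;
}
```

Without the if (!rgd) branch, a failed lookup would reach rgd->gl with rgd == NULL, which is the same shape of failure as the oops in this bug.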

Dave Spano
Optogenics
Systems Administrator

----- Original Message -----

From: "Bart Verwilst" <email address hidden>
To: <email address hidden>
Sent: Tuesday, August 21, 2012 8:26:22 AM
Subject: [Bug 1020207] Re: gfs2 kernel oops when deleting file on other cluster node

"I wonder whether your distro kernel has this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=718b97bd6b03445be53098e3c8f896aeebc304aa

That's the most likely thing that I can see that has been fixed
recently."

>> Steven Whitehouse from Red Hat.

--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1020207

Title:
gfs2 kernel oops when deleting file on other cluster node

Status in “gfs2-utils” package in Ubuntu:
Confirmed

Bug description:
I have an active/active drbd cluster with pacemaker running on cman.
If I create a file on the shared gfs2 file system on one node, then
try to delete it on the other, I'm receiving this kernel panic. I get
the same error on both nodes, so it's not specific to either machine.

Jul 2 12:54:37 ha1 lrmd: [40983]: info: operation monitor[35] on nova-volumes:0 for client 40986: pid 2385 exited with return code 0
Jul 2 12:54:43 ha1 lrmd: [40983]: debug: rsc:p_gfsd:1 monitor[63] (pid 2459)
Jul 2 12:54:43 ha1 lrmd: [40983]: info: operation monitor[63] on p_gfsd:1 for client 40986: pid 2459 exited with return code 0
Jul 2 12:54:49 ha1 kernel: [238066.234067] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Jul 2 12:54:49 ha1 kernel: [238066.236034] IP: [<ffffffffa037c00a>] gfs2_unlink+0x8a/0x220 [gfs2]
Jul 2 12:54:49 ha1 kernel: [238066.237305] PGD 40ca9f067 PUD 40de7b067 PMD 0
Jul 2 12:54:49 ha1 kernel: [238066.237336] Oops: 0000 [#1] SMP
Jul 2 12:54:49 ha1 kernel: [238066.237336] CPU 7
Jul 2 12:54:49 ha1 kernel: [238066.237336] Modules linked in: gfs2 drbd lru_cache ipmi_si mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu kvm_amd kvm dlm configfs vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext2 bonding psmouse sp5100_tco dcdbas tpm_tis joydev serio_raw i2c_piix4 amd64_edac_mod k10temp edac_core acpi_power_meter edac_mce_amd mac_hid lp parport usbhid hid ses enclosure igb megaraid_sas dca bnx2 [last unloaded: ipmi_si]
Jul 2 12:54:49 ha1 kernel: [238066.237336]
Jul 2 12:54:49 ha1 kernel: [238066.237336] Pid: 2544, comm: rm Not tainted 3.2.0-26-generic #41-Ubuntu Dell Inc. PowerEdge R515/03X0MN
Jul 2 12:54:49 ha1 kernel: [238066.237336] RIP: 0010:[<ffffffffa037c00a>] [<ffffffffa037c00a>] gfs2_unlink+0x8a/0x220 [gfs2]
Jul 2 12:54:49 ha1 kernel: [238066.237336] RSP: 0018:ffff8801fe13fd28 EFLAGS: 00010296
Jul 2 12:54:49 ha1 kernel: [238066.380032] RAX: 0000000000000000 RBX: ffff880410faa080 RCX: ffff8801fe13fd40
Jul 2 12:54:49 ha1 kernel: [238066.380032] RDX: 0000000000000000 RSI: 0000000000012346 RDI: ffff88040c8b1440
Jul 2 12:54:49 ha1 kernel: [238066.380032] RBP: ffff8801fe13fe38 R08: 4000000000000000 R09: 0000000000000000
Jul 2 12:54:49 ha1 kernel: [238066.380032] R10: fde3ec81bcd7720a R11: 0000000000000008 R12: ffff8801ffc709c0
Jul 2 12:54:49 ha1 kernel: [238066.380032] R13: ffff8801fe13fd80 R14: ffff8803f479d140 R15: ffff88040c8b1000
Jul 2 12:54:49 ha1 kernel: [238066.380032] FS: 00007fa1b5055700(0000) GS:ffff88041fa20000(0000) knlGS:0000000000000000
Jul 2 12:54:49 ha1 kernel: [238066.380032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 2 12:54:49 ha1 kernel: [238066.380032] CR2: 0000000000000018 CR3: 00000003f207c000 CR4: 00000000000006e0
Jul 2 12:54:49 ha1 kernel: [238066.380032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 2 12:54:49 ha1 kernel: [238066.380032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 2 12:54:49 ha1 kernel: [238066.380032] Process rm (pid: 2544, threadinfo ffff8801fe13e000, task ffff8802044b5bc0)
Jul 2 12:54:49 ha1 kernel: [238066.380032] Stack:
Jul 2 12:54:49 ha1 kernel: [238066.380032] 0000000000000003 ffff8801fa72e260 ffff8801fe13fd58 ffff8801fe13fd40
Jul 2 12:54:49 ha1 kernel: [238066.380032] ffff8801fe13fd40 ffff8801fa72e218 ffff88020476a680 0000000000000001
Jul 2 12:54:49 ha1 kernel: [238066.380032] 0000000000000000 0000000000000000 ffffffffa037bfda ffff8801fe13fd80
Jul 2 12:54:49 ha1 kernel: [238066.380032] Call Trace:
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffffa037bfda>] ? gfs2_unlink+0x5a/0x220 [gfs2]
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffffa037bff4>] ? gfs2_unlink+0x74/0x220 [gfs2]
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff8129cb2c>] ? security_inode_permission+0x1c/0x30
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff81184e70>] vfs_unlink.part.26+0x80/0xf0
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff81184f1c>] vfs_unlink+0x3c/0x60
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff8118758a>] do_unlinkat+0x1aa/0x1d0
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff8117ce5a>] ? sys_newfstatat+0x2a/0x40
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff811880d2>] sys_unlinkat+0x22/0x40
Jul 2 12:54:49 ha1 kernel: [238066.380032] [<ffffffff81661fc2>] system_call_fastpath+0x16/0x1b
Jul 2 12:54:49 ha1 kernel: [238066.380032] Code: 00 00 49 83 c5 40 31 d2 4c 89 e9 be 01 00 00 00 e8 fc 1e ff ff 48 8b b3 28 02 00 00 4c 89 ff e8 ad 7e 00 00 48 8d 8d 08 ff ff ff <48> 8b 78 18 31 d2 be 01 00 00 00 48 83 e9 80 e8 d2 1e ff ff 48
Jul 2 12:54:49 ha1 kernel: [238066.380032] RIP [<ffffffffa037c00a>] gfs2_unlink+0x8a/0x220 [gfs2]
Jul 2 12:54:49 ha1 kernel: [238066.380032] RSP <ffff8801fe13fd28>
Jul 2 12:54:49 ha1 kernel: [238066.380032] CR2: 0000000000000018
Jul 2 12:54:49 ha1 kernel: [238066.394510] ---[ end trace 2009fc896a3dd969 ]---
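
As a rough cross-check on the trace above, the bytes marked <48> 8b 78 18 on the Code: line can be decoded by hand. The sketch below handles only that one narrow x86-64 encoding (REX.W, opcode 8B, ModRM with an 8-bit displacement and no SIB byte); the struct and function names are made up for this example, and it is not a general disassembler. Decoded this way, the instruction is a load from [rax + 0x18] into rdi, and with RAX = 0 in the register dump that gives the faulting address 0x18 shown in CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one "mov r64, [base + disp8]" instruction. */
struct mov_load {
	unsigned dst_reg;  /* destination register number (0 = rax ... 7 = rdi) */
	unsigned base_reg; /* base register number of the memory operand */
	int8_t disp;       /* signed 8-bit displacement */
};

/*
 * Decode only the form REX.W 8B /r with mod = 01 (disp8) and no SIB byte.
 * Returns 0 on success, -1 if the bytes use any other encoding.
 */
static int decode_mov_load(const uint8_t *code, size_t len, struct mov_load *out)
{
	uint8_t modrm;

	assert(code != NULL && out != NULL);

	if (len < 4)
		return -1;
	if (code[0] != 0x48) /* REX.W with no register extension bits */
		return -1;
	if (code[1] != 0x8b) /* MOV r64, r/m64 */
		return -1;

	modrm = code[2];
	if ((modrm >> 6) != 1) /* mod = 01: [reg + disp8] */
		return -1;
	if ((modrm & 7) == 4)  /* rm = 100 would mean a SIB byte follows */
		return -1;

	out->dst_reg = (modrm >> 3) & 7;
	out->base_reg = modrm & 7;
	out->disp = (int8_t)code[3];
	return 0;
}

/* Address the load touches, given the value held in the base register. */
static uint64_t fault_address(const struct mov_load *insn, uint64_t base_value)
{
	return base_value + (uint64_t)(int64_t)insn->disp;
}
```

Feeding it the four bytes from the trace and a base value of 0 (the RAX value in the dump) yields destination register 7 (rdi), base register 0 (rax), and address 0x18.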

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/gfs2-utils/+bug/1020207/+subscriptions
