Comment 10 for bug 1708096

Mathieu Trudel-Lapierre (cyphermox) wrote :

It turns out this looks to be the same issue: because of filesystem corruption, the disk is unavailable in the Linux image that runs petitboot, so it can't be scanned to find something to boot.

Furthermore, it seems the issues are reliably fixed by running fsck.jfs on the filesystem (even while it is mounted), but there is no fsck.jfs available in the petitboot environment for me to test this with.

For good measure, even when I do manage to boot from disk without multipath, but with JFS as the root filesystem, I still get crashes (this one comes from 16.04):

[ 71.761950] BUG: Bad page state in process jfsCommit pfn:77280
[ 71.762035] page:f000000001dca000 count:0 mapcount:0 mapping: (null) index:0x15008
[ 71.762115] flags: 0x3ffff80000080d(locked|referenced|uptodate|private)
[ 71.762225] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 71.762293] bad because of flags:
[ 71.762334] flags: 0x801(locked|private)
[ 71.762402] Modules linked in: vmx_crypto rtc_generic autofs4 jfs ibmveth ibmvscsi
[ 71.762415] CPU: 5 PID: 1723 Comm: jfsCommit Not tainted 4.4.0-92-generic #115-Ubuntu
[ 71.762418] Call Trace:
[ 71.762431] [c0000007704b3a40] [c000000000b18f28] dump_stack+0xb0/0xf0 (unreliable)
[ 71.762437] [c0000007704b3a80] [c000000000234d34] bad_page.part.10+0x114/0x170
[ 71.762440] [c0000007704b3b10] [c0000000002355b4] free_pages_prepare+0x424/0x4a0
[ 71.762443] [c0000007704b3b90] [c000000000238980] free_hot_cold_page+0x60/0x210
[ 71.762447] [c0000007704b3be0] [c000000000245058] put_page+0x78/0xb0
[ 71.762455] [c0000007704b3c10] [d000000003b02668] txUnlock+0x278/0x330 [jfs]
[ 71.762461] [c0000007704b3cd0] [d000000003b06208] jfs_lazycommit+0x1e8/0x3b0 [jfs]
[ 71.762466] [c0000007704b3d80] [c0000000000e7354] kthread+0x124/0x150
[ 71.762471] [c0000007704b3e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[ 71.762474] Disabling lock debugging due to kernel taint
[ 71.762477] BUG: Bad page state in process jfsCommit pfn:77280
[ 71.762541] page:f000000001dca000 count:0 mapcount:0 mapping: (null) index:0x15008
[ 71.762623] flags: 0x3ffff80000081c(referenced|uptodate|dirty|private)
[ 71.762737] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 71.762806] bad because of flags:
[ 71.762853] flags: 0x800(private)
[ 71.762910] Modules linked in: vmx_crypto rtc_generic autofs4 jfs ibmveth ibmvscsi
[ 71.762918] CPU: 5 PID: 1723 Comm: jfsCommit Tainted: G B 4.4.0-92-generic #115-Ubuntu
[ 71.762919] Call Trace:
[ 71.762923] [c0000007704b3940] [c000000000b18f28] dump_stack+0xb0/0xf0 (unreliable)
[ 71.762927] [c0000007704b3980] [c000000000234d34] bad_page.part.10+0x114/0x170
[ 71.762930] [c0000007704b3a10] [c0000000002355b4] free_pages_prepare+0x424/0x4a0
[ 71.762933] [c0000007704b3a90] [c000000000238980] free_hot_cold_page+0x60/0x210
[ 71.762936] [c0000007704b3ae0] [c000000000245058] put_page+0x78/0xb0
[ 71.762942] [c0000007704b3b10] [d000000003afde2c] release_metapage+0xfc/0x2c0 [jfs]
[ 71.762948] [c0000007704b3b90] [d000000003afe6f8] put_metapage+0xb8/0x250 [jfs]
[ 71.762954] [c0000007704b3c10] [d000000003b0253c] txUnlock+0x14c/0x330 [jfs]
[ 71.762960] [c0000007704b3cd0] [d000000003b06208] jfs_lazycommit+0x1e8/0x3b0 [jfs]
[ 71.762963] [c0000007704b3d80] [c0000000000e7354] kthread+0x124/0x150
[ 71.762967] [c0000007704b3e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[ 939.941389] ERROR: (device sda2): diRead [jfs]: i_ino != di_number

[ 1086.974949] ERROR: (device sda2): dtSearch [jfs]: stack overrun!

[ 1086.975058] btstack dump:
[ 1086.975059] bn = 0, index = 0
[ 1086.975084] bn = 180009, index = 0
[ 1086.975125] bn = 0, index = 0
[ 1086.975151] bn = 180009, index = 0
[ 1086.975177] bn = 0, index = 0
[ 1086.975204] bn = 180009, index = 0
[ 1086.975231] bn = 0, index = 0
[ 1086.975259] bn = c0000000fd107c80, index = -1360
[ 1086.975529] ERROR: (device sda2): dtSearch [jfs]: DT_GETPAGE: dtree page corrupt
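
As an aside, the flag names in the two "Bad page state" reports can be cross-checked against their hex values. Below is a minimal sketch in Python; the bit positions are an assumption inferred only from the log lines above (0x801 is locked|private, 0x80d is locked|referenced|uptodate|private, 0x81c is referenced|uptodate|dirty|private), not read from the kernel headers, and the helper name decode_flags is hypothetical. The upper bits of the full flags word are not named in the log and are ignored here.

```python
# Bit-to-name mapping inferred from the log lines in this comment only
# (an assumption; not taken from the kernel's page-flags definitions).
INFERRED_BITS = {
    0: "locked",
    2: "referenced",
    3: "uptodate",
    4: "dirty",
    11: "private",
}


def decode_flags(value, nbits=12):
    """Return the names of the set bits among the low `nbits` bits.

    Bits without an inferred name are reported as "bitN".
    """
    names = []
    for bit in range(nbits):
        if value & (1 << bit):
            names.append(INFERRED_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    # The four flag values that appear in the two reports above.
    for raw in (0x3ffff80000080d, 0x801, 0x3ffff80000081c, 0x800):
        print(hex(raw), "|".join(decode_flags(raw)))
```

Under that inferred mapping, the flag that is set in both reports at the time the page is freed is "private".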

I think it's been sufficiently established that this is a kernel bug present at the very least in 16.04 with kernel 4.4.0-92.115, and also on artful with the most recent kernel.

This bug needs to go to the kernel team; I don't think there's a d-i bug at all.