[Ubuntu 17.04] - JFS related call traces and system enters xmon when rebooted after installation

Bug #1685899 reported by bugproxy on 2017-04-24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Canonical Kernel Team
linux (Ubuntu)
Canonical Kernel Team
Nominated for Xenial by Juerg Haefliger
Nominated for Zesty by Juerg Haefliger

Bug Description

JFS related call traces and system enters xmon when rebooted after installation

Steps to reproduce:
1 - Install Ubuntu 17.04 the system with
 - prepboot
- /root [JFS filesystem]
- swap space

2 -After installation when rebooted it gives out call traces like as below:

[ 3.895246] Unable to handle kernel paging request for data at address 0x00000000
[ 3.895278] Faulting instruction address: 0xd000000004c5df1c
[ 3.895284] Oops: Kernel access of bad area, sig: 11 [#1]
[ 3.895287] SMP NR_CPUS=2048 [ 3.895288] NUMA
[ 3.895290] pSeries
[ 3.895292] Modules linked in: ip_tables x_tables autofs4 jfs ibmvscsi crc32c_vpmsum
[ 3.895301] CPU: 30 PID: 923 Comm: ureadahead Not tainted 4.9.0-15-generic #16-Ubuntu
[ 3.895304] task: c000000381d3c800 task.stack: c000000381fd0000
[ 3.895307] NIP: d000000004c5df1c LR: d000000004c5deb0 CTR: c0000000001279d0
[ 3.895310] REGS: c000000381fd3500 TRAP: 0300 Not tainted (4.9.0-15-generic)
[ 3.895313] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>[ 3.895322] CR: 48008804 XER: 00000001
[ 3.895324] CFAR: c000000000008a60 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
GPR00: d000000004c5deb0 c000000381fd3780 d000000004c78c28 c0000003802f40f0
GPR04: d000000004c6f6f0 d000000004c72b58 0000000000000563 d000000004c78c28
GPR08: 0000000000000000 0000000000180e97 0000000000000000 d000000004c6a608
GPR12: c0000000001279d0 c00000000fb90e00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000001000 0000000000000000 d000000004c72b38
GPR28: 0000000000180e97 f000000000e1d5c0 c0000003812af240 c0000003802f40b0
NIP [d000000004c5df1c] __get_metapage+0x204/0x6f0 [jfs]
[ 3.895372] LR [d000000004c5deb0] __get_metapage+0x198/0x6f0 [jfs]
[ 3.895374] Call Trace:
[ 3.895378] [c000000381fd3780] [d000000004c5de6c] __get_metapage+0x154/0x6f0 [jfs] (unreliable)
[ 3.895384] [c000000381fd3870] [d000000004c4c368] diRead+0x130/0x260 [jfs]
[ 3.895388] [c000000381fd3920] [d000000004c424f4] jfs_iget+0x6c/0x1e0 [jfs]
[ 3.895393] [c000000381fd3950] [d000000004c43adc] jfs_lookup+0xe4/0x100 [jfs]
[ 3.895398] [c000000381fd3a80] [c00000000032a120] lookup_slow+0xe0/0x240
[ 3.895402] [c000000381fd3b00] [c00000000032e8a8] walk_component+0x2d8/0x3f0
[ 3.895406] [c000000381fd3b70] [c00000000032eb94] link_path_walk+0x1d4/0x600
[ 3.895409] [c000000381fd3c00] [c000000000330c1c] path_openat+0xbc/0x480
[ 3.895413] [c000000381fd3c80] [c0000000003328ac] do_filp_open+0xec/0x160
[ 3.895417] [c000000381fd3db0] [c00000000031863c] do_sys_open+0x1cc/0x380
[ 3.895421] [c000000381fd3e30] [c00000000000bd84] system_call+0x38/0xe0
[ 3.895424] Instruction dump:
[ 3.895426] 7909f00e 7fc9f214 39200001 f93f0028 fbdf0030 e93d0000 71280800 41820460
[ 3.895433] ebdd0030 41920034 e91d0008 e93f0038 <e8e80000> 811e0000 80e70090 39080001
[ 3.895441] ---[ end trace c2aa9ba09ea05eac ]---
[ 3.895443]
[ 4.088560] systemd-journald[925]: Received request to flush runtime journal from PID 1
[ 4.362062] crypto_register_alg 'aes' = 0
[ 4.362112] crypto_register_alg 'cbc(aes)' = 0
[ 4.362150] crypto_register_alg 'ctr(aes)' = 0
[ 4.362191] crypto_register_alg 'xts(aes)' = 0
[ 4.366949] pseries_rng: Registering IBM pSeries RNG driver

When I first connected to the LPAR, it was unresponsive so I restarted it from the HMC and surprisingly it came up to the login prompt and I logged into the shell. I proceeded to install the matching linux-image-4.10.0-15-generic-dbgsym_4.10.0-15.17_ppc64el.ddeb. However, the installation of the matching dbgsym wasn't as helpful as I wanted it to be. objdump, crash tool, or addr2line wouldn't give me the source line correspond to the NIP address.

I then restarted the LPAR with xmon enabled and it would drop to xmon immediately after attempting to remount / and at the same location as before at __get_metapage+0x204/0x6f0 [jfs] and again with a dereference of 0x0 as the cause of the data exception

0xd00000000649df54 <__get_metapage+508>: ld r8,8(r29)
0xd00000000649df58 <__get_metapage+512>: ld r9,56(r31)
0xd00000000649df5c <__get_metapage+516>: ld r7,0(r8) <-- r8 = 0x0

Just to see if this was consistent, I rebooted the system another 6 times and it fell to xmon in the same location most of the time but one time it failed on release_metapage() instead.

[ 3.507823] Unable to handle kernel paging request for data at address 0x00000000
[ 3.507835] Faulting instruction address: 0xd00000000667db5c
cpu 0x16: Vector: 300 (Data Access) at [c000000639be3520]
    pc: d00000000667db5c: release_metapage+0x174/0x370 [jfs]
    lr: d00000000667da90: release_metapage+0xa8/0x370 [jfs]
    sp: c000000639be37a0
   msr: 800000000280b033
   dar: 0
 dsisr: 40000000
  current = 0xc000000639918c00
  paca = 0xc00000000fb8c600 softe: 0 irq_happened: 0x01
    pid = 946, comm = keyboard-setup.
Linux version 4.10.0-15-generic (buildd@bos01-ppc64el-025) (gcc version 6.3.0 20170221 (Ubuntu 6.3.0-8ubuntu1) ) #17-Ubuntu SMP
 Fri Mar 24 17:50:37 UTC 2017 (Ubuntu 4.10.0-15.17-generic 4.10.5)
enter ? for help
[c000000639be3820] d00000000666c3b4 diRead+0x17c/0x260 [jfs]
[c000000639be38d0] d0000000066624f4 jfs_iget+0x6c/0x1e0 [jfs]
[c000000639be3900] d000000006663adc jfs_lookup+0xe4/0x100 [jfs]
[c000000639be3a30] c00000000034fd60 lookup_slow+0xe0/0x240
[c000000639be3ab0] c0000000003543d8 walk_component+0x268/0x3f0
[c000000639be3b30] c000000000354724 link_path_walk+0x1c4/0x5e0
[c000000639be3bc0] c000000000354c98 path_lookupat+0xa8/0x1a0
[c000000639be3c00] c000000000357390 filename_lookup+0xa0/0x1a0
[c000000639be3d30] c000000000347f68 vfs_fstatat+0x88/0x120
[c000000639be3d90] c0000000003485b8 SyS_newstat+0x38/0x60
[c000000639be3e30] c00000000000b184 system_call+0x38/0xe0
--- Exception: c01 (System Call) at 00003fff81e80ab0
SP (3fffd930d6c0) is in userspace

During initrd process, the log replay of the filesystem was clean:

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.29
[/sbin/fsck.jfs (1) -- /dev/sda2] fsck.jfs -a /dev/sda2
fsck.jfs version 1.1.15, 04-Mar-2011
processing started: 4/13/2017 0:32:47
The current device is: /dev/sda2
Block size in bytes: 4096
Filesystem size in blocks: 4882812
**Phase 0 - Replay Journal Log
Filesystem is clean.

and on my seventh attempt it ended up booting all the way to the login prompt again where I have left it.

I built a newer kernel based on 4.10.0-19.21 sources with CONFIG_JFS_DEBUG enabled and passed jfs.jfsloglevel=4 from the kernel command line and it took a few reboots but it hit the crash and extracted the traces. I need to review the output to see if there is anything there.

Just before it faults, it called chkSuper() as part of mount (or remount) of the / volume.

[ 3.978993] __get_metapage: returning = 0xc000000386731ba8 data = 0xc0000000fcfbb000
[ 3.978994] release_metapage: mp = 0xc000000386731ba8, flag = 0x1
[ 3.991206] jfs_lookup: name = blkid.conf
[ 3.991208] __get_metapage: ino = 32, lblock = 0x10020, abs=1
[ 3.991209] __get_metapage: returning = 0xc0000003852a8868 data = 0xc00000037d880000
[ 3.991211] release_metapage: mp = 0xc0000003852a8868, flag = 0x1
[ 3.991238] jfs_lookup: name = mount.jfs
[ 3.991239] __get_metapage: ino = 4114, lblock = 0x3c013b, abs=1
[ 3.991240] __get_metapage: returning = 0xc0000003852a8868 data = 0xc0000000fcfbb000
[ 3.991241] release_metapage: mp = 0xc0000003852a8868, flag = 0x1
[ 4.003520] superblock: flag:0x10200900 state:0x00000000 size:0x251ada0
[ 4.003521] Unable to handle kernel paging request for data at address 0x00000000
[ 4.003527] __get_metapage: ino = 16, lblock = 0x0, abs=0
[ 4.003529] __get_metapage: ino = 16, lblock = 0x100002, abs=1
[ 4.003533] Faulting instruction address: 0xd0000000062effc4

Since a successful boot wraps the kernel log buffer at the point it would have logged a successful chkSuper(), I don't have a baseline to compare but it is odd that there is no

__get_metapage: returning = 0xc000000387a19330 data = 0xc0000000fcbc4000
after the first
 __get_metapage: ino = 16, lblock = 0x0, abs=0

and as mentioned now it seems with the debug code included, it takes maybe five or six boots between failures but it seems consistent. Here is another crash from a different boot:

[ 4.000793] jfs_lookup: name = mount.jfs
[ 4.000795] __get_metapage: ino = 4114, lblock = 0x3c013b, abs=1
[ 4.000796] __get_metapage: returning = 0xc000000639db5208 data = 0xc0000000fdf8b000
[ 4.000798] release_metapage: mp = 0xc000000639db5208, flag = 0x1
[ 4.010217] superblock: flag:0x10200900 state:0x00000000 size:0x251ada0
[ 4.010224] Unable to handle kernel paging request for data at address 0x00000000
[ 4.010227] __get_metapage: ino = 16, lblock = 0x0, abs=0
[ 4.010240] __get_metapage: ino = 16, lblock = 0x100002, abs=1
[ 4.010242] Faulting instruction address: 0xd000000006d6ffc4

Although JFS originated from IBM, we haven't had a Linux JFS developer in years so mirroring to Canonical for their awareness of this problem and to see if they happen to have anyone knowledgeable in JFS internals while we keep trying to isolate this further.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-151438 severity-high targetmilestone-inin1704
bugproxy (bugproxy) wrote : dmesg
  • dmesg Edit (17.9 KiB, application/octet-stream)

Default Comment by Bridge

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
tags: added: kernel-da-key

------- Comment From <email address hidden> 2017-04-24 17:11 EDT-------
In jfs_mount_rw() we have the following:

249 if (remount) {
250 if (chkSuper(sb) || (sbi->state != FM_CLEAN))
251 return -EINVAL;
253 truncate_inode_pages(sbi->ipimap->i_mapping, 0);
254 truncate_inode_pages(sbi->ipbmap->i_mapping, 0);
255 diUnmount(sbi->ipimap, 1);
256 if ((rc = diMount(sbi->ipimap))) {

and so the two truncate_inode_pages()

diUnmount() also does the following which to me seems like it should invalidate it a second time:

191 int diUnmount(struct inode *ipimap, int mounterror)
192 {
193 struct inomap *imap = JFS_IP(ipimap)->i_imap;
195 /*
196 * update the on-disk inode map control structure
197 */
199 if (!(mounterror || isReadOnly(ipimap)))
200 diSync(ipimap);
202 /*
203 * Invalidate the page cache buffers
204 */
205 truncate_inode_pages(ipimap->i_mapping, 0);
207 /*
208 * free in-memory control structure
209 */
210 kfree(imap);

but the interesting part is that it does a kfree() of the same struct inomap *imap that the diRead() routine references when it does the diIAGRead() call

319 jfs_info("diRead: ino = %ld", ip->i_ino);
321 ipimap = sbi->ipimap;
322 JFS_IP(ip)->ipimap = ipimap;
324 /* determine the iag number for this inode (number) */
325 iagno = INOTOIAG(ip->i_ino);
327 /* read the iag */
328 imap = JFS_IP(ipimap)->i_imap;
330 rc = diIAGRead(imap, iagno, &mp);

and dilAGRead() does:

2674 static int diIAGRead(struct inomap * imap, int iagno, struct metapage ** mpp)
2675 {
2676 struct inode *ipimap = imap->im_ipimap;
2677 s64 blkno;
2679 /* compute the logical block number of the iag. */
2680 blkno = IAGTOLBLK(iagno, JFS_SBI(ipimap->i_sb)->l2nbperpage);
2682 /* read the iag. */
2683 *mpp = read_metapage(ipimap, blkno, PSIZE, 0);

where read_metapage() is essentially:

66 #define read_metapage(inode, lblock, size, absolute)\
67 __get_metapage(inode, lblock, size, absolute, false)

I have to wonder if we have a window between the time that jfs_mount_rw() frees some of these structures hanging of parts of the sbi between the time it does the diUnmount() and the diMount() kmallocs a new struct inomap that something else reuses that freed structure while a parallel systemd operation is still running trying to read from something on /

255 diUnmount(sbi->ipimap, 1);
256 if ((rc = diMount(sbi->ipimap))) {

I am trying see if there is anything that prevents an attempt to diRead() between the above window but don't offhand see anything.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-24 17:35 EDT-------
Actually, I see examples of taking read locks such as:

2627 /* obtain read lock on map */
2630 /* read the iag */
2631 if ((rc = diIAGRead(imap, iagno, &mp))) {
2632 IREAD_UNLOCK(ipimap);

So maybe a

/* acquire inode map lock */

followed by

/* release the inode map lock */

is warranted to keep the diUnmount()/diMount() somewhat atomic?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-24 18:30 EDT-------
The proper IWRITE_LOCK()/IWRITE_UNLOCK() calls around the diUnmount()/diMount() and similarly racy dbUnmount()/dbMount() may help but not without first cleaning up code such as:

328 imap = JFS_IP(ipimap)->i_imap;
330 rc = diIAGRead(imap, iagno, &mp);

that access the JFS_IP(ipimap)->i_imap outside the readlock.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-02 08:52 EDT-------
Wrote a trial patch that did not release and re-allocate the inode and block map structures during a remount and jusr re-initialize them as the locking changes were getting ugly. Still, after several reboots, I still ran into a similar fault. I also ran out of disk space from successful boots filling up syslog and kern.log so I am going to add a few choice trace entries and reduce the logging a bit to try and isolate why this is still happening.

Manoj Iyer (manjo) on 2017-05-09
Changed in ubuntu-power-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-09 14:58 EDT-------
Apologies but I have not had a chance to revisit this one but will try and get to it again next week.

Manoj Iyer (manjo) on 2017-06-01
tags: added: ubuntu-17.04
Manoj Iyer (manjo) on 2017-07-19
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-power-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Manoj Iyer (manjo) on 2017-07-31
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
status: New → Incomplete
tags: added: triage-g
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)

Default Comment by Bridge

bugproxy (bugproxy) wrote : dmesg
  • dmesg Edit (17.9 KiB, application/octet-stream)

Default Comment by Bridge

------- Comment From <email address hidden> 2017-09-25 12:20 EDT-------
(In reply to comment #28)
> Apologies but I have not had a chance to revisit this one but will try and
> get to it again next week.

Due to a rather large bug workload and higher priority bugs, I have not been able to look at this one in some time plus I have not ETA as to when I can revisit. Is there anyone in Canonical that works filesystem issues that could continue working this one?

Manoj Iyer (manjo) on 2017-09-25
Changed in ubuntu-power-systems:
status: Incomplete → Opinion
Changed in linux (Ubuntu):
status: Incomplete → Opinion
Changed in ubuntu-power-systems:
importance: High → Medium
Changed in linux (Ubuntu):
importance: High → Medium
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-07 13:40 EDT-------
I am sorry but I won't be able to investigate this issue any longer and since there aren't JFS maintainers within the LTC, it is unlikely I can turn this over to anyone else. Since the middle of 2017, there have been a few more commits to JFS but they didn't appear to be helpful in this area affected.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers