Running YCSB workload on MongoDB on Ubuntu 14.10 VM resulted in kernel bug

Bug #1354024 reported by bugproxy on 2014-08-07
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

== Comment: #0 - Kalpana Shetty <email address hidden> - 2014-08-05 23:53:28 ==
---Problem Description---
Running a YCSB workload on MongoDB on an Ubuntu 14.10 VM resulted in a kernel bug

---uname output---
root@u10vm15:~# uname -a
Linux u10vm15 3.16.0-6-generic #11-Ubuntu SMP Mon Jul 28 02:00:45 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
Power 8 - Tuleta

 Machine Type = POWER 8

---System Hang---
 The Ubuntu 14.10 LE guest needs to be restarted when this issue is seen.

 Steps to reproduce:
- Install Ubuntu 14.10 (July 30th build) on 2 VMs
- Run MongoDB 2.6.2 on one of the PowerKVM VMs
- Run YCSB 0.1.4 on the other VM
- Create a 1-million-record load on MongoDB using YCSB; allow it to run for 4 to 5 hours or so.
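
For anyone trying to reproduce this, the load step above might be driven roughly as in the sketch below. It only builds and prints the YCSB command lines. The workload file, the MongoDB host name and the port are assumptions, since the report gives only the record count and the tool versions.

```python
import shlex

# Assumed values: the report states only the record count and the versions.
YCSB_HOME = "ycsb-0.1.4"                # directory name based on the version in the report
MONGO_URL = "mongodb://mongosrv:27017"  # hypothetical host and port of the MongoDB VM
WORKLOAD = "workloads/workloada"        # assumption: the report does not name a workload
RECORD_COUNT = 1_000_000                # the 1 million records mentioned in the steps


def ycsb_command(phase):
    """Build the argv for one YCSB phase ("load" or "run") against MongoDB."""
    if phase not in ("load", "run"):
        raise ValueError(f"unknown YCSB phase: {phase}")
    return [
        f"{YCSB_HOME}/bin/ycsb", phase, "mongodb",
        "-P", WORKLOAD,
        "-p", f"recordcount={RECORD_COUNT}",
        "-p", f"mongodb.url={MONGO_URL}",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for phase in ("load", "run"):
        print(shlex.join(ycsb_command(phase)))
```

The printed commands would be run on the YCSB VM, with the "run" phase left going for the 4 to 5 hours mentioned above.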

Setup details:
- MongoDB server on one VM (version: 2.6.2)
- YCSB workload running on one VM (YCSB version - ycsb-0.1.4)

uname on Host:
[root@powerkvm5-lp1 ~]# uname -a
Linux powerkvm5-lp1.austin.ibm.com 3.10.42-2004.pkvm2_1_1.8.ppc64 #1 SMP Fri Jul 18 11:20:03 CDT 2014 ppc64 ppc64 ppc64 GNU/Linux

uname on Guest OS:
root@u10vm15:~# uname -a
Linux u10vm15 3.16.0-6-generic #11-Ubuntu SMP Mon Jul 28 02:00:45 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

[23001.071911] ------------[ cut here ]------------
[23001.071922] kernel BUG at /build/buildd/linux-3.16.0/fs/dcache.c:1626!
[23001.072917] Oops: Exception in kernel mode, sig: 5 [#1]
[23001.073620] SMP NR_CPUS=2048 NUMA pSeries
[23001.074149] Modules linked in: pseries_rng rtc_generic ohci_pci
[23001.075162] CPU: 8 PID: 3384 Comm: updatedb.mlocat Not tainted 3.16.0-6-generic #11-Ubuntu
[23001.076006] task: c000000006e00000 ti: c000000130364000 task.ti: c000000130364000
[23001.076834] NIP: c0000000002abc68 LR: c0000000002abf90 CTR: c00000000001f880
[23001.077650] REGS: c0000001303676d0 TRAP: 0700 Not tainted (3.16.0-6-generic)
[23001.078468] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004842 XER: 20000000
[23001.080432] CFAR: c0000000002abf8c SOFTE: 1
[23001.080432] GPR00: c0000000002abf90 c000000130367950 c000000001346618 c000000005dd0000
[23001.080432] GPR04: 0000000000000000 0000000000001000 c000000005dcd170 0000000000000fcc
[23001.080432] GPR08: 0000000000001000 0000000000000001 8803dabf05ffffff 000000000016eb0c
[23001.080432] GPR12: 0000000000004400 c00000000fe41c00 0000000000000000 0000000000100000
[23001.080432] GPR16: 000000001001d660 0000010034e94ec0 0000000000000001 0000000053d94034
[23001.080432] GPR20: 0000000000000000 0000000000000001 00003fffcaa1efb8 0000010034e842e0
[23001.080432] GPR24: 0000000000000000 0000000000000000 0000010034e94ec0 ffffffffffffff9c
[23001.080432] GPR28: 0000000000000040 0000000000000000 c000000005dd0000 0000000000000000
[23001.091266] NIP [c0000000002abc68] d_instantiate+0x38/0xf0
[23001.091837] LR [c0000000002abf90] d_splice_alias+0x60/0x1a0
[23001.092404] Call Trace:
[23001.092692] [c000000130367980] [c0000000002abf90] d_splice_alias+0x60/0x1a0
[23001.093544] [c0000001303679c0] [c00000000034c5b4] ext4_lookup+0xc4/0x1c0
[23001.094399] [c000000130367a50] [c000000000299944] lookup_real+0x64/0xc0
[23001.095261] [c000000130367a90] [c00000000029a790] __lookup_hash+0x60/0x80
[23001.096106] [c000000130367ae0] [c00000000029d610] lookup_slow+0x70/0x110
[23001.096946] [c000000130367b20] [c00000000029ea08] path_lookupat+0x958/0x9a0
[23001.097804] [c000000130367be0] [c00000000029eaa8] filename_lookup+0x58/0x140
[23001.098648] [c000000130367c30] [c0000000002a2524] user_path_at_empty+0x84/0xe0
[23001.099580] [c000000130367d20] [c0000000002937e4] vfs_fstatat+0x84/0x140
[23001.100432] [c000000130367d80] [c000000000293eb4] SyS_newlstat+0x34/0x60
[23001.101378] [c000000130367e30] [c00000000000a0fc] syscall_exit+0x0/0x7c
[23001.102193] Instruction dump:
[23001.102589] 7c0802a6 fbc1fff0 fbe1fff8 f8010010 f821ffd1 7c7e1b78 7c9f2378 60000000
[23001.103945] 60000000 e93e00b8 3149ffff 7d2a4910 <0b090000> 2fbf0000 419e0060 387f0088
[23001.105276] ---[ end trace b20dd6fbb5b21932 ]---
[23001.118598]
root@u10vm15:~#

After rebooting, I keep seeing the call traces below:
Ubuntu Utopic Unicorn (development branch) u10vm15 hvc0

u10vm15 login: root
Password:
Last login: Wed Aug 6 00:02:18 IST 2014 on hvc0
Welcome to Ubuntu Utopic Unicorn (development branch) (GNU/Linux 3.16.0-6-generic ppc64le)

 * Documentation: https://help.ubuntu.com/
[32950.678160] systemd-logind[1071]: Removed session c1.
[32950.694697] systemd-logind[1071]: New session c2 of user root.
[32950.703411] Unable to handle kernel paging request for data at address 0x2f0000000000000
[32950.704886] Faulting instruction address: 0xc000000000260290
[32950.706148] Oops: Kernel access of bad area, sig: 11 [#2]
[32950.707098] SMP NR_CPUS=2048 NUMA pSeries
[32950.708098] Modules linked in: pseries_rng rtc_generic ohci_pci
[32950.709651] CPU: 8 PID: 342 Comm: cgmanager Tainted: G D 3.16.0-6-generic #11-Ubuntu
[32950.711433] task: c00000012e3854c0 ti: c00000012e410000 task.ti: c00000012e410000
[32950.712938] NIP: c000000000260290 LR: c000000000260384 CTR: c0000000004099f0
[32950.715490] REGS: c00000012e413970 TRAP: 0300 Tainted: G D (3.16.0-6-generic)
[32950.718098] MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28002448 XER: 00000000
[32950.723688] CFAR: c000000000010a30 DAR: 02f0000000000000 DSISR: 40000000 SOFTE: 1
GPR00: c000000000260384 c00000012e413bf0 c000000001346618 0000000000000000
GPR04: 00000000000000d0 c0000000012a4ad0 0000000000000001 c000000001919db0
GPR08: 0000000000000e90 0000000000000000 0000000000b30000 00065581e050bfe3
GPR12: c0000000004099f0 c00000000fe41c00 fffffffffffffe80 fffffffffffffe90
GPR16: fffffffffffffea0 fffffffffffffeb0 fffffffffffffec0 fffffffffffffed0
GPR20: fffffffffffffee0 fffffffffffffef0 ffffffffffffff00 ffffffffffffff10
GPR24: ffffffffffffff20 00003ffff2b48de0 c00000013604c600 00003ffff2b48b18
GPR28: c0000000002adc8c 00000000000000d0 02f0000000000000 c00000013604c600
[32950.742460] NIP [c000000000260290] kmem_cache_alloc+0x90/0x2d0
[32950.743997] LR [c000000000260384] kmem_cache_alloc+0x184/0x2d0
[32950.745105] Call Trace:
[32950.745637] [c00000012e413bf0] [c000000000260384] kmem_cache_alloc+0x184/0x2d0 (unreliable)
[32950.747609] [c00000012e413c40] [c0000000002adc8c] __d_alloc+0x4c/0x1c0
[32950.749004] [c00000012e413c80] [c00000000083bd58] sock_alloc_file+0x78/0x170
[32950.750433] [c00000012e413ce0] [c000000000841244] SyS_accept4+0xd4/0x280
[32950.751833] [c00000012e413dc0] [c000000000842c50] SyS_socketcall+0x3c0/0x400
[32950.753239] [c00000012e413e30] [c00000000000a0fc] syscall_exit+0x0/0x7c
[32950.754577] Instruction dump:
[32950.755324] 7f5fd378 e94d0040 e93f0000 7ce95214 e9070008 7fc9502a e9270010 2fbe0000
[32950.757463] 41de0070 2fa90000 419e0068 e93f0022 <7f7e482a> 39200000 88cd02ba 992d02ba
[32950.759787] ---[ end trace b20dd6fbb5b21933 ]---
[32950.775708]
[32950.813539] systemd-logind[1071]: cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[32950.821827] systemd-logind[1071]: Failed to create cpuset:/user/0.user/c2.session: No such file or directory
[32950.825834] init: cgmanager main process (342) killed by SEGV signal
[32950.828432] systemd-logind[1071]: Failed to create devices:/user/0.user/c2.session: No such file or directory
[32950.832066] init: cgmanager main process ended, respawning
[32950.833739] systemd-logind[1071]: Failed to create freezer:/user/0.user/c2.session: No such file or directory
[32950.836759] systemd-logind[1071]: Failed to create hugetlb:/user/0.user/c2.session: No such file or directory
[32950.839464] systemd-logind[1071]: Failed to create memory:/user/0.user/c2.session: No such file or directory
[32950.842165] systemd-logind[1071]: Failed to create perf_event:/user/0.user/c2.session: No such file or directory
[32950.844964] systemd-logind[1071]: Failed to create net_cls:/user/0.user/c2.session: No such file or directory
[32950.847865] systemd-logind[1071]: Failed to create net_prio:/user/0.user/c2.session: No such file or directory
[32950.862660] Unable to handle kernel paging request for data at address 0x2f0000000000000
[32950.864661] Faulting instruction address: 0xc000000000260290
[32950.866115] Oops: Kernel access of bad area, sig: 11 [#3]
[32950.867269] SMP NR_CPUS=2048 NUMA pSeries
[32950.868437] Modules linked in: pseries_rng rtc_generic ohci_pci
[32950.870457] CPU: 8 PID: 3550 Comm: sh Tainted: G D 3.16.0-6-generic #11-Ubuntu
[32950.872181] task: c00000012d70e910 ti: c00000012d7a0000 task.ti: c00000012d7a0000
[32950.873907] NIP: c000000000260290 LR: c000000000260384 CTR: c000000000409c60
[32950.875637] REGS: c00000012d7a3780 TRAP: 0300 Tainted: G D (3.16.0-6-generic)
[32950.877373] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22002888 XER: 20000000
[32950.881408] CFAR: c00000000006aec8 DAR: 02f0000000000000 DSISR: 40000000 SOFTE: 1
GPR00: c000000000260384 c00000012d7a3a00 c000000001346618 0000000000000000
GPR04: 00000000000000d0 0000000000000001 c00000012d7a3b40 c000000001919db0
GPR08: 0000000000000e90 0000000000000000 0000000000b30000 c000000007c8a02d
GPR12: 0000000000002200 c00000000fe41c00 0000000000000000 0000000056fbd0b8
GPR16: 0000000056fbffa8 0000000056fbfeb8 0000010013380328 00003fffc4b6fed6
GPR20: 0000000000000000 0000010013380340 0000000056fbfe60 0000000000000004
GPR24: c00000012eeca100 c00000012d538300 c00000013604c600 0000000000000001
GPR28: c0000000002adc8c 00000000000000d0 02f0000000000000 c00000013604c600
[32950.904699] NIP [c000000000260290] kmem_cache_alloc+0x90/0x2d0
[32950.905897] LR [c000000000260384] kmem_cache_alloc+0x184/0x2d0
[32950.907011] Call Trace:
[32950.907437] [c00000012d7a3a00] [c000000000260384] kmem_cache_alloc+0x184/0x2d0 (unreliable)
[32950.909026] [c00000012d7a3a50] [c0000000002adc8c] __d_alloc+0x4c/0x1c0
[32950.910130] [c00000012d7a3a90] [c0000000002ade38] d_alloc+0x38/0xd0
[32950.911170] [c00000012d7a3ad0] [c00000000029a6fc] lookup_dcache+0x10c/0x140
[32950.912159] [c00000012d7a3b20] [c00000000029a774] __lookup_hash+0x44/0x80
[32950.913141] [c00000012d7a3b70] [c00000000029d610] lookup_slow+0x70/0x110
[32950.914120] [c00000012d7a3bb0] [c00000000029ea08] path_lookupat+0x958/0x9a0
[32950.915095] [c00000012d7a3c70] [c00000000029eaa8] filename_lookup+0x58/0x140
[32950.916069] [c00000012d7a3cc0] [c0000000002a2524] user_path_at_empty+0x84/0xe0
[32950.917208] [c00000012d7a3db0] [c000000000289908] SyS_faccessat+0xc8/0x2f0
[32950.918183] [c00000012d7a3e30] [c00000000000a0fc] syscall_exit+0x0/0x7c
[32950.919157] Instruction dump:
[32950.919643] 7f5fd378 e94d0040 e93f0000 7ce95214 e9070008 7fc9502a e9270010 2fbe0000
[32950.921265] 41de0070 2fa90000 419e0068 e93f0022 <7f7e482a> 39200000 88cd02ba 992d02ba
[32950.923055] ---[ end trace b20dd6fbb5b21934 ]---
[32950.939809]
[32950.940617] init: cgmanager main process (3550) killed by SEGV signal
[32950.942078] init: cgmanager main process ended, respawning
root@u10vm15:~# [32955.627058] Unable to handle kernel paging request for data at address 0x82cf8c206002008
[32955.628340] Faulting instruction address: 0xc00000000035eca8
[32955.629471] Oops: Kernel access of bad area, sig: 11 [#4]
[32955.630336] SMP NR_CPUS=2048 NUMA pSeries
[32955.631340] Modules linked in: pseries_rng rtc_generic ohci_pci
[32955.633486] CPU: 1 PID: 217 Comm: jbd2/sda2-8 Tainted: G D 3.16.0-6-generic #11-Ubuntu
[32955.635062] task: c00000012d983f90 ti: c00000012da0c000 task.ti: c00000012da0c000
[32955.636627] NIP: c00000000035eca8 LR: c00000000035ec84 CTR: c00000000035ec00
[32955.638053] REGS: c00000012da0f7e0 TRAP: 0300 Tainted: G D (3.16.0-6-generic)
[32955.639479] MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24022048 XER: 00000000
[32955.642493] CFAR: c00000000013bd44 DAR: 082cf8c206002008 DSISR: 42000000 SOFTE: 1
GPR00: c00000000035ec84 c00000012da0fa60 c000000001346618 c00000012d4c5a88
GPR04: c000000005de0000 0000000001f3a8b9 0030fc0000000000 00001df91effac68
GPR08: 0000000000000ccd 082bb8eb06002000 082cf8c206002000 00001b76fc8cf2f7
GPR12: c00000000035ec00 c00000000fe40380 6db6db6db6db6db7 0000000000000000
GPR16: 0000000000000000 0000000000000001 0000000000080000 0000000000000020
GPR20: c00000012fd6e824 0000000000000040 0000000000000008 0000000000400000
GPR24: 0000000000000000 0000000000000000 0000000000000000 c00000012d4c4800
GPR28: c00000012d4c5a88 c0000000052b4cd0 c00000012d4c5800 c0000000052b4c00
[32955.654852] NIP [c00000000035eca8] ext4_journal_commit_callback+0xa8/0x170
[32955.655489] LR [c00000000035ec84] ext4_journal_commit_callback+0x84/0x170
[32955.656137] Call Trace:
[32955.656382] [c00000012da0fa60] [c00000000035ec84] ext4_journal_commit_callback+0x84/0x170 (unreliable)
[32955.657670] [c00000012da0fac0] [c00000000039bb5c] jbd2_journal_commit_transaction+0x171c/0x1ea0
[32955.658712] [c00000012da0fcf0] [c0000000003a348c] kjournald2+0xec/0x300
[32955.659516] [c00000012da0fd80] [c0000000000cbc30] kthread+0x110/0x130
[32955.660317] [c00000012da0fe30] [c00000000000a3e8] ret_from_kernel_thread+0x5c/0x74
[32955.661207] Instruction dump:
[32955.661688] 7f83e378 3b000000 48658ac9 60000000 e89f00d0 3b400000 7fbd2040 419e0070
[32955.663050] 60000000 60420000 e9240008 e9440000 <f92a0008> f9490000 f8840000 f8840008
[32955.664260] ---[ end trace b20dd6fbb5b21935 ]---
[32955.677577]
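
To compare the four oopses above side by side, a small log summarizer along the following lines may help. This is a minimal sketch: the regular expressions are written against the ppc64le oops layout shown in this report and are not a general dmesg parser, and the function and field names are made up for illustration.

```python
import re

# Patterns follow the oops lines as they appear in this report (assumption:
# other kernels or architectures may format these lines differently).
BUG_RE = re.compile(r"kernel BUG at (?P<where>\S+)!")
OOPS_RE = re.compile(r"Oops: (?P<reason>.+), sig: (?P<sig>\d+) \[#(?P<num>\d+)\]")
COMM_RE = re.compile(r"PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
NIP_RE = re.compile(r"NIP \[[0-9a-f]+\] (?P<sym>[\w.]+)\+0x")
FRAME_RE = re.compile(r"\[[0-9a-f]+\] \[[0-9a-f]+\] (?P<sym>[\w.]+)\+0x")


def summarize_oopses(log_text):
    """Return one dict per oops found in log_text, in order of appearance."""
    oopses = []
    current = None
    pending_bug = None  # a "kernel BUG at" line precedes the oops it belongs to
    for line in log_text.splitlines():
        m = BUG_RE.search(line)
        if m:
            pending_bug = m["where"]
            continue
        m = OOPS_RE.search(line)
        if m:
            current = {"num": int(m["num"]), "reason": m["reason"],
                       "sig": int(m["sig"]), "bug_at": pending_bug,
                       "pid": None, "comm": None, "nip": None, "frames": []}
            pending_bug = None
            oopses.append(current)
            continue
        if current is None:
            continue
        if "end trace" in line:
            current = None
            continue
        m = COMM_RE.search(line)
        if m:
            current["pid"], current["comm"] = int(m["pid"]), m["comm"]
            continue
        m = NIP_RE.search(line)
        if m:
            current["nip"] = m["sym"]
            continue
        m = FRAME_RE.search(line)
        if m:
            current["frames"].append(m["sym"])
    return oopses


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python summarize_oops.py console.log
    with open(sys.argv[1]) as fh:
        for oops in summarize_oopses(fh.read()):
            print(oops["num"], oops["comm"], oops["nip"], " <- ".join(oops["frames"]))
```

Run against the console output above, this prints one line per oops with the faulting task, the NIP symbol and the call chain, which makes it easier to see which of the traces go through the dcache code.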

== Comment: #1 - Kalpana Shetty <email address hidden> - 2014-08-05 23:54:12 ==
Setup details:
- MongoDB server on one VM (version: 2.6.2)
- YCSB workload running on one VM (YCSB version - ycsb-0.1.4)

[root@powerkvm5-lp1 ~]# uname -a
Linux powerkvm5-lp1.austin.ibm.com 3.10.42-2004.pkvm2_1_1.8.ppc64 #1 SMP Fri Jul 18 11:20:03 CDT 2014 ppc64 ppc64 ppc64 GNU/Linux

Guest OS: Ubuntu 14.10
[root@powerkvm5-lp1 ~]# virsh list --all
 Id Name State
----------------------------------------------------
 3 kal_u10_ycsb running
 5 kal_u10_mongosrv running

uname on Guest OS:
root@u10vm15:~# uname -a
Linux u10vm15 3.16.0-6-generic #11-Ubuntu SMP Mon Jul 28 02:00:45 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

root@u10vm15:~# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-15

bugproxy (bugproxy) on 2014-08-07
tags: added: architecture-ppc64le bugnameltc-114274 severity-critical targetmilestone-inin---
Luciano Chavez (lnx1138) on 2014-08-15
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) on 2014-08-15
tags: added: targetmilestone-inin1410
removed: targetmilestone-inin---

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1354024

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: ppc64el
tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Medium → High
Andy Whitcroft (apw) wrote :

There is a strong possibility the dcache related panics are associated with the fixes applied for bug #1354234. It would be worth testing with kernels with those fixes applied for confirmation.

bugproxy (bugproxy) on 2014-09-24
tags: added: severity-high
removed: severity-critical
Chris J Arges (arges) wrote :

Per Andy's comment, please do upgrade to the latest 3.16 series kernel (3.16.0-17.23) and see if this issue still occurs.
Thanks,

Chris J Arges (arges) on 2014-11-20
Changed in linux (Ubuntu):
assignee: nobody → Chris J Arges (arges)
Chris J Arges (arges) on 2014-11-20
Changed in linux (Ubuntu):
status: Triaged → In Progress
bugproxy (bugproxy) on 2014-12-12
tags: added: targetmilestone-inin1504
removed: targetmilestone-inin1410
Chris J Arges (arges) on 2015-01-15
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → nobody
status: In Progress → Triaged
Chris J Arges (arges) wrote :

Can you re-test with the latest 3.16 kernel to see if this is still an issue?
Thanks,

Changed in linux (Ubuntu):
status: Triaged → Incomplete

------- Comment From <email address hidden> 2015-02-05 22:52 EDT-------
(In reply to comment #19)
> Can you re-test with the latest 3.16 kernel to see if this is still an issue?
> Thanks,

I ran the YCSB workload on MongoDB again after dist-upgrading to a later kernel version and did not see any issues.

uname on mongo and ycsb VMs:
root@mongosrv:~# uname -a
Linux mongosrv 3.16.0-17-generic #23-Ubuntu SMP Fri Sep 19 16:54:14 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

root@ycsb:~# uname -a
Linux ycsb 3.16.0-17-generic #23-Ubuntu SMP Fri Sep 19 16:54:14 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
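
When checking several guests before a re-test, the kernel release can be compared numerically rather than by eye. The sketch below assumes the Ubuntu release string layout seen in this report (version, ABI number, flavour) and uses 3.16.0-17, the kernel the re-test above ran on, as the threshold. The function names are made up for illustration.

```python
import re

# Matches release strings such as "3.16.0-6-generic" (the `uname -r` field).
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-")


def release_key(release):
    """Turn "3.16.0-17-generic" into (3, 16, 0, 17) for numeric comparison."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def at_least(release, minimum="3.16.0-17-generic"):
    """True if release is the same as or newer than minimum."""
    return release_key(release) >= release_key(minimum)


if __name__ == "__main__":
    for rel in ("3.16.0-6-generic", "3.16.0-17-generic"):
        print(rel, at_least(rel))
```

Comparing the parsed tuples matters here because a plain string comparison would sort "3.16.0-6" after "3.16.0-17".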

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired