Kernel traces on controllers

Bug #1387087 reported by Sergey Galkin
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Confirmed
High
MOS Linux

Bug Description

api: '1.0'
astute_sha: 97eea90efe0a1f17b4934919d6e459d270c10372
auth_required: true
build_id: 2014-10-28_00-01-12
build_number: '53'
feature_groups:
- mirantis
- techpreview
fuellib_sha: b8d244a900b25bed8f597e99b309f9ee4ad8ae56
fuelmain_sha: f3ad22d12c26794a05e62d46317fa1e47f7f1138
nailgun_sha: fb18068382d522b735ecf446c0f4166c129269fb
ostf_sha: f47fd1d66a7255213ee075d5c11b8f111f922000
production: docker
release: '6.0'

Steps to reproduce:
1. Install 3 controllers in HA with 97 compute nodes on Ubuntu on Supermicro X9SRD-F/X9SRD-F, BIOS 3.0a 02/12/2014 nodes (mongodb + cinder + iscsi lvm)
after installation 2 controllers became high load and every 20 seconds shows kernel trace

<4>Oct 29 09:30:42 node-1 kernel: [50686.111314] ------------[ cut here ]------------
<4>Oct 29 09:30:42 node-1 kernel: [50686.111332] WARNING: CPU: 8 PID: 29151 at /build/buildd/linux-lts-saucy-3.11.0/fs/dcache.c:2561 prepend_path+0x1d4/0x1e0()
<4>Oct 29 09:30:42 node-1 kernel: [50686.111337] Root dentry has weird name <>
<4>Oct 29 09:30:42 node-1 kernel: [50686.111340] Modules linked in: xt_nat xt_conntrack xt_REDIRECT ipt_REJECT xt_tcpudp iscsi_target_mod target_core_mod configfs ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle xt_mark veth xt_state xt_CT iptable_raw xt_comment xt_multiport ip6table_filter ip6_tables iptable_filter ip_tables x_tables openvswitch(OF) psmouse serio_raw gpio_ich nf_conntrack_ipv6 nf_defrag_ipv6 xfs mei_me nf_conntrack_ipv4 mei joydev nf_conntrack lpc_ich nf_defrag_ipv4 ioatdma wmi mac_hid lp parport igb ixgbe hid_generic i2c_algo_bit dca usbhid ahci ptp hid libahci mdio pps_core
<4>Oct 29 09:30:42 node-1 kernel: [50686.111427] CPU: 8 PID: 29151 Comm: netstat Tainted: GF W O 3.11.0-18-generic #32~precise1-Ubuntu
<4>Oct 29 09:30:42 node-1 kernel: [50686.111432] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 3.0a 02/12/2014
<4>Oct 29 09:30:42 node-1 kernel: [50686.111436] 0000000000000a01 ffff88040c4f5d48 ffffffff8173d60f 0000000000000007
<4>Oct 29 09:30:42 node-1 kernel: [50686.111446] ffff88040c4f5d98 ffff88040c4f5d88 ffffffff8106540c ffff88040c4f5df8
<4>Oct 29 09:30:42 node-1 kernel: [50686.111454] ffff88040c4f5eb8 0000000000000000 ffff880407413a40 ffff8805b4c6f020
<4>Oct 29 09:30:42 node-1 kernel: [50686.111464] Call Trace:
<4>Oct 29 09:30:42 node-1 kernel: [50686.111478] [<ffffffff8173d60f>] dump_stack+0x46/0x58
<4>Oct 29 09:30:42 node-1 kernel: [50686.111489] [<ffffffff8106540c>] warn_slowpath_common+0x8c/0xc0
<4>Oct 29 09:30:42 node-1 kernel: [50686.111497] [<ffffffff810654f6>] warn_slowpath_fmt+0x46/0x50
<4>Oct 29 09:30:42 node-1 kernel: [50686.111505] [<ffffffff811c9c24>] prepend_path+0x1d4/0x1e0
<4>Oct 29 09:30:42 node-1 kernel: [50686.111513] [<ffffffff811cb5a6>] d_path+0xf6/0x180
<4>Oct 29 09:30:42 node-1 kernel: [50686.111523] [<ffffffff8121c6cb>] proc_pid_readlink+0x9b/0xf0
<4>Oct 29 09:30:42 node-1 kernel: [50686.111532] [<ffffffff811ba1d6>] SyS_readlinkat+0xf6/0x120
<4>Oct 29 09:30:42 node-1 kernel: [50686.111540] [<ffffffff811ba21b>] SyS_readlink+0x1b/0x20
<4>Oct 29 09:30:42 node-1 kernel: [50686.111550] [<ffffffff817521dd>] system_call_fastpath+0x1a/0x1f
<4>Oct 29 09:30:42 node-1 kernel: [50686.111555] ---[ end trace 1e7c819e296b5a29 ]---

can't attache snapshot because
[root@fuel ~]# du -sh /var/log/
18G /var/log/

Tags: scale
Sergey Galkin (sgalkin)
tags: added: scale
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Installed and run 'atop -w atop.raw' at node-44.
Will attach results of several minutes of recording

Changed in mos:
milestone: none → 6.0
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Linux (mos-linux)
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

As far as I can see from live monitoring with atop,
LVM | image-glance | busy 94% | read 0 | write 242 | KiB/r 0 | | KiB/w 3 | MBr/s 0.00 | MBw/s 0.38 | avq 1.01 | avio 7.59 ms |
DSK | sdb | busy 94% | read 0 | write 345 | KiB/r 0 | | KiB/w 2 | MBr/s 0.00 | MBw/s 0.38 | avq 1.01 | avio 5.32 ms |

There is no read/write load rate at all but sdb and image-glance LVM are *extremely* busy...
According to atop man "Disk: A busy percentage of 70% or higher is considered ‘critical’."

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Note that while these 'busy periods' a periodic - they do not overlap much with kern stack traces log reports

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.