One heavily loaded node on a cluster with 20 nodes

Bug #1380968 reported by Sergey Galkin
This bug affects 1 person
Affects             Status         Importance  Assigned to    Milestone
Mirantis OpenStack  Fix Committed  Medium      Sergey Galkin
5.0.x               Won't Fix      Medium      MOS Cinder
5.1.x               Won't Fix      Medium      MOS Cinder
6.0.x               Fix Committed  Medium      MOS Cinder
6.1.x               Fix Committed  Medium      Sergey Galkin

Bug Description

api: '1.0'
astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
auth_required: true
build_id: 2014-10-13_00-01-06
build_number: '27'
feature_groups:
- mirantis
fuellib_sha: 46ad455514614ec2600314ac80191e0539ddfc04
fuelmain_sha: 431350ba204146f815f0e51dd47bf44569ae1f6d
nailgun_sha: 88a94a11426d356540722593af1603e5089d442c
ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
production: docker
release: 5.1.1
Steps to reproduce
1. Create a cluster with 20 HW nodes
2. Run Rally tests (a reproduction sketch follows the log excerpt below)
Sometimes one node becomes heavily loaded. In my case:
14-10-2014 10:13:58  Node 'Untitled (92:9e)' is back online
14-10-2014 10:10:33  Node 'Untitled (92:9e)' has gone away
14-10-2014 10:06:59  Node 'Untitled (92:9e)' is back online
14-10-2014 10:05:03  Node 'Untitled (92:9e)' has gone away
14-10-2014 09:58:31  Node 'Untitled (92:9e)' is back online
14-10-2014 09:55:32  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:52:21  Node 'Untitled (92:9e)' is back online
14-10-2014 08:50:27  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:46:59  Node 'Untitled (92:9e)' is back online
14-10-2014 08:44:57  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:41:34  Node 'Untitled (92:9e)' is back online
14-10-2014 08:39:26  Node 'Untitled (92:9e)' has gone away
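
For reference, a minimal sketch of how such a Rally load run could be started. The scenario and task file below are illustrative assumptions; the actual Rally task used for this report is not attached. The 10 GB volume size mirrors the 10.0 GiB VIRTUAL-DISK seen in the dmesg excerpt further down.

# Hypothetical reproduction sketch (task file name is illustrative)
rally deployment create --fromenv --name mos-scale-20
rally task start cinder-volume-load.yaml

cinder-volume-load.yaml:

---
CinderVolumes.create_and_delete_volume:
  -
    args:
      size: 10
    runner:
      type: "constant"
      times: 100
      concurrency: 10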

top output on this node:

top - 09:21:17 up 3:36, 2 users, load average: 12.59, 11.67, 9.37
Tasks: 293 total, 1 running, 285 sleeping, 0 stopped, 7 zombie
Cpu(s): 0.9%us, 0.5%sy, 0.0%ni, 54.1%id, 44.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32911996k total, 14377740k used, 18534256k free, 13047952k buffers
Swap: 15999996k total, 0k used, 15999996k free, 147072k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23550 root 20 0 12476 1716 608 D 3 0.0 0:03.80 dd
23776 root 20 0 12476 1716 608 D 3 0.0 0:03.21 dd
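
Note the load profile: almost no CPU is consumed (0.9%us), but 44.5%wa and two dd processes in uninterruptible sleep (state D) show the load average is driven by disk I/O, not computation. A diagnostic sketch for confirming this on the affected node (standard Linux tooling, not taken from the original report):

# Per-device utilization and await times (from the sysstat package)
iostat -x 5
# List processes stuck in uninterruptible sleep (state D) and what they wait on
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'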

part of the dmesg output:
[12685.908490] bio: create slab <bio-1> at 1
[12926.610890] scsi12 : iSCSI Initiator over TCP/IP
[12927.116097] scsi 12:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[12927.116236] scsi 12:0:0:0: Attached scsi generic sg2 type 12
[12927.116614] scsi 12:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[12927.116747] sd 12:0:0:1: Attached scsi generic sg3 type 0
[12927.116963] sd 12:0:0:1: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[12927.117711] sd 12:0:0:1: [sdc] Write Protect is off
[12927.117715] sd 12:0:0:1: [sdc] Mode Sense: 69 00 00 08
[12927.117884] sd 12:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[12927.119533] sdc: sdc1
[12927.120499] sd 12:0:0:1: [sdc] Attached SCSI disk
[12958.600566] sd 12:0:0:1: [sdc] Synchronizing SCSI cache
[12958.874873] connection7:0: detected conn error (1020)
[12986.390070] scsi13 : iSCSI Initiator over TCP/IP
[12986.895534] scsi 13:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[12986.895693] scsi 13:0:0:0: Attached scsi generic sg2 type 12
[12986.896150] scsi 13:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[12986.896262] sd 13:0:0:1: Attached scsi generic sg3 type 0
[12986.896747] sd 13:0:0:1: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[12986.897224] sd 13:0:0:1: [sdc] Write Protect is off
[12986.897228] sd 13:0:0:1: [sdc] Mode Sense: 69 00 00 08
[12986.897401] sd 13:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[12987.102184] sdc: sdc1
[12987.103406] sd 13:0:0:1: [sdc] Attached SCSI disk
[13000.669725] sd 13:0:0:1: [sdc] Synchronizing SCSI cache
[13001.068262] connection8:0: detected conn error (1020)
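
The dmesg excerpt shows a repeating cycle: an iSCSI session to an IET virtual disk is created, the 10 GiB volume attaches as sdc, and shortly afterwards the connection drops with conn error (1020). Active sessions can be inspected with open-iscsi's standard tool (a diagnostic sketch, not part of the original report):

iscsiadm -m session -P 1   # show active iSCSI sessions and their state
iscsiadm -m node           # show configured target portals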

Tags: scale
Revision history for this message
Sergey Galkin (sgalkin) wrote :

Logs from the node

tags: added: scale
Revision history for this message
Sergey Galkin (sgalkin) wrote :
Changed in mos:
milestone: none → 6.0
no longer affects: fuel
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

From a discussion in chat, I assume this issue has the same root cause as https://bugs.launchpad.net/mos/+bug/1369524, hence assigning it to the Cinder team.
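
If the root cause is indeed the same as bug 1369524, the dd processes seen in top above would be Cinder's LVM driver zeroing deleted volumes. A hedged mitigation sketch using standard cinder.conf options (an assumption based on that linkage, not a fix confirmed in this report):

[DEFAULT]
# "zero" (the default) wipes each deleted volume with dd, which can
# saturate the node's disks; "none" disables the wipe entirely.
volume_clear = none
# Alternatively, keep zeroing but cap how many MiB are wiped (0 = all).
volume_clear_size = 100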

Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Is the issue still reproducible?

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Bug https://bugs.launchpad.net/mos/+bug/1369524 is fixed in 6.0, so this one should be fixed as well. Please reopen if it reoccurs.
