One high-load node on the cluster with 20 nodes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mirantis OpenStack | Fix Committed | Medium | Sergey Galkin | |
| 5.0.x | Won't Fix | Medium | MOS Cinder | |
| 5.1.x | Won't Fix | Medium | MOS Cinder | |
| 6.0.x | Fix Committed | Medium | MOS Cinder | |
| 6.1.x | Fix Committed | Medium | Sergey Galkin | |
Bug Description

```
api: '1.0'
astute_sha: f5fbd89d1e0e1f2
auth_required: true
build_id: 2014-10-13_00-01-06
build_number: '27'
feature_groups:
- mirantis
fuellib_sha: 46ad455514614ec
fuelmain_sha: 431350ba204146f
nailgun_sha: 88a94a11426d356
ostf_sha: 64cb59c681658a7
production: docker
release: 5.1.1
```
Steps to reproduce:
1. Create a cluster with 20 HW nodes
2. Run Rally tests

Sometimes one node becomes heavily loaded. In my case:
```
14-10-2014 10:13:58  Node 'Untitled (92:9e)' is back online
14-10-2014 10:10:33  Node 'Untitled (92:9e)' has gone away
14-10-2014 10:06:59  Node 'Untitled (92:9e)' is back online
14-10-2014 10:05:03  Node 'Untitled (92:9e)' has gone away
14-10-2014 09:58:31  Node 'Untitled (92:9e)' is back online
14-10-2014 09:55:32  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:52:21  Node 'Untitled (92:9e)' is back online
14-10-2014 08:50:27  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:46:59  Node 'Untitled (92:9e)' is back online
14-10-2014 08:44:57  Node 'Untitled (92:9e)' has gone away
14-10-2014 08:41:34  Node 'Untitled (92:9e)' is back online
14-10-2014 08:39:26  Node 'Untitled (92:9e)' has gone away
```
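The flapping above can be quantified straight from those timestamps. A minimal sketch (assuming the `DD-MM-YYYY HH:MM:SS` format shown in the log) that pairs each "has gone away" with the following "is back online" and reports the offline durations:

```python
from datetime import datetime

# A few of the events as reported (newest first); format taken from the log above.
LOG = [
    "14-10-2014 10:13:58 Node 'Untitled (92:9e)' is back online",
    "14-10-2014 10:10:33 Node 'Untitled (92:9e)' has gone away",
    "14-10-2014 10:06:59 Node 'Untitled (92:9e)' is back online",
    "14-10-2014 10:05:03 Node 'Untitled (92:9e)' has gone away",
    "14-10-2014 09:58:31 Node 'Untitled (92:9e)' is back online",
    "14-10-2014 09:55:32 Node 'Untitled (92:9e)' has gone away",
]

def offline_intervals(lines):
    """Pair each 'has gone away' with the next 'is back online'
    (chronologically) and return the offline durations in seconds."""
    events = sorted(
        (datetime.strptime(line[:19], "%d-%m-%Y %H:%M:%S"),
         "down" if "gone away" in line else "up")
        for line in lines
    )
    durations, down_at = [], None
    for ts, kind in events:
        if kind == "down":
            down_at = ts
        elif down_at is not None:
            durations.append((ts - down_at).total_seconds())
            down_at = None
    return durations

print(offline_intervals(LOG))  # outages of roughly 2-3.5 minutes each
```

For the full log this shows the node dropping out repeatedly for a couple of minutes at a time, consistent with it being too loaded to answer health checks.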
`top` on this node:

```
top - 09:21:17 up 3:36, 2 users, load average: 12.59, 11.67, 9.37
Tasks: 293 total, 1 running, 285 sleeping, 0 stopped, 7 zombie
Cpu(s): 0.9%us, 0.5%sy, 0.0%ni, 54.1%id, 44.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32911996k total, 14377740k used, 18534256k free, 13047952k buffers
Swap: 15999996k total, 0k used, 15999996k free, 147072k cached

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
23550 root  20  0 12476 1716 608 D    3  0.0 0:03.80 dd
23776 root  20  0 12476 1716 608 D    3  0.0 0:03.21 dd
```
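The `Cpu(s)` line is the tell here: 44.5% iowait against under 2% of user+system time means the load average of ~12 is driven by processes stuck in uninterruptible I/O (the `dd` processes in state `D`), not by CPU work. A quick sketch for pulling the fields out of that line:

```python
import re

def parse_top_cpu(line):
    """Split top's Cpu(s) summary line into a {field: percent} dict."""
    return {k: float(v) for v, k in re.findall(r"([\d.]+)%(\w+)", line)}

cpu = parse_top_cpu(
    "Cpu(s): 0.9%us, 0.5%sy, 0.0%ni, 54.1%id, 44.5%wa, 0.0%hi, 0.0%si, 0.0%st"
)
# iowait dwarfs user+system time: the node is I/O-bound, not CPU-bound
assert cpu["wa"] > cpu["us"] + cpu["sy"]
```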
Part of dmesg:

```
[12685.908490] bio: create slab <bio-1> at 1
[12926.610890] scsi12 : iSCSI Initiator over TCP/IP
[12927.116097] scsi 12:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[12927.116236] scsi 12:0:0:0: Attached scsi generic sg2 type 12
[12927.116614] scsi 12:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[12927.116747] sd 12:0:0:1: Attached scsi generic sg3 type 0
[12927.116963] sd 12:0:0:1: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[12927.117711] sd 12:0:0:1: [sdc] Write Protect is off
[12927.117715] sd 12:0:0:1: [sdc] Mode Sense: 69 00 00 08
[12927.117884] sd 12:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[12927.119533] sdc: sdc1
[12927.120499] sd 12:0:0:1: [sdc] Attached SCSI disk
[12958.600566] sd 12:0:0:1: [sdc] Synchronizing SCSI cache
[12958.874873] connection7:0: detected conn error (1020)
[12986.390070] scsi13 : iSCSI Initiator over TCP/IP
[12986.895534] scsi 13:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[12986.895693] scsi 13:0:0:0: Attached scsi generic sg2 type 12
[12986.896150] scsi 13:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[12986.896262] sd 13:0:0:1: Attached scsi generic sg3 type 0
[12986.896747] sd 13:0:0:1: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[12986.897224] sd 13:0:0:1: [sdc] Write Protect is off
[12986.897228] sd 13:0:0:1: [sdc] Mode Sense: 69 00 00 08
[12986.897401] sd 13:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[12987.102184] sdc: sdc1
[12987.103406] sd 13:0:0:1: [sdc] Attached SCSI disk
[13000.669725] sd 13:0:0:1: [sdc] Synchronizing SCSI cache
[13001.068262] connection8:0: detected conn error (1020)
```
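The repeated `detected conn error (1020)` lines mark iSCSI session drops that coincide with the volume attach/detach churn from the Rally runs. A small sketch for pulling their timestamps (seconds since boot) out of a dmesg capture, to correlate with the node-flapping times:

```python
import re

def conn_error_times(dmesg_lines):
    """Return the seconds-since-boot timestamps of iSCSI
    'detected conn error' events in dmesg output."""
    pat = re.compile(r"\[\s*([\d.]+)\]\s+connection\d+:\d+: detected conn error")
    return [float(m.group(1)) for m in map(pat.match, dmesg_lines) if m]

sample = [
    "[12958.874873] connection7:0: detected conn error (1020)",
    "[12986.390070] scsi13 : iSCSI Initiator over TCP/IP",
    "[13001.068262] connection8:0: detected conn error (1020)",
]
print(conn_error_times(sample))  # -> [12958.874873, 13001.068262]
```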
Changed in mos:
milestone: none → 6.0

no longer affects: fuel

Changed in mos:
status: Confirmed → Won't Fix
Logs from node