[nailgun-agent] Agent hangs on hardware with a huge block device count
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Krzysztof Szukiełojć |
Bug Description
nailgun-agent gets stuck on a system with a huge number of disks (64 physical disks):
$ lsblk |wc -l
716
After investigation, we found that the root cause of the hang is the lshw call [0][1].
Strace log for 'strace lshw':
http://
lshw gets stuck on a different, random disk each time. The root cause is that it tries to reach a partition of a multipath device, which may be inaccessible at that moment:
###
lsblk |grep -A 3 -B 3 3600144f0534f39
sdaw 67:0 0 15G 0 disk
`-3600144f0534f
|-3600144f053
|-3600144f053
`-3600144f053
sdbm 68:0 0 15G 0 disk
`-3600144f0534f
--
sddi 71:0 0 15G 0 disk
`-3600144f0534f
|-3600144f053
|-3600144f053
`-3600144f053
###
dd if=/dev/zero of=/dev/sdaw2
dd: writing to '/dev/sdaw2': No such process
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.052422 s, 0.0 kB/s
The second path to the same device, however, works fine:
###
The device also works when accessed by its mapped name:
dd if=/dev/zero of=/dev/
409601+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 11.9792 s, 17.5 MB/s
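The per-path check above can be repeated across many disks without writing to them. A minimal sketch, assuming a POSIX-like shell (bash or dash) with GNU coreutils `timeout` and `dd`: `probe_dev` is a hypothetical helper, not part of nailgun-agent or lshw; it reads a single 512-byte block instead of writing zeros, and the 5-second default limit is an arbitrary assumption. It also assumes that a one-sector read is enough to expose a dead path, which this report does not confirm.

```shell
# Hypothetical helper (not part of nailgun-agent or lshw): read one
# 512-byte block from a device node under a time limit, so a single dead
# path cannot hang the caller. Read-only, unlike the dd write test above.
probe_dev() {
  local dev="$1"
  local limit="${2:-5}"   # seconds; an arbitrary assumed default
  local rc=0
  # GNU timeout exits with 124 when the limit expires.
  timeout "$limit" dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo "$dev: ok" ;;
    124) echo "$dev: timed out after ${limit}s" ;;
    *)   echo "$dev: read failed (exit $rc)" ;;
  esac
  return "$rc"
}
```

For example, `for d in /dev/sd*; do probe_dev "$d"; done` reports which SCSI paths fail or stall, while the per-device limit keeps one bad path from blocking the whole loop.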
Workaround: run lshw with the '-disable scsi' option.
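The workaround can also be combined with a time limit, so that inventory collection fails instead of hanging indefinitely. A minimal sketch, assuming GNU coreutils `timeout` (exit status 124 when the limit expires, 127 when the command is not found); `run_lshw_inventory`, the `LSHW_BIN` override, and the 120-second default are hypothetical and not part of nailgun-agent. Only the `-disable scsi` option comes from this report; whatever output-format options the agent actually passes to lshw are left out.

```shell
# Hypothetical wrapper (not nailgun-agent code) around the reported
# workaround: skip lshw's SCSI probing and bound the total run time.
# LSHW_BIN is a hypothetical override for the lshw binary path.
run_lshw_inventory() {
  local limit="${1:-120}"   # seconds; an assumed default, not from the report
  local bin="${LSHW_BIN:-lshw}"
  # '-disable scsi' is the workaround from this bug report.
  # timeout exits with 124 on expiry and 127 if "$bin" cannot be found.
  timeout "$limit" "$bin" -disable scsi
}
```

The timeout is a second line of defence: '-disable scsi' removes only the probe implicated here, so a hang elsewhere in lshw would still surface as exit status 124 rather than a stuck agent.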
[0]https:/
[1]https:/
Changed in fuel:
assignee: nobody → Aleksey Zvyagintsev (azvyagintsev)
assignee: Aleksey Zvyagintsev (azvyagintsev) → Fuel Python Team (fuel-python)
tags: added: team-mixed
Changed in fuel:
status: New → Confirmed
Changed in fuel:
status: Fix Committed → Fix Released
> Work-around - trigger lshw with '-disable scsi' key.
What if there is a huge number of NVMe disks? IIRC, they don't use the SCSI protocol.