Activity log for bug #1559167

Date Who What changed Old value New value Message
2016-03-18 14:48:50 Aleksey Zvyagintsev bug added bug
2016-03-18 14:49:05 Aleksey Zvyagintsev fuel: assignee Aleksey Zvyagintsev (azvyagintsev)
2016-03-18 14:49:18 Aleksey Zvyagintsev fuel: assignee Aleksey Zvyagintsev (azvyagintsev) Fuel Python Team (fuel-python)
2016-03-18 14:49:27 Aleksey Zvyagintsev tags area-linux area-python area-linux area-python team-mixed
2016-03-18 15:03:21 Maciej Relewicz fuel: status New Confirmed
2016-03-18 15:06:37 Alexander Gordeev description

Old value:

Nailgun-agent stucks on system with huge amount of disks: (phys disks - 64)

$ lsblk |wc -l
716

After investigation, we found that root-cause of stuck - lshw call [0][1]
Strace log for 'strace shw': http://paste.openstack.org/show/491111/

lLshw stuck on random disk each time- root-cause , that he tries to reach part of device from multi-path , which can be unaccessible in this time:

###
lsblk |grep -A 3 -B 3 3600144f0534f392c000056e972890031-part2
sdaw 67:0 0 15G 0 disk
`-3600144f0534f392c000056e972890031 (dm-42) 252:42 0 15G 0 mpath
  |-3600144f0534f392c000056e972890031-part1 (dm-142) 252:142 0 24M 0 part
  |-3600144f0534f392c000056e972890031-part2 (dm-143) 252:143 0 200M 0 part
  `-3600144f0534f392c000056e972890031-part3 (dm-144) 252:144 0 14.4G 0 part
sdbm 68:0 0 15G 0 disk
`-3600144f0534f392c000056e9728b0041 (dm-60) 252:60 0 15G 0 mpath
--
sddi 71:0 0 15G 0 disk
`-3600144f0534f392c000056e972890031 (dm-42) 252:42 0 15G 0 mpath
  |-3600144f0534f392c000056e972890031-part1 (dm-142) 252:142 0 24M 0 part
  |-3600144f0534f392c000056e972890031-part2 (dm-143) 252:143 0 200M 0 part
  `-3600144f0534f392c000056e972890031-part3 (dm-144) 252:144 0 14.4G 0 part
###
dd if=/dev/zero of=/dev/sdaw2
dd: writing to '/dev/sdaw2': No such process
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.052422 s, 0.0 kB/s

Otherwise, second path of device are fine:
###
Also, device are fine using mapped name :
dd if=/dev/zero of=/dev/mapper/3600144f0534f392c000056e972890031-part2
409601+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 11.9792 s, 17.5 MB/s

Work-around - trigger lshw with '-disable scsi' key.

[0]https://github.com/openstack/fuel-nailgun-agent/blob/master/agent#L334
[1]https://github.com/openstack/fuel-nailgun-agent/blob/master/agent#L904-L920

New value:

nailgun-agent gets stuck on system with huge amount of disks: (phys disks - 64)

$ lsblk |wc -l
716

After investigation, we found that root-cause of why it gets stuck - lshw call [0][1]
Strace log for 'strace lshw': http://paste.openstack.org/show/491111/

lshw gets stuck on random disk each time. The root-cause is that it tries to reach a partition of multipath device, which can be inaccessible at this moment of time:

###
lsblk |grep -A 3 -B 3 3600144f0534f392c000056e972890031-part2
sdaw 67:0 0 15G 0 disk
`-3600144f0534f392c000056e972890031 (dm-42) 252:42 0 15G 0 mpath
  |-3600144f0534f392c000056e972890031-part1 (dm-142) 252:142 0 24M 0 part
  |-3600144f0534f392c000056e972890031-part2 (dm-143) 252:143 0 200M 0 part
  `-3600144f0534f392c000056e972890031-part3 (dm-144) 252:144 0 14.4G 0 part
sdbm 68:0 0 15G 0 disk
`-3600144f0534f392c000056e9728b0041 (dm-60) 252:60 0 15G 0 mpath
--
sddi 71:0 0 15G 0 disk
`-3600144f0534f392c000056e972890031 (dm-42) 252:42 0 15G 0 mpath
  |-3600144f0534f392c000056e972890031-part1 (dm-142) 252:142 0 24M 0 part
  |-3600144f0534f392c000056e972890031-part2 (dm-143) 252:143 0 200M 0 part
  `-3600144f0534f392c000056e972890031-part3 (dm-144) 252:144 0 14.4G 0 part
###
dd if=/dev/zero of=/dev/sdaw2
dd: writing to '/dev/sdaw2': No such process
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.052422 s, 0.0 kB/s

Otherwise, the second path of device is fine:
###
Also, device is fine using mapped name :
dd if=/dev/zero of=/dev/mapper/3600144f0534f392c000056e972890031-part2
409601+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 11.9792 s, 17.5 MB/s

Work-around - trigger lshw with '-disable scsi' key.

[0]https://github.com/openstack/fuel-nailgun-agent/blob/master/agent#L334
[1]https://github.com/openstack/fuel-nailgun-agent/blob/master/agent#L904-L920

2016-03-21 11:12:30 OpenStack Infra fuel: status Confirmed In Progress
2016-03-21 11:12:30 OpenStack Infra fuel: assignee Fuel Python Team (fuel-python) Krzysztof Szukiełojć (kszukielojc)
2016-03-22 15:16:09 OpenStack Infra fuel: status In Progress Fix Committed
2016-06-14 14:24:39 Aleksey Zvyagintsev fuel: status Fix Committed Fix Released
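
The work-around named in the description (running lshw with '-disable scsi') could be sketched roughly as below. This is an illustrative Python sketch only, not the agent's code and not the fix that was committed for this bug, which the log does not describe. The helper names build_lshw_command and run_lshw, the 60-second timeout, and the choice of XML output are all assumptions made for the example; only the lshw '-disable' and '-xml' options come from lshw itself.

```python
import subprocess


def build_lshw_command(disabled_tests=("scsi",), xml=True):
    """Build an lshw argument list that skips the given probes.

    Hypothetical helper: 'scsi' is the probe named in the bug's work-around.
    """
    cmd = ["lshw"]
    if xml:
        cmd.append("-xml")
    for test in disabled_tests:
        cmd += ["-disable", test]
    return cmd


def run_lshw(timeout_seconds=60, disabled_tests=("scsi",), runner=subprocess.run):
    """Run lshw with the SCSI probe disabled and a wall-clock timeout.

    Returns lshw's stdout, or None if the call timed out, the binary is
    missing, or lshw exited with a non-zero status. The timeout value is
    an assumption; the bug report does not suggest one.
    """
    cmd = build_lshw_command(disabled_tests)
    try:
        result = runner(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

Disabling the probe avoids touching the inaccessible multipath member in the first place; the timeout is only a second line of defence, since a process blocked in uninterruptible I/O may not respond to being killed promptly.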