==== State: Open by: nguyenp on 31 May 2017 15:46:14 ====

Product Name : OpenPOWER Firmware
Product Version : open-power-SMC-P8DTU-V2.00.GA2-20170126-prod
Product Extra : op-build-3782262
Product Extra : hostboot-7fdfb37
Product Extra : occ-e6e194f
Product Extra : skiboot-5.4.2
Product Extra : linux-4.4.24-openpower1-9641b3a
Product Extra : petitboot-v1.4.0-2f8598b
Product Extra : p8dtu-xml-9a8fee2

Cable configuration:
====================
On this P8-Briggs system, I have 2 Seagate storage enclosures running at max configuration. There are 84 HDDs in each enclosure, so 168 HDDs in total across both. I connected 2 LSI 9300-8e SAS adapters to the 2 Seagate enclosures with alternate cabling for redundancy. See the figure of the connection below.

Note: Each Seagate enclosure has 2 I/O modules in the back. Both I/O modules of each enclosure see the same set of HDDs.

Cable connection:
SAS adapter #1: port1 -----> Seagate #1-A I/O module
                port0 --------------------------------------> Seagate #2-B I/O module
SAS adapter #2: port1 ----> Seagate #2-A I/O module
                port0 --------------------------------------> Seagate #1-B I/O module

Ubuntu 16.04.2:
===============
- Running with new kernel Ubuntu 4.8.0-520-generic #550~16.04.1+bz154734 from Mauricio Faria De Oliveira.

Problem Description:
====================
On this Briggs system, I'm running the new Ubuntu 4.8.0-520-generic #550~16.04.1+bz154734 kernel, which has the fix for the multipath problem. Mauricio helped patch the system with this kernel last week to fix the multipath "io_setup failed" problem in LTCBug154734.

This week, I scaled up my test configuration to the max configuration, 2x5U84_Enclosures,_MaxCfg_168HDDs. This time it hit a different issue: some multipath devices have only a single path and no redundancy, while others have multiple paths and redundancy.

== Comment: #13 - Paul Nguyen - 2017-06-01 15:19:58 ==
- I agree with Mauricio that this is a timing problem.
- I re-ran the test and noticed that it took more than 50 minutes after system reboot to discover all disks and build the multipaths correctly.
- Taking this long is going to be a problem.
- I have gathered all logs and am attaching them to the bug for Mauricio to look at and confirm.
- If there is a workaround or fix for faster probe time, I will try it out.
- Below is more information I captured:

Checkpoint #1:
==============
- System reboot around 2pm (14:00).

Checkpoint #2:
==============
- It took several minutes for the first disk to be detected.

root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | head
[Thu Jun 1 14:06:48 2017] sd 17:0:1:0: [sdb] Attached SCSI disk
[Thu Jun 1 14:06:51 2017] sd 17:0:2:0: [sdc] Attached SCSI disk
[Thu Jun 1 14:06:53 2017] sd 17:0:3:0: [sdd] Attached SCSI disk
[Thu Jun 1 14:06:57 2017] sd 17:0:4:0: [sde] Attached SCSI disk
[Thu Jun 1 14:07:00 2017] sd 17:0:5:0: [sdf] Attached SCSI disk
[Thu Jun 1 14:07:03 2017] sd 17:0:6:0: [sdg] Attached SCSI disk
[Thu Jun 1 14:07:05 2017] sd 17:0:7:0: [sdh] Attached SCSI disk
[Thu Jun 1 14:07:08 2017] sd 17:0:8:0: [sdi] Attached SCSI disk
[Thu Jun 1 14:07:11 2017] sd 17:0:9:0: [sdj] Attached SCSI disk
[Thu Jun 1 14:07:14 2017] sd 17:0:10:0: [sdk] Attached SCSI disk
root@smb1p1:~#
...
root@smb1p1:~# multipath -ll|grep dm |wc -l
103
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:18:30 2017] sd 17:0:100:0: [sdcr] Attached SCSI disk
[Thu Jun 1 14:18:35 2017] sd 17:0:101:0: [sdcs] Attached SCSI disk
[Thu Jun 1 14:18:40 2017] sd 17:0:102:0: [sdct] Attached SCSI disk
[Thu Jun 1 14:18:44 2017] sd 17:0:103:0: [sdcu] Attached SCSI disk
[Thu Jun 1 14:18:54 2017] sd 17:0:105:0: [sdcv] Attached SCSI disk
[Thu Jun 1 14:18:59 2017] sd 17:0:106:0: [sdcw] Attached SCSI disk
[Thu Jun 1 14:19:04 2017] sd 17:0:107:0: [sdcx] Attached SCSI disk
[Thu Jun 1 14:19:09 2017] sd 17:0:108:0: [sdcy] Attached SCSI disk
[Thu Jun 1 14:19:14 2017] sd 17:0:109:0: [sdcz] Attached SCSI disk
[Thu Jun 1 14:19:19 2017] sd 17:0:110:0: [sdda] Attached SCSI disk
root@smb1p1:~#
...
root@smb1p1:~# multipath -ll|grep dm |wc -l
126
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:20:23 2017] sd 17:0:123:0: [sddn] Attached SCSI disk
[Thu Jun 1 14:20:28 2017] sd 17:0:124:0: [sddo] Attached SCSI disk
[Thu Jun 1 14:20:33 2017] sd 17:0:125:0: [sddp] Attached SCSI disk
[Thu Jun 1 14:20:38 2017] sd 17:0:126:0: [sddq] Attached SCSI disk
[Thu Jun 1 14:20:44 2017] sd 17:0:127:0: [sddr] Attached SCSI disk
[Thu Jun 1 14:20:48 2017] sd 17:0:128:0: [sdds] Attached SCSI disk
[Thu Jun 1 14:20:54 2017] sd 17:0:129:0: [sddt] Attached SCSI disk
[Thu Jun 1 14:20:59 2017] sd 17:0:130:0: [sddu] Attached SCSI disk
[Thu Jun 1 14:21:04 2017] sd 17:0:131:0: [sddv] Attached SCSI disk
[Thu Jun 1 14:21:09 2017] sd 17:0:132:0: [sddw] Attached SCSI disk
root@smb1p1:~#
...
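The map counts above say how many multipath devices exist, but not how many paths each one has. A minimal sketch for listing the maps that are missing redundancy is shown here. It assumes the default `multipath -ll` layout seen in this report (a header line containing `dm-N`, then path lines with an H:C:T:L address followed by an `sdX` name). The function name `single_path_maps` and the `MIN_PATHS` variable are made up for illustration and are not part of multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper: print "<map name> <path count>" for every multipath
# map with fewer than MIN_PATHS paths (default 2).
# Reads `multipath -ll` output on stdin.
single_path_maps() {
    awk -v min="${MIN_PATHS:-2}" '
        # Map header line, e.g. "35000c50086a3ca97 dm-161 IBM-ESXS,..."
        / dm-[0-9]+ / {
            if (name != "" && paths < min) print name, paths
            name = $1
            paths = 0
            next
        }
        # Path line: H:C:T:L address followed by an sdX device name
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+/ { paths++ }
        # Flush the last map in the input
        END { if (name != "" && paths < min) print name, paths }
    '
}
```

Usage would be along the lines of `multipath -ll | single_path_maps`; an empty result would mean every map has at least two paths.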
root@smb1p1:~# multipath -ll|grep dm |wc -l
142
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:21:54 2017] sd 17:0:141:0: [sdee] Attached SCSI disk
[Thu Jun 1 14:21:58 2017] sd 17:0:142:0: [sdef] Attached SCSI disk
[Thu Jun 1 14:22:04 2017] sd 17:0:143:0: [sdeg] Attached SCSI disk
[Thu Jun 1 14:22:08 2017] sd 17:0:144:0: [sdeh] Attached SCSI disk
[Thu Jun 1 14:22:14 2017] sd 17:0:145:0: [sdei] Attached SCSI disk
[Thu Jun 1 14:22:18 2017] sd 17:0:146:0: [sdej] Attached SCSI disk
[Thu Jun 1 14:22:24 2017] sd 17:0:147:0: [sdek] Attached SCSI disk
[Thu Jun 1 14:22:29 2017] sd 17:0:148:0: [sdel] Attached SCSI disk
[Thu Jun 1 14:22:34 2017] sd 17:0:149:0: [sdem] Attached SCSI disk
[Thu Jun 1 14:22:39 2017] sd 17:0:150:0: [sden] Attached SCSI disk
root@smb1p1:~#
...
root@smb1p1:~# multipath -ll|grep dm |wc -l
163
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:23:48 2017] sd 17:0:164:0: [sdfa] Attached SCSI disk
[Thu Jun 1 14:23:53 2017] sd 17:0:165:0: [sdfb] Attached SCSI disk
[Thu Jun 1 14:23:58 2017] sd 17:0:166:0: [sdfc] Attached SCSI disk
[Thu Jun 1 14:24:03 2017] sd 17:0:167:0: [sdfd] Attached SCSI disk
[Thu Jun 1 14:24:08 2017] sd 17:0:168:0: [sdfe] Attached SCSI disk
[Thu Jun 1 14:24:13 2017] sd 17:0:169:0: [sdff] Attached SCSI disk
[Thu Jun 1 14:24:19 2017] sd 17:0:170:0: [sdfg] Attached SCSI disk
[Thu Jun 1 14:24:23 2017] sd 17:0:171:0: [sdfh] Attached SCSI disk
[Thu Jun 1 14:24:28 2017] sd 17:0:172:0: [sdfi] Attached SCSI disk
[Thu Jun 1 14:24:33 2017] sd 17:0:173:0: [sdfj] Attached SCSI disk
...
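The ten events in the capture just above span 45 seconds (14:23:48 to 14:24:33), about 5 seconds per disk. To put a number on the probe rate for a whole log instead of eyeballing it, something like the sketch below could be used. It assumes `dmesg -T` style timestamps as shown in this report, with the time of day in the fourth field, and that all events fall on the same calendar day. The function name `attach_rate` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize the spacing between "Attached SCSI disk"
# events in `dmesg -T` output read from stdin.
# Assumption: every event is on the same day, so only HH:MM:SS is compared.
attach_rate() {
    awk '
        /Attached SCSI disk/ {
            # $4 is the HH:MM:SS part of "[Thu Jun 1 14:21:54 2017]"
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (n == 0) first = now
            last = now
            n++
        }
        END {
            if (n < 2) { print "need at least two events"; exit 1 }
            # n events give n - 1 intervals between them
            printf "%d disks in %d s, %.1f s/disk\n", n, last - first, (last - first) / (n - 1)
        }
    '
}
```

Usage would be along the lines of `dmesg -T | grep 'sd 17:' | attach_rate`, run once per SCSI host so that each adapter's rate is measured separately.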
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:24:03 2017] sd 17:0:167:0: [sdfd] Attached SCSI disk
[Thu Jun 1 14:24:08 2017] sd 17:0:168:0: [sdfe] Attached SCSI disk
[Thu Jun 1 14:24:13 2017] sd 17:0:169:0: [sdff] Attached SCSI disk
[Thu Jun 1 14:24:19 2017] sd 17:0:170:0: [sdfg] Attached SCSI disk
[Thu Jun 1 14:24:23 2017] sd 17:0:171:0: [sdfh] Attached SCSI disk
[Thu Jun 1 14:24:28 2017] sd 17:0:172:0: [sdfi] Attached SCSI disk
[Thu Jun 1 14:24:33 2017] sd 17:0:173:0: [sdfj] Attached SCSI disk
[Thu Jun 1 14:24:38 2017] sd 17:0:174:0: [sdfk] Attached SCSI disk
[Thu Jun 1 14:24:43 2017] sd 17:0:175:0: [sdfl] Attached SCSI disk
[Thu Jun 1 14:24:48 2017] sd 17:0:176:0: [sdfm] Attached SCSI disk
root@smb1p1:~#
root@smb1p1:~# date
Thu Jun 1 14:27:03 CDT 2017
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
168
root@smb1p1:~#

Checkpoint #3:
==============
- After 34 minutes, the multipath -ll command shows devices with a single path and no redundancy.

root@smb1p1:~# multipath -ll > multipath.log.06012017.afterReboot
root@smb1p1:~# cat multipath.log.06012017.afterReboot |more
35000c50086a3ca97 dm-161 IBM-ESXS,ST10000NM0226 E
size=9.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 17:0:170:0 sdfg 130:32 active ready running
35000c50086bae8bf dm-144 IBM-ESXS,ST10000NM0226 E
size=9.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 17:0:152:0 sdep 129:16 active ready running
35000c50086baa42f dm-143 IBM-ESXS,ST10000NM0226 E
size=9.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 17:0:151:0 sdeo 129:0 active ready running
...

Checkpoint #4:
==============
- After 43 minutes, the multipath -ll command shows some devices with only a single path and no redundancy, and some devices with multiple paths and redundancy.
root@smb1p1:~# date
Thu Jun 1 14:43:00 CDT 2017
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
252
root@smb1p1:~#

Checkpoint #5:
==============
- After 47 minutes, the multipath -ll command still shows some devices with only a single path and no redundancy.

root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | head
[Thu Jun 1 14:06:48 2017] sd 17:0:1:0: [sdb] Attached SCSI disk
[Thu Jun 1 14:06:51 2017] sd 17:0:2:0: [sdc] Attached SCSI disk
[Thu Jun 1 14:06:53 2017] sd 17:0:3:0: [sdd] Attached SCSI disk
[Thu Jun 1 14:06:57 2017] sd 17:0:4:0: [sde] Attached SCSI disk
[Thu Jun 1 14:07:00 2017] sd 17:0:5:0: [sdf] Attached SCSI disk
[Thu Jun 1 14:07:03 2017] sd 17:0:6:0: [sdg] Attached SCSI disk
[Thu Jun 1 14:07:05 2017] sd 17:0:7:0: [sdh] Attached SCSI disk
[Thu Jun 1 14:07:08 2017] sd 17:0:8:0: [sdi] Attached SCSI disk
[Thu Jun 1 14:07:11 2017] sd 17:0:9:0: [sdj] Attached SCSI disk
[Thu Jun 1 14:07:14 2017] sd 17:0:10:0: [sdk] Attached SCSI disk
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:46:15 2017] sd 18:0:112:0: [sdjo] Attached SCSI disk
[Thu Jun 1 14:46:20 2017] sd 18:0:113:0: [sdjp] Attached SCSI disk
[Thu Jun 1 14:46:25 2017] sd 18:0:114:0: [sdjq] Attached SCSI disk
[Thu Jun 1 14:46:31 2017] sd 18:0:115:0: [sdjr] Attached SCSI disk
[Thu Jun 1 14:46:36 2017] sd 18:0:116:0: [sdjs] Attached SCSI disk
[Thu Jun 1 14:46:41 2017] sd 18:0:117:0: [sdjt] Attached SCSI disk
[Thu Jun 1 14:46:46 2017] sd 18:0:118:0: [sdju] Attached SCSI disk
[Thu Jun 1 14:46:51 2017] sd 18:0:119:0: [sdjv] Attached SCSI disk
[Thu Jun 1 14:46:56 2017] sd 18:0:120:0: [sdjw] Attached SCSI disk
[Thu Jun 1 14:47:01 2017] sd 18:0:121:0: [sdjx] Attached SCSI disk
root@smb1p1:~#
root@smb1p1:~#
root@smb1p1:~# date
Thu Jun 1 14:47:20 CDT 2017
root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
288
root@smb1p1:~#

Checkpoint #6:
==============
- 51 minutes after system reboot, it looks like all disks are discovered and the multipath is correctly built.

root@smb1p1:~# multipath -ll | grep -c 'sd[a-z]\+'
336
root@smb1p1:~# date
Thu Jun 1 14:52:05 CDT 2017
root@smb1p1:~# dmesg -T | grep 'sd 1[78]:' | grep 'Attached SCSI disk' | tail
[Thu Jun 1 14:50:47 2017] sd 18:0:167:0: [sdlp] Attached SCSI disk
[Thu Jun 1 14:50:52 2017] sd 18:0:168:0: [sdlq] Attached SCSI disk
[Thu Jun 1 14:50:57 2017] sd 18:0:169:0: [sdlr] Attached SCSI disk
[Thu Jun 1 14:51:02 2017] sd 18:0:170:0: [sdls] Attached SCSI disk
[Thu Jun 1 14:51:07 2017] sd 18:0:171:0: [sdlt] Attached SCSI disk
[Thu Jun 1 14:51:13 2017] sd 18:0:172:0: [sdlu] Attached SCSI disk
[Thu Jun 1 14:51:17 2017] sd 18:0:173:0: [sdlv] Attached SCSI disk
[Thu Jun 1 14:51:22 2017] sd 18:0:174:0: [sdlw] Attached SCSI disk
[Thu Jun 1 14:51:27 2017] sd 18:0:175:0: [sdlx] Attached SCSI disk
[Thu Jun 1 14:51:33 2017] sd 18:0:176:0: [sdly] Attached SCSI disk
root@smb1p1:~#

== Comment: #24 - Mauricio Faria De Oliveira - 2017-06-06 11:42:59 ==
Hi Paul,

Per your logs, yes, it's the slowness with the SES driver. I'll ask Canonical to pick it up for 16.10 and 17.04 so it makes it into 16.04.2 and 16.04.3.

Thanks,
Mauricio

== Comment: #26 - Mauricio Faria De Oliveira