Ubuntu
linux package

Activity log for bug #1469829

Date	Who	What changed	Old value	New value	Message
2015-06-29 18:10:05	bugproxy	bug			added bug
2015-06-29 18:10:08	bugproxy	tags		architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410
2015-06-29 18:10:09	bugproxy	attachment added		dmesg https://bugs.launchpad.net/bugs/1469829/+attachment/4422022/+files/125862-dmesg
2015-06-29 18:20:44	Luciano Chavez	affects	ubuntu	linux (Ubuntu)
2015-06-29 19:16:36	Chris J Arges	linux (Ubuntu): assignee		Chris J Arges (arges)
2015-06-29 19:16:38	Chris J Arges	linux (Ubuntu): importance	Undecided	Medium
2015-06-29 19:20:19	Chris J Arges	nominated for series		Ubuntu Vivid
2015-06-29 19:20:19	Chris J Arges	bug task added		linux (Ubuntu Vivid)
2015-06-29 19:20:19	Chris J Arges	nominated for series		Ubuntu Trusty
2015-06-29 19:20:19	Chris J Arges	bug task added		linux (Ubuntu Trusty)
2015-06-29 19:20:19	Chris J Arges	nominated for series		Ubuntu Utopic
2015-06-29 19:20:19	Chris J Arges	bug task added		linux (Ubuntu Utopic)
2015-06-29 19:20:27	Chris J Arges	linux (Ubuntu Trusty): importance	Undecided	Medium
2015-06-29 19:20:28	Chris J Arges	linux (Ubuntu Utopic): importance	Undecided	Medium
2015-06-29 19:20:30	Chris J Arges	linux (Ubuntu Vivid): importance	Undecided	Medium
2015-06-29 19:20:32	Chris J Arges	linux (Ubuntu): status	New	In Progress
2015-06-29 19:20:34	Chris J Arges	linux (Ubuntu Trusty): status	New	In Progress
2015-06-29 19:20:37	Chris J Arges	linux (Ubuntu Vivid): status	New	In Progress
2015-06-29 19:20:39	Chris J Arges	linux (Ubuntu Utopic): status	New	In Progress
2015-06-29 19:32:32	Chris J Arges	summary	Firestone system I/O hang	ppc64el should use 'deadline' as default io scheduler
2015-07-14 11:32:48	Anton Blanchard	bug			added subscriber Anton Blanchard
2015-07-23 04:07:10	Launchpad Janitor	linux (Ubuntu): status	In Progress	Fix Released
2015-07-23 04:07:10	Launchpad Janitor	cve linked		2015-1328
2015-08-26 14:24:41	Chris J Arges	bug task added		linux-lts-utopic (Ubuntu)
2015-08-26 14:24:43	Chris J Arges	linux (Ubuntu Utopic): status	In Progress	Invalid
2015-08-26 14:25:39	Chris J Arges	description	-- Problem Description -- Firestone system given to DASD group failed HTX overnight test with miscompare error. HTX mdt.hdbuster was running on secondary drive and failed about 12 hours into test HTX miscompare analysis: ====================-== Device under test: /dev/sdb Stanza running: rule_3 miscompare offset: 0x40 Transfer size: Random Size LBA number: 0x70fc miscompare length: all the blocks in the transfer size - STANZA 3: Creates number of threads twice the queue depth. Each thread - - doing 20000 num_oper with RC operation with xfer size between 1 block - - to 256K. - This miscompare shows read operation is unable to get the expected data from the disk. The re-read buffer also shows the same data as the first read operation. Since the first read and next re-read shows same data, there could be a write operation (of previous rule stanza to initialize disk with pattern 007 ) failure on the disk. The same miscompare behavior shows for all the blocks in the transfer size. /dev/sdb Jun 2 02:29:43 2015 err=000003b6 sev=2 hxestorage <<=== device name (/dev/sdb) rule_3_13 numopers= 20000 loop= 767 blk=0x70fc len=89088 min_blkno=0 max_blkno=0x74706daf, RANDOM access Seed Values= 37303, 290, 23235 Data Pattern Seed Values = 37303, 291, 23235 BWRC LBA fencepost Detail: th_num min_lba max_lba status 0 0 1c9be3ff R 1 1d1c1b6c 3a3836d7 F 2 3a3836d8 57545243 F 3 57545244 74706daf F Miscompare at buffer offset 64 (0x40) <<=== miscompare offset (0x40) (Flags: badsig=0; cksum=0x60000) Maximum LBA = 0x74706daf wbuf (baseaddr 0x3ffe1c0e6600) b0ffffffffffffffffffffffffffffffffffffff rbuf (baseaddr 0x3ffe1c0fc400) 850100fc700200fd700300fe700400ff70050000 Write buffer saved in /tmp/htxsdb.wbuf1 Read buffer saved in /tmp/htxsdb.rbuf1 Re-read fails compare at offset64; buffer saved in /tmp/htxsdb.rerd1 errno: 950(Unknown error 950) Asghar reproduced that HTX hang he is seeing. 
Looking in the kernel logs I see some messages from the kernel that there are user threads blocked on getting reads serviced. So likely HTX is seeing the same thing. I've asked Asghar to try using the deadline I/O scheduler rather than CFQ to see if that makes any difference. If that does not make any difference, the next thing to try is reducing the queue depth of the device. Right now its 31, which I think is pretty high. Step 1: echo deadline > /sys/block/sda/queue/scheduler echo deadline > /sys/block/sdb/queue/scheduler If that reproduces the issue, go to step 2: echo cfq > /sys/block/sda/queue/scheduler echo cfq > /sys/block/sdb/queue/scheduler echo 8 > /sys/block/sda/device/queue_depth echo 8 > /sys/block/sdb/device/queue_depth Breno - it looks like the default I/O scheduler + default queue depth for the SATA disks in Firestone is not optimal, in that when running a heavy I/O workload, we see read starvation occurring, which is making the system nearly unusable. Once we changed the I/O scheduler from cfq to deadline, all the issues went away and the system is able to run the same workload yet still be responsive. Suggest we either encourage Canonical to change the default I/O scheduler to deadline or at the very least provide documentation to encourage our customers to make this change themselves.	[Impact] Using cfq instead of deadline as the default io scheduler starves certain workloads and causes performance issues. In addition every other arch we build uses deadline as the default scheduler. [Fix] Change the configuration to the following for ppc64el: CONFIG_DEFAULT_DEADLINE=y CONFIG_DEFAULT_IOSCHED="deadline" [Test Case] Boot and cat /sys/block//queue/scheduler to see if deadline is being used. -- -- Problem Description -- Firestone system given to DASD group failed HTX overnight test with miscompare error. 
HTX mdt.hdbuster was running on secondary drive and failed about 12 hours into test HTX miscompare analysis: ====================-== Device under test: /dev/sdb Stanza running: rule_3 miscompare offset: 0x40 Transfer size: Random Size LBA number: 0x70fc miscompare length: all the blocks in the transfer size - STANZA 3: Creates number of threads twice the queue depth. Each thread -* - doing 20000 num_oper with RC operation with xfer size between 1 block - - to 256K. - This miscompare shows read operation is unable to get the expected data from the disk. The re-read buffer also shows the same data as the first read operation. Since the first read and next re-read shows same data, there could be a write operation (of previous rule stanza to initialize disk with pattern 007 ) failure on the disk. The same miscompare behavior shows for all the blocks in the transfer size. /dev/sdb Jun 2 02:29:43 2015 err=000003b6 sev=2 hxestorage <<=== device name (/dev/sdb) rule_3_13 numopers= 20000 loop= 767 blk=0x70fc len=89088 min_blkno=0 max_blkno=0x74706daf, RANDOM access Seed Values= 37303, 290, 23235 Data Pattern Seed Values = 37303, 291, 23235 BWRC LBA fencepost Detail: th_num min_lba max_lba status 0 0 1c9be3ff R 1 1d1c1b6c 3a3836d7 F 2 3a3836d8 57545243 F 3 57545244 74706daf F Miscompare at buffer offset 64 (0x40) <<=== miscompare offset (0x40) (Flags: badsig=0; cksum=0x60000) Maximum LBA = 0x74706daf wbuf (baseaddr 0x3ffe1c0e6600) b0ffffffffffffffffffffffffffffffffffffff rbuf (baseaddr 0x3ffe1c0fc400) 850100fc700200fd700300fe700400ff70050000 Write buffer saved in /tmp/htxsdb.wbuf1 Read buffer saved in /tmp/htxsdb.rbuf1 Re-read fails compare at offset64; buffer saved in /tmp/htxsdb.rerd1 errno: 950(Unknown error 950) Asghar reproduced that HTX hang he is seeing. Looking in the kernel logs I see some messages from the kernel that there are user threads blocked on getting reads serviced. So likely HTX is seeing the same thing. 
I've asked Asghar to try using the deadline I/O scheduler rather than CFQ to see if that makes any difference. If that does not make any difference, the next thing to try is reducing the queue depth of the device. Right now its 31, which I think is pretty high. Step 1: echo deadline > /sys/block/sda/queue/scheduler echo deadline > /sys/block/sdb/queue/scheduler If that reproduces the issue, go to step 2: echo cfq > /sys/block/sda/queue/scheduler echo cfq > /sys/block/sdb/queue/scheduler echo 8 > /sys/block/sda/device/queue_depth echo 8 > /sys/block/sdb/device/queue_depth Breno - it looks like the default I/O scheduler + default queue depth for the SATA disks in Firestone is not optimal, in that when running a heavy I/O workload, we see read starvation occurring, which is making the system nearly unusable. Once we changed the I/O scheduler from cfq to deadline, all the issues went away and the system is able to run the same workload yet still be responsive. Suggest we either encourage Canonical to change the default I/O scheduler to deadline or at the very least provide documentation to encourage our customers to make this change themselves.
2015-08-26 14:25:46	Chris J Arges	linux-lts-utopic (Ubuntu): status	New	Invalid
2015-08-26 14:25:52	Chris J Arges	linux-lts-utopic (Ubuntu Utopic): status	New	Invalid
2015-08-26 14:25:53	Chris J Arges	linux-lts-utopic (Ubuntu Vivid): status	New	Invalid
2015-08-26 14:26:01	Chris J Arges	linux (Ubuntu Trusty): assignee		Chris J Arges (arges)
2015-08-26 14:26:03	Chris J Arges	linux (Ubuntu Vivid): assignee		Chris J Arges (arges)
2015-08-26 14:26:05	Chris J Arges	linux-lts-utopic (Ubuntu Trusty): assignee		Chris J Arges (arges)
2015-08-26 14:26:08	Chris J Arges	linux-lts-utopic (Ubuntu Trusty): status	New	In Progress
2015-08-27 14:42:33	Brad Figg	bug task deleted	linux-lts-utopic (Ubuntu Vivid)
2015-08-27 14:42:40	Brad Figg	bug task deleted	linux-lts-utopic (Ubuntu Utopic)
2015-08-27 14:42:56	Brad Figg	linux-lts-utopic (Ubuntu Trusty): status	In Progress	Fix Committed
2015-08-27 14:43:00	Brad Figg	linux (Ubuntu Vivid): status	In Progress	Fix Committed
2015-08-27 14:43:04	Brad Figg	linux (Ubuntu Trusty): status	In Progress	Fix Committed
2015-09-11 18:19:47	Launchpad Janitor	branch linked		lp:ubuntu/trusty-proposed/linux-lts-vivid
2015-09-13 22:37:46	Brad Figg	tags	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410 verification-needed-trusty
2015-09-13 22:38:02	Brad Figg	tags	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410 verification-needed-trusty	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410 verification-needed-trusty verification-needed-vivid
2015-09-21 19:50:16	Mathew Hodson	linux (Ubuntu Utopic): status	Invalid	Won't Fix
2015-09-21 19:50:21	Mathew Hodson	linux-lts-utopic (Ubuntu Trusty): importance	Undecided	Medium
2015-09-21 19:50:26	Mathew Hodson	bug task deleted	linux-lts-utopic (Ubuntu)
2015-09-24 13:41:23	bugproxy	tags	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410 verification-needed-trusty verification-needed-vivid	architecture-ppc64le bugnameltc-125862 severity-critical targetmilestone-inin1410 verification-done-trusty verification-done-vivid
2015-09-28 15:47:08	Launchpad Janitor	linux (Ubuntu Trusty): status	Fix Committed	Fix Released
2015-09-28 20:13:45	Launchpad Janitor	linux-lts-utopic (Ubuntu Trusty): status	Fix Committed	Fix Released
2015-09-28 20:13:46	Launchpad Janitor	linux-lts-utopic (Ubuntu Trusty): status	Fix Committed	Fix Released
2015-09-28 20:15:56	Launchpad Janitor	linux (Ubuntu Vivid): status	Fix Committed	Fix Released
2017-08-10 20:21:35	Frank Heimes	bug task added		ubuntu-power-systems
2017-08-10 20:21:46	Frank Heimes	ubuntu-power-systems: status	New	Fix Released
2017-08-25 17:00:27	Manoj Iyer	ubuntu-power-systems: importance	Undecided	Medium
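
The [Test Case] and the Step 1 / Step 2 procedure quoted in the description change above can be sketched as a few small shell helpers. This is only an illustration, not something taken from the bug: the function names and the SYSFS_ROOT variable are hypothetical, introduced here so the helpers can be tried against a scratch directory instead of the real /sys. The sketch relies on the kernel marking the active scheduler with square brackets when the scheduler file is read (for example "noop [deadline] cfq").

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original bug report.
# SYSFS_ROOT is an assumption: it defaults to /sys but can point at a
# scratch directory for a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the active I/O scheduler for a block device (e.g. "deadline").
# Reading the scheduler file lists all schedulers, with the active one
# in square brackets: "noop [deadline] cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$SYSFS_ROOT/block/$1/queue/scheduler"
}

# Step 1 in the description: select a scheduler for a device.
# Needs root on a real system; the change does not persist across reboots.
set_scheduler() {
    echo "$2" > "$SYSFS_ROOT/block/$1/queue/scheduler"
}

# Step 2 in the description: lower the device queue depth (the report
# mentions going from 31 to 8).
set_queue_depth() {
    echo "$2" > "$SYSFS_ROOT/block/$1/device/queue_depth"
}
```

On a real system, as root, "set_scheduler sdb deadline" followed by "active_scheduler sdb" would correspond to Step 1 plus the [Test Case] check. The kernel configuration change listed under [Fix] is what makes deadline the default at boot, so that no runtime write is needed.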
