HTX (htxubuntu) DASD exercisers fail

Bug #1648561 reported by bugproxy
This bug affects 1 person
Affects              Status      Importance  Assigned to              Milestone
linux (Ubuntu)       Incomplete  Undecided   Canonical Kernel Team
  Xenial             Expired     Undecided   Unassigned
os-prober (Ubuntu)   Won't Fix   High        Mathieu Trudel-Lapierre
  Xenial             Won't Fix   High        Mathieu Trudel-Lapierre

Bug Description

== Comment: #1 - Application Cdeadmin <email address hidden> - 2016-12-02 04:55:07 ==
==== State: Open by: tdylla on 01 December 2016 07:24:33 ====

Notice: This Note entry was modified. 2 non-ascii character(s) were replaced with question marks.

 BMC yl13u2bmc

OS yl13u2os

root@YL13U2OS:~# ver
cat: /proc/device-tree/openprom/model: No such file or directory
       ver 1.5.4.5 - OS, HTX, Firmware and Machine details

                           OS: GNU/Linux
                   OS Version: Ubuntu 16.04.1 LTS \n \l
               Kernel Version: 4.4.0-47-generic
                  HTX Version: htxubuntu-422
                    Host Name: YL13U2OS
            Machine Serial No: 100CC9A
           Machine Type/Model: 8335-GTB

root@YL13U2OS:~# uname -a
Linux YL13U2OS 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

root@YL13U2OS:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

DASD exercisers fail with a write error. They have never failed before.

root@YL13U2OS:~# lsblk -o KNAME,TYPE,SIZE,MODEL,ROTA
KNAME TYPE SIZE MODEL ROTA
sda disk 1.8T ST2000NX0253 1
sda1 part 1.8T 1
sdb disk 1.8T ST2000NX0253 1
sdb1 part 1.8T 1

Getting HTX errors from yl13u2os.rch.stglabs.ibm.com

######################## Result Starts Here ################################
Currently running ECG/MDT : /usr/lpp/htx//mdt/mdt.whit
===========================

---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num min_lba max_lba status
0 0 2476e9ff R
1 4766ee58 74704057 R
2 74704058 99783457 R
3 c0876ab0 e8e080af R
write error - errno: 1(?)

---------------------------------------------------------------------

---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error

---------------------------------------------------------------------

---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_1 numopers= 1907729 loop= 1394165 blk=0x49e45458 len=262144 dir=DOWN min_blkno=0x3a38202c max_blkno=0x74704057
BWRC LBA fencepost Detail:
th_num min_lba max_lba status
0 0 247c47ff R
1 49e45658 74704057 R
2 74704058 99d2a657 R
3 c0d344b0 e8e080af R
write error - errno: 1(?)

---------------------------------------------------------------------

---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error

---------------------------------------------------------------------

######################### Result Ends Here #################################

System is still running exercisers. Feel free to use the system; it is available for any debugging that is needed.

==== State: Open by: mamukul1 on 01 December 2016 15:41:32 ====

Write() failing with errno 1 for both sda1 and sdb1.
Some errors seen in dmesg as well in same timeframe.

Over to hxestorage to debug further.
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

<Note by preeti, 2016/12/01 23:47:34 seq: 7 rel: 0 action: note>
Both devices fail in the write() system call with errno set to 1,
which means "Operation not permitted".
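
For reference, the mapping from the number to its symbolic name and message can be checked with a few lines of Python. This is only a sketch; the helper name `describe_errno` is made up for illustration, and on Linux errno 1 should come back as EPERM.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message text) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as "EPERM".
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, text = describe_errno(1)
print(name, "-", text)
```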

---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num min_lba max_lba status
0 0 2476e9ff R
1 4766ee58 74704057 R
2 74704058 99783457 R
3 c0876ab0 e8e080af R
write error - errno: 1(?)

Below is corresponding data in kernel logs (Not sure if it is related to error):

Dec 1 01:22:57 YL13U2OS kernel: [50119.193567] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.201895] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.207728] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.234961] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
Dec 1 01:22:57 YL13U2OS kernel: [50119.249926] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:57 YL13U2OS kernel: [50119.250215] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:58 YL13U2OS kernel: [50119.700556] XFS (sda1): Invalid superblock magic number
Dec 1 01:22:58 YL13U2OS kernel: [50120.448485] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:58 YL13U2OS kernel: [50120.448818] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.463705] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
Dec 1 01:22:59 YL13U2OS kernel: [50120.468236] hfsplus: unable to find HFS+ superblock
Dec 1 01:22:59 YL13U2OS kernel: [50120.474019] qnx4: no qnx4 filesystem (no root dir).
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] ufs: You didn't specify the type of your ufs filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
Dec 1 01:22:59 YL13U2OS kernel: [50120.481654] ufs: ufs_fill_super(): bad magic number
Dec 1 01:22:59 YL13U2OS kernel: [50120.487379] hfs: can't find a HFS filesystem on dev sda1

Will transfer to Linux to look further.
<Note by preeti, 2016/12/02 04:35:35 seq: 8 rel: 0 action: assign>

== Comment: #2 - Application Cdeadmin <email address hidden> - 2016-12-02 09:55:08 ==
==== State: Open by: tdylla on 02 December 2016 09:53:18 ====

I noticed on a different system, which has htxubuntu-424 installed along with a patch from defect sw372840, that the sdb exerciser is running just fine. It currently has a cycle count of 2 and a current stanza of 5. The device on this other system is exactly the same drive type.
sdb disk 1.8T ST2000NX0253
 sdb1 part 1.8T

== Comment: #3 - VIPIN K. PARASHAR <email address hidden> - 2016-12-05 05:43:45 ==
root@YL13U2OS:~# cat /proc/partitions
major minor #blocks name

   1 0 65536 ram0
   1 1 65536 ram1
   1 2 65536 ram2
   1 3 65536 ram3
   1 4 65536 ram4
   1 5 65536 ram5
   1 6 65536 ram6
   1 7 65536 ram7
   1 8 65536 ram8
   1 9 65536 ram9
   1 10 65536 ram10
   1 11 65536 ram11
   1 12 65536 ram12
   1 13 65536 ram13
   1 14 65536 ram14
   1 15 65536 ram15
 259 0 3125616984 nvme0n1
 259 1 7168 nvme0n1p1
 259 2 2999266304 nvme0n1p2
 259 3 126342144 nvme0n1p3
   8 0 1953514584 sda
   8 1 1953513560 sda1
   8 16 1953514584 sdb
   8 17 1953513560 sdb1
  11 0 1048575 sr0
  11 1 1048575 sr1
  11 2 1048575 sr2
  11 3 1048575 sr3
root@YL13U2OS:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=508856128k,nr_inodes=7950877,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=107151232k,mode=755)
/dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=107151232k,mode=700)
root@YL13U2OS:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 508856128 0 508856128 0% /dev
tmpfs 107151232 32832 107118400 1% /run
/dev/nvme0n1p2 2952071944 7906084 2794186164 1% /
tmpfs 535756096 0 535756096 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 535756096 0 535756096 0% /sys/fs/cgroup
tmpfs 107151232 0 107151232 0% /run/user/0
root@YL13U2OS:~#

== Comment: #7 - VIPIN K. PARASHAR <email address hidden> - 2016-12-06 05:43:06 ==
root@YL13U2OS:~# df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
udev devtmpfs 508856128 0 508856128 0% /dev
tmpfs tmpfs 107151232 32832 107118400 1% /run
/dev/nvme0n1p2 ext4 2952071944 7931124 2794161124 1% /
tmpfs tmpfs 535756096 0 535756096 0% /dev/shm
tmpfs tmpfs 5120 0 5120 0% /run/lock
tmpfs tmpfs 535756096 0 535756096 0% /sys/fs/cgroup
tmpfs tmpfs 107151232 0 107151232 0% /run/user/0
root@YL13U2OS:~#

== Comment: #8 - VIPIN K. PARASHAR <email address hidden> - 2016-12-06 06:33:48 ==
root@YL13U2OS:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/nvme0n1p2 during installation
UUID=6cddb0e5-477c-4d64-807a-631b2d12dfac / ext4 errors=remount-ro 0 1
# swap was on /dev/nvme0n1p3 during installation
UUID=00693a84-74f6-4ded-b82d-6a938880ba8a none swap sw 0 0

root@YL13U2OS:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 1.8T 0 disk
└─sda1 8:1 1 1.8T 0 part
sdb 8:16 1 1.8T 0 disk
└─sdb1 8:17 1 1.8T 0 part
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom
sr2 11:2 1 1024M 0 rom
sr3 11:3 1 1024M 0 rom
nvme0n1 259:0 0 2.9T 0 disk
├─nvme0n1p1 259:1 0 7M 0 part
├─nvme0n1p2 259:2 0 2.8T 0 part /
└─nvme0n1p3 259:3 0 120.5G 0 part [SWAP]

root@YL13U2OS:~# lsblk --fs
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
└─sda1
sdb
└─sdb1
sr0
sr1
sr2
sr3
nvme0n1
├─nvme0n1p1
├─nvme0n1p2 ext4 6cddb0e5-477c-4d64-807a-631b2d12dfac /
└─nvme0n1p3 swap 00693a84-74f6-4ded-b82d-6a938880ba8a [SWAP]

root@YL13U2OS:~# grep -B 1 '"hxestorage"' /usr/lpp/htx/mdt/mdt
sda1:
 HE_name = "hxestorage" * Hardware Exerciser name, 14 char
--
sdb1:
 HE_name = "hxestorage" * Hardware Exerciser name, 14 char
root@YL13U2OS:~#
root@YL13U2OS:~#
root@YL13U2OS:~# grep 'Device id' /tmp/htxerr
Device id:/dev/sda1
Device id:/dev/sda1
Device id:/dev/sdb1
Device id:/dev/sdb1
root@YL13U2OS:~#

sda1 and sdb1 are the only disks being exercised, and both errored out after a
write failure. The nvme0n1 disk is used by the OS and is therefore not exercised by HTX.

== Comment: #9 - VIPIN K. PARASHAR <email address hidden> - 2016-12-06 07:52:38 ==
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:57 2016] XFS (sda1): Invalid superblock magic number
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:58 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
[Thu Dec 1 01:22:58 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:22:58 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:22:58 2016] ufs: You didn't specify the type of your ufs filesystem

                           mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...

                           >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:22:58 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:22:58 2016] hfs: can't find a HFS filesystem on dev sda1
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sdb1
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:08 2016] XFS (sdb1): Invalid superblock magic number
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:10 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sdb1.
[Thu Dec 1 01:23:10 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:23:10 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:23:10 2016] ufs: You didn't specify the type of your ufs filesystem

                           mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...

                           >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:23:10 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:23:10 2016] hfs: can't find a HFS filesystem on dev sdb1

Linux failed to detect file systems on the sda1 and sdb1 disks, causing write
failures for the HTX exerciser. Similar failures are also reported for the nvme
disk in the Linux kernel log.

== Comment: #10 - VIPIN K. PARASHAR <email address hidden> - 2016-12-06 08:01:35 ==
The Linux errors are being logged by os-prober. I ran os-prober manually and
the same filesystem failures were logged in the Linux log. So os-prober was
invoked while HTX was running. This caused write fails for sda1, sdb1 disks
along with nvme disks and also logged Linux errors.
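
The timing can be cross-checked mechanically. The sketch below takes the HTX error timestamps from the report and the first kernel-log line per device from the dmesg output in comment #9, and computes the gap in seconds. The function names are illustrative only, and the regular expression assumes the `[Thu Dec 1 01:22:57 2016] ... (sda1)` line layout shown in this bug.

```python
import re
from datetime import datetime

# HTX error timestamps per device, copied from the error report.
HTX_ERRORS = {"sda1": "Dec 1 01:22:57 2016", "sdb1": "Dec 1 01:23:08 2016"}

# First probe message per device, copied from the kernel log.
KERNEL_LOG = [
    "[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem",
    "[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem",
]


def first_probe_times(lines):
    """Map each device name to the time of its first kernel-log message."""
    seen = {}
    for line in lines:
        match = re.match(r"\[(.+?)\] .*\((sd[a-z]\d+)\)", line)
        if match:
            stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            seen.setdefault(match.group(2), stamp)
    return seen


def gaps_in_seconds(htx_errors, kernel_lines):
    """Seconds between each HTX error and the first probe message on that device."""
    probes = first_probe_times(kernel_lines)
    result = {}
    for dev, stamp in htx_errors.items():
        if dev in probes:
            when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
            result[dev] = abs((when - probes[dev]).total_seconds())
    return result


print(gaps_in_seconds(HTX_ERRORS, KERNEL_LOG))
```

A gap of zero for both devices only shows that the events fall within the same second; it does not by itself establish which one caused the other.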

== Comment: #11 - VIPIN K. PARASHAR <email address hidden> - 2016-12-06 08:04:55 ==
What operation was attempted while HTX was running when these errors
were seen? Was it an apt upgrade or something else?

== Comment: #12 - Application Cdeadmin <email address hidden> - 2016-12-07 10:56:09 ==
==== State: MoreInfo by: tdylla on 07 December 2016 10:53:58 ====

HTX was started using htx command-line commands. From then on, the system was monitored through "System Live Monitor". No other commands were executed by a user. This failure happened during an overnight run. I believe that the Ubuntu OS was set up to automatically install security fixes, which is required.

bugproxy (bugproxy) wrote : sosreport - YL13U2OS

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-149477 severity-high targetmilestone-inin16042
bugproxy (bugproxy) wrote : Kernel log - dmesg

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → os-prober (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-12-08 12:34 EDT-------
== Comment: #12 - Application Cdeadmin <email address hidden> - 2016-12-07 10:56:09 ====== State: Open by: mamukul1 on 01 December 2016 15:41:32 ====

== Comment: #2 - Application Cdeadmin <email address hidden> - 2016-12-02 09:55:08 ====== State: Open by: tdylla on 01 December 2016 07:24:33 ====

summary: - htxubuntu SDB dasd exercisers fail
+ HTX (htxubuntu) SDB dasd exercisers fail
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-08 13:03 EDT-------
== Comment: #1 - Application Cdeadmin <email address hidden> - 2016-12-02 04:55:07 ====== State: Open by: mamukul1 on 01 December 2016 15:41:32 ====

summary: - HTX (htxubuntu) SDB dasd exercisers fail
+ HTX (htxubuntu) DASD exercisers fail
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-08 13:21 EDT-------
Hello Canonical,

HTX is a user space workload that performs write/read operations
on disks. We are seeing write system call failures for the HTX workload
with errno = 1 (Operation not permitted) during an intervening OS package
update.

Along with the HTX failures, we also see filesystem failures logged in the Linux
logs. The same Linux failures are also seen after running the os-prober command
manually. It seems that similar os-prober issues were reported and fixed in the
past, but we still see such failures with Ubuntu 16.04.1. Please
have a look and advise.

Also, please let us know whether it is expected for write/read system calls to
fail while os-prober runs in parallel.

description: updated
Changed in os-prober (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-27 15:05 EDT-------
==== State: Assigned by: tdylla on 27 December 2016 13:57:28 ====

Could we please get an update on this defect. Thanks.

Tim Gardner (timg-tpi)
Changed in os-prober (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Canonical Foundations Team (canonical-foundations)
Mathieu Trudel-Lapierre (cyphermox) wrote :

I'm claiming this bug for myself since I know about os-prober and have other grub/os-prober updates in flight. Targeting ubuntu-17.01.

Is this HTX thing doing read/write operations directly on the disk, or does it use a filesystem? Is it using a known filesystem or its own format?

"This caused write fails for sda1, sdb1 disks along with nvme disks and also logged Linux errors."

Do you mean write errors as reported by HTX?

The os-prober errors listed are just cosmetic, caused by the fact that os-prober, when run (this is run by grub/update-grub to detect possible other OSes on the system), will "probe" the filesystems: first detecting the available partitions, then attempting to "mount" them to finish probing. This is normally done by "grub-probe" and "grub-mount", which should be relatively safe, but also *had* another code path to use straight "mount" which has shown issues (a fix is available in xenial-proposed for it, should be made available in updates soon).

What version of os-prober do you have installed on this system? (use dpkg -l os-prober)

If all else fails, we could disable os-prober altogether; there's another fix coming up that does so on PowerNV due to the effect on Petitboot.

Changed in os-prober (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → Mathieu Trudel-Lapierre (cyphermox)
status: Triaged → Incomplete
milestone: none → ubuntu-17.01
Changed in os-prober (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-13 03:54 EDT-------
(In reply to comment #21)
> I'm claiming this bug for myself since I know about os-prober and have other
> grub/os-prober updates in flight. Targetting to ubuntu-17.01.
>
> Is this HTX thing doing read/write operations directly on the disk, or does
> it use a filesystem? Is it using a known filesystem or its own format?
>
> "This caused write fails for sda1, sdb1 disks along with nvme disks and also
> logged Linux errors."
>
> Do you mean write errors as reported by HTX?

HTX is a user space test suite that uses OS system calls to perform write/read
operations on disks. It opens the disks with the O_DIRECT flag, so its write/read
operations bypass the filesystem cache. Here HTX reported write failures with
errno = 1 (Operation not permitted) at the same time as the os-prober failures
were logged in the Linux logs. So we are wondering whether os-prober somehow
caused the HTX write operations to fail.

If so, is there a way to prevent write failures in a user space application
caused by a conflict with os-prober?
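
For readers unfamiliar with this access pattern, a rough sketch of a single O_DIRECT write follows. This is not HTX code. The 4096-byte block size, the function name `direct_write` and the fill byte are assumptions made for illustration. O_DIRECT needs an aligned buffer, which the anonymous mmap provides. On failure the function returns the errno name, which is where an EPERM like the one in this bug would show up.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment and transfer size; not taken from HTX


def direct_write(path, fill=0xA5):
    """Write one aligned block to `path` through O_DIRECT.

    Returns (bytes_written, None) on success or (0, errno name) on failure.
    """
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping gives page-aligned memory
    buf.write(bytes([fill]) * BLOCK)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    except OSError as exc:
        return 0, errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        return os.write(fd, buf), None
    except OSError as exc:
        return 0, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


# Example call (destructive: overwrites the first block of the target):
# print(direct_write("/dev/sdX1"))
```

Some filesystems reject O_DIRECT outright, so running this against a regular file may return an error name instead of a byte count.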

>
> The os-prober errors listed are just cosmetic, caused by the fact that
> os-prober, when run (this is run by grub/update-grub to detect possible
> other OSes on the system), will "probe" the filesystems: first detecting the
> available partitions, then attempting to "mount" them to finish probing.
> This is normally done by "grub-probe" and "grub-mount", which should be
> relatively safe, but also *had* another code path to use straight "mount"
> which has shown issues (a fix is available in xenial-proposed for it, should
> be made available in updates soon).
>
> What version of os-prober do you have installed on this system? (use dpkg -l
> os-prober)
>
> If all else fails, we could disable os-prober altogether; there's another
> fix coming up that does so on PowerNV due to the effect on Petitboot.

It would be good to keep the os-prober errors out of the Linux logs.
That would avoid the impression that something is wrong with the file systems
on the box, which is not the case since the errors are cosmetic.

Mathieu Trudel-Lapierre (cyphermox) wrote :

That wasn't really what I meant. We *can't* avoid having the errors as they are coming from the kernel as a result of what os-prober does. The best you can do is not run os-prober if you know it's not necessary (setting GRUB_DISABLE_OS_PROBER=1 in /etc/default/grub does that).
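
To check what a given system currently has set, something like the following would do. It is only a sketch: it reads the file as plain KEY=value text (the real file is a shell fragment, so values computed at runtime are not handled), and `grub_default` is a made-up helper name. It reports the raw value and makes no claim about which spellings the GRUB scripts actually treat as "disabled". This thread shows both `=1` and `=true`.

```python
import shlex


def grub_default(text, key="GRUB_DISABLE_OS_PROBER"):
    """Return the last value assigned to `key` in /etc/default/grub-style text.

    Returns None when the key is never assigned.
    """
    value = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            # shlex strips shell-style quotes and trailing comments.
            parts = shlex.split(raw, comments=True)
            value = parts[0] if parts else ""
    return value


sample = "GRUB_DEFAULT=0\nGRUB_DISABLE_OS_PROBER=true\n"
print(grub_default(sample))

# On a real system:
# with open("/etc/default/grub") as handle:
#     print(grub_default(handle.read()))
```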

As far as the "Operation not permitted" errors, I'm not sure yet, I will need to look in the kernel sources to see if there's a reason to emit that particular error if someone tries to write directly to a device while it's being attempted to be mounted.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-24 13:51 EDT-------
(In reply to comment #23)
> That wasn't really what I meant. We *can't* avoid having the errors as they
> are coming from the kernel as a result of what os-prober does. The best you
> can do is not run os-prober if you know it's not necessary (setting
> GRUB_DISABLE_OS_PROBER=1 in /etc/default/grub does that).
>

Thanks! We will use GRUB_DISABLE_OS_PROBER=1 in /etc/default/grub
to avoid the os-prober warnings.

> As far as the "Operation not permitted" errors, I'm not sure yet, I will
> need to look in the kernel sources to see if there's a reason to emit that
> particular error if someone tries to write directly to a device while it's
> being attempted to be mounted.

That will be helpful. Please let us know your findings. If the
"Operation not permitted" error is the expected outcome for a disk that is
being mounted, we can close this bug.

Changed in os-prober (Ubuntu):
milestone: ubuntu-17.01 → ubuntu-17.02
Mathieu Trudel-Lapierre (cyphermox) wrote :

I couldn't find anything meaningful to me in a look at the kernel sources. I expect maybe this is because inodes are being synchronized at the time the filesystem is being mounted, and are temporarily immutable? It's unclear; looking at this would require someone a lot more experienced in the kernel filesystem code to understand.

I'm closing the bug as Won't Fix, for the same reason as bug 1416396: the packages affected here aren't the packages we'd possibly apply changes to. If additional work is needed, please reopen against linux and include exactly what filesystem type is being used for the HTX tests (is it ext4? xfs?) so that a kernel engineer can look into it.

Changed in os-prober (Ubuntu Xenial):
status: Triaged → Won't Fix
Changed in os-prober (Ubuntu):
status: Incomplete → Won't Fix
bugproxy (bugproxy)
tags: added: targetmilestone-inin---
removed: targetmilestone-inin16042
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-09 07:35 EDT-------
==== State: Open by: tdylla on 09 February 2017 06:26:29 ====

#=#=# 2017-02-09 06:26:27 (CST) #=#=#
Action = [reopen]

I just looked at my Ubuntu and Redhat systems and I see the following:
Redhat - GRUB_DISABLE_OS_PROBER=true
Ubuntu - GRUB_DISABLE_OS_PROBER=true

So if it's already disabled, you need to find another fix.
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

Mathieu Trudel-Lapierre (cyphermox) wrote :

If os-prober is not running, I don't see what else might be poking the disks to cause the warnings. We must ignore the other warning messages that were listed from syslog earlier and look at it a different way. Is the Redhat system showing the same write errors?

Is HTX doing raw writes to disk or are you writing files on a filesystem? Again, what filesystems exist on the disks being exercised? Could it be that the filesystem or the disk fails as a consequence of the HTX workload?

Reassigning to 'linux' so that further investigation can be done on the issue.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in os-prober (Ubuntu):
milestone: ubuntu-17.02 → none
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-09 13:15 EDT-------
==== State: Open by: tdylla on 09 February 2017 12:06:31 ====

HTX will have to answer the question about what they are using to write to disk. I'm only a tester using the HTX tool.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-10 04:05 EDT-------
==== State: Open by: preedhir on 10 February 2017 02:57:34 ====

To answer the query posted in seq. 33: HTX reads and writes the disk in raw mode (it opens the device with the O_DIRECT flag) and uses the read()/write() system calls for the I/O. In this defect, the write() system call fails with errno 1.

That said, no filesystem exists on the disks where HTX is running.

Mathieu Trudel-Lapierre (cyphermox) wrote :

Right, that's what I expected, but it's good to be sure.

The bug is assigned to the kernel team, they will want to look into it.

To further define the case, are these devices claimed by multipath or by mdadm? What does 'sudo multipath -ll' show? Multipath or mdadm (or other low-level storage services like LVM) may still claim the disk even if it's not partitioned; those should be eliminated as well as cause for the issue.
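
Besides 'multipath -ll', the kernel lists the stacked devices (dm-*, md*) that hold a disk or partition under its `holders` directory in sysfs, so a quick look there can help rule these out. A sketch follows; the function name and the `sysfs` parameter (there to allow testing against a fake tree) are illustrative.

```python
import os


def holders(disk, part=None, sysfs="/sys/block"):
    """List devices stacked on top of a disk, or on one of its partitions.

    Returns None if the device is not present under `sysfs`.
    """
    parts = [sysfs, disk] + ([part] if part else []) + ["holders"]
    try:
        return sorted(os.listdir(os.path.join(*parts)))
    except FileNotFoundError:
        return None


for disk, part in (("sda", "sda1"), ("sdb", "sdb1")):
    print(part, holders(disk, part))
```

An empty list for sda1 and sdb1 would suggest that neither device-mapper nor md has claimed them.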

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-02 12:07 EDT-------
I have not seen a recreate of HTX exercisers fail on the SDB disk. I've been running static HTX and soft bootme HTX on my systems ever since I wrote this defect.

------- Comment From <email address hidden> 2017-04-07 05:39 EDT-------
(In reply to comment #37)
> I have not seen a recreate of HTX exercisers fail on the SDB disk. I've been
> running static HTX and soft bootme HTX on my systems ever since I wrote this
> defect.

This problem no longer reproduces. Closing the bug.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Xenial):
status: Incomplete → Expired
This report contains Public information  
Everyone can see this information.
