dasdfmt fails after varying a DASD online; syslog shows strange message "The disk layout of the DASD is not supported"

Bug #1643527 reported by bugproxy
This bug affects 1 person

Affects                   Status     Importance  Assigned to  Milestone
Ubuntu on IBM z Systems   Won't Fix  Undecided   Unassigned
linux (Ubuntu)            Won't Fix  Undecided   Unassigned

Bug Description

Problem description:
Procedure:
   echo 1 > /sys/bus/ccw/devices/0.0.0199/online
   /sbin/chzdev dasd-eckd 0199 -e -p
   /sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0199 -y

dasdfmt fails with:
   /sbin/dasdfmt: Unable to open device /dev/disk/by-path/ccw-0.0.0199: No such device

Looking at the syslog, I can see a strange message that only appears when I hit this issue:
Nov 21 08:52:12 JUUB16MS kernel: [ 67.412484] dasd-eckd 0.0.0199: The disk layout of the DASD is not supported
lsdasd returns:
   root@JUUB16MS:~# lsdasd
   Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
   ==============================================================================
   0.0.0192   active      dasda     94:0    ECKD  4096   5070MB    1298160
   0.0.0195   active      dasdb     94:4    FBA   512    40MB      81920
   0.0.0193   n/f         dasdc     94:8    ECKD
   0.0.0196   active      dasdd     94:12   FBA   512    40MB      81920
   0.0.0194   n/f         dasde     94:16   ECKD
   0.0.0197   n/f         dasdf     94:20   ECKD
   0.0.0198   n/f         dasdg     94:24   ECKD
   0.0.0199   n/f         dasdh     94:28   ECKD
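
The n/f (not formatted) status in a listing like this can be checked from a script before attempting dasdfmt. A minimal sketch, assuming the lsdasd column layout shown above (Bus-ID in the first column, Status in the second); dasd_status is a hypothetical helper, not part of s390-tools:

```shell
# dasd_status BUS_ID
# Hypothetical helper: read lsdasd output on stdin and print the Status
# column ("active", "n/f", ...) for BUS_ID. Prints nothing if BUS_ID
# is not listed.
dasd_status() {
    awk -v id="$1" '$1 == id { print $2; exit }'
}
```

For example, lsdasd | dasd_status 0.0.0199 would print n/f for the listing above.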

As you can see, 0199 is not formatted (n/f) and cannot be formatted...
A workaround is to vary the disk offline and then online again.
This time it starts OK, and you can format it and work with it.
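
The offline/online workaround can be scripted through the same sysfs online attribute used in the procedure above. A minimal sketch; cycle_ccw_device and the SYSFS_ROOT override (present only so the function can be exercised against a scratch directory) are hypothetical. With s390-tools installed, chccwdev -d 0.0.0199 followed by chccwdev -e 0.0.0199 should have the same effect.

```shell
# cycle_ccw_device BUS_ID
# Hypothetical helper: set a CCW device offline, then online again, via sysfs.
# SYSFS_ROOT defaults to /sys and may be overridden for testing.
cycle_ccw_device() {
    online_attr="${SYSFS_ROOT:-/sys}/bus/ccw/devices/$1/online"
    [ -e "$online_attr" ] || return 1
    echo 0 > "$online_attr" || return 1
    echo 1 > "$online_attr" || return 1
    # Best effort: wait for the resulting uevents to be processed.
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle --timeout=30 || true
    fi
    return 0
}
```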
------------------------------
Just to be clear: this happens after dynamically linking an unformatted disk and then varying it online...
For example:
   vmcp link '*' 199 199
   echo 1 > /sys/bus/ccw/devices/0.0.0199/online
   /sbin/chzdev dasd-eckd 0199 -e -p
   /sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0199 -y
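
If the "No such device" error were caused by dasdfmt running before udev has created the /dev/disk/by-path link (an assumption; the thread does not confirm this, and it would not help if the kernel rejects the disk layout), a script could wait for the node before formatting. A minimal sketch; wait_for_node is a hypothetical helper:

```shell
# wait_for_node PATH [TIMEOUT_SECONDS]
# Hypothetical helper: wait until PATH exists (for a symlink such as a
# /dev/disk/by-path entry, until its target exists), giving up after
# TIMEOUT_SECONDS (default 30). Returns 0 if PATH appeared, 1 on timeout.
wait_for_node() {
    node=$1
    limit=${2:-30}
    # Best effort: let udev drain its event queue first, if udevadm exists.
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle --timeout="$limit" >/dev/null 2>&1 || true
    fi
    waited=0
    while [ ! -e "$node" ]; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage, with the flags from the report: wait_for_node /dev/disk/by-path/ccw-0.0.0199 30 && dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0199 -y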

Revision history for this message
bugproxy (bugproxy) wrote : dbginfo

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-148980 severity-high targetmilestone-inin1704
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

May I ask why:
echo 1 > /sys/bus/ccw/devices/0.0.0199/online
was used and not:
chccwdev -e 0.0.0199

Does using only chccwdev to set the device online change the situation?

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-21 08:18 EDT-------
We are using the echo method because the code is generic and supports different distributions in different releases...
This is the common way to do it on all of them.
I will test whether chccwdev works around the issue and update later...
Please keep investigating in the meantime...

Dimitri John Ledkov (xnox) wrote :

Ubuntu defaults to using and supporting chzdev. There are no Ubuntu releases without chzdev. Please ask other distros to ship chzdev for cross-distro compatibility and support.

If this is being automated inside scripts, one needs to replicate everything that chzdev does, including waiting for udev to settle.

I recommend doing something like this:

   if ! chzdev dasd-eckd 0199 -e -p; then
      echo 1 > /sys/bus/ccw/devices/0.0.0199/online
   fi

Or, e.g.:

   chzdev dasd-eckd 0199 -e -p || :
   echo 1 > /sys/bus/ccw/devices/0.0.0199/online

That way the default and preferred method of enabling the devices is always attempted and used first, rather than as a fallback, especially since all distributions will support chzdev eventually.

Please note that since many distributions are merging /sbin with /bin, and / with /usr, encoding the full path to chzdev might not be as future-proof as using a $PATH look-up.
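
The order recommended above (chzdev first, the sysfs attribute only as a fallback, commands found via $PATH) can be wrapped in one function. A minimal sketch; enable_dasd and the SYSFS_ROOT override are hypothetical. It deliberately omits -p: as noted later in this thread, -p restricts chzdev to the persistent configuration, whereas without it chzdev updates both the active and the persistent configuration.

```shell
# enable_dasd BUS_ID
# Hypothetical helper: bring a DASD online, preferring chzdev (looked up via
# $PATH), which also persists the setting and waits for udev. Falls back to
# the sysfs "online" attribute plus a best-effort udev settle when chzdev
# is unavailable or fails. SYSFS_ROOT defaults to /sys (override for tests).
enable_dasd() {
    if command -v chzdev >/dev/null 2>&1 && chzdev dasd "$1" -e; then
        return 0
    fi
    echo 1 > "${SYSFS_ROOT:-/sys}/bus/ccw/devices/$1/online" || return 1
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle --timeout=30 || true
    fi
    return 0
}
```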

Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-22 01:36 EDT-------
Just to mention that this is indeed automated code...
We are using chzdev on Ubuntu to make the device persistent, using the following command:
/sbin/chzdev dasd-eckd 0199 -e -p

You suggested we should use this command instead of:
echo 1 > /sys/bus/ccw/devices/0.0.0199/online

I guess you meant chccwdev -e...

We have no problem changing the code to use chccwdev instead of the echo.
Other distros are not relevant; I only mentioned them because you asked why we use the echo command. We can run whatever is needed on Ubuntu regardless of other distros.

We use udevadm to settle after the device is linked. We did not settle after the device was brought online.

Please ignore the full path of each command... it is generated dynamically by the code. The command path will change dynamically if the command is moved in the future; this is done to cover cases where the command is not in the PATH.

Now, for the problem itself...
I changed the code to use chccwdev -e to vary the device online.
I also changed the code to settle after the device is varied online.

I am getting:
root@JUUB16MS:~# /sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.019a -y
/sbin/dasdfmt: Unable to open device /dev/disk/by-path/ccw-0.0.019a: No such device

I have uploaded a new sosreport and dbginfo for you to review.
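
Since the full command paths are generated dynamically, one way to reconcile that with the $PATH look-up recommended earlier in this thread is to try $PATH first and fall back to the usual sbin/bin directories, which can be missing from $PATH for non-login or restricted users. A minimal sketch; resolve_cmd and its directory list are assumptions:

```shell
# resolve_cmd NAME
# Hypothetical helper: print the full path of NAME, preferring a $PATH
# lookup and falling back to common system directories. Shell builtins
# (for which `command -v` prints a bare name) fall through to the
# directory search. Returns 1 if nothing executable is found.
resolve_cmd() {
    found=$(command -v "$1" 2>/dev/null) || found=
    case $found in
        /*) printf '%s\n' "$found"; return 0 ;;
    esac
    for dir in /usr/sbin /sbin /usr/bin /bin; do
        if [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    return 1
}
```

For example, dasdfmt_cmd=$(resolve_cmd dasdfmt) would yield /sbin/dasdfmt on the systems in this report if dasdfmt is not in $PATH.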

bugproxy (bugproxy) wrote : dbginfo using chccwdev and udevadm settle after online

------- Comment (attachment only) From <email address hidden> 2016-11-22 01:32 EDT-------

bugproxy (bugproxy) wrote : sosreport using chccwdev and udevadm settle after online

------- Comment (attachment only) From <email address hidden> 2016-11-22 01:33 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-22 01:45 EDT-------
On devices 0198 and 0199 the output is different:
/sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0199 -y
Disk in use!

All the details should be in the sosreport and dbginfo.

Dimitri John Ledkov (xnox) wrote :

What I'm asking is that you use only chzdev. No echo, no chccwdev: both are redundant if you use chzdev.

chzdev can bring the devices online, make that persistent, and wait for udev to settle, all three actions in one go.

E.g. $ sudo chzdev -e 0199

The -p flag makes chzdev skip bringing the device online and modify only the persistent configuration. Yet from your goals, you appear to want both to bring the device online and to make that persistent.

root@JUUB16MS:~# /sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.019a -y
/sbin/dasdfmt: Unable to open device /dev/disk/by-path/ccw-0.0.019a: No such device

This suggests that the device is offline. Has the following command been run? $ sudo chzdev -e 019a

on device 198 and 199 the output is different:
/sbin/dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0199 -y
Disk in use!

Digging further into the sosreport data.

The behavior above is expected, as there are LVM physical volumes in use on 0199-part1 and 0198-part2. Note that upon bringing devices online, the physical volumes, volume groups and logical volumes are activated automatically. One should use vgchange / vgreduce to deactivate the relevant volume groups and make sure that the physical volumes on 0199-part1 and 0198-part2 are not in active use by any active volume group before reformatting. Note also that, in addition to LVM, mdadm software RAID, btrfs and ZFS can put devices in use as soon as they are brought online, via systemd-udevd and/or monitoring agents.

S: disk/by-path/ccw-0.0.0199-part1
E: DEVLINKS=/dev/disk/by-id/ccw-IBM.750000000XG921.9241.00-part1 /dev/disk/by-id/ccw-IBM.750000000XG921.9241.00.0000181d000018540000000000000000-part1 /dev/disk/by-id/ccw-0X0199-part1 /dev/disk/by-path/ccw-0.0.0199-part1 /dev/disk/by-id/lvm-pv-uuid-LgRz3a-6rkf-xZt3-FoHb-wCcG-XqgL-clWlWD
E: DEVNAME=/dev/dasdh1
E: DEVPATH=/devices/css0/0.0.0011/0.0.0199/block/dasdh/dasdh1
E: DEVTYPE=partition
E: ID_BUS=ccw
E: ID_FS_TYPE=LVM2_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=LgRz3a-6rkf-xZt3-FoHb-wCcG-XqgL-clWlWD
E: ID_FS_UUID_ENC=LgRz3a-6rkf-xZt3-FoHb-wCcG-XqgL-clWlWD
E: ID_FS_VERSION=LVM2 001

S: disk/by-path/ccw-0.0.0198-part2
E: DEVLINKS=/dev/disk/by-id/ccw-IBM.750000000XG921.9241.00.0000854400008ae60000000000000000-part2 /dev/disk/by-id/ccw-0X0192-part2 /dev/disk/by-id/lvm-pv-uuid-b2Bob7-Evkm-BJTc-S4m3-Xc2E-Lmbu-u9XHhk /dev/disk/by-id/ccw-IBM.750000000XG921.9241.00-part2 /dev/disk/by-path/ccw-0.0.0198-part2
E: DEVNAME=/dev/dasdg2
E: DEVPATH=/devices/css0/0.0.0010/0.0.0198/block/dasdg/dasdg2
E: DEVTYPE=partition
E: ID_BUS=ccw
E: ID_FS_TYPE=LVM2_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=b2Bob7-Evkm-BJTc-S4m3-Xc2E-Lmbu-u9XHhk
E: ID_FS_UUID_ENC=b2Bob7-Evkm-BJTc-S4m3-Xc2E-Lmbu-u9XHhk
E: ID_FS_VERSION=LVM2 001
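
Before reformatting, a script can check whether a partition is an LVM physical volume (as 0199-part1 and 0198-part2 are above) from the same udev properties. A minimal sketch; is_lvm_member is hypothetical, while udevadm info, pvs and vgchange -an are the standard udev and LVM tools:

```shell
# is_lvm_member
# Hypothetical helper: read udev properties on stdin, either the "E: KEY=VALUE"
# lines printed by `udevadm info` or the bare "KEY=VALUE" lines printed by
# `udevadm info --query=property`, and succeed if ID_FS_TYPE is LVM2_member.
is_lvm_member() {
    awk '{ sub(/^E: /, "") } $0 == "ID_FS_TYPE=LVM2_member" { found = 1 } END { exit !found }'
}
```

For example, udevadm info --query=property --name=/dev/dasdh1 | is_lvm_member && pvs -o pv_name,vg_name /dev/dasdh1 shows the owning volume group, which can then be deactivated with vgchange -an followed by its name before running dasdfmt.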

Frank Heimes (fheimes) wrote :

So with fresh, clean disks it works, even when using the non-recommended echo method to set the disks online (rather than the recommended chzdev):

# during installation in d-i
0.0.0200 (configured)
0.0.0301 (configured) # minidisk 1st part or dasd 2605 mod09
0.0.0302 # minidisk 2nd part or dasd 2605 mod09
0.0.0303 # minidisk 2nd part or dasd 2605 mod09
# only the 1st part configured and used during installation, just to verify that it works in general
...
# rough sample disk layout
DASD 0.0.0200 (ECKD) - 7.4 GB IBM S390 DASD drive
> #1 400.0 MB f ext2 /boot
> #2 6.5 GB f ext4 /
> #3 484.4 MB f swap swap
> 49.2 kB unusable
DASD 0.0.0301 (ECKD) - 2.5 GB IBM S390 DASD drive
> #1 2.5 GB f ext4 /test301
-
# after installation
ubuntu@hwe0009:~$ lsdasd
Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.0200   active      dasda     94:0    ECKD  4096   7042MB    1802880
0.0.0301   active      dasdb     94:4    ECKD  4096   2340MB    599040
ubuntu@hwe0009:~$
-
# other minidisks visible ...
ubuntu@hwe0009:~$ ls -la /sys/bus/ccw/devices/0.0.03*
lrwxrwxrwx 1 root root 0 Nov 22 13:33 /sys/bus/ccw/devices/0.0.0301 -> ../../../devices/css0/0.0.0002/0.0.0301
lrwxrwxrwx 1 root root 0 Nov 22 13:33 /sys/bus/ccw/devices/0.0.0302 -> ../../../devices/css0/0.0.0003/0.0.0302
lrwxrwxrwx 1 root root 0 Nov 22 13:33 /sys/bus/ccw/devices/0.0.0303 -> ../../../devices/css0/0.0.0004/0.0.0303
-
# setting the 2nd part (aka minidisk) online
ubuntu@hwe0009:~$ cat /sys/bus/ccw/devices/0.0.0302/online
0
-
# trial using echo to set device online (even if chzdev is recommended)
root@hwe0009:~# echo 1 > /sys/bus/ccw/devices/0.0.0302/online
root@hwe0009:~# cat /sys/bus/ccw/devices/0.0.0302/online
1
-
# fully enabled disk in Linux using chzdev
root@hwe0009:~# chzdev dasd 0.0.0302 -e
ECKD DASD 0.0.0302 configured
-
# format works w/o any issues
root@hwe0009:~# dasdfmt -b 4096 -d cdl -f /dev/disk/by-path/ccw-0.0.0302 -y
Finished formatting the device.
Rereading the partition table... ok
root@hwe0009:~#
-
# disk is there
root@hwe0009:~# lsdasd
Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.0200   active      dasda     94:0    ECKD  4096   7042MB    1802880
0.0.0301   active      dasdb     94:4    ECKD  4096   2340MB    599040
0.0.0302   active      dasdc     94:8    ECKD  4096   2340MB    599040
root@hwe0009:~#

So this is a +1 for Dimitri's finding that your issue is probably caused by the LVM configuration...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-24 08:44 EDT-------
Found a workaround to this issue.
The user I was using didn't have a shell specified in /etc/passwd.
Changing the user to use /bin/bash as the shell worked around the issue.
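
To compare the two accounts involved, the recorded login shell can be read from passwd-format data (for example the output of getent passwd USER). A minimal sketch; login_shell is hypothetical. Per passwd(5), an empty shell field means /bin/sh, which on Ubuntu is dash rather than bash; whether that difference actually matters to dasdfmt is not established in this report.

```shell
# login_shell USER
# Hypothetical helper: read passwd-format lines on stdin and print USER's
# login shell (field 7). An empty field is reported as /bin/sh, the
# documented default. Returns 1 if USER is not found.
login_shell() {
    awk -F: -v u="$1" '
        $1 == u { print ($7 == "" ? "/bin/sh" : $7); found = 1; exit }
        END { exit !found }'
}
```

For example, getent passwd someuser | login_shell someuser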

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-27 01:26 EDT-------
Hi,

At the moment I am forced to bring devices online and make them persistent in 2 phases... I am aware that chzdev can do both; thank you for that.

I understand why LVM is preventing my dasdfmt... that is fine. I missed that.

The general case is when I get back "No such device" although I have varied the device online, it is visible in lsdasd, and I have used udevadm settle.

This behavior was consistent for me; it always failed...

As I mentioned in comment 15, the user I was using was created without a specific shell... causing it to use the default shell...

Changing the user to run with bash worked around the issue, so I have my workaround... This solved the problem for me and I am fine with this solution.

With that said... you can dig further into 19a in the sosreport; you will see that I did bring 19a online (just not using chzdev).
That should be a valid and supported scenario... I think you should check that...

thanks!

Dimitri John Ledkov (xnox) wrote : Re: [Bug 1643527] Comment bridged from LTC Bugzilla

On 27 November 2016 at 06:29, bugproxy <email address hidden> wrote:

> ------- Comment From <email address hidden> 2016-11-27 01:26 EDT-------
> Hi,
>
> at the moment i am forced to bring devices online and make them
> persistent in 2 phases... I am aware of chzdev being able to do both.
> thank you for that.
>
> i understand why LVM is preventing my dasdfmt... that is fine. i missed
> that.
>
> the general case is when i get back "No such device" although i have
> varied the device online, it is visible in lsdasd and i have used
> udevadm settle.
>
> this behavior was consistent for me. it always failed...
>
> as i mentioned in comment 15, the user i was using was created without a
> specific shell... causing it to use the default shell...
>
>
So does that mean that exec calls from dasdfmt failed because the user
account has no (typical) shell specified?
As in, is there something that can be improved in the s390-tools source
code to explicitly use /bin/sh, for example, to mitigate this?

> changing the user to run with bash worked around the issue. so i have my
> workaround... this solved the problem for me and i am fine with this
> solution.
>
> with that said... you can dig further about 19a in the sosreport... you
> will see that i did brought 19a online (only not using chzdev).
> that should be a valid and supported scenario... i think you should check
> that...
>
>
Hmm, I am pondering how the root cause can be debugged further for this
one. This is still reproducible after the user has a shell set, right?

> thanks!

--
Regards,

Dimitri.

Dimitri John Ledkov (xnox) wrote :

About 019a.

Nov 22 07:13:46 JUUB16MS kernel: [ 73.806642] dasd-eckd 0.0.019a: A channel path to the device has become operational
Nov 22 07:13:46 JUUB16MS kernel: [ 73.808080] dasd-eckd 0.0.019a: New DASD 3390/0C (CU 3990/01) with 1443 cylinders, 15 heads, 224 sectors
Nov 22 07:13:46 JUUB16MS multipathd[505]: dasdi: add path (uevent)
Nov 22 07:13:46 JUUB16MS kernel: [ 73.822758] dasd-eckd 0.0.019a: The disk layout of the DASD is not supported

Indeed, this device appears not to be recognised by the kernel. The kernel fails to identify it as RAWTRACK, or it may be failing the Linux kernel layout validation. The net result is that the bp_block property remains uninitialised.

Is the 019a device similar to the 0198 device, which was processed correctly?
Nov 22 07:12:57 JUUB16MS kernel: [ 24.516335] dasd-eckd 0.0.0198: A channel path to the device has become operational
Nov 22 07:12:57 JUUB16MS kernel: [ 24.517719] dasd-eckd 0.0.0198: New DASD 3390/0C (CU 3990/01) with 1443 cylinders, 15 heads, 224 sectors
Nov 22 07:12:57 JUUB16MS kernel: [ 24.519920] dasd-eckd 0.0.0198: DASD with 4 KB/block, 1038960 KB total size, 48 KB/track, compatible disk layout

Can any other Linux systems recognise or format the 019a device?
A 4.9 kernel will soon be available in Zesty; it would be worth trying whether that kernel does a better job of detecting this device.
Is the device damaged in any way, or special in some way?

Looking at the multipath output, dasdi (which is 019a) is blacklisted, yet syslog shows that a path to 019a was added. Is 019a a multipath device, and were all paths to it enabled?
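
To find every device that hit this condition (for example, before applying the offline/online workaround), the kernel log can be scanned for the message quoted above. A minimal sketch; unsupported_layout_ids is hypothetical and assumes the message format shown in these logs (dasd-eckd BUS_ID: The disk layout of the DASD is not supported):

```shell
# unsupported_layout_ids
# Hypothetical helper: read kernel log lines on stdin and print, once each
# and sorted, the bus IDs for which dasd-eckd reported an unsupported
# disk layout. Other dasd-eckd messages are ignored.
unsupported_layout_ids() {
    sed -n 's/.*dasd-eckd \([0-9a-fA-F]\{1,\}\.[0-9a-fA-F]\{1,\}\.[0-9a-fA-F]\{4\}\): The disk layout of the DASD is not supported.*/\1/p' |
        LC_ALL=C sort -u
}
```

For example, journalctl -k | unsupported_layout_ids, or the same filter applied to /var/log/syslog.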

Frank Heimes (fheimes) wrote :

Indeed, DASDs numbered 019* are often reserved for special use, like 0191, 019D and 019E.
But this is mostly by convention only.
It would be interesting to know whether that applies in this case, too.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-12-13 03:49 EDT-------
Hi,

All the devices added are minidisks (MDISKs).
They are all the same.
There is no layout/physical problem with any disk.
Simply taking the device offline and back online fixes the problem without any external intervention... a reboot also fixes the problem.

The device is not a multipath device, so it is fine that it is blacklisted.

19D and 19E are CMS formatted but are not relevant to the issue at hand.

As I mentioned before, this only happens when the device is added by a user with no default shell specified...
When using a bash user, the problem does not happen...

I can't explain why this is related, but I get consistent results when changing the user's default shell...

I hope this answers all your questions...

Dimitri John Ledkov (xnox) wrote :

The calls that fail are open() calls. In Ubuntu we do not carry patches to s390-tools for the dasdfmt code path. Please seek help from the s390-tools upstream developers at IBM.

A call to open() should not require the user to have a default shell set; e.g. it should not result in system() calls.

Alternatively, please provide steps to reproduce the issue, meaning how to recreate the z/VM setup and the disks that dasdfmt cannot format.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-07-05 04:29 EDT-------
Hi Ofer Baruch,

is this issue still relevant and can you still reproduce it?
If this is the case, could you please provide us with some detailed information on how
you reproduce it? If you have a script that was used, you may provide that
as well.

I already had a look at the dbginfo data but I can't see any of the commands
that you used to trigger the issue. When did you collect the dbginfo data?
If possible, please run dbginfo after you reproduced the issue.

Furthermore, the workaround with the user shell seems a bit odd and makes me
curious. Can you elaborate on that? How did you create the user in the first place (when
he had no shell yet) and how is the access to /sbin managed for this user? (sudo?)

Thanks a lot and best regards,
Jan

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-11 09:06 EDT-------
IBM bugzilla status -> closed , not reproducable. If the problem will be detected in the future, a new bugzilla should be opened

Changed in ubuntu-z-systems:
status: Incomplete → Invalid
Frank Heimes (fheimes)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (frank-heimes)
assignee: Frank Heimes (frank-heimes) → nobody
Changed in ubuntu-z-systems:
status: Invalid → Won't Fix