multipath errors on vivid and wily kernel?
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | | High | Unassigned | |
Bug Description
Oleg and I got changes into curtin to fix bug 1371634, so we can now boot into multipath iSCSI devices on POWER.
This seems to work well enough with trusty+hwe-u, but when trying vivid, I am seeing errors.
Boot failed in one case.
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jun 8 16:05 seq
crw-rw---- 1 root audio 116, 33 Jun 8 16:05 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.17.3-0ubuntu4
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 15.10
Lsusb:
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=screen
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: root=UUID=
ProcLoadAvg: 0.08 0.12 0.10 1/1126 3818
ProcLocks:
1: FLOCK ADVISORY WRITE 2951 00:12:14557 0 EOF
2: POSIX ADVISORY WRITE 2952 00:12:43306 0 EOF
3: POSIX ADVISORY WRITE 2905 00:12:22701 0 EOF
4: POSIX ADVISORY WRITE 2965 00:12:18505 0 EOF
ProcSwaps:
Filename Type Size Used Priority
/swap.img file 8388544 0 -1
ProcVersion: Linux version 3.19.0-20-generic (buildd@fisher03) (gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13) ) #20-Ubuntu SMP Fri May 29 10:03:56 UTC 2015
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.144
RfKill: Error: [Errno 2] No such file or directory
Tags: wily uec-images
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-20-generic ppc64le
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_dscr: DSCR is 0
cpu_freq:
min: 3.693 GHz (cpu 72)
max: 3.695 GHz (cpu 87)
avg: 3.694 GHz
cpu_runmode:
Could not retrieve current diagnostics mode,
No firmware implementation of function
cpu_smt: SMT=8
| Scott Moser (smoser) wrote : | #1 |
| Changed in linux (Ubuntu): | |
| status: | New → Incomplete |
| tags: | added: vivid |
At the very least this requires the output of "multipath -ll" and details of the iSCSI setup. Apparently it supports ALUA, so the targets may be on a real storage server, but it would be better if you told us rather than leaving us to guess. Are the errors seen on every boot, or only the one time it did not boot?
| Scott Moser (smoser) wrote : | #4 |
marking confirmed... smb has some access now.
| Changed in linux (Ubuntu): | |
| status: | Incomplete → Confirmed |
| Scott Moser (smoser) wrote : | #5 |
Attaching console log of a wily (3.19.0-20-generic) kernel
| summary: | - multipath errors on vivid kernel? |
| | + multipath errors on vivid and wily kernel? |
| Scott Moser (smoser) wrote : CRDA.txt | #6 |
apport information
| tags: | added: apport-collected uec-images wily |
| description: | updated |
| Scott Moser (smoser) wrote : CurrentDmesg.txt | #7 |
apport information
| Scott Moser (smoser) wrote : DeviceTree.tar.gz | #8 |
apport information
| Scott Moser (smoser) wrote : IwConfig.txt | #9 |
apport information
| Scott Moser (smoser) wrote : JournalErrors.txt | #10 |
apport information
| Scott Moser (smoser) wrote : Lspci.txt | #11 |
apport information
| Scott Moser (smoser) wrote : ProcCpuinfo.txt | #12 |
apport information
apport information
| Scott Moser (smoser) wrote : ProcMisc.txt | #14 |
apport information
| Scott Moser (smoser) wrote : ProcModules.txt | #15 |
apport information
| Scott Moser (smoser) wrote : ProcPpc64.tar.gz | #16 |
apport information
| Scott Moser (smoser) wrote : UdevDb.txt | #17 |
apport information
| Scott Moser (smoser) wrote : WifiSyslog.txt | #18 |
apport information
| Scott Moser (smoser) wrote : nvram.gz | #19 |
apport information
| Scott Moser (smoser) wrote : | #20 |
Additional info:
$ sudo multipath -ll
mpath2 (1IBM IPR-0 5EC2A900000000A0) dm-2 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:2:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:2:0 sdi 8:128 active ready running
mpath1 (1IBM IPR-0 5EC2A90000000060) dm-1 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:1:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:1:0 sdh 8:112 active ready running
mpath0 (1IBM IPR-0 5EC2A90000000080) dm-0 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:0:0 sda 8:0 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:0:0 sdg 8:96 active ready running
mpath5 (1IBM IPR-0 5EC2A90000000020) dm-5 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:5:0 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:5:0 sdl 8:176 active ready running
mpath4 (1IBM IPR-0 5EC2A900000000C0) dm-4 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:4:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:4:0 sdk 8:160 active ready running
mpath3 (1IBM IPR-0 5EC2A90000000040) dm-3 IBM,IPR-0 5EC2A900
size=264G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 0:2:3:0 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:2:3:0 sdj 8:144 active ready running
After installation via MAAS, the system hung; the boot log is attached as 'console boot from wily (boot hangs)'. I then did an IPMI reset, and this time it came up.
The system is currently up on this kernel, and Stefan's ssh keys should let him log in as the 'ubuntu' user.
| tags: | added: kernel-da-key |
| Changed in linux (Ubuntu): | |
| importance: | Undecided → High |
| Stefan Bader (smb) wrote : | #21 |
From the data I can see that all 6 multipath disks are set to failover mode. In this case that happens because implicit logic in multipath-tools detected that the SCSI controller supports ALUA queries, and those queries reported that host0 has all disks in the active state. So all I/O goes through that controller, while the SCSI disk devices seen through the other controller (host1) are only in standby.
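The host0/host1 split described here can be read off the `multipath -ll` output mechanically. The following is a minimal Python sketch, not part of multipath-tools: the function names `parse_multipath_ll` and `hosts_by_status` are hypothetical, and the regular expressions assume only the output layout pasted in comment #20 (other multipath-tools versions may format lines differently).

```python
import re
from typing import Dict, List, Set

# Map header, e.g. "mpath2 (1IBM IPR-0 5EC2A900000000A0) dm-2 IBM,IPR-0 5EC2A900"
MAP_RE = re.compile(r"^(?P<alias>\S+)\s+\((?P<wwid>[^)]*)\)\s+(?P<dm>dm-\d+)\b")
# Path-group line, e.g. "|-+- policy='round-robin 0' prio=130 status=active"
GROUP_RE = re.compile(r"policy='[^']*'\s+prio=(?P<prio>-?\d+)\s+status=(?P<status>\w+)")
# Path line, e.g. "| `- 0:2:2:0 sdc 8:32 active ready running"
PATH_RE = re.compile(r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+\d+:\d+\s+")


def parse_multipath_ll(text: str) -> Dict[str, List[dict]]:
    """Return {alias: [path group, ...]}; each group carries prio, status, paths."""
    maps: Dict[str, List[dict]] = {}
    current = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = maps.setdefault(header.group("alias"), [])
            continue
        if current is None:
            continue  # ignore anything before the first map header
        group = GROUP_RE.search(line)
        if group:
            current.append({
                "prio": int(group.group("prio")),
                "status": group.group("status"),
                "paths": [],
            })
            continue
        path = PATH_RE.search(line)
        if path and current:
            # The first field of H:C:T:L is the SCSI host number.
            host = int(path.group("hctl").split(":")[0])
            current[-1]["paths"].append({"host": host, "dev": path.group("dev")})
    return maps


def hosts_by_status(maps: Dict[str, List[dict]]) -> Dict[str, Set[int]]:
    """Collect the SCSI host numbers seen under each path-group status."""
    result: Dict[str, Set[int]] = {}
    for groups in maps.values():
        for group in groups:
            for path in group["paths"]:
                result.setdefault(group["status"], set()).add(path["host"])
    return result
```

Fed the output from comment #20, `hosts_by_status` would be expected to place host 0 under `active` and host 1 under `enabled`, which is the failover layout described in this comment.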
The boot dmesg reports inconsistencies in the ext4 filesystem on dm-6, which, going by a (normally harmless) error message earlier in the log, translates to mpath1-part2. I am not sure, but on the system you gave me access to, the only alias with data was mpath0. So that may be a potential problem, or just a different config. What cannot be seen is whether the installation plus reboot left the filesystem in a bad state.
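Translating a kernel name such as dm-6 into its device-mapper name does not have to rely on an earlier log message: on a running system the kernel exposes the name in sysfs under `/sys/block/<dm-N>/dm/name`. A small Python sketch follows; the function names and the `sysfs_root` parameter are hypothetical and exist only so the lookup can be exercised against a fake directory tree.

```python
from pathlib import Path
from typing import Dict


def dm_name(dm_device: str, sysfs_root: str = "/sys") -> str:
    """Return the device-mapper name for a kernel name such as 'dm-6'."""
    name_file = Path(sysfs_root) / "block" / dm_device / "dm" / "name"
    return name_file.read_text().strip()


def dm_names(sysfs_root: str = "/sys") -> Dict[str, str]:
    """Map every dm-N kernel name found under sysfs to its device-mapper name."""
    block = Path(sysfs_root) / "block"
    return {p.name: dm_name(p.name, sysfs_root) for p in sorted(block.glob("dm-*"))}
```

This only helps while the affected system is up; for a hung boot, the console log remains the only source for the mapping.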
| Stefan Bader (smb) wrote : | #22 |
In order to get some more insight into when this fs corruption happens, please provide a console log of a provisioning run that sees those issues on reboot into the prepared system.
| Changed in linux (Ubuntu): | |
| status: | Confirmed → Incomplete |
| Scott Moser (smoser) wrote : | #23 |
| Scott Moser (smoser) wrote : | #24 |
| Scott Moser (smoser) wrote : | #25 |
Well, I got what you asked for: a full install log (with curtin -vv install) and the console log of the first boot.
Unfortunately, there are no issues in this boot.
I would really think I was making things up (in fact, I originally ignored the errors), but then Mike saw the same thing I saw (log https:/
| Stefan Bader (smb) wrote : | #26 |
Yeah, I am not sure. Somehow it seems that when I install multipath-tools (+multipath-
| Scott Moser (smoser) wrote : | #27 |
I'm copying Mike's log from bug 1371634 comment 21.
Note, this is a full installation log as Stefan asked for, and Mike is still seeing issues.
I have tried the latest curtin-common and python-curtin from the wily repositories running on 14.04.2:
#leftyfb@
curtin-common:
Installed: 0.1.0~bzr214-
Candidate: 0.1.0~bzr214-
Version table:
*** 0.1.0~bzr214-
500 http://
100 /var/lib/
0.
500 http://
0.
500 http://
0.
500 http://
#leftyfb@
python-curtin:
Installed: 0.1.0~bzr214-
Candidate: 0.1.0~bzr214-
Version table:
*** 0.1.0~bzr214-
500 http://
100 /var/lib/
0.
500 http://
0.
500 http://
0.
500 http://
#leftyfb@
maas:
Installed: 1.7.5+bzr3369-
Candidate: 1.7.5+bzr3369-
Version table:
1.
500 http://
*** 1.7.5+bzr3369-
500 http://
100 /var/lib/
1.
500 http://
1.
500 http://
1.
500 http://
The deployment never fully boots up. It never gets to a console or allows an ssh connection. Attached is a log of the deployment and first boot process to debug.
| Scott Moser (smoser) wrote : | #28 |
I collected the logs in comments 23 and 24 yesterday, and then walked away.
Looked at the console from that system today, and it shows:
[54550.928492] EXT4-fs error (device dm-9): htree_dirblock_
So it seems that even though I booted successfully, there are errors; they just did not show up during boot. The failure is somewhat arbitrary, so a system might appear OK or might not get through the first boot at all (as Mike has shown).
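Because the errors can show up long after a seemingly clean boot, it may help to scan saved console logs for them rather than rely on watching the boot. A minimal Python sketch, assuming the log line format shown in this comment; the function name `ext4_error_devices` is hypothetical.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   [54550.928492] EXT4-fs error (device dm-9): htree_dirblock_
EXT4_ERROR_RE = re.compile(r"EXT4-fs error \(device (?P<device>[^)]+)\)")


def ext4_error_devices(log_text: str) -> Counter:
    """Count EXT4-fs error lines per device in a console or dmesg log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = EXT4_ERROR_RE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

A non-empty result on a log from a system that booted would point to the same late-appearing corruption described here.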
| Changed in linux (Ubuntu): | |
| status: | Incomplete → Confirmed |
| Scott Moser (smoser) wrote : | #29 |
So, just to catch this bug up with where Oleg and I are.
a.) curtin installs multipath-
b.) the way /etc/multipath/
So, since we've blocked 'b', the file doesn't get created and therefore doesn't get collected into the initramfs.
To solve this, we'll need to run a blocking command with no other side effects in the chroot that creates /etc/multipath/
sudo sh -c 'rm -Rf /etc/multipath && multipath -r >/dev/null && ls -l /etc/multipath/
Last thing, a minor point, there is a race condition in the normal install path that could lead to initramfs not collecting /etc/multipath/
--
[1] http://
[2] http://
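One generic way to guard against the kind of race mentioned in comment #29 is to block until the expected file exists before the initramfs is regenerated. The Python sketch below is only an illustration of that idea and is not what curtin does; `wait_for_path` and its parameters are hypothetical, and the path to wait for is left to the caller because it is truncated in this report.

```python
import os
import time


def wait_for_path(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An installer could call this inside the target chroot and only run `update-initramfs -u` once it returns True, treating False as an installation error instead of silently building an initramfs without the file.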
| Scott Moser (smoser) wrote : | #30 |
related bug in redhat https:/
| Scott Moser (smoser) wrote : | #31 |
I filed debian bug https:/


This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1462530
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.