Ubuntu 15.04 Install Error with Avago Controller
- Wily (15.10)
- Bug #1475166
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Medium | Bryan Quigley |
Vivid | Fix Released | Undecided | Unassigned |
Wily | Fix Released | Undecided | Unassigned |
Bug Description
Hello Canonical Team
We are running into an issue installing the Ubuntu PPC64LE 15.04 full-blown image on our servers (with a RAID controller). The installation hangs at around 70% progress.
An important part of our configuration is the Avago RAID controller 9361-8i. The firmware version we used is 4.300.00-4429, and the package is 24.10.0-0002.
IMPORTANT: With the same hardware configuration, the 14.10 full-blown image installs fine.
Dmesg logs are attached:
The dmesg logs point out that:
LSI (avago) driver loads fine
During further interaction with raid volume during the installation process, firmware errors are seen
[ 196.991417] megasas: FW status 0x3
[ 196.999376] megasas: FW status 0x3
[ 197.007376] megasas: FW status 0x3
Further down the process I/O errors are thrown
[ 217.438664] Buffer I/O error on device sda2, logical block 22052864
[ 217.438671] Buffer I/O error on device sda2, logical block 22052865
[ 217.438678] Buffer I/O error on device sda2, logical block 22052866
[ 217.438686] Buffer I/O error on device sda2, logical block 22052867
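When triaging logs like these, it can help to pull out only the two signatures quoted above. The following is a minimal sketch and not part of the original report: it assumes the dmesg output has been saved as plain text, and the function name `summarize_dmesg` is made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this report.
FW_STATUS = re.compile(r"megasas: FW status (0x[0-9a-fA-F]+)")
BUFFER_IO = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize_dmesg(text):
    """Return (firmware status codes, {device: [failed logical blocks]})."""
    fw_codes = []
    io_errors = {}
    for line in text.splitlines():
        match = FW_STATUS.search(line)
        if match:
            fw_codes.append(int(match.group(1), 16))
            continue
        match = BUFFER_IO.search(line)
        if match:
            io_errors.setdefault(match.group(1), []).append(int(match.group(2)))
    return fw_codes, io_errors
```

Comparing the summary from a working kernel with that from a failing kernel gives a quick yes/no signal for each test boot.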
Full dmesg log is attached. Snippet is pasted below highlighting:
Note that there’s a Mellanox ConnectX-3 Pro card with some test firmware in our setup.
Any diagnostic messages from that card in the dmesg logs can be ignored for the purpose of this bug.
Thanks
Adi Gangidi
Adi Gangidi (adi-gangidi) wrote : | #1 |
Brad Figg (brad-figg) wrote : Missing required logs. | #2 |
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
tags: | added: vivid |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
tags: | added: kernel-da-key |
Changed in linux (Ubuntu): | |
assignee: | nobody → Chris J Arges (arges) |
Chris J Arges (arges) wrote : | #3 |
A few questions:
1) Are you using any out of tree drivers for your controller?
2) Are you able to install 14.10 then install a newer kernel? Does this produce a similar failure? This would be helpful in debugging as we could get things like a crashdump or additional debugging information.
3) If able to reproduce with 14.10 + vivid kernel, can you try using the out of tree driver and see if that fixes things?
4) There are many related Avago commits between v3.16 and v3.19. I also checked upstream for any related fixes, and some have been introduced via stable. We could perform a bisect between the versions and identify the faulty commit. If you can get a reproducer working via step (2), then we could also go this route.
Thanks,
--chris j arges
boga (bogatzeng) wrote : | #4 |
Hi Chris,
Please see our feedback as below,
1) No, we only use in-box driver.
2) Yes, we can install the OS with 14.10 first and then try to update the kernel to vivid, but we did not find a vivid kernel for PPC64EL under http://
Thanks,
Boga Tseng
Chris J Arges (arges) wrote : | #5 |
Boga,
3.19 series ppc64el kernels can be found here in the vivid archive:
https:/
Go ahead and try this one:
https:/
These two should be sufficient:
linux-image-
linux-image-
If you have DKMS modules or anything that requires headers to be rebuilt, install header files too.
Thanks,
boga (bogatzeng) wrote : | #6 |
Hi Chris,
Sorry for the late reply.
We did install the kernel 3.19.0.22 in Ubuntu 14.10, and the OS is not able to boot up anymore.
Please check the messages as below, thanks!
1. The message from 14.10 with original kernel to 3.19.0.22 kernel installation completion.
[ 0.000000] OPAL V3 detected !
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,firmware-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,firmware-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,slw-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,slw-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,firmware-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,firmware-
[ 0.000000] Reserved memory: failed to reserve memory for node 'ibm,firmware-
[ 0.000000] Allocated 4718592 bytes for 2048 pacas at c00000000fb80000
[ 0.000000] Using PowerNV machine description
[ 0.000000] Page sizes from device-tree:
[ 0.000000] base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[ 0.000000] base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[ 0.000000] base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[ 0.000000] base_shift=20: shift=20, sllp=0x0130, avpnm=0x00000000, tlbiel=0, penc=2
[ 0.000000] base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[ 0.000000] base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[ 0.000000] Page orders: linear mapping = 24, virtual = 16, io = 12, vmemmap = 24
[ 0.000000] Using 1TB segments
[ 0.000000] kvm_cma: CMA: reserved 1664 MiB
[ 0.000000] Found initrd at 0xc000000003340
[ 0.000000] OPAL: Power8 LPC bus found, chip ID 0
[ 0.000000] bootconsole [udbg0] enabled
[ 0.000000] CPU maps initialized for 8 threads per core
[ 0.000000] (thread shift is 3)
[ 0.000000] Freed 4325376 bytes for unused pacas
[ 0.000000] Starting Linux PPC64 #31-Ubuntu SMP Tue Oct 21 17:55:08 UTC 2014
[ 0.000000] -------
[ 0.000000] ppc64_pft_size = 0x0
[ 0.000000] physicalMemorySize = 0x800000000
[ 0.000000] htab_address = 0xc0000007fe000000
[ 0.000000] htab_hash_mask = 0x3ffff
[ 0.000000] -------
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] ...
Chris J Arges (arges) wrote : | #7 |
Ok let's try one more kernel. So get machine into working state with 14.10, then install the following v4.1 kernel:
https:/
If this also fails, we'll need to either bisect between v3.16..v3.19 or try to figure out which patch causes the regression.
Thanks,
boga (bogatzeng) wrote : | #8 |
Hi Chris,
Please let us clarify first, we install the following package first:
linux-header-all
linux-header-
linux-image
We reboot the system and it hangs with above messages.
So we re-installed a new 14.10 system and update packages by the following order again:
linux-header-all
linux-header-
linux-image
linux-image-extra
Now it seems to be able to boot the OS after we installed linux-image-extra.
Would you please help to clarify what the impact is for installation of linux-image-extra?
However, we saw mounting filesystem error with ext4 while installing Ubuntu 15.04, can we emulate this kind of behavior if we just update the kernel from 14.10?
Regards,
Boga
Adi Gangidi (adi-gangidi) wrote : | #9 |
Hello Chris
Hi from Adi
Here is an update on my findings from the 15.04 PPC64LE live image:
I tried the latest driver from Avago, 06.809.16. Preliminarily, everything looked okay: creating, deleting, and accessing a RAID0 volume.
However, the particular error we are facing is that the 15.04 Ubuntu disk install is blocked on the Avago RAID disk. From the logs and console messages, that error has to do with mounting an ext4 file system on the Avago RAID disk. As you can imagine, it’s a bit hard to reproduce the full install and debug it, so I tried to reproduce that specific mounting step in the live image with the newest driver.
To pinpoint what’s causing that error:
a) I created a partition on the RAID disk (from Avago) with ext4.
b) I tried to mount the ext4 partition.
Doing so returns an I/O error, even in the live image with Avago’s latest driver. The commands are pasted below, with the dmesg error at the very bottom:
Procedure to reproduce this error:
To make it more simple to reproduce and debug: I booted to a Live Ubuntu OS PPC64LE 15.04 Version:
And executed the following commands:
storcli64 /c0 /vall del force (delete virtual disk)
storcli64 /c0 add vd type=r0 drives=252:0-3 (Create new v drive)
storcli64 /c0 /vall start init force (Initiate the v drive)
root@ubuntu:/# fdisk /dev/sda
Welcome to fdisk (util-linux 2.25.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/sda: 3.5 TiB, 3838627020800 bytes, 7497318400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C6F3B670-
Command (m for help): n
Partition number (1-128, default 1): 1
First sector (2048-7497318366, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-7497318366, default 7497318366):
Created a new partition 1 of type 'Linux filesystem' and of size 3.5 TiB.
Command (m for help): p
Disk /dev/sda: 3.5 TiB, 3838627020800 bytes, 7497318400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C6F3B670-
Device Start End Sectors Size Type
/dev/sda1 2048 7497318366 7497316319 3.5T Linux filesystem
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@ubuntu:/# fdisk -l
Disk /dev/loop0: 342.8 MiB, 359481344 bytes, 702112 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdb: 3.8 GiB, 4037017600 bytes, 7884800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 653C790A-
Device Start End Sectors Size Type
/dev/sdb1 34 15659 15626 7.6M Linux filesystem
...
Adi Gangidi (adi-gangidi) wrote : | #10 |
Hey Chris
Is there anything else you'd want us to try regarding this bug ?
Boga seems to have been able to upgrade the kernel.
In the post above I showed that, in the 15.04 live image as well, partitioning and mounting an ext4 file system returns an I/O error.
Based on these findings can we start to bisect the change ?
Thanks
Adi
Bryan Quigley (bryanquigley) wrote : | #11 |
Hi Adi,
I would go ahead and proceed with a bisect. The first step would be to test whether you can build [1] the 14.10 kernel (git checkout v3.16) and whether it boots fine.
Then confirm the failure of 3.19 before proceeding to a normal bisect [2].
If everything above goes as expected, that would be git bisect start v3.19 v3.16
Kind regards,
Bryan
[1] https:/
[2] https:/
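For readers unfamiliar with how a bisect narrows down a regression, here is a minimal sketch of the underlying binary search. It is only an illustration of the idea, not how git implements it: the function name `first_bad` and the `is_bad` callback (standing in for "build, boot, and check dmesg for I/O errors") are hypothetical, and real history is a graph rather than a flat list.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    Assumes `commits` is ordered oldest to newest, commits[0] is known
    good (e.g. v3.16) and commits[-1] is known bad (e.g. v3.19).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return commits[bad]
```

Each test halves the remaining range, so even many thousands of commits between two releases need only a dozen or so build-and-boot cycles.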
tags: | added: sts |
Chris J Arges (arges) wrote : | #12 |
Agree, a bisect would be the best course of action. Let us know if you need any assistance.
Adi Gangidi (adi-gangidi) wrote : | #13 |
14.10 with 3.19 is NOT a failure. It boots up fine.
So I am not sure now whether a bisection makes sense at all.
Please let me know if you have any follow up experiments in mind.
Bryan Quigley (bryanquigley) wrote : | #14 |
It would be worth trying 4.1 as well built from source as that should match the other kernel test above. Small fixes do make it after the release and that should rule that out.
The most likely possibility that I see is that it is an Ubuntu specific kernel config change.
It could also be compiler differences between 14.10 and 15.04 - or some other Ubuntu kernel packaging change.
Bryan Quigley (bryanquigley) wrote : | #15 |
You can also try building 3.19 with the config from Ubuntu - see the bottom of GitKernelBuild for more. Alternatively you could grab the config from a 15.04 using step 4 in GitKernelBuild.
[1] http://
Adi Gangidi (adi-gangidi) wrote : | #16 |
Thanks Bryan
1)
I tried 14.10 with 4.1 Kernel and build seems to boot fine.
Apart from checking that boot-up happens fine, is there any other sanity I need to check ?
2)
Here is the diff between the default configs from utopic (left) and vivid (right):
https:/
Is there a key difference in the configs at the above link that makes you think a configuration change might be causing this issue?
Bryan Quigley (bryanquigley) wrote : | #17 |
1) I would at least try booting multiple times.
If possible I would try partitioning so you have extra space and can set up a new ext4 partition on the Controller for every test. Then deleting before running the next one.
In fact, I'd recommend for the next test to install 15.04 on a SATA drive in the system and see if you can reproduce with the controller just doing the above. Try reproducing it at least 3 times to be sure it freezes every time.
Then if you do another Git Kernel build (v3.19) it will use 15.04s config by default (as well as the new compiler). This would help rule out it being an Ubuntu specific patch (or prove that it is). Please do post the 14.10 config from your GitKernelBuild so we can see the full working config you used.
2) There is no specific raid or ext4 change that I've been able to find so if we can narrow it more that would help.
Another useful data point would be to try a wily development image (will be 15.10) - http://
Adi Gangidi (adi-gangidi) wrote : | #18 |
- dmesg_logs_for_different_kernels.zip (37.0 KiB, application/zip)
1)
Even though I was able to boot smoothly with kernels v3.19 and v4.1, closer inspection of the dmesg logs shows an I/O error during bootup for 3.19 and 4.1 that I don't see with the v3.16 kernel. This error is similar to the one we get when installing 15.04 to RAID.
Dmesg logs are attached
This gives me the basis for going ahead with bisection.
2) Further, I had to abandon and rebuild my environment, since the build seems to have gone into a bad state when I was moving some files around after upgrading the kernel to v4.1.
BUT now, I am NO longer able to generate the custom .deb kernel packages for PPC64LE arch.
Instead
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=
now generates POWERPC arch images instead of PPC64LE.
For example:
linux-headers-
Am I missing something ? I was successfully able to generate kernel deb packages yesterday for PPC64LE. I want to try and replicate that environment so that I can go ahead with bisection
Changed in linux (Ubuntu): | |
status: | Incomplete → Triaged |
Bryan Quigley (bryanquigley) wrote : | #19 |
1) I would definitely recommend booting up/checking the logs multiple times on every leg of the test to ensure that it gets a consistent result.
2) I don't understand how this would have happened. It does occur to me that if we're dealing with possible IO corruption there could be other issues too, I'd suggest:
If you can reproduce without having the OS or Kernel code on the Avago controller that would be the best option. (like I described in C#17)
If not, reboot back to the stable v3.16 kernel before doing any git or build commands.
Adi Gangidi (adi-gangidi) wrote : | #20 |
1) Yes there is a consistent I/O error like the one i attached in my last comment for v 3.19 and v 4.1 dmesg. This happens across multiple reboots
2)Yes, it was the I/O corruption that caused it.
Regarding what you recommended in C17 about installing 15.04 on SATA: I am not sure why that is different from trying 14.10 with the 3.19 kernel, which gives the error anyway. I did try creating a new ext4 partition in this 14.10 + 3.19 combination and running FIO on it, which gave me a bunch of I/O errors.
Bryan Quigley (bryanquigley) wrote : | #21 |
Sorry, I forgot that was also in C#17. I just wanted to minimize the chance the kernel build/git operations would be affected by the IO issues. Sticking with 14.10 for the bisection is perfect.
Adi Gangidi (adi-gangidi) wrote : | #22 |
As I mentioned in my Comment 18
I can't get further with bisection since:
I am at a bisect state between 3.16 and 3.19 where I can't generate a little-endian (PPC64LE) kernel deb package. All it generates is a POWERPC deb. The console doesn't let me customize it at the make oldconfig stage.
So after 'git bisect' and checkout to bisected state
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=
now generates POWERPC arch images instead of PPC64LE.
For example:
linux-headers-
Bryan Quigley (bryanquigley) wrote : | #23 |
Sorry I misunderstood, I'm guessing if you go back to v3.16 it would be fine? Perhaps something about the build is wrong at the checkout preventing that.
There are two ways to try proceeding:
git bisect skip (tries to pick another commit) and see if it builds PPC64LE
checkout v3.18 or 3.17 and see if you can get a PPC64LE to test from them
boga (bogatzeng) wrote : | #24 |
Hi Adi,
1)
We ran into a similar problem when building the 3.16 kernel, like the one in your C#18 and C#22, and here is a solution from Johnny.
You have to modify scripts/
...
ppc*)
...
So you can build ppc64el kernel instead of powerpc after fixing the builddeb script.
2)
We started to bisect between v3.16 and v3.19; a consistent I/O error happens while copying files to RAID volumes in v3.19 but not in v3.16. The bisection seems to take a lot of time to finish, so we will keep you posted on the status.
Adi Gangidi (adi-gangidi) wrote : | #25 |
Thanks a ton Boga! That helps!
Bryan
Here is a difference between custom config of a Kernel version where bug is NOT seen (3.18) and where it is seen (3.19)
https:/
There are a bunch of config differences. Could any of these have caused the issue?
One of the differences is related to SCSI below:
CONFIG_
I hope this helps you give us a more concrete direction than bisection.
thanks
Changed in linux (Ubuntu): | |
assignee: | Chris J Arges (arges) → Bryan Quigley (bryanquigley) |
Johnny Chang (jonnyhihi) wrote : | #26 |
Hi Bryan,
We have done bisection, and found the first bad commit is "34b48db" as shown below.
With this bad commit, the filesystem returns I/O error while copying files to disk.
We used the same config file, copied from Ubuntu 14.10, and followed the build instructions [1].
Could you build an installable ISO image with this commit reverted? Then we can verify whether the installation issue is solved.
[1] https:/
commit 34b48db66e08ca1
Author: Christoph Hellwig <email address hidden>
Date: Sat Sep 6 16:08:05 2014 -0700
block: remove artifical max_hw_sectors cap
Set max_sectors to the value the drivers provides as hardware limit by
default. Linux had proper I/O throttling for a long time and doesn't
rely on a artifically small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installation.
Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
Signed-off-by: Christoph Hellwig <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
diff --git a/block/
index aa02247..6ed2cbe 100644
--- a/block/
+++ b/block/
@@ -257,9 +257,7 @@ void blk_limits_
}
- limits-
- limits->max_sectors = min_t(unsigned int, max_hw_sectors,
- BLK_DEF_
+ limits->max_sectors = limits-
}
EXPORT_
diff --git a/drivers/
index dd73e1f..46c282f 100644
--- a/drivers/
+++ b/drivers/
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
- blk_queue_
+ blk_queue_
d->bufpool = mp;
diff --git a/include/
index 0207a78..74d14db 100644
--- a/include/
+++ b/include/
@@ -1186,7 +1186,6 @@ extern int blk_verify_
enum blk_default_limits {
- BLK_DEF_MAX_SECTORS = 1024,
};
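To make the effect of this commit concrete, the sketch below models the default limit before and after the change, based only on the visible parts of the diff (the `min_t(...)` against `BLK_DEF_MAX_SECTORS = 1024` being removed). It is an illustration in Python, not kernel code. The function names are made up, the "after" behaviour follows the commit message rather than the truncated diff line, and 8192 is used purely as an example of a large driver-reported value.

```python
SECTOR_BYTES = 512          # the block layer counts in 512-byte sectors
BLK_DEF_MAX_SECTORS = 1024  # default cap removed by commit 34b48db


def max_sectors_before(max_hw_sectors):
    """Old default: the driver's limit, capped at BLK_DEF_MAX_SECTORS."""
    return min(max_hw_sectors, BLK_DEF_MAX_SECTORS)


def max_sectors_after(max_hw_sectors):
    """New default per the commit message: use the driver's limit as-is."""
    return max_hw_sectors


def sectors_to_kb(sectors):
    """Convert a sector count to the KB unit used by the *_kb sysfs files."""
    return sectors * SECTOR_BYTES // 1024


if __name__ == "__main__":
    hw = 8192  # illustrative driver-reported limit, in sectors
    print(sectors_to_kb(max_sectors_before(hw)), sectors_to_kb(max_sectors_after(hw)))
```

The point of the sketch is that a driver reporting a limit larger than its hardware can handle was previously masked by the default cap and is exposed once the cap is gone.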
Bryan Quigley (bryanquigley) wrote : | #27 |
Thanks for the bisect results! We can confirm that this commit is responsible by manually setting max_sectors_kb and seeing whether that fixes the issue.
1. Boot the server installation and stop at the partitioning screen (this way we make sure it has detected everything)
2. Switch to another VT (Ctrl-Alt-F2)
3. Confirm and post the values of cat /sys/block/
4. Execute - echo 1024 > /sys/block/
5. Confirm it was really changed with cat
6. Proceed with the installation
Now, if you want to use this as a more permanent workaround, you need to specify the 1024 value in sysfs on the installed system as well. Once confirmed, we would want to get from Avago (or at least from some manual testing) what the real hardware maximum of this value should be and add it to the driver (as described in the commit message).
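The check-set-verify steps above can be sketched as a small script. This is a hedged illustration only: the helper names are hypothetical, and the queue directory is passed in as a parameter because the exact device path is not shown in this report. It assumes the directory contains the `max_sectors_kb` and `max_hw_sectors_kb` files, and writing to the real sysfs files requires root.

```python
from pathlib import Path


def read_limit(queue_dir, name):
    """Read an integer limit such as max_sectors_kb from a queue directory."""
    return int((Path(queue_dir) / name).read_text().strip())


def set_max_sectors_kb(queue_dir, value_kb):
    """Write max_sectors_kb and return the value read back afterwards."""
    hw_limit = read_limit(queue_dir, "max_hw_sectors_kb")
    if value_kb > hw_limit:
        # The soft limit cannot exceed what the driver reports as its maximum.
        raise ValueError(f"{value_kb} exceeds max_hw_sectors_kb={hw_limit}")
    (Path(queue_dir) / "max_sectors_kb").write_text(f"{value_kb}\n")
    return read_limit(queue_dir, "max_sectors_kb")
```

Reading the value back, as in step 5, matters because something else may reset the file after it is written, so it is worth re-checking it again right before the failing step.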
Adi Gangidi (adi-gangidi) wrote : Re: [Bug 1475166] Re: Ubuntu 15.04 Install Error with Avago Controller | #28 |
Hello Bryan
I tried your workaround:
A) Go to the 15.04 installation partition screen to see all disks are
detected
B) Go to VT and check the values of nodes
/sys/devices/
/block/sde/queue # cat max_sectors_kb
4096
/sys/devices/
/block/sde/queue # cat max_hw_sectors_kb
4096
C) Set and cat
echo 1024 > /sys/block/
~ # cat /sys/block/
1024
D) Go back to Partition screen
E) Partition the Avago Partition
F) Partitioning and mounting fails with same error of mounting ext4
filesystem
G) I went back and checked the value of max_sectors_kb, and it is back to
4096, which basically means partitioning the disk resets this node.
~ # cat /sys/block/
4096
I am not sure what the commit is for exactly. Could you help build a
15.04 default image without it so we can check, or suggest something else?
Thanks
Adi
Bryan Quigley (bryanquigley) wrote : | #29 |
All that patch appears to do is to remove an artificial limit for all devices to 1024. I haven't been able to reproduce the option reset on my systems, so here is what I would try:
Try setting the controller option and see if that sticks [1]:
rmmod megaraid_sas
modprobe megaraid_sas max_sectors=2048
It may also be possible to do this while booting as a kernel command option like: megaraid_
(2048 does actually mean 1024 for the other options).
Otherwise, I'd recommend switching to a Live CD and see if the option works there.
If not, a last option would be to try using sysfs on the livecd to see if that will help ensure it:
sudo add-apt-repository universe
sudo apt update
edit /etc/sysfs.ctl adding block/xxx/
sudo apt install sysfsutils
sudo service sysfsutils restart
[1] http://
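The note that "2048 does actually mean 1024" is a unit conversion: the module parameter is given in 512-byte sectors, while the sysfs `*_kb` files are in kilobytes. The sketch below shows that conversion and builds the modprobe invocation quoted above. The helper names are hypothetical and the script only constructs the argument list; it does not load the module.

```python
SECTOR_BYTES = 512  # the block layer counts in 512-byte sectors


def kb_to_sectors(limit_kb):
    """Convert a limit in KB (sysfs unit) to sectors (module parameter unit)."""
    return limit_kb * 1024 // SECTOR_BYTES


def modprobe_args(limit_kb):
    """Build the modprobe command for a desired per-I/O limit in KB."""
    return ["modprobe", "megaraid_sas", f"max_sectors={kb_to_sectors(limit_kb)}"]


if __name__ == "__main__":
    print(" ".join(modprobe_args(1024)))
```

The resulting list could be passed to `subprocess.run` as root after `rmmod megaraid_sas`, which is only possible when no mounted disk depends on the controller.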
Adi Gangidi (adi-gangidi) wrote : | #30 |
Hey Bryan
modprobe megaraid_sas max_sectors=2048
Doesn't seem to help either.
Same error.
Thanks
Adi
Adi Gangidi (adi-gangidi) wrote : | #31 |
Hey Bryan
Editing the boot options to include this string seems to help install 15.04 onto the disk behind the Avago controller without any errors.
megaraid_
Adi
Bryan Quigley (bryanquigley) wrote : | #32 |
Hi Adi,
Just to confirm my understanding of this bug, did doing so change the value of max_sectors_kb / max_hw_sectors_kb? I don't believe it would have survived the reboot after the install, but you could try adding it to the kernel command line of the installed system and see.
Bryan
Adi Gangidi (adi-gangidi) wrote : | #33 |
I haven't checked the values after successful install. I can check and let you know.
Are you saying that what I tried:
adding megaraid_
Bryan Quigley (bryanquigley) wrote : | #34 |
Yes it would only be for one boot, to use that as a workaround you want to add it permanently to GRUB_CMDLINE_
Bryan Quigley (bryanquigley) wrote : | #35 |
So assuming the value of max_sectors_kb is changed to 1024 (with the 2048 kernel command line), that means we need to figure out what the correct value is for the Avago controller. Currently it's set to 4096 in the kernel according to C#28. It appears 1024 works, but getting the actual value would be better for performance.
What is the actual hardware limitation of the Avago controller?
If the limit is really 4096 that would imply that it's just broken on PPC64LE which could be another driver or a firmware bug that was only revealed when the default limit was changed from 1024.
kashyap (kashyap-desai) wrote : | #36 |
The Avago controller supports 256K max I/O, so max_hw_sectors_kb should be 256. Anything like 1024 or 4096 is invalid. Once the firmware receives more than it can support, the I/O fails and the driver prints a message like "megasas: FW status 0x3" on the console. I will check how the driver behaves with the 14.10 kernel in a non-PPC environment.
Bryan Quigley (bryanquigley) wrote : | #37 |
@kashyap-desai thanks for checking. AFAICT it's been at 1024 for at least the 14.04 release as well.
Thanks for noting what FW status 0x3 means, that will help any future tests we do.
kashyap (kashyap-desai) wrote : | #38 |
@ Bryan - Sorry for my earlier comment about the 256K IO size. I thought the customer was using released FW packages.
I just realized that FW version 24.10.0-0002 is an internal development release and not a product GCA. Our current development series has 1024K support. It looks like there may be some bug in the FW that does not work well with 1024K support, but the limit is certainly not 4096K. Anything higher than 1024K will break here, and the "FW status 0x3" print is the sign of that.
Bryan Quigley (bryanquigley) wrote : | #39 |
@kashyap-desai thanks for that, it could be a simple firmware reporting issue or the kernel code needs to be updated to handle a new case.
megaraid_sas_base.c [1] mentions that it generates the max_sectors limits from data provided from the firmware.
4096 is defined in a few other places within the driver [2] so I'm guessing other megaraids do support 4096?
Testing on non-powerpc would still be useful as would reverting to a released firmware to see if max_sectors changes or the error changes.
[1] /*
* Compute the max allowed sectors per IO: The controller info has two
* limits on max sectors. Driver should use the minimum of these two.
*
* 1 << stripe_sz_ops.min = max sectors per strip
*
* Note that older firmwares ( < FW ver 30) didn't report information
* to calculate max_sectors_1. So the number ended up as zero always.
*/
https:/
[2] https:/
kashyap (kashyap-desai) wrote : | #40 |
@ Bryan - I see what you mean. It is a bit confusing: you are mixing this up with the max_sectors module parameter. That parameter applies only to somewhat older controllers. See the PnP ID checks in the driver code below.
/*
* Check if the module parameter value for max_sectors can be used
*/
if (max_sectors && max_sectors < instance-
instance-
else {
if (max_sectors) {
if (((instance-
PCI_
(instance-
PCI_
(max_sectors <= MEGASAS_
instance-
} else {
dev_
"and <= %d (or < 1MB for GEN2 controller)\n",
instance-
}
}
}
I will be checking how non-PPC environment behaves using same FW component.
(as per comment #13) 14.10 should work but 15.04 will fail. I will update my findings as well.
kashyap (kashyap-desai) wrote : | #41 |
@ Bryan -
1. This is an issue with Avago Driver. Driver code snippet -
instance-
if (tmp_sectors && (instance-
The current driver computes max_sectors based on PAGE_SIZE. This is wrong. We will fix it and send a patch upstream.
On an OPAL setup, PAGE_SIZE is 64K, so we report max_sectors = 8192 to the OS. This is higher than the firmware expects; the firmware is only 1M capable.
The kernel has a check at the block layer that picks the minimum of two values: the one provided by the driver and the block-layer default.
The Ubuntu 14.x kernel might be bringing the value of max_hw_sectors_kb down to 1024, but the 15.04 kernel keeps max_hw_sectors_kb at 4096.
A driver fix is required. The root cause is that PAGE_SIZE is larger than 4K: with respect to the max_sectors value, the driver breaks the moment PAGE_SIZE exceeds 4K.
Using the max_sectors module parameter of the driver, you can reduce the value but cannot increase it, which can solve this problem. So if you or the customer are looking for a quick resolution, use the max_sectors module parameter of the megaraid_sas driver.
Add the entry below in grub:
megaraid_
Thanks, Kashyap
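For readers following the arithmetic in comment #41, here is a minimal sketch of how a page-size-based limit grows with PAGE_SIZE. It is not the driver's code. The helper names are hypothetical, and the scatter-gather entry count of 64 used in the note below is an assumption chosen because it reproduces the 8192-sector figure quoted above; the thread does not state the real count.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* bytes per sector */

/*
 * Hypothetical helper (not a megaraid_sas function): number of 512-byte
 * sectors that sge_count scatter-gather entries can cover when each
 * entry maps bytes_per_sge bytes.
 */
uint32_t sectors_from_sges(uint32_t sge_count, uint32_t bytes_per_sge)
{
    assert(bytes_per_sge % SECTOR_SIZE == 0);
    return (uint32_t)(((uint64_t)sge_count * bytes_per_sge) / SECTOR_SIZE);
}

/* Convert 512-byte sectors to KiB, the unit used by max_hw_sectors_kb. */
uint32_t sectors_to_kb(uint32_t sectors)
{
    return sectors / 2;
}

/*
 * Hypothetical helper: take the minimum of the driver's value and the
 * firmware's limit. A firmware limit of 0 is treated as "not reported",
 * in which case the driver's value is kept unchanged.
 */
uint32_t clamp_to_fw(uint32_t driver_sectors, uint32_t fw_sectors)
{
    if (fw_sectors != 0 && fw_sectors < driver_sectors)
        return fw_sectors;
    return driver_sectors;
}
```

Under the assumed count of 64 entries, `sectors_from_sges(64, 65536)` gives 8192 sectors, which `sectors_to_kb` converts to 4096 KB, while `sectors_from_sges(64, 4096)` gives 512 sectors. A 1M firmware limit corresponds to 2048 sectors, so `clamp_to_fw(8192, 2048)` returns 2048.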
Bryan Quigley (bryanquigley) wrote : | #42 |
@kashyap
>The current driver computes max_sectors based on PAGE_SIZE. This is wrong. We will fix it and send a patch upstream.
Thanks for the analysis and working upstream. Post here when it lands and I can see about getting it back to 15.04/15.10.
Just to clarify the workaround for anyone reading:
megaraid_
megaraid_
Aaron Sullivan (aaron-sullivan) wrote : | #43 |
@bryanquigley @kashyap
Has the update landed yet?
kashyap (kashyap-desai) wrote : | #44 |
We are in the process of submitting patches upstream. We are waiting for a few older patch series to be committed first; otherwise there will be confusion over the order of the series and we may not get a clean commit. Once we post the patch for this issue, I will update the BZ with the commit ID or a link to the upstream submission.
Adi Gangidi (adi-gangidi) wrote : | #45 |
Hi Bryan
Now that the patch has been submitted upstream (by Kashyap), what is the rough timeline for this commit to be merged?
Thanks
Adi
Bryan Quigley (bryanquigley) wrote : | #46 |
Hi Adi,
I see the patch got posted here - https:/
I'm guessing it will be part of a scsi pull request in the recently opened 4.4 merge window. That looks imminent. Depending on when it lands, the Ubuntu kernel update cycle takes about a month.
Kind regards,
Bryan
Bryan Quigley (bryanquigley) wrote : | #47 |
It has landed in Linus' tree as commit 357ae967ad66e35
Bryan Quigley (bryanquigley) wrote : | #48 |
@kashyap-desai
I was just looking at cherrypicking this fix to 15.10, but then realized I don't see why this wouldn't affect 12.04 or 14.04 as well. Does this affect all supported Ubuntu releases?
Bryan Quigley (bryanquigley) wrote : | #49 |
Oops... never mind. I forgot that this won't be an issue until after commit 34b48db, even though the same code is in the kernel.
Changed in linux (Ubuntu Vivid): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Wily): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu): | |
status: | Triaged → Invalid |
Brad Figg (brad-figg) wrote : | #50 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-vivid |
tags: | added: verification-needed-wily |
Brad Figg (brad-figg) wrote : | #51 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
Luis Henriques (henrix) wrote : | #52 |
Since the fix for this bug is also being released in stable kernels (Trusty got the fix from 3.13 and vivid would get it from 3.19), I am tagging this bug as verified.
tags: |
added: verification-done-vivid verification-done-wily removed: verification-needed-vivid verification-needed-wily |
Launchpad Janitor (janitor) wrote : | #53 |
This bug was fixed in the package linux - 4.2.0-21.25
---------------
linux (4.2.0-21.25) wily; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1522108
[ Upstream Kernel Changes ]
* staging/dgnc: fix info leak in ioctl
- LP: #1509565
- CVE-2015-7885
* [media] media/vivid-osd: fix info leak in ioctl
- LP: #1509564
- CVE-2015-7884
* KEYS: Fix race between key destruction and finding a keyring by name
- LP: #1508856
- CVE-2015-7872
* KEYS: Fix crash when attempt to garbage collect an uninstantiated
keyring
- LP: #1508856
- CVE-2015-7872
* KEYS: Don't permit request_key() to construct a new keyring
- LP: #1508856
- CVE-2015-7872
* isdn_ppp: Add checks for allocation failure in isdn_ppp_open()
- LP: #1508329
- CVE-2015-7799
* ppp, slip: Validate VJ compression slot parameters completely
- LP: #1508329
- CVE-2015-7799
linux (4.2.0-20.24) wily; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1521753
[ Andy Whitcroft ]
* [Tests] gcc-multilib does not exist on ppc64el
- LP: #1515541
[ Joseph Salisbury ]
* SAUCE: scsi_sysfs: protect against double execution of
__scsi_
- LP: #1509029
[ Manoj Kumar ]
* SAUCE: (noup) cxlflash: Fix to escalate LINK_RESET also on port 1
- LP: #1513583
[ Matthew R. Ochs ]
* SAUCE: (noup) cxlflash: Fix to avoid virtual LUN failover failure
- LP: #1513583
[ Oren Givon ]
* SAUCE: (noup) iwlwifi: Add new PCI IDs for the 8260 series
- LP: #1517375
[ Seth Forshee ]
* [Config] CONFIG_
- LP: #1510405
[ Upstream Kernel Changes ]
* net/mlx5e: Disable VLAN filter in promiscuous mode
- LP: #1514861
* drivers: net: xgene: fix RGMII 10/100Mb mode
- LP: #1433290
* HID: rmi: Disable scanning if the device is not a wake source
- LP: #1515503
* HID: rmi: Set F01 interrupt enable register when not set
- LP: #1515503
* net/mlx5e: Ethtool link speed setting fixes
- LP: #1517919
* scsi_scan: don't dump trace when scsi_prep_
- LP: #1517942
* x86/ioapic: Disable interrupts when re-routing legacy IRQs
- LP: #1508593
* xhci: Workaround to get Intel xHCI reset working more reliably
* megaraid_sas: Do not use PAGE_SIZE for max_sectors
- LP: #1475166
* net: usb: cdc_ether: add Dell DW5580 as a mobile broadband adapter
- LP: #1513847
* KVM: svm: unconditionally intercept #DB
- LP: #1520184
- CVE-2015-8104
-- Luis Henriques <email address hidden> Wed, 02 Dec 2015 17:30:58 +0000
Changed in linux (Ubuntu Wily): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #54 |
This bug was fixed in the package linux - 3.19.0-41.46
---------------
linux (3.19.0-41.46) vivid; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1522918
[ Upstream Kernel Changes ]
* Revert "dm: fix AB-BA deadlock in __dm_destroy()"
- LP: #1522766
* dm: fix AB-BA deadlock in __dm_destroy()
- LP: #1522766
linux (3.19.0-40.45) vivid; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1522786
[ Andy Whitcroft ]
* [Packaging] control -- prepare for new kernel-wedge semantics
- LP: #1516686
* [Debian] rebuild should only trigger for non-linux packages
- LP: #1498862, #1516686
* [Tests] gcc-multilib does not exist on ppc64el
- LP: #1515541
[ Joseph Salisbury ]
* SAUCE: scsi_sysfs: protect against double execution of
__scsi_
- LP: #1509029
[ Luis Henriques ]
* [Config] updateconfigs after 3.19.8-ckt10 stable update
[ Upstream Kernel Changes ]
* Revert "ARM64: unwind: Fix PC calculation"
- LP: #1520309
* Revert "md: allow a partially recovered device to be hot-added to an
array."
- LP: #1520309
* tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
- LP: #1512815
* HID: rmi: Print the firmware id of the touchpad
- LP: #1515503
* HID: rmi: Add functions for writing to registers
- LP: #1515503
* HID: rmi: Disable scanning if the device is not a wake source
- LP: #1515503
* HID: rmi: Set F01 interrupt enable register when not set
- LP: #1515503
* be2net: log link status
- LP: #1513980
* xhci: Workaround to get Intel xHCI reset working more reliably
* Drivers: hv: hv_balloon: refuse to balloon below the floor
- LP: #1294283
* Drivers: hv: hv_balloon: survive ballooning request with num_pages=0
- LP: #1294283
* Drivers: hv: hv_balloon: correctly handle val.freeram<
- LP: #1294283
* Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
- LP: #1294283
* Drivers: hv: balloon: check if ha_region_mutex was acquired in
MEM_
- LP: #1294283
* mm: meminit: make __early_pfn_to_nid SMP-safe and introduce
meminit_
- LP: #1294283
* mm: meminit: inline some helper functions
- LP: #1294283
* mm, meminit: allow early_pfn_to_nid to be used during runtime
- LP: #1294283
* mm: initialize hotplugged pages as reserved
- LP: #1294283
* gut proc_register() a bit
- LP: #1519106
* arm: factor out mmap ASLR into mmap_rnd
- LP: #1518483
* x86: standardize mmap_rnd() usage
- LP: #1518483
* arm64: standardize mmap_rnd() usage
- LP: #1518483
* mips: extract logic for mmap_rnd()
- LP: #1518483
* powerpc: standardize mmap_rnd() usage
- LP: #1518483
* s390: standardize mmap_rnd() usage
- LP: #1518483
* mm: expose arch_mmap_rnd when available
- LP: #1518483
* s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
- LP: #1518483
* mm: split ET_DYN ASLR from mmap ASLR
- LP: #1518483
* mm: fold arch_randomize_brk into ARCH_HAS_
- LP: #1518483
* isdn_ppp: Add checks for allocation failure in isdn_ppp_open()
...
Changed in linux (Ubuntu Vivid): | |
status: | Fix Committed → Fix Released |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1475166
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.