poor performance after upgrade to Precise

Bug #1057054 reported by Chris Weiss
This bug affects 1 person
Affects: multipath-tools (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I had a Lucid x64 server connected to a Dell MD3000i over 4 paths, and it worked as expected. I added the "prio rdac" line to the conf file, upgraded to Precise, removed the old mpath_rdac line, and rebooted one more time, just to be sure. I did this based on a section in https://help.ubuntu.com/12.04/serverguide/serverguide.pdf

As a "sanity check" test, I'm running 'pv < /dev/mapper/dellsas1 > /dev/null' (friendly names enabled). On Lucid I'd get about 100MB/s; after upgrading to Precise I get an almost constant 768kB/s. If I instead read from the 4 underlying /dev/sd* devices, 2 give errors as expected and 2 run at about 100MB/s as expected, so iSCSI seems to be working correctly and multipath does not.

Chris Weiss (cweiss) wrote :

I have also tried a clean install of precise and I see the same results.

Peter Petrakis (peter-petrakis) wrote :

Have you verified that the rdac driver is loaded? Also, please check the contents of
/etc/initramfs-tools/modules: scsi_dh_rdac must be loaded at boot time for the
devices to be discovered correctly.
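A minimal sketch of that fix, assuming the stock initramfs-tools layout on 10.04/12.04 (one "module [args]" entry per line in /etc/initramfs-tools/modules, comments starting with #): add the handler only if it is not already listed, then rebuild the initrd with update-initramfs -u. The function name add_initramfs_module is hypothetical; editing the file by hand works just as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure MODULE is listed in an initramfs-tools
# modules file (one "module [args]" entry per line, '#' starts a comment).
# Usage: add_initramfs_module MODULE [FILE]
add_initramfs_module() {
    local module="$1"
    local file="${2:-/etc/initramfs-tools/modules}"

    # Drop comments, then compare the first word of each remaining line.
    if [ -f "$file" ] &&
       sed -e 's/#.*//' "$file" | awk -v m="$module" '$1 == m { found = 1 } END { exit !found }'; then
        echo "$module already listed in $file"
        return 0
    fi

    # Avoid gluing the new entry onto a last line that lacks a newline.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$module" >> "$file"
    echo "added $module to $file"
}

# Example (as root), then rebuild the initrd for the running kernel:
#   add_initramfs_module scsi_dh_rdac
#   update-initramfs -u
```

Running it twice is harmless: the second call finds the entry and leaves the file alone.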

Changed in multipath-tools (Ubuntu):
status: New → Incomplete
Chris Weiss (cweiss) wrote :

Via lsmod? scsi_dh_rdac is loaded. Does it need to be in the initramfs even if I'm not booting off it?

Adding it there does "fix" the performance, but it is also counter-intuitive. Shouldn't the multipath driver read the conf file and load the needed modules before "finding" anything?

Peter Petrakis (peter-petrakis) wrote :

That's never been how it works: multipath has no ability to load kernel modules. It would
make a nice feature, though. I admit that discovering which device handler (dh)
is the necessary one is a bit arcane and not well documented anywhere.

The best practice here is to add the necessary device handler to your initrd so
it's attached at the same time the sd devices are initially discovered. You also
need this module loaded when provisioning additional LUNs at runtime, so that
their performance doesn't plummet.
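Whether a handler is loaded right now (the same information lsmod prints) can be checked from /proc/modules before provisioning new LUNs. This is a sketch: module_loaded is a hypothetical helper name, and built-in (non-modular) drivers never appear in /proc/modules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if MODULE appears as the first field of
# /proc/modules (or of the file given as the second argument).
# Usage: module_loaded MODULE [PROC_MODULES_FILE]
module_loaded() {
    local module="$1"
    local file="${2:-/proc/modules}"
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Example (as root): load the handler before adding LUNs at runtime.
#   module_loaded scsi_dh_rdac || modprobe scsi_dh_rdac
```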

Also, I discovered a typo in the multipath documentation.

 https://bugs.launchpad.net/serverguide/+bug/1057071

Had the documentation been correct to begin with, you would never have encountered
this issue in the first place.

Closing this issue as "Invalid", since it's not a bug. Thanks for the report.

Changed in multipath-tools (Ubuntu):
status: Incomplete → Invalid
Chris Weiss (cweiss) wrote :

My devices are on iSCSI, and I'm not booting off them. They are not connected until after the root filesystem is mounted and the network comes up, via the normal means.

I never loaded scsi_dh_rdac manually before; it's not in my initrd or in my modules files. It was loaded automatically somehow. Why isn't it loaded automatically earlier? What loaded it, if the multipath driver didn't? Other drivers load child drivers; why can't multipath?

I never found that HTML server guide in my searching, and it is helpful. I only found the PDF after failing a couple of times. Other software presents prompts and links when config formats change, or even migrates the config for you; that has become expected, and multipath should do it too. The local man page for multipath.conf still documents prio_callout rather than prio, and makes no mention of having to load drivers manually.

Whether this is a bug in how the drivers get loaded, in how the documentation is presented, or in how the upgrade transition is handled isn't that relevant; it's still a bug. It's a Lucid-to-Precise upgrade regression: the upgrade isn't smooth, and it isn't even easy to find out why it's failing.

If the upgrade had presented me with a screen saying that the config and driver loading formats had changed, along with that HTML link, I'd have figured it out and we wouldn't be here.

Peter Petrakis (peter-petrakis) wrote :

No feature has ever existed that can, *at runtime*, examine all attached disks
and cross-reference their SCSI INQUIRY data against a table of available device
handlers. That table does not exist; if it did, it would be miserable to
maintain.

I checked the udev rules and initramfs scripts from Lucid through Precise. We
have never loaded dh modules automatically.

The multipath C code has no facility to modprobe or insmod anything.

So the only logical conclusion left is that the module was loaded without
your knowledge, which means your configuration, as it was, would never survive a reboot.

If that's not true, and you can reproduce it, I would be interested to see it. However,
even if I had the answer, that doesn't make up for the complete lack of vendor
participation in qualifying your SAN with our operating system [1]. We cannot be
expected to regression-test every SAN in creation; we rely on users like you (or vendors)
to test and stay engaged. Please contact your vendor and express support for official
Ubuntu support for your SAN.

multipath-tools is supported by the community, not Canonical; I maintain it as a volunteer.
That multipath section in the server guide? I wrote it with the upcoming Precise LTS release
as the deadline, and it took months of effort. multipath as a whole is light years better than
it was in Lucid, or ever, for that matter (many people helped).

I'm not disagreeing with you that things are missing and there's certainly room for improvement.
You've pointed out several issues, like the dialog box and the man page; that's all good stuff.
Please file a separate bug for each so we can track them.

It's simply a matter of triage and bandwidth: a good multipath bug can soak up weeks of time,
so configuration polish like you mentioned falls by the wayside. However, that sort of work
is low-hanging fruit and doesn't require a kernel storage engineer with years of experience to
accomplish. Contributions are most certainly welcome.

FYI, there really isn't a hard spec for multipath.conf. It actually functions a lot like
YAML: keywords are picked up, and their values are merged in and override the defaults.
There's no one place in the code where you can go to discover "this is how config works";
it's scattered everywhere, which makes creating regression tests prohibitively difficult, if
not practically impossible.

1. The implication is that multipath may have changed so dramatically from 0.4.8 to 0.4.9 that
the scsi_dh_rdac driver may not have been as necessary before. There's no way we could have
caught that in code review; testing was required.

Chris Weiss (cweiss) wrote :

But it did survive a reboot; several, in fact. I always reboot after major config changes to reduce the chance of a 2am phone call after a power outage. I don't file bugs lightly; it took 3 days and 4 OS reinstalls to get me here. This was working on Lucid, it required additional setup for Precise, and I found the documentation difficult to find. It's repeatable: I've been repeating it all week, on both upgrades and clean installs.

Did Lucid also use scsi_dh_rdac? I had no issues there. Something changed, and I wish I could help find out what. Is there a debug flag and/or log file I can send?

Peter Petrakis (peter-petrakis) wrote :

Hmm, that's interesting. Does that mean scsi_dh_rdac is loaded after a reboot?
Please verify.

Yes, it is available with the Lucid kernel. Also note that your vendor
documentation would probably recommend that you load the rdac driver
(it has been around for a long time, actually).

NOTE: multipath is basically the same on every distro, so Red Hat instructions
provided by your vendor apply just as well to Ubuntu and others. However, kernels
change frequently, which is why vendors and others need to be involved.

Let's assume that rdac is never loaded and life is good on Lucid. That leaves
multipath itself and the Linux kernel, both of which jumped dramatically
between Lucid and Precise. Since there are no real regression tests for hardware
SANs, and the vendors aren't pitching in, it's quite possible that something weird
like this could occur.

It's my opinion that if you're using ALUA, it's up to you to determine whether
an additional device handler is necessary. What may have happened is that multipath
in Lucid was biasing the primary storage controller and forcing a trespass
unbeknownst to you. This would have made ALUA irrelevant and ping-ponged your
LUNs behind the RAID for a period of time; it also means you weren't using both of
your storage processors as you intended. That would have been a bona fide bug in Lucid,
and considering the outcome, it was likely rectified in Precise. Going back to
find it is an academic exercise, since no matter what I find, it'll probably break
Lucid. RDAC must be loaded.

[ALUA architecture example]
http://virtualgeek.typepad.com/virtual_geek/2009/09/a-couple-important-alua-and-srm-notes.html

As for the kernel, it does what it's told. Something with that horrendous an
impact would likely have affected your non-SAN disks as well; if it affected only
one, that would be doubly weird :) So it most likely comes down to how the I/O was
queued to begin with, and multipathd is responsible for that.

Between the two multipath versions, your SAN actually gained a codified config, meaning
you don't need to provide your own if you don't want to; it's built in. 0.4.8 didn't
have this.

[libmultipath/hwtable.c]
        {
                /* DELL MD3000 */
                .vendor = "DELL",
                .product = "MD3000",
                .getuid = DEFAULT_GETUID,
                .features = "2 pg_init_retries 50",
                .hwhandler = "1 rdac",
                .selector = DEFAULT_SELECTOR,
                .pgpolicy = GROUP_BY_PRIO,
                .pgfailback = -FAILBACK_IMMEDIATE,
                .rr_weight = RR_WEIGHT_NONE,
                .no_path_retry = 15,
                .minio = DEFAULT_MINIO,
                .checker_name = RDAC,
                .prio_name = PRIO_RDAC,
                .prio_args = NULL,
        },

Your config overrides any member you defined; the rest, like minio, come through
from this built-in entry.
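The override behaviour can be modelled roughly as a key-by-key merge. This is a simplified sketch, not the libmultipath C code: the built-in MD3000 entry above is rewritten with its multipath.conf keyword names, the user section is hypothetical, and rr_min_io is left as the macro name DEFAULT_MINIO because its numeric value is not quoted here. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Simplified model of the override behaviour (an assumption, not the actual
# libmultipath code): built-in values apply unless the user sets the keyword.

# Built-in MD3000 entry from hwtable.c, using multipath.conf keyword names.
declare -A builtin=(
    [features]="2 pg_init_retries 50"
    [hardware_handler]="1 rdac"
    [path_grouping_policy]="group_by_prio"
    [failback]="immediate"
    [no_path_retry]="15"
    [rr_min_io]="DEFAULT_MINIO"
    [path_checker]="rdac"
    [prio]="rdac"
)

# Hypothetical user device section that sets only two keywords.
declare -A user=(
    [prio]="rdac"
    [no_path_retry]="queue"
)

# Print the effective value of KEY: the user's setting if present,
# otherwise the built-in one.
effective() {
    local key="$1"
    if [[ -n "${user[$key]+set}" ]]; then
        printf '%s\n' "${user[$key]}"
    else
        printf '%s\n' "${builtin[$key]}"
    fi
}

# List every built-in keyword with its effective value and where it came from.
show_effective() {
    local key origin
    for key in $(printf '%s\n' "${!builtin[@]}" | sort); do
        origin=built-in
        [[ -n "${user[$key]+set}" ]] && origin=user
        printf '%-22s %-24s (%s)\n' "$key" "$(effective "$key")" "$origin"
    done
}
```

Under this model, show_effective reports no_path_retry as "queue" from the user section, while rr_min_io and hardware_handler still come from the built-in entry.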

BTW, you might want to double-check exactly what your SAN can do. "Active/active"
isn't what it used to be, and is really "dual active".

http://gestaltit.com/all/tech/storage/stephen/multipath-activepassive-dual-active-activeactive/

Chris Weiss (cweiss) wrote :

After the upgrade and initial reboot, I changed the config file and verified that the multipath -ll output was as I expected, then rebooted and ran the pv test. I did nothing else between my first posting and the reply verifying lsmod, so yes, it was loaded after the reboot.

I have both controllers in the MD3000i, and each has 2 ports. A LUN can be on either controller, active/active on that one controller, with the other controller as standby. I also have LUNs on it used by ESXi, and the round-robin (RR) paths appear to be the same.

I did not check which driver got loaded on Lucid, and the Dell docs are really all over the place, so I was trying to ignore them. rdac did seem to be what was happening, based on how multipath -ll showed things, and I did see roughly equal traffic on 2 NICs. In my first attempt with Precise I did initially see the ping-ponging you mention, before I understood how to apply rdac configs. With Lucid I put the rdac config in right from the start and did not see that behavior.

Is it possible the old mpath_prio_rdac callout was loading the driver early in the process? I was not aware of scsi_dh_rdac's existence until this thread, so I have no explanation of how it's getting loaded when you think it shouldn't be.

It's nice to see the config codified; I'll have to experiment with that some. Thanks for that info.

Peter Petrakis (peter-petrakis) wrote :

Then we might have a distro bug here, which is weird, as I've done hundreds of
SAN installs with Lucid and have had to manage the scsi_dh modules every time.

So on your Lucid system, /etc/initramfs-tools/modules should be empty
except for the commented-out examples.

The next thing to check is the initramfs images themselves. Assuming there are no
directives in the previous config file, the presence of the scsi_dh .ko files in
the initrd would indicate that they were pulled in by another
initramfs helper.

Actually...

root@nashira:~# zcat /boot/initrd.img-3.2.0-26-generic | cpio -it | grep scsi_dh
lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
81384 blocks
root@nashira:~# zcat /boot/initrd.img-2.6.32-41-generic | cpio -it | grep scsi_dh
lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
64508 blocks

OK, I'm surprised :)

In my 2.6.32 initrd I find:
root@nashira:~/2.6.32# cat conf/modules
scsi_dh_alua

which I suppose is what gets my device handler loaded. That file is driven
by 'modules' and 'modules.d' in /usr/share/initramfs-tools. So the modules
have apparently always been there, thanks to this hook script:

/usr/share/initramfs-tools/hook-functions

        scsi)
                copy_modules_dir kernel/drivers/scsi
                for x in mptfc mptsas mptscsih mptspi zfcp; do
                        manual_add_modules "${x}"
                done
        ;;

That is how the scsi_dh_* .ko files got into the initramfs, but it doesn't
explain how the module was loaded. That directive had to come from somewhere,
so it's either already in your /usr/share/initramfs-tools/modules|modules.d
or it's in your /etc/initramfs-tools/modules.d|modules and you missed it.
*Something* has to be prompting its inclusion.

What's the output of the following, run as root?
grep -Rl scsi_dh /etc /usr/share/initramfs-tools/

Peter Petrakis (peter-petrakis) wrote :

There's nothing in any of the priority checkers except SCSI commands. See for
yourself.

bzr+ssh://bazaar.launchpad.net/+branch/ubuntu/lucid/multipath-tools/

path_priority/pp_rdac/pp_rdac.c

Chris Weiss (cweiss) wrote :

Nothing. In the 10.04 syslog, I do see this right after all the SCSI attach events:

Sep 24 13:54:01 file3 multipathd: sdm: add path (uevent)
Sep 24 13:54:02 file3 kernel: [ 7.297268] sd 10:0:0:0: rdac: LUN 0 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.299115] sd 8:0:0:0: rdac: LUN 0 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.300844] sd 9:0:0:0: rdac: LUN 0 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.302519] sd 10:0:0:1: rdac: LUN 1 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.304256] sd 10:0:0:2: rdac: LUN 2 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.306406] sd 10:0:0:3: rdac: LUN 3 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.308048] sd 8:0:0:1: rdac: LUN 1 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.309676] sd 9:0:0:1: rdac: LUN 1 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.311221] sd 9:0:0:2: rdac: LUN 2 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.313092] sd 8:0:0:2: rdac: LUN 2 (unowned)
Sep 24 13:54:02 file3 kernel: [ 7.314769] sd 9:0:0:3: rdac: LUN 3 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.316383] sd 8:0:0:3: rdac: LUN 3 (owned)
Sep 24 13:54:02 file3 kernel: [ 7.316388] rdac: device handler registered

The first 12.04 boot shows something similar, except that the "owned/unowned" lines are interleaved with the SCSI attach messages instead of all appearing at the end, so that's different.
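Summarising those attach messages per LUN makes the ownership pattern easier to compare between the Lucid and Precise boots. A sketch, assuming the log lines keep the "sd H:C:T:L: rdac: LUN N (owned|unowned)" form shown above and that each iSCSI session typically shows up as its own SCSI host; rdac_ownership is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise scsi_dh_rdac attach messages such as
#   "... sd 10:0:0:3: rdac: LUN 3 (unowned)"
# into one line per LUN, listing the SCSI hosts that report it as
# owned or unowned. Reads the named file, or stdin if none is given.
# Usage: rdac_ownership [LOGFILE]
rdac_ownership() {
    awk '
        / rdac: LUN [0-9]+ \((owned|unowned)\)/ {
            for (i = 1; i < NF; i++) {
                if ($i == "sd") hctl = $(i + 1)              # "H:C:T:L:"
                if ($i == "LUN") { lun = $(i + 1); state = $(i + 2) }
            }
            split(hctl, part, ":")                           # part[1] = host
            gsub(/[()]/, "", state)
            hosts[lun, state] = hosts[lun, state] " host" part[1]
            seen[lun] = 1
        }
        END {
            for (l in seen) {
                o = hosts[l, "owned"];   if (o == "") o = " none"
                u = hosts[l, "unowned"]; if (u == "") u = " none"
                printf "LUN %s: owned via%s; unowned via%s\n", l, o, u
            }
        }
    ' "$@" | sort -k2,2n
}
```

For example, rdac_ownership /var/log/syslog prints one line per LUN; the "rdac: device handler registered" line and unrelated messages are ignored.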

As far as active/active goes, I am seeing 140MB/s writes, so RR is working well enough. I think the MD3000 processor is the bottleneck; it's not renowned for being super fast. I did try adjusting rr_weight and rr_min_io, but it doesn't seem to have changed much. The defaults are working well. I'm not running jumbo frames yet, so maybe that'll help some.

Peter Petrakis (peter-petrakis) wrote :

Then the only possible actors left are the actual initramfs contents: unpack it, e.g.
zcat <initrd> | cpio -id, and examine all the scripts (init, conf/modules) to determine
how it could be loaded. The absence of any reference to the module in /etc and
/usr/share/initramfs-tools/ tells me that whatever is loading that module is doing
so as a side effect or as an administration artifact, e.g. someone wrote a udev rule and forgot.
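One way to do that unpacking and search, assuming a gzip-compressed cpio initrd as on 10.04/12.04; the helper names unpack_initrd and find_dh_refs are hypothetical. The search skips the .ko files and the modules.* index files, so whatever remains is a script or list that could actually request the module.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting an initrd by hand.

# unpack_initrd IMAGE DIR: extract a gzip-compressed cpio initramfs
# (assumed format; check with `file IMAGE` first) into DIR.
unpack_initrd() {
    local img="$1" dir="$2"
    mkdir -p "$dir" || return 1
    # Decompress from the original path, extract inside DIR (-d creates dirs).
    zcat "$img" | ( cd "$dir" && cpio -id )
}

# find_dh_refs DIR: list files under DIR that mention scsi_dh, skipping the
# .ko files themselves and the modules.* index files, so what remains are
# scripts and config lists (init, conf/modules, ...) that could load them.
find_dh_refs() {
    grep -Rl --exclude='*.ko' --exclude='modules.*' scsi_dh "$1" | sort
}

# Example (as root):
#   unpack_initrd /boot/initrd.img-$(uname -r) /tmp/initrd
#   find_dh_refs /tmp/initrd
```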

Another possible source, and this is way out in left field, is that modules.dep
was compromised and scsi_dh_rdac was added as a dependency of another module,
and thus loaded indirectly.

root@nashira:/lib/modules/2.6.32-41-generic# grep scsi_dh modules*
modules.builtin:kernel/drivers/scsi/device_handler/scsi_dh.ko
modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko:
modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko:
modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_emc.ko:
modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_alua.ko:
Binary file modules.dep.bin matches
modules.order:kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
modules.order:kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
modules.order:kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
modules.order:kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
root@nashira:/lib/modules/2.6.32-41-generic# vim modules.dep
root@nashira:/lib/modules/2.6.32-41-generic# vim modules.builtin

Fine here.
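The grep above only shows each handler's own entry. To check the "dependency of another module" theory directly, here is a sketch that lists every module whose modules.dep line names a given target as a dependency; an empty result means nothing pulls the handler in indirectly. reverse_deps is a hypothetical name, and the parsing assumes the "module.ko: dep1.ko dep2.ko" format seen above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every module whose modules.dep entry lists
# TARGET as a dependency. Names are compared without path or .ko suffix,
# treating '-' and '_' as equivalent (as modprobe does).
# Usage: reverse_deps TARGET [MODULES_DEP]
reverse_deps() {
    local target="$1"
    local depfile="${2:-/lib/modules/$(uname -r)/modules.dep}"
    awk -v t="$target" '
        BEGIN { gsub(/-/, "_", t) }
        {
            mod = $1
            sub(/:$/, "", mod)                  # "path/foo.ko:" -> "path/foo.ko"
            for (i = 2; i <= NF; i++) {
                n = $i
                sub(/.*\//, "", n)              # strip directory
                sub(/\.ko.*$/, "", n)           # strip .ko (and any .gz etc.)
                gsub(/-/, "_", n)
                if (n == t) { print mod; break }
            }
        }' "$depfile"
}

# Example: reverse_deps scsi_dh_rdac
```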

Concerning the boot probe: SCSI discovery is asymmetric, and there's no expectation
of order. Performance tuning is where I get off; I also don't know much about the
iSCSI transport, though yes, jumbo frames are probably wise.

If you're seeing those messages before the file system is mounted then
the actors are definitely *in the ramdisk*, you just need to find them.
