Comment 9 for bug 1911999

Christian Ehrhardt (paelzer) wrote:

Hi,
I reordered this slightly to split it into topics properly.

> The problem is that I don't reach "3. finally once I hit my 60 second limit on the paths
> (dev_loss_tmo) they are considered dead" . The paths are never removed and stay "running"
> forever. Maybe it is related to the fact that there is only one HBA active.

> My expectation is for the paths to be removed after dev_loss_tmo. However, if it is acting on
> the whole rport - we still have healthy paths on both rports, so it might actually be working
> as expected.

^^ Yes, I think this part indeed works as expected, but I'm open to being convinced otherwise.
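
For reference, dev_loss_tmo is an attribute of the FC remote port (rport), not of the individual LUN, so it should only fire once that rport has lost all connectivity. A quick way to inspect the relevant timers via sysfs - the rport name here is just the one from your log below:

  # state and timers of the FC remote port
  cat /sys/class/fc_remote_ports/rport-1:0-0/port_state
  cat /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
  # fast_io_fail_tmo fails I/O early while dev_loss_tmo is still counting down
  cat /sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo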

> Also, when both paths are failing and the map is flushed (as we have only one FC HBA, we have
> only 2 paths to both controllers of the storage), the devices underneath are expected to be
> removed as well. They are not removed, and the map comes back on multipath reload maps.

You saw my example on "failing paths" above, which indeed seemed to remove the devices for me.
In Journal/dmesg I had:
  "rport-1:0-0: blocked FC remote port time out: removing target and saving binding"
If the port/path is down (not the LUN), then I'd expect the kernel to trigger that after dev_loss_tmo.
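
And in case it helps to verify: once that message appears, the SCSI devices behind the rport should be gone. A sketch of what I'd check (sdX and host1 are placeholders):

  # the block devices and paths behind the timed-out rport should vanish
  lsscsi
  multipathd show paths format "%w %d %t %T"
  # if a stale device lingers, it can be dropped and re-scanned manually
  echo 1 > /sys/block/sdX/device/delete
  echo "- - -" > /sys/class/scsi_host/host1/scan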

> udev is caching the old disk info, so all the IDs should be the same. When you run sg_inq, it
> is checking the real values and returns the real ID. Multipath, however, is still using the
> cached wrong IDs from udev.

> Also, udev is too reluctant to rescan the re-added devices because they are never removed.
> When a path is reinstated it should be rescanned to validate that it is in fact the same disk.

^^ I agree with this: if the UUID indeed changed but the old value is cached and never rescanned, that seems like an issue to me.
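
To make the mismatch visible (and to force a refresh), something like the following should work; sdX is a placeholder and the scsi_id path may differ per release:

  # the cached value udev hands to multipath
  udevadm info --query=property /dev/sdX | grep ID_SERIAL
  # the live value read from the device (device identification VPD page 0x83)
  sg_inq --id /dev/sdX
  /lib/udev/scsi_id --whitelisted --device=/dev/sdX
  # ask udev to regenerate its database entry for this device
  udevadm trigger --action=change /dev/sdX
  udevadm settle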

> Please tell me if you need more information.

I don't think "I/we" need any more here - it seems to be something that multipath-tools doesn't do yet (or it could, but we both fail to see the right mix of config options to make it do so).
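
For completeness, these are the config knobs I'd expect to matter here - just a sketch with example values, and e.g. recheck_wwid depends on the multipath-tools version:

  # /etc/multipath.conf (excerpt)
  defaults {
      dev_loss_tmo      60     # seconds the kernel keeps a dead rport around
      fast_io_fail_tmo  5      # fail I/O early so dm-multipath can switch paths
      flush_on_last_del yes    # flush a map once its last path was deleted
      recheck_wwid      yes    # re-read the WWID when a failed path is reinstated
  }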

The next step, to me, seems to be engaging with upstream, which is usually done best by the affected person. As mentioned before, that would be at [1][2].
A link back to the discussion/issue would be awesome so that we can track the outcome and integrate it into Ubuntu.

[1]: https://www.redhat.com/mailman/listinfo/dm-devel
[2]: https://github.com/opensvc/multipath-tools/issues