Comment 6 for bug 1946149

Mauricio Faria de Oliveira (mfo) wrote:

For the record, 4.15.0-1113-aws boots fine on r5.metal w/ kexec.

I kexec-booted it successfully 10 times from each of 5.4.0-1058-aws
and 4.15.0-1113-aws (itself).

(Not that the source kernel was expected to make a difference, as the
issue happens on normal boot, where there is no previous kernel.)

Right after that, on the same instance, a normal boot fails.

This was w/ kdump installed and enabled (i.e., crashkernel= on the
kernel cmdline), a setup in which Ian mentioned he couldn't reproduce
the problem.

---

It also works on normal boot w/ r5d.metal (note the 'd'), which
should be the same as r5.metal but w/ four local NVMe disks
(it still boots from an EBS/NVMe disk, the same way as r5.metal).

---

Similarly, it works on r4.24xlarge, which is not a metal instance
but does also boot from an EBS/NVMe disk.

---

So there seems to be no problem with the patchset as included in
4.15.0-1113-aws: it boots fine on several instance types w/
approximately the same hardware config, and the only difference
is normal vs. kexec boot on the r5.metal type (the one in the
problem report).

- r5.metal: normal boot fails / kexec boot works
- r5d.metal: normal boot works
- r5.24xlarge: normal boot works

The kexec boot worked ~20 times, so a race condition seems
unlikely: that should be enough runs, considering that normal
boot failed every time.
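A rough way to put numbers on the "enough runs" argument, as a
sketch under an assumption not stated above: suppose a hypothetical
race made each boot fail independently with a fixed probability p.
Then the chance of n consecutive successful boots is (1 - p)^n.
The function name all_succeed and the p values are illustrative only.

```python
def all_succeed(p: float, n: int) -> float:
    """Probability that n independent boots all succeed, assuming a
    hypothetical race that makes each boot fail with probability p."""
    return (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative failure rates; the real rate (if any) is unknown.
    for p in (0.05, 0.1, 0.25, 0.5):
        print(f"p={p:.2f}: P(20 successes) = {all_succeed(p, 20):.2e}")
```

If the race failed as often on kexec as it apparently does on normal
boot (every time observed), 20 straight successes would be extremely
unlikely (e.g., below one in a million for p = 0.5). A rare race
(p = 0.05) would still pass 20 runs roughly a third of the time, so
the argument leans on the failure rate being high.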

Also, Ian mentioned that he couldn't reproduce the problem w/
crashdump installed. I think the only difference that would make
_before_ mounting the rootfs (assuming that is the step that fails
and keeps the machine from booting; we have no serial console to
confirm) is the crashkernel memory reservation?

---

So, all this is a bit confusing, but it seems to indicate again
that there's no problem w/ the patchset per se, but perhaps with
something in booting this particular kernel on a particular
instance type (r5.metal), which _might_ be related to the
differences between normal, kexec, and crashkernel boots.

More tomorrow.