Comment 92 for bug 1921664

Mauricio Faria de Oliveira (mfo) wrote :

On the 'further justification for widespread reversion'
(data is provided in the next comments/attachment).

...

The assertion failures in qemu due to LTO optimization
are more likely observed with arm64 and armhf binaries,
at least on the arm64, ppc64el and s390x architectures
(archs available as openstack vms on canonistack bos01).

This apparently indicates: the arm instruction code in
qemu is more _likely_ to be affected or trigger issues
(note: possibly not the _only_ one affected/triggering).

Perhaps it's due to arch-specific assembly/bugs in LTO
_or_ characteristics like register allocation, kernel
interaction, coroutine/userspace thread switching and/
or ABI specifics, that hit the arm-based binaries more.

However, consider that maybe other arch _binaries_
just aren't hitting this _often_ enough (or not during
boot, which is the duration of our test), and might hit
this some time later (or worse, w/out an observable
impact; i.e., silent corruption).

...

We have the other example of the riscv64 (misc) binary
on the arm64 architecture that hits the issues; it is
indeed different code, ABI, etc.

Other binaries didn't hit issues on arm64 arch, e.g.,
amd64, ppc64el, and s390x (not LTO issues, at least).

Interestingly, armhf binaries failed on arm64 but not
on armhf (actually running on arm64-capable processor,
but in armhf compat mode) so there are apparently timing,
compiler, and/or environment factors in play as well.

...

So, although the results point to arm-based binaries
across different architectures, issues with riscv64
on arm64 pose questions as raised above (likelihood).

I guess it's a matter of deciding whether we'd like
to selectively disable LTO on archs with _reported_
bugs (binaries: armhf, arm64, misc; on arm64, s390x,
ppc64el), or generally disable LTO with the rationale
that this might be a toolchain issue with unknown impact.
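
To make the two options concrete, here is a minimal sketch in Python.
It only encodes the archs named in this comment; the function and
constant names are hypothetical (not part of any packaging tool), and
it is not how the package build actually decides this.

```python
# Hypothetical illustration of the two policies discussed above.
# Archs with _reported_ LTO-related failures, as listed in this comment.
REPORTED_ARCHS = {"arm64", "s390x", "ppc64el"}


def disable_lto(build_arch: str, policy: str) -> bool:
    """Return True if LTO would be turned off for a build on build_arch.

    policy is "selective" (only archs with reported bugs) or
    "general" (everywhere, treating this as a toolchain issue).
    """
    if policy == "general":
        return True
    if policy == "selective":
        return build_arch in REPORTED_ARCHS
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    for arch in ("amd64", "arm64", "armhf", "ppc64el", "s390x"):
        print(arch,
              "selective:", disable_lto(arch, "selective"),
              "general:", disable_lto(arch, "general"))
```

Under the selective policy, archs without reports (e.g., amd64) keep
LTO enabled, so they'd remain exposed if the problem is just not hit
_often_ enough there; the general policy trades that risk away.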