Important build flag left out of IGC build

Bug #2085370 reported by Shane McKee
This bug affects 1 person
Affects                            Status         Importance  Assigned to  Milestone
intel-graphics-compiler (Ubuntu)   Fix Released   Medium      Shane McKee
  Noble                            Fix Released   Medium      Shane McKee
  Oracular                         Invalid        Medium      Shane McKee

Bug Description

Our partners at Intel have informed us that we are missing a very important build flag for the IGC package:
-DIGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC=1

Without this build flag enabled, we should expect to see instabilities such as PyTorch crashes.

---------------------- SRU Template ----------------------

[ Impact ]

 * Users will notice that some software that depends on IGC will be less stable

[ Test Plan ]

 * Intel has done testing on their end and assures us that this is a necessary build flag for their software stack to work as intended.
 * As Pavel mentions below, we can check that the fix works as intended by running "ocloc -device mtl -spirv_input -file ./a.spv" on a Meteor Lake (MTL) system.

[ Where problems could occur ]

From our contact at Intel:
"Without the build flag, IGC will not link to translator at all [...] it uses its own internal translator, which is wrong, So this option enables it to link to the distro's translator"

So while this is a fix, it does carry the inherent risk that we are switching to a different translator, which will change the behaviors of any software that uses the translator.

Shane McKee (mckeesh)
tags: added: pe-sponsoring-request
description: updated
Shane McKee (mckeesh) wrote :

[ Impact ]

 * Users will notice that some software that depends on IGC will be less stable

[ Test Plan ]

 * Intel has done testing on their end and assures us that this is a necessary build flag for their software stack to work as intended.

[ Where problems could occur ]

From our contact at Intel:
"Without the build flag, IGC will not link to translator at all [...] it uses its own internal translator, which is wrong, So this option enables it to link to the distro's translator"

So while this is a fix, it does carry the inherent risk that we are switching to a different translator, which will change the behaviors of any software that uses the translator.

Pavel Androniychuk (androniychuk) wrote :

If this build flag is not enabled in Oracular, it should be enabled there as well.

Pavel Androniychuk (androniychuk) wrote :

One way to test whether this is working is to run the following command on an MTL (Meteor Lake) platform, where this issue happens most often:

"ocloc -device mtl -spirv_input -file ./a.spv"

PE Bot (pe-bot)
description: updated
PE Bot (pe-bot)
description: updated
summary: - Important build flag left out of Noble IGC build
+ Important build flag left out of IGC build
Shane McKee (mckeesh) wrote :

This change has landed in plucky-proposed, and I am in the process of getting it uploaded for Oracular and Noble

Loïc Minier (lool) wrote :

Hi Shane, what's the upstream status of this change, is this now the default in upstream builds?

Changing from "internal translator" to "distro translator" seems like a possibly major change, what are potential risks associated with this change?

Is there a test that would demonstrate a typical issue with the internal translator that we won't see anymore after making this change?

Changed in intel-graphics-compiler (Ubuntu):
status: New → Fix Committed
importance: Undecided → Medium
Changed in intel-graphics-compiler (Ubuntu Noble):
importance: Undecided → Medium
Changed in intel-graphics-compiler (Ubuntu Oracular):
importance: Undecided → Medium
Changed in intel-graphics-compiler (Ubuntu):
assignee: nobody → Shane McKee (mckeesh)
Changed in intel-graphics-compiler (Ubuntu Noble):
assignee: nobody → Shane McKee (mckeesh)
Changed in intel-graphics-compiler (Ubuntu Oracular):
assignee: nobody → Shane McKee (mckeesh)
Changed in intel-graphics-compiler (Ubuntu Noble):
status: New → In Progress
Changed in intel-graphics-compiler (Ubuntu Oracular):
status: New → In Progress
Shane McKee (mckeesh) wrote :

Hey Loïc, the change is currently in Sid (one change back in the changelog).

Agreed that this is potentially a major change, but it's also how the software is intended to work. Without it, Intel is seeing random crashes in popular applications like PyTorch, a problem that has been worse on Meteor Lake. As for potential risks, we already see that using a different translator than expected can cause random crashes. Some users may have code that is stable with the current translator and will hit a bug when we switch. If that happens, however, we would have a bug in Intel's supported configuration, rather than our current situation, where we have to address bugs knowing that our configuration is unsupported by the developers themselves.

As for tests, we can use this command on MTL to demonstrate an issue that is solved by this change:
"ocloc -device mtl -spirv_input -file ./a.spv"

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package intel-graphics-compiler - 1.0.17537.20-1ubuntu1

---------------
intel-graphics-compiler (1.0.17537.20-1ubuntu1) plucky; urgency=medium

  * Enable IGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC in build (LP:
    #2085370)

 -- Shane McKee <email address hidden> Fri, 01 Nov 2024 13:04:46 +0400

Changed in intel-graphics-compiler (Ubuntu):
status: Fix Committed → Fix Released
Bun K Tan (bktan1) wrote :

Any update on this?

Timo Aaltonen (tjaalton) wrote :

today I found out about IGC_OPTION__LINK_KHRONOS_SPIRV_TRANSLATOR, which is enabled on Fedora (but not USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC)

should that be enabled as well?

Pavel Androniychuk (androniychuk) wrote :

The build flag "-DIGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC=1" became the default.

The version in which it became the default is 1.0.15770
date: 2023-11-29
commit:
https://github.com/intel/intel-graphics-compiler/commit/24342480e0c446b71217e8fe6f0064c99e1ab2bb

anything after that builds with that flag on by default
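
As a rough illustration of the rule above, the following sketch checks whether a given package version is at or past the 1.0.15770 threshold named in this comment. The helper names (`upstream_tuple`, `flag_on_by_default`) are hypothetical, and the check assumes the upstream version is the part of the package version before the last hyphen.

```python
# Sketch only: the threshold comes from the comment above; the helper
# names are hypothetical and not part of IGC or any Ubuntu tooling.
FLAG_DEFAULT_SINCE = (1, 0, 15770)


def upstream_tuple(package_version):
    """Turn '1.0.15468.25-2ubuntu0.1' into (1, 0, 15468, 25)."""
    # Drop the Debian/Ubuntu revision (everything after the last hyphen).
    upstream = package_version.rsplit("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def flag_on_by_default(package_version):
    """True if this IGC version is at or after the release where
    IGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC became the default."""
    return upstream_tuple(package_version)[:3] >= FLAG_DEFAULT_SINCE
```

With the versions mentioned in this bug, the Noble package (1.0.15468.25) falls below the threshold and so needs the flag set explicitly, while the Plucky package (1.0.17537.20) is above it.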

Shane McKee (mckeesh) wrote :
  • a.spv (12.2 KiB, application/octet-stream)

Attaching file which Intel used to reproduce on MTL

Shane McKee (mckeesh) wrote :

Debdiff for Noble

Timo Aaltonen (tjaalton) wrote :

no need to change oracular then, as the flag is there already

Changed in intel-graphics-compiler (Ubuntu Oracular):
status: In Progress → Invalid
Pavel Androniychuk (androniychuk) wrote :

to answer Timo's question:

IGC_OPTION__LINK_KHRONOS_SPIRV_TRANSLATOR is ON by default as well when the other flag is used
"-DIGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC=1"

Since both are on by default no action needs to be taken with oracular and plucky

but please please update igc in noble with this build flag
"-DIGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC=1"

Shane McKee (mckeesh) wrote :

OK, deleted the debdiff for Oracular to avoid any confusion when a sponsor picks this up

Frank Heimes (fheimes) wrote :

Hi Shane,
I had a look at the modification for noble aka the noble debdiff.

I couldn't debdiff-apply the patch; I got the message "Hunk is shorter than expected".
Fortunately the changes are pretty small, so I created one myself and figured out that (somehow) just a trailing blank line was missing ;-)

Just the version that you used "1.0.15468.25-2ubuntu1" is not correct.
Since we are coming from a Debian version "1.0.15468.25-2build1"
and then moving to an Ubuntu-specific version, the ubuntu suffix needs to start with "ubuntu0.1",
so "1.0.15468.25-2ubuntu0.1" for an SRU.
(according to https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging or
https://github.com/canonical/ubuntu-maintainers-handbook/blob/main/VersionStrings.md - even if lintian
might moan with "changelog-file-missing-explicit-entry 1.0.15468.25-2build1 -> 1.0.15468.25-2ubuntu0 (missing) -> 1.0.15468.25-2ubuntu0.1").
I quickly adjusted the version.
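
To make the ordering of these three version strings concrete, here is a small sketch that compares them. It is a simplified reimplementation of the Debian version ordering written for illustration, not dpkg's own code; on an Ubuntu system `dpkg --compare-versions` remains the authoritative check. All function names are hypothetical.

```python
import re


def _order(ch):
    """Sort weight of one character: '~' first, then letters, then other symbols."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream or revision strings; return -1, 0 or 1."""
    while a or b:
        # Compare the leading non-digit runs character by character.
        text_a = re.match(r"\D*", a).group()
        text_b = re.match(r"\D*", b).group()
        a, b = a[len(text_a):], b[len(text_b):]
        for i in range(max(len(text_a), len(text_b))):
            weight_a = _order(text_a[i]) if i < len(text_a) else 0
            weight_b = _order(text_b[i]) if i < len(text_b) else 0
            if weight_a != weight_b:
                return -1 if weight_a < weight_b else 1
        # Then compare the leading digit runs numerically (empty counts as 0).
        num_a = re.match(r"\d*", a).group()
        num_b = re.match(r"\d*", b).group()
        a, b = a[len(num_a):], b[len(num_b):]
        if int(num_a or 0) != int(num_b or 0):
            return -1 if int(num_a or 0) < int(num_b or 0) else 1
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (epoch, upstream, revision)."""
    epoch, rest = version.split(":", 1) if ":" in version else ("0", version)
    upstream, revision = rest.rsplit("-", 1) if "-" in rest else (rest, "0")
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a sorts before, equal to, or after b."""
    epoch_a, upstream_a, revision_a = _split(a)
    epoch_b, upstream_b, revision_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return _compare_part(upstream_a, upstream_b) or _compare_part(revision_a, revision_b)
```

Under this ordering "1.0.15468.25-2build1" sorts before "1.0.15468.25-2ubuntu0.1", which in turn sorts before "1.0.15468.25-2ubuntu1", so the SRU version upgrades cleanly from the existing package while staying below an "ubuntu1" upload.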

And surprisingly the "update-maintainer" script does not need to be executed,
since d/control already has:
"
Maintainer: Ubuntu Developers <email address hidden>
XSBC-Original-Maintainer: Debian OpenCL team <email address hidden>
"
(but as Debian version I would have expected the XSBC-Original-Maintainer as Maintainer - anyway ...)

I also did a successful PPA test build with -proposed enabled (I know it's a simple change, but one never knows...)
https://launchpad.net/~fheimes/+archive/ubuntu/lp2085370

With that I completed the review and I'm sponsoring and uploading the updated package to the 'noble' queue.
Check the noble queue: https://launchpad.net/ubuntu/noble/+queue?queue_state=1&queue_text=intel-graphics-compiler

Frank Heimes (fheimes) wrote :

With that I unsubscribed "Ubuntu Sponsors".

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Shane, or anyone else affected,

Accepted intel-graphics-compiler into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/intel-graphics-compiler/1.0.15468.25-2ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in intel-graphics-compiler (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
Pavel Androniychuk (androniychuk) wrote :

Tested the ubuntu-proposed repo packages on an MTL system, with stock kernel and no other repos configured.

Installed the compute-stack packages and reproduced the error:

$ ocloc -device mtl -spirv_input -file ./a.spv
Compilation from IR - skipping loading of FCL
[0]: /lib/x86_64-linux-gnu/libocloc.so(_ZN16SafetyGuardLinux9sigActionEiP9siginfo_tPv+0x39) [0x7b6a966eddd9]
[1]: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x7b6a96245320]
[2]: /lib/x86_64-linux-gnu/libigc.so.1(+0x67e021) [0x7b6a9367e021]
[3]: /lib/x86_64-linux-gnu/libigc.so.1(+0x67ea00) [0x7b6a9367ea00]
[4]: /lib/x86_64-linux-gnu/libigc.so.1(+0x67ee6a) [0x7b6a9367ee6a]
[5]: /lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x980) [0x7b6a8d1ef680]
[6]: /lib/x86_64-linux-gnu/libigc.so.1(+0x3469b2) [0x7b6a933469b2]
[7]: /lib/x86_64-linux-gnu/libigc.so.1(+0x347146) [0x7b6a93347146]
[8]: /lib/x86_64-linux-gnu/libigc.so.1(+0x30d84b) [0x7b6a9330d84b]
[9]: /lib/x86_64-linux-gnu/libigc.so.1(+0x445465) [0x7b6a93445465]
[10]: /lib/x86_64-linux-gnu/libigc.so.1(+0x30f8f3) [0x7b6a9330f8f3]
[11]: /lib/x86_64-linux-gnu/libigc.so.1(+0x419ecc) [0x7b6a93419ecc]
[12]: /lib/x86_64-linux-gnu/libigc.so.1(+0x41b45c) [0x7b6a9341b45c]
[13]: /lib/x86_64-linux-gnu/libocloc.so(_ZN3NEO15OfflineCompiler15buildSourceCodeEv+0x750) [0x7b6a966ac940]
[14]: /lib/x86_64-linux-gnu/libocloc.so(_ZN3NEO15OfflineCompiler5buildEv+0x45) [0x7b6a966b05c5]
[15]: /lib/x86_64-linux-gnu/libocloc.so(_ZN16SafetyGuardLinux4callIiN3NEO15OfflineCompilerEMS2_FivEEET_PT0_T1_S5_+0x55) [0x7b6a966edef5]
[16]: /lib/x86_64-linux-gnu/libocloc.so(_Z20buildWithSafetyGuardPN3NEO15OfflineCompilerE+0xd0) [0x7b6a966edbf0]
[17]: /lib/x86_64-linux-gnu/libocloc.so(_ZN5Ocloc8Commands7compileEP14OclocArgHelperRKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS9_EE+0x128) [0x7b6a966a2958]
[18]: /lib/x86_64-linux-gnu/libocloc.so(oclocInvoke+0x3ac) [0x7b6a9668d7bc]
[19]: ocloc(main+0x27) [0x60b834a8c787]
[20]: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x7b6a9622a1ca]
[21]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x7b6a9622a28b]
[22]: ocloc(_start+0x25) [0x60b834a8c7b5]
Segmentation fault (core dumped)

Upgraded the IGC packages from ubuntu-proposed and re-ran the test command:

$ ocloc -device mtl -spirv_input -file ./a.spv
Compilation from IR - skipping loading of FCL
Build succeeded

Issue is fixed!
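
For anyone who wants to script this verification, the before/after check could be wrapped roughly as follows. This is a sketch under assumptions: it relies only on the two output strings shown in this comment ("Segmentation fault" and "Build succeeded"), the function names are hypothetical, and it expects `ocloc` on the PATH with `a.spv` in the current directory.

```python
import subprocess

# The reproducer command from this bug, split into an argument list.
OCLOC_CMD = ["ocloc", "-device", "mtl", "-spirv_input", "-file", "./a.spv"]


def classify_output(text):
    """Classify captured ocloc output using the strings seen in this report."""
    if "Segmentation fault" in text:
        return "crash"
    if "Build succeeded" in text:
        return "ok"
    return "unknown"


def run_reproducer(cmd=OCLOC_CMD):
    """Run the reproducer and report 'ok', 'crash' or 'unknown'.

    Raises FileNotFoundError if ocloc is not installed.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # A negative return code means the process was killed by a signal
    # (for example SIGSEGV), which we treat as a crash.
    if proc.returncode < 0:
        return "crash"
    return classify_output(proc.stdout + proc.stderr)
```

Calling `run_reproducer()` on an affected MTL system would be expected to report "crash" before the upgrade and "ok" after it, matching the two runs above.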

Shane McKee (mckeesh)
tags: added: verification-done-noble
removed: verification-needed verification-needed-noble
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package intel-graphics-compiler - 1.0.15468.25-2ubuntu0.1

---------------
intel-graphics-compiler (1.0.15468.25-2ubuntu0.1) noble; urgency=medium

  * Enable IGC_OPTION__USE_KHRONOS_SPIRV_TRANSLATOR_IN_SC in build
    (LP: #2085370)

 -- Shane McKee <email address hidden> Wed, 11 Dec 2024 11:27:40 +0800

Changed in intel-graphics-compiler (Ubuntu Noble):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton) wrote : Update Released

The verification of the Stable Release Update for intel-graphics-compiler has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
