use of patchelf in classic leads to dri drivers segfaulting

Bug #1928209 reported by Sebastien Bacher
This bug affects 1 person
Affects: snapcraft (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

Building a graphical classic snap that includes dri drivers results in binaries that segfault on start, because snapcraft runs patchelf on them.

The reason is explained on https://forum.snapcraft.io/t/caveats-for-no-patchelf-in-a-classic-snap

The issue is really not obvious to debug, so perhaps snapcraft could help there? One way would be to avoid calling patchelf on binaries that are known not to work with it. If special-casing is controversial, then perhaps you could at least display a warning in the build log?

summary: - Snapcraft use of patchelf in classic leads to dri drivers segfaulting
+ use of patchelf in classic leads to dri drivers segfaulting
William Wilson (jawn-smith) wrote :

I came across this while working on the ubuntu-image snap, and have narrowed down the root cause a bit further. In the ubuntu-image snap, `du` was experiencing a segmentation fault because the RPATH was not set by snapcraft's patchelf.

In snapcraft/internal/elf.py (function _determine_libraries) the logic for setting up the patchelf command is:

```
try an ldd command
if <command failed>
    try the same ldd command with LD_PRELOAD
    if <second command succeeded>
        use patchelf to set the interpreter of the binary
    if <second command failed>
        try a new ldd command with LD_TRACE_LOADED_OBJECTS
        if <third command succeeded>
            use patchelf to set the interpreter and rpath of the binary
        if <third command failed>
            return error
```
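The fallback chain above can be modeled as a small decision function, which makes it easier to see how the outcome depends on which `ldd` attempt happens to succeed. This is only a sketch of the pseudocode, not snapcraft's actual code: `choose_patchelf_fields`, `always`, and the three probe callables are hypothetical names, and the case where the first `ldd` succeeds is left unspecified because the pseudocode does not describe it.

```python
from typing import Callable, Optional, Tuple

# A probe stands in for one ldd attempt and returns True on success (hypothetical).
Probe = Callable[[], bool]


def choose_patchelf_fields(
    plain_ldd: Probe, preload_ldd: Probe, trace_ldd: Probe
) -> Optional[Tuple[str, ...]]:
    """Return which ELF fields the fallback chain would patch."""
    if plain_ldd():
        # The pseudocode does not cover this case; None marks it as unspecified.
        return None
    if preload_ldd():
        # Second attempt (LD_PRELOAD) succeeded: only the interpreter is set.
        return ("interpreter",)
    if trace_ldd():
        # Third attempt (LD_TRACE_LOADED_OBJECTS) succeeded: interpreter and rpath.
        return ("interpreter", "rpath")
    raise RuntimeError("all three ldd attempts failed")


def always(result: bool) -> Probe:
    """Build a probe with a fixed outcome, for illustration only."""
    return lambda: result


# Same binary, two environments that differ only in the LD_PRELOAD attempt.
env_a = choose_patchelf_fields(always(False), always(False), always(True))
env_b = choose_patchelf_fields(always(False), always(True), always(True))
```

Here `env_a` includes the rpath and `env_b` does not, which mirrors the two containers: the binary is identical, but the fields that get patched differ.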

This logic leads to inconsistent patchelf commands being constructed, depending on the environment in which they are tried. In my example, I had two different LXC containers, both running 18.04, in which different patchelf commands were being executed on many of the binaries found in the coreutils package. I eventually traced it back to the environment variable "SHELL=" in one of the containers. SHELL being set caused the second `ldd` command to behave differently between the two environments.

While my SHELL environment issue is just one example, there are likely more. The patchelf command that is constructed by snapcraft should be consistent and not subject to environment variables in the host system.
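One way to make such a probe independent of the host is to run it with an explicit, minimal environment instead of inheriting the caller's. The sketch below shows the idea with Python's `subprocess.run` and its `env` argument. `run_with_fixed_env` is a hypothetical helper, not something snapcraft provides, and the `PATH` and `LC_ALL` values are assumptions chosen for illustration.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def run_with_fixed_env(
    cmd: List[str], extra_env: Optional[Dict[str, str]] = None
) -> subprocess.CompletedProcess:
    """Run cmd with a fixed environment rather than the inherited one."""
    # Assumed baseline; host variables such as SHELL are not passed through.
    env = {"PATH": "/usr/sbin:/usr/bin:/sbin:/bin", "LC_ALL": "C"}
    if extra_env:
        # Variables the probe needs (for example LD_PRELOAD) are added explicitly.
        env.update(extra_env)
    return subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=False
    )


# The child reports whether it can see SHELL, whatever the host has set.
probe = [sys.executable, "-c", "import os; print(os.environ.get('SHELL'))"]
seen_by_child = run_with_fixed_env(probe).stdout.strip()
```

With this approach any variable the probe depends on has to be listed deliberately, so two build hosts that differ only in their shell setup would issue the same command in the same environment.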

For further debugging, here are strace logs for the second ldd command in the "working" and "not working (segmentation fault)" environments. Note that the working environment expects this second ldd command to fail, so that it can move on to the LD_TRACE_LOADED_OBJECTS command.

working: https://paste.ubuntu.com/p/HyJ6xvYJ3F/
not working: https://paste.ubuntu.com/p/2HngJk8qTX/
