autopkgtests fail on s390x (segfault)

Bug #2062118 reported by Christian Ehrhardt
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ubuntu on IBM z Systems | Triaged | Undecided | bugproxy | |
| libtraceevent (Ubuntu) | Status tracked in Plucky | | | |
| Noble | New | Undecided | Unassigned | |
| Oracular | New | Undecided | Unassigned | |
| Plucky | Fix Released | Undecided | Unassigned | |
| libtracefs (Ubuntu) | Status tracked in Plucky | | | |
| Noble | New | Undecided | Unassigned | |
| Oracular | New | Undecided | Unassigned | |
| Plucky | Fix Released | Undecided | Adrien Nader | |

Bug Description

As part of the QA added to libtracefs, it was found that the test suite triggers a segfault on s390x.
This isn't just a failing test; libtracefs seems to be still deeply broken on s390x.

Under the time pressure of the noble release, the decision was simplified to "the tests didn't make it worse, we just know about it now", and the work continued (so as not to leave these platforms behind and be unable to add them later, while knowing that support is still incomplete for now).

That does not mean we can ignore the failures for long; we need to work on making this fully functional in both tests and real usage. Hence this spin-off bug from the MIR work in bug 2051925, to track the further efforts.

Example test log:
https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/libt/libtracefs/20240417_184123_8ab96@/log.gz

Christian Ehrhardt (paelzer) wrote :

I'll assign Adrien, as the agreement on the MIR was to follow up on these, and also Frank, to sync this with IBM for their input, which TBH could be anything from "here is the fix" to a worse "it will never work, please remove it".

Also, there is a sibling of this for ppc64el in bug 2062119.

Changed in libtracefs (Ubuntu):
assignee: nobody → Adrien Nader (adrien-n)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → bugproxy (bugproxy)
tags: added: reverse-proxy-bugzilla
Frank Heimes (fheimes) wrote :

Is it known whether 'Test: trace pid events filter' also ended in a segfault on earlier versions, e.g. 1.7.0-1 / mantic? (Probably not; it looks like the test suite is only triggered during the build starting with 1.8.0-1ubuntu1 / noble...)

Adrien Nader (adrien)
tags: added: rls-oo-incoming
bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-206054 severity-high targetmilestone-inin---
tags: added: foundations-todo
removed: rls-oo-incoming
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
Changed in libtracefs (Ubuntu):
status: New → Triaged
Pragyansh Chaturvedi (r41k0u) wrote :

I spent some time on this, as the tests in utest/ were segfaulting.
It turns out this is an endianness issue in libtraceevent.
In libtraceevent/src/event_parse.c, we have:

```
/**
 * tep_alloc - create a tep handle
 */
struct tep_handle *tep_alloc(void)
{
 struct tep_handle *tep = calloc(1, sizeof(*tep));

 if (tep) {
  tep->ref_count = 1;
  tep->host_bigendian = tep_is_bigendian();
 }

 return tep;
}
```

So on s390x, tep->host_bigendian is TEP_BIG_ENDIAN, but tep->file_bigendian stays at its default value (TEP_LITTLE_ENDIAN).

Then in libtracefs/src/kbuffer_parse.c, we have:

```
enum {
 KBUFFER_FL_HOST_BIG_ENDIAN = (1<<0),
 KBUFFER_FL_BIG_ENDIAN = (1<<1),
 KBUFFER_FL_LONG_8 = (1<<2),
 KBUFFER_FL_OLD_FORMAT = (1<<3),
};

#define ENDIAN_MASK (KBUFFER_FL_HOST_BIG_ENDIAN | KBUFFER_FL_BIG_ENDIAN)

...

static int do_swap(struct kbuffer *kbuf)
{
 return ((kbuf->flags & KBUFFER_FL_HOST_BIG_ENDIAN) + kbuf->flags) &
  ENDIAN_MASK;
}
```

kbuf->flags is populated from the tep_handle object. So the tests fail because libtraceevent assumes the files it opens are stored in little-endian format, while on s390x they are actually big-endian.
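To make the flag arithmetic easier to follow, here is a minimal standalone sketch. The two flag values and ENDIAN_MASK are copied from the kbuffer_parse.c excerpt above; `needs_swap()` and `swap32()` are hypothetical helpers written only for this illustration and are not part of libtracefs. `needs_swap()` applies the same expression as `do_swap()`, but takes the flags directly instead of a `struct kbuffer`.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the kbuffer_parse.c excerpt. */
enum {
	KBUFFER_FL_HOST_BIG_ENDIAN = (1 << 0),
	KBUFFER_FL_BIG_ENDIAN      = (1 << 1),
};

#define ENDIAN_MASK (KBUFFER_FL_HOST_BIG_ENDIAN | KBUFFER_FL_BIG_ENDIAN)

/*
 * Hypothetical helper: same expression as do_swap(), on bare flags.
 * Adding the host bit to the flags carries out of the mask when host
 * and file agree on big endian, so the result is nonzero only when
 * the host and file byte orders differ.
 */
static int needs_swap(unsigned int flags)
{
	return ((flags & KBUFFER_FL_HOST_BIG_ENDIAN) + flags) & ENDIAN_MASK;
}

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}
```

With only KBUFFER_FL_HOST_BIG_ENDIAN set (the s390x case described above), `needs_swap()` returns nonzero, so every field read from the buffer gets byte-swapped even though the data is already in host order. As an illustration only, `swap32(0x00000010)` yields `0x10000000`, which shows how a small value can turn into a very large one after an unnecessary swap.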

My fix was to change `tep->host_bigendian = tep_is_bigendian();` to `tep->host_bigendian = tep->file_bigendian = tep_is_bigendian();`.

We can make the default assumption that the host and FS endianness are the same. If they differ, the user must set the correct endianness using the event-parse API (tep_set_file_bigendian).
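The effect of that default can be sketched with a small stand-in model. Nothing here is libtraceevent code: `struct model_handle`, `host_is_bigendian()`, `model_alloc()` and `model_set_file_bigendian()` are hypothetical names that only mirror the roles of `struct tep_handle`, `tep_is_bigendian()`, `tep_alloc()` and `tep_set_file_bigendian()` mentioned in this comment, and the `patched` parameter exists purely to compare the two behaviours side by side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct tep_handle; only the two relevant fields. */
struct model_handle {
	int host_bigendian;
	int file_bigendian;
};

/* Sketch of a runtime byte-order check (assumed, not the library's code). */
static int host_is_bigendian(void)
{
	unsigned char bytes[] = { 0x1, 0x2, 0x3, 0x4 };
	unsigned int val;

	memcpy(&val, bytes, sizeof(val));
	return val == 0x01020304;
}

/*
 * patched == 0: file_bigendian keeps the calloc() default of 0.
 * patched != 0: file_bigendian defaults to the host byte order.
 */
static struct model_handle *model_alloc(int patched)
{
	struct model_handle *h = calloc(1, sizeof(*h));

	if (h) {
		h->host_bigendian = host_is_bigendian();
		if (patched)
			h->file_bigendian = h->host_bigendian;
	}
	return h;
}

/* Explicit override for data recorded on a machine with another byte order. */
static void model_set_file_bigendian(struct model_handle *h, int bigendian)
{
	h->file_bigendian = bigendian;
}
```

In this model, a patched handle starts with both fields equal on any host, so no swapping is requested for locally recorded data, while the explicit setter remains available for foreign files.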

I am not sure whether this should go upstream as well, or even whether this is the right fix. But it does fix the tests:

```
Run Summary:    Type     Total       Ran    Passed Failed Inactive
              suites         1         1       n/a      0        0
               tests        36        36        35      1        0
             asserts  16407066  16407066  16407064      2      n/a

Elapsed time = 22.623 seconds
```

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libtraceevent - 1:1.8.4-2

---------------
libtraceevent (1:1.8.4-2) unstable; urgency=medium

  * Add upstream commit to set default file_bigendian in
    struct tep_handle. (LP: #2062118)
    - Thanks Pragyansh Chaturvedi.
  * Add salsa ci.

 -- Sudip Mukherjee <email address hidden> Tue, 24 Dec 2024 12:08:10 +0000

Changed in libtraceevent (Ubuntu Plucky):
status: New → Fix Released
Andreas Hasenack (ahasenack) wrote :
Changed in libtracefs (Ubuntu Plucky):
status: Triaged → Fix Released