systemtap currently broken in xenial

Bug #1830226 reported by Andrea Righi on 2019-05-23
This bug affects 6 people
Affects              Status         Importance  Assigned to   Milestone
systemtap (Ubuntu)   Fix Released   Medium      Andrea Righi
  Xenial             Fix Committed  Medium      Andrea Righi

Bug Description

[Impact]

Kernel commit 768ae309a96103ed02eb1e111e838c87854d8b51 changed the prototype of get_user_pages(), removing the 'write' and 'force' arguments and merging them into a single 'gup_flags' argument.

This breaks systemtap, which calls get_user_pages() in its kernel runtime interface.

Fix the usage of get_user_pages() by checking whether the kernel requires the old or the new signature.
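
As a rough illustration of the idea (not the actual patch attached to this bug), the C sketch below models both prototypes in userspace and routes a single call site to whichever one is available. The switch EXAMPLE_GUP_TAKES_FLAGS, the wrapper compat_get_user_page() and the stub get_user_pages() are hypothetical names invented for this example; only the FOLL_WRITE and FOLL_FORCE values and the two argument lists are taken from the kernel headers and from the compiler output in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined in the kernel's include/linux/mm.h. */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u

/* Hypothetical switch: 1 models a kernel with the gup_flags prototype,
 * 0 models a kernel that still takes separate write/force arguments. */
#ifndef EXAMPLE_GUP_TAKES_FLAGS
#define EXAMPLE_GUP_TAKES_FLAGS 1
#endif

/* Opaque stand-ins for the kernel types; never dereferenced here. */
struct task_struct;
struct mm_struct;
struct page;
struct vm_area_struct;

/* Records what the stub received so the result can be inspected. */
static unsigned int last_gup_flags;

#if EXAMPLE_GUP_TAKES_FLAGS
/* Userspace stub with the new seven-argument prototype. */
static long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                           unsigned long start, unsigned long nr_pages,
                           unsigned int gup_flags, struct page **pages,
                           struct vm_area_struct **vmas)
{
    (void)tsk; (void)mm; (void)start; (void)pages; (void)vmas;
    assert(nr_pages == 1);      /* this sketch pins one page per call */
    last_gup_flags = gup_flags;
    return (long)nr_pages;      /* pretend the page was pinned */
}
#else
/* Userspace stub with the old eight-argument prototype. */
static long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                           unsigned long start, unsigned long nr_pages,
                           int write, int force, struct page **pages,
                           struct vm_area_struct **vmas)
{
    (void)tsk; (void)mm; (void)start; (void)pages; (void)vmas;
    assert(nr_pages == 1);
    last_gup_flags = (write ? FOLL_WRITE : 0u) | (force ? FOLL_FORCE : 0u);
    return (long)nr_pages;
}
#endif

/* Hypothetical wrapper: one call site that works with either prototype.
 * 'force' is always set, matching the literal 1 in the failing call. */
static long compat_get_user_page(struct task_struct *tsk, struct mm_struct *mm,
                                 unsigned long addr, int write,
                                 struct page **page,
                                 struct vm_area_struct **vma)
{
#if EXAMPLE_GUP_TAKES_FLAGS
    unsigned int flags = FOLL_FORCE | (write ? FOLL_WRITE : 0u);
    return get_user_pages(tsk, mm, addr, 1, flags, page, vma);
#else
    return get_user_pages(tsk, mm, addr, 1, write, 1, page, vma);
#endif
}
```

Building with -DEXAMPLE_GUP_TAKES_FLAGS=0 selects the old prototype instead. In the real runtime the selection would have to come from a test against the target kernel's headers rather than from a hand-set macro.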

[Test Case]

ubuntu@ubuntu:~$ cat hello.stp
#!/usr/bin/env stap
probe oneshot { println("hello world") }

[Regression Potential]

A similar change is already present in bionic and later releases; this is a backport for xenial. Moreover, the fix adds a runtime check (performed every time a stap script is executed) that determines whether the running kernel uses the old or the new signature of get_user_pages(), so the change does not break existing stap scripts. Regression potential is therefore minimal.

[Original bug report]

Running a simple systemtap hello world example on a freshly installed Xenial VM produces the following errors:

ubuntu@ubuntu:~$ uname -r
4.4.0-148-generic

ubuntu@ubuntu:~$ dpkg -l | grep systemtap
ii systemtap 2.9-2ubuntu2 amd64 instrumentation system for Linux
ii systemtap-common 2.9-2ubuntu2 all instrumentation system for Linux (common component)
ii systemtap-runtime 2.9-2ubuntu2 amd64 instrumentation system for Linux (runtime component)

ubuntu@ubuntu:~$ cat hello.stp
#!/usr/bin/env stap
probe oneshot { println("hello world") }

ubuntu@ubuntu:~$ sudo ./hello.stp
In file included from /usr/share/systemtap/runtime/linux/runtime.h:204:0,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
/usr/share/systemtap/runtime/linux/access_process_vm.h: In function ‘__access_process_vm_’:
/usr/share/systemtap/runtime/linux/access_process_vm.h:35:54: error: passing argument 6 of ‘get_user_pages’ makes pointer from integer without a cast [-Werror=int-conversion]
       ret = get_user_pages (tsk, mm, addr, 1, write, 1, &page, &vma);
                                                      ^
In file included from include/linux/pid_namespace.h:6:0,
                 from include/linux/ptrace.h:8,
                 from include/linux/ftrace.h:13,
                 from include/linux/kprobes.h:42,
                 from /usr/share/systemtap/runtime/linux/runtime.h:21,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
include/linux/mm.h:1222:6: note: expected ‘struct page **’ but argument is of type ‘int’
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^
In file included from /usr/share/systemtap/runtime/linux/runtime.h:204:0,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
/usr/share/systemtap/runtime/linux/access_process_vm.h:35:57: error: passing argument 7 of ‘get_user_pages’ from incompatible pointer type [-Werror=incompatible-pointer-types]
       ret = get_user_pages (tsk, mm, addr, 1, write, 1, &page, &vma);
                                                         ^
In file included from include/linux/pid_namespace.h:6:0,
                 from include/linux/ptrace.h:8,
                 from include/linux/ftrace.h:13,
                 from include/linux/kprobes.h:42,
                 from /usr/share/systemtap/runtime/linux/runtime.h:21,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
include/linux/mm.h:1222:6: note: expected ‘struct vm_area_struct **’ but argument is of type ‘struct page **’
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^
In file included from /usr/share/systemtap/runtime/linux/runtime.h:204:0,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
/usr/share/systemtap/runtime/linux/access_process_vm.h:35:13: error: too many arguments to function ‘get_user_pages’
       ret = get_user_pages (tsk, mm, addr, 1, write, 1, &page, &vma);
             ^
In file included from include/linux/pid_namespace.h:6:0,
                 from include/linux/ptrace.h:8,
                 from include/linux/ftrace.h:13,
                 from include/linux/kprobes.h:42,
                 from /usr/share/systemtap/runtime/linux/runtime.h:21,
                 from /usr/share/systemtap/runtime/runtime.h:24,
                 from /tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.c:25:
include/linux/mm.h:1222:6: note: declared here
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^
cc1: all warnings being treated as errors
scripts/Makefile.build:285: recipe for target '/tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.o' failed
make[1]: *** [/tmp/stapCrPm1y/stap_9bc2f1adeaead87a69b1ab80b0f14480_967_src.o] Error 1
Makefile:1454: recipe for target '_module_/tmp/stapCrPm1y' failed
make: *** [_module_/tmp/stapCrPm1y] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compilation failed. [man error::pass4]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.

Andrea Righi (arighi) on 2019-05-23
affects: linaro-aarch64 → linux
Andrea Righi (arighi) wrote :

It looks like this particular build error was introduced by this upstream commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=768ae309a96103ed02eb1e111e838c87854d8b51

That was backported to the xenial kernel, changing the prototype of get_user_pages(). We need to update systemtap to use the new get_user_pages() prototype as well.

Andrea Righi (arighi) wrote :

With this patch applied to systemtap the "hello world" test case completes successfully without any error.

Andrea Righi (arighi) wrote :

Attached a debdiff that fixes the bug.

Andrea Righi (arighi) on 2019-05-24
description: updated
Changed in linux:
assignee: nobody → Andrea Righi (arighi)
Andrea Righi (arighi) on 2019-05-28
affects: linux → systemtap (Ubuntu)
Changed in systemtap (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in systemtap (Ubuntu Xenial):
importance: Undecided → Medium
assignee: nobody → Andrea Righi (arighi)
status: New → Confirmed

The attachment "fix-get-user-pages-prototype.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Robie Basak (racb) wrote :

> A similar change is already present in bionic+.

Please set the development release task to Fix Released or Invalid accordingly.

Robie Basak (racb) wrote :

Please could you also add testing against the release pocket Bionic kernel in your SRU verification steps?

Changed in systemtap (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-xenial

Hello Andrea, or anyone else affected,

Accepted systemtap into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemtap/2.9-2ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Mathew Hodson (mhodson) on 2019-07-21
Changed in systemtap (Ubuntu):
status: Confirmed → Fix Released

The fix for this bug has been awaiting testing feedback in the -proposed repository for xenial for more than 90 days. Please test this fix and update the bug appropriately with the results. In the event that the fix for this bug is still not verified 15 days from now, the package will be removed from the -proposed repository.

tags: added: removal-candidate
Martijn (tgmpje) wrote :

Installed the new package (2.9-2ubuntu2 --> 2.9-2ubuntu2.1) and can confirm systemtap is now working again (Ubuntu Xenial)

Matthew Ruffell (mruffell) wrote :

I went and tested the released and -proposed versions of systemtap along with the GA and HWE kernels on xenial. This is what I found:

Tests 1 and 2, confirming that current package is broken:

Test 1:
Xenial 4.4.0-165-generic kernel, Systemtap 2.9-2ubuntu2
Test results:
https://paste.ubuntu.com/p/MhVhHJHkMT/
We failed to load the trivial hello world probe. As reported.

Test 2:
Xenial 4.15.0-66-generic #75~16.04.1-Ubuntu HWE kernel, Systemtap 2.9-2ubuntu2
Test results:
https://paste.ubuntu.com/p/pZYrpHNWCV/
We failed to load the trivial hello world probe. As reported.

Tests 3 and 4, with the new package in -proposed:

Test 3:
Xenial 4.4.0-165-generic #193-Ubuntu GA kernel, Systemtap 2.9-2ubuntu2.1 from -proposed.
Test results:
https://paste.ubuntu.com/p/SGC8wKdS6W/
We successfully loaded the trivial hello world probe. -proposed package fixes issue.

Test 4:
Xenial 4.15.0-66-generic #75~16.04.1-Ubuntu HWE kernel, Systemtap 2.9-2ubuntu2.1 from -proposed.
Test results:
https://paste.ubuntu.com/p/PnzMknQxqP/
We failed to load the trivial hello world probe. HWE kernels are still broken.

While the package in -proposed does solve the problem for the GA 4.4 kernel, this package is still incompatible with the 4.15 HWE kernel, as per Bug 1683876.

It is also critical to mention that the function signature change introduced by "mm: replace get_user_pages() write/force parameters with gup_flags" landed in xenial 4.4.0-143 #169 earlier this year.

That means if we release the systemtap 2.9-2ubuntu2.1 from -proposed, which requires the above commit from 4.4.0-143 #169 onward, systemtap will then break for all users of xenial 4.4.0-142 and previous, as systemtap 2.9-2ubuntu2.1 would contain the wrong function signature.

This would cause regressions if users are using the updated systemtap package with older kernels, but do not wish to upgrade their kernel.
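
The threshold described above can be expressed as a small check on the kernel release string. This is only a sketch: the function name xenial_ga_has_gup_flags() is made up for illustration, it looks only at the xenial GA 4.4.0 series, and it assumes release strings shaped like the `uname -r` output shown in this report (for example 4.4.0-148-generic).

```c
#include <assert.h>
#include <stdio.h>

/* First xenial GA ABI number that carries the gup_flags prototype,
 * as stated in this comment (4.4.0-143). */
#define FIRST_ABI_WITH_GUP_FLAGS 143

/*
 * Hypothetical helper. Returns:
 *    1  if 'release' is a 4.4.0 kernel with ABI >= 143 (new prototype),
 *    0  if it is a 4.4.0 kernel with an older ABI (old prototype),
 *   -1  if it is not a 4.4.0 kernel or cannot be parsed; this sketch
 *       makes no claim about other kernel series.
 */
static int xenial_ga_has_gup_flags(const char *release)
{
    int major, minor, patch, abi;

    assert(release != NULL);

    /* Expected shape: <major>.<minor>.<patch>-<abi>-<flavour> */
    if (sscanf(release, "%d.%d.%d-%d", &major, &minor, &patch, &abi) != 4)
        return -1;
    if (major != 4 || minor != 4 || patch != 0)
        return -1;

    return abi >= FIRST_ABI_WITH_GUP_FLAGS ? 1 : 0;
}
```

A check like this only describes which kernels a package built for one fixed prototype would work with. It does not replace detecting the prototype from the kernel headers when the probe module is compiled.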

Because of this, I will not mark this bug as verified, as this needs more discussion before we go releasing this package.

This version of systemtap needs to be dependent on kernel 4.4.0-143 #169 or later.
I'm assuming that we can do that?

Hi -

> This version of systemtap needs to be dependent on kernel 4.4.0-143 #169 or later.

Please note that upstream systemtap 4.1 supports the whole range of
kernels 2.6 through 5.1ish.

- FChE

Dorina Timbur (dorina-t) wrote :

Hi, what's the current status here regarding getting a fix for Xenial? We have a live customer environment affected by this issue and it's significantly impacting our ability to troubleshoot a package loss issue.

Dan Streetman (ddstreet) wrote :

In case it's useful to anyone coming to this bug, we do have a daily build of systemtap set up in this PPA:
https://launchpad.net/~ubuntu-support-team/+archive/ubuntu/systemtap

That is COMPLETELY UNSUPPORTED and may be broken at any time, but may be useful to people wanting to use systemtap in SRU releases.

Frank Ch. Eigler (fche) wrote :

Dan, nice to hear of the nightly-build PPA. If there were one coupled to a fresher elfutils (0.178+), then you'd get a nice combination of new systemtap and auto-downloaded debuginfod content (even for xenial). https://sourceware.org/elfutils/Debuginfod.html
