ubuntu_lttng_smoke_test crash with B-5.0 AWS/Azure/GCP

Bug #1841766 reported by Po-Hsu Lin
This bug affects 1 person
Affects                    Status         Importance  Assigned to                Milestone
ubuntu-kernel-tests        Fix Released   Undecided   Unassigned
linux-aws-edge (Ubuntu)    Invalid        Undecided   Unassigned
  Bionic                   Invalid        Undecided   Unassigned
lttng-modules (Ubuntu)     Invalid        Undecided   Unassigned
  Bionic                   Fix Released   Undecided   Kleber Sacilotto de Souza

Bug Description

[Impact]
The test crashes and hangs until it times out on Jenkins.

This test passed with B-4.15 AWS and D-5.0 AWS; it fails only with B-5.0 AWS.

There is no report and no obvious failure in the Jenkins output (the build is just marked as aborted); the crash can only be found by tailing the syslog while the test runs.

Test output:
Running '/home/ubuntu/autotest/client/tests/ubuntu_lttng_smoke_test/ubuntu_lttng_smoke_test.sh'
[stdout] == lttng smoke test of session create/destroy ==
[stdout] Session test-kernel-session created.
[stdout] Traces will be written in /tmp/lttng-kernel-trace-6325-session
[stdout] PASSED (lttng create)
[stdout] Session test-kernel-session destroyed
[stdout] PASSED (lttng destroy)
[stdout]
[stdout] == lttng smoke test trace context switches ==
[stdout] Session test-kernel-session created.
[stdout] Traces will be written in /tmp/lttng-kernel-trace-6325-session
[stdout] PASSED (lttng create)

syslog output:
 kernel: [ 253.282158] lttng_kretprobes: loading out-of-tree module taints kernel.
 kernel: [ 253.282191] lttng_kretprobes: module verification failed: signature and/or required key missing - tainting kernel
 kernel: [ 253.293676] BUG: unable to handle kernel paging request at 0000000000025534
 kernel: [ 253.296613] #PF error: [normal kernel read fault]
 kernel: [ 253.298623] PGD 0 P4D 0
 kernel: [ 253.299783] Oops: 0000 [#1] SMP PTI
 kernel: [ 253.301313] CPU: 3 PID: 955 Comm: lttng-sessiond Tainted: G OE 5.0.0-1014-aws #16~18.04.1-Ubuntu
 kernel: [ 253.305391] Hardware name: Amazon EC2 m5.xlarge/, BIOS 1.0 10/16/2017
 kernel: [ 253.308102] RIP: 0010:lttng_tracepoint_notify+0x172/0x210 [lttng_tracer]
 kernel: [ 253.310886] Code: eb 1a 49 39 c6 0f 85 ab 00 00 00 49 8b 55 10 41 83 c4 01 44 39 a2 8c 02 00 00 76 a3 48 8b 92 90 02 00 00 49 63 c4 4c 63 34 82 <49> 8b 1e 48 89 df e8 33 fb ff ff 48 85 c0 49 89 c7 74 52 49 8b 47
 kernel: [ 253.318387] RSP: 0018:ffffb9c80232fbd8 EFLAGS: 00010246
 kernel: [ 253.320596] RAX: 0000000000000000 RBX: ffff9a27cbec9c00 RCX: 0000000000000041
 kernel: [ 253.323556] RDX: ffffffffc0456a0c RSI: 0000000000000001 RDI: ffffffffc0898100
 kernel: [ 253.326507] RBP: ffffb9c80232fc08 R08: ffff9a27d2ba70c0 R09: ffff9a27d2403680
 kernel: [ 253.329460] R10: ffffb9c80232fb48 R11: 00000000ffffffff R12: 0000000000000000
 kernel: [ 253.332420] R13: ffff9a27cbec9c00 R14: 0000000000025534 R15: ffffffffc0898740
 kernel: [ 253.335375] FS: 00007f26e1d49700(0000) GS:ffff9a27d2b80000(0000) knlGS:0000000000000000
 kernel: [ 253.338700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: [ 253.341118] CR2: 0000000000025534 CR3: 000000040c86c003 CR4: 00000000007606e0
 kernel: [ 253.345464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: [ 253.349798] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: [ 253.354126] PKRU: 55555554
 kernel: [ 253.356719] Call Trace:
 kernel: [ 253.359210] register_tracepoint_module_notifier+0x57/0x80
 kernel: [ 253.362856] ? 0xffffffffc08d7000
 kernel: [ 253.365675] lttng_tracepoint_init+0x45/0xc40 [lttng_tracer]
 kernel: [ 253.369380] lttng_events_init+0xbc/0x233 [lttng_tracer]
 kernel: [ 253.372952] ? 0xffffffffc08d7000
 kernel: [ 253.375739] do_one_initcall+0x4a/0x1c9
 kernel: [ 253.378733] ? _cond_resched+0x19/0x40
 kernel: [ 253.381693] ? kmem_cache_alloc_trace+0x151/0x1c0
 kernel: [ 253.385037] do_init_module+0x5f/0x216
 kernel: [ 253.387999] load_module+0x19f6/0x20a0
 kernel: [ 253.390956] __do_sys_finit_module+0xfc/0x120
 kernel: [ 253.394153] ? __do_sys_finit_module+0xfc/0x120
 kernel: [ 253.397416] __x64_sys_finit_module+0x1a/0x20
 kernel: [ 253.400618] do_syscall_64+0x5a/0x120
 kernel: [ 253.403547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
 kernel: [ 253.407035] RIP: 0033:0x7f26e738d839
 kernel: [ 253.409949] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
 kernel: [ 253.421539] RSP: 002b:00007f26e1d38758 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
 kernel: [ 253.427373] RAX: ffffffffffffffda RBX: 00007f26cc004d70 RCX: 00007f26e738d839
 kernel: [ 253.431692] RDX: 0000000000000000 RSI: 00007f26e8d0a145 RDI: 0000000000000030
 kernel: [ 253.435978] RBP: 00007f26e8d0a145 R08: 0000000000000000 R09: 00007f26cc004b00
 kernel: [ 253.440270] R10: 0000000000000030 R11: 0000000000000246 R12: 0000000000000000
 kernel: [ 253.444581] R13: 00007f26cc004570 R14: 0000000000000000 R15: 00007f26cc004b00
 kernel: [ 253.448853] Modules linked in: lttng_tracer(OE+) lttng_statedump(OE) lttng_kprobes(OE) lttng_clock(OE) lttng_lib_ring_buffer(OE) lttng_kretprobes(OE) ppdev parport_pc parport serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ena
 kernel: [ 253.478768] CR2: 0000000000025534
 kernel: [ 253.481575] ---[ end trace 9c47de0da4f80343 ]---
 kernel: [ 253.484885] RIP: 0010:lttng_tracepoint_notify+0x172/0x210 [lttng_tracer]
 kernel: [ 253.489142] Code: eb 1a 49 39 c6 0f 85 ab 00 00 00 49 8b 55 10 41 83 c4 01 44 39 a2 8c 02 00 00 76 a3 48 8b 92 90 02 00 00 49 63 c4 4c 63 34 82 <49> 8b 1e 48 89 df e8 33 fb ff ff 48 85 c0 49 89 c7 74 52 49 8b 47
 kernel: [ 253.500676] RSP: 0018:ffffb9c80232fbd8 EFLAGS: 00010246
 kernel: [ 253.504223] RAX: 0000000000000000 RBX: ffff9a27cbec9c00 RCX: 0000000000000041
 kernel: [ 253.508498] RDX: ffffffffc0456a0c RSI: 0000000000000001 RDI: ffffffffc0898100
 kernel: [ 253.512783] RBP: ffffb9c80232fc08 R08: ffff9a27d2ba70c0 R09: ffff9a27d2403680
 kernel: [ 253.517125] R10: ffffb9c80232fb48 R11: 00000000ffffffff R12: 0000000000000000
 kernel: [ 253.521421] R13: ffff9a27cbec9c00 R14: 0000000000025534 R15: ffffffffc0898740
 kernel: [ 253.525702] FS: 00007f26e1d49700(0000) GS:ffff9a27d2b80000(0000) knlGS:0000000000000000
 kernel: [ 253.531708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: [ 253.535444] CR2: 0000000000025534 CR3: 000000040c86c003 CR4: 00000000007606e0
 kernel: [ 253.539722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: [ 253.544020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: [ 253.548300] PKRU: 55555554
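
When triaging a log like the one above, the useful fields are the faulting address, the RIP symbol and its module, and the process and kernel named on the CPU line. The following is only a rough sketch of pulling those out of syslog lines; the function name summarize_oops and the regular expressions are illustrative assumptions written against this particular excerpt, not a general oops parser.

```python
import re

# Patterns written against the x86-64 oops excerpt in this report.
# They are assumptions for illustration, not a complete oops grammar.
FAULT_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
CPU_RE = re.compile(r"Comm: (\S+) Tainted: .*? (\d+\.\d+\.\d+-\S+) #")


def summarize_oops(lines):
    """Collect the first fault address, RIP symbol/module, process and kernel seen."""
    summary = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            summary.setdefault("address", m.group(1))
        m = RIP_RE.search(line)
        if m:
            # The module is None when the symbol belongs to the core kernel.
            summary.setdefault("symbol", m.group(1))
            summary.setdefault("module", m.group(2))
        m = CPU_RE.search(line)
        if m:
            summary.setdefault("comm", m.group(1))
            summary.setdefault("kernel", m.group(2))
    return summary
```

It could be fed an open log file, for example summarize_oops(open("/var/log/syslog", errors="replace")), assuming the oops was written to that file before the machine hung.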

I am proposing that we backport lttng-modules from Disco (2.10.8-1ubuntu1) to fix the issues with the most recent kernels.

[Test Case]
- Install lttng-modules-dkms
- load lttng-tracer module

I have built the package on bionic and load-tested with linux 4.15 and some 5.0 backports.

[Regression Potential]
Backporting a newer version to Bionic could break compilation, or break users whose 4.15 kernels are not up to date.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

On the generic B-5.0 kernel, this test simply fails when launching the session daemon:

 == lttng smoke test of session create/destroy ==
 Error: Session daemon terminated with an error (exit status: 1)
 Error: Problem occurred while launching session daemon (/usr/bin/lttng-sessiond)
 Error: Command error
 Spawning a session daemon
 FAILED (lttng create)
 Error: No session daemon is available
 Error: Command error
 FAILED (lttng destroy)

 == lttng smoke test trace context switches ==
 Error: Session daemon terminated with an error (exit status: 1)
 Error: Problem occurred while launching session daemon (/usr/bin/lttng-sessiond)
 Error: Command error
 Spawning a session daemon
 FAILED (lttng create)

description: updated
tags: added: 5.0 aws bionic ubuntu-lttng-smoke-test
tags: added: sru-20190812
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This issue can be found across different clouds,
but D-5.0 GCP and D-5.0 Azure are both fine, just like D-5.0 AWS.

summary: - ubuntu_lttng_smoke_test crash with B-5.0 AWS
+ ubuntu_lttng_smoke_test crash with B-5.0 AWS/Azure/GCP
tags: added: azure gcp
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

I was able to reproduce the issue with bionic/linux-hwe 5.0.0-25 as well. Note that this kernel version is from -updates, so this doesn't affect only the kernel currently in -proposed.

This is not a regression per se: lttng-modules 2.10.5-1ubuntu1.2 did not compile with the 5.0 kernel in Bionic, whereas version 2.10.5-1ubuntu1.3 compiles but causes the kernel bug mentioned.

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

To reproduce:

- Install lttng-modules-dkms 2.10.5-1ubuntu1.3
- load lttng-tracer module
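
The two steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up for illustration, and the use of apt-get with -y and of modprobe is my reading of "install" and "load", not something stated in this report. On an affected kernel the modprobe step is expected to trigger the oops, so this should only be run on a disposable machine.

```python
import subprocess


def reproduction_commands(version="2.10.5-1ubuntu1.3"):
    """Build the argv lists for the two reproduction steps listed above."""
    package = "lttng-modules-dkms"
    if version:
        # Pin the package to the version that shows the crash.
        package += "=" + version
    return [
        ["apt-get", "install", "-y", package],  # step 1: install the DKMS package
        ["modprobe", "lttng-tracer"],           # step 2: load the tracer module
    ]


def reproduce(version="2.10.5-1ubuntu1.3"):
    """Run both steps in order; needs root and stops at the first failing command."""
    for argv in reproduction_commands(version):
        subprocess.run(argv, check=True)
```

Keeping the command list separate from the runner makes it possible to print and review the commands before executing anything.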

description: updated
Changed in linux-aws-edge (Ubuntu):
status: New → Invalid
Changed in linux-aws-edge (Ubuntu Bionic):
status: New → Invalid
Changed in lttng-modules (Ubuntu):
status: New → In Progress
Changed in lttng-modules (Ubuntu Bionic):
status: New → In Progress
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

lttng-modules package for sponsoring to Bionic:

https://people.canonical.com/~ksouza/lp1841766/

Changed in lttng-modules (Ubuntu):
status: In Progress → Invalid
Revision history for this message
Sean Feole (sfeole) wrote :

Using the built package: lttng-modules-dkms_2.10.8-1ubuntu1~18.04.1_all.deb

I was able to successfully re-run the ubuntu_lttng_smoke_test suite, after ensuring that the lttng-modules-dkms from the PPA was installed.

Full logs attached.

lttng-modules-dkms:
  Installed: 2.10.8-1ubuntu1~18.04.1
  Candidate: 2.10.8-1ubuntu1~18.04.1
  Version table:
 *** 2.10.8-1ubuntu1~18.04.1 500
        500 http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1ubuntu1.3 500
        500 http://us-west1.gce.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages
     2.10.5-1ubuntu1 500
        500 http://us-west1.gce.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
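
A check that the expected version is the one installed can be automated by reading the Installed and Candidate lines from output like the above. This is a minimal sketch; the function name policy_versions is hypothetical, and it assumes the text has the same layout as the listing in this comment.

```python
import re


def policy_versions(policy_text):
    """Extract the Installed and Candidate versions from policy-style output."""
    found = {}
    for key in ("Installed", "Candidate"):
        # Each field sits on its own indented line, e.g. "  Installed: <version>".
        m = re.search(r"^\s*" + key + r":\s*(\S+)", policy_text, re.MULTILINE)
        found[key.lower()] = m.group(1) if m else None
    return found
```

A test run could then compare policy_versions(text)["installed"] against 2.10.8-1ubuntu1~18.04.1 before starting the smoke test.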

Revision history for this message
Tyler Hicks (tyhicks) wrote :

Based on the positive test results and a review of the changes between current Disco and Bionic, I've sponsored this package to bionic-proposed. Thanks!

Changed in lttng-modules (Ubuntu Bionic):
status: In Progress → Fix Committed
assignee: nobody → Kleber Sacilotto de Souza (kleber-souza)
Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Po-Hsu, or anyone else affected,

Accepted lttng-modules into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/lttng-modules/2.10.8-1ubuntu1~18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed verification-needed-bionic
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Test re-triggered on B-hwe, passed with amd64 / i386

Manually verified on an ARM64 node, test passed

Thanks

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

I have also verified locally on a VM, running ubuntu_lttng_smoke_test, with both linux-hwe 5.0.0-27 and linux-aws-edge 5.0.0-1014.

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Bug 1846485 was opened by a customer as a duplicate of this Bug, 1841766.

I did the following testing on bionic:

Kernel 4.15.0-65-generic with lttng-modules 2.10.5-1ubuntu1.3
Functions as intended.

Kernel 5.0.0-29-hwe with lttng-modules 2.10.5-1ubuntu1.3
Reproduced the page-fault kernel oops documented in Bug 1841766 and Bug 1846485.

Kernel 5.0.0-29-hwe with lttng-modules 2.10.8-1ubuntu1~18.04.1 from -proposed
Everything functions as intended, verified package from -proposed fixes the problem.

Kernel 4.15.0-65-generic with lttng-modules 2.10.8-1ubuntu1~18.04.1 from -proposed
Everything functions as intended, verified package from -proposed does not introduce any obvious regressions.

lttng-modules 2.10.8-1ubuntu1~18.04.1 seems to be waiting on Bug 1813062 to be verified before it is promoted to -updates, and I have successfully run autopkgtest.
I will mark that bug as verified as well.

tags: added: sts
removed: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for lttng-modules has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lttng-modules - 2.10.8-1ubuntu1~18.04.1

---------------
lttng-modules (2.10.8-1ubuntu1~18.04.1) bionic; urgency=medium

  * Fix kernel crash on Bionic with 5.0.0 kernels (LP: #1841766)
    - Initial backport for Bionic to fix issues with 5.0.0
      kernels.

lttng-modules (2.10.8-1ubuntu1) disco; urgency=medium

  * Fix Linux 5.0 dkms build issues (LP: #1813062)
    - Minor patch wiggle backports of upstream commits:
    [80bb260] Fix: Remove 'type' argument from access_ok()
              function (v5.0)
    [cef5d79] Fix: signal: Remove SEND_SIG_FORCED (v4.20)
    [b90a7f3] Fix: signal: Distinguish between kernel_siginfo and
              siginfo
    [b9dbdfe] Fix: Replace pointer values with task->tk_pid and
              rpc_clnt->cl_clid
    [28fef30] Fix: SUNRPC: Simplify defining common RPC trace
              events (v5.0)

lttng-modules (2.10.8-1) unstable; urgency=medium

  * [7037820] New upstream version 2.10.8
  * [9138c6b] Bump standard to 4.2.1, no changes necessary

lttng-modules (2.10.7-1) unstable; urgency=medium

  * [a2b641e] New upstream version 2.10.7
  * [473eab3] Drop patch merged upstream
  * [8237ded] Bump standard to 4.2.0, no changes necessary

lttng-modules (2.10.6-2) unstable; urgency=medium

  * [4005ba5] Add patch to fix build on linux 4.16

lttng-modules (2.10.6-1) unstable; urgency=medium

  * [60f9658] New upstream version 2.10.6
  * [e1998ce] Drop patch merged upstream
  * [36dee0d] Use salsa canonical uri in VCS-Browser
  * [8542a9d] Bump standard to 4.1.4, no changes necessary

 -- Kleber Sacilotto de Souza <email address hidden> Thu, 29 Aug 2019 17:51:23 +0200

Changed in lttng-modules (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Closing this old bug.

Changed in ubuntu-kernel-tests:
status: New → Fix Released