ubuntu_lttng_smoke_test crash with B-5.0 AWS/Azure/GCP

Bug #1841766 reported by Po-Hsu Lin on 2019-08-28
This bug affects 1 person
Affects                     Importance   Assigned to
ubuntu-kernel-tests         Undecided    Unassigned
linux-aws-edge (Ubuntu)     Undecided    Unassigned
  Bionic                    Undecided    Unassigned
lttng-modules (Ubuntu)      Undecided    Unassigned
  Bionic                    Undecided    Kleber Sacilotto de Souza

Bug Description

[Impact]
The test crashes and hangs until the Jenkins job times out.

This test passes with B-4.15 AWS and D-5.0 AWS; it fails only with B-5.0 AWS.

There is no report and no obvious failure in the Jenkins output (the job is just marked as build aborted). The crash can only be found by tailing the syslog while the test runs.

Test output:
Running '/home/ubuntu/autotest/client/tests/ubuntu_lttng_smoke_test/ubuntu_lttng_smoke_test.sh'
[stdout] == lttng smoke test of session create/destroy ==
[stdout] Session test-kernel-session created.
[stdout] Traces will be written in /tmp/lttng-kernel-trace-6325-session
[stdout] PASSED (lttng create)
[stdout] Session test-kernel-session destroyed
[stdout] PASSED (lttng destroy)
[stdout]
[stdout] == lttng smoke test trace context switches ==
[stdout] Session test-kernel-session created.
[stdout] Traces will be written in /tmp/lttng-kernel-trace-6325-session
[stdout] PASSED (lttng create)

syslog output:
 kernel: [ 253.282158] lttng_kretprobes: loading out-of-tree module taints kernel.
 kernel: [ 253.282191] lttng_kretprobes: module verification failed: signature and/or required key missing - tainting kernel
 kernel: [ 253.293676] BUG: unable to handle kernel paging request at 0000000000025534
 kernel: [ 253.296613] #PF error: [normal kernel read fault]
 kernel: [ 253.298623] PGD 0 P4D 0
 kernel: [ 253.299783] Oops: 0000 [#1] SMP PTI
 kernel: [ 253.301313] CPU: 3 PID: 955 Comm: lttng-sessiond Tainted: G OE 5.0.0-1014-aws #16~18.04.1-Ubuntu
 kernel: [ 253.305391] Hardware name: Amazon EC2 m5.xlarge/, BIOS 1.0 10/16/2017
 kernel: [ 253.308102] RIP: 0010:lttng_tracepoint_notify+0x172/0x210 [lttng_tracer]
 kernel: [ 253.310886] Code: eb 1a 49 39 c6 0f 85 ab 00 00 00 49 8b 55 10 41 83 c4 01 44 39 a2 8c 02 00 00 76 a3 48 8b 92 90 02 00 00 49 63 c4 4c 63 34 82 <49> 8b 1e 48 89 df e8 33 fb ff ff 48 85 c0 49 89 c7 74 52 49 8b 47
 kernel: [ 253.318387] RSP: 0018:ffffb9c80232fbd8 EFLAGS: 00010246
 kernel: [ 253.320596] RAX: 0000000000000000 RBX: ffff9a27cbec9c00 RCX: 0000000000000041
 kernel: [ 253.323556] RDX: ffffffffc0456a0c RSI: 0000000000000001 RDI: ffffffffc0898100
 kernel: [ 253.326507] RBP: ffffb9c80232fc08 R08: ffff9a27d2ba70c0 R09: ffff9a27d2403680
 kernel: [ 253.329460] R10: ffffb9c80232fb48 R11: 00000000ffffffff R12: 0000000000000000
 kernel: [ 253.332420] R13: ffff9a27cbec9c00 R14: 0000000000025534 R15: ffffffffc0898740
 kernel: [ 253.335375] FS: 00007f26e1d49700(0000) GS:ffff9a27d2b80000(0000) knlGS:0000000000000000
 kernel: [ 253.338700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: [ 253.341118] CR2: 0000000000025534 CR3: 000000040c86c003 CR4: 00000000007606e0
 kernel: [ 253.345464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: [ 253.349798] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: [ 253.354126] PKRU: 55555554
 kernel: [ 253.356719] Call Trace:
 kernel: [ 253.359210] register_tracepoint_module_notifier+0x57/0x80
 kernel: [ 253.362856] ? 0xffffffffc08d7000
 kernel: [ 253.365675] lttng_tracepoint_init+0x45/0xc40 [lttng_tracer]
 kernel: [ 253.369380] lttng_events_init+0xbc/0x233 [lttng_tracer]
 kernel: [ 253.372952] ? 0xffffffffc08d7000
 kernel: [ 253.375739] do_one_initcall+0x4a/0x1c9
 kernel: [ 253.378733] ? _cond_resched+0x19/0x40
 kernel: [ 253.381693] ? kmem_cache_alloc_trace+0x151/0x1c0
 kernel: [ 253.385037] do_init_module+0x5f/0x216
 kernel: [ 253.387999] load_module+0x19f6/0x20a0
 kernel: [ 253.390956] __do_sys_finit_module+0xfc/0x120
 kernel: [ 253.394153] ? __do_sys_finit_module+0xfc/0x120
 kernel: [ 253.397416] __x64_sys_finit_module+0x1a/0x20
 kernel: [ 253.400618] do_syscall_64+0x5a/0x120
 kernel: [ 253.403547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
 kernel: [ 253.407035] RIP: 0033:0x7f26e738d839
 kernel: [ 253.409949] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
 kernel: [ 253.421539] RSP: 002b:00007f26e1d38758 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
 kernel: [ 253.427373] RAX: ffffffffffffffda RBX: 00007f26cc004d70 RCX: 00007f26e738d839
 kernel: [ 253.431692] RDX: 0000000000000000 RSI: 00007f26e8d0a145 RDI: 0000000000000030
 kernel: [ 253.435978] RBP: 00007f26e8d0a145 R08: 0000000000000000 R09: 00007f26cc004b00
 kernel: [ 253.440270] R10: 0000000000000030 R11: 0000000000000246 R12: 0000000000000000
 kernel: [ 253.444581] R13: 00007f26cc004570 R14: 0000000000000000 R15: 00007f26cc004b00
 kernel: [ 253.448853] Modules linked in: lttng_tracer(OE+) lttng_statedump(OE) lttng_kprobes(OE) lttng_clock(OE) lttng_lib_ring_buffer(OE) lttng_kretprobes(OE) ppdev parport_pc parport serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ena
 kernel: [ 253.478768] CR2: 0000000000025534
 kernel: [ 253.481575] ---[ end trace 9c47de0da4f80343 ]---
 kernel: [ 253.484885] RIP: 0010:lttng_tracepoint_notify+0x172/0x210 [lttng_tracer]
 kernel: [ 253.489142] Code: eb 1a 49 39 c6 0f 85 ab 00 00 00 49 8b 55 10 41 83 c4 01 44 39 a2 8c 02 00 00 76 a3 48 8b 92 90 02 00 00 49 63 c4 4c 63 34 82 <49> 8b 1e 48 89 df e8 33 fb ff ff 48 85 c0 49 89 c7 74 52 49 8b 47
 kernel: [ 253.500676] RSP: 0018:ffffb9c80232fbd8 EFLAGS: 00010246
 kernel: [ 253.504223] RAX: 0000000000000000 RBX: ffff9a27cbec9c00 RCX: 0000000000000041
 kernel: [ 253.508498] RDX: ffffffffc0456a0c RSI: 0000000000000001 RDI: ffffffffc0898100
 kernel: [ 253.512783] RBP: ffffb9c80232fc08 R08: ffff9a27d2ba70c0 R09: ffff9a27d2403680
 kernel: [ 253.517125] R10: ffffb9c80232fb48 R11: 00000000ffffffff R12: 0000000000000000
 kernel: [ 253.521421] R13: ffff9a27cbec9c00 R14: 0000000000025534 R15: ffffffffc0898740
 kernel: [ 253.525702] FS: 00007f26e1d49700(0000) GS:ffff9a27d2b80000(0000) knlGS:0000000000000000
 kernel: [ 253.531708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: [ 253.535444] CR2: 0000000000025534 CR3: 000000040c86c003 CR4: 00000000007606e0
 kernel: [ 253.539722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 kernel: [ 253.544020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 kernel: [ 253.548300] PKRU: 55555554
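
Since the failure only shows up in the syslog, a small scanner can help spot it in an automated run. The following is a minimal sketch, not part of the test suite: the function name `summarize_oops` is hypothetical, and the two patterns are taken from the wording of the excerpt above, so other kernel versions may phrase the messages differently.

```python
import re

# Patterns modelled on the syslog excerpt above (assumption: other kernels
# may word the fault line differently).
FAULT_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: 0010:(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_oops(lines):
    """Return address, function and module of the first oops, or None."""
    summary = {}
    for line in lines:
        if "address" not in summary:
            # Skip everything until the paging-request line is seen.
            m = FAULT_RE.search(line)
            if m:
                summary["address"] = m.group(1)
            continue
        # After the fault line, the first kernel-mode RIP names the culprit.
        m = RIP_RE.search(line)
        if m:
            summary["function"] = m.group(1)
            summary["module"] = m.group(2)  # None for built-in code
            return summary
    return summary or None
```

Fed the lines of the log above, this would report the fault address 0000000000025534 in lttng_tracepoint_notify from the lttng_tracer module. It could be called as `summarize_oops(open("/var/log/syslog"))`, assuming the default Ubuntu syslog location.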

I propose backporting lttng-modules from Disco (2.10.8-1ubuntu1) to fix the issues with the most recent kernels.

[Test Case]
- Install lttng-modules-dkms
- load lttng-tracer module

I have built the package on Bionic and load-tested it with Linux 4.15 and some 5.0 backports.
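
To confirm that the module load in the test case actually succeeded, one option is to look for the module in /proc/modules. This is only a sketch under assumptions: the helper names `loaded_modules` and `is_loaded` are hypothetical, and the check relies on the kernel listing the module with underscores (lttng_tracer), as it appears in the "Modules linked in" line of the oops.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    # The first whitespace-separated field of each line is the module name.
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_loaded(name, proc_modules_text):
    """Check whether a module is loaded, accepting either '-' or '_' in its name."""
    # The kernel reports module names with underscores, e.g. lttng_tracer.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)
```

On a test machine this could be used as `is_loaded("lttng-tracer", open("/proc/modules").read())`. Note that with the broken package the load itself triggers the oops, so the syslog still needs to be checked.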

[Regression Potential]
Backporting a newer version to Bionic could break compilation, or break users whose 4.15 kernels are not up to date.

Po-Hsu Lin (cypressyew) wrote :

On the generic B-5.0 kernel, this test simply fails when launching the session daemon:

 == lttng smoke test of session create/destroy ==
 Error: Session daemon terminated with an error (exit status: 1)
 Error: Problem occurred while launching session daemon (/usr/bin/lttng-sessiond)
 Error: Command error
 Spawning a session daemon
 FAILED (lttng create)
 Error: No session daemon is available
 Error: Command error
 FAILED (lttng destroy)

 == lttng smoke test trace context switches ==
 Error: Session daemon terminated with an error (exit status: 1)
 Error: Problem occurred while launching session daemon (/usr/bin/lttng-sessiond)
 Error: Command error
 Spawning a session daemon
 FAILED (lttng create)

description: updated
tags: added: 5.0 aws bionic ubuntu-lttng-smoke-test
tags: added: sru-20190812
Po-Hsu Lin (cypressyew) wrote :

This issue can be found across different clouds,
but D-5.0 GCP and D-5.0 Azure both pass, as D-5.0 AWS does.

summary: - ubuntu_lttng_smoke_test crash with B-5.0 AWS
+ ubuntu_lttng_smoke_test crash with B-5.0 AWS/Azure/GCP
tags: added: azure gcp

I was able to reproduce the issue with bionic/linux-hwe 5.0.0-25 as well. Note that this kernel version is from -updates, so the issue does not affect only the kernel currently in -proposed.

This is not a regression per se, since lttng-modules 2.10.5-1ubuntu1.2 did not compile with the 5.0 kernel in Bionic; version 2.10.5-1ubuntu1.3 compiles but causes the kernel bug described above.

To reproduce:

- Install lttng-modules-dkms 2.10.5-1ubuntu1.3
- load lttng-tracer module

description: updated
Changed in linux-aws-edge (Ubuntu):
status: New → Invalid
Changed in linux-aws-edge (Ubuntu Bionic):
status: New → Invalid
Changed in lttng-modules (Ubuntu):
status: New → In Progress
Changed in lttng-modules (Ubuntu Bionic):
status: New → In Progress

lttng-modules package for sponsoring into Bionic:

https://people.canonical.com/~ksouza/lp1841766/

Changed in lttng-modules (Ubuntu):
status: In Progress → Invalid
Sean Feole (sfeole) wrote :

Using the built package lttng-modules-dkms_2.10.8-1ubuntu1~18.04.1_all.deb:

I was able to successfully re-run the ubuntu_lttng_smoke_tests suite, ensuring the lttng-modules-dkms from the ppa was installed.

Full logs attached.

lttng-modules-dkms:
  Installed: 2.10.8-1ubuntu1~18.04.1
  Candidate: 2.10.8-1ubuntu1~18.04.1
  Version table:
 *** 2.10.8-1ubuntu1~18.04.1 500
        500 http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1ubuntu1.3 500
        500 http://us-west1.gce.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages
     2.10.5-1ubuntu1 500
        500 http://us-west1.gce.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
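
When verifying on several machines, it can help to check the installed version programmatically. The sketch below assumes the listing above is `apt-cache policy` output and only extracts the Installed and Candidate fields; the function name `parse_policy` is hypothetical.

```python
def parse_policy(text):
    """Extract Installed and Candidate versions from apt-cache policy-style text."""
    fields = {}
    for line in text.splitlines():
        # Field lines look like "  Installed: <version>"; other lines are ignored.
        key, sep, value = line.strip().partition(": ")
        if sep and key in ("Installed", "Candidate"):
            fields[key.lower()] = value.strip()
    return fields
```

For the listing above, both fields would come back as 2.10.8-1ubuntu1~18.04.1, which matches the package under verification.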

Tyler Hicks (tyhicks) wrote :

Based on the positive test results and a review of the changes between current Disco and Bionic, I've sponsored this package to bionic-proposed. Thanks!

Changed in lttng-modules (Ubuntu Bionic):
status: In Progress → Fix Committed
assignee: nobody → Kleber Sacilotto de Souza (kleber-souza)

Hello Po-Hsu, or anyone else affected,

Accepted lttng-modules into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/lttng-modules/2.10.8-1ubuntu1~18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed verification-needed-bionic
description: updated
Po-Hsu Lin (cypressyew) wrote :

Test re-triggered on B-hwe, passed with amd64 / i386

Manually verified on an ARM64 node, test passed

Thanks

tags: added: verification-done-bionic
removed: verification-needed-bionic

I have also verified locally on a VM, running ubuntu_lttng_smoke_test, with both linux-hwe 5.0.0-27 and linux-aws-edge 5.0.0-1014.
