Focal Azure AMD64 instances crash with ftrace/test.d/00basic/basic2.tc in ubuntu_kselftests_ftrace and ubuntu_ftrace_smoke_test

Bug #1882669 reported by Po-Hsu Lin
This bug affects 2 people
Affects                 Status        Importance  Assigned to  Milestone
ubuntu-kernel-tests     New           High        Unassigned
linux-aws (Ubuntu)      Fix Released  Undecided   Unassigned
linux-azure (Ubuntu)    Confirmed     Undecided   Unassigned
  Focal                 Confirmed     Undecided   Unassigned
  Groovy                Fix Released  Undecided   Unassigned

Bug Description

Issue found on 5.4.0-1013.13

The ftrace test in ubuntu_kernel_selftests hangs the system on the second test:
# === Ftrace unit tests ===
# [1] Basic trace file check [PASS]
# [2] Basic test for tracers
(System hang here)

Syslog:
kernel: [ 2234.408225] in mmio_trace_init
kernel: [ 2234.988256] mmiotrace: Disabling non-boot CPUs...
kernel: [ 2235.028451] mmiotrace: Error taking CPU1 down: -16
kernel: [ 2235.052564] mmiotrace: Error taking CPU2 down: -16
kernel: [ 2235.076593] mmiotrace: Error taking CPU3 down: -16
kernel: [ 2235.096576] mmiotrace: Error taking CPU4 down: -16
kernel: [ 2235.116487] mmiotrace: Error taking CPU5 down: -16
kernel: [ 2235.132446] mmiotrace: Error taking CPU6 down: -16
kernel: [ 2235.152488] mmiotrace: Error taking CPU7 down: -16
kernel: [ 2235.172526] mmiotrace: Error taking CPU8 down: -16
kernel: [ 2235.192486] mmiotrace: Error taking CPU9 down: -16
kernel: [ 2235.212455] mmiotrace: Error taking CPU10 down: -16
kernel: [ 2235.233020] mmiotrace: Error taking CPU11 down: -16
kernel: [ 2235.256355] mmiotrace: Error taking CPU12 down: -16
kernel: [ 2235.272473] mmiotrace: Error taking CPU13 down: -16
kernel: [ 2235.292393] mmiotrace: Error taking CPU14 down: -16
kernel: [ 2235.312426] mmiotrace: Error taking CPU15 down: -16
kernel: [ 2235.344407] mmiotrace: Error taking CPU16 down: -16
kernel: [ 2235.380494] mmiotrace: Error taking CPU17 down: -16
kernel: [ 2235.400319] mmiotrace: Error taking CPU18 down: -16
kernel: [ 2235.432631] mmiotrace: Error taking CPU19 down: -16
kernel: [ 2235.456453] mmiotrace: Error taking CPU20 down: -16
kernel: [ 2235.480432] mmiotrace: Error taking CPU21 down: -16
kernel: [ 2235.496501] mmiotrace: Error taking CPU22 down: -16
kernel: [ 2235.516442] mmiotrace: Error taking CPU23 down: -16
kernel: [ 2235.536394] mmiotrace: Error taking CPU24 down: -16
kernel: [ 2235.556489] mmiotrace: Error taking CPU25 down: -16
kernel: [ 2235.573759] smpboot: CPU 26 is now offline
kernel: [ 2235.575071] mmiotrace: CPU26 is down.
kernel: [ 2235.585643] smpboot: CPU 27 is now offline
kernel: [ 2235.586699] mmiotrace: CPU27 is down.
kernel: [ 2235.609679] smpboot: CPU 28 is now offline
kernel: [ 2235.610788] mmiotrace: CPU28 is down.
kernel: [ 2235.633530] smpboot: CPU 29 is now offline
kernel: [ 2235.634511] mmiotrace: CPU29 is down.
kernel: [ 2235.653652] smpboot: CPU 30 is now offline
kernel: [ 2235.655415] mmiotrace: CPU30 is down.
kernel: [ 2235.678126] smpboot: CPU 31 is now offline
kernel: [ 2235.679061] mmiotrace: CPU31 is down.
kernel: [ 2235.679062] mmiotrace: multiple CPUs still online, may miss events.
kernel: [ 2235.679063] mmiotrace: enabled.
kernel: [ 2235.679128] in mmio_trace_reset
kernel: [ 2235.696163] mmiotrace: Re-enabling CPUs...
kernel: [ 2235.732157] mmiotrace: enabled CPU1.
kernel: [ 2235.772207] mmiotrace: enabled CPU2.
kernel: [ 2235.812216] mmiotrace: enabled CPU3.
kernel: [ 2235.844220] mmiotrace: enabled CPU4.
kernel: [ 2235.876211] mmiotrace: enabled CPU5.
kernel: [ 2235.916159] mmiotrace: enabled CPU6.
kernel: [ 2235.956214] mmiotrace: enabled CPU7.
kernel: [ 2235.988145] mmiotrace: enabled CPU8.
kernel: [ 2236.020214] mmiotrace: enabled CPU9.
kernel: [ 2236.060207] mmiotrace: enabled CPU10.
kernel: [ 2236.096208] mmiotrace: enabled CPU11.
kernel: [ 2236.128205] mmiotrace: enabled CPU12.
/usr/sbin/irqbalance: WARNING, didn't collect load info for all cpus, balancing is broken
kernel: [ 2236.168204] mmiotrace: enabled CPU13.
kernel: [ 2236.200206] mmiotrace: enabled CPU14.
kernel: [ 2236.232207] mmiotrace: enabled CPU15.
kernel: [ 2236.264224] mmiotrace: enabled CPU16.
kernel: [ 2236.300207] mmiotrace: enabled CPU17.
kernel: [ 2236.332210] mmiotrace: enabled CPU18.
kernel: [ 2236.372224] mmiotrace: enabled CPU19.
kernel: [ 2236.404147] mmiotrace: enabled CPU20.
kernel: [ 2236.452153] mmiotrace: enabled CPU21.
kernel: [ 2236.492152] mmiotrace: enabled CPU22.
kernel: [ 2236.524153] mmiotrace: enabled CPU23.
kernel: [ 2236.564211] mmiotrace: enabled CPU24.
kernel: [ 2236.612204] mmiotrace: enabled CPU25.
kernel: [ 2236.644353] smpboot: Booting Node 0 Processor 26 APIC 0x1a
kernel: [ 2236.645769] mmiotrace: enabled CPU26.
kernel: [ 2236.684537] smpboot: Booting Node 0 Processor 27 APIC 0x1b
kernel: [ 2236.685751] mmiotrace: enabled CPU27.
kernel: [ 2236.724284] smpboot: Booting Node 0 Processor 28 APIC 0x1c
kernel: [ 2236.725508] mmiotrace: enabled CPU28.
kernel: [ 2236.772271] smpboot: Booting Node 0 Processor 29 APIC 0x1d
kernel: [ 2236.773532] mmiotrace: enabled CPU29.
kernel: [ 2236.816285] smpboot: Booting Node 0 Processor 30 APIC 0x1e
kernel: [ 2236.817403] mmiotrace: enabled CPU30.
kernel: [ 2236.856265] smpboot: Booting Node 0 Processor 31 APIC 0x1f
kernel: [ 2236.857716] mmiotrace: enabled CPU31.
kernel: [ 2236.868235] mmiotrace: disabled.
(System hang here)
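
In the syslog above, CPUs 1 to 25 could not be taken offline (error -16, which is EBUSY on Linux), while CPUs 26 to 31 went down. A minimal sketch for summarizing such a log is shown below; the function names and the two embedded sample lines are illustrative only and are not part of the test suite.

```python
import errno
import re

# Patterns for the mmiotrace messages quoted in the syslog above.
FAILED = re.compile(r"mmiotrace: Error taking CPU(\d+) down: (-?\d+)")
DOWN = re.compile(r"mmiotrace: CPU(\d+) is down\.")


def summarize_mmiotrace(lines):
    """Return (failed, offlined).

    failed maps a CPU number to the error code reported for it;
    offlined lists the CPUs that mmiotrace reported as down.
    """
    failed, offlined = {}, []
    for line in lines:
        m = FAILED.search(line)
        if m:
            failed[int(m.group(1))] = int(m.group(2))
            continue
        m = DOWN.search(line)
        if m:
            offlined.append(int(m.group(1)))
    return failed, offlined


def describe(code):
    """Translate a negative kernel error code into its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


# Two lines copied from the syslog above, used as sample input.
sample = [
    "kernel: [ 2235.028451] mmiotrace: Error taking CPU1 down: -16",
    "kernel: [ 2235.575071] mmiotrace: CPU26 is down.",
]
failed, offlined = summarize_mmiotrace(sample)
for cpu, code in failed.items():
    print(f"CPU{cpu}: {code} ({describe(code)})")
print("offlined:", offlined)
```

Feeding the full syslog through `summarize_mmiotrace` makes it easy to compare which CPUs refuse to go offline across instance types.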

The ftrace smoke test hangs here:
 PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-5.4.0-1012-azure)
 PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-5.4.0-1012-azure)
 PASSED (CONFIG_STACK_TRACER=y in /boot/config-5.4.0-1012-azure)
 PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-5.4.0-1012-azure)
 PASSED all expected /sys/kernel/debug/tracing files exist
 PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
 PASSED (function in /sys/kernel/debug/tracing/available_tracers)
 PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
 PASSED (tracer function can be enabled)
(System hang here)

According to the test result on the Focal generic kernel, the next test should be:
(tracer function_graph can be enabled)

This is not a regression as it can be found on 5.4.0-1012.12 as well.
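
To narrow down which tracer is being enabled when the instance hangs, one option is to switch tracers one at a time and flush a log line to disk before each write. The sketch below is not the actual test script: the function name and the log file are hypothetical, and it assumes the tracers are selected by writing their names into current_tracer under /sys/kernel/debug/tracing. The demo at the end runs against a temporary directory so it can be tried without root.

```python
import os
import tempfile
from pathlib import Path


def enable_each_tracer(tracing_dir, log_path):
    """Write every name listed in available_tracers into current_tracer.

    Each name is logged and fsync'ed *before* the write, so after a hang
    the last log line names the tracer that was being enabled.
    """
    tracing = Path(tracing_dir)
    tracers = (tracing / "available_tracers").read_text().split()
    with open(log_path, "a") as log:
        # Finish with "nop" so that no tracer is left active.
        for name in tracers + ["nop"]:
            log.write(f"enabling {name}\n")
            log.flush()
            os.fsync(log.fileno())
            (tracing / "current_tracer").write_text(name + "\n")
    return tracers


# Dry run against a temporary directory standing in for tracefs.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "available_tracers").write_text("function_graph function nop\n")
    Path(tmp, "current_tracer").write_text("nop\n")
    log_path = Path(tmp, "progress.log")
    seen = enable_each_tracer(tmp, log_path)
    log_lines = log_path.read_text().splitlines()
    final_tracer = Path(tmp, "current_tracer").read_text().strip()

print(seen, final_tracer)
```

On an affected instance the call would be `enable_each_tracer("/sys/kernel/debug/tracing", <log file>)` as root, with the log file placed somewhere that survives a reboot.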

Po-Hsu Lin (cypressyew)
tags: added: 5.4 azure focal kqa-blocker sru-20200518 ubuntu-ftrace-smoke-test ubuntu-kernel-selftests
Po-Hsu Lin (cypressyew)
description: updated
summary: - Focal Azure crash with ftracetest in ubuntu_kernel_selftests and
- ubuntu_ftrace_smoke_test
+ Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests
+ and ubuntu_ftrace_smoke_test
Po-Hsu Lin (cypressyew) wrote : Re: Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test

Still affecting Azure 5.4.0-1017.17

tags: added: sru-20200608
Po-Hsu Lin (cypressyew)
tags: added: bionic sru-20200629
Po-Hsu Lin (cypressyew)
tags: added: sru-20200810
Po-Hsu Lin (cypressyew) wrote :

This is blocking ubuntu_kernel_selftests test results from being populated on B-5.4 Azure instances.

tags: added: sru-20200831
Changed in ubuntu-kernel-tests:
importance: Undecided → High
Po-Hsu Lin (cypressyew) wrote :

Found on G-5.8 Azure

summary: - Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests
- and ubuntu_ftrace_smoke_test
+ Focal / Groovy Azure instances crash with ftracetest in
+ ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test
tags: added: 5.8 groovy
Po-Hsu Lin (cypressyew)
tags: added: sru-20201109
Po-Hsu Lin (cypressyew) wrote : Re: Focal / Groovy Azure instances crash with ftracetest in ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test

Found on G-AWS 5.8.0-1017.18 ARM64 (passed with amd64)

tags: added: sru-20201130
Po-Hsu Lin (cypressyew) wrote :

Found on F-azure 5.4.0-1035.36

Po-Hsu Lin (cypressyew) wrote : Re: Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test

This issue does not exist on G-azure 5.8.0-1016.17. Closing corresponding groovy task.

summary: - Focal / Groovy Azure instances crash with ftracetest in
- ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test
+ Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests
+ and ubuntu_ftrace_smoke_test
Changed in linux-azure (Ubuntu Groovy):
status: New → Fix Released
no longer affects: linux-azure-5.4 (Ubuntu)
no longer affects: linux-azure-5.4 (Ubuntu Focal)
no longer affects: linux-azure-5.4 (Ubuntu Groovy)
Po-Hsu Lin (cypressyew) wrote : Re: Focal / Groovy aws and azure instances crash with ftracetest in ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test

This issue can be found on Groovy AWS 5.8.0-1021.23, on ARM64 instance only.

summary: - Focal Azure instances crash with ftracetest in ubuntu_kernel_selftests
- and ubuntu_ftrace_smoke_test
+ Focal / Groovy aws and azure instances crash with ftracetest in
+ ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test
Po-Hsu Lin (cypressyew)
tags: added: sru-20210104
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed
Changed in linux-azure (Ubuntu Focal):
status: New → Confirmed
Changed in linux-azure (Ubuntu):
status: New → Confirmed
Krzysztof Kozlowski (krzk) wrote :

ubuntu_ftrace_smoke_test on all v5.4 kernels on all Azure instances hangs:

----
114 11:22:52 INFO | Writing results to /home/azure/autotest/client/results/default
115 11:22:52 DEBUG| Initializing the state engine
116 11:22:52 DEBUG| Persistent state client.steps now set to []
117 11:22:52 DEBUG| Persistent option harness now set to None
118 11:22:52 DEBUG| Persistent option harness_args now set to None
119 11:22:52 DEBUG| Selected harness: standalone
120 11:22:52 INFO | START ---- ---- timestamp=1631791372 localtime=Sep 16 11:22:52
121 11:22:52 DEBUG| Persistent state client._record_indent now set to 1
122 11:22:52 DEBUG| Test has timeout: 900 sec.
123 11:22:52 INFO | START ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1631791372 timeout=900 localtime=Sep 16 11:22:52
124 11:22:52 DEBUG| Persistent state client._record_indent now set to 2
125 11:22:52 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ftrace_smoke_test.ftrace-smoke-test', 'ubuntu_ftrace_smoke_test.ftrace-smoke-test')
126 11:22:52 DEBUG| Waiting for pid 2953 for 900 seconds
127 11:22:52 DEBUG| Running '/home/azure/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
128 11:22:52 DEBUG| [stdout] PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-5.4.0-1059-azure)
129 11:22:52 DEBUG| [stdout] PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-5.4.0-1059-azure)
130 11:22:52 DEBUG| [stdout] PASSED (CONFIG_STACK_TRACER=y in /boot/config-5.4.0-1059-azure)
131 11:22:52 DEBUG| [stdout] PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-5.4.0-1059-azure)
132 11:22:54 DEBUG| [stdout] PASSED all expected /sys/kernel/debug/tracing files exist
133 11:22:54 DEBUG| [stdout] PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
134 11:22:54 DEBUG| [stdout] PASSED (function in /sys/kernel/debug/tracing/available_tracers)
135 11:22:54 DEBUG| [stdout] PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
136 11:22:57 DEBUG| [stdout] PASSED (tracer function can be enabled)
----

Found on Azure clouds:
* focal/linux/5.4.0-85.95
* focal/linux-azure/5.4.0-1059.62
* bionic/linux-azure-5.4/5.4.0-1059.62~18.04.1
* SRU cycles: 2021.08.16, 2021.07.19, 2021.05.31 (not checked earlier)

Not found:
* Other clouds.
* Azure: hirsute/linux/5.11.0-35.37
* Azure: focal/linux-azure-5.11/5.11.0-1016.17~20.04.1
* Azure: focal/linux-azure-5.8/5.8.0-1042.45~20.04.1
* Azure: bionic/linux-azure-4.15/4.15.0-1124.137
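
The common factor in the two lists above is the upstream kernel series. As a rough illustration, the sketch below extracts the series from each entry (copied from the lists above); the helper name and the regular expression are assumptions made for this example.

```python
import re

# Matches the "major.minor" part of a version such as ".../5.4.0-1059.62".
SERIES = re.compile(r"/(\d+\.\d+)\.\d+-")


def kernel_series(entry):
    """Return the upstream series (e.g. "5.4") of a series/package/version entry."""
    m = SERIES.search(entry)
    return m.group(1) if m else None


found = [
    "focal/linux/5.4.0-85.95",
    "focal/linux-azure/5.4.0-1059.62",
    "bionic/linux-azure-5.4/5.4.0-1059.62~18.04.1",
]
not_found = [
    "Azure: hirsute/linux/5.11.0-35.37",
    "Azure: focal/linux-azure-5.11/5.11.0-1016.17~20.04.1",
    "Azure: focal/linux-azure-5.8/5.8.0-1042.45~20.04.1",
    "Azure: bionic/linux-azure-4.15/4.15.0-1124.137",
]

found_series = {kernel_series(e) for e in found}
not_found_series = {kernel_series(e) for e in not_found}
print("hangs on:", sorted(found_series))
print("passes on:", sorted(not_found_series))
```

Every affected entry resolves to 5.4, and none of the unaffected ones do.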

Krzysztof Kozlowski (krzk) wrote :

It seems that AWS got fixed (or the issue there was unrelated), but all Azure cases with v5.4 are affected.

tags: added: sru-20210816 sru-20210906
tags: added: hinted
Po-Hsu Lin (cypressyew)
tags: added: sru-20220711
Po-Hsu Lin (cypressyew)
tags: added: sru-20230320
Po-Hsu Lin (cypressyew) wrote :

The test script that causes the crash is:
    test.d--00basic--basic2.tc

 Running './ftracetest -vvv test.d/00basic/basic2.tc'
 [stdout] === Ftrace unit tests ===
 [stderr] + initialize_ftrace
 [stderr] + disable_tracing
 [stderr] + echo 0
 [stderr] + reset_tracer
 [stderr] + echo nop
 [stderr] + reset_trigger
 [stderr] + [ -d events/synthetic ]
 [stderr] + reset_trigger_file events/alarmtimer/alarmtimer_cancel/trigger events/alarmtimer/alarmtimer_fired/trigger events/alarmtimer/alarmtimer_start/trigger events/alarmtimer/alarmtimer_suspend/trigger events/block/block_bio_backmerge/trigger events/block/block_bio_bounce/trigger events/block/block_bio_complete/trigger events/block/block_bio_frontmerge/trigger events/block/block_bio_queue/trigger events/block/block_bio_remap/trigger events/block/block_dirty_buffer/trigger events/block/block_getrq/trigger events/block/block_plug/trigger events/block/block_rq_complete/trigger events/block/block_rq_insert/trigger events/block/block_rq_issue/trigger events/block/block_rq_remap/trigger events/block/block_rq_requeue/trigger events/block/block_sleeprq/trigger events/block/block_split/trigger events/block/block_touch_buffer/trigger events/block/block_unplug/trigger events/bpf_test_run/bpf_test_finish/trigger events/bridge/br_fdb_add/trigger events/bridge/br_fdb_external_learn_add/trigger events/bridge/br_fdb_update/trigger events/bridge/fdb_delete/trigger events/btrfs/__extent_writepage/trigger events/btrfs/add_delayed_data_ref/trigger events/btrfs/add_delayed_ref_head/trigger events/btrfs/add_delayed_tree_ref/trigger events/btrfs/alloc_extent_state/trigger events/btrfs/btrfs_add_block_group/trigger events/btrfs/btrfs_add_unused_block_group/trigger events/btrfs/btrfs_all_work_done/trigger events/btrfs/btrfs_chunk_alloc/trigger events/btrfs/btrfs_chunk_free/trigger events/btrfs/btrfs_clear_extent_bit/trigger events/btrfs/btrfs_convert_extent_bit/trigger events/btrfs/btrfs_cow_block/trigger events/btrfs/btrfs_failed_cluster_setup/trigger events/btrfs/btrfs_find_cluster/trigger events/btrfs/btrfs_flush_space/trigger events/btrfs/btrfs_get_extent/trigger events/btrfs/btrfs_get_extent_show_fi_inline/trigger events/btrfs/btrfs_get_extent_show_fi_regular/trigger events/btrfs/btrfs_handle_em_exist/trigger events/btrfs/btrfs_inode_evict/trigger 
events/btrfs/btrfs_inode_mod_outstanding_extents/trigger events/btrfs/btrfs_inode_new/trigger events/btrfs/btrfs_inode_request/trigger events/btrfs/btrfs_ordered_extent_add/trigger events/btrfs/btrfs_ordered_extent_put/trigger events/btrfs/btrfs_ordered_extent_remove/trigger events/btrfs/btrfs_ordered_extent_start/trigger events/btrfs/btrfs_ordered_sched/trigger events/btrfs/btrfs_prelim_ref_insert/trigger events/btrfs/btrfs_prelim_ref_merge/trigger events/btrfs/btrfs_qgroup_account_extent/trigger events/btrfs/btrfs_qgroup_account_extents/trigger events/btrfs/btrfs_qgroup_release_data/trigger events/btrfs/btrfs_qgroup_reserve_data/trigger events/btrfs/btrfs_qgroup_trace_extent/trigger events/btrfs/btrfs_remove_block_group/trigger events/btrfs/btrfs_reserve_extent/trigger events/btrfs/btrfs_reserve_extent_cluster/trigger events/btrfs/btrfs_reserved_extent_alloc/trigger event...

Po-Hsu Lin (cypressyew) wrote : Re: Focal Azure instances crash with ftrace/test.d/00basic/basic2.tc in ubuntu_kselftests_ftrace and ubuntu_ftrace_smoke_test

This is not affecting AWS anymore.

summary: - Focal / Groovy aws and azure instances crash with ftracetest in
- ubuntu_kernel_selftests and ubuntu_ftrace_smoke_test
+ Focal Azure instances crash with ftrace/test.d/00basic/basic2.tc in
+ ubuntu_kselftests_ftrace and ubuntu_ftrace_smoke_test
Changed in linux-aws (Ubuntu):
status: Confirmed → Fix Released
tags: added: sru-20230710
summary: - Focal Azure instances crash with ftrace/test.d/00basic/basic2.tc in
- ubuntu_kselftests_ftrace and ubuntu_ftrace_smoke_test
+ Focal Azure AMD64 instances crash with ftrace/test.d/00basic/basic2.tc
+ in ubuntu_kselftests_ftrace and ubuntu_ftrace_smoke_test
Magali Lemes do Sacramento (magalilemes) wrote :

Seems to still be found on f:linux-aws-fips, cycle 2023-07-10

Po-Hsu Lin (cypressyew) wrote :

For ftrace-related issues found with c3.xlarge on AWS cloud, let's track them in bug 2034057.
