s390: dbginfo.sh triggers kernel panic, reading from /sys/kernel/mm/page_idle/bitmap
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | | High | Skipper Bug Screeners |
linux (Ubuntu) | Status tracked in Hirsute | | |
Bionic | | Undecided | Unassigned |
Focal | | Medium | Unassigned |
Groovy | | Undecided | Unassigned |
Hirsute | | Undecided | Unassigned |
Bug Description
SRU Justification:
==================
[Impact]
* While executing dbginfo.sh (a script to collect runtime, configuration, and trace information on s390x) the system hangs.
* This is because 'idle page tracking' users can pass an arbitrary pfn that might be mapped to
an offline page, and attempts to access offline pages lead to the hang.
* Access to such pages must be avoided.
* The upstream commit modifies 'page_idle_
'pfn_valid()' and 'pfn_to_page()' combination, so that the pfn mapped to an offline page is skipped.
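The difference between the two lookup behaviours can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `Page` class, the `memmap` dictionary, the `OfflinePageAccess` exception (standing in for the hang) and both function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Page:
    """Toy stand-in for a struct page (hypothetical model)."""
    online: bool
    idle: bool = False


class OfflinePageAccess(RuntimeError):
    """Stands in for the hang seen when an offline page is touched."""


def page_is_idle(page: Page) -> bool:
    # Touching an offline page is what must never happen.
    if not page.online:
        raise OfflinePageAccess("touched an offline page")
    return page.idle


def idle_pfns_unchecked(memmap: Dict[int, Page], pfns: Iterable[int]) -> List[int]:
    # Models the old behaviour: only checks that the pfn exists.
    return [pfn for pfn in pfns if pfn in memmap and page_is_idle(memmap[pfn])]


def idle_pfns_skip_offline(memmap: Dict[int, Page], pfns: Iterable[int]) -> List[int]:
    # Models the fixed behaviour: pfns that map to offline pages are skipped.
    idle = []
    for pfn in pfns:
        page = memmap.get(pfn)
        if page is None or not page.online:
            continue
        if page_is_idle(page):
            idle.append(pfn)
    return idle
```

With a map that contains one offline page, `idle_pfns_unchecked` raises `OfflinePageAccess`, while `idle_pfns_skip_offline` simply leaves that pfn out of the result.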
[Fix]
* 92fb1db26eef "mm/page_idle.c: skip offline pages"
[Test Case]
* IBM Z or LinuxONE hardware with Ubuntu Server 18.04 (GA kernel, 4.15) installed.
* Execute a test application that tries to access offline pages.
* Or execute dbginfo.sh with having some offline (idle) pages in the system.
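A minimal sketch of such a test application is shown below. It assumes the documented layout of the idle page tracking bitmap (an array of 8-byte integers in native byte order, where the page at PFN i is bit i % 64 of element i / 64), root privileges, and a kernel built with idle page tracking. The helper names are hypothetical. On an unpatched kernel with offline pages, reading the real file may hang the machine, so this should only be run on a test system.

```python
import struct

BITMAP_PATH = "/sys/kernel/mm/page_idle/bitmap"
WORD_BYTES = 8        # the bitmap is an array of 8-byte integers
BITS_PER_WORD = 64


def pfn_location(pfn: int):
    """Return (byte offset of the 8-byte word, bit index) for a pfn."""
    return (pfn // BITS_PER_WORD) * WORD_BYTES, pfn % BITS_PER_WORD


def read_idle_word(path: str, pfn: int) -> int:
    """Read the 8-byte word that holds the idle bit for this pfn."""
    offset, _ = pfn_location(pfn)
    # Unbuffered, so exactly one aligned 8-byte read reaches the file.
    with open(path, "rb", buffering=0) as bitmap:
        bitmap.seek(offset)
        data = bitmap.read(WORD_BYTES)
    if len(data) != WORD_BYTES:
        raise EOFError(f"pfn {pfn} is beyond the end of {path}")
    return struct.unpack("=Q", data)[0]   # "=" selects native byte order


def is_idle(path: str, pfn: int) -> bool:
    """True if the idle bit for this pfn is set."""
    _, bit = pfn_location(pfn)
    return bool((read_idle_word(path, pfn) >> bit) & 1)
```

Calling `is_idle(BITMAP_PATH, pfn)` for a pfn in an offline memory range exercises the code path described above; the same functions also work on an ordinary file, which allows a dry run without touching sysfs.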
[Regression Potential]
* There is a certain regression risk, especially for bionic, since the code structure in kernel 4.15 is a bit different compared to kernel 5.4 (and newer).
* However, for newer kernels the modification is pretty safe, since it has been accepted upstream since kernel 5.8 and is therefore already included in hirsute and groovy.
* And the patch is fine (and cherry-picks cleanly) for focal as well.
* For bionic there is a slightly conflicting context, since the struct 'zone' was replaced by 'pg_data_t *pgdat' (by another commit: 92fb1db26eef), but that change (or any change to the struct zone) would not be necessary to fix the uninitialized struct page access.
* Hence the upstream commit/patch needs to be adjusted/backported to bionic 4.15, largely by replacing the line 'pg_data_t *pgdat;' with 'struct zone *zone;' (or rather, by leaving that line unchanged).
* But this needs to be considered carefully: the handling of idle pages is delicate, and a faulty backport could make things worse and break even more.
[Other]
* The patch was accepted upstream with kernel v5.8, hence it is already in groovy and hirsute.
* The upstream commit cherry picks cleanly to focal, but for bionic a backport is needed.
* Hence this kernel SRU request is for focal (cherry-pick) and bionic (backport).
__________
System hangs on dbginfo.sh script execution.
Solution:
Commit 92fb1db26eef ("mm/page_idle.c: skip offline pages")
Included upstream since kernel v5.8, so it is already included in Ubuntu 20.10, but not in 20.04 and earlier.
Commit 92fb1db26eef ("mm/page_idle.c: skip offline pages") applies cleanly on ubuntu-focal, but not on ubuntu-bionic.
Adjustment / backport for bionic should be trivial, but it is not IBM code and therefore the backport will be requested here by Canonical.
CVE References
tags: | added: architecture-s39064 bugnameltc-189321 severity-critical targetmilestone-inin20041 |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
Changed in ubuntu-z-systems: | |
importance: | Undecided → High |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
tags: |
added: severity-high removed: severity-critical |
Frank Heimes (fheimes) wrote : Re: [UBUNUT 20.04] s390: dbginfo.sh triggers kernel panic, reading from /sys/kernel/mm/page_idle/bitmap | #2 |
The commit 92fb1db26eef "mm/page_idle.c: skip offline pages" is upstream with 5.8 (rc-1) (also tagged with next-20200820).
Hence it's in groovy (since Ubuntu-5.8.0-13.14) and in hirsute (Ubuntu-
It's not in focal (also not via upstream stable updates).
Hence updating only G and H to Fix Released.
Changed in linux (Ubuntu Hirsute): | |
assignee: | Skipper Bug Screeners (skipper-screen-team) → nobody |
status: | New → Fix Released |
Changed in linux (Ubuntu Groovy): | |
status: | New → Fix Released |
Changed in ubuntu-z-systems: | |
status: | New → Triaged |
summary: |
- [UBUNUT 20.04] s390: dbginfo.sh triggers kernel panic, reading from
+ s390: dbginfo.sh triggers kernel panic, reading from /sys/kernel/mm/page_idle/bitmap |
------- Comment From <email address hidden> 2020-11-20 09:50 EDT-------
(In reply to comment #14)
> suggested bionic backport for /mm/page_idle.c
Hmm, not sure if I read this diff correctly, but it seems to remove struct zone and the spinlock, which would not be right and would introduce a new bug.
The reason why the upstream commit does not apply directly to bionic, is because of conflicting context. The "struct zone" was replaced with "pg_data_t *pgdat" by (another) commit 92fb1db26eef, but that change (or any change to the struct zone) would not be necessary to fix the uninitialized struct page access.
So, I would suggest simply replacing the line
pg_data_t *pgdat;
from the upstream commit context, with this line to fix the context for bionic
struct zone *zone;
Removing struct zone and the spinlock would certainly introduce a new bug, because the spinlock is necessary.
Frank Heimes (fheimes) wrote : | #5 |
Well, there should be no patch attached, yet.
There was just an attempt to cherry-pick but it wasn't added to this LP.
(And at least on LP there is no patch attached.)
If you accidentally received one via the BZ bridge, please ignore and remove.
Anyway, appreciate your comment ...
Default Comment by Bridge
Frank Heimes (fheimes) wrote : | #7 |
quick update on focal:
Cherry pick worked cleanly, build succeeded w/o issues and a test install didn't show any regressions so far (after some hours and doing another build on top).
The kernel test build is available here for further evaluation: https:/
bugproxy (bugproxy) wrote : | #8 |
Default Comment by Bridge
Frank Heimes (fheimes) wrote : | #9 |
Hmm, that's interesting, the BZ bridge seems to have the patch re-attached :-/ (see LP comment #6)
I removed it (however, not sure if the bridge will attach it again ...)
Frank Heimes (fheimes) wrote : | #10 |
Looks like it's fixed now - the BZ bridge stopped re-attaching the attachment ...
Frank Heimes (fheimes) wrote : | #11 |
'draft' backport for bionic - tbd
description: | updated |
Frank Heimes (fheimes) wrote : | #12 |
Kernel SRU request submitted for bionic and focal:
https:/
changing status to 'In Progress' for bionic and focal.
Changed in linux (Ubuntu Focal): | |
status: | New → In Progress |
Changed in linux (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in ubuntu-z-systems: | |
status: | Triaged → In Progress |
Frank Heimes (fheimes) wrote : | #13 |
applied to B and F trees
Changed in linux (Ubuntu Focal): | |
importance: | Undecided → Medium |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Committed |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-focal |
------- Comment From <email address hidden> 2020-12-01 09:14 EDT-------
Verified
Frank Heimes (fheimes) wrote : | #16 |
Thx Gerald, adjusting tags accordingly ...
tags: |
added: verification-done-focal removed: verification-needed-focal |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-bionic |
bugproxy (bugproxy) wrote : | #18 |
------- Comment From <email address hidden> 2020-12-02 08:27 EDT-------
Verified
Frank Heimes (fheimes) wrote : | #19 |
thank you for the verification, adjusting tag accordingly ...
tags: |
added: verification-done-bionic removed: verification-needed-bionic |
Launchpad Janitor (janitor) wrote : | #20 |
This bug was fixed in the package linux - 5.4.0-59.65
---------------
linux (5.4.0-59.65) focal; urgency=medium
* focal/linux: 5.4.0-59.65 -proposed tracker (LP: #1907604)
* focal: selftests/bpf build broken: test_map_
directory (LP: #1906866)
- SAUCE: Revert selftests/ "bpf: Zero-fill re-used per-cpu map element"
* Packaging resync (LP: #1786013)
- update dkms package versions
* memory is leaked when tasks are moved to net_prio (LP: #1886859)
- netprio_cgroup: Fix unlimited memory leak of v2 cgroups
* Focal update: v5.4.78 upstream stable release (LP: #1905618)
- drm/i915/gem: Flush coherency domains on first set-domain-ioctl
- time: Prevent undefined behaviour in timespec64_to_ns()
- nbd: don't update block size after device is started
- KVM: arm64: Force PTE mapping on fault resulting in a device mapping
- PCI: qcom: Make sure PCIe is reset before init for rev 2.1.0
- usb: dwc3: gadget: Continue to process pending requests
- usb: dwc3: gadget: Reclaim extra TRBs after request completion
- btrfs: tracepoints: output proper root owner for trace_find_
- btrfs: sysfs: init devices outside of the chunk_mutex
- btrfs: reschedule when cloning lots of extents
- ASoC: Intel: kbl_rt5663_
- genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_
- hv_balloon: disable warning when floor reached
- net: xfrm: fix a race condition during allocing spi
- ASoC: codecs: wcd9335: Set digital gain range correctly
- xfs: set xefi_discard when creating a deferred agfl free log intent item
- netfilter: use actual socket sk rather than skb sk when routing harder
- netfilter: nf_tables: missing validation from the abort path
- netfilter: ipset: Update byte and packet counters regardless of whether they
match
- powerpc/eeh_cache: Fix a possible debugfs deadlock
- perf trace: Fix segfault when trying to trace events by cgroup
- perf tools: Add missing swap for ino_generation
- ALSA: hda: prevent undefined shift in snd_hdac_
- iommu/vt-d: Fix a bug for PDP check in prq_event_thread
- afs: Fix warning due to unadvanced marshalling pointer
- can: rx-offload: don't call kfree_skb() from IRQ context
- can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ
context
- can: dev: __can_get_
frames
- can: can_create_
- can: j1939: swap addr and pgn in the send example
- can: j1939: j1939_sk_bind(): return failure if netdev is down
- can: ti_hecc: ti_hecc_probe(): add missed clk_disable_
path
- can: xilinx_can: handle failure cases of pm_runtime_get_sync
- can: peak_usb: add range checking in decode operations
- can: peak_usb: peak_usb_
- can: peak_canfd: pucan_handle_
on
- can: flexcan: remove FLEXCAN_
- c...
Changed in linux (Ubuntu Focal): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #21 |
This bug was fixed in the package linux - 4.15.0-129.132
---------------
linux (4.15.0-129.132) bionic; urgency=medium
* bionic/linux: 4.15.0-129.132 -proposed tracker (LP: #1907635)
* Packaging resync (LP: #1786013)
- update dkms package versions
* Ubuntu 18.04- call trace in kernel buffer when unloading ib_ipoib module
(LP: #1904848)
- SAUCE: net/mlx5e: IPoIB, initialize update_stat_work for ipoib devices
* memory is leaked when tasks are moved to net_prio (LP: #1886859)
- netprio_cgroup: Fix unlimited memory leak of v2 cgroups
* s390: dbginfo.sh triggers kernel panic, reading from
/sys/
- mm/page_idle.c: skip offline pages
* Bionic update: upstream stable patchset 2020-11-23 (LP: #1905333)
- drm/i915: Break up error capture compression loops with cond_resched()
- tipc: fix use-after-free in tipc_bcast_get_mode
- gianfar: Replace skb_realloc_
- gianfar: Account for Tx PTP timestamp in the skb headroom
- net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition
- sctp: Fix COMM_LOST/
- sfp: Fix error handing in sfp_probe()
- Blktrace: bail out early if block debugfs is not configured
- i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
- Fonts: Replace discarded const qualifier
- ALSA: usb-audio: Add implicit feedback quirk for Qu-16
- lib/crc32test: remove extra local_irq_
- kthread_worker: prevent queuing delayed work from timer_fn when it is being
canceled
- mm: always have io_remap_
- gfs2: Wake up when sd_glock_disposal becomes zero
- ftrace: Fix recursion check for NMI test
- ftrace: Handle tracing when switching between context
- tracing: Fix out of bounds write in get_trace_buf
- futex: Handle transient "ownerless" rtmutex state correctly
- ARM: dts: sun4i-a10: fix cpu_alert temperature
- x86/kexec: Use up-to-dated screen_info copy to fill boot params
- of: Fix reserved-memory overlap detection
- blk-cgroup: Fix memleak on error path
- blk-cgroup: Pre-allocate tree node on blkg_conf_prep
- scsi: core: Don't start concurrent async scan on same host
- vsock: use ns_capable_
- drm/vc4: drv: Add error handding for bind
- ACPI: NFIT: Fix comparison to '-ENXIO'
- vt: Disable KD_FONT_OP_COPY
- fork: fix copy_process(
- serial: 8250_mtk: Fix uart_get_baud_rate warning
- serial: txx9: add missing platform_
serial_
- USB: serial: cyberjack: fix write-URB completion race
- USB: serial: option: add Quectel EC200T module support
- USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
- USB: serial: option: add Telit FN980 composition 0x1055
- USB: Add NO_LPM quirk for Kingston flash drive
- usb: mtu3: fix panic in mtu3_gadget_stop()
- ARC: stack unwinding: avoid indefinite looping
- Revert "ARC: entry: fix potential EFA c...
Changed in linux (Ubuntu Bionic): | |
status: | Fix Committed → Fix Released |
Changed in ubuntu-z-systems: | |
status: | Fix Committed → Fix Released |
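The release versions mentioned in this report can be turned into a quick check of whether a given kernel package version is expected to carry the fix. This is a simplified sketch: the table only lists the versions named above, the helper names are hypothetical, and the comparison assumes plain `X.Y.Z-ABI.UPLOAD` version strings (suffixed or derivative kernel versions are rejected rather than guessed at).

```python
import re

# First versions named in this report as containing the fix, per base version.
FIXED_IN = {
    "4.15.0": (129, 132),  # bionic
    "5.4.0": (59, 65),     # focal
    "5.8.0": (13, 14),     # groovy
}

VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)-(\d+)\.(\d+)")


def parse_version(version: str):
    """Split 'X.Y.Z-ABI.UPLOAD' into ('X.Y.Z', (ABI, UPLOAD))."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    return match.group(1), (int(match.group(2)), int(match.group(3)))


def has_fix(version: str) -> bool:
    """True if the version is at or past the fixed version for its base."""
    base, abi_upload = parse_version(version)
    if base not in FIXED_IN:
        raise KeyError(f"no fixed version known for base {base}")
    return abi_upload >= FIXED_IN[base]
```

For example, `has_fix("5.4.0-59.65")` is true, while `has_fix("4.15.0-128.131")` is false.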
------- Comment From <email address hidden> 2020-11-19 10:17 EDT-------
Reduced importance from "ship issue" to "high": this is not a real ship issue, but it must be fixed within the service stream.