Groovy kernel (5.8.0-1004-aws) creates broken /dev/console on i3.metal instances

Bug #1896604 reported by Paride Legovini
This bug affects 1 person
Affects             Status        Importance  Assigned to      Milestone
cloud-images        Fix Released  Undecided   Unassigned
cloud-init          Fix Released  Undecided   Paride Legovini
linux-aws (Ubuntu)  Fix Released  Undecided   Andrea Righi

Bug Description

[Impact]

Starting with kernel 5.8, the default nr_uarts was changed from 4 to 32 for amd64. This affects i3.metal instances in AWS: ttyS0 is now remapped to ttyS4, which breaks tools like cloud-init (and probably others).

[Test case]

# echo > /dev/console
bash: echo: write error: Input/output error
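
The test case above can be wrapped in a small helper for scripted checks. This is only a sketch: check_write is a hypothetical name, not part of cloud-init or any other tool mentioned in this report, and it assumes bash, as in the transcript.

```shell
#!/bin/bash
# check_write is a hypothetical helper (not from this bug report).
# It attempts a one-byte write to a device node and reports the result.
check_write() {
    local target="${1:-/dev/console}"
    # Errors from both opening the node and writing to it are silenced
    # here and reflected only in the exit status.
    if { printf '\n' > "$target"; } 2>/dev/null; then
        echo "OK: write to $target succeeded"
        return 0
    else
        echo "FAIL: write to $target failed"
        return 1
    fi
}
```

Run as root, `check_write /dev/console` would be expected to report FAIL on an affected kernel and OK on a fixed one.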

[Fix]

Setting nr_uarts=4 by default (via CONFIG_SERIAL_8250_RUNTIME_UARTS) restores the previous behavior and writing to /dev/console works without returning any error.
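
To confirm which default a given kernel was built with, the config file that Ubuntu installs under /boot can be inspected. A minimal sketch follows; runtime_uarts_default is a hypothetical helper name, and the default path assumes the usual /boot/config-<release> layout.

```shell
#!/bin/bash
# runtime_uarts_default is a hypothetical helper (not from this bug report).
# It prints the build-time default for nr_uarts from a kernel config file.
runtime_uarts_default() {
    local config="${1:-/boot/config-$(uname -r)}"
    # Print only the value after the '=' sign; print nothing if the option is absent.
    sed -n 's/^CONFIG_SERIAL_8250_RUNTIME_UARTS=//p' "$config"
}
```

The value in effect on a running system should also be readable from /sys/module/8250/parameters/nr_uarts, assuming the 8250 driver exposes the parameter there.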

[Regression potential]

Minimal. Restores the old behavior used in 5.4 (that shouldn't have changed in the first place).

[Original bug report]

Hi,

When running Groovy daily images on i3.metal instances a broken /dev/console is created. The char device appears to be writable but writing to it causes an Input/output error. This is breaking cloud-init, as it tries to log to /dev/console, and is likely to break other programs.

On Focal:

root@ip-172-31-24-163:~# ls -l /dev/console
crw------- 1 root root 5, 1 Sep 21 16:07 /dev/console
root@ip-172-31-24-163:~# echo x > /dev/console
root@ip-172-31-24-163:~#

On Groovy:

root@ip-172-31-20-184:~# ls -l /dev/console
crw--w---- 1 root tty 5, 1 Sep 21 16:03 /dev/console
root@ip-172-31-20-184:~# echo x > /dev/console
bash: echo: write error: Input/output error

The Groovy kernel log has a

[ 3.561696] fbcon: Taking over console

line in it, which is not present in the Focal kernel log (5.4.0-1024-aws). Perhaps fbcon should be prevented from taking over console?

Paride Legovini (paride)
summary: - Groovy kernel (5.8.0-1004-aws) created broken /dev/console on i3.metal
+ Groovy kernel (5.8.0-1004-aws) creates broken /dev/console on i3.metal
instances
Andrea Righi (arighi)
Changed in linux-aws (Ubuntu):
assignee: nobody → Andrea Righi (arighi)
Andrea Righi (arighi) wrote :

Adding some details about this issue. It looks like the real problem is the serial driver. With a 5.4 kernel we can see the following in dmesg:

[ 4.991325] 0000:16:00.0: ttyS0 at MMIO 0xc5a00000 (irq = 85, base_baud = 115200) is a 16550A

With the 5.8 kernel we don't see any message about ttyS0 at all, meaning that the serial port isn't properly recognized.
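
A quick way to compare the two kernels is to list the serial ports each one registered. The sketch below reads a kernel log on stdin and matches the "ttySN at MMIO/I/O" message format shown above; list_serial_ports is a hypothetical helper name, and grep -o is assumed to be available (GNU grep has it).

```shell
#!/bin/bash
# list_serial_ports is a hypothetical helper (not from this bug report).
# It reads a kernel log on stdin and prints each registered ttyS port
# together with the address it was found at.
list_serial_ports() {
    grep -oE 'ttyS[0-9]+ at (MMIO|I/O) 0x[0-9a-f]+' | sort -u
}
```

Typical use would be `dmesg | list_serial_ports`, run on both the 5.4 and the 5.8 kernel.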

A temporary workaround could be to remove console=ttyS0 from the kernel boot parameters. This would probably make cloud-init happy, but it is obviously not the right solution.

I'll investigate more to find the exact commit that introduced this regression.

Thanks Paride for helping me reproduce and test this problem!

Paride Legovini (paride) wrote :

Thanks Andrea for looking into this.

Added a cloud-init task for tracking.

no longer affects: cloud-init (Ubuntu)
Changed in cloud-init:
status: New → Triaged
assignee: nobody → Paride Legovini (paride)
Andrea Righi (arighi) wrote :

The cause of this problem is that in 5.8 the default value of nr_uarts was changed from 4 to 32. This causes ttyS0 to be remapped to ttyS4, breaking user space.

The solution is to set the number of UARTs back to 4. I booted the kernel with 8250.nr_uarts=4 added to the kernel boot parameters in GRUB, and /dev/console now works correctly.
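
For anyone reproducing this workaround: on Ubuntu the parameter is normally added to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by update-grub and a reboot (cloud images may set the command line in a drop-in under /etc/default/grub.d instead, so treat the exact file as an assumption). After rebooting, a sketch like the following can confirm that the parameter reached the kernel; cmdline_nr_uarts is a hypothetical helper name.

```shell
#!/bin/bash
# cmdline_nr_uarts is a hypothetical helper (not from this bug report).
# It prints the value of 8250.nr_uarts from a kernel command line file,
# or nothing if the parameter is not set.
cmdline_nr_uarts() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one parameter per line, then extract the value.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^8250\.nr_uarts=//p'
}
```

With no argument the helper reads /proc/cmdline, so empty output means the running kernel was booted without the override.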

I'll send a fix that restores the previous default in the kernel and avoids breaking user space.

Andrea Righi (arighi)
description: updated
Paride Legovini (paride)
Changed in linux-aws (Ubuntu):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.8.0-1007.7

---------------
linux-aws (5.8.0-1007.7) groovy; urgency=medium

  * groovy/linux-aws: 5.8.0-1007.7 -proposed tracker (LP: #1898143)

  * Groovy kernel (5.8.0-1004-aws) creates broken /dev/console on i3.metal
    instances (LP: #1896604)
    - [Config] [aws] set default nr_uarts back to 4 on amd64

  * Miscellaneous Ubuntu changes
    - [Config] toolchain update

  [ Ubuntu: 5.8.0-21.22 ]

  * groovy/linux: 5.8.0-21.22 -proposed tracker (LP: #1898150)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * Fix broken e1000e device after S3 (LP: #1897755)
    - SAUCE: e1000e: Increase polling timeout on MDIC ready bit
  * EFA: add support for 0xefa1 devices (LP: #1896791)
    - RDMA/efa: Expose maximum TX doorbell batch
    - RDMA/efa: Expose minimum SQ size
    - RDMA/efa: User/kernel compatibility handshake mechanism
    - RDMA/efa: Add EFA 0xefa1 PCI ID
  * Groovy update: v5.8.13 upstream stable release (LP: #1898076)
    - device_cgroup: Fix RCU list debugging warning
    - ASoC: pcm3168a: ignore 0 Hz settings
    - ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
    - ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
    - ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
    - clk: versatile: Add of_node_put() before return statement
    - RISC-V: Take text_mutex in ftrace_init_nop()
    - i2c: aspeed: Mask IRQ status to relevant bits
    - s390/init: add missing __init annotations
    - lockdep: fix order in trace_hardirqs_off_caller()
    - EDAC/ghes: Check whether the driver is on the safe list correctly
    - drm/amdkfd: fix a memory leak issue
    - drm/amd/display: Don't use DRM_ERROR() for DTM add topology
    - drm/amd/display: update nv1x stutter latencies
    - drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is
    - drm/amd/display: Don't log hdcp module warnings in dmesg
    - objtool: Fix noreturn detection for ignored functions
    - i2c: mediatek: Send i2c master code at more than 1MHz
    - riscv: Fix Kendryte K210 device tree
    - ieee802154: fix one possible memleak in ca8210_dev_com_init
    - ieee802154/adf7242: check status of adf7242_read_reg
    - clocksource/drivers/h8300_timer8: Fix wrong return value in
      h8300_8timer_init()
    - batman-adv: bla: fix type misuse for backbone_gw hash indexing
    - libbpf: Fix build failure from uninitialized variable warning
    - atm: eni: fix the missed pci_disable_device() for eni_init_one()
    - batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
    - netfilter: ctnetlink: add a range check for l3/l4 protonum
    - netfilter: ctnetlink: fix mark based dump filtering regression
    - netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled
    - netfilter: nft_meta: use socket user_ns to retrieve skuid and skgid
    - mac802154: tx: fix use-after-free
    - bpf: Fix clobbering of r2 in bpf_gen_ld_abs
    - tools/libbpf: Avoid counting local symbols in ABI check
    - drm/vc4/vc4_hdmi: fill ASoC card owner
    - net: qed: Disable aRFS for NPAR and 100G
    - net: qede: Disable aRFS for NPA...

Changed in linux-aws (Ubuntu):
status: Fix Committed → Fix Released
Paride Legovini (paride)
Changed in cloud-init:
status: Triaged → Fix Released
Changed in cloud-images:
status: New → Fix Committed
Joshua Powers (powersj)
Changed in cloud-images:
status: Fix Committed → Fix Released
James Falcon (falcojr) wrote :