lots of printk to serial console can hang system for long time
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Medium | Dan Streetman | |
Xenial | Fix Released | Medium | Dan Streetman | |
Bug Description
This is a clone from bug 1505564, to track the separate issue of the serial port driver failing to schedule itself off its cpu.
The original bug's problem was caused by the kernel spamming a huge number of error messages in a certain situation. Normally, that would not be a problem, but in this case the system is virtualized, and logs over its serial port. When the massive number of kernel messages is sent to the serial port driver, it can't keep up, so sending all the log messages can take a very long time - minutes or longer - and the serial port driver fails to schedule itself off the cpu it's using during that time. That results in other cpus hanging, waiting for the serial port driver's cpu to become available.
I'll update the bug with more details as I debug.
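To give a rough sense of why draining the log can take "minutes or longer", here is a back-of-the-envelope sketch. The numbers are assumptions for illustration only, not measurements from this bug: a 115200 baud console with 8N1 framing (10 bits on the wire per byte), and a hypothetical burst of 100,000 messages of 80 bytes each. A virtual or BMC-emulated serial port may behave quite differently from a physical UART.

```python
def drain_seconds(num_messages: int, bytes_per_message: int,
                  baud: int = 115200, bits_per_byte: int = 10) -> float:
    """Estimate how long a serial console needs to emit a burst of log messages.

    All parameters are illustrative assumptions: 115200 baud and 10 bits per
    byte (8 data bits plus start and stop bits) are common defaults, not
    values taken from the affected systems.
    """
    bytes_per_second = baud / bits_per_byte
    return num_messages * bytes_per_message / bytes_per_second


if __name__ == "__main__":
    secs = drain_seconds(100_000, 80)
    print(f"{secs:.0f} s (~{secs / 60:.1f} min)")  # 694 s (~11.6 min)
```

If the driver does not give up its cpu while it works through a backlog of that size, anything waiting on that cpu stalls for the same length of time.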
Changed in linux (Ubuntu): | |
assignee: | nobody → Dan Streetman (ddstreet) |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
summary: |
- serial port driver under heavy load can hang system
+ lots of printk to serial console can hang system for long time |
I don't believe we've ever seen this in VMs. The main place excessive kernel output has been problematic is on scalingstack compute nodes, which are mostly HP ProLiant DL360p Gen8s (AMD64) and HP ProLiant m400s (aka. mcdivitt, ARMv8). Both have a virtual serial port exposed by the BMC or chassis, and kernel messages go to that. But the machines run VMs -- they're not themselves virtualised.
In the DL360p case we get CPU hangs, while in the m400 case we get excessive memory consumption.