Stack trace booting 20.04 LTS server on system with dual Xeon Gold 6240 CPUs
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
I noticed this in syslog today while investigating an unrelated issue. I have Focal installed on a Fujitsu RX2530 M5 server with two Xeon Gold 6240 (18c/36t) CPUs. Every reboot produces the following MSR stack trace:
Dec 3 17:34:31 nabbit kernel: [ 0.002463] smpboot: CPU 18 Converting physical 0 to logical die 1
Dec 3 17:34:31 nabbit kernel: [ 0.002463] unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff81c78b04 (native_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] Call Trace:
Dec 3 17:34:31 nabbit kernel: [ 0.002463] ? intel_pmu_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] ? x86_pmu_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] x86_pmu_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] cpuhp_invoke_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] notify_
Dec 3 17:34:31 nabbit kernel: [ 0.002463] start_secondary
Dec 3 17:34:31 nabbit kernel: [ 0.002463] secondary_
Dec 3 17:34:31 nabbit kernel: [ 0.498575] #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
Dec 3 17:34:31 nabbit kernel: [ 0.618576] .... node #0, CPUs: #36
Dec 3 17:34:31 nabbit kernel: [ 0.623308] MDS CPU bug present and SMT on, data leak possible. See https:/
Dec 3 17:34:31 nabbit kernel: [ 0.623308] TAA CPU bug present and SMT on, data leak possible. See https:/
Dec 3 17:34:31 nabbit kernel: [ 0.623308] #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53
Dec 3 17:34:31 nabbit kernel: [ 0.672450] .... node #1, CPUs: #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smp: Brought up 2 nodes, 72 CPUs
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smpboot: Max logical packages: 2
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smpboot: Total of 72 processors activated (374479.29 BogoMIPS)
It doesn't seem to be catastrophic, but it is troubling to find this in the logs.
On a different Fujitsu server (RX2540 M5) with 2x Xeon Gold 6242 CPUs (16c/32t), this trace is not present, so it could point to something specific to this particular machine or this particular CPU model.
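For anyone who wants to check whether another machine logs the same error, here is a minimal sketch (not part of my original investigation) that scans kernel log text for the "unchecked MSR access error" line and extracts the operation, MSR address, and attempted value. The function name `find_msr_errors` and the regular expression are my own assumptions, modeled on the line format shown above; the exact message wording may differ between kernel versions.

```python
import re

# Pattern modeled on the log line in this report (assumption: other kernel
# versions may word the message differently).
MSR_ERROR_RE = re.compile(
    r"unchecked MSR access error: (?P<op>WRMSR|RDMSR) (?:to|from) "
    r"(?P<msr>0x[0-9a-fA-F]+)"
    r"(?: \(tried to write (?P<value>0x[0-9a-fA-F]+)\))?"
)


def find_msr_errors(log_text):
    """Return a list of (op, msr, value) tuples found in log_text.

    msr and value are integers; value is None when the line does not
    report an attempted write value.
    """
    results = []
    for line in log_text.splitlines():
        match = MSR_ERROR_RE.search(line)
        if match is None:
            continue
        value = match.group("value")
        results.append((
            match.group("op"),
            int(match.group("msr"), 16),
            int(value, 16) if value is not None else None,
        ))
    return results


if __name__ == "__main__":
    # Line copied from the trace above (truncated in the same place).
    sample = (
        "Dec 3 17:34:31 nabbit kernel: [ 0.002463] unchecked MSR access "
        "error: WRMSR to 0x10f (tried to write 0x0000000000000000) "
        "at rIP: 0xffffffff81c78b04 (native_"
    )
    for op, msr, value in find_msr_errors(sample):
        print(op, hex(msr), value)
```

To compare two machines, the same function could be run over the text of each machine's syslog; an empty list would mean no such line was found in that log.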
Here is the smp boot from the non-failing machine:
Dec 2 16:02:56 polari kernel: [ 1.522346] smpboot: CPU0: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (family: 0x6, model: 0x55, stepping: 0x5)
Dec 2 16:02:56 polari kernel: [ 1.522575] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
Dec 2 16:02:56 polari kernel: [ 1.522584] ... version: 4
Dec 2 16:02:56 polari kernel: [ 1.522585] ... bit width: 48
Dec 2 16:02:56 polari kernel: [ 1.522587] ... generic registers: 4
Dec 2 16:02:56 polari kernel: [ 1.522588] ... value mask: 0000ffffffffffff
Dec 2 16:02:56 polari kernel: [ 1.522589] ... max period: 00007fffffffffff
Dec 2 16:02:56 polari kernel: [ 1.522591] ... fixed-purpose events: 3
Dec 2 16:02:56 polari kernel: [ 1.522592] ... event mask: 000000070000000f
Dec 2 16:02:56 polari kernel: [ 1.522665] rcu: Hierarchical SRCU implementation.
Dec 2 16:02:56 polari kernel: [ 1.524965] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Dec 2 16:02:56 polari kernel: [ 1.525875] smp: Bringing up secondary CPUs ...
Dec 2 16:02:56 polari kernel: [ 1.525990] x86: Booting SMP configuration:
Dec 2 16:02:56 polari kernel: [ 1.525992] .... node #0, CPUs: #1 #2 #3
Dec 2 16:02:56 polari kernel: [ 1.533485] .... node #1, CPUs: #4 #5 #6 #7
Dec 2 16:02:56 polari kernel: [ 1.543960] .... node #0, CPUs: #8 #9 #10 #11
Dec 2 16:02:56 polari kernel: [ 1.553544] .... node #1, CPUs: #12 #13 #14 #15
Dec 2 16:02:56 polari kernel: [ 1.564701] .... node #2, CPUs: #16
Dec 2 16:02:56 polari kernel: [ 0.002176] smpboot: CPU 16 Converting physical 0 to logical die 1
Dec 2 16:02:56 polari kernel: [ 1.651254] #17 #18 #19
Dec 2 16:02:56 polari kernel: [ 1.659278] .... node #3, CPUs: #20 #21 #22 #23
Dec 2 16:02:56 polari kernel: [ 1.669669] .... node #2, CPUs: #24 #25 #26 #27
Dec 2 16:02:56 polari kernel: [ 1.680637] .... node #3, CPUs: #28 #29 #30 #31
Dec 2 16:02:56 polari kernel: [ 1.691394] .... node #0, CPUs: #32
Dec 2 16:02:56 polari kernel: [ 1.693845] MDS CPU bug present and SMT on, data leak possible. See https:/
Dec 2 16:02:56 polari kernel: [ 1.693845] TAA CPU bug present and SMT on, data leak possible. See https:/
Dec 2 16:02:56 polari kernel: [ 1.693845] #33 #34 #35
Dec 2 16:02:56 polari kernel: [ 1.701504] .... node #1, CPUs: #36 #37 #38 #39
Dec 2 16:02:56 polari kernel: [ 1.712687] .... node #0, CPUs: #40 #41 #42 #43
Dec 2 16:02:56 polari kernel: [ 1.723263] .... node #1, CPUs: #44 #45 #46 #47
Dec 2 16:02:56 polari kernel: [ 1.733658] .... node #2, CPUs: #48 #49 #50 #51
Dec 2 16:02:56 polari kernel: [ 1.744372] .... node #3, CPUs: #52 #53 #54 #55
Dec 2 16:02:56 polari kernel: [ 1.755243] .... node #2, CPUs: #56 #57 #58 #59
Dec 2 16:02:56 polari kernel: [ 1.765640] .... node #3, CPUs: #60 #61 #62 #63
Dec 2 16:02:56 polari kernel: [ 1.776965] smp: Brought up 4 nodes, 64 CPUs
Dec 2 16:02:56 polari kernel: [ 1.776965] smpboot: Max logical packages: 2
Dec 2 16:02:56 polari kernel: [ 1.776965] smpboot: Total of 64 processors activated (358464.56 BogoMIPS)
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-
ProcVersionSign
Uname: Linux 5.4.0-56-generic x86_64
NonfreeKernelMo
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Dec 3 20:02 seq
crw-rw---- 1 root audio 116, 33 Dec 3 20:02 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckR
Date: Thu Dec 3 20:15:56 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 0424:2533 Microchip Technology, Inc. (formerly SMSC)
Bus 001 Device 004: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
Bus 001 Device 003: ID 046b:ff01 American Megatrends, Inc. Virtual Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: FUJITSU PRIMERGY RX2530 M5
PciMultimedia:
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.187.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/17/2019
dmi.bios.vendor: FUJITSU // American Megatrends Inc.
dmi.bios.version: V5.0.0.14 R1.15.0 for D3383-B1x
dmi.board.name: D3383-B1
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D3383-B13 WGS04 GS01
dmi.chassis.
dmi.chassis.type: 23
dmi.chassis.vendor: FUJITSU
dmi.chassis.
dmi.modalias: dmi:bvnFUJITSU/
dmi.product.family: SERVER
dmi.product.name: PRIMERGY RX2530 M5
dmi.product.sku: S26361-K1659-Vxxx
dmi.sys.vendor: FUJITSU