Comment 18 for bug 2026790

beadon (bryant-eadon) wrote :

This appears to be a red herring: VirtualBox apparently experiences random guest crashes when split lock detection is enabled on the host OS. (I am able to verify this.)

Reference:
"Random crashes with Windows 10 guest operating system with Intel Tiger Lake chipset"

CPU is:
https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i71185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html

11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
(yes, it's Tiger Lake).

The reference in the VirtualBox ticketing system seems to have been deleted, but it appears that VirtualBox is taking split locks (locked atomic operations on data that spans two cache lines).

Reference to the much longer discussion of what this is and how it's impacting systems is here:
https://lwn.net/Articles/911219/

It seems the kernel applies a penalty to code that takes split locks. The penalty is tunable.
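
For anyone wanting to check how their host is currently set up: as far as I know the relevant knobs are the `split_lock_detect=` kernel boot parameter and, on newer kernels, the `kernel.split_lock_mitigate` sysctl. Below is a minimal Python sketch that reads both. The function names are made up for illustration, the sysctl file may not exist on older kernels, and returning the last occurrence of a repeated boot parameter is an assumption about which one the kernel honours.

```python
from pathlib import Path
from typing import Optional


def split_lock_detect_mode(cmdline: str) -> Optional[str]:
    """Return the value of split_lock_detect= from a kernel command line.

    Returns None when the parameter is absent (the kernel then uses its
    built-in default). If the parameter is repeated, the last occurrence
    is returned; that choice is an assumption, not verified behaviour.
    """
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "split_lock_detect" and sep:
            mode = value
    return mode


def read_split_lock_state(proc: Path = Path("/proc")) -> dict:
    """Collect the split lock settings visible under /proc on this host."""
    cmdline = (proc / "cmdline").read_text()
    # Only present on kernels that expose the sysctl.
    mitigate = proc / "sys" / "kernel" / "split_lock_mitigate"
    return {
        "split_lock_detect": split_lock_detect_mode(cmdline),
        "split_lock_mitigate": mitigate.read_text().strip() if mitigate.exists() else None,
    }
```

Calling `read_split_lock_state()` on the host returns a small dict; a `None` for `split_lock_detect` just means nothing was passed on the kernel command line.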

Deeper discussion of how Intel and the Linux kernel team were involved is here:
https://www.virtualbox.org/ticket/20180?cversion=0&cnum_hist=4

Here are the split lock errors again; note the gnome-shell errors just before:

2023-07-11T16:01:26.427798-07:00 semiauto gnome-character[5478]: JS LOG: Characters Application exiting
2023-07-11T16:01:29.507946-07:00 semiauto gnome-shell[2700]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55c018ffa2a0] is on because it needs an allocation.
2023-07-11T16:01:29.508086-07:00 semiauto gnome-shell[2700]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55c01daf3ce0] is on because it needs an allocation.
2023-07-11T16:01:29.601935-07:00 semiauto gnome-shell[2700]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x55c01ab19670] is on because it needs an allocation.
2023-07-11T16:01:29.602058-07:00 semiauto gnome-shell[2700]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x55c018212df0] is on because it needs an allocation.
2023-07-11T16:01:37.587046-07:00 semiauto kernel: [ 2385.543465] x86/split lock detection: #AC: EMT-0/5871 took a split_lock trap at address: 0x7fd78aa96453
2023-07-11T16:01:38.079070-07:00 semiauto kernel: [ 2386.033592] x86/split lock detection: #AC: EMT-1/5872 took a split_lock trap at address: 0x7fd78aa96453
2023-07-11T16:01:39.723080-07:00 semiauto kernel: [ 2387.676986] x86/split lock detection: #AC: EMT-2/5873 took a split_lock trap at address: 0x7fd78aa96453
2023-07-11T16:01:45.484475-07:00 semiauto systemd[1]: systemd-timedated.service: Deactivated successfully.
2023-07-11T16:02:22.331078-07:00 semiauto kernel: [ 2430.285901] pcieport 0000:00:07.2: pciehp: Slot(0-1): Link Down
2023-07-11T16:02:22.331102-07:00 semiauto kernel: [ 2430.285908] pcieport 0000:00:07.2: pciehp: Slot(0-1): Card not present

The split lock penalty triggered by VirtualBox seems to trip some kind of timer, which causes the link to the eGPU to be reset. That, of course, causes all kinds of havoc on the system.
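
To check whether the traps and the link resets really cluster together over a longer log, something like the following Python sketch could be run over the syslog. It is only a rough aid, not a diagnosis: the function name `correlate` is hypothetical, and the patterns are taken from the log lines quoted above, so they assume this exact message format.

```python
import re

# Kernel monotonic timestamp and message, e.g. "kernel: [ 2385.543465] ..."
KERNEL_TS = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)")
# Thread name and pid as printed in the split lock warning, e.g. "EMT-0/5871"
SPLIT_LOCK = re.compile(
    r"x86/split lock detection: #AC: (\S+)/(\d+) took a split_lock trap"
)


def correlate(lines):
    """Pair each pciehp "Link Down" with the time since the last split lock trap.

    Returns (traps, link_downs) where traps is a list of
    (timestamp, thread_name, pid) and link_downs is a list of
    (timestamp, seconds_since_last_trap or None).
    """
    traps = []
    link_downs = []
    for line in lines:
        m = KERNEL_TS.search(line)
        if not m:
            continue  # not a kernel line with a monotonic timestamp
        ts, msg = float(m.group(1)), m.group(2)
        trap = SPLIT_LOCK.search(msg)
        if trap:
            traps.append((ts, trap.group(1), int(trap.group(2))))
        elif "pciehp" in msg and "Link Down" in msg:
            gap = ts - traps[-1][0] if traps else None
            link_downs.append((ts, gap))
    return traps, link_downs
```

Feeding it the lines of the syslog file (for example `correlate(open(path))`, with `path` pointing at wherever the log lives) gives the gap in seconds between each link-down and the most recent trap.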

Further logs:
2023-07-11T16:07:55.895042-07:00 semiauto kernel: [ 2763.846900] thunderbolt 0000:00:0d.2: failed to send driver ready to ICM

So, the long way around: there are a lot of things interacting here. I am unsure how to get a pure Xorg workaround in place to use the extra GPU. I am also a little concerned that any application taking split locks and incurring the kernel penalty will cause this setup to crash because of link detection.

Unsure where to go next. I suppose I'll try disabling split lock detection.