Running Oneiric kernel as Xen HVM guest with pvlocks hangs on boot

Bug #838026 reported by Stefan Bader
This bug affects 1 person
Affects            Status   Importance   Assigned to    Milestone
linux (Ubuntu)              High         Stefan Bader
  Oneiric                   High         Stefan Bader

Bug Description

Since xen-4.1.1, the hypervisor supports callback vectors. This causes newer kernels (2.6.39 or later) to use IPIs and spinlocks in a paravirtualized way. However, such a kernel hangs on boot. With only the pv spinlocks disabled, the boot succeeds.
Additional debugging shows that floppy_lock is released through the pv method but never seems to have been taken that way. The contents of the lock look as if it was taken with the default ticket_spinlock method.

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → High
Andy Whitcroft (apw) wrote :

From the raw crash dump, here is the lock/set_next_request/unlock sequence. Note that
the locking primitives have been rewritten as direct calls where they were inlined (for the unlock):

ffffffffa0020df0: 00 e9 ed fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 ......f.........
ffffffffa0020e00: 48 c7 c7 f8 5e 02 a0
    6e00: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
                                        e8 74 c3 5d e1
    6e07: e8 00 00 00 00 callq 6e0c <redo_fd_request+0x17c>
FFFFFFFFA0020E0C + FFFFFFFFE15DC374 => FFFFFFFF815FD180
ffffffff815fd180 (T) _raw_spin_lock_irq
                                                       e8 cf a7 ff H...^...t.].....
ffffffffa0020e10: ff
    6e0c: e8 cf a7 ff ff callq 15e0 <set_next_request>
ffffffffa0020e10: 4c 89 e7
    6e11: 4c 89 e7 mov %r12,%rdi
ffffffffa0020e10: 89 c3
    6e14: 89 c3 mov %eax,%ebx
ffffffffa0020e10: e8 25 83 fe e0
                                                    66 90 fb 66 66 .L.....%...f..ff
    6e16: ff 14 25 00 00 00 00 callq *0x0
NOTE: this has been rewritten as a callq
FFFFFFFFA0020E1B + FFFFFFFFE0FE8325 => FFFFFFFF81009140
ffffffff81009140 (t) xen_spin_unlock
ffffffffa0020e20: 90 66 66 90 85 db 0f 84 1c 01 00 00 48 8b 05 bd .ff.........H...

Looking at the out-of-line lock, it has been patched to call the ticket version:

0xffffffff815fd180 <_raw_spin_lock_irq+0>: push %rbp
0xffffffff815fd181 <_raw_spin_lock_irq+1>: mov %rsp,%rbp
0xffffffff815fd184 <_raw_spin_lock_irq+4>: xchg %ax,%ax
0xffffffff815fd189 <_raw_spin_lock_irq+9>: cli
0xffffffff815fd18a <_raw_spin_lock_irq+10>: xchg %ax,%ax
0xffffffff815fd18d <_raw_spin_lock_irq+13>: xchg %ax,%ax
0xffffffff815fd190 <_raw_spin_lock_irq+16>: callq 0xffffffff81033900 <__ticket_spin_lock>
0xffffffff815fd195 <_raw_spin_lock_irq+21>: xchg %ax,%ax
0xffffffff815fd197 <_raw_spin_lock_irq+23>: pop %rbp
0xffffffff815fd198 <_raw_spin_lock_irq+24>: retq

This proves smb's conjecture: we lock with the ticket lock and unlock with the Xen paravirt lock. As the lock and unlock do not use the same form, explosions are inevitable.

Stefan Bader (smb) wrote :

After much joy with this, I thought I would post it to a bigger audience. After migrating to Xen 4.1.1, booting HVM guests ran into several issues. Some were related to interrupts not being set up correctly (for which Stefano has posted patches), but even with those patches 3.0 guests hang for me, while 2.6.38 and older kernels were OK.

After digging deeply into this, I think I have found the issue. If that is true, however, anybody for whom pv spinlocks in HVM worked was rather lucky.

The xen_hvm_smp_init() call changes the smp_ops hooks, one of which is smp_prepare_cpus. This is done in start_kernel, and at that point pv_lock_ops has not been changed and still points to the ticket versions. Later in start_kernel, check_bugs is called, and part of that takes the pv_lock_ops and patches the kernel with the correct jumps.
_After_ that, kernel_init is called, which in turn calls smp_prepare_cpus. That now changes pv_lock_ops again *but does not* run any patching again. So the _raw_spin_*lock calls still use the ticket versions.

start_kernel
  setup_arch -> xen_hvm_smp_init (set smp_ops)
  ...
  check_bugs -> alternative_instructions (patches pv_locks sites)
  ...
  rest_init (triggers kernel_init as a thread)
    kernel_init
      ...
      smp_prepare_cpus (calls xen_init_spinlocks through smp_ops hook)

To make things even more fun, some spinlock functions can be forced to be inlined while others are not (CONFIG_INLINE_SPIN_*). In our particular case, only two variants of spin_unlock were inlined. Put that together in a pot with modules, shake well, and there is the fun. Basically, at module load time the non-inlined calls still point to the unmodified ticket implementation (main kernel), but the inlined unlock calls get switched to the Xen versions because the loaded module gets patched against the current pv_lock_ops. One can imagine that this does not work too well.

Anyway, even without the secondary issue, I am sure that replacing the functions in pv_lock_ops without actually patching the spinlock call sites is not the intended behaviour.

Unfortunately, I have not yet been able to make it work. Any attempt to call xen_init_spinlocks before check_bugs, or to add a call to alternative_instructions, results in another hang on boot. At least the latter method produces a more usable dump for crash, which shows that one spinlock was taken (slow path) and two were spuriously taken (this gives me more to play with).

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Oneiric):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-10.16

---------------
linux (3.0.0-10.16) oneiric; urgency=low

  [ Andy Whitcroft ]

  * Revert "ubuntu: compcache -- follow changes to bd_claim/bd_release"
    - LP: #832694
  * Revert "ubuntu: compcache -- version 0.5.3"
    - LP: #832694
  * [Config] dropping compcache configuration options

  [ David Henningsson ]

  * SAUCE: ALSA: HDA: hdmi: Emit pcm device index for jack input devices

  [ Kees Cook ]

  * [Config] enable and enforce SECCOMP_FILTER on x86

  [ Leann Ogasawara ]

  * [Config] Update CONFIG_EFI_VARS enforcer check
  * [Config] Enable CONFIG_ECHO=m on powerpc
  * [Config] Enable CONFIG_ET131X=m on powerpc
  * [Config] Set CONFIG_FB_MATROX=m
  * [Config] Enable CONFIG_FB_UDL=m on powerpc
  * [Config] Set CONFIG_FB_VIRTUAL=n
  * [Config] Enable CONFIG_FB_VGA16=m on powerpc
  * [Config] Enable CONFIG_GPIO_MAX732X=m on arm
  * [Config] Enable CONFIG_GPIO_PCF857X=m on arm
  * [Config] Set CONFIG_HOTPLUG_PCI_FAKE=m
  * [Config] Enable CONFIG_HOTPLUG_PCI=y on powerpc
  * [Config] Enable CONFIG_HOTPLUG_PCI_CPCI=y on powerpc
  * [Config] Enable CONFIG_HP_ILO=m on powerpc-smp
  * [Config] Enable CONFIG_I2C_PASEMI=m on powerpc
  * [Config] Enable CONFIG_IBM_BSR=m on powerpc
  * [Config] Enable CONFIG_IBMVETH=m on powerpc
  * [Config] Enable CONFIG_IDE_PHISON=m on powerpc
  * [Config] Enable CONFIG_IGB=m on powerpc
  * [Config] Enable CONFIG_IIO=m on powerpc
  * [Config] Enable CONFIG_INFINIBAND_NES=m
  * [Config] Enable CONFIG_IPMI_HANDLER=m on arm
  * [Config] Enable CONFIG_IWL3945=m on powerpc
  * [Config] Disable CONFIG_KVM_BOOK3S_64
  * [Config] Enable CONFIG_LAPBETHER=m on arm
  * [Config] Enable CONFIG_LEDS_GPIO=m on powerpc
  * [Config] Enable CONFIG_LEDS_CLEVO_MAIL=m all arch's
  * [Config] Enable CONFIG_LEDS_PCA9532=m on powerpc
  * [Config] Enable CONFIG_LEDS_PCA955X=m on powerpc
  * [Config] Enable CONFIG_LEDS_TRIGGER_DEFAULT_ON=m on powerpc
  * [Config] Set CONFIG_LEDS_TRIGGER_HEARTBEAT=m on arm and powerpc
  * [Config] Set CONFIG_LEDS_TRIGGER_TIMER=m on powerpc
  * [Config] Enable CONFIG_LINE6_USB=m on arm and powerpc
  * [Config] Enable CONFIG_MEMSTICK=m on arm
  * [Config] Enable CONFIG_MTD_AFS_PARTS=m on arm
  * [Config] Enable CONFIG_MTD_ALAUDA=m on arm
  * [Config] Enable CONFIG_MTD_AR7_PARTS=m on arm
  * [Config] Enable CONFIG_MTD_ARM_INTEGRATOR=m on arm
  * [Config] Enable CONFIG_MOXA_SMARTIO=m on powerpc
  * [Config] Enable CONFIG_MTD_DATAFLASH=m on arm
  * [Config] Enable CONFIG_MTD_GPIO_ADDR=m on arm
  * [Config] Enable CONFIG_MTD_IMPA7=m on arm
  * [Config] Enable CONFIG_MTD_NAND_GPIO=m on arm
  * [Config] Enable CONFIG_MTD_NAND_NANDSIM=m on arm
  * [Config] Enable CONFIG_MTD_NAND_PASEMI=m on powerpc
  * [Config] Enable CONFIG_MTD_NAND_PLATFORM=m on arm
  * [Config] Enable CONFIG_MTD_NAND_TMIO=m on arm
  * [Config] Enable CONFIG_MTD_SST25L=m on arm
  * [Config] Enable CONFIG_NET_CLS_CGROUP=y on arm
  * [Config] Enable CONFIG_NET_CLS_FLOW=m on arm
  * [Config] Enable CONFIG_NET_CLS_U32=m on arm
  * [Config] Enable CONFIG_NET_DCCPPROBE=m on arm
  * [Config] Enable CONFIG_NET_SCH_INGRESS=m on arm
  * [Config] Enable CONFIG_NET_TCPPROBE=m on arm
  * [Config] Enable...


Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released