Comment 2 for bug 838026

Stefan Bader (smb) wrote:

After much joy with this, I thought I would post it to a bigger audience. After migrating to Xen 4.1.1, I had several issues booting HVM guests. Some were related to interrupts not being set up correctly (for which Stefano has posted patches), but even with those patches applied, 3.0 guests seem to hang for me, while 2.6.38 and older kernels were OK.

After digging deeply into this, I think I have found the issue. However, if I am right, it seems rather lucky that PV spinlocks in HVM guests ever worked for anybody.

The xen_hvm_smp_init() call changes the smp_ops hooks, one of which is smp_prepare_cpus. This happens early in start_kernel (from setup_arch), and at that point pv_lock_ops has not been changed yet, so it still points to the ticket lock versions. Later in start_kernel, check_bugs is called, and part of that (alternative_instructions) takes the current pv_lock_ops and patches the spinlock call sites in the kernel with the corresponding jumps.
_After_ that, kernel_init is called, which in turn calls smp_prepare_cpus. That now changes pv_lock_ops again (via xen_init_spinlocks) but does *not* run any patching again. So the _raw_spin_*lock calls still use the ticket versions.

start_kernel
  setup_arch -> xen_hvm_smp_init (set smp_ops)
  ...
  check_bugs -> alternative_instructions (patches pv_locks sites)
  ...
  rest_init (triggers kernel_init as a thread)
    kernel_init
      ...
      smp_prepare_cpus (calls xen_init_spinlocks through smp_ops hook)

To make things even more fun, some spinlock functions may be forced inline while others are not (CONFIG_INLINE_SPIN_*). In our particular configuration, only two variants of spin_unlock were inlined. Put that into a pot with modules, shake well, and there is the fun. At module load time, the non-inlined calls still point to the unmodified ticket implementation in the main kernel, but the inlined unlock calls inside the module do get modified, because the module's own call sites are patched when it is loaded, by which time pv_lock_ops already points to the Xen versions. So a lock can be taken with the ticket code and released with the Xen code. One can imagine that this does not work too well.

Anyway, even without this secondary issue, I am sure that simply replacing the functions in pv_lock_ops, without the spinlock call sites actually being re-patched, is not the intended behaviour.

Unfortunately, I have not yet been able to make it work. Every attempt so far, either moving the xen_init_spinlocks call before check_bugs or adding another call to alternative_instructions, still ends in a hang at boot. At least the latter approach produces a more usable dump for crash, which shows one spinlock taken (slow path) and two that appear to be spuriously taken (so there is more for me to dig into).