wrong return address sometimes pushed for INT in kvm (not qemu)

Bug #747090 reported by Timo Jyrinki on 2011-04-01
Affects              Importance  Assigned to
Ubuntu Translations  Low         Unassigned
linux (Ubuntu)       High        Andy Whitcroft
  Natty              High        Andy Whitcroft
qemu-kvm (Ubuntu)    High        Serge Hallyn
  Natty              High        Serge Hallyn

Bug Description

Binary package hint: gfxboot-theme-ubuntu

In the beta release, no translations seem to be available when you press any key at the start of the boot from CD to get to the menu. Any language can be chosen (and the choice is presented by default), but strings are kept in English.

description: updated
Colin Watson (cjwatson) on 2011-04-02
Changed in gfxboot-theme-ubuntu (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Colin Watson (cjwatson) on 2011-04-02
Changed in gfxboot-theme-ubuntu (Ubuntu Natty):
assignee: nobody → Colin Watson (cjwatson)
milestone: none → ubuntu-11.04-beta-2
description: updated
Colin Watson (cjwatson) wrote :

This seems to be broken in maverick too. I expect it has something to do with switching to the gfxboot com32 module in syslinux, although I thought I'd fixed that in 0.10.0.

Colin Watson (cjwatson) wrote :

The findfile primitive is failing mysteriously. But only sometimes. Argh.

summary: - No translations in natty
+ No translations in natty - inside kvm only
Colin Watson (cjwatson) wrote :

Believe it or not, this appears to be a kvm bug. It is reproducible with kvm but not with 'qemu -no-kvm'.

The problem appears to be that the return address is sometimes wrong when calling INT in real mode. Here's how to reproduce it in GDB. Download http://cdimage.ubuntu.com/daily-live/current/natty-desktop-i386.iso (sorry, I realise this is large, at 700MB or so - if a kvm developer needs a smaller image, I should be able to prepare one). Start gdb and run 'target remote | kvm -gdb stdio -cdrom natty-desktop-i386.iso', 'c' at the prompt, and Ctrl-c in gdb as soon as kvm displays the aubergine splash screen. Then set up breakpoints as follows:

(gdb) b *0x8eeb if $al==6
Breakpoint 1 at 0x8eeb
(gdb) commands 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>info reg al
>x/s ($es<<4)+$si
>x/xh ($ss<<4)+$sp
>end
(gdb) b *0x8de5
Breakpoint 2 at 0x8de5
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>x/xh ($ss<<4)+$sp+40
>end

The first breakpoint is at comboot_int22 in syslinux, in the case where AL == 6, with commands that print AL, the string at ES:SI, and the return address on the stack; this corresponds to the "Open file" real-mode COMBOOT API call, which is called from cb_fopen in syslinux/com32/gfxboot/realmode_callback.asm. The second is on the IRET instruction that returns from that interrupt, with a command that prints the return address on the stack. The relevant chunk of calling code is (with 16-bit operand size):

cb_fopen:
                mov si,f_name
                push ds
                pop es
                mov ax,6
                int 22h
                xchg edx,eax
                mov al,1
                jc cb_fopen_90

After setting this up, tell gdb to continue, and quickly switch to the kvm window and press Escape to instruct the Ubuntu gfxboot theme to display the full boot menu.

You may have to try a few times to make this happen, because it's not entirely consistent, and chances are you will have to continue through a few irrelevant breakpoint traps. You may also need to use the F2 menu in gfxboot to select a different language. Sooner or later you should see:

Breakpoint 1, 0x00008eeb in ?? ()
al 0x6 6
0x110fa: "en.tr"
0x1d346: 0x0031

The return address is relative to CS == 0x1100, and we're in real mode, so 'set architecture i8086', and:

(gdb) disas /r 0x11031,+20
Dump of assembler code from 0x11031 to 0x11045:
   0x00011031: cd 22 int $0x22
   0x00011033: 66 92 xchg %ax,%dx
   0x00011035: b0 01 mov $0x1,%al
   0x00011037: 72 2c jb 0x11065
   0x00011039: 3b 0e cmp (%esi),%ecx
   0x0001103b: 00 00 add %al,(%eax)
   0x0001103d: 77 26 ja 0x11065
   0x0001103f: 09 c9 or %ecx,%ecx
   0x00011041: 74 22 je 0x11065
   0x00011043: 89 0e mov %ecx,(%esi)
End of assembler dump.
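
To spell out the arithmetic behind the dump above: in real mode a linear address is the segment shifted left by four bits plus the offset, so the stacked IP of 0x0031 with CS == 0x1100 lands on 0x11031, the first byte of the two-byte 'cd 22' encoding. The following is only an illustrative sketch in Python (the helper name 'linear' is made up for this example); it computes the address that was pushed and the one that should have been pushed.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment shifted left by 4, plus offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


CS = 0x1100          # code segment seen in the gdb session
PUSHED_IP = 0x0031   # return IP found on the stack at breakpoint 1
INT_LENGTH = 2       # 'cd 22' (int $0x22) is a two-byte instruction

pushed = linear(CS, PUSHED_IP)
expected = linear(CS, PUSHED_IP + INT_LENGTH)

print(f"pushed return address:   {pushed:#07x}")    # address of the INT itself
print(f"expected return address: {expected:#07x}")  # address of the xchg after it
```

The two results differ by exactly the length of the INT instruction, which is what the disassembly shows.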

Note that the return address points at the INT itself, not at the instruction following it. This does not always happen, but when it does, other things always seem to go wrong:

(gdb) c
Continuing.

Breakpoint 3,...


affects: gfxboot-theme-ubuntu (Ubuntu Natty) → qemu-kvm (Ubuntu Natty)
Changed in qemu-kvm (Ubuntu Natty):
assignee: Colin Watson (cjwatson) → nobody
Colin Watson (cjwatson) wrote :

My host system is as follows:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz
stepping : 13
cpu MHz : 1801.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips : 3590.90
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz
stepping : 13
cpu MHz : 800.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips : 3590.97
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Colin Watson (cjwatson) wrote :

Colin Watson wrote:
> (gdb) commands 2
> Type commands for breakpoint(s) 2, one per line.
> End with a line saying just "end".
> >x/xh ($ss<<4)+$sp+40
> >end

This should have been "x/xh ($ss<<4)+$sp". Sorry for the typo.

Colin Watson (cjwatson) wrote :

I've reproduced this with qemu-kvm git HEAD (df85c051d780bca0ee2462cfeb8ef6d9552a19b0).

Colin Watson (cjwatson) on 2011-04-06
summary: - No translations in natty - inside kvm only
+ wrong return address sometimes pushed for INT in kvm (not qemu)
Serge Hallyn (serge-hallyn) wrote :

Hi Colin,

I'm trying to reproduce this myself right now, but I'm curious: since this only happens with kvm enabled, have you tried to reproduce it with, say, a maverick kernel?

Changed in qemu-kvm (Ubuntu Natty):
assignee: nobody → Serge Hallyn (serge-hallyn)

I haven't. I'm not sure I can easily take my laptop down at the moment
to try that, unfortunately ...

Dustin Kirkland  (kirkland) wrote :

Subscribing Anthony...have you seen anything like this, Anthony?

Serge Hallyn (serge-hallyn) wrote :

Instrumenting arch/x86/kvm/emulate.c gives me:

[ 119.115925] emulate_int_real: emulating push of eip 148
[ 119.159032] emulate_int_real: emulating push of eip 40a3
[ 119.159063] emulate_int_real: emulating push of eip 40a3
[ 119.159086] emulate_int_real: emulating push of eip 148
[ 119.161142] emulate_int_real: emulating push of eip 35f7
[ 119.199433] emulate_int_real: emulating push of eip 40a3
[ 119.199464] emulate_int_real: emulating push of eip 40a3
[ 119.202484] emulate_int_real: emulating push of eip c416
[ 119.208262] emulate_int_real: emulating push of eip efc4
[ 119.257379] emulate_int_real: emulating push of eip efc4
[ 119.316397] emulate_int_real: emulating push of eip 3
[ 119.370991] emulate_int_real: emulating push of eip 7ee6
[ 119.877462] emulate_int_real: emulating push of eip c046
[ 119.879276] emulate_int_real: emulating push of eip 31
[ 120.035390] emulate_int_real: emulating push of eip c046
[ 120.073810] emulate_int_real: emulating push of eip 31
[ 123.826593] wlan0: No active IBSS STAs - trying to scan for other IBSS networks with same SSID (merge)
[ 132.604929] emulate_int_real: emulating push of eip 888e
[ 132.605069] emulate_int_real: emulating push of eip 888e
[ 132.647343] emulate_int_real: emulating push of eip 6a54
[ 132.757042] emulate_int_real: emulating push of eip 6a54
[ 132.976608] emulate_int_real: emulating push of eip 6a54
[ 133.141226] emulate_int_real: emulating push of eip 6a54
[ 133.250917] emulate_int_real: emulating push of eip 6a54
[ 133.415668] emulate_int_real: emulating push of eip 6a54
[ 133.525302] emulate_int_real: emulating push of eip 6a54
[ 133.635169] emulate_int_real: emulating push of eip 6a54
[ 133.964395] emulate_int_real: emulating push of eip 6a54
[ 134.458270] emulate_int_real: emulating push of eip 6a54
[ 134.853848] emulate_int_real: emulating push of eip 6a54
[ 134.875356] emulate_int_real: emulating push of eip 31
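
A quick way to summarise output like the above is to tally the pushed eip values. The sketch below is a throwaway Python helper, not part of any kvm tooling; the function name 'tally_pushed_eips' is invented here, the regular expression assumes the exact printk wording shown in this comment, and the sample lines are copied from the log above. The value 31 is the one of interest, since 0x31 is the offset of the 'int $0x22' in the earlier disassembly (0x11031 with CS == 0x1100).

```python
import re
from collections import Counter

# Matches the instrumentation message format shown in the log above.
PUSH_RE = re.compile(r"emulate_int_real: emulating push of eip ([0-9a-f]+)")


def tally_pushed_eips(log_text: str) -> Counter:
    """Count how often each pushed eip value appears in the log text."""
    return Counter(int(m.group(1), 16) for m in PUSH_RE.finditer(log_text))


# A few lines copied from the log above, including one unrelated message.
SAMPLE = """\
[ 119.877462] emulate_int_real: emulating push of eip c046
[ 119.879276] emulate_int_real: emulating push of eip 31
[ 120.035390] emulate_int_real: emulating push of eip c046
[ 120.073810] emulate_int_real: emulating push of eip 31
[ 123.826593] wlan0: No active IBSS STAs - trying to scan for other IBSS networks with same SSID (merge)
[ 134.875356] emulate_int_real: emulating push of eip 31
"""

counts = tally_pushed_eips(SAMPLE)
for eip, n in counts.most_common():
    print(f"eip {eip:#06x}: pushed {n} time(s)")
```

Feeding it the full dmesg output instead of SAMPLE would show how often the suspicious value turns up relative to the others.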

Serge Hallyn (serge-hallyn) wrote :

I'm building a kernel with emulate_int_real removed; I assume that will make it start working.

I suspect the right answer will just be to increment eip, of course, i.e. something like insn_fetch(s8, 1, c->eip).
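
For reference, here is a toy model of what a real-mode software INT is expected to leave on the stack. On real hardware, INT n pushes FLAGS, then CS, then the IP of the instruction after the INT, clears IF and TF, and loads CS:IP from the interrupt vector table entry at 0000:n*4. The Python below is a sketch of that behaviour only, not KVM code: the class and method names are invented, the 'skip_ip_advance' switch merely mimics the symptom seen in this bug, and the SS:SP split (0x1D00:0x034C) is an assumption chosen so that the return IP lands at linear address 0x1d346 as in the gdb output earlier.

```python
class RealModeCPU:
    """Toy model with just enough 8086 state to show what INT pushes."""

    def __init__(self, cs, ip, ss, sp, flags=0x0202):
        self.cs, self.ip, self.ss, self.sp, self.flags = cs, ip, ss, sp, flags
        self.mem = bytearray(1 << 20)  # 1 MiB of real-mode memory, all zero

    def _push16(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        addr = (self.ss << 4) + self.sp
        self.mem[addr:addr + 2] = value.to_bytes(2, "little")

    def peek16(self, seg, off):
        addr = (seg << 4) + off
        return int.from_bytes(self.mem[addr:addr + 2], "little")

    def software_int(self, vector, insn_len=2, skip_ip_advance=False):
        """Model INT n; skip_ip_advance=True mimics the symptom in this bug."""
        if skip_ip_advance:
            return_ip = self.ip                       # wrong: points at the INT
        else:
            return_ip = (self.ip + insn_len) & 0xFFFF  # right: next instruction
        self._push16(self.flags)
        self._push16(self.cs)
        self._push16(return_ip)
        self.flags &= ~0x0300 & 0xFFFF  # INT clears IF (bit 9) and TF (bit 8)
        # Load the handler address from the interrupt vector table.
        self.ip = self.peek16(0, vector * 4)
        self.cs = self.peek16(0, vector * 4 + 2)


# Assumed SS:SP split; only the linear address 0x1d346 comes from the report.
good = RealModeCPU(cs=0x1100, ip=0x0031, ss=0x1D00, sp=0x034C)
good.software_int(0x22)

bad = RealModeCPU(cs=0x1100, ip=0x0031, ss=0x1D00, sp=0x034C)
bad.software_int(0x22, skip_ip_advance=True)

print("correct return IP:", hex(good.peek16(good.ss, good.sp)))
print("buggy return IP:  ", hex(bad.peek16(bad.ss, bad.sp)))
```

With the wrong value on the stack, the IRET at the end of the handler would return to the INT and run it a second time, which would fit the intermittent failures described above.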

Dustin Kirkland  (kirkland) wrote :

<aliguori> kirkland, yeah, that's real mode emulation
<aliguori> i'll look closer this afternoon

Serge Hallyn (serge-hallyn) wrote :

Quoting Dustin Kirkland (<email address hidden>):
> <aliguori> kirkland, yeah, that's real mode emulation
> <aliguori> i'll look closer this afternoon

I was wrong about the path being taken when this happens -
emulate_int_real() is not being called by emulate.c:emulate_int(), but
by x86.c:kvm_inject_realmode_interrupt().

David Planella (dpm) on 2011-04-08
Changed in ubuntu-translations:
status: New → Triaged
importance: Undecided → Low
Serge Hallyn (serge-hallyn) wrote :

With this kernel patch applied, the problem appears solved for me.

I did first try my hand at a 'proper' fix, in two different ways, but failed.

The patch probably won't apply 100% cleanly, but only because two of the lines being removed have changed. Ignore that and make the patch apply.

Dustin Kirkland  (kirkland) wrote :

Serge, are you working this upstream through the kvm development
mailing list, too?

Dustin Kirkland  (kirkland) wrote :

Oh, and nice work on the patch, by the way :-)

Serge Hallyn (serge-hallyn) wrote :

Quoting Dustin Kirkland (<email address hidden>):
> Serge, are you working this upstream through the kvm development
> mailing list, too?

I've sent an email to KVM mailing list <email address hidden>
     http://www.spinics.net/lists/kvm/msg52279.html
but no responses yet.

Serge Hallyn (serge-hallyn) wrote :

A cleaned-up patch which applies to up-to-date linux-2.6 HEAD.

Changed in linux (Ubuntu Natty):
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-2
tags: added: kernel-key
Dustin Kirkland  (kirkland) wrote :

Marking the qemu-kvm userspace task 'invalid', as this looks to me to be exclusively in the kernel.

Marking the linux task triaged/high/B2, to make sure this is on the Kernel team's release radar. JFo: adjust accordingly, if you disagree ;-)

Changed in qemu-kvm (Ubuntu Natty):
status: Confirmed → Invalid
Andy Whitcroft (apw) wrote :

@serge -- have we heard anything further from upstream? The thread you started seems quiet. I suspect we need to debug this more before they are going to react.

Serge Hallyn (serge-hallyn) wrote :

Quoting Andy Whitcroft (<email address hidden>):
> @serge -- have we heard anything further from upstream. The thread you
> started seems quiet. I suspect we need to debug this more before they
> are going to react.

Yeah, I'd gotten distracted during the day yesterday. When I looked more
into it last night, I think I found another solution. Namely, wherever
the code that was replaced by calls to kvm_inject_realmode_interrupt()
incremented rmode.irq.rip, we need to do the same.

I can try my hand at a patch today (or leave it in your capable hands)

Serge Hallyn (serge-hallyn) wrote :

No comment on this patch yet from upstream, but this patch follows upstream guidance in actually fixing the bug as opposed to undoing the bad patch altogether (as my last patch did). The kernel built with this patch works for me.

tags: added: iso-testing
Andy Whitcroft (apw) on 2011-04-13
Changed in linux (Ubuntu Natty):
assignee: nobody → Andy Whitcroft (apw)
Andy Whitcroft (apw) wrote :

@Serge -- I have pulled down the patch (and applied Jan's changes), and applied the patch to a Natty kernel for testing. If you could just verify the kernels at the URL for me, I can get the patch out for review. Kernels are here:

    http://people.canonical.com/~apw/lp747090-natty/

Thanks!

Changed in linux (Ubuntu Natty):
status: Triaged → Incomplete
Serge Hallyn (serge-hallyn) wrote :

I've sent apw's updated version of the patch with no changes to the kvm mailing list. Hopefully they forward it to lkml soon.

I'm still testing, but the kernel in comment #23 is working great so far. Thanks much.

Colin Watson (cjwatson) on 2011-04-15
Changed in linux (Ubuntu Natty):
milestone: ubuntu-11.04-beta-2 → ubuntu-11.04
Dave Walker (davewalker) on 2011-04-15
tags: added: server-nrs
Dave Walker (davewalker) on 2011-04-15
Changed in qemu-kvm (Ubuntu Natty):
milestone: ubuntu-11.04-beta-2 → none
Serge Hallyn (serge-hallyn) wrote :

The new kernel is working great for me for kvm. Moving the linux task from 'incomplete' to 'fix committed', as my understanding is that the fix is in the tree. Please correct me if I'm wrong.

Changed in linux (Ubuntu Natty):
status: Incomplete → Fix Committed
Dave Walker (davewalker) wrote :

Confirmed with apw that this is fixed in the ubuntu git tree, and will be fixed in the first SRU upload. Updating milestone to natty-updates to reflect this.

Changed in linux (Ubuntu Natty):
milestone: ubuntu-11.04 → natty-updates
Dave Walker (davewalker) on 2011-04-18
tags: added: server-nro
removed: server-nrs

Accepted linux into natty-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Brad Figg (brad-figg) on 2011-05-10
tags: added: verification-needed-natty
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Serge Hallyn (serge-hallyn) wrote :

I no longer have the original natty-desktop-i386.iso, and the current one appears to have moved some of the breakpoints.

So I instead tested using this bug's dup, 771227. virsh save/restore are working with this proposed kernel. That should verify this bug; I don't know if cjwatson wants to give it another go or not.

Serge Hallyn (serge-hallyn) wrote :

Also, with this kernel I do get translations when I choose 'Nederlands' as my language from the boot cd.

Colin Watson (cjwatson) wrote :

Agreed, this is working fine now. I've confirmed that translations are working in the CD boot menu again, and I've also reproduced my gdb testing and am no longer seeing instances of wrong return addresses following INT. Thanks!

tags: added: verification-done
removed: verification-needed-natty
Steve Conklin (sconklin) on 2011-05-17
tags: added: verification-done-natty
removed: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-10.46

---------------
linux (2.6.38-10.46) natty-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #802464

  [ Upstream Kernel Changes ]

  * Revert "put stricter guards on queue dead checks"
  * Revert "fix oops in scsi_run_queue()"

linux (2.6.38-10.45) natty-proposed; urgency=low

  [ Upstream Kernel Changes ]

  * Revert "af_unix: Only allow recv on connected seqpacket sockets."

linux (2.6.38-10.44) natty-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #792013

  [ Robert Nelson ]

  * SAUCE: omap3: beagle: detect new xM revision B
    - LP: #770679
  * SAUCE: omap3: beagle: detect new xM revision C
    - LP: #770679
  * SAUCE: omap3: beagle: if rev unknown, assume xM revision C
    - LP: #770679

  [ Stefan Bader ]

  * Include nls_iso8859-1 for virtual images
    - LP: #732046

  [ Thomas Schlichter ]

  * SAUCE: vesafb: mtrr module parameter is uint, not bool
    - LP: #778043

  [ Tim Gardner ]

  * Revert "SAUCE: acpi battery -- move first lookup asynchronous"
    - LP: #775809
  * updateconfigs after update to v2.6.38.6

  [ Upstream Kernel Changes ]

  * Revert "ALSA: hda - Fix pin-config of Gigabyte mobo"
    - LP: #780546
  * Revert "[SCSI] Retrieve the Caching mode page"
    - LP: #788691
  * Revert "USB: xhci - fix unsafe macro definitions"
  * Revert "USB: xhci - fix math in xhci_get_endpoint_interval()"
  * Revert "USB: xhci - also free streams when resetting devices"
  * ath9k_hw: fix stopping rx DMA during resets
    - LP: #775809
  * netxen: limit skb frags for non tso packet
    - LP: #775809
  * ath: add missing regdomain pair 0x5c mapping
    - LP: #775809
  * block, blk-sysfs: Fix an err return path in blk_register_queue()
    - LP: #775809
  * p54: Initialize extra_len in p54_tx_80211
    - LP: #775809
  * qlcnic: limit skb frags for non tso packet
    - LP: #775809
  * nfsd4: fix struct file leak on delegation
    - LP: #775809
  * nfsd4: Fix filp leak
    - LP: #775809
  * virtio: Decrement avail idx on buffer detach
    - LP: #775809
  * x86, gart: Set DISTLBWALKPRB bit always
    - LP: #775809
  * x86, gart: Make sure GART does not map physmem above 1TB
    - LP: #775809
  * intel-iommu: Fix use after release during device attach
    - LP: #775809
  * intel-iommu: Unlink domain from iommu
    - LP: #775809
  * intel-iommu: Fix get_domain_for_dev() error path
    - LP: #775809
  * drm/radeon/kms: pll tweaks for r7xx
    - LP: #775809
  * drm/nouveau: fix notifier memory corruption bug
    - LP: #775809
  * drm/radeon/kms: fix bad shift in atom iio table parser
    - LP: #775809
  * drm/i915/tv: Remember the detected TV type
    - LP: #775809
  * tty/n_gsm: fix bug in CRC calculation for gsm1 mode
    - LP: #775809
  * serial/imx: read cts state only after acking cts change irq
    - LP: #775809
  * ASoC: Fix output PGA enabling in wm_hubs CODECs
    - LP: #775809
  * ASoC: codecs: JZ4740: Fix OOPS
    - LP: #775809
  * ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
    - LP: #775809
  * ahci: don't enable port irq before handler is registered
    - LP: #775809
  * libata: Implement ATA_FLAG_NO_...

Changed in linux (Ubuntu Natty):
status: Fix Committed → Fix Released
Gabor Kelemen (kelemeng) on 2011-08-19
Changed in ubuntu-translations:
status: Triaged → Fix Released

Marking actively developed linux task from Fix Committed to Fix Released as this patch has been applied and uploaded for Oneiric:

ubuntu-oneiric$ git describe --contains 71f9833bb1cba9939245f3e57388d87d69f8f399
v3.0-rc1~350^2~68

Changed in linux (Ubuntu):
milestone: natty-updates → none
status: Fix Committed → Fix Released