Comment 3 for bug 747090

Colin Watson (cjwatson) wrote : Re: No translations in natty - inside kvm only

Believe it or not, this appears to be a kvm bug. It is reproducible with kvm but not with 'qemu -no-kvm'.

The problem appears to be that the return address pushed on the stack is sometimes wrong when executing INT in real mode. Here's how to reproduce it in GDB. Download http://cdimage.ubuntu.com/daily-live/current/natty-desktop-i386.iso (sorry, I realise this is large, at 700MB or so; if a kvm developer needs a smaller image, I should be able to prepare one). Start gdb and run 'target remote | kvm -gdb stdio -cdrom natty-desktop-i386.iso', then 'c' at the prompt, and press Ctrl-C in gdb as soon as kvm displays the aubergine splash screen. Then set up breakpoints as follows:

(gdb) b *0x8eeb if $al==6
Breakpoint 1 at 0x8eeb
(gdb) commands 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>info reg al
>x/s ($es<<4)+$si
>x/xh ($ss<<4)+$sp
>end
(gdb) b *0x8de5
Breakpoint 2 at 0x8de5
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>x/xh ($ss<<4)+$sp+40
>end

The first breakpoint is at comboot_int22 in syslinux, conditional on AL == 6, with commands that print AL, the string at ES:SI, and the return address on the stack. AL == 6 corresponds to the "Open file" real-mode COMBOOT API call, which is made from cb_fopen in syslinux/com32/gfxboot/realmode_callback.asm. The second breakpoint is on the IRET instruction that returns from that interrupt, with a command that prints the return address on the stack. The relevant chunk of calling code (with 16-bit operand size) is:

cb_fopen:
                mov si,f_name
                push ds
                pop es
                mov ax,6
                int 22h
                xchg edx,eax
                mov al,1
                jc cb_fopen_90

After setting this up, tell gdb to continue, and quickly switch to the kvm window and press Escape to instruct the Ubuntu gfxboot theme to display the full boot menu.

You may have to try a few times to make this happen, because it's not entirely consistent, and chances are you will have to continue through a few irrelevant breakpoint traps. You may also need to use the F2 menu in gfxboot to select a different language. Sooner or later you should see:

Breakpoint 1, 0x00008eeb in ?? ()
al 0x6 6
0x110fa: "en.tr"
0x1d346: 0x0031

The return address has offset 0x0031 with CS == 0x1100, and we're in real mode, so the linear address is (0x1100 << 4) + 0x31 = 0x11031. Run 'set architecture i8086', and:

(gdb) disas /r 0x11031,+20
Dump of assembler code from 0x11031 to 0x11045:
   0x00011031: cd 22 int $0x22
   0x00011033: 66 92 xchg %ax,%dx
   0x00011035: b0 01 mov $0x1,%al
   0x00011037: 72 2c jb 0x11065
   0x00011039: 3b 0e cmp (%esi),%ecx
   0x0001103b: 00 00 add %al,(%eax)
   0x0001103d: 77 26 ja 0x11065
   0x0001103f: 09 c9 or %ecx,%ecx
   0x00011041: 74 22 je 0x11065
   0x00011043: 89 0e mov %ecx,(%esi)
End of assembler dump.

Note that the return address points at the INT itself (offset 0x31), not at the instruction following the INT (offset 0x33). This does not always happen, but when it does, other things always seem to go wrong:
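
For anyone who wants to check the arithmetic without gdb, here is a minimal Python sketch of the real-mode address calculation and of the "does the return address point at an INT?" test. The helper names (linear_address, points_at_int) are made up for illustration; the byte values are copied from the disassembly above, and the 20-bit address wrap-around is not modelled.

```python
INT_OPCODE = 0xCD  # first byte of the two-byte "INT imm8" instruction


def linear_address(segment: int, offset: int) -> int:
    """Real-mode linear address: 16-bit segment shifted left by 4, plus 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def points_at_int(code: bytes, code_base: int, address: int, vector: int) -> bool:
    """True if the two bytes at `address` encode INT `vector`."""
    i = address - code_base
    return code[i:i + 2] == bytes([INT_OPCODE, vector])


# First eight bytes of the disassembly above, starting at linear address 0x11031.
CODE_BASE = 0x11031
CODE = bytes.fromhex("cd226692b001722c")

saved = linear_address(0x1100, 0x0031)     # return address found on the stack
expected = linear_address(0x1100, 0x0033)  # INT offset plus its 2-byte length

print(hex(saved), points_at_int(CODE, CODE_BASE, saved, 0x22))        # 0x11031 True
print(hex(expected), points_at_int(CODE, CODE_BASE, expected, 0x22))  # 0x11033 False
```

A saved return address for which points_at_int is true is the bad case described here: the IRET will re-execute the INT.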

(gdb) c
Continuing.

Breakpoint 3, 0x00008de5 in ?? ()
0x1d346: 0x0031

Now we're at the IRET, which will pop three 16-bit words off the stack: IP (zero-extended into EIP), CS, and the low 16 bits of EFLAGS. Those words, along with the current EFLAGS for comparison, are:

(gdb) x/xh ($ss<<4)+$sp
0x1d346: 0x0031
(gdb) x/xh ($ss<<4)+$sp+2
0x1d348: 0x1100
(gdb) info reg eflags
eflags 0x203206 [ PF IF #12 #13 ID ]
(gdb) x/xh ($ss<<4)+$sp+4
0x1d34a: 0x3200

That all looks right: comboot_resume propagates the error status to the lower byte of EFLAGS on the stack, setting it to either 0 or 1, and here it is 0, which corresponds to a successful return.
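
As a rough model of what the IRET should do with those three words, here is a simplified Python sketch. It assumes a 16-bit operand size in real mode, keeps the upper half of EFLAGS unchanged, and forces reserved bit 1 to 1; other reserved-bit masking is ignored, and the function name iret16 is made up for illustration.

```python
def iret16(stack_words, eflags):
    """Simplified real-mode IRET with 16-bit operand size.

    Pops IP, CS and FLAGS (in that order) from `stack_words` and returns
    (eip, cs, new_eflags). The upper 16 bits of EFLAGS are preserved and
    reserved bit 1 always reads as 1.
    """
    ip, cs, flags = stack_words[:3]
    new_eflags = (eflags & 0xFFFF0000) | (flags & 0xFFFF) | 0x2
    return ip & 0xFFFF, cs & 0xFFFF, new_eflags


# Words read from the stack at 0x1d346, 0x1d348 and 0x1d34a, plus the
# EFLAGS value gdb reported just before the IRET.
eip, cs, eflags = iret16([0x0031, 0x1100, 0x3200], 0x203206)
print(hex(eip), hex(cs), hex(eflags))  # 0x31 0x1100 0x203202
```

So the IRET itself behaves as expected given what is on the stack; the bad value is the saved offset 0x0031.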

(gdb) stepi
0x00000031 in ?? ()
(gdb) disas /r ($cs<<4)+$eip,+20
Dump of assembler code from 0x11031 to 0x11045:
   0x00011031: cd 22 int $0x22
   0x00011033: 66 92 xchg %ax,%dx
   0x00011035: b0 01 mov $0x1,%al
   0x00011037: 72 2c jb 0x11065
   0x00011039: 3b 0e cmp (%esi),%ecx
   0x0001103b: 00 00 add %al,(%eax)
   0x0001103d: 77 26 ja 0x11065
   0x0001103f: 09 c9 or %ecx,%ecx
   0x00011041: 74 22 je 0x11065
   0x00011043: 89 0e mov %ecx,(%esi)
End of assembler dump.
(gdb) info reg eflags
eflags 0x203202 [ IF #12 #13 ID ]
(gdb) stepi
0x00008eeb in ?? ()

The flags are fine (qemu sets bit 0x2, since the architecture manual documents it as always 1). But now it jumps straight back into the interrupt, executing it twice! This is clearly wrong.

(gdb) c
Continuing.

Breakpoint 2, 0x00008de5 in ?? ()
0x1d346: 0x0033

The return address is now after the INT, as it should be. However, since the registers were wrong for the second call:

(gdb) stepi
0x00000033 in ?? ()
(gdb)
0x00000035 in ?? ()
(gdb) info reg eflags
eflags 0x203203 [ CF IF #12 #13 ID ]

Now CF is set, which causes gfxboot to think that this API call failed, which explains the original bug. (In earlier debugging sessions, gdb/kvm apparently silently continued past the second interrupt, so all I saw was CF mysteriously being set.)
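
To make the EFLAGS values in the transcript easier to compare, here is a small Python sketch that decodes them the way gdb prints them. The bit positions are the standard x86 ones; the helper names (decode_eflags, call_failed) are made up for illustration, and treating CF as the error indicator follows the jc after int 22h in cb_fopen.

```python
from typing import List

# Named EFLAGS bits. IOPL occupies bits 12-13, which gdb prints as "#12 #13".
# Reserved bit 1 (always 1) is deliberately left out, as in gdb's output.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 12: "#12", 13: "#13", 14: "NT", 16: "RF",
    17: "VM", 18: "AC", 19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value: int) -> List[str]:
    """Return the names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & (1 << bit)]


def call_failed(eflags: int) -> bool:
    """The caller does `jc` after `int 22h`, so CF set means failure."""
    return bool(eflags & 0x1)


for value in (0x203206, 0x203202, 0x203203):
    print(hex(value), decode_eflags(value), call_failed(value))
```

Only the last value, seen after the second (spurious) execution of the interrupt, has CF set.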