Comment 8 for bug 1777777

Peter Maydell (pmaydell) wrote :

Thanks for the repro case. Preliminary analysis: I added some tracepoints to the sp804 code, and you're right that the value of the interrupt status register isn't always correct.

A normal attempt to read the clock looks like this:

Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
32696@1569513406.450378:sp804_read addr 0x00000004 value 0x00000681
32696@1569513406.450384:sp804_read addr 0x00000014 value 0x00000000
Exception return from AArch32 svc to usr PC 0x10ff0

Sometimes we might read the clock at exactly the moment it has counted down to 0 (this is more likely on QEMU than on real h/w, for internal reasons of our implementation):

Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
32696@1569513406.452273:sp804_read addr 0x00000004 value 0x00000000
32696@1569513406.452279:sp804_read addr 0x00000014 value 0x00000000
Exception return from AArch32 svc to usr PC 0x10ff0

Correct handling of the rollover looks like this (we read the counter, which has rolled over, and then the interrupt-status, which is 1; that causes us to reread the counter, and once we're done the IRQ handler itself runs):

4003@1569514474.944756:sp804_read addr 0x00000004 value 0x0000c29d
4001@1569514474.944761:sp804_arm_timer_update level 1
4003@1569514474.944797:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944828:sp804_read addr 0x00000004 value 0x0000c255
Taking exception 5 [IRQ]
...from EL1 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from irq to svc PC 0x2a5e4
4003@1569514474.944943:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944957:sp804_read addr 0x00000034 value 0x00000000
4003@1569514474.944962:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944965:sp804_write addr 0x0000000c value 0x00000000
4003@1569514474.944966:sp804_arm_timer_update level 0
4003@1569514474.944969:sp804_read addr 0x00000034 value 0x00000000
AArch32 mode switch from svc to irq PC 0x2a718
Exception return from AArch32 irq to svc PC 0x2a3bc
Exception return from AArch32 svc to usr PC 0x10ff0
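
For reference, the guest-side read sequence in the trace above can be sketched roughly as follows. This is only an illustration of the pattern, not the guest's actual code: the function name, the reg_read_fn callback, the fake register bank and the "reload" parameter are all hypothetical, and the arithmetic assumes a periodic down-counter. The offsets 0x04 and 0x14 are the SP804 Timer1 value and masked-interrupt-status registers, which are the two addresses being read in the trace.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER1_VALUE 0x04u  /* SP804 Timer1 current value register */
#define TIMER1_MIS   0x14u  /* SP804 Timer1 masked interrupt status */

/* Hypothetical accessor type: reads a 32-bit timer register by offset. */
typedef uint32_t (*reg_read_fn)(uint32_t offset);

/*
 * Ticks elapsed since the last tick interrupt that was accounted for.
 * If the status register says a rollover is pending, reread the counter
 * and add one whole period for the interrupt not yet handled.
 */
static uint32_t ticks_since_last_irq(reg_read_fn rd, uint32_t reload)
{
    uint32_t value = rd(TIMER1_VALUE);

    if (rd(TIMER1_MIS) != 0) {
        value = rd(TIMER1_VALUE);          /* reread after the rollover */
        assert(value <= reload);
        return reload + (reload - value);
    }
    assert(value <= reload);
    return reload - value;
}

/* Fake register bank, for illustration only. */
static uint32_t fake_value, fake_mis;

static uint32_t fake_read(uint32_t offset)
{
    return offset == TIMER1_MIS ? fake_mis : fake_value;
}
```

With a reload value of 1000, a fake counter of 990 and status 1 gives 1010 ticks. The same counter value with status 0 gives only 10, so a guest using this pattern would see time appear to jump backwards if the status register lags behind the counter.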

But sometimes we get this, where we read the rolled-over counter value but the interrupt-status register is still 0:

Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
4003@1569514475.794690:sp804_read addr 0x00000004 value 0x0000c2df
4003@1569514475.794698:sp804_read addr 0x00000014 value 0x00000000
4001@1569514475.794703:sp804_arm_timer_update level 1
Exception return from AArch32 svc to usr PC 0x10ff0
Taking exception 5 [IRQ]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from irq to svc PC 0x2a5e4
4003@1569514475.794768:sp804_read addr 0x00000014 value 0x00000001
4003@1569514475.794937:sp804_read addr 0x00000034 value 0x00000000
4003@1569514475.794944:sp804_read addr 0x00000014 value 0x00000001
4003@1569514475.794947:sp804_write addr 0x0000000c value 0x00000000
4003@1569514475.794949:sp804_arm_timer_update level 0
4003@1569514475.794952:sp804_read addr 0x00000034 value 0x00000000
AArch32 mode switch from svc to irq PC 0x2a718
Exception return from AArch32 irq to usr PC 0x10ff0

This happens because the sp804 is built on a QEMU timer abstraction called "ptimer". When the underlying raw QEMU timer expires, it calls the ptimer_tick() function, which updates the ptimer's internal state. From this point on, a guest read of the counter value will get the rolled-over value, because the sp804 implements that read as a simple ptimer_get_count(). However, the ptimer doesn't immediately call the sp804's arm_timer_tick() function (which is where we update the interrupt-status flag and arrange for an IRQ to be delivered); it just schedules that to happen later via a QEMU "bottom half handler". Unfortunately it's possible for the guest CPU to run in the window between ptimer_tick() and the bottom half handler actually running, which means that the guest can see an out-of-sync state from the sp804 device: a rolled-over counter with the interrupt status still 0.
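
The ordering problem can be shown with a small standalone model. This is a toy sketch and deliberately not QEMU code: ToyTimer and every toy_* function are hypothetical names invented for illustration, the reload value of 1000 is arbitrary, and the real ptimer and bottom-half machinery is considerably more involved. The only point is that the counter state changes at expiry while the status flag changes later, in the deferred callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in QEMU. */
typedef struct {
    uint32_t reload;      /* value the down-counter wraps to */
    uint32_t count;       /* current counter value */
    bool bh_pending;      /* deferred device callback scheduled, not yet run */
    uint32_t int_status;  /* what the guest would read at offset 0x14 */
} ToyTimer;

/* Stands in for timer expiry: the counter rolls over immediately,
 * but the device callback is only scheduled. */
static void toy_expire(ToyTimer *t)
{
    assert(t->count == 0);
    t->count = t->reload;
    t->bh_pending = true;
}

/* Stands in for the bottom half: only now is the status flag set. */
static void toy_run_bottom_half(ToyTimer *t)
{
    if (t->bh_pending) {
        t->bh_pending = false;
        t->int_status = 1;
    }
}

/* Returns true if a guest reading both registers would see a
 * rolled-over counter while the interrupt status is still 0. */
static bool toy_guest_sees_stale_status(bool guest_runs_before_bh)
{
    ToyTimer t = { .reload = 1000, .count = 0,
                   .bh_pending = false, .int_status = 0 };

    toy_expire(&t);
    if (!guest_runs_before_bh) {
        toy_run_bottom_half(&t);
    }
    /* Guest reads the counter and then the status register. */
    return t.count == t.reload && t.int_status == 0;
}
```

In this model toy_guest_sees_stale_status(true) corresponds to the bad trace above, where the guest read lands between the expiry and the deferred update, and toy_guest_sees_stale_status(false) corresponds to the correctly handled rollover.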

I'm not currently sure how best to fix this.