arm9 clock pending (SP804)

Bug #1777777 reported by RTOS Pharos
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

Hello all,

I'm using the versatilepb board and the timer Interrupt Mask Status register (offset 0x14 of the SP804) does not seem to be working properly on the latest qemu-2.12. I tried on the 2.5 (i believe this is the mainstream version that comes with Linux) and it works perfectly.

What happens is that the pending bit does not seem to be set in some scenarios. In my case, I see the timer value decreasing to 0 and then being reset to the reload value and neither does the interrupt is triggered nor the pending bit is set.

I believe this is a matter of timing since in the "long" run the system eventually catches up (after a few microseconds).

Thank you

Revision history for this message
Peter Maydell (pmaydell) wrote :

Do you have a test case image you can provide that reproduces the problem?

Revision history for this message
RTOS Pharos (rtos.pharos) wrote :

Hello,

Let me explain a bit how to reproduce the problem. I'm developing a RTOS with qemu (and boards,etc) - https://sourceforge.net/projects/rtospharos/. The first version used ARM926 with the versatilepb board and I used the qemu 2.5 version.

In my test (to check that the RTOS is working correctly) I produced a loop that is always reading the current time ("pharosClockGet") and checking that it is always increasing. This works perfectly on qemu 2.5. When qemu 2.12 was released I updated to the new version to add support for the raspberry pi 3.

Here is a piece of the test:

void aperiodicThread0_0()
{
    uint64_t microseconds;
    uint64_t previousMicroseconds = 0;
    uint32_t i;

    for(i = 0; i < NUMBER_REPETITIONS; i++)
    {
        microseconds = pharosClockGetSinceBoot();
        if(previousMicroseconds > microseconds)
        {
            print("ERROR! Clock got lower\r\n");
            break;
        }
    }
}

I'll show you the "internals" of Pharos next (this is file clock.c):

uint64_t pharosIClockGetSinceBoot(void)
{
    /* microseconds since last clock tick*/
    uint32_t microseconds;

    /* number of clock ticks since boot */
    ClockTick clockTicks;

    /* >> interrupts are disabled when we get here << */

    /* read the number of microseconds on the counter */
    clockTicks = pharosVClockTicks;

    /* read the timer value */
    microseconds = pharosCpuClockReadCounter();

    /* if the clock interrupt is pending we might have read the microseconds before or after the clock tick occurred (but not processed since interrupts are disabled) */
    if(pharosCpuClockIsrIsPending() == TRUE)
    {
        /* interrupt is pending, so read again the microseconds and subtract 2 clock ticks */
        microseconds = (2U * pharosIMicrosecondsPerClockTick()) - pharosCpuClockReadCounter();
    }
    else
    {
        /* the timer counts backwards so we must subtract to get the number of microseconds since last tick */
        microseconds = pharosIMicrosecondsPerClockTick() - microseconds;
    }

    /* calculate the result using the last us read */
    return (uint64_t) ((pharosIMicrosecondsPerClockTick() * clockTicks) + microseconds);
}

And the part that is hw dependent (for ARM926) - this is in file ClockSp804.h:

INLINE bool armSp804TimerInterruptIsPending(const ptrArmSp804Timer c)
{
        /* get the first bit of the interrupt status (ignore other bits) to check if the interrupt is pending */
        return (c->timerInterruptMaskStatus & 0x1U) == 1U ? TRUE : FALSE;
}

I hope the code is easy enough to read.
Thank you

Revision history for this message
Peter Maydell (pmaydell) wrote :

Sorry, I should have been clearer. Please can you provide a test case binary and QEMU command line that reproduces the problem (including what the expected output is and what the actual output is)?

Revision history for this message
Peter Maydell (pmaydell) wrote :

Mark this bug as Incomplete as we're still waiting for the test case binary and command line to reproduce it.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
Revision history for this message
RTOS Pharos (rtos.pharos) wrote :
Download full text (5.0 KiB)

Sorry to have taken some time to answer.

Created the binary with the test.

Here is the output when it fails (tested in Qemu 2.12):

Checked 0 times if the time is always increasing
Checked 100000 times if the time is always increasing
Checked 200000 times if the time is always increasing
Checked 300000 times if the time is always increasing
Checked 400000 times if the time is always increasing
Checked 500000 times if the time is always increasing
Checked 600000 times if the time is always increasing
Checked 700000 times if the time is always increasing
Error: the time read is smaller than the previous time read

And here with the latest Qemu (4.1.0):

Checked 0 times if the time is always increasing
Checked 100000 times if the time is always increasing
Error: the time read is smaller than the previous time read

The error can occur basically at any time. But when tested with Qemu 2.5 it goes well:

Checked 0 times if the time is always increasing
Checked 100000 times if the time is always increasing
Checked 200000 times if the time is always increasing
Checked 300000 times if the time is always increasing
Checked 400000 times if the time is always increasing
Checked 500000 times if the time is always increasing
Checked 600000 times if the time is always increasing
Checked 700000 times if the time is always increasing
Checked 800000 times if the time is always increasing
Checked 900000 times if the time is always increasing
Checked 1000000 times if the time is always increasing
Checked 1100000 times if the time is always increasing
Checked 1200000 times if the time is always increasing
Checked 1300000 times if the time is always increasing
Checked 1400000 times if the time is always increasing
Checked 1500000 times if the time is always increasing
Checked 1600000 times if the time is always increasing
Checked 1700000 times if the time is always increasing
Checked 1800000 times if the time is always increasing
Checked 1900000 times if the time is always increasing
Checked 2000000 times if the time is always increasing
Checked 2100000 times if the time is always increasing
Checked 2200000 times if the time is always increasing
Checked 2300000 times if the time is always increasing
Checked 2400000 times if the time is always increasing
Checked 2500000 times if the time is always increasing
Checked 2600000 times if the time is always increasing
Checked 2700000 times if the time is always increasing
Checked 2800000 times if the time is always increasing
Checked 2900000 times if the time is always increasing
.
.
.
Checked 9500000 times if the time is always increasing
Checked 9600000 times if the time is always increasing
Checked 9700000 times if the time is always increasing
Checked 9800000 times if the time is always increasing
Checked 9900000 times if the time is always increasing
Example finished

Just to give some background on the test itself, the main.c file contains this:

void helloWorld0()
{
    uint64_t microseconds;
    uint64_t previousMicroseconds = 0;
    uint32_t i;

    for(i = 0; i < NUMBER_REPETITIONS; i++)
    {
        microseconds = pharosClockGetSinceBoot();

        if(microseconds < previousMicroseconds)
        {
   ...

Read more...

Changed in qemu:
status: Expired → New
Revision history for this message
RTOS Pharos (rtos.pharos) wrote :

Forgot to state the qemu argument, it's this one:

~/cross-compiler/qemu/2.5/bin/qemu-system-arm -M versatilepb -m 128M -nographic -kernel myApplication

Revision history for this message
Peter Maydell (pmaydell) wrote :
Download full text (4.1 KiB)

Thanks for the repro case. Preliminary analysis: I added some tracepoints to the sp804 code, and you're right that the value of the interrupt status register isn't always correct:

A normal attempt to read the clock looks like this:

Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
32696@1569513406.450378:sp804_read addr 0x00000004 value 0x00000681
32696@1569513406.450384:sp804_read addr 0x00000014 value 0x00000000
Exception return from AArch32 svc to usr PC 0x10ff0

Sometimes we might read the clock when it's exactly got down to 0 (this is more likely on QEMU than on real h/w for internal reasons of our implementation):
Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
32696@1569513406.452273:sp804_read addr 0x00000004 value 0x00000000
32696@1569513406.452279:sp804_read addr 0x00000014 value 0x00000000
Exception return from AArch32 svc to usr PC 0x10ff0

A correct handling of the rollover looks like this (we read the counter, which is rolled over, and the interrupt-status, which is 1, which causes us to reread the counter; once we're done the IRQ handler itself runs):

4003@1569514474.944756:sp804_read addr 0x00000004 value 0x0000c29d
4001@1569514474.944761:sp804_arm_timer_update level 1
4003@1569514474.944797:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944828:sp804_read addr 0x00000004 value 0x0000c255
Taking exception 5 [IRQ]
...from EL1 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from irq to svc PC 0x2a5e4
4003@1569514474.944943:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944957:sp804_read addr 0x00000034 value 0x00000000
4003@1569514474.944962:sp804_read addr 0x00000014 value 0x00000001
4003@1569514474.944965:sp804_write addr 0x0000000c value 0x00000000
4003@1569514474.944966:sp804_arm_timer_update level 0
4003@1569514474.944969:sp804_read addr 0x00000034 value 0x00000000
AArch32 mode switch from svc to irq PC 0x2a718
Exception return from AArch32 irq to svc PC 0x2a3bc
Exception return from AArch32 svc to usr PC 0x10ff0

But sometimes we get this, where we read the rolled-over counter value but the interrupt-status register is still 0:

Taking exception 2 [SVC]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from svc to sys PC 0x12010
AArch32 mode switch from sys to svc PC 0x12018
4003@1569514475.794690:sp804_read addr 0x00000004 value 0x0000c2df
4003@1569514475.794698:sp804_read addr 0x00000014 value 0x00000000
4001@1569514475.794703:sp804_arm_timer_update level 1
Exception return from AArch32 svc to usr PC 0x10ff0
Taking exception 5 [IRQ]
...from EL0 to EL1
...with ESR 0x11/0x4600000b
AArch32 mode switch from irq to svc PC 0x2a5e4
4003@1569514475.794768:sp804_read addr 0x00000014 value 0x00000001
4003@1569514475.794937:sp804_read addr 0x00000034 value 0x00000000
4003@1569514475.794944:sp804_read addr 0x00000014 value 0x00000001
4003@1569514475.794947:sp804_write addr 0x0000000c value 0x00000000
4003@1569514475.794949:sp804_arm_timer_update level ...

Read more...

Peter Maydell (pmaydell)
Changed in qemu:
status: New → Confirmed
Revision history for this message
RTOS Pharos (rtos.pharos) wrote :

Thank you for checking this issue.

I looked at the ptimer code, like you said. Just one question: isn't this used by other hw as well?
Maybe this problem is more general...

I also tried (basically) the same example on a aarch64 (raspberry pi3), and I don't find any problems there. Maybe could be helpful to look how this problem is solved there.

Just trying to help

Revision history for this message
Peter Maydell (pmaydell) wrote :

I sent out an initial RFC patchset which fixes this (it's just an RFC because it only converts this one device to the new ptimer API, and we should do a proper conversion of all devices); it seems to make the test case in this bug work correctly:
https://<email address hidden>/

Revision history for this message
Thomas Huth (th-huth) wrote :
Changed in qemu:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.