Comment 9 for bug 1785262

Iain Lane (laney) wrote :

> Would work for me, can we mark flaky-on-armhf only (I'd not be aware)?

The only way I can think of to do this would be to duplicate the test and use "Architecture:" so that the "Restrictions: flaky" copy runs on armhf and arm64 only, and the other copy runs everywhere else.
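As a rough sketch of that duplication in debian/tests/control (untested; the second test name "uicheck-sw-flaky" and the "Depends: @" lines are placeholders I made up, and it assumes an autopkgtest new enough to support the "Architecture:" field with negated entries):

```
# Strict copy: a failure counts everywhere except armhf and arm64.
Tests: uicheck-sw
Depends: @
Architecture: !armhf !arm64

# Flaky copy: same script under a hypothetical second name
# (e.g. a symlink in debian/tests/), run on armhf and arm64 only.
Tests: uicheck-sw-flaky
Depends: @
Architecture: armhf arm64
Restrictions: flaky
```

The real stanza for uicheck-sw will have its own Depends and Restrictions, which both copies would need to carry over.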

In this case the failure is a *timeout*, though, so I'm not sure that "Restrictions: flaky" works, since the test is being killed from the outside by autopkgtest itself (the test runner). That would need to be tested to make sure it does. You could do that by making a simple test (in another package, it doesn't have to be libreoffice) which has "Restrictions: flaky" and runs something like "sleep infinity". See what happens when autopkgtest times out the test; you can pass --timeout-test to reduce the timeout for testing.
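Something like the following stanza in a throwaway package's debian/tests/control should be enough for that experiment (a sketch only; the test name "hang-forever" is made up, and I haven't run this):

```
# Never finishes on its own, so only the autopkgtest timeout can end it.
Test-Command: sleep infinity
Features: test-name=hang-forever
Restrictions: flaky
```

Run it with a short limit, e.g. --timeout-test=60, and check whether the result is reported as a hard failure or as a flaky one.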

On to the test itself a bit.

When the "uicheck-sw" test passes, it takes in the region of 20-25 minutes. Picking two recent results:

armhf:

autopkgtest [05:43:20]: test uicheck-sw: [-----------------------
autopkgtest [06:03:54]: test uicheck-sw: -----------------------]
autopkgtest [06:03:58]: test uicheck-sw: - - - - - - - - - - results - - - - - - - - - -
uicheck-sw PASS

arm64:

autopkgtest [02:52:01]: test uicheck-sw: [-----------------------
autopkgtest [03:05:33]: test uicheck-sw: -----------------------]
autopkgtest [03:05:34]: test uicheck-sw: - - - - - - - - - - results - - - - - - - - - -
uicheck-sw PASS

but the timeout we have (this is a "large" test, so it runs with higher timeouts than we use by default) only hits after more than 5 *hours* 30 minutes. The runtime is clearly wildly variable: that's more than 10× the usual time, and it still timed out.

If this is to do with the test runners being "loaded" (slower) vs "unloaded" (faster), then could it be a race condition that only shows up when the test runs slowly and causes it to get stuck?

I'm not sure whether you want to go down the route of investigating the actual failure, but it does feel to me like this "shouldn't" be failing. We're not usually close to the limits and just being tipped over them sometimes; it looks like the test is actually getting stuck. But it would require someone to look at the tests themselves to determine that.

Clearly having the tests working would be ideal; we want assurance that LibreOffice works on armhf and arm64.

In the absence of this, since we're just mashing the retry button and not actually trying to fix them (ATM), making this stop blocking us one way or another would be better than the status quo.