FaviconFetcherTests random failures in silo builds

Bug #1498539 reported by Olivier Tilloy
This bug affects 1 person
Affects                         Status        Importance  Assigned to     Milestone
qtbase-opensource-src (Ubuntu)  New           Undecided   Unassigned
webbrowser-app (Ubuntu)         Fix Released  Critical    Olivier Tilloy

Bug Description

Since yesterday (2015-09-21), FaviconFetcherTests started failing at random in silo builds (not failing all the time, but in about 50% of the builds).

Since it has been happening in several silos, it’s very unlikely that it’s a change in webbrowser-app itself that triggered the failure. It’s more likely a change somewhere else in the stack that triggered a bug that was already there.

Revision history for this message
Olivier Tilloy (osomon) wrote :

It took me a while to reproduce this locally, but here it is: the failing test is shouldCancelRequests, and the failure message is:

    'serverSpy->wait()' returned FALSE. ()

Full XML output for the failing test is:

    <testcase result="fail" name="shouldCancelRequests">
        <!-- message="QIODevice::write: device not open" type="qwarn" -->
        <failure message="&apos;serverSpy&#x002D;&gt;wait()&apos; returned FALSE. ()" result="fail"/>
    </testcase>

Changed in webbrowser-app (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
status: Triaged → In Progress
assignee: nobody → Olivier Tilloy (osomon)
Revision history for this message
Olivier Tilloy (osomon) wrote :

When the test fails, TestHTTPServer::discardClient() is called without TestHTTPServer::readClient() ever having been called, so the socket is deleted but the server never registers that a request was made.

On the FaviconFetcher side, the request is emitted, but immediately afterwards I see the "QIODevice::write: device not open" warning. The test server receives the incoming connection, but it is disconnected immediately, so the server discards the client socket.
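
To make the symptom concrete, here is a minimal sketch using plain Python sockets. It is only an analogy: CountingServer, serve_one and run are hypothetical names, and this is not the TestHTTPServer code. It shows that when a client closes the connection before sending anything, the server still discards the client, but a wait on the "request seen" event times out, much like serverSpy->wait() returning false.

```python
import socket
import threading


class CountingServer:
    """Toy TCP server: notes a request only if bytes arrive before the peer closes."""

    def __init__(self):
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        self.listener.listen(1)
        self.port = self.listener.getsockname()[1]
        self.request_seen = threading.Event()      # stands in for the test's serverSpy
        self.client_discarded = threading.Event()

    def serve_one(self):
        conn, _ = self.listener.accept()
        with conn:
            data = conn.recv(4096)  # b"" means the peer closed without sending
            if data:
                self.request_seen.set()    # analogous to readClient()
        self.client_discarded.set()        # analogous to discardClient()
        self.listener.close()


def run(send_request):
    """Return (request_seen, client_discarded) for one client connection."""
    server = CountingServer()
    worker = threading.Thread(target=server.serve_one)
    worker.start()
    client = socket.create_connection(("127.0.0.1", server.port))
    if send_request:
        client.sendall(b"GET /favicon.ico HTTP/1.1\r\nHost: localhost\r\n\r\n")
    client.close()
    worker.join()
    # Short timeout: a stand-in for a signal spy's wait() giving up.
    return server.request_seen.wait(timeout=0.5), server.client_discarded.is_set()
```

Calling run(True) models the passing case and run(False) the failing one. In both cases the client is discarded; only in the first does the server register a request.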

Revision history for this message
Olivier Tilloy (osomon) wrote :

Mirv suggested that https://launchpadlibrarian.net/218473113/qtbase-opensource-src_5.4.1%2Bdfsg-2ubuntu8_5.4.1%2Bdfsg-2ubuntu9.diff.gz could very well be the change after which the test started failing, and that looks plausible. I’m not sure what conclusions to draw from the diff, though. Is the issue in the way the test server handles incoming connections, or is it an actual bug in QNAM?

To be safe, I’ll mark qtbase-opensource-src as also affected, for further investigation.

Changed in webbrowser-app (Ubuntu):
status: In Progress → Confirmed
status: Confirmed → Triaged
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

The bug fix https://codereview.qt-project.org/#/c/120738 links to a bug report describing QNAM erroneously closing multiple queued requests with 'Unknown error', preceded by a "QIODevice::write: device not open" warning, which matches what the test reports. The fix itself states: "This patch changes the QNAM socket slot connections to be DirectConnection. We don't close the socket anymore in slots where it is anyway in a closed state afterwards. This prevents event/stack recursions." Could the change of connection type to DirectConnection cause this change of behavior?
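
For readers less familiar with the distinction, the following toy model illustrates how a direct connection and a queued one differ in ordering. It is a sketch in plain Python, not Qt code: EventLoop, Signal and trace are hypothetical names, and it says nothing about whether the patch is actually responsible for the failure. A direct slot runs inside the emitter's call stack, while a queued slot runs only once control returns to the event loop.

```python
from collections import deque


class EventLoop:
    """Minimal event queue: posted calls run later, in FIFO order."""

    def __init__(self):
        self.pending = deque()

    def post(self, fn, *args):
        self.pending.append((fn, args))

    def process_events(self):
        while self.pending:
            fn, args = self.pending.popleft()
            fn(*args)


class Signal:
    """Toy signal whose slots are either called directly or queued on the loop."""

    def __init__(self, loop):
        self.loop = loop
        self.slots = []

    def connect(self, slot, queued):
        self.slots.append((slot, queued))

    def emit(self, *args):
        for slot, queued in self.slots:
            if queued:
                self.loop.post(slot, *args)  # deferred until process_events()
            else:
                slot(*args)                  # runs now, inside emit()


def trace(queued):
    """Record the order of events around one emission."""
    log = []
    loop = EventLoop()
    disconnected = Signal(loop)
    disconnected.connect(lambda: log.append("slot: handle disconnect"), queued)
    log.append("emitter: socket closed")
    disconnected.emit()
    log.append("emitter: continues")
    loop.process_events()
    return log
```

With trace(False) the slot entry lands between the two emitter entries; with trace(True) it comes last. A change of this kind can therefore reorder work relative to the emitting code, which is why timing-sensitive tests are worth re-checking after it.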

I've rebuilt the Scopes packages, unity8, and download manager to get unit test results in addition to the autopilot tests that I ran earlier, and it still seems that the only flakiness is this webbrowser-app unit test, and on x86 only (see https://launchpad.net/~canonical-qt5-edgers/+archive/qt5-beta1/+packages ). As x86 is faster, maybe there's something timing-related.

I've asked pstolowski to check whether he can see a difference with and without the patch. He mentioned there have been some 'device is open' messages in bug reports in the past, which is one of the symptoms addressed by the bug fix.

It's also useful to know that this bug fix is itself a fix to an earlier fix, https://codereview.qt-project.org/#/c/110150/ , which upstream urged everyone to ship, and which we're currently shipping (in OTA-6). So if we're not seeing worrying regressions, it sounds like a worthwhile bug fix that may help with random network failures in components that use QNAM.

To switch between the builds with and without the patch: the revert of this latter patch is in silo 054. One can use pinning (https://wiki.ubuntu.com/LandingTeam/SiloTestingGuidelines#Install_silos_with_overlay_PPA_enabled); enabling or disabling the file and then running "apt upgrade" makes it quick to either "upgrade to the revert" or "downgrade to the fix that was landed".

Revision history for this message
Olivier Tilloy (osomon) wrote :

I saw an armhf build failure yesterday, so I’ll update the description: the issue is not specific to amd64 and i386.

description: updated
Revision history for this message
Olivier Tilloy (osomon) wrote :

I can now easily reproduce the failure on my laptop by running the test while the system is under heavy load (running a bunch of "cat /dev/urandom > /dev/null" processes helps).
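
For anyone who wants to script that reproduction, here is a minimal sketch of a load generator. The helper names start_load and stop_load are hypothetical, and defaulting the number of burners to the CPU count is an assumption rather than something measured. It starts several background processes equivalent to "cat /dev/urandom > /dev/null" and stops them again afterwards.

```python
import os
import subprocess

# The command from the comment above; any CPU-bound command would do.
DEFAULT_BURNER = ["cat", "/dev/urandom"]


def start_load(count=None, command=None):
    """Start `count` background CPU burners with their output discarded."""
    count = count or os.cpu_count() or 1  # assumption: one burner per CPU
    command = command or DEFAULT_BURNER
    return [
        subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]


def stop_load(procs):
    """Terminate the burners and wait until every one has exited."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

Typical use would be procs = start_load(), then running the test, then stop_load(procs) in a finally block so the burners never outlive the run.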

Revision history for this message
Olivier Tilloy (osomon) wrote :

When the test fails, the test server reports a QAbstractSocket::RemoteHostClosedError, so the client (the QNAM) closed the connection.

Olivier Tilloy (osomon)
Changed in webbrowser-app (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package webbrowser-app - 0.23+15.10.20151022.1-0ubuntu1

---------------
webbrowser-app (0.23+15.10.20151022.1-0ubuntu1) wily; urgency=medium

  [ CI Train Bot ]
  * New rebuild forced.
  * Resync trunk.

  [ Olivier Tilloy ]
  * Add an exception to the generated apparmor profile to allow reading
    HERE’s TOS in the browser. (LP: #1507667)
  * Modify the generated apparmor profile to allow rw access to
    /dev/shm/.org.chromium.Chromium.* too. (LP: #1508054)
  * Update translation template.

  [ Ugo Riboni ]
  * Fix inability to drag the map to pan in Google maps, on desktop.
    (LP: #1503506)
  * Implement support for allowing or denying access to media input
    devices and for setting default media input devices. (LP: #1410996)
  * Refactor the BookmarksModel to be a singleton.

 -- Olivier Tilloy <email address hidden> Thu, 22 Oct 2015 15:07:49 +0000

Changed in webbrowser-app (Ubuntu):
status: In Progress → Fix Released