Spurious per_interrepository.test_fetch.TestInterRepository. test_fetch_parent_inventories_at_stacking_boundary_smart_old failures

Bug #874153 reported by Jelmer Vernooij on 2011-10-14
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
High
Unassigned
Breezy
Medium
Unassigned
bzr (Ubuntu)
High
Unassigned

Bug Description

The daily builds are occasionally getting a spurious test failure:

======================================================================
ERROR: bzrlib.tests.per_interrepository.test_fetch.TestInterRepository.test_fetch_parent_inventories_at_stacking_boundary_smart_old(InterDifferingSerializer+get_known_graph_ancestry,RepositoryFormatKnitPack1,RepositoryFormatKnitPack6RichRoot)
----------------------------------------------------------------------
_StringException: log: {{{
326.165 creating repository in bzr://127.0.0.1:53297/extra/branch/.bzr/.
326.193 creating branch <bzrlib.branch.BzrBranchFormat6 object at 0x8524f4c> in chroot-148171468:///branch/
326.219 preparing to commit
326.238 Selecting files for commit with filter None
326.302 preparing to commit
326.316 Selecting files for commit with filter None
326.410 preparing to commit
326.419 Selecting files for commit with filter None
326.519 preparing to commit
326.519 commit parent revision {right}
326.532 Selecting files for commit with filter None
326.621 creating repository in bzr://127.0.0.1:53297/extra/trunk/.bzr/.
326.647 creating branch <bzrlib.branch.BzrBranchFormat7 object at 0x852742c> in chroot-148171468:///trunk/
326.657 Using fetch logic to copy between RemoteRepository(bzr://127.0.0.1:53297/extra/branch/.bzr/)(RemoteRepositoryFormat(_network_name='Bazaar pack repository format 1 (needs bzr 0.92)\n')) and RemoteRepository(bzr://127.0.0.1:53297/extra/trunk/.bzr/)(RemoteRepositoryFormat(_network_name='Bazaar RepositoryFormatKnitPack6RichRoot (bzr 1.9)\n'))
326.665 fetching: <SearchResult search:(set(['left']), set(['null:']), 2)>
326.667 Traceback (most recent call last):
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/protocol.py", line 907, in accept_bytes
    _StatefulDecoder.accept_bytes(self, bytes)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/protocol.py", line 402, in accept_bytes
    self.state_accept()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/protocol.py", line 1013, in _state_accept_expecting_message_part
    self.done()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/protocol.py", line 1051, in done
    raise errors.SmartMessageHandlerError(sys.exc_info())
SmartMessageHandlerError: The message handler raised an exception:
Traceback (most recent call last):
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/protocol.py", line 1049, in done
    self.message_handler.end_received()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/message.py", line 164, in end_received
    "Complete conventional request was received, but request "
SmartProtocolError: Generic bzr smart protocol error: Complete conventional request was received, but request handler has not finished reading.
}}}

traceback-5: {{{
Traceback (most recent call last):
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/tests/per_interrepository/test_fetch.py", line 182, in test_fetch_parent_inventories_at_stacking_boundary_smart_old
    self.test_fetch_parent_inventories_at_stacking_boundary()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/tests/per_interrepository/test_fetch.py", line 213, in test_fetch_parent_inventories_at_stacking_boundary
    trunk.repository.fetch(branch.repository, 'left')
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/remote.py", line 1673, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/decorators.py", line 217, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/vf_repository.py", line 2517, in fetch
    find_ghosts=find_ghosts)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/fetch.py", line 76, in __init__
    self.__fetch()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/fetch.py", line 103, in __fetch
    self._fetch_everything_for_search(search_result)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/fetch.py", line 131, in _fetch_everything_for_search
    stream, from_format, [])
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/remote.py", line 2098, in insert_stream
    (verb, path, '') + lock_args, byte_stream)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/client.py", line 181, in call_with_body_stream
    expect_response_body=False)
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/client.py", line 81, in _call_and_read_response
    expect_body=expect_response_body),
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/message.py", line 299, in read_response_tuple
    self._wait_for_response_args()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/message.py", line 264, in _wait_for_response_args
    self._read_more()
  File "/build/buildd/bzr-2.5.0~bzr6213~ppa3977~oneiric1/build/lib.linux-i686-2.7/bzrlib/smart/message.py", line 286, in _read_more
    "Unexpected end of message. "
ConnectionReset: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.
}}}

----------------------------------------------------------------------

Related branches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/14/2011 2:32 PM, Jelmer Vernooij wrote:
> Public bug reported:
>
> The daily builds are occasionally getting a spurious test failure:
>
This had been happening on Babune, which caused me to introduce this
change:

 6197 Patch Queue Manager 2011-10-06 [merge]
      (jameinel) Allow for some tests to run slowly under load. (John
A Meinel)
      === modified file 'bzrlib/tests/test_server.py'
      --- bzrlib/tests/test_server.py 2011-10-03 14:15:44 +0000
      +++ bzrlib/tests/test_server.py 2011-10-05 15:01:54 +0000
      @@ -616,7 +616,7 @@
                   return

      -_DEFAULT_TESTING_CLIENT_TIMEOUT = 4.0
      +_DEFAULT_TESTING_CLIENT_TIMEOUT = 60.0

       class TestingSmartServer(TestingThreadingTCPServer,
server.SmartTCPServer):

So if they are still failing, that would indicate the single test is
taking more than 1 minute to complete.

Note that if this really is the failure, the connection retry code
that I wrote should also handle this by just retrying from the client
side.

Though in that traceback there isn't a ConnectionTimeout. Instead it
looks like one side just closes the connection.

Specifically, it looks like the client sends some data, and indicates
that the message is done. The server reads this information, but feels
there is more that needs to be said, and raises an exception, closing
the connection to the client.

Now, if the client is genuinely sending a malformed message, that is
how it is supposed to work.

If find it odd that the test highlighted here is also the one that was
failing because of timeouts, though.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6dXcIACgkQJdeBCYSNAAOqnwCfbWQEKoui8be0OgQdtrisFl8v
oCUAoKgRrbywsXvGmFh7j8xm43F0p8hh
=062Z
-----END PGP SIGNATURE-----

Vincent Ladeuil (vila) wrote :

> So if they are still failing, that would indicate the single test is
> taking more than 1 minute to complete.

AIUI, your first diagnosis was that the test itself needed more than 4s to
complete under heavy load.

That doesn't sound to be the case with 1 min, so something else is going on
there.

>
> Note that if this really is the failure, the connection retry code
> that I wrote should also handle this by just retrying from the client
> side.

True. But that wouldn't address the underlying (and yet unknown) issue.

>
> Though in that traceback there isn't a ConnectionTimeout. Instead it
> looks like one side just closes the connection.

I had concerns about closing too much too often server-side, that may be
related.

>
> Specifically, it looks like the client sends some data, and indicates
> that the message is done. The server reads this information, but feels
> there is more that needs to be said, and raises an exception, closing
> the connection to the client.

>
> Now, if the client is genuinely sending a malformed message, that is
> how it is supposed to work.

But in this case the failure should be permanent, not transient no ?

>
> If find it odd that the test highlighted here is also the one that was
> failing because of timeouts, though.

Could it be that the server is closing the wrong client connection then ?

Vincent Ladeuil (vila) on 2011-11-10
tags: added: babune selftest test-failure
Jelmer Vernooij (jelmer) wrote :
Changed in bzr (Ubuntu):
status: New → Triaged
importance: Undecided → High
Vincent Ladeuil (vila) on 2011-11-10
summary: - Spurious test failure
+ Spurious
+ per_interrepository.test_fetch.TestInterRepository.test_fetch_parent_inventories_at_stacking_boundary_smart_old
+ failures

Occurrences since 2011-10-21:

http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/549/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/553/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/558/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/559/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/560/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/564/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/567/

http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/330/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/334/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/336/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/337/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/340/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/341/
http://babune.ladeuil.net:24842/view/%20High/job/selftest-osx-10.6/343/

http://babune.ladeuil.net:24842/job/selftest-chroot-lucid/272/
http://babune.ladeuil.net:24842/job/selftest-chroot-lucid/282/
http://babune.ladeuil.net:24842/job/selftest-chroot-lucid/284/

http://babune.ladeuil.net:24842/job/selftest-chroot-maverick/279/
http://babune.ladeuil.net:24842/job/selftest-chroot-maverick/284/
http://babune.ladeuil.net:24842/job/selftest-chroot-maverick/289/

http://babune.ladeuil.net:24842/job/selftest-chroot-natty/283/
http://babune.ladeuil.net:24842/job/selftest-chroot-natty/285/

http://babune.ladeuil.net:24842/job/selftest-chroot-oneiric/204/
http://babune.ladeuil.net:24842/job/selftest-chroot-oneiric/222/
http://babune.ladeuil.net:24842/job/selftest-chroot-oneiric/223/
http://babune.ladeuil.net:24842/job/selftest-chroot-oneiric/225/

Martin Pool (mbp) wrote :

Since this is causing test failures but it's hard to debug and not apparently related to an actual user-affecting issue, I want to just make these tests knownfailures, mentioning this bug, and the other similar ones, and then move on. If we find a similar user affecting problem we can come back and dig more.

Martin Pool (mbp) on 2011-11-29
summary: - Spurious
- per_interrepository.test_fetch.TestInterRepository.test_fetch_parent_inventories_at_stacking_boundary_smart_old
- failures
+ Spurious per_interrepository.test_fetch.TestInterRepository.
+ test_fetch_parent_inventories_at_stacking_boundary_smart_old failures
Martin Packman (gz) wrote :

A mitigation has now landed in bzr 2.5 which should prevent this problem breaking the test run, and be enough to resolve the issue for the Ubuntu builders.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bzr - 2.6.0~beta1-0ubuntu1

---------------
bzr (2.6.0~beta1-0ubuntu1) quantal; urgency=low

  * New upstream release.
   + Fixes handling of colocated branches in `bzr rmbranch`. LP: #920653
   + rmbranch now refuses to remove active branch. LP: #922953
   + Connecting with HTTPS via HTTP correctly uses host name of destination
     rather than that of proxy. LP: #944696
   + No longer requires tty when GPG signing commits. LP: #847388
   + Fixes unicode erorrs when translated progress task messages
     contain non-ascii text. LP: #966934
   + Fixes display of help for configuration options that overlap
     with other topics. LP: #941672
   + Drop 06_spurious_test_failure: applied upstream. LP: #874153
   + Drop 05_allow_no_slash_controldirs: applied upstream.
  * Drop unnecessary build conflicts with zlib.
 -- Jelmer Vernooij <email address hidden> Fri, 08 Jun 2012 12:59:27 +0200

Changed in bzr (Ubuntu):
status: Triaged → Fix Released
Jelmer Vernooij (jelmer) on 2017-11-08
tags: added: check-for-breezy
Jelmer Vernooij (jelmer) on 2017-12-04
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
tags: removed: check-for-breezy
Jelmer Vernooij (jelmer) on 2018-09-16
Changed in bzr:
status: Confirmed → Fix Released
Changed in brz:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers