Frequent DEP8 test failures related to ftp

Bug #1713098 reported by Andreas Hasenack on 2017-08-25
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
gvfs (Ubuntu)
Undecided
Unassigned

Bug Description

There are somewhat constant, but intermittent, DEP8 test failures in the gvfs package. They seem to always be around ftp:
tp:// anonymous (API with flag) ... FAIL

----- dbus stdout -----

----- dbus stderr -----

test_anonymous_api_user (__main__.Ftp)
ftp:// anonymous (API with user) ... ok
test_anonymous_cli_option (__main__.Ftp)
ftp:// anonymous (CLI with option) ... Error mounting location: Location is already mounted
ERROR

----- dbus stdout -----

----- dbus stderr -----

test_anonymous_cli_user (__main__.Ftp)
ftp:// anonymous (CLI with user) ... ok
test_authenticated_api (__main__.Ftp)
ftp:// authenticated (API) ... FAIL

----- dbus stdout -----

----- dbus stderr -----

test_authenticated_cli (__main__.Ftp)
ftp:// authenticated (CLI) ... FAIL

----- dbus stdout -----

----- dbus stderr -----

Here is a link: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/amd64/g/gvfs/20170825_135452_c32a9@/log.gz

This bug is to investigate these ftp tests, determine why they are flaky, and fix them if possible.

s390x and arm skip the test so all green there isn't worth anything.

The others fail in ~30% of the cases I'd say:
http://autopkgtest.ubuntu.com/packages/gvfs/artful/ppc64el
http://autopkgtest.ubuntu.com/packages/gvfs/artful/i386
http://autopkgtest.ubuntu.com/packages/gvfs/artful/amd64

The errors read like networking or concurrency breaking it.
I agree to have the bug to debug and improve the case.

For now I re-triggered the one hanging test on artful.
The release Team can mark dep8 tests as flaky to skip them blocking a migration, but that would prereq an analysis of the case as an argument.

Andreas Hasenack (ahasenack) wrote :

This is the first test that fails:
======================================================================
FAIL: test_anonymous_api_user (__main__.Ftp)
ftp:// anonymous (API with user)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gvfs-testbed-script", line 676, in test_anonymous_api_user
    self.unmount_api(gfile)
  File "/home/gvfs-testbed-script", line 287, in unmount_api
    self.assertNotEqual(self.cb_result, None, 'operation timed out')
AssertionError: None == None : operation timed out

The code does:
    def test_anonymous_api_user(self):
        '''ftp:// anonymous (API with user)'''

        uri = 'ftp://anonymous@localhost:2121'
        gfile = Gio.File.new_for_uri(uri)
        self.assertEqual(self.mount_api(gfile), True)
        try:
            self.do_mount_check_api(gfile, True)
        finally:
            self.unmount_api(gfile)

It's failing in the "finally" clause, with a timeout. It's never unmounted.

Then the next test comes along, and fails because the thing is still mounted:

test_anonymous_cli_user (__main__.Ftp)
ftp:// anonymous (CLI with user) ... gio: ftp://anonymous@localhost:2121/: Location is already mounted
(...)
ERROR: test_anonymous_cli_user (__main__.Ftp)
ftp:// anonymous (CLI with user)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gvfs-testbed-script", line 601, in test_anonymous_cli_user
    subprocess.check_call(['gio', 'mount', uri])
  File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gio', 'mount', 'ftp://anonymous@localhost:2121']' returned non-zero exit status 2.

Andreas Hasenack (ahasenack) wrote :

Oh, look at that unmount_api() docstring:

    def unmount_api(self, gfile):
        '''Umount a mounted Gio.File using the Gio API

        This times out after 5 seconds.
        '''

Andreas Hasenack (ahasenack) wrote :

I don't know if the timeout is borderline short, but the unmount definitely failed. Logs further down (sorry, these logs are interleaved and have no timestamp) show that unmount failed because the filesystem was busy:
ftp: Queued new job 0x1002d932210 (GVfsJobCloseRead)
ftp: <- 0 -- 226 Transfer Complete.
ftp: send_reply(0x1002d932210), failed=0 ()
ftp: backend_dbus_handler org.gtk.vfs.Mount:Unmount (pid=7819)
ftp: g_vfs_job_unmount_new request: 0x789a5002adf0
ftp: Queued new job 0x1002d93f740 (GVfsJobUnmount)
ftp: send_reply(0x1002d93f740), failed=1 (File system is busy)
ftp: send_reply(0x1002d93f740), failed=1 (File system is busy)

That "transfer complete" was about self.do_mount_check_api(gfile, True) which, among other things, downloads "myfile.txt" and asserts its contents. That's the last thing that method does, and the "finally:\nself.unmount_api(gfile)" call comes right after that.

I don't know why there are two "file system is busy" messages in the logs. And I also don't know if it would have worked had we waited a bit more, or if the myfile.txt retrieval somehow left that file open and that's why unmount failed.

Iain Lane (laney) wrote :

After increasing the timeout, I ran the tests 5 times each on i386 and amd64.

https://paste.ubuntu.com/p/tXS2dnzYVy/

It passed every time - so it looks as if this might at least help. I just filed https://bugzilla.gnome.org/show_bug.cgi?id=794487 to see if we can get this merged upstream.

Iain Lane (laney) wrote :

I included that patch in 1.36.0-1ubuntu1 just uploaded, but I forgot to close this bug with that upload, sorry.

Let's watch the tests and see how they go.

Changed in gvfs (Ubuntu):
status: New → Fix Committed
Andreas Hasenack (ahasenack) wrote :

Unfortunately the timeout bump didn't fix the problem:
======================================================================
FAIL: test_anonymous_api_user (__main__.Ftp)
ftp:// anonymous (API with user)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gvfs-testbed-script", line 676, in test_anonymous_api_user
    self.unmount_api(gfile)
  File "/home/gvfs-testbed-script", line 287, in unmount_api
    self.assertNotEqual(self.cb_result, None, 'operation timed out')
AssertionError: None == None : operation timed out

----------------------------------------------------------------------
Ran 31 tests in 42.862s

The previous test runs used to take around 32s.

The problem is somewhere else.

Andreas Hasenack (ahasenack) wrote :

We need a way to reproduce this locally, so that we can get a shell and investigate what is going on. So far I tried locally with amd64 and i386 vms, and the tests always pass. I guess I really need a ppc64el one.

Andreas Hasenack (ahasenack) wrote :

Using this patch:
diff --git a/test/gvfs-test b/test/gvfs-test
index 4537d8a5..c82e13a8 100755
--- a/test/gvfs-test
+++ b/test/gvfs-test
@@ -278,7 +278,7 @@ class GvfsTestCase(unittest.TestCase):

         ml = GLib.MainLoop()

- mount.unmount_with_operation(Gio.MountUnmountFlags.NONE, None, None,
+ mount.unmount_with_operation(Gio.MountUnmountFlags.FORCE, None, None,
                                      unmount_done, ml)
         # ensure we are timing out
         GLib.timeout_add_seconds(5, lambda data: ml.quit(), None)

We ran the tests 10x in the build farm on ppc, got no failures: https://paste.ubuntu.com/p/kNcRc4Dd3c/

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.