fatrace autopkgtest failure due to rename test

Bug #1885188 reported by Brian Murray
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
fatrace
Fix Released
Undecided
Martin Pitt
fatrace (Debian)
Fix Released
Unknown
fatrace (Ubuntu)
Fix Released
Medium
Martin Pitt

Bug Description

fatrace version 0.15-1 is stuck in groovy-proposed because the "file renaming" test in fatrace-currentmount is failing. This is reproducable using a qemu virtualization server.

The package passed Debian ci because the test was skipped as it was run on a 4.19 kernel which doesn't support directory events. On Ubuntu a 5.04 kernel was used and so the test was ran and it failed.

Complete logs are here:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy/groovy/amd64/f/fatrace/20200501_155154_7ae8a@/log.gz

The relevant excerpt is:

checking log...
^mv(.*): <> /tmp/autopkgtest.TV550j/build.0mf/src not found in log:

Looking at the log file I didn't actually see any events relating to test.txt.2 or any lines containing "<>".

tags: added: groovy update-excuse
Changed in fatrace (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in fatrace (Debian):
status: Unknown → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The case is still present and nothing happened on the Debian bug yet.

I found that it works fine on a retry.
1. exec - the reported issue
  no mv event logged
2. exec - works as expected
  fatrace.log:271:mv(1234): <> /tmp/autopkgtest.cWRPFV/build.Y7S/src

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Some experimentation revealed that it is flaky and that this might just be the first event missing on a racy event gathering job.

I was running it in a loop (on a x86 autopkgtest VM) and after some initial tests and 200 iterations is seems like it is ~18% good in regard to the "mv <>" event.

But if we look further sometimes other events come up in the error report, like:
^touch.* C\?W\?O /tmp/autopkgtest.cWRPFV/build.Y7S/src/test.txt not found in log:
^mkdir(.*): + /tmp/autopkgtest.cWRPFV/build.Y7S/src$ not found in log:
...

So it is not limited to the rename event, maybe the buffer got slower or asynchronous?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The way the test works it collects data for 2 seconds and then expects everything to be there.
That isn't sufficient in any case where the I/O takes longer than two seconds which can for many reasons happen (e.g. disk spindown, cpu overload, ...).

Note: also one of the 2 seconds is already the "sleep 1" that tries to allow the process to start.

I started to make it slower and throw in sync's - just to be on the very safe side.
Like:
--- a/tests/fatrace-currentmount
+++ b/tests/fatrace-currentmount
@@ -3,8 +3,8 @@ set -euC

 LOG="$AUTOPKGTEST_TMP/fatrace.log"
 echo "starting fatrace..."
-fatrace --current-mount -s 2 -o $LOG &
-sleep 1
+fatrace --current-mount -s 40 -o $LOG &
+sleep 20

 echo "read a file..."
 head NEWS > /dev/null
@@ -24,9 +24,11 @@ mkdir dest
 mv "${TEST_FILE}.2" dest/"${TEST_FILE}.2"
 rm dest/"${TEST_FILE}.2"
 rmdir dest
+sync

 echo "waiting for fatrace..."
 wait
+sync

 echo "checking log..."
 check_log() {

This didn't fix it either - so it is not just late-start or early-exit

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It starts to feels like if genuinely "misses" events "sometimes".

To illustrate I was running the "hardened and slow" version a few times.
And then I checked which of the events were found.

Modified the test to run on multiple logs after a loop, like

#!/bin/bash

LOG=$1
TPATH="/tmp/autopkgtest.cWRPFV/build.Y7S/src"
TEST_FILE="${TPATH}/test.txt"
RC=0

check_log() {
    if ! grep -q "$1" $LOG; then
        printf "F"
        ((RC=RC+1))
    else
        printf "."
    fi
}

printf "%22s : " "${LOG}"

# accessing the NEWS file
check_log "RC\?O\? ${TPATH}/NEWS$"
# file creation
TEST_FILE=$(realpath "$TEST_FILE")
check_log "^touch.* C\?W\?O $TEST_FILE"
check_log "^touch.* C\?WO\? $TEST_FILE"
check_log "^bash(.* C\?WO\? $TEST_FILE"
# file deletion
check_log "^rm(.*): D $(dirname $TEST_FILE)$"
# dir creation
check_log "^touch(.*): + $(dirname $TEST_FILE)$"
check_log "^mkdir(.*): + $(dirname $TEST_FILE)$"
# file renaming
check_log "^mv(.*): <> $(dirname $TEST_FILE)"
# file moving
check_log "^mv(.*): < $(dirname $TEST_FILE)$"
check_log "^mv(.*): > $(dirname $TEST_FILE)/dest$"

echo " - ${RC} matches missing"
exit ${RC}

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And the test results show - it is not a particular kind ov events, not all of them, not the first ones, ...

   fatrace.log.7XZ3Ief : ...F...F.F - 3 matches missing
   fatrace.log.DHuzcGX : ....F..... - 1 matches missing
   fatrace.log.DtQWtxF : ......FF.. - 2 matches missing
   fatrace.log.IPN1ZZX : ...F...F.F - 3 matches missing
   fatrace.log.JmcGCpx : .......... - 0 matches missing
   fatrace.log.K06ToCE : .......... - 0 matches missing
   fatrace.log.KSdRSG8 : .......F.. - 1 matches missing
   fatrace.log.NepBJN8 : .......F.. - 1 matches missing
   fatrace.log.OiECdoh : ......FF.F - 3 matches missing
   fatrace.log.TydkhKd : .......... - 0 matches missing
   fatrace.log.UV22acQ : .......F.. - 1 matches missing
   fatrace.log.UxzxTRb : ....F..... - 1 matches missing
   fatrace.log.VApdiDa : ......FF.F - 3 matches missing
   fatrace.log.Y8rSY2G : .......... - 0 matches missing
   fatrace.log.ehxRpk7 : .......F.. - 1 matches missing
   fatrace.log.k8NBKKk : ....F..F.. - 2 matches missing
   fatrace.log.kLmSHVi : ...FF.F.FF - 5 matches missing
   fatrace.log.kb4wV7v : ...FF..... - 2 matches missing
   fatrace.log.nsbCDTm : ......FF.. - 2 matches missing
   fatrace.log.oR5UZ58 : ......FF.. - 2 matches missing
   fatrace.log.qACUHvY : ....F.FF.. - 3 matches missing
   fatrace.log.rxWTYs5 : ....F...FF - 3 matches missing
   fatrace.log.rxrJBZl : .......... - 0 matches missing
   fatrace.log.uCDFrK1 : .......F.. - 1 matches missing
   fatrace.log.uTSKJQF : .......F.. - 1 matches missing
   fatrace.log.wNQsl8z : .FF..FFF.. - 5 matches missing
   fatrace.log.zT5gLQF : .......F.. - 1 matches missing

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Also reported an interim update to Debian

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.8 KiB)

I wanted to check if this could be an Ubuntu kernel bug, so I checked Debian on:

 Linux debian 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux

It is slightly faster there, but the results look just as bad:
     fatrace.log.0oLjK : .FFFFFFFFF - 9 matches missing
     fatrace.log.0qKTf : ....F..FFF - 4 matches missing
     fatrace.log.18xyI : .......F.. - 1 matches missing
     fatrace.log.2Jk3p : ......F.FF - 3 matches missing
     fatrace.log.38fQg : .......... - 0 matches missing
     fatrace.log.3k2ln : .FFF.FFFFF - 8 matches missing
     fatrace.log.6ayso : ...FF....F - 3 matches missing
     fatrace.log.6X8WR : .......... - 0 matches missing
     fatrace.log.7LHFJ : .FF.FFF.FF - 7 matches missing
     fatrace.log.8L8D5 : .FFFFFFFFF - 9 matches missing
     fatrace.log.a8ucx : ...F..FFFF - 5 matches missing
     fatrace.log.Amer7 : ....F.FFFF - 5 matches missing
     fatrace.log.anbdn : ...FF.FFFF - 6 matches missing
     fatrace.log.AWqH7 : ...FF.F.FF - 5 matches missing
     fatrace.log.b2EPv : .FF.FFFFFF - 8 matches missing
     fatrace.log.bHqPs : .FF.FFFFFF - 8 matches missing
     fatrace.log.CbDqn : ....F.FFFF - 5 matches missing
     fatrace.log.ckmDf : ....F.FF.. - 3 matches missing
     fatrace.log.dAyHW : .......... - 0 matches missing
     fatrace.log.F3s4G : ....F.FFFF - 5 matches missing
     fatrace.log.fOcgN : ....F.FFFF - 5 matches missing
     fatrace.log.fq1vl : ...FF.FF.. - 4 matches missing
     fatrace.log.G3LdW : ...F..FF.. - 3 matches missing
     fatrace.log.gB3KR : ...FF.FFFF - 6 matches missing
     fatrace.log.GChqm : .FFF.FFFFF - 8 matches missing
     fatrace.log.Ge8Fk : .FF..F.FFF - 6 matches missing
     fatrace.log.GEral : ......FF.. - 2 matches missing
     fatrace.log.GgPdn : .FFFFFFFFF - 9 matches missing
     fatrace.log.gXPUR : ....F..F.F - 3 matches missing
     fatrace.log.hZ2v4 : ....F..FFF - 4 matches missing
     fatrace.log.IVFQj : ....F..... - 1 matches missing
     fatrace.log.Ivpzb : ......FF.. - 2 matches missing
     fatrace.log.jga62 : ..FFF.FFFF - 7 matches missing
     fatrace.log.jMu0a : .......... - 0 matches missing
     fatrace.log.JUQZO : ....F.FFFF - 5 matches missing
     fatrace.log.KKNvF : ...FF.FFFF - 6 matches missing
     fatrace.log.Kn4Ou : ...F....FF - 3 matches missing
     fatrace.log.kNEoX : ...FF.FF.F - 5 matches missing
     fatrace.log.Ktxi7 : ...FF..... - 2 matches missing
     fatrace.log.LoWR1 : .FF..FFFFF - 7 matches missing
     fatrace.log.LZnyG : .FF.FF.FFF - 7 matches missing
     fatrace.log.mWwXg : .FF.FFFF.. - 6 matches missing
     fatrace.log.nIj9e : ...FF.F... - 3 matches missing
     fatrace.log.nzzNl : .FF..FFFFF - 7 matches missing
     fatrace.log.O4QiA : ........FF - 2 matches missing
     fatrace.log.OM5Wp : .FF...F... - 3 matches missing
     fatrace.log.OxOAT : .......F.. - 1 matches missing
     fatrace.log.q5oU5 : ....F...FF - 3 matches missing
     fatrace.log.qBlok : .FF..FFFFF - 7 matches missing
     fatrace.log.QUkD2 : ...FF.FF.. - 4 matches missing
     fatrace.log.rWC7B : .FFF.F.F.. - 5 matches missing
     fatrace.log.t9Pvd : ...FF...FF - 4 matches missing
     fatrace.log.UNS1f : ....F.FFFF - 5 matches missing
    ...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I tried the former 0.13 version.
The format changed a bit and some support was not there.
So let us not think about the latter 6 events. But the first 4 events can be found and are only present in some runs.

# for i in fatrace.log.*; do ./grepcheck $i; done
     fatrace.log.4Y4sj : ....FFFFFF - 6 matches missing
     fatrace.log.BmDhE : ....FFFFFF - 6 matches missing
     fatrace.log.D3Q0R : .FF.FFFFFF - 8 matches missing
     fatrace.log.dBKeg : ....FFFFFF - 6 matches missing
     fatrace.log.DwbX5 : ....FFFFFF - 6 matches missing
     fatrace.log.FfiSG : .FF.FFFFFF - 8 matches missing
     fatrace.log.HZY4w : ....FFFFFF - 6 matches missing
     fatrace.log.KQgf3 : ....FFFFFF - 6 matches missing
     fatrace.log.kWx9y : ....FFFFFF - 6 matches missing
     fatrace.log.lM669 : FFFFFFFFFF - 10 matches missing
     fatrace.log.mYvpw : ....FFFFFF - 6 matches missing
     fatrace.log.n78BL : ....FFFFFF - 6 matches missing
     fatrace.log.qr7ll : ....FFFFFF - 6 matches missing
     fatrace.log.rglRA : ....FFFFFF - 6 matches missing
     fatrace.log.s3gPp : ....FFFFFF - 6 matches missing
     fatrace.log.sktJy : ....FFFFFF - 6 matches missing

This leads to the assumption that this never worked better, but now is catched.

How can this test well in Debian CI as I can reproduce in Debian just as well?
The answer - just as Brian pointed out initially - still seems to be:
   checking log...
   skipping FAN_REPORT_FID tests, not available on this kernel
It will most likely hit Debian once the kernel in the test env gets bumped.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since (so far) only the --currentmount tests failed I tried if it is different without that.
It is, but only statistically - I get ~50% good results then - maybe a timing/buffer issue that works better but not perfect without this?

# while /bin/true; do f=$(mktemp /tmp/fatrace/fatrace.log.XXXXX); mv /tmp/fatrace/fatrace.log $f; ./tests/fatrace-currentmount; done
# for i in fatrace.log.*; do ./grepcheck $i; done
     fatrace.log.0xLev : .......F.. - 1 matches missing
     fatrace.log.2Jmxh : ........FF - 2 matches missing
     fatrace.log.3USXR : ......F... - 1 matches missing
     fatrace.log.44Di7 : .FF.FF.... - 4 matches missing
     fatrace.log.83JEz : ...F...... - 1 matches missing
     fatrace.log.9cd6V : .......... - 0 matches missing
     fatrace.log.9U2XK : .FF..F.... - 3 matches missing
     fatrace.log.a5lNX : .......F.. - 1 matches missing
     fatrace.log.bsAZe : ......FF.. - 2 matches missing
     fatrace.log.bZxDG : .......... - 0 matches missing
     fatrace.log.CN2Gc : .......... - 0 matches missing
     fatrace.log.daVJ2 : .......... - 0 matches missing
     fatrace.log.DrQVf : .......... - 0 matches missing
     fatrace.log.eT5r1 : ......F.FF - 3 matches missing
     fatrace.log.F0P6T : .......... - 0 matches missing
     fatrace.log.f4Ak7 : .......... - 0 matches missing
     fatrace.log.H2oRi : ....F..F.. - 2 matches missing
     fatrace.log.inSK6 : ...FF..F.. - 3 matches missing
     fatrace.log.iSJJG : ......F.FF - 3 matches missing
     fatrace.log.Jqxrk : ...F...... - 1 matches missing
     fatrace.log.L912a : ......F... - 1 matches missing
     fatrace.log.n5aUL : .......F.. - 1 matches missing
     fatrace.log.Npaef : .......... - 0 matches missing
     fatrace.log.nx0ON : ......FF.. - 2 matches missing
     fatrace.log.OLZUL : .FF....F.. - 3 matches missing
     fatrace.log.OQSpO : .FF.FFFF.. - 6 matches missing
     fatrace.log.OvqMa : ....F..FFF - 4 matches missing
     fatrace.log.oyr7Y : ........FF - 2 matches missing
     fatrace.log.pB4od : .......... - 0 matches missing
     fatrace.log.PKskN : ..F...FF.. - 3 matches missing
     fatrace.log.rf6HO : .......... - 0 matches missing
     fatrace.log.s3Oi9 : .......F.. - 1 matches missing
     fatrace.log.taEHU : .FF....F.. - 3 matches missing
     fatrace.log.TfMIx : .......... - 0 matches missing
     fatrace.log.tiSAg : FFFFFFFFFF - 10 matches missing
     fatrace.log.u8tJp : .......... - 0 matches missing
     fatrace.log.updMK : .......... - 0 matches missing
     fatrace.log.WVFtI : .......... - 0 matches missing
     fatrace.log.x50aO : ......FF.. - 2 matches missing
     fatrace.log.XnqDl : .......... - 0 matches missing
     fatrace.log.yPsnf : ......F... - 1 matches missing
     fatrace.log.zBbGh : .......... - 0 matches missing

Revision history for this message
Martin Pitt (pitti) wrote :

Thanks Christian for your initial investigations! I'm back from vacation, looking into this now. Observations:

 * The 0.13 tests still work fine <https://autopkgtest.ubuntu.com/packages/f/fatrace/groovy/amd64>, so this is indeed a problem with the "new" FAN_REPORT_FID mode.

 * I finally moved the upstream code from git.lauchpad.net to github <https://github.com/martinpitt/fatrace> and added Travis integration for Ubuntu 20.04 LTS: This worked: https://travis-ci.com/github/martinpitt/fatrace/builds/180931055 (also a previous run on the tests branch).

 * I can in principle reproduce this on my Fedora 32 host (kernel 5.7.10), but it's really difficult -- I only got one failure in about 100 runs. This was when I had quite some background noise (downloading a VM image), and thus a lot of events that were unrelated to the test.

 * After the image download finished, and thus background noise ceased, I can no longer reproduce this.

 * I tried in an Ubuntu 20.04 VM, and after some dozen iterations I get a *different* type of failure:

   ^bash(.* C\?WO\? /root/fatrace/test.txt not found in log

     bash(8605): CO /etc/ld.so.cache
     [...]
     unknown(8605): CWO /root/fatrace/test.txt

   ^rm(.*): D /root/fatrace$ not found in log

     rm(10115): O /usr/lib/locale/C.UTF-8/LC_NUMERIC
     rm(10115): O /usr/lib/locale/C.UTF-8/LC_CTYPE
     unknown(10115): D /root/fatrace
     unknown(10115): C /usr/bin/rm

   My first Travis run <https://travis-ci.com/github/martinpitt/fatrace/builds/180930904> had something similar.

  This is again the ancient race condition of /proc/pid/comm being out of sync with the actual process -- there is no race free way how to map the fanotify event to a process exec. To fix this, I could cache the lookup result for a while, but of course it would then be subject to pid re-use.

  But either way, this is a different failure than this bug report.

 * I can make the locally running test fail when creating background noise with

       find . -exec head '{}' \; > /dev/null

   But it usually just fails with the "unknown" error.

So it seems I should first do something about that "/proc/pid/comm out of sync" issue.

Martin Pitt (pitti)
Changed in fatrace:
assignee: nobody → Martin Pitt (pitti)
Revision history for this message
Martin Pitt (pitti) wrote :

I improved the "unknown" process names a bit now <https://github.com/martinpitt/fatrace/commit/314cb93dc5589>.

But I just realized what actually goes wrong in terms of *this* bug: All three autopkgtest failures <https://autopkgtest.ubuntu.com/packages/f/fatrace/groovy/amd64> fail like this:

  ^mv(.*): <> /tmp/autopkgtest.TV550j/build.0mf/src not found in log:

    mv(1320): < /tmp/autopkgtest.TV550j/build.0mf/src
    mv(1320): > /tmp/autopkgtest.TV550j/build.0mf/src

So it seems in some circumstances a file rename is not sent as a single FAN_MOVED_FROM|FAN_MOVED_TO event, but two discrete ones. So I just need to fix the test to accept either: https://github.com/martinpitt/fatrace/commit/6cf5a35dc18

Changed in fatrace:
status: New → Fix Committed
Changed in fatrace (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Martin Pitt (pitti)
Revision history for this message
Martin Pitt (pitti) wrote :

https://github.com/martinpitt/fatrace/releases/tag/0.16

I uploaded this to Debian as well, from where it should get auto-imported into groovy.

Changed in fatrace:
assignee: Martin Pitt (pitti) → nobody
status: Fix Committed → Fix Released
assignee: nobody → Martin Pitt (pitti)
Changed in fatrace (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package fatrace - 0.16-1

---------------
fatrace (0.16-1) unstable; urgency=medium

  * New upstream version 0.16:
    - Avoid most "unknown" process names on process exit
    - tests: Accept split file rename events (Closes: #963714, LP: #1885188)
  * Adjust packaging for upstream move to GitHub
  * Update Standards-Version to 4.5.0.
    No changes necessary.

 -- Martin Pitt <email address hidden> Sat, 22 Aug 2020 15:57:39 +0200

Changed in fatrace (Ubuntu):
status: Fix Committed → Fix Released
Changed in fatrace (Debian):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.