Comment 2 for bug 1133249

Callum Macdonald (chmac) wrote :

I don't think it is a race condition. I did consider that and I ran some tests. I specifically chose a delay of 2.7s because the file is written every 1s, so two delays of 2.7s mean we try to read at 0s, 2.7s, and 5.4s (+/- 10ms). It seems highly improbable that all three reads would coincide with a write when the script is set to write every 1s.
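As a rough illustration of that timing argument, the sketch below prints where each read lands within the 1s write cycle. The helper name `read_phases` is hypothetical and not part of pt-heartbeat, and the offsets are measured relative to the first read, since the actual moment of each write is not known.

```shell
# Illustrative only: read_phases is a hypothetical helper, not part of pt-heartbeat.
# Arguments: delay between reads in seconds, number of reads.
# Prints each read time and its offset within a 1s cycle,
# measured from the first read (the real write times are unknown).
read_phases() {
    awk -v delay="$1" -v reads="$2" 'BEGIN {
        for (i = 0; i < reads; i++) {
            t = i * delay
            printf "%.1fs -> %.1fs into the cycle\n", t, t - int(t)
        }
    }'
}

read_phases 2.7 3
```

With a 2.7s delay the three reads fall 0.0s, 0.7s, and 0.4s into the cycle, so a write window of a few milliseconds could overlap at most one of them.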

I also ran the test I described above, checking the size of the file every 50ms or 100ms. The file remains at 0 bytes for more than 30 checks in some instances. I used these commands:
$ while true; do ls --full-time /var/spool/pt-heartbeat >> pt-heartbeat.log; sleep 0.05; done
$ cat pt-heartbeat.log | grep " 0 " | uniq -c

I can see some instances where the file remains at 0 bytes for 40 consecutive checks at 50ms intervals, i.e. about 2 seconds. As I understand it, that's not a race condition, that's something wonky with the tool on this machine. Running the same test on another machine shows no results with a file size of 0 bytes.
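A more direct way to measure this might be to count the longest run of consecutive empty readings instead of grepping a log afterwards. The following is only a sketch: `longest_empty_run` is a hypothetical helper, and it treats a missing file the same as an empty one because `[ -s ]` is false in both cases.

```shell
# Illustrative only: longest_empty_run is a hypothetical helper, not part of pt-heartbeat.
# Arguments: file to watch, number of checks, seconds to sleep between checks.
# Prints the longest run of consecutive checks where the file was empty or missing.
longest_empty_run() {
    file=$1
    checks=$2
    interval=$3
    run=0
    longest=0
    i=0
    while [ "$i" -lt "$checks" ]; do
        if [ -s "$file" ]; then
            # File exists and has a non-zero size: the empty run ends.
            run=0
        else
            run=$((run + 1))
            if [ "$run" -gt "$longest" ]; then
                longest=$run
            fi
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$longest"
}
```

For example, `longest_empty_run /var/spool/pt-heartbeat 600 0.05` would poll for roughly 30 seconds; a fractional interval assumes a `sleep` that accepts one, as GNU sleep does. A result of 40 at a 50ms interval would correspond to about 2 seconds of the file being empty.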

I don't know enough about filesystems to hazard a guess as to what's going on. I can say that in the week we've been running the script on two separate machines, we've never had a single failure on the SSD-based machine, but it happens frequently on this disk-based machine.