Comment 10 for bug 1580615

Alex Schultz (alex-schultz) wrote :

A few things here. It appears the host that the task failed on was very heavily loaded. The logs show that astute kicked off the puppet task at 13:10:55:

2016-05-11 13:10:55 INFO [2703] Task[updatedb/3]: Run on node: Node[3]

But the first log line for the task was not written until 13:11:37:

2016-05-11 13:11:37 +0000 Scope(Class[Osnailyfacter::Ceph::Updatedb]) (notice): MODULAR: ceph/updatedb.pp
2016-05-11 13:11:39 +0000 Puppet (notice): Compiled catalog for node-3.test.domain.local in environment production in 2.97 seconds
2016-05-11 13:11:56 +0000 /Stage[main]/Osnailyfacter::Ceph::Updatedb/Exec[Ensure /var/lib/ceph in the updatedb PRUNEPATH]/returns (notice): executed successfully
2016-05-11 13:11:59 +0000 Puppet (notice): Finished catalog run in 8.53 seconds

This means it took ~42 seconds of fact loading before puppet actually started to run the manifest.

Additionally, if you look at the task that was being run, it's just a single exec. The exec consists of a sed and an unless that performs a file existence test and a grep. The catalog compilation took ~3 seconds and the execution of the simple sed/grep took over 10 seconds. Unfortunately, the combined slowness of this entire process exceeded the 60 second task timeout: going by the log timestamps, roughly 64 seconds passed between astute starting the task (13:10:55) and puppet finishing the catalog run (13:11:59). To address this issue, the only thing we can really do is increase the task timeout to allow for heavily loaded environments. This should not be a problem in an actual production deployment, since the hosts being deployed on should not be this loaded, but in CI it can be an issue.
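For anyone who wants to double check the timing, here is a rough sketch of the arithmetic using only the timestamps from the log lines quoted above. The dictionary keys and the helper name are made up for illustration and are not anything astute or puppet provides; the 60 second value is the task timeout mentioned in this comment.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
TASK_TIMEOUT = 60  # seconds, the task timeout mentioned in this comment

# Timestamps copied from the quoted log lines (the key names are hypothetical labels).
events = {
    "astute_start": "2016-05-11 13:10:55",
    "first_puppet_log": "2016-05-11 13:11:37",
    "run_finished": "2016-05-11 13:11:59",
}


def seconds_between(start_key, end_key):
    """Return whole seconds elapsed between two labelled log events."""
    start = datetime.strptime(events[start_key], FMT)
    end = datetime.strptime(events[end_key], FMT)
    return int((end - start).total_seconds())


# Delay before puppet logged anything for the task.
startup_delay = seconds_between("astute_start", "first_puppet_log")
# Wall-clock time from astute starting the task to puppet finishing.
total_elapsed = seconds_between("astute_start", "run_finished")

print("startup delay:", startup_delay, "seconds")
print("total elapsed:", total_elapsed, "seconds")
print("exceeded timeout:", total_elapsed > TASK_TIMEOUT)
```

The point is simply that the startup delay alone accounts for most of the timeout budget, so even a trivial exec can push a loaded host past the limit.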