Comment 6 for bug 1733469

Joel Sing (jsing) wrote :

This is readily reproducible as follows:

  $ sudo touch /var/lib/juju/metricspool/metrics.meta
  <restart unit agent>

At which point the unit agent will start logging:

  2017-11-21 06:49:31 ERROR juju.worker.dependency engine.go:551 "metric-sender" manifold worker returned unexpected error: failed to open the metric reader: EOF

And leaking file handles:

  $ while sleep 5; do sudo lsof | grep \/var\/lib\/juju\/agents | wc -l; done
  36
  54
  63
  81
  99

A similar failure also occurs if a metrics metadata file is corrupt (e.g. filled with zero bytes after a power outage or file system corruption):

  $ sudo dd if=/dev/zero of=metrics.meta bs=512 count=1
  <restart unit agent>

  2017-11-21 06:58:49 ERROR juju.worker.dependency engine.go:551 "metric-sender" manifold worker returned unexpected error: failed to open the metric reader: invalid character '\x00' looking for beginning of value
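
The two error strings above are what Go's encoding/json decoder returns for an empty input and for an input made of NUL bytes. The following is a minimal sketch of that behaviour, assuming the metadata is read with a JSON decoder (the error text suggests this, but it is not confirmed here). decodeMeta is a hypothetical stand-in, not the actual Juju reader:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeMeta is a hypothetical stand-in for the metadata reader. It assumes
// the metadata file is JSON and decodes it into a generic map.
func decodeMeta(data []byte) error {
	var meta map[string]interface{}
	return json.NewDecoder(bytes.NewReader(data)).Decode(&meta)
}

func main() {
	// Empty file, as created by "touch".
	fmt.Println("empty file:", decodeMeta(nil))

	// 512 NUL bytes, as written by "dd if=/dev/zero bs=512 count=1".
	fmt.Println("zeroed file:", decodeMeta(make([]byte, 512)))
}
```

Either way the decoder fails before producing any value, so a reader that opens the file first and decodes second needs to close the file on this error path.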

The zero-length metadata case can be caused by a bug in recordMetaData: the code that closes and moves the file is called unconditionally from a defer, so if the encode fails (e.g. full disk), the metadata file is still put into place:

  https://github.com/juju/juju/blob/develop/worker/metrics/spool/metrics.go#L265