illegal attempt to update using time y when last update time is x (minimum one second step)

Bug #2003839 reported by Christian Ehrhardt 
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Wormly Collectd installer
New
Unknown
collectd (Ubuntu)
Incomplete
Undecided
Unassigned

Bug Description

This is a spin off from bug 1971093, not a regression as we had no collectd at all, but waiting on Upstream to respond.

We will link the upstream case and chime in there, hoping to get some insight.

Issue: https://github.com/collectd/collectd/issues/3963

Repro (not a 100% hit rate yet):

# Install packages
$ apt install collectd collectd-utils rrdtool
Change default config to (not sure if all are required):
$ vim /etc/collectd/collectd.conf
...
<Plugin rrdtool>
 DataDir "/var/lib/collectd/rrd"
 CacheTimeout 10
 CacheFlush 900
 RandomTimeout 5
 WritesPerSecond 50
...

a) as reported upstream
# Enable new config and clean old data
$ systemctl stop collectd
$ rm -rf /var/lib/collectd/rrd/*
$ systemctl start collectd

b) as I've found it on my system
# restart once to enable the new config
$ systemctl restart collectd
# let it collect some data
$ sleep 120s
# remove files, restart again
$ rm -rf /var/lib/collectd/rrd/*
$ systemctl restart collectd

Then check the service output:
$ journalctl -u collectd

Notes:
- I can only see that
  a) while stopping the service
  b) after deleting DataDir

Others (https://bugs.launchpad.net/ubuntu/+source/collectd/+bug/1971093/comments/33) have reported to see it more than once. I'm unsure if failing with a warning on deleted data is too much of a real problem. We need to find if this can also occur in other more critical scenarios.

Sadly this is racy, in 1 of 10 cases I got:
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/j.lxd/disk-loop8/disk_time.rrd) failed: /var/lib/collectd/rrd/j.lxd/disk-loop8/disk_time.rrd: illegal attempt to update using time 1674629134 when last update time is 1674629143 (minimum one second step)

In the others I got only the more expected (and IMHO ok):
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/j.lxd/battery-0/capacity.rrd) failed: opening '/var/lib/collectd/rrd/j.lxd/battery-0/capacity.rrd': No such file or directory

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It does not always show up (makes bisecting hard :-/)

Bad case:
Jan 25 06:45:55 j collectd[31381]: Exiting normally.
Jan 25 06:45:55 j systemd[1]: Stopping Statistics collection and monitoring daemon...
Jan 25 06:45:55 j collectd[31381]: collectd: Stopping 5 read threads.
Jan 25 06:45:55 j collectd[31381]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/j.lxd/disk-nvme0n1p5/disk_io_time.rrd) failed: /var/lib/collectd/rrd/j.lxd/disk-nvme0n1p5/disk_io_time.rrd: illegal attempt to update using time 1674629134 when last update time is 1674629143 (minimum one second step)
Jan 25 06:45:55 j collectd[31381]: collectd: Stopping 5 write threads.
Jan 25 06:45:55 j collectd[31381]: rrdtool plugin: Shutting down the queue thread. This may take a while.
Jan 25 06:45:55 j collectd[31381]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/j.lxd/disk-loop8/disk_time.rrd) failed: /var/lib/collectd/rrd/j.lxd/disk-loop8/disk_time.rrd: illegal attempt to update using time 1674629134 when last update time is 1674629143 (minimum one second step)

Same config/szenario - good case:
Jan 25 06:51:14 j collectd[31447]: Exiting normally.
Jan 25 06:51:14 j systemd[1]: Stopping Statistics collection and monitoring daemon...
Jan 25 06:51:14 j collectd[31447]: collectd: Stopping 5 read threads.
Jan 25 06:51:14 j collectd[31447]: collectd: Stopping 5 write threads.
Jan 25 06:51:14 j collectd[31447]: rrdtool plugin: Shutting down the queue thread. This may take a while.
Jan 25 06:51:14 j collectd[31447]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/j.lxd/battery-0/capacity.rrd) failed: opening '/var/lib/collectd/rrd/j.lxd/battery-0/capacity.rrd': No such file or directory

Since this is related to caching and write threads, maybe this is about some info being in memory or those threads - which races on a restart with other content like the removed files on disk or the flushing.

Waiting for others to chime in...

description: updated
description: updated
description: updated
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Other than this report and the comment leading to it I found more discussions about the same.
Old issues:
- https://github.com/collectd/collectd/issues/230
Discussions declared work as intended:
- https://github.com/collectd/collectd/issues/781
And many many Forum entries.

The thing they all share is that the rrd files contain some config data like the interval and such.
And in almost all of them issues like this got resolved by removing the rrd data and restarting collectd.

@Jussi - for your initial report - is that an option for you to try?
I mean maybe your system has old records with some kind of old data being incompatible after upgrade causing it to occur on each write. Removing the old data might make it create new and matching files. Please report if that would resolve your issue before we continue chasing an issue that potentially is just a "needs cleanup on upgrade" case.(marking incomplete)

Changed in collectd (Ubuntu):
status: New → Incomplete
Changed in wormly-collectd:
status: Unknown → New
Revision history for this message
Jussi Mäkinen (jmakinen) wrote :

This is very weird. I rebooted the machine and then did the b) route. Can't reproduce it anymore.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.