Unexplainable time jumps in CRON

Bug #1429427 reported by sgofferj on 2015-03-07
52
This bug affects 9 people
Affects Status Importance Assigned to Milestone
rsyslog (Ubuntu)
Undecided
Unassigned
Trusty
Medium
Seyeong Kim

Bug Description

[Impact]

on Trusty

logging to syslog via cron causes timestamp abnormals after several hours.

[Test Case]

1. run below first,
while true ; do logger "hello syslog" ; sleep 1; done

2. register below script to crontab
for i in {1..100} ; do logger "hello syslog via cron" ; sleep 1; done

3. monitor syslog with below script
tail -f -n 100 /var/log/syslog

You can see weird timestamp in several hours.

[Regression Potential]

it's ratelimiting turn off code, could be issue with being off rate-limit

[Other Info]

below commit fixes this issue in my test

From 7a2e2473476d2b10a775affd9ac6c62b81c450e5 Mon Sep 17 00:00:00 2001
From: Tomas Heinrich <email address hidden>
Date: Thu, 24 Jul 2014 13:47:14 +0200
Subject: [PATCH 1/1] bugfix: plug a memleak in imuxsock

The hash table for the system socket was allocated twice. The other
one being in activateListeners().

This commit practically reverts: 34a77cde2423303da72ab773128a2ddcf41
The issue seems to be that the hash table is not initialized (to NULL)
rather then not being allocated.

#####################################################################
## old description

On my main server I see unexplainable time jumps backwards in the syslog. Those jumps affect CRON.
Example:

Feb 10 06:48:01 nostromo CRON[20351]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:49:01 nostromo CRON[20364]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20386]: (root) CMD ( /storage/exec/status-nostromo.sh >/dev/null 2>&1)
Feb 7 05:40:01 nostromo CRON[20389]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20390]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20391]: (root) CMD ( /storage/exec/checkip.sh 2>/dev/null 1>/dev/null)

For debugging I did the following:
Start xclock and watch xclock and tail -f /var/log/syslog in parallel. When CRON logged a wrong time, xclock did NOT show any time jump but seemed to freeze for a fraction of a second.
Open a screen and start a script that will once per second read the time (in unix seconds) and compare the read time with the time read a second ago. If the current time was smaller, the script would send an email with a process list from before and after the jump. The script also never detected any time jump.

In summary, my current impression is that there might be a bug in CRON because no other programm seems to be able to see the "wrong" time. The server in question is syslog server for 4 servers and 3 network devices. The time jumps exclusively show in syslog entries from the local CRON instance. Not in any remote syslog entry and not in any other local syslog entry, e.g. from DHCPD, bind, tftpd, etc. etc.
Also, after a reboot, things work ok for several days upto about 2 or 3 weeks. Then the "time jumps" start to occur with increasing frequency.

I don't use user crontabs but maintain all jobs in /etc/crontab. I have number of jobs which are triggered every minute and another number of jobs which are triggered every 5 minutes (maybe some CRON internal counter overflow problem?).

Hardware:
Asus P9D-V
Intel Xeon E3-1240L V3
16GB ECC RAM
128GB SSD System
3x3TB ZFS RaidZ2 storage
1x3TB Misc. data

CMOS battery already changed and board inspected.

nostromo:~ # lsb_release -rd
Description: Ubuntu 14.04.2 LTS
Release: 14.04

nostromo:~ # apt-cache policy cron
cron:
  Installed: 3.0pl1-124ubuntu2
  Candidate: 3.0pl1-124ubuntu2
  Version table:
 *** 3.0pl1-124ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

sgofferj (sgofferj) wrote :

Script used for monitoring:

#!/bin/bash

oldTime=$(date +%s)
oldPsOutput=$(ps faux)
while sleep 1
do
  currentTime=$(date +%s)
  currentPsOutput=$(ps faux)
  if [ "$currentTime" -lt "$oldTime" ] # clock change detected?
  then
    (
        echo '========='
        echo "$currentTime < $oldTime"
        echo "$oldPsOutput"
        echo ':::::::::'
        echo "$currentPsOutput"
    ) | mail -s "Time jump" root
  fi
  oldPsOutput=$currentPsOutput
  oldTime=$currentTime
done

affects: tftp-hpa (Ubuntu) → cron (Ubuntu)
sgofferj (sgofferj) wrote :

Forgot: Restarting CRON does not solve the problem. Rebooting the system does - as written, temporarily.

sgofferj (sgofferj) wrote :

One of the mentioned network devices is a Firewall which logs verbosely, i.e. there is a LOT of logs coming in from there and /var/log/firewall does not show a single time jump.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cron (Ubuntu):
status: New → Confirmed
G.J. (gjmoed) wrote :

Figured I'd give it a quick google for this weirdness and stumbled upon this report. Confirmed it, but then started doubting if it actually is cron that's causing this weirdness. So I put a simple strace on my freshly restarted cron with strace -f -p <pid> -s 1000 2>&1 | grep 'CRON\[' and left this running for half a day. Guess what? You can see cron logging like it should, no jumping around. But then when I look at the actual syslog again, I see the cron entries jumping around... So that makes one question syslog somehow... No time yet to dive into that though...

Hi,

On 2015-03-07 18:16, sgofferj wrote:
> On my main server I see unexplainable time jumps backwards in the syslog. Those jumps affect CRON.
> Example:
>
> Feb 10 06:48:01 nostromo CRON[20351]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
> Feb 10 06:49:01 nostromo CRON[20364]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
> Feb 10 06:50:01 nostromo CRON[20386]: (root) CMD ( /storage/exec/status-nostromo.sh >/dev/null 2>&1)
> Feb 7 05:40:01 nostromo CRON[20389]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
> Feb 10 06:50:01 nostromo CRON[20390]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
> Feb 10 06:50:01 nostromo CRON[20391]: (root) CMD ( /storage/exec/checkip.sh 2>/dev/null 1>/dev/null)

I'm afraid this is not a cron bug. The cron daemon does not write any
logs itself: as it is customary, it uses the syslog(3) facilities to log
all its messages. Therefore, the timestamp you are seeing is not
generated by cron, but by your syslog daemon.

Note that the PIDs of the individual cron processes are close together
(20386, 20389, 20390, 20391), which usually indicates that they were
started in close succession. So, despite the bogus timestamp, your cron
execution schedule appears to be fine.

I'd say this is either a bug in your syslog implementation, or a
hardware issue affecting syslog. By reassigning this bug to your syslog
implementation, the respective maintainers might be able to help you in
tracking down this issue.

Regards,
Christian

> For debugging I did the following:
> Start xclock and watch xclock and tail -f /var/log/syslog in parallel. When CRON logged a wrong time, xclock did NOT show any time jump but seemed to freeze for a fraction of a second.
> Open a screen and start a script that will once per second read the time (in unix seconds) and compare the read time with the time read a second ago. If the current time was smaller, the script would send an email with a process list from before and after the jump. The script also never detected any time jump.
>
> In summary, my current impression is that there might be a bug in CRON because no other programm seems to be able to see the "wrong" time. The server in question is syslog server for 4 servers and 3 network devices. The time jumps exclusively show in syslog entries from the local CRON instance. Not in any remote syslog entry and not in any other local syslog entry, e.g. from DHCPD, bind, tftpd, etc. etc.
> Also, after a reboot, things work ok for several days upto about 2 or 3 weeks. Then the "time jumps" start to occur with increasing frequency.
>
> I don't use user crontabs but maintain all jobs in /etc/crontab. I have
> number of jobs which are triggered every minute and another number of
> jobs which are triggered every 5 minutes (maybe some CRON internal
> counter overflow problem?).

G.J. (gjmoed) wrote :
Download full text (6.1 KiB)

Gave it another 10 mins, this time checking the 'receiving' end, being rsyslog in my case:

[pid 583] <... select resumed> ) = 1 (in [0])
[pid 583] recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<86>Jun 9 10:01:01 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 95
[pid 583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 583] <... futex resumed> ) = 1
[pid 585] futex(0x73e060, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 583] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 585] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 583] <... futex resumed> ) = 0
[pid 585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 585] write(5, "Jun 9 10:01:01 mail1 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)\n", 98) = 98
[pid 585] write(1, "May 30 19:38:01 mail1 CRON[4860]: message repeated 3 times: [ (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)]\n", 148) = 148
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC <unfinished ...>
[pid 583] <... select resumed> ) = 1 (in [0])
[pid 585] <... open resumed> ) = -1 ENOENT (No such file or directory)
[pid 583] recvmsg(0, <unfinished ...>
[pid 585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668217, NULL <unfinished ...>
[pid 583] <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"<78>Jun 9 10:01:01 CRON[4861]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 116
[pid 585] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid ...

Read more...

G.J. (gjmoed) wrote :

I'm convinced this is more rsyslog related and not cron.

affects: cron (Ubuntu) → rsyslog (Ubuntu)
G.J. (gjmoed) wrote :

root@mail1:~# apt-cache policy rsyslog
rsyslog:
  Installed: 7.4.4-1ubuntu2.5
  Candidate: 7.4.4-1ubuntu2.6
  Version table:
     7.4.4-1ubuntu2.6 0
        990 http://nl.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
 *** 7.4.4-1ubuntu2.5 0
        100 /var/lib/dpkg/status
     7.4.4-1ubuntu2.3 0
        990 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     7.4.4-1ubuntu2 0
        990 http://nl.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

So yes I can update, though I doubt that will change anything, except for the rsyslog restart:
rsyslog (7.4.4-1ubuntu2.6) trusty-proposed; urgency=medium

  * debian/usr.sbin.rsyslog: grant read access to /dev/log to cope with
    behaviorial change of apparmor in utopic HWE kernel (LP: #1425398)

sgofferj (sgofferj) wrote :

Hi,

did you notice that I wrote that the time jumps ONLY affect CRON entries? No other message shows them, even not the firewall log which receives logs from a Cisco ASA in debug logging mode, so there's A LOT of logs coming in on that channel.

-Stefan

G.J. (gjmoed) wrote :

Hi Stefan,

yup yup, same here, only showing for cron related entries in syslog. Also somewhat busy due to a noisy sogo/postfix/openchange/dovecot plus all sorts of extra add-ons. Not nearly as noisy as a firewall logging of course ;-)

Can you strace your cron pid for a bit, including forks and hopefully confirm it is sending correct dates to syslog?
Next you can try then is to strace your rsyslog, which will be very noisy so you best grep for cron related stuff only ;-)

Are you using rsyslog? And are you also seeing that xconsole noise?

After my rsyslog restart this morning, not seeing any time jumping for cron in syslog just yet, so it's likely also related to something happening over time, like a buffer or other memory related issue.

You could try the same, restart rsyslog and see how long it takes for things to start jumping around again...

Cheers,

GJ

sgofferj (sgofferj) wrote :

I don't get to anything meaningful before the weekend. Gosh, it's at least 10 years since I last straced anything - have to read up on that too.

Jup, I'm using rsyslog. I'll keep an eye on xconsole stuff but IIRC I disabled that in rsyslogd.conf.

Daniel C (daniel314) wrote :

I'm seeing the same behavior out of rsyslogd. It shows up most frequently with CRON runs, but I've seen it on kernel syslog entries as well.

I'm shipping my logs into ELK, and I was noticing that old indices kept getting updated. When I dug into the issue, I saw that this was happening on a relatively regular basis (at least once a day). It is/was one particular cron job that generated the vast majority of the incorrect timestamps: one that ran once a minute (e.g. * * * * * /bin/true)

I've seen this happen on several Ubuntu hosts, mostly 14 LTS, but I think I've seen it in 12 LTS too. I've only recently started tracking this issue.

The timestamp can be off by hours to days.

When I see it happen, restarting rsyslogd makes the issue go away for a while.

    - Daniel

Frank Agerholm (8-frank-b) wrote :

I found a matching bugreport on the rsyslog issuetracker at github. The report is about ubuntu 14.04.1 too.

https://github.com/rsyslog/rsyslog/issues/315

Roger Dueck (rdueck) wrote :

I've had some of the same time jumps in CRON's syslog entries, but where I really noticed it was during roughly the same time-frame in my apache2 logs.

Michel (michel-timos) wrote :

I can confirm this bug still exists on ubuntu 14.04.4
It only seems to happen to CRON entries although the github bugreport referenced in #14 confirms the problem is with rsyslog.
Supposedly setting '$RepeatedMsgReduction off' in /etc/rsyslog.conf should work around this problem.

Seyeong Kim (xtrusia) on 2017-07-17
description: updated
Seyeong Kim (xtrusia) wrote :
description: updated
Changed in rsyslog (Ubuntu):
status: Confirmed → Fix Released
tags: added: sts
Eric Desrochers (slashd) on 2017-07-17
Changed in rsyslog (Ubuntu Trusty):
assignee: nobody → Seyeong Kim (xtrusia)
importance: Undecided → Medium
tags: added: sts-sru-needed
Eric Desrochers (slashd) wrote :

[ STS SPONSOR REVIEW ]

@Seyeong,

According to your dediff you are only cherry-picking "7a2e247 bugfix: plug a memleak in imuxsock".

This commit has been introduced upstream starting with "v7.6.3-158-g7a2e247", thus it seems like this bug is only affecting Trusty as you previously mentioned.

According to package version Xenial/Zesty and Artful already have the fix as they are > than "v7.6.3".

# Ubuntu
 rsyslog | 7.4.4-1ubuntu2.6 | trusty-updates | source, amd64, arm64, armhf, i386, powerpc, ppc64el
 rsyslog | 7.4.4-1ubuntu14 | vivid | source, amd64, arm64, armhf, i386, powerpc, ppc64el
 rsyslog | 8.16.0-1ubuntu3 | xenial | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
 rsyslog | 8.16.0-1ubuntu5 | yakkety | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
 rsyslog | 8.16.0-1ubuntu5 | zesty | source, amd64, arm64, armhf, i386, ppc64el, s390x
 rsyslog | 8.16.0-1ubuntu5 | artful | source, amd64, arm64, armhf, i386, ppc64el, s390x

and also aleady in Debian :

# Debian
 rsyslog | 8.24.0-1 | stable | source, amd64, arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, s390x
 rsyslog | 8.28.0-1 | testing | source, amd64, arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, s390x
 rsyslog | 8.28.0-1 | unstable | source, amd64, arm64, armel, armhf, i386, mips, mips64el, mipsel, powerpc, ppc64el, s390x
 rsyslog | 8.28.0-1 | unstable-debug | source

Looking at the upstream project I see 3 commits as follow for the same subject : "bugfix: plug a memleak in imuxsock"

$ git log --oneline --grep="plug a memleak in imuxsock"
be45099 bugfix: plug a memleak in imuxsock

#git describe be45099
v7.6.3-159-gbe45099

7a2e247 bugfix: plug a memleak in imuxsock

# git describe 7a2e247
v7.6.3-158-g7a2e247

5a11a1d bugfix: plug a memleak in imuxsock

# git describe 5a11a1d
v7.6.3-119-g5a11a1d

Does the other commits are needed to have a sane rsyslog package ?

I just want to double-check with you that we are not missing anything before uploading this package in the upload queue.

- Eric

Seyeong Kim (xtrusia) wrote :

right,
In my test, only 7a2e247 commit fixes this symptom.
but
I think be45099 is worth SRUing with 7a2e247 as commit's description

5a11a1d is quite far

uploaded debdiff for two patches.

Thanks

Eric Desrochers (slashd) wrote :

[STS SPONSOR]

Thanks Seyeong for the quick patch rework to include the second upstream commit.

The trusty debdiff has been uploaded in the Trusty upload queue, now waiting for SRU verification team approval in order to start building in trusty-proposed.

https://launchpad.net/ubuntu/trusty/+queue?queue_state=1&queue_text=rsyslog

- Eric

Changed in rsyslog (Ubuntu Trusty):
status: New → In Progress

Hello sgofferj, or anyone else affected,

Accepted rsyslog into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rsyslog/7.4.4-1ubuntu2.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in rsyslog (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-trusty
Robie Basak (racb) wrote :

Unsubscribing ~ubuntu-sponsors as there appears nothing left to sponsor.

Seyeong Kim (xtrusia) wrote :

I tested -proposed pkg and this symptom is fixed with it.

ii rsyslog 7.4.4-1ubuntu2.7 amd64..

Thanks.

tags: added: verification-done-trusty
removed: verification-needed-trusty
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers