Staging job for 5.1.2 is broken due to a recent change in ntpd configuration

Bug #1415596 reported by ruhe
This bug affects 3 people
Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  Critical    Artem Panchenko
6.1.x               Fix Released  Critical    Artem Panchenko

Bug Description

NOTICE: this bug description might not include all the useful information. I'm going to ask teran to update it. Please do not mark as incomplete, we need this bug to track failure on staging job.

A recent change in fuel-library caused failures in staging jobs for 5.1.2:
https://github.com/stackforge/fuel-library/commit/9a342c899310dc03205e74a16d54964826cfcd

Example of a failing job:
http://jenkins-product.srt.mirantis.net:8080/job/5.1.2.staging.centos.bvt_1/29/console

Trace:
AssertionError: Failed to execute "NTPD=$(find /etc/init.d/ -regex '/etc/init.d/\(ntp.?\|ntp-dev\)'); $NTPD stop; killall ntpd; ntpd -qg && $NTPD start" on remote host: ['ntpd: no process killed\n']

Running the commands manually:
hwclock -s
NTPD=$(find /etc/init.d/ -regex '/etc/init.d/\(ntp.?\|ntp-dev\)'); $NTPD stop; killall ntpd; ntpd -qgd && $NTPD start

failed because no suitable server was found to sync with.

A capture of the traffic to the master node is in the attached ntp.pcap (thanks to Artem Panchenko).

The discussion of this issue from the #fuel-qa IRC channel is also attached.

ruhe (ruhe) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
status: New → Confirmed
Tatyanka (tatyana-leontovich) wrote :
description: updated
Igor Shishkin (teran) wrote :

There was also a "confirmed" for tos orphan :)
So reverting https://github.com/stackforge/fuel-library/commit/9a342c899310dc03205e74a16d54964826cfcd fixes the issue.

Stanislaw Bogatkin (sbogatkin) wrote :

Having a local clock and orphan mode together is a bad idea, because it can lead to unpredictable behavior. ntp should use either the local clock or orphan mode, not both, so a revert is straightforward but not a good idea.

Moreover, the problem in that exact environment is not in ntp at all:

AssertionError: Failed to execute "NTPD=$(find /etc/init.d/ -regex '/etc/init.d/\(ntp.?\|ntp-dev\)'); $NTPD stop; killall ntpd; ntpd -qg && $NTPD start" on remote host: ['ntpd: no process killed\n']

We find the ntpd init script, then stop ntpd with it. I suppose that at some point it took a long time to stop, so we also try to kill it manually. But in this case "$NTPD stop" actually stopped ntpd, so "killall ntpd" returns 1 because ntpd was already stopped. The following commands were then not executed at all, because the script had already stopped with the error from killall.

We should just do "killall ntpd || true" instead of plain "killall ntpd".
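
To make the effect of "|| true" concrete, here is a minimal sketch. It assumes 'false' as a stand-in for a "killall ntpd" that finds no process; whether the test harness actually runs the command line with errexit (set -e) is not stated in this report, so the 'strict' case is only an illustration of when a failing command would abort the rest.

```shell
# 'false' is a stand-in for a 'killall ntpd' that finds no process
# (an assumption for illustration, not the real command).

# Without errexit, a failing command followed by ';' does not stop the list.
plain=$(sh -c 'false; echo reached')

# With errexit (set -e), the same failure aborts the script before 'echo'.
strict=$(sh -c 'set -e; false; echo reached') || true

# '|| true' masks the failure, so the script continues even under errexit.
masked=$(sh -c 'set -e; false || true; echo reached')

echo "plain=[$plain] strict=[$strict] masked=[$masked]"
```

Only the 'strict' variant skips the later commands, so "|| true" changes the outcome only when the shell is configured to abort on errors.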

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/151168

Changed in fuel:
status: Confirmed → In Progress
Artem Panchenko (apanchenko-8) wrote :

@Stanislaw,

we don't check the return code of the 'killall ntpd;' command, so it doesn't affect execution of the next command. In system tests we check the exit code of 'ntpd -qg && $NTPD start' (the last command) and raise an exception with stderr (which also contains 'ntpd: no process killed\n') if exit_code != 0. In this issue 'ntpd -qg' returns 1, but for some reason prints the error "ntpd: no servers found" to stdout (not stderr), so the logs don't contain it.
Also, 'killall ntpd' was added only to cover the case where the init script can't stop the NTP service for some reason, and I think we can remove it (the init script seems to do its job pretty well).
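
The logging gap described above can be sketched in shell. This is only an illustration: 'fake_sync' is a hypothetical stand-in that mimics the reported behavior (error text on stdout, unrelated text on stderr, non-zero exit code), not the real ntpd, and the system tests themselves may capture output differently.

```shell
# Hypothetical stand-in for a command that reports its error on stdout,
# writes unrelated text to stderr, and exits non-zero (not the real ntpd).
fake_sync() {
    echo "ntpd: no servers found"        # the real cause goes to stdout
    echo "ntpd: no process killed" >&2   # unrelated text goes to stderr
    return 1
}

err_file=$(mktemp)
code=0
# Capture stdout in a variable, stderr in a temporary file, and the exit code.
out=$(fake_sync 2>"$err_file") || code=$?
err=$(cat "$err_file")
rm -f "$err_file"

# Report all three, so the cause is visible even when it is on stdout.
if [ "$code" -ne 0 ]; then
    echo "exit_code: $code"
    echo "stdout: $out"
    echo "stderr: $err"
fi
```

Reporting only stderr here would show just "ntpd: no process killed", which matches the misleading trace in the bug description.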

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/151238

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (stable/5.1)

Related fix proposed to branch: stable/5.1
Review: https://review.openstack.org/151241

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-main (stable/5.1)

Change abandoned by Stanislaw Bogatkin (<email address hidden>) on branch: stable/5.1
Review: https://review.openstack.org/151168

Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Artem Panchenko (apanchenko-8)
Artem Panchenko (apanchenko-8) wrote :

After a discussion with Stanislaw, we decided to try using ntpdate instead of ntpd to sync time on the master node during system tests, because it can correct large time skews (useful after reverting a VM from a snapshot). I prepared patches to the tests for both the master and stable/5.1 branches and am currently testing the fix. I'll update the issue with the results a little later.

Artem Panchenko (apanchenko-8) wrote :

I just finished the tests and can confirm that using ntpdate fixes the issue.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/151558

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (stable/5.1)

Reviewed: https://review.openstack.org/151241
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=2d15e4e6da8970d1c61eebbfdfbd5f49d14b23ac
Submitter: Jenkins
Branch: stable/5.1

commit 2d15e4e6da8970d1c61eebbfdfbd5f49d14b23ac
Author: Artem Panchenko <email address hidden>
Date: Thu Jan 29 15:37:09 2015 +0200

    Improve time sync and its logging in system tests

    1. Don't kill ntp process, let daemon init script
    do its work.
    2. Run time synchronization in debug mode ('-d').
    3. Sync time on master node using ntpdate instead of
    ntpd (remove debug key '-d', add '-v' - be verbose)
    4. Print 'stdout' and 'exit_code' in addition to
    'stderr' if remote command returned non-zero code.

    Closes-bug: #1415596

    Change-Id: I8c7a8dbc8f71591f7fedfc7b280014cc4887bf66

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (stable/6.0)

Reviewed: https://review.openstack.org/151558
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=c799e3a6d88289e58db764a6be7910aab7da3149
Submitter: Jenkins
Branch: stable/6.0

commit c799e3a6d88289e58db764a6be7910aab7da3149
Author: Artem Panchenko <email address hidden>
Date: Thu Jan 29 15:37:09 2015 +0200

    Improve time sync and its logging in system tests

    1. Don't kill ntp process, let daemon init script
    do its work.
    2. Run time synchronization in debug mode ('-d').
    3. Sync time on master node using ntpdate instead of
    ntpd (remove debug key '-d', add '-v' - be verbose)
    4. Print 'stdout' and 'exit_code' in addition to
    'stderr' if remote command returned non-zero code.

    Closes-bug: #1415596

    Change-Id: I8c7a8dbc8f71591f7fedfc7b280014cc4887bf66

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/151238
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=da4264e53f26b839036f2ef2035b3886ac8e8ab7
Submitter: Jenkins
Branch: master

commit da4264e53f26b839036f2ef2035b3886ac8e8ab7
Author: Artem Panchenko <email address hidden>
Date: Thu Jan 29 15:37:09 2015 +0200

    Improve time sync and its logging in system tests

    1. Don't kill ntp process, let daemon init script
    do its work.
    2. Run time synchronization in debug mode ('-d').
    3. Sync time on master node using ntpdate instead of
    ntpd (remove debug key '-d', add '-v' - be verbose)
    4. Print 'stdout' and 'exit_code' in addition to
    'stderr' if remote command returned non-zero code.

    Closes-bug: #1415596

    Change-Id: I8c7a8dbc8f71591f7fedfc7b280014cc4887bf66

Bogdan Dobrelya (bogdando) wrote :