chrony has migration regressions from autopkgtests (disco/eoan)

Bug #1836929 reported by Rafael David Tinoco on 2019-07-17
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
chrony (Ubuntu)
Status tracked in Eoan
Xenial
Medium
Unassigned
Bionic
Medium
Unassigned
Cosmic
Medium
Unassigned
Disco
Medium
Unassigned
Eoan
Medium
Unassigned

Bug Description

[Impact]

 * Recent changes have caused the chrony autopkgtests to fail.
   In this case upstream of chrony and the clk tests changed, which need
   to be back under control to match what works reliably for Disco.

 * Later versions have this fixed, backport the changes to fix it in Disco
   as well

[Test Case]

 * Let the autopkgtests run (which is part of the proposed migration
   anyway)
   Sniffs already show them completing:
   https://bileto.ubuntu.com/excuses/3759/disco.html

[Regression Potential]

 * There is no functional change, only the self-tests as well the
   autopkgtest execution are changed.
   The one source for a regression could be that the rebuild picks
   something up that triggers a behavior change. But the PPA builds have
   not shown something (at least not something obvious)

[Other Info]

 * This is one of the cases where the actual package as used by the user
   has no bug. I'm unsure how to proceed. Do we want to push it just to
   disco-proposed but keep it there (to avoid downloads for "nothing")?
   I know rbasak wanted to discuss that in the SRU team for the even worse
   https://bugs.launchpad.net/cloud-archive/+bug/1829823/comments/14
   To some extend this come under the same banner.
 * If this is denied from SRU for this reason I'd ask to add a force-
   badtest as a replacement to unblock proposed migration.

---

Checking last eoan merge I realized that some tests were failing for chrony:

https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+merge/369588/comments/967625

But eoan ran autopkgtests okay when the trigger was chrony/3.5-2ubuntu2 (this last merge):

http://autopkgtest.ubuntu.com/packages/chrony/eoan/amd64

Despite having failed for the previous 12 times (eoan).

Now, for the first time, we have the same failure for disco:

http://autopkgtest.ubuntu.com/packages/chrony/disco/amd64

http://bit.ly/2LpMx4G

"""
make: Leaving directory '/tmp/autopkgtest.pBHSAl/build.WCD/src/test/simulation/clknetsim'
...
110-chronyc .................... PASS
111-knownclient xxxxxxxxxxxxxxxxxxxx FAIL
112-port xxxxxxxxxxxxxxxxxxxx FAIL
121-orphan .................... PASS
...
SUMMARY:
  TOTAL 50
  PASSED 48
  FAILED 2 (111-knownclient 112-port) (255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 265)
"""

And I'm able to reproduce locally:

"""
(c)inaddy@iproute2verification:~/work/sources/ubuntu/chrony/test/simulation$ ./111-knownclient
Testing reply to client configured as server:
  network with 1*1 servers and 1 clients:
    non-default settings:
      client_conf=acquisitionport 123
      server_conf=server 192.168.123.2 noselect acquisitionport 123
    starting node 1: OK
    starting node 2: OK
    running simulation: OK
    checking chronyd exit:
      node 1: OK
      node 2: OK
    checking source selection:
      node 2: OK
    checking port numbers in packet log:
      node 1: BAD
      node 2: BAD
FAIL

AND

(c)inaddy@iproute2verification:~/work/sources/ubuntu/chrony/test/simulation$ ./112-port
Testing port and acquisitionport directives:
  network with 1*1 servers and 1 clients:
    non-default settings:
    starting node 1: OK
    starting node 2: OK
    running simulation: OK
    checking chronyd exit:
      node 1: OK
      node 2: OK
    checking source selection:
      node 2: OK
    checking mean/min incoming/outgoing packet interval:
      node 1: 2.74e+02 2.74e+02 6.40e+01 6.40e+01 OK
      node 2: 2.74e+02 2.74e+02 6.40e+01 6.40e+01 OK
    checking clock sync time, max/rms time/freq error:
      node 2: 132 9.29e-05 1.21e-06 5.77e-05 1.01e-07 OK
    checking port numbers in packet log:
      node 1: BAD
      node 2: BAD
  network with 1*1 servers and 1 clients:
    non-default settings:
      client_conf=acquisitionport 123
    starting node 1: OK
    starting node 2: OK
    running simulation: OK
    checking chronyd exit:
      node 1: OK
      node 2: OK
    checking port numbers in packet log:
      node 1: BAD
      node 2: BAD
FAIL
"""

When doing verification for an iproute2 bug (LP: #1831775) we faced the 1st failure in autopkgtests for chrony in disco (at least from the ones I can check from autopkgtests.ubuntu.com).

Related branches

description: updated
Changed in chrony (Ubuntu):
status: New → Confirmed
Changed in chrony (Ubuntu Disco):
status: New → Confirmed
importance: Undecided → Medium
Changed in chrony (Ubuntu Eoan):
importance: Undecided → Medium
Changed in chrony (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in chrony (Ubuntu Eoan):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Paul Gevers (paul-climbing) wrote :

Already fixed in Debian.

Issue happens because of the "upstream" "git cloned" clknetsim test dependencies.

The tests clone upstream clknetsim repo. Using an old version of clknetsim I was able to make the test 111 to pass:

(c)inaddy@iproute2verification:~/work/sources/ubuntu/chrony/test/simulation$ ./111-knownclient
Testing reply to client configured as server:
  network with 1*1 servers and 1 clients:
    non-default settings:
      client_conf=acquisitionport 123
      server_conf=server 192.168.123.2 noselect acquisitionport 123
    starting node 1: OK
    starting node 2: OK
    running simulation: OK
    checking chronyd exit:
      node 1: OK
      node 2: OK
    checking source selection:
      node 2: OK
    checking port numbers in packet log:
      node 1: OK
      node 2: OK
    checking clock sync time, max/rms time/freq error:
      node 2: 130 1.47e-04 6.19e-07 7.41e-05 8.68e-08 OK
PASS

Will bisect cause its faster and easier.

(c)inaddy@iproute2verification:~/work/sources/ubuntu/chrony/test/simulation/clknetsim$ git bisect bad
537cb7161de9ad372437ab0fcbff76e54a3cd97a is the first bad commit
commit 537cb7161de9ad372437ab0fcbff76e54a3cd97a
Author: Miroslav Lichvar <email address hidden>
Date: Tue Jun 11 17:05:22 2019 +0200

    add message type

:100644 100644 2898b26a7b1311647b4419da79000e69b6d89b39 5b6d8917e1dd4667fe124b22a907e42286cb86f4 M client.c
:100644 100644 20f052d7583d6051e779fd7318f47a3867595993 5cab1f4e86f95719814657da69a93089d029025e M client_fuzz.c
:100644 100644 7b2fcbe0c5d39193946afebbd778a506c3452cb9 eeff3aba6b4bc46dc0ef516257a8ff130549a827 M network.cc
:100644 100644 427036573b6da7227c5d7fa169a3d81627330d9c b5fe35b408f2a635455548d2b7dfd0ec7ec6da90 M network.h
:100644 100644 be9d6fa5177c49acd3906643fc010576a5d5369d cf94b9e42bdb6a1b33b84afe8b5e19a9a5d00806 M node.cc
:100644 100644 36e4e328ce10b062166c101a1d4c74244af558ab da78179483f88c84314c85af8dbfd9843332a33d M protocol.h

This happens because output of clknetsim was modified and the following regex:

  # Check if only NTP port (123) was used
  check_packet_port() {
      local i ret=0 port=123

      test_message 2 1 "checking port numbers in packet log:"

+ set -x
+
+ cp tmp/log.packets /tmp/mine.log.packets
+
      for i in $(seq 1 $(get_chronyd_nodes)); do
          test_message 3 0 "node $i:"

          grep -E -q " $port [0-9]+\$" tmp/log.packets && \
              ! grep -E "^[0-9e.+-]+ $i " tmp/log.packets | \
              grep -E -q -v " $port [0-9]+\$" && \
              test_ok || test_bad
          [ $? -eq 0 ] || ret=1
      done

+ set +x
+
      return $ret
  }

Broke. Example of the output change:

# good (old) output (vi :set list)

2.020970e-01>---1>--2>--1.854754e-04>---123>123>1$
2.022825e-01>---2>--1>--2.935628e-04>---123>123>1$
2.027594e-01>---2>--1>--1.794174e-04>---123>123>1$
2.029388e-01>---1>--2>--2.045724e-04>---123>123>1$

# bad (new) output (vi :set list)

2.066646e+03>---2>--1>--1.297676e-04>---123>123>1>--48>-1$
2.066646e+03>---1>--2>--1.520290e-04>---123>123>1>--48>-1$
2.191654e+03>---1>--2>--1.690480e-04>---123>123>1>--48>-1$
2.191654e+03>---2>--1>--4.082407e-04>---123>123>1>--48>-1$

TODO:

- make a change to eoan so clknetsim code is compatible (so this wont happen again)
- mark this test as faulty for previous chronie versions (since they will always fail)
- (could do) fix autopkgtest next time chronie needs a SRU (disco/bionic)

Changed in chrony (Ubuntu Disco):
status: Confirmed → Won't Fix
Changed in chrony (Ubuntu Eoan):
status: Confirmed → In Progress

This has been fixed by Debian upstream in file:

- debian/patches/update_processing_of_packet_log.patch:

""""
From 62d6aed6a64b887c9e3b7f03d9e0db1deaa2696a Mon Sep 17 00:00:00 2001
From: Miroslav Lichvar <email address hidden>
Date: Tue, 18 Jun 2019 15:41:50 +0200
Subject: test: update processing of packet log

Two new fields have been added to the packet log, which broke some
of the simulation tests.

Applied-Upstream: https://git.tuxfamily.org/chrony/chrony.git/commit/?id=62d6aed6a64b887c9e3b7f03d9e0db1deaa2696a
--- a/test/simulation/test.common
+++ b/test/simulation/test.common
@@ -391,9 +391,9 @@ check_packet_port() {
    for i in $(seq 1 $(get_chronyd_nodes)); do
        test_message 3 0 "node $i:"

- grep -E -q " $port [0-9]+\$" tmp/log.packets && \
+ grep -E -q "^([0-9e.+-]+ ){5}$port " tmp/log.packets && \
            ! grep -E "^[0-9e.+-]+ $i " tmp/log.packets | \
- grep -E -q -v " $port [0-9]+\$" && \
+ grep -E -q -v "^([0-9e.+-]+ ){5}$port " && \
            test_ok || test_bad
        [ $? -eq 0 ] || ret=1
    done
""""

in version:

chrony (3.5-2) unstable; urgency=medium

Changed in chrony (Ubuntu Eoan):
status: In Progress → Fix Released

Okay, so.. for eoan, 3.5-2ubuntu1, version that should have been fixed, selftests failed because of another reason:

http://autopkgtest.ubuntu.com/packages/chrony/eoan/amd64

They failed because of systemd bug, being tracked in:

https://bugs.launchpad.net/ubuntu/+bug/1832050

and last test (last fix from christian) has passed.

For Disco and before, they will all fail because of the reason described in this bug.

I have checked all other versions and they have passed latest autopkgtest because the change, to clknetsim that caused the issue was made in:

Date: Tue Jun 11 17:05:22 2019 +0200

And debian fix was made in:

Date: Tue, 18 Jun 2019 15:41:50 +0200

But the tests never ran after that. I'm requesting all chrony tests:

Xenial/Bionic/Cosmic/Disco

to be marked as known issues in ubuntu autopkgtests.

Changed in chrony (Ubuntu Cosmic):
status: New → Won't Fix
Changed in chrony (Ubuntu Bionic):
status: New → Won't Fix
Changed in chrony (Ubuntu Xenial):
status: New → Won't Fix
Changed in chrony (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in chrony (Ubuntu Bionic):
importance: Undecided → Medium
Changed in chrony (Ubuntu Xenial):
importance: Undecided → Medium
Changed in chrony (Ubuntu Disco):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody

I've done the following merge request:

https://code.launchpad.net/~rafaeldtinoco/britney/hints-ubuntu/+merge/370268

requesting tests before eoan to be classified as "bad tests".

I have reviewed and Nacked the MP for the force-badtest.
Lets continue here, summary:

Summarizing:
- actually two issues seem to break chrony tests
  1. some yet unknown change in a package added stderr which needs [1] for bug 1836882
     The new thing is that this now also affects Disco
  2. (this bug) due to upstream clksim changes things don't properly match anymore.
     We need [2] and maybe [3]
- We will not need to add a badtest, we just fix the tests and iproute will then pass testing
  against the new version.

I was able to reproduce that locally and will create a branch/ppa/test for it that you later on (if I can make it in time) can take a look at.

[1]: https://git.launchpad.net/ubuntu/+source/chrony/commit/?id=ebdec551cbb6a0986bfd489e4ca90ef103e37a07
[2]: https://salsa.debian.org/debian/chrony/commit/bb79b3247c1938b40f7532a4aa445da16148f822
[3]: https://salsa.debian.org/debian/chrony/commit/bc402d6270fef5b6ed1b166abe198efeb78e0b16

Changed in chrony (Ubuntu Eoan):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in chrony (Ubuntu Disco):
status: Won't Fix → Triaged

Without fixes both tests fail:
- simulation due to clksim changes
- service checks due to (non critical) stderr output

Checking my fixes:
1. build time self tests not affected (to be expected but I wanted to be sure)
2. autpkgtest fixed
   111-knownclient .................... PASS
   112-port .................... PASS

Builds succeed in PPA [1] and I later have started a Bileto PPA [2] hopeing I might do the LP infra testing via ticket [3] check on this to be sure.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1836929-1836882-test-fails-disco
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3759
[3]: https://bileto.ubuntu.com/#/ticket/3759

Tests complete and (with the fix) all green now:
https://bileto.ubuntu.com/excuses/3759/disco.html

description: updated
description: updated

Alright, to consolidate all cases and make my life easier only:

https://bugs.launchpad.net/ubuntu/+source/chrony/+bug/1836929 (this bug)
TODO: marking as a duplicate of: LP: #1736882

https://bugs.launchpad.net/ubuntu/+source/chrony/+bug/1836882
MR: https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+merge/370292
Approved for Disco and Eoan:
    Without fixes both tests fail:
    - simulation due to clksim changes
    - service checks due to (non critical) stderr output
TODO: nothing

systemd issue (causing stderr in chronys autopkgtest):
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1832050
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1832053 (duplicate)
TODO: xnox has still to fix systemd

My disapproved MR blacklisting the tests only:
https://code.launchpad.net/~rafaeldtinoco/britney/hints-ubuntu/+merge/370271
TODO: nothing

> TODO: marking as a duplicate of: LP: #1736882
TODO: marking as a duplicate of: LP: #1836882

After discussion decided that it is not a dup, they just share the symptom to hit at autopkgtest time.

MP acked, Uploaded to disco-unapproved

Summary Fix:

Keeping 2 different bugs:

https://bugs.launchpad.net/ubuntu/+source/chrony/+bug/1836929
-> chrony autopkgtest clknetsim related bug
-> addressed in cpaelzer's merge

https://bugs.launchpad.net/ubuntu/+source/chrony/+bug/1836882
-> chrony autopkgtest stderr related bug (stderr coming from systemd)
-> addressed in cpaelzer's merge

MR: https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+merge/370292
Approved for Disco and Eoan:
    Without fixes both tests fail:
    - simulation due to clksim changes
    - service checks due to (non critical) stderr output

The systemd issues are still duplicated an to be worked on.

Changed in chrony (Ubuntu Disco):
status: Triaged → In Progress
Changed in chrony (Ubuntu Disco):
status: In Progress → Fix Released
Timo Aaltonen (tjaalton) wrote :

don't mark bugs fixed when the package has only been uploaded to the SRU queue

Changed in chrony (Ubuntu Disco):
status: Fix Released → In Progress
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco

Hello Rafael, or anyone else affected,

Accepted chrony into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/chrony/3.4-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Thanks - the Dep8 Tests are all green in [1] and good to go again.

We could now either release it as-is or just keep it in proposed (up to the SRU Team as I know these conditions are recently under regular discussion).

[1]: https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers