hwclock test failed on Power9 due to 0.x sec differences / time out waiting for time change

Bug #1802233 reported by Po-Hsu Lin on 2018-11-08
16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Medium
bugproxy
ubuntu-kernel-tests
Undecided
Unassigned
linux (Ubuntu)
Undecided
Unassigned

Bug Description

The hwclock test will fail on the Power9 system "baltar" due to a <0.02 second difference.

Traceback (most recent call last):
  File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
    self.run_once(*args, **dargs)
  File "/home/ubuntu/autotest/client/tests/hwclock/hwclock.py", line 50, in run_once
    raise error.TestFail("Failed to set hwclock back to Warthog's birthday. Output of hwclock is '%s'" % date)
TestFail: Failed to set hwclock back to Warthog's birthday. Output of hwclock is '2004-10-20 04:09:59.582146+0000'

Didn't see this on Power8 boxes

Sometimes it will fail because "time out waiting for time change" on the same node:
$ sudo /sbin/hwclock --set --date "2004/10/20 04:10:00"; sudo /sbin/hwclock
hwclock: Timed out waiting for time change.

Workaround for these is to add "sleep 1".

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-38-generic 4.15.0-38.41
ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
Uname: Linux 4.15.0-38-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 8 03:45 seq
 crw-rw---- 1 root audio 116, 33 Nov 8 03:45 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.4
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDmesg:

Date: Thu Nov 8 06:03:54 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0451:80ff Texas Instruments, Inc.
 Bus 001 Device 004: ID 0557:2419 ATEN International Co., Ltd
 Bus 001 Device 002: ID 0557:7000 ATEN International Co., Ltd Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=acd1a0d7-f6fc-4130-928c-c8b11ad6e4be ro console=hvc0
ProcLoadAvg: 0.31 0.11 0.03 1/1392 5654
ProcLocks:
 1: POSIX ADVISORY WRITE 3544 00:17:582 0 EOF
 2: POSIX ADVISORY WRITE 1798 00:17:338 0 EOF
 3: POSIX ADVISORY WRITE 3603 00:17:576 0 EOF
 4: FLOCK ADVISORY WRITE 3535 00:17:564 0 EOF
 5: FLOCK ADVISORY WRITE 4081 00:17:481 0 EOF
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -2
ProcVersion: Linux version 4.15.0-38-generic (buildd@bos02-ppc64el-018) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #41-Ubuntu SMP Wed Oct 10 10:57:45 UTC 2018
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-38-generic N/A
 linux-backports-modules-4.15.0-38-generic N/A
 linux-firmware 1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
VarLogDump_list: total 0
cpu_cores: Number of cores present = 40
cpu_coreson: Number of cores online = 40
cpu_dscr: DSCR is 16
cpu_freq:
 min: 2.862 GHz (cpu 159)
 max: 2.862 GHz (cpu 1)
 avg: 2.862 GHz
cpu_runmode:
 Could not retrieve current diagnostics mode,
 No kernel interface to firmware
cpu_smt: SMT=4

-- Firmware information on Baltar --
pnor: P9DSU20190404_IBM_prod_sign.pnor
bmc: SMT_P9_206.bin

Po-Hsu Lin (cypressyew) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1802233

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Spotted on 4.15 Xenial, one of two power8 boxes failed with:
TestFail: Failed to set hwclock back to Warthog's birthday. Output of hwclock is 'Wed Oct 20 04:09:59 2004 .840175 seconds'

Po-Hsu Lin (cypressyew) on 2018-11-08
summary: - hwclock test failed on Power9 due to 0.02 sec differences
+ hwclock test failed on PowerPC due to 0.02 sec differences
Po-Hsu Lin (cypressyew) on 2018-11-28
summary: - hwclock test failed on PowerPC due to 0.02 sec differences
+ hwclock test failed on PowerPC due to 0.x sec differences
Po-Hsu Lin (cypressyew) on 2019-05-10
summary: - hwclock test failed on PowerPC due to 0.x sec differences
+ hwclock test failed on Power9 due to 0.x sec differences
summary: - hwclock test failed on Power9 due to 0.x sec differences
+ hwclock test failed on Power9 due to 0.x sec differences / time out
+ waiting for time change
Po-Hsu Lin (cypressyew) on 2019-05-10
description: updated
Po-Hsu Lin (cypressyew) wrote :

This issue could be found on node "dradis" as well.

Andrew Cloke (andrew-cloke) wrote :

Next step is for the kernel team to assess whether this is a test case issue, or a real kernel bug.

Changed in ubuntu-power-systems:
importance: Undecided → Medium
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
status: New → Triaged
Manoj Iyer (manjo) wrote :

If you use it in verbose mode it works. It could be that it is slow to read from the rtc and the verbose mode just enough time to sync?

ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2004/10/25 04:10:00"; sudo /sbin/hwclock --verbose
hwclock from util-linux 2.33.1
System Time: 1557759815.855012
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Last drift adjustment done at 1098677400 seconds after 1969
Last calibration done at 1098677400 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
ioctl(4, RTC_UIE_ON, 0): Invalid argument
Waiting in loop for time from /dev/rtc0 to change
...got clock tick
Time read from Hardware Clock: 2004/10/25 04:10:01
Hw clock time : 2004/10/25 04:10:01 = 1098677401 seconds since 1969
Time since last adjustment is 1 second
Calculated Hardware Clock drift is 0.000000 seconds
2004-10-25 04:09:59.806484+00:00
ubuntu@tiselius:~$

Manoj Iyer (manjo) wrote :

So looks like it is slow to sync with RTC.

ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2004/11/21 04:10:00"; sudo strace -e ioctl "/sbin/hwclock"
ioctl(4, RTC_UIE_ON) = -1 EINVAL (Invalid argument)
ioctl(4, RTC_RD_TIME, {tm_sec=0, tm_min=10, tm_hour=4, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0
ioctl(4, RTC_RD_TIME, {tm_sec=0, tm_min=10, tm_hour=4, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0

when the hwclock command fails, the ioctl that fails is RTC_RD_TIME.. but running it with --verbose probably gives it just enough time to complete the sync.

Or you could use the command as follows:
ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock --delay=0.5
2004-11-22 04:09:59.898618+00:00
ubuntu@tiselius:~$

Po-Hsu Lin (cypressyew) wrote :

Tested with the latest mainline kernel on Baltar:
ubuntu@baltar:~$ uname -a
Linux baltar 5.1.1-050101-generic #201905110631 SMP Sat May 11 07:16:36 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

This issue still exist.

Manoj Iyer (manjo) wrote :

On the Witherspoon the hwclock does not seem to set at all.

ubuntu@bobone:~$ uname -a
Linux bobone 4.18.0-18-generic #19~18.04.1-Ubuntu SMP Fri Apr 5 10:21:11 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

ubuntu@bobone:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock
2019-05-15 14:36:33.802253+0000

ubuntu@bobone:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock --debug
hwclock from util-linux 2.31.1
System Time: 1557931207.255519
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Last drift adjustment done at 1101096600 seconds after 1969
Last calibration done at 1101096600 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
ioctl(4, RTC_UIE_ON, 0): Invalid argument
Waiting in loop for time from /dev/rtc0 to change
...got clock tick
Time read from Hardware Clock: 2019/05/15 14:40:08
Hw clock time : 2019/05/15 14:40:08 = 1557931208 seconds since 1969
Time since last adjustment is 456834608 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2019-05-15 14:40:07.171582+0000

description: updated
Manoj Iyer (manjo) wrote :

-- Witherspoon firmware --
Server: IBM-witherspoon-OP9-v2.2-9.56
BMC: ibm-v2.3-476-g2d622cb-r30-0-g862e40a

bugproxy (bugproxy) on 2019-05-17
tags: added: architecture-ppc64le bugnameltc-177684 severity-high targetmilestone-inin---

------- Comment From <email address hidden> 2019-05-17 15:22 EDT-------
We had bugs reported in past also about hwclock not working on P9.
LTC 155321 is one of such bugs. Its likely openBMC issue here.
As per latest comment in bug 155321

========

The issue if already fixed by https://github.com/openbmc/phosphor-time-manager/commit/25b4ce95f9d0bea33fbf0364e72f304542a19bc5 and it's picked in OpenBMC at https://github.com/openbmc/openbmc/commit/7d35280296a118ec2b2e8b8711b6cc65a77f00cb

But the fix changes the "default" setting, if the settings already exists in BMC, it still takes effect.

So suggest to re-test the issue with a fresh OpenBMC build. It should already be fixed.

Manoj Iyer (manjo) wrote :

Would a similar BMC firmware release from AMI fix this issue for Bostons systems? and is there an ETA for when that will be available to us.

Changed in ubuntu-power-systems:
status: Triaged → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-21 11:58 EDT-------
(In reply to comment #24)
> Would a similar BMC firmware release from AMI fix this issue for Bostons
> systems? and is there an ETA for when that will be available to us.

Hi Stewart,

Please let know if a BMC firmware release exists that fixes this issue on
Boston ? This is similar issue reported in LTC bug 155321 for Witherspoon.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-21 21:46 EDT-------
(In reply to comment #25)
> (In reply to comment #24)
> > Would a similar BMC firmware release from AMI fix this issue for Bostons
> > systems? and is there an ETA for when that will be available to us.
>
>
> Hi Stewart,
>
> Please let know if a BMC firmware release exists that fixes this issue on
> Boston ? This is similar issue reported in LTC bug 155321 for Witherspoon.

You'd have to ask Supermicro

Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → bugproxy (bugproxy)
Andrew Cloke (andrew-cloke) wrote :

Changing owner to "bugproxy" as the next steps are with IBM on the Witherspoon and Boston firmware.

Manoj Iyer (manjo) wrote :

Michael,

Is there any information from supermicro that you can share with us over here? If this is not going to be fixed then we might have to modify the testcase to use a workaround. Please let us know.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-04 16:47 EDT-------
is there issue associate with this problem?
How does one recreate this problem?

Manoj Iyer (manjo) wrote :

@youhour are you able to see the comments in the launchpad bug?

If not you could reproduce the issue using a command like:

$ sudo /sbin/hwclock --set --date "2004/10/25 04:10:00"; sudo /sbin/hwclock

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-06 13:19 EDT-------
bugproxy, I did see the command. However, it was not clear if that what trigger the failure. The details of the problem isn't clear.

Any way, you have not answer my first question yet... is there github issue associate with this problem?

Here are some more question...
is NTP enable on your system? If NTP is enabled, BMC will not allow end-user to change SEL time.

is this a problem report from the field or this is just a bug found in the lab?

------- Comment From <email address hidden> 2019-06-06 13:27 EDT-------
Oh, another question. What problem would a customer see if this problem is not fixed?

Manoj Iyer (manjo) wrote :

Sorry did not realize that all the comments were not correctly mirrored over. A launchpad bug was reported by the Ubuntu kernel team as a result of their testing in the lab. We were not sure if this was a BMC issue, a hardware issue or kernel bug, so there is no github report as of now. If you think this is related to the BMC I can open a github bug. Also please tell us how to open bugs for BMC on Bostons.

== P9 Boston ==
There seems to be a delay between setting the hwclock and reading from it.
ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2004/11/21 04:10:00"; sudo strace -e ioctl "/sbin/hwclock"
ioctl(4, RTC_UIE_ON) = -1 EINVAL (Invalid argument)
ioctl(4, RTC_RD_TIME, {tm_sec=0, tm_min=10, tm_hour=4, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0
ioctl(4, RTC_RD_TIME, {tm_sec=0, tm_min=10, tm_hour=4, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0

when the hwclock command fails, the ioctl that fails is RTC_RD_TIME.. but running it with --verbose probably gives it just enough time to complete the sync.

Or you could use the command as follows:
ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock --delay=0.5
2004-11-22 04:09:59.898618+00:00
ubuntu@tiselius:~$

== P9 Witherspoon ==
On the Witherspoon the hwclock does not seem to set at all.

ubuntu@bobone:~$ uname -a
Linux bobone 4.18.0-18-generic #19~18.04.1-Ubuntu SMP Fri Apr 5 10:21:11 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

ubuntu@bobone:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock
2019-05-15 14:36:33.802253+0000

ubuntu@bobone:~$ sudo /sbin/hwclock --set --date "2004/11/22 04:10:00"; sudo /sbin/hwclock --debug
hwclock from util-linux 2.31.1
System Time: 1557931207.255519
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Last drift adjustment done at 1101096600 seconds after 1969
Last calibration done at 1101096600 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
ioctl(4, RTC_UIE_ON, 0): Invalid argument
Waiting in loop for time from /dev/rtc0 to change
...got clock tick
Time read from Hardware Clock: 2019/05/15 14:40:08
Hw clock time : 2019/05/15 14:40:08 = 1557931208 seconds since 1969
Time since last adjustment is 456834608 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2019-05-15 14:40:07.171582+0000

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-07 14:04 EDT-------
I don't know if this is a BMC issue either. With your latest note, I still do not understand the problem on Boston. Whether this is a HWCLOCK setting issue or HWCLOCK setting is working but have a long delay... Can you help clarify that?

Here is how you create a problem tracking for Boston.

1) You will need github ID
2) If you do not have access, send your github ID to Maurice Zipse to request for access. If you do not have github ID, get one.
3) Link to github issue: https://github.com/open-power/boston-openpower/issues/
4) Click on "New issue" top right to create a new issue for P9 Boston. Provide details of the problem so that we can understand your issue.

Let me know if you have any other question.

Mike Ranweiler (mranweil) wrote :

Opened issue 1423 on this - unfortunately the boston openpower is closed.

Mike Ranweiler (mranweil) wrote :

I tried some test fw for the boston:

ubuntu@ltc-boston25:~$ sudo /sbin/hwclock --set --date "2019/06/10 12:55:00"; sudo /sbin/hwclock --verbose
[sudo] password for ubuntu:
hwclock from util-linux 2.33.1
System Time: 1560377530.955292
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Last drift adjustment done at 1560189300 seconds after 1969
Last calibration done at 1560189300 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
ioctl(4, RTC_UIE_ON, 0): Invalid argument
Waiting in loop for time from /dev/rtc0 to change
...got clock tick
Time read from Hardware Clock: 2019/06/10 17:55:01
Hw clock time : 2019/06/10 17:55:01 = 1560189301 seconds since 1969
Time since last adjustment is 1 second
Calculated Hardware Clock drift is 0.000000 seconds
2019-06-10 12:55:00.022780-05:00
ubuntu@ltc-boston25:~$

But I still have:
ubuntu@ltc-boston25:~$ sudo /sbin/hwclock --set --date "2004/11/21 04:10:00"; sudo strace -e ioctl "/sbin/hwclock"
ioctl(4, RTC_UIE_ON) = -1 EINVAL (Invalid argument)
ioctl(4, RTC_RD_TIME, {tm_sec=0, tm_min=10, tm_hour=10, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0
<snip>
ioctl(4, RTC_RD_TIME, {tm_sec=1, tm_min=10, tm_hour=10, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0
ioctl(4, RTC_RD_TIME, {tm_sec=1, tm_min=10, tm_hour=10, tm_mday=21, tm_mon=10, tm_year=104, ...}) = 0
2004-11-21 04:10:00.018862-06:00
+++ exited with 0 +++
ubuntu@ltc-boston25:~$

But it generally seems to work - if I don't include any args it does set correctly still:
ubuntu@ltc-boston25:~$ sudo /sbin/hwclock --set --date "2019/06/10 12:55:00"; sudo /sbin/hwclock
2019-06-10 12:54:59.982163-05:00
ubuntu@ltc-boston25:~$

Anything else you suggest?

Manoj Iyer (manjo) wrote :

I upgraded the firmware on tiselius (boston p9) and ran the testcase and it works as expected.

ubuntu@tiselius:~$ sudo /sbin/hwclock --set --date "2019/06/10 12:55:00"; sudo /sbin/hwclock
2019-06-10 12:54:59.969984+00:00
ubuntu@tiselius:~$

should I wait for the release version of the firmware before I upgrade the other boston systems?

Manoj Iyer (manjo) on 2019-06-18
Changed in ubuntu-power-systems:
status: Incomplete → Fix Committed
Manoj Iyer (manjo) wrote :

IBM indicated that they do not have an ETA for a GA release of the BMC firmware with this fix. Would the kernel team be ok with working around this issue in the testcase by adding a .5 delay?

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Po-Hsu Lin (cypressyew) wrote :

From my point of view,
I think we should leave the test case as-is, it helped us to identify this firmware issue. And it might be able to catch another FW issues on other platforms in the future.

For this specific HW, we can make a note for the test result by using this bug report.
Thanks.

Manoj Iyer (manjo) wrote :

@cypressyew that sounds reasonable, the only word of caution here is we should not dismiss future bugs detected on this system as "expected to fail", because the problem is in the 3rd party BMC firmware *and not* in the Pnor FW or Power9 hardware.

Manoj Iyer (manjo) on 2019-06-28
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc
Po-Hsu Lin (cypressyew) on 2019-09-16
tags: added: hwclock
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers