linuxptp ts2phc master offset spikes on realtime systems
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Cole Walker |
Bug Description
Brief Description
-----------------
On a realtime system, configuring ts2phc to source time from GNSS causes the master offset value to spike intermittently, which makes the system time unstable.
Severity
--------
Major
Steps to Reproduce
------------------
Use a config like:
controller-0:~$ cat /etc/ptpinstanc
[global]
##
## Default Data Set
##
leapfile /usr/share/
logging_level 7
ts2phc.
ts2phc.pulsewidth 100000000
[enp138s0f0]
##
## Associated interface: data0
##
ts2phc.
[enp81s0f0]
##
## Associated interface: oam0
##
ts2phc.
Observe the master offset value in /var/log/user.log. It occasionally spikes by 1 second or more before stabilizing again. Each spike coincides with a high nmea_delay value.
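One way to spot the spikes without reading the log by hand is sketched below. It assumes that the ts2phc debug lines contain the text `master offset <value>` with the value in nanoseconds; that should be checked against the actual log before relying on it. The function names and the 1-second default threshold are illustrative choices, not part of linuxptp or StarlingX.

```python
import re

# Assumed line shape: "... master offset <signed integer, ns> ...".
# Verify this against your own /var/log/user.log before trusting the results.
OFFSET_RE = re.compile(r"master offset\s+(-?\d+)")


def find_spikes(lines, threshold_ns=1_000_000_000):
    """Return (line_number, offset_ns) for offsets whose magnitude meets the threshold."""
    spikes = []
    for number, line in enumerate(lines, start=1):
        match = OFFSET_RE.search(line)
        if match is None:
            continue  # not a master offset line (e.g. an nmea delay line)
        offset_ns = int(match.group(1))
        if abs(offset_ns) >= threshold_ns:
            spikes.append((number, offset_ns))
    return spikes


def scan_file(path="/var/log/user.log"):
    """Convenience wrapper (hypothetical helper): scan a log file on disk."""
    with open(path, errors="replace") as log:
        return find_spikes(log)
```

Calling `scan_file()` on the affected host returns the line numbers of the spikes, which can then be compared with nearby nmea_delay entries.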
Expected Behavior
------------------
Master offset value should hover close to 0 at all times.
Actual Behavior
----------------
Unstable master offset
Reproducibility
---------------
100% reproducible on a realtime system; the issue occurs multiple times per hour.
System Configuration
-------
AIO-SX
Branch/Pull Time/Commit
-------
stx master
Last Pass
---------
New scenario
Test Activity
-------------
Developer testing
Workaround
----------
Manually lower the nice value of the ice-gnss thread below the default of 0, which raises its priority.
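A minimal sketch of the manual workaround follows. It assumes the ice-gnss thread appears as its own entry under /proc with a `comm` name beginning with `ice-gnss`; the exact thread name may differ between driver versions. The helper names are hypothetical, the value -10 is the one chosen in the fix for this bug, and lowering a nice value requires root (or CAP_SYS_NICE).

```python
import os


def find_threads(prefix="ice-gnss", proc_root="/proc"):
    """Return the PIDs whose /proc/<pid>/comm starts with the given prefix."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as comm:
                name = comm.read().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if name.startswith(prefix):
            pids.append(int(entry))
    return sorted(pids)


def renice(pids, niceness=-10):
    """Set the nice value of each task; the scheduling policy is left unchanged."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

Running `renice(find_threads())` as root applies the change until the next reboot or driver reload; `os.getpriority(os.PRIO_PROCESS, pid)` can be used to confirm the new value.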
tags: | added: stx.7.0 stx.networking |
Changed in starlingx: | |
assignee: | nobody → Cole Walker (cwalops) |
Changed in starlingx: | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in starlingx: | |
status: | Triaged → In Progress |
Changed in starlingx: | |
status: | In Progress → Fix Released |
Reviewed: https://review.opendev.org/c/starlingx/utilities/+/839795
Committed: https://opendev.org/starlingx/utilities/commit/9183ef96db02faa9beeaa4021ad59e06ae6627ce
Submitter: "Zuul (22348)"
Branch: master
commit 9183ef96db02faa9beeaa4021ad59e06ae6627ce
Author: Cole Walker <email address hidden>
Date: Thu Apr 28 11:58:02 2022 -0400
[PTP SyncE] Set niceness -10 for ice-gnss threads
Problem: the master offset value in ts2phc intermittently spikes and
causes the system to be incorrectly adjusted.
This behaviour is seen when using the Intel Westport Channel NIC and ice
driver 1.7.16 on a realtime kernel.
Analysis of the issue shows that the ice-gnss thread responsible for
reading from the GNSS and writing to the tty for consumption by ts2phc
is sometimes getting delayed on realtime systems. Examination of
typical workloads on the platform cores and discussion between Intel
(the driver supplier) and the StarlingX community have led to an
agreement to increase the priority of this thread.
Most of the processes ordinarily running on the platform cores run at
the default niceness of 0, so -10 has been selected to elevate the
ice-gnss thread above those while leaving room on either side for other
process tuning. It is also worth noting that the ice-gnss thread is
being left as SCHED_OTHER, so processes assigned to SCHED_FIFO may still
preempt it.
Testing:
PASS: Applied change to AIO-SX with Westport Channel NIC, ice-gnss
thread is set to nice -10 after host lock/unlock.
PASS: Cumulative 24 hours of ts2phc logs show no recurrence of the fault
when thread niceness is set to -10. When the thread is nice 0, the fault
occurs multiple times per hour.
Closes-bug: 1970776
Signed-off-by: Cole Walker <email address hidden>
Change-Id: I1f45530f37ded11ab7406a5a1068f896a06c8843