Packet loss between master clock and slave with PTP configuration

Bug #1824218 reported by Anujeyan Manokeran
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Alexander Kozyrev
Milestone: none

Bug Description

Brief Description
-----------------
   While testing the PTP configuration using the InfluxDB sample data, it was observed that the time offset reported for every node was always 0 nanoseconds. Further investigation showed that the master clock running on controller-1 was unable to send PTP packets and was timing out. This was observed in several PTP configurations, including hardware and software mode with both L2 and UDP transport.
2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] timed out while polling for tx timestamp
2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] increasing tx_timestamp_timeout
2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] port 2: send peer delay request failed
2019-04-09T15:01:22.000 controller-1 ptp4l: err [40.661] uds: sendto failed: No such file or directory
2019-04-09T15:06:35.000 controller-1 ptp4l: info [354.353] subscriber 000000.0000.000000-1 timed out

The slave clocks also log warnings about communication with the master clock:
2019-04-09T15:05:44.000 controller-0 ptp4l: warning [64660.943] foreign master not using PTP timescale
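To triage logs like the ones above across several hosts, it can help to tally ptp4l messages per host and severity. This is a minimal sketch, assuming the syslog line layout shown in this report (timestamp, host, `ptp4l:`, level, bracketed uptime, message); `tally_ptp4l_messages` is a hypothetical helper, not part of linuxptp or StarlingX.

```python
import re
from collections import Counter

# Matches syslog lines in the layout shown in this report, e.g.
# "2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] timed out ..."
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+ptp4l:\s+"
    r"(?P<level>\w+)\s+\[[\d.]+\]\s+(?P<msg>.*)$"
)


def tally_ptp4l_messages(lines):
    """Count ptp4l messages per (host, level, message); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["host"], match["level"], match["msg"])] += 1
    return counts


if __name__ == "__main__":
    log = [
        "2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] timed out while polling for tx timestamp",
        "2019-04-09T13:40:27.000 controller-1 ptp4l: err [472.249] port 2: send peer delay request failed",
        "2019-04-09T15:05:44.000 controller-0 ptp4l: warning [64660.943] foreign master not using PTP timescale",
    ]
    for (host, level, msg), n in sorted(tally_ptp4l_messages(log).items()):
        print(f"{n:3d} {host:13s} {level:8s} {msg}")
```

Keying on the full message rather than only the level keeps distinct failures (tx timestamp timeouts versus peer delay request failures) separate in the summary.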
name: ptp_value
---------------
time host type type_instance value
1554821967317256000 compute-1 time_offset nsec 0
1554821927602734000 controller-0 time_offset nsec 0
1554821925682506000 compute-0 time_offset nsec 0
1554821907316409000 compute-1 time_offset nsec 0
1554821867601747000 controller-0 time_offset nsec 0
1554821865681648000 compute-0 time_offset nsec 0
1554821847315126000 compute-1 time_offset nsec 0
1554821147602777000 controller-0 time_offset nsec 0
1554821087602274000 controller-0 time_offset nsec 0
1554821027602698000 controller-0 time_offset nsec 0
1554820981347486000 compute-0 time_offset nsec 0
1554820967602316000 controller-0 time_offset nsec 0
1554820932358006000 compute-1 time_offset nsec 0
1554820921347174000 compute-0 time_offset nsec 0
1554820907602187000 controller-0 time_offset nsec 0
1554820872358392000 compute-1 time_offset nsec 0
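The symptom in the query output above is that every sample from every host is exactly 0 ns. A synchronized slave normally reports small non-zero offsets that change from sample to sample, so an unbroken run of zeros is a useful red flag. The sketch below groups the rows by host and reports hosts stuck at zero; it assumes the five-column text layout printed above (time, host, type, type_instance, value), and `offsets_by_host` and `hosts_stuck_at_zero` are hypothetical helpers.

```python
from collections import defaultdict


def offsets_by_host(rows):
    """Group time_offset values (nanoseconds) by host.

    Each data row has five whitespace-separated fields:
    time host type type_instance value. Headers and separators are skipped.
    """
    result = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) != 5 or fields[0] == "time":
            continue
        _time, host, kind, unit, value = fields
        if kind == "time_offset" and unit == "nsec":
            result[host].append(float(value))
    return dict(result)


def hosts_stuck_at_zero(rows):
    """Return the sorted hosts whose every reported offset is exactly 0 ns."""
    return sorted(
        host
        for host, values in offsets_by_host(rows).items()
        if values and all(v == 0 for v in values)
    )
```

Usage: feed it the lines printed by the `influx` query, e.g. `hosts_stuck_at_zero(output.splitlines())`; for the output above it flags compute-0, compute-1 and controller-0.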

sudo /usr/sbin/pmc -u -b 0 'GET TIME_STATUS_NP'
sending: GET TIME_STATUS_NP
        001e67.fffe.54aa38-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
                master_offset 0
                ingress_time 0
                cumulativeScaledRateOffset +0.000000000
                scaledLastGmPhaseChange 0
                gmTimeBaseIndicator 0
                lastGmPhaseChange 0x0000'0000000000000000.0000
                gmPresent false
                gmIdentity 001e67.fffe.54aa38
controller-0:~$ sudo /usr/sbin/pmc -u -b 0 'GET TIME_STATUS_NP'
sending: GET TIME_STATUS_NP
        001e67.fffe.54aa38-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
                master_offset 0
                ingress_time 0
                cumulativeScaledRateOffset +0.000000000
                scaledLastGmPhaseChange 0
                gmTimeBaseIndicator 0
                lastGmPhaseChange 0x0000'0000000000000000.0000
                gmPresent true
                gmIdentity 001e67.fffe.38c3e4
controller-0:~$ sudo /usr/sbin/pmc -u -b 0 'GET TIME_STATUS_NP'
sending: GET TIME_STATUS_NP
        001e67.fffe.54aa38-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
                master_offset 0
                ingress_time 0
                cumulativeScaledRateOffset +0.000000000
                scaledLastGmPhaseChange 0
                gmTimeBaseIndicator 0
                lastGmPhaseChange 0x0000'0000000000000000.0000
                gmPresent true
                gmIdentity 001e67.fffe.38c3e4
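In the first `pmc` response above, `gmPresent` is false and `gmIdentity` equals the node's own clock identity (001e67.fffe.54aa38, the port identity without its `-0` port number), i.e. the local ptp4l considers itself the grandmaster. The later responses name a remote grandmaster (001e67.fffe.38c3e4) while `master_offset` is still 0. A minimal sketch for checking this programmatically, assuming the `pmc` text layout shown above; both function names are hypothetical.

```python
def parse_time_status_np(text):
    """Parse pmc 'GET TIME_STATUS_NP' output into a dict of strings.

    The responding port identity is stored under 'port_identity'; every
    two-token line ("name value") becomes one entry.
    """
    status = {}
    for line in text.splitlines():
        tokens = line.split()
        if "RESPONSE" in tokens and "TIME_STATUS_NP" in tokens:
            status["port_identity"] = tokens[0]
        elif len(tokens) == 2:
            status[tokens[0]] = tokens[1]
    return status


def is_own_grandmaster(status):
    """True if gmPresent is false and gmIdentity equals the local clock
    identity (the port identity with its '-N' port number removed)."""
    clock_identity = status["port_identity"].rsplit("-", 1)[0]
    return (status.get("gmPresent") == "false"
            and status.get("gmIdentity") == clock_identity)


if __name__ == "__main__":
    sample = """\
sending: GET TIME_STATUS_NP
        001e67.fffe.54aa38-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
                master_offset 0
                gmPresent false
                gmIdentity 001e67.fffe.54aa38
"""
    status = parse_time_status_np(sample)
    print(status["gmIdentity"], is_own_grandmaster(status))
```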

Severity
--------
Major
Steps to Reproduce
------------------
1. Install any Regular system configuration.
2. Switch from NTP to PTP:
system ntp-modify --enabled False
system ptp-modify --enabled True --mode=software --transport=l2 --mechanism=p2p
3. Lock and unlock all hosts.
4. Run the following query and verify the reported time offsets:
influx -database=collectd -execute="SELECT * FROM ptp_value WHERE type='time_offset' AND type_instance='nsec' ORDER BY time DESC LIMIT 16"
Verify the offsets again after resetting the master clock while it is running on the standby controller.
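The reproduction steps above can be scripted. This is a minimal sketch, assuming the StarlingX `system` CLI and the InfluxDB 1.x `influx` CLI are on PATH and accept the flags exactly as written in this report; `reproduction_commands` is a hypothetical helper that only builds argv lists. It skips step 3 (lock/unlock), and nothing runs unless each list is passed to `subprocess.run`.

```python
import shlex


def reproduction_commands(mode="software", transport="l2", mechanism="p2p"):
    """Build the reproduction commands from this report as argv lists.

    Nothing is executed here; step 3 (lock/unlock every host) is omitted.
    """
    query = ("SELECT * FROM ptp_value WHERE type='time_offset' "
             "AND type_instance='nsec' ORDER BY time DESC LIMIT 16")
    return [
        ["system", "ntp-modify", "--enabled", "False"],
        ["system", "ptp-modify", "--enabled", "True", f"--mode={mode}",
         f"--transport={transport}", f"--mechanism={mechanism}"],
        # The whole InfluxQL query is one argv element, so its single
        # quotes need no shell escaping when run via subprocess.run().
        ["influx", "-database=collectd", f"-execute={query}"],
    ]


if __name__ == "__main__":
    # Print shell-safe equivalents (shlex.join needs Python 3.8+).
    for argv in reproduction_commands():
        print(shlex.join(argv))
```

Keeping the query as a single argument avoids the quoting mistakes that are easy to make when the command is typed by hand.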

Expected Behavior
------------------
The master clock is able to send PTP packets to the slave clocks on the other hosts.

Actual Behavior
----------------
As described above, the master clock on controller-1 is unable to send PTP packets to the slave clocks.
Reproducibility
---------------
N/A
System Configuration
--------------------
Regular system
Branch/Pull Time/Commit
-----------------------
2019-04-07 23:30:01
Last Pass
---------
n/a
Timestamp/Logs
--------------
n/a
Test Activity
-------------
Feature test

tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

Marking as release gating; this is an issue with PTP functionality and requires further investigation.

tags: added: stx.2.0 stx.config
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Alex Kozyrev (akozyrev)
Anujeyan Manokeran (anujeyan) wrote :

This issue was not reproduced when PTP was configured with mode=hardware in the same lab. The slave clocks were able to lock to the master clock.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/652198

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/652198
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=e74d087b81d761caf3063a815284e11e9f12872c
Submitter: Zuul
Branch: master

commit e74d087b81d761caf3063a815284e11e9f12872c
Author: Alex Kozyrev <email address hidden>
Date: Sat Apr 13 09:21:51 2019 -0400

    Disable PHC sanity check in case of software PTP mode.

    boundary_clock_jbod performs a sanity check to make sure
    that all of the ports share the same hardware clock device.
    This option is not needed in case of software PTP mode.
    Moreover it interferes with normal PTP operation in this
    case and causes PTP clocks instability in a network.

    Also, cleaning up unused pmon scripts for ptp4l and phc2sys
    and adding services dependencies from mainline linuxptp.

    Change-Id: If4bbe6af600dbdf38d301deafb7dc050a7754cad
    Closes-bug: 1824218
    Signed-off-by: Alex Kozyrev <email address hidden>
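The idea behind the fix can be illustrated with a small config generator that writes `boundary_clock_jbod` only in hardware mode, following the commit body's statement that the option is not needed in software mode and interferes with it there. This is a hedged sketch, not the code from the merged review: `render_ptp4l_global` and the lowercase-to-ptp4l value mapping are hypothetical, and keeping `boundary_clock_jbod 1` in hardware mode is an assumption based on the commit implying the option was set before the fix. The option names `time_stamping`, `network_transport`, `delay_mechanism` and `boundary_clock_jbod` are standard ptp4l.conf settings.

```python
# Assumed mapping from the lowercase values used with `system ptp-modify`
# in this report to the spellings ptp4l.conf expects.
TRANSPORTS = {"l2": "L2", "udp": "UDPv4"}
MECHANISMS = {"e2e": "E2E", "p2p": "P2P"}


def render_ptp4l_global(mode, transport="l2", mechanism="p2p"):
    """Render a minimal [global] section for ptp4l.conf.

    boundary_clock_jbod is written only for hardware time stamping; in
    software mode it is omitted, so ptp4l uses its default (0).
    """
    # Only the two modes discussed in this report are accepted.
    if mode not in ("hardware", "software"):
        raise ValueError(f"unsupported PTP mode: {mode!r}")
    lines = [
        "[global]",
        f"time_stamping {mode}",
        f"network_transport {TRANSPORTS[transport]}",
        f"delay_mechanism {MECHANISMS[mechanism]}",
    ]
    if mode == "hardware":
        lines.append("boundary_clock_jbod 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ptp4l_global("software"))
```

Making the option conditional on the mode, instead of always emitting it, keeps the software-mode configuration free of a setting that only makes sense when ports have their own hardware clocks.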

Changed in starlingx:
status: In Progress → Fix Released
Anujeyan Manokeran (anujeyan) wrote :

Verified in load "20190506T233000Z"

tags: removed: stx.2.0 stx.config stx.retestneeded
Ghada Khalil (gkhalil)
tags: added: stx.2.0 stx.config