Current PTP implementation does not support LAG

Bug #1828017 reported by Eric MacDonald
This bug affects 1 person
Affects    Status        Importance  Assigned to        Milestone
StarlingX  Fix Released  High        Alexander Kozyrev

Bug Description

The currently deployed implementation of PTP (Precision Time Protocol) does not support link aggregation (LAG).

Severity: Major - PTP does not work, and an alarm is raised on hosts with LAG interfaces

Steps to Reproduce: Configure PTP on a system with LAG (bonded) management or OAM interfaces.
                    Follow the PTP provisioning steps.

Expected Behavior: No alarms are raised and PTP works.

Actual Behavior: A PTP alarm is raised and PTP timestamping does not work on the affected host. If the affected host is a controller, there is no accurate master clock, and PTP Out-Of-Tolerance alarms may also be raised on some or all subordinate hosts.

Reproducibility: Intermittent

System Configuration: Any system type with PTP configured

Branch/Pull Time/Commit: Current stream 19.01

Last Pass: Never

Timestamp/Logs: PTP logs are in user.log and daemon.log

Test Activity: collectd PTP monitor plugin development testing

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Alex Kozyrev (akozyrev)
importance: Undecided → High
Ghada Khalil (gkhalil) wrote :

Marking as release gating, given that LAG deployments are fairly common for hardware redundancy.

tags: added: stx.2.0 stx.config stx.upstream
Changed in starlingx:
status: New → Triaged
Ghada Khalil (gkhalil)
tags: removed: stx.upstream
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/659158

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659158
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=762fc1b5c9420e1b54e6a80020041b42460e3dfd
Submitter: Zuul
Branch: master

commit 762fc1b5c9420e1b54e6a80020041b42460e3dfd
Author: Alex Kozyrev <email address hidden>
Date: Tue May 14 16:21:21 2019 -0400

    Enable PTP support for Link Aggregation

    The Linux bonding driver aggregates multiple NICs into a single bonded
    interface with two or more NIC slaves. This bonded interface has no
    PTP hardware clock associated with it and cannot be used by
    LinuxPTP for clock synchronization. Instead, all the LAG slaves need
    to be derived from the bonded interface and fed to the LinuxPTP daemon
    in place of the bonded interface. VLAN numbers must also be stripped
    from VLAN interface names for PTP to work.

    Change-Id: I022e2d74e6459825129107881dd585bb2a13f953
    Closes-bug: 1828017
    Signed-off-by: Alex Kozyrev <email address hidden>
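
A minimal Python sketch of the interface resolution described in the commit message above, for illustration only (it is not the merged StarlingX code). It assumes VLAN interfaces are named `<base>.<vid>` and that bond slaves are read from the Linux bonding driver's sysfs file `/sys/class/net/<bond>/bonding/slaves`; the function names `read_bond_slaves`, `strip_vlan` and `resolve_ptp_interfaces` and all interface names are hypothetical.

```python
"""Hypothetical sketch of the LAG/VLAN expansion described in the commit.

A bonded interface has no PTP hardware clock, so each bond is replaced by
its slave NICs, and a VLAN interface is reduced to its base interface.
The "<base>.<vid>" VLAN naming and all function names are assumptions.
"""
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def read_bond_slaves(ifname: str,
                     sysfs: Path = Path("/sys/class/net")) -> Optional[List[str]]:
    """Return the slave NICs of a Linux bond, or None if ifname is not a bond."""
    slaves_file = sysfs / ifname / "bonding" / "slaves"
    if not slaves_file.exists():
        return None
    # The bonding driver lists the slaves as space-separated names.
    return slaves_file.read_text().split()


def strip_vlan(ifname: str) -> str:
    """Drop a trailing ".<vid>" VLAN suffix, e.g. "eth0.100" -> "eth0"."""
    base, sep, vid = ifname.rpartition(".")
    return base if sep and vid.isdigit() else ifname


def resolve_ptp_interfaces(
        interfaces: Iterable[str],
        slaves_of: Callable[[str], Optional[List[str]]] = read_bond_slaves,
) -> List[str]:
    """Map configured interfaces to the physical NICs to hand to ptp4l."""
    resolved: List[str] = []
    for ifname in interfaces:
        base = strip_vlan(ifname)        # a VLAN on top of a bond yields the bond
        slaves = slaves_of(base)
        # Non-bonds (or bonds with no slaves listed) are used as-is.
        for nic in slaves or [base]:
            if nic not in resolved:      # keep order, drop duplicates
                resolved.append(nic)
    return resolved


if __name__ == "__main__":
    # Hypothetical topology: bond0 aggregates two NICs; eth2 is a plain NIC.
    topology = {"bond0": ["enp0s3", "enp0s8"]}
    print(resolve_ptp_interfaces(["bond0.166", "eth2"], slaves_of=topology.get))
    # prints ['enp0s3', 'enp0s8', 'eth2']
```

The slave lookup is injectable so the logic can be exercised without real bonds. De-duplicating after expansion keeps a bond and a VLAN on that same bond from feeding the same slave NIC to the daemon twice.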

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Anujeyan Manokeran (anujeyan) wrote :

This issue was not resolved in load 2019-05-24 17:41:41.

A PTP-enabled install with the configuration below shows incorrect alarms in a lab with a LAG configuration. Also, PTP is not enabled on the worker nodes.

system ptp-show
+--------------+--------------------------------------+
| Property     | Value                                |
+--------------+--------------------------------------+
| uuid         | 20dfb6de-9768-43e5-8e5b-8cd69a750382 |
| enabled      | True                                 |
| mode         | hardware                             |
| transport    | l2                                   |
| mechanism    | e2e                                  |
| isystem_uuid | 1b1c2588-d22a-4cce-9863-14803019a485 |
| created_at   | 2019-05-27T15:19:57.833466+00:00     |
| updated_at   | 2019-05-27T15:28:28.867380+00:00     |
+--------------+--------------------------------------+
fm alarm-list
+----------+-------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
| Alarm ID | Reason Text                                                                               | Entity ID                            | Severity | Time Stamp       |
+----------+-------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
| 100.119  | Provisioned Precision Time Protocol (PTP) 'hardware' time stamping mode seems to be       | host=compute-1.ptp                   | major    | 2019-05-27T16:57 |
|          | unsupported by this host                                                                  |                                      |          | :30.836681       |
|          |                                                                                           |                                      |          |                  |
| 100.119  | Provisioned Precision Time Protocol (PTP) 'hardware' time stamping mode seems to be       | host=compute-0.ptp                   | major    | 2019-05-27T16:49 |
|          | unsupported by this host                                                                  |                                      |          | :02.174386       |
|          |                                                                                           |                                      |          |                  |
| 800.001  | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or undersized]. Please check | cluster=0be8842d-826a-4b2c-          | warning  | 2019-05-27T16:20 |
|          | 'ceph -s' for more details.                                                               | 89b2-53a54c1204f0                    |          | :34.839210       |
|          |                                                                                           |                                      |          |                  |
+----------+-------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
compute-1:~$ sudo pmc -u -b 0 'GET PORT_DATA...

Ghada Khalil (gkhalil) wrote :

Re-opening based on Jeyan's testing. An additional fix is required.

Changed in starlingx:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661645

Changed in starlingx:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/661645
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=65cca0a22cc039c17d325e91232fe7c4d3cae197
Submitter: Zuul
Branch: master

commit 65cca0a22cc039c17d325e91232fe7c4d3cae197
Author: Alex Kozyrev <email address hidden>
Date: Mon May 27 13:44:55 2019 -0400

    Disable PCH sanity check on compute nodes (PTP slaves)

    There can be configurations (e.g. LAG or VLAN) that use several NICs
    with different PTP hardware clocks on the management network.
    boundary_clock_jbod allows such configurations and needs to be enabled.
    Previously this option was used on controller nodes only, because PTP
    is enabled on both the OAM and management network interfaces there.
    With LAG support, this option needs to be enabled on compute nodes as well.

    Change-Id: I523844fce76bd2a90332d31e6104e4eaa36c6ca1
    Closes-bug: 1828017
    Signed-off-by: Alex Kozyrev <email address hidden>
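
The boundary_clock_jbod option lives in the `[global]` section of a ptp4l configuration file; the ptp4l man page describes it as disabling the sanity check that all ports share the same PTP hardware clock (PHC). Below is a minimal Python sketch, assuming the PHC index of each NIC is read from `ethtool -T <nic>` output, that enables the option only when the configured NICs span more than one PHC. The helper names `parse_phc_index` and `render_ptp4l_conf` and the interface names are hypothetical, and the defaults simply restate the `hardware`/`l2`/`e2e` values from the `system ptp-show` output above in ptp4l's own spelling.

```python
"""Hypothetical sketch: enable boundary_clock_jbod when NICs use several PHCs.

The PHC index of each NIC is assumed to come from `ethtool -T <nic>`, which
prints "PTP Hardware Clock: <index>" (or "none" for devices such as a bond
that have no PHC). The function names here are illustrative only.
"""
import re
from typing import Dict, List, Optional

_PHC_LINE = re.compile(r"^PTP Hardware Clock:\s*(\S+)", re.MULTILINE)


def parse_phc_index(ethtool_output: str) -> Optional[int]:
    """Return the PHC index reported by `ethtool -T`, or None if there is none."""
    match = _PHC_LINE.search(ethtool_output)
    if match is None or not match.group(1).isdigit():
        return None
    return int(match.group(1))


def render_ptp4l_conf(phc_by_nic: Dict[str, Optional[int]],
                      time_stamping: str = "hardware",
                      network_transport: str = "L2",
                      delay_mechanism: str = "E2E") -> str:
    """Render a ptp4l configuration file for the given NICs."""
    missing = [nic for nic, phc in phc_by_nic.items() if phc is None]
    if missing and time_stamping == "hardware":
        # Hardware time stamping is impossible without a PHC (e.g. on a bond).
        raise ValueError("no PTP hardware clock on: " + ", ".join(missing))
    lines: List[str] = [
        "[global]",
        f"time_stamping {time_stamping}",
        f"network_transport {network_transport}",
        f"delay_mechanism {delay_mechanism}",
    ]
    # ptp4l's sanity check rejects ports on different PHCs unless this is set.
    if len({phc for phc in phc_by_nic.values() if phc is not None}) > 1:
        lines.append("boundary_clock_jbod 1")
    # One section per NIC tells ptp4l which interfaces to run on.
    lines.extend(f"[{nic}]" for nic in phc_by_nic)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "Time stamping parameters for enp0s8:\nPTP Hardware Clock: 1\n"
    # Hypothetical LAG slaves on two separate NICs, hence two PHCs.
    phcs = {"enp0s3": 0, "enp0s8": parse_phc_index(sample)}
    print(render_ptp4l_conf(phcs), end="")
```

Run as a script, it prints a `[global]` section ending in `boundary_clock_jbod 1`, followed by the `[enp0s3]` and `[enp0s8]` sections. Per the ptp4l man page, when this option is used the separate PHCs must be kept synchronized by an external program, for example phc2sys in automatic mode.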

Changed in starlingx:
status: In Progress → Fix Released
Anujeyan Manokeran (anujeyan) wrote :

Verified in BUILD_ID="2019-05-30_16-58-16"

tags: removed: stx.retestneeded