Subcloud install fails if hwclock is out of sync

Bug #1995643 reported by Kyle MacLeod
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Kyle MacLeod
Milestone: none

Bug Description

Brief Description
-----------------

Subcloud install fails if the hwclock is out of sync with the actual date.

Any SSL connection from the subcloud to the system controller fails because the date/time mismatch makes the certificates appear invalid.
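
A minimal sketch of why a skewed clock breaks certificate validation: a certificate is only accepted while the client's current time falls inside its validity window. The function name and all dates below are hypothetical illustrations, not values taken from this bug.

```python
from datetime import datetime, timedelta, timezone


def cert_is_valid_at(not_before, not_after, now):
    """Return True if 'now' falls inside the certificate validity window."""
    return not_before <= now <= not_after


# Hypothetical certificate issued on the system controller, valid for one year.
not_before = datetime(2022, 11, 1, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=365)

# The actual time, and the time reported by a subcloud whose clock is 5 days behind.
true_now = datetime(2022, 11, 3, tzinfo=timezone.utc)
skewed_now = true_now - timedelta(days=5)

print(cert_is_valid_at(not_before, not_after, true_now))    # True
print(cert_is_valid_at(not_before, not_after, skewed_now))  # False: "not yet valid"
```

With the clock a few days behind, a recently issued certificate looks as if it is not yet valid; with the clock far enough ahead, it looks expired. Either way the TLS handshake is rejected.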

Severity
--------

Major

Steps to Reproduce
------------------

Set the subcloud's hwclock a few days off from the actual date/time, then install the subcloud.

Expected Behavior
-----------------

Expect that the time can be synchronized on the subcloud during boot.

Actual Behavior
---------------

The hwclock is left out of sync.

Reproducibility
---------------

Reproducible

System Configuration
--------------------

DC system with hardware/redfish subclouds.

Load info
---------

WRCP master

Last Pass
---------

N/A (this is Debian)

Test Activity
-------------

Developer Testing

Workaround
----------

Manually set the hwclock on the subcloud.
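
A minimal sketch of how this workaround could be scripted, assuming the system clock on the subcloud is already correct. It compares a hardware clock reading against a reference time and returns the `hwclock --systohc` command (which copies the system time into the hardware clock) only when the drift exceeds a threshold. The function name and the way the two timestamps are obtained are hypothetical; the 20-minute threshold mirrors the approximate value mentioned in the fix commit for this bug.

```python
from datetime import datetime, timedelta

# Assumed threshold, mirroring the "approximately 20m" in the fix commit.
DRIFT_THRESHOLD = timedelta(minutes=20)


def hwclock_fix_command(hw_time, reference_time, threshold=DRIFT_THRESHOLD):
    """Return the command to resync the hwclock, or None if no change is needed."""
    drift = abs(hw_time - reference_time)
    if drift > threshold:
        # 'hwclock --systohc' sets the hardware clock from the system clock.
        return ["hwclock", "--systohc"]
    return None


reference = datetime(2022, 11, 3, 17, 0, 0)
print(hwclock_fix_command(reference - timedelta(days=3), reference))
print(hwclock_fix_command(reference + timedelta(seconds=5), reference))
```

The returned list can be passed to `subprocess.run` by a caller with root privileges; the sketch itself does not touch the clock.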

OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/863563

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/863563
Committed: https://opendev.org/starlingx/metal/commit/54cba044f118820ae72510c22d95faf8e853c4fe
Submitter: "Zuul (22348)"
Branch: master

commit 54cba044f118820ae72510c22d95faf8e853c4fe
Author: Kyle MacLeod <email address hidden>
Date: Thu Nov 3 17:06:25 2022 -0400

    Basic time sync if required on subcloud install

    This commit avoids install/bootstrap issues when the hwclock
    is very far out of date with the system controller.

    If the hwclock differs from the system controller by more than
    approximately 20m, then we set the hwclock based on the system date.

    LAT initializes the system date based on the 'instdate' boot parameter.
    The instdate boot parameter is the timestamp applied when the miniboot
    bootimage.iso file is created on the system controller. It is close
    enough to avoid any major out-of-sync system clock on the subcloud.

    Secondary change: Optimize the interface assignment prior to ostree pull
    This was mentioned during review
    https://review.opendev.org/c/starlingx/metal/+/861017
    Rather than employ sleeps, the /sys/class/net/${mgmt_dev}/operstate
    file is used to determine when the interface has settled.
    This is much more efficient than sleeps. We timeout after 60s.

    Test Plan:
    PASS:
    - Install with subcloud in relative sync with system controller
      (within seconds). No hwclock change is applied to subcloud.
    - Install when subcloud hwclock is more than 20m out of sync with
      system controller.
        - Verify that the hwclock is updated on the subcloud before
          ostree pull is initiated.
        - Verify that system date is proper on the first post-miniboot
          boot into the ostree installation.
    - Test interface assignment wait/timeout functionality before
      ostree pulls. This is done on hardware subclouds (success path)
      and in sushy subclouds where failure mode testing was done
      by simulating stuck interface operstate values.

    Closes-Bug: 1995643
    Signed-off-by: Kyle MacLeod <email address hidden>
    Change-Id: Ieddc774f962878f3c7f5886148310b87d4ffddfe
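
The operstate-based wait described in the commit message can be sketched as follows. This is an illustration only, not the code from the review: the function name, the polling interval, and the assumption that the interface has settled once operstate reads `up` are all hypothetical. The `sysfs_root` parameter exists only so the sketch can be exercised against a temporary directory.

```python
import time
from pathlib import Path


def wait_for_operstate(dev, timeout=60.0, interval=0.5,
                       sysfs_root="/sys/class/net", wanted="up"):
    """Poll <sysfs_root>/<dev>/operstate until it reads 'wanted' or we time out."""
    operstate = Path(sysfs_root) / dev / "operstate"
    deadline = time.monotonic() + timeout
    while True:
        try:
            if operstate.read_text().strip() == wanted:
                return True
        except OSError:
            # The interface may not exist yet; keep polling until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling the state file returns as soon as the link is ready instead of always paying a fixed sleep, while the deadline bounds the wait when the operstate value is stuck.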

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Kyle MacLeod (kmacleod)
tags: added: stx.8.0 stx.metal