Active controller became degraded after lock/unlock compute node

Bug #1856064 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Paul-Ionut Vaduva

Bug Description

Brief Description
-----------------
After locking and unlocking one compute node, the active controller became degraded and alarm 200.006 was raised.
After a forced reboot of the active controller, the system recovered and the alarm was cleared.

Severity
--------
Major

Steps to Reproduce
------------------
As described in the Brief Description above.

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[compute]

Expected Behavior
------------------

Actual Behavior
----------------

Reproducibility
---------------
Unknown - first time this is seen in sanity, will monitor

System Configuration
--------------------
Multi-node system
IPv4

Lab-name: WCP_3-6

Branch/Pull Time/Commit
-----------------------
2019-12-10_20-00-00

Last Pass
---------
2019-12-10_20-00-00 on (WP_8-12)

Timestamp/Logs
--------------
[2019-12-11 08:58:20,124] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-12-11 08:58:21,300] 433 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

[2019-12-11 08:58:22,661] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0'

[2019-12-11 08:59:40,320] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0'

[2019-12-11 09:05:59,264] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-12-11 09:06:00,442] 433 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$

[2019-12-11 09:11:08,717] 311 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-12-11 09:11:09,693] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
| 26e10dab-15dd-45ee-b5ac-4ae73bb5db8d | 200.006 | controller-0 is degraded due to the failure of its 'ceph' process. Auto recovery of this major process is in progress. | host=controller-0.process=ceph | major | 2019-12-11T09:00:12.697608 |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
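
The manual check above (reading the `host-list` table and spotting the degraded host) can be sketched in Python. This is only an illustration, not part of the test suite referenced in this report: the helper names `parse_host_list` and `degraded_hosts` are hypothetical, and the sample text is a shortened copy of the table shown above.

```python
def parse_host_list(table_text):
    """Parse the ASCII table printed by `system host-list` into a list of dicts."""
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" row carries the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def degraded_hosts(rows):
    """Return the hostnames whose availability column reads 'degraded'."""
    return [row["hostname"] for row in rows if row["availability"] == "degraded"]


# Shortened copy of the 09:06:00 host-list output from this report.
SAMPLE = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | compute-0 | worker | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
"""

if __name__ == "__main__":
    print(degraded_hosts(parse_host_list(SAMPLE)))
```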

Test Activity
-------------
Sanity

Revision history for this message
Peng Peng (ppeng) wrote :
Yang Liu (yliu12)
description: updated
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Waiting for triage by Dan to understand whether this issue was introduced by recent code changes related to: https://review.opendev.org/#/c/695917/

Changed in starlingx:
status: New → Triaged
tags: added: stx.config stx.storage
Changed in starlingx:
assignee: nobody → Dan Voiculeasa (dvoicule)
status: Triaged → New
tags: removed: stx.config
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

As per Frank Miller, this was introduced by https://review.opendev.org/#/c/695917/
Given that this change is in stx.3.0, we need this LP to be fixed in the next stx.3.0 maintenance release.

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.3.0
Revision history for this message
Peng Peng (ppeng) wrote :

Issue seems reproduced on
Lab: WCP_71_75
Load: 2019-12-22_20-00-00

After a forced reboot of the compute node, the active controller became degraded.

[2019-12-23 09:13:41,521] 166 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-0
[2019-12-23 09:13:41,521] 311 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'

[2019-12-23 09:15:51,899] 476 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-12-23 09:15:51,899] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-12-23 09:15:53,064] 433 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | disabled | offline |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | compute-2 | worker | unlocked | enabled | available |
| 5 | controller-1 | controller | unlocked | enabled | degraded |
+----+--------------+-------------+----------------+-------------+--------------+

Revision history for this message
Frank Miller (sensfan22) wrote :

Dan has a proposed fix in stx-ceph:
https://github.com/starlingx-staging/stx-ceph/pull/36

Ghada Khalil (gkhalil)
tags: added: stx.4.0
Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

Stuck peering detection is based on ceph health output.
When an osd goes down, its pair may or may not go into stuck peering
for a brief moment. The issue is that ceph reports the time since the
osd last peered successfully instead of the time since the pair went down.

The desired behavior is to detect real stuck peering above a threshold.

Implemented.
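
For illustration only, the idea of measuring stuck peering against a threshold can be sketched in Python. This is not the stx-ceph change itself, which lives in the ceph init scripts. The class name `StuckPeeringTracker`, the 60 second threshold, and the choice to start the timer when peering is first observed locally are all assumptions made for this sketch.

```python
import time


class StuckPeeringTracker:
    """Track how long each OSD has been continuously observed in peering.

    Assumption: the caller polls ceph health and passes one boolean per OSD.
    The timer starts when peering is first observed here, so the duration does
    not depend on the "time since last successful peering" reported by ceph.
    """

    def __init__(self, threshold_s, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self._first_seen = {}  # osd id -> time peering was first observed

    def update(self, osd_id, is_peering):
        """Record one observation; return True if the OSD counts as stuck."""
        now = self.clock()
        if not is_peering:
            # The OSD left the peering state, so forget the start time.
            self._first_seen.pop(osd_id, None)
            return False
        start = self._first_seen.setdefault(osd_id, now)
        return (now - start) >= self.threshold_s


if __name__ == "__main__":
    fake_now = [0.0]  # injectable clock so the example runs instantly
    tracker = StuckPeeringTracker(threshold_s=60, clock=lambda: fake_now[0])
    print(tracker.update("osd.1", True))  # brief peering, below the threshold
    fake_now[0] = 61.0
    print(tracker.update("osd.1", True))  # still peering, above the threshold
```

With this approach a brief peering blip after a host lock stays below the threshold and does not trigger a restart.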

Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

Lab: WCP_63_66
Load: 2020-02-19_20-00-00

log attached

Yang Liu (yliu12)
Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Yang Liu (yliu12) wrote :

Just to be clear, the logs Peng Peng attached in comment #7 are for a verification failure.

Email thread pasted:

Hi Dan,

Please put your comment in the ticket. What is your suggestion for the next step? Should we reopen it for more investigation?

Thanks,
Peng

From: Voiculeasa, Dan
Sent: Monday, February 24, 2020 12:32 PM
To: Peng, Peng
Cc: Liu, Yang (YOW)
Subject: Re: LP-1856064 is reproduced

Hello,

Yes, the fix for the identified issue when investigating LP-1856064 is in that load.
It correctly detects stuck peering OSDs that are not false positives determined by host-lock operation.

Not sure if the issue at hand is related to lock-unlock. Seems an osd is in a wrong state.

var/log/bash.log:2020-02-21T15:16:12.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0
var/log/bash.log:2020-02-21T15:17:11.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1
var/log/bash.log:2020-02-21T15:25:49.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0

log/bash.log:2020-02-21T15:19:10.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1
log/bash.log:2020-02-21T15:27:39.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0

# Successful restart on osd.1 of controller-0

2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds
2020-02-21 15:39:02.427 /etc/init.d/ceph-init-wrapper osd.1 INFO: Restarting OSD stuck peering
2020-02-21 15:39:02.947 /etc/init.d/ceph osd.1 INFO: Stopping process
2020-02-21 15:39:04.012 /etc/init.d/ceph osd.1 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:04.151 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:39:04.569 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:39:05.473 /etc/init.d/ceph-init-wrapper - INFO: Ceph START command received
...


Revision history for this message
Frank Miller (sensfan22) wrote :

Re-assigning to Paul to investigate and identify a solution. This appears to be an issue on IPv6 configurations.

Changed in starlingx:
assignee: Dan Voiculeasa (dvoicule) → Paul-Ionut Vaduva (pvaduva)
Revision history for this message
Peng Peng (ppeng) wrote :

Issue was reproduced on
Lab: WCP_71_75
Load: 2020-03-05_04-10-00

New log added @
https://files.starlingx.kube.cengn.ca/launchpad/1856064

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/712117

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Paul-Ionut Vaduva (pvaduva) wrote :

There are two aspects to this bug, that is, two reasons why it appears:
 * On a lock/unlock of compute-0, the ceph osd enters stuck peering
 * The ceph osd fails to exit stuck peering

The second issue is addressed by the proposed partial-bug commit, https://review.opendev.org/712117

The first is a more complex problem, as it concerns which ip ceph uses to connect to its
distributed components such as mons and osds. The ip that ceph components listen on for incoming
connections is configurable in /etc/ceph/ceph.conf; the outgoing ip address, however, is not, and it can be
(and on occasion is) the floating ip or the controller-platform-nfs ip (also floating), both of which are
assigned on the management network interface. When a connection is initiated from one of
those floating ips, it is momentarily interrupted on a host lock/unlock of compute-0 or
during a host-swact.

Revision history for this message
Paul-Ionut Vaduva (pvaduva) wrote :

For the first part there is an elegant solution: force outgoing connections to be initiated from the static ip of a controller. This behavior is ensured when preferred_lft 0 is set on the floating ip addresses of an interface, which marks them as deprecated for source address selection (as described here: http://www.davidc.net/networking/ipv6-source-address-selection-linux).
This allows us to assign a floating IP and a static ip from the same subnet to the management network while making sure that all outgoing connections to ips on other hosts in this subnet are initiated from the static ip.

In the example below we have the active controller with 3 ip addresses on the management interface, one static and two floating. The floating IPs move from controller to controller on swact:
1. static: fd01::2/64
2. floating: fd01::1/64
3. floating: fd01::4/64

 Connections w/o the option:
====================
[And give example here]

We can see that there are connections opened from the floating IPs.

 Connection with the option:
====================
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::6800 :::* LISTEN 2587586/ceph-osd
tcp6 0 0 :::6801 :::* LISTEN 2587586/ceph-osd
tcp6 0 0 :::6802 :::* LISTEN 2587586/ceph-osd
tcp6 0 0 :::6803 :::* LISTEN 2587586/ceph-osd
tcp6 0 0 fd01::2:6789 :::* LISTEN 2587301/ceph-mon
tcp6 0 0 fd01::2:40924 fd01::3:6789 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:40888 fd01::3:6789 ESTABLISHED 2587301/ceph-mon
tcp6 0 0 fd01::2:45520 fd01::3:6800 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:43872 fd01::3:6804 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:6801 fd01::3:47520 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:6802 fd01::3:53458 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:6803 fd01::3:48860 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:48158 fd01::15f6:1bc9:22:6789 ESTABLISHED 2587301/ceph-mon
tcp6 0 0 fd01::2:58020 fd01::3:6803 ESTABLISHED 2587586/ceph-osd
tcp6 0 0 fd01::2:34342 fd01::3:6800 ESTABLISHED 81962/ceph-mgr
tcp6 0 0 fd01::2:45484 fd01::3:6800 ESTABLISHED 2587301/ceph-mon
tcp6 0 0 fd01::2:48000 fd01::15f6:1bc9:22:6789 ESTABLISHED 81962/ceph-mgr

How to configure the option by hand
===================

Result:
6: vlan10@enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP group default qlen 1000
    link/ether 08:00:27:66:a6:e8 brd ff:ff:ff:ff:ff:ff
    inet6 fd01::1/64 scope global deprecated
       valid_lft forever preferred_lft 0sec
    inet6 fd01::4/64 scope global deprecated
       valid_lft forever prefe...


Revision history for this message
Paul-Ionut Vaduva (pvaduva) wrote :

Connections w/o the option:
====================

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 :::6800 :::* LISTEN 2856080/ceph-osd
tcp6 0 0 :::6801 :::* LISTEN 2856080/ceph-osd
tcp6 0 0 :::6802 :::* LISTEN 2856080/ceph-osd
tcp6 0 0 :::6803 :::* LISTEN 2856080/ceph-osd
tcp6 0 0 fd01::2:6789 :::* LISTEN 2855692/ceph-mon
tcp6 0 0 fd01::1:6802 fd01::3:45354 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:33368 fd01::3:6800 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:42210 fd01::15f6:1bc9:22:6789 ESTABLISHED 2855692/ceph-mon
tcp6 0 0 fd01::1:33338 fd01::3:6800 ESTABLISHED 2855692/ceph-mon
tcp6 0 0 fd01::1:57248 fd01::3:6804 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:48840 fd01::3:6802 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:54172 fd01::3:6803 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:38900 fd01::3:6789 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:6803 fd01::3:47868 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::1:38868 fd01::3:6789 ESTABLISHED 2855692/ceph-mon
tcp6 0 0 fd01::2:34342 fd01::3:6800 ESTABLISHED 81962/ceph-mgr
tcp6 0 0 fd01::2:48000 fd01::15f6:1bc9:22:6789 ESTABLISHED 81962/ceph-mgr

We can see that there are connections opened from the floating IPs.
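
As an illustration, the observation above can be checked mechanically with a short Python sketch. The helper names are hypothetical, the floating addresses are the ones from the example in the previous comment, and the sample lines are copied from the netstat output above. Note that it reports every established connection whose local endpoint is a floating ip, whether it was initiated locally or accepted from a peer.

```python
# Floating management addresses from the example (fd01::2 is the static one).
FLOATING_IPS = {"fd01::1", "fd01::4"}


def split_host_port(endpoint):
    """Split a netstat 'address:port' endpoint at the last colon."""
    host, _, port = endpoint.rpartition(":")
    return host, port


def connections_on_floating_ips(netstat_text, floating_ips=FLOATING_IPS):
    """Return (local, foreign, program) for ESTABLISHED lines bound to a floating ip."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "ESTABLISHED":
            continue  # skips the header row and LISTEN sockets
        local_host, _ = split_host_port(fields[3])
        if local_host in floating_ips:
            hits.append((fields[3], fields[4], fields[6]))
    return hits


# Three lines copied from the output above.
SAMPLE = """\
tcp6 0 0 fd01::2:6789 :::* LISTEN 2855692/ceph-mon
tcp6 0 0 fd01::1:33368 fd01::3:6800 ESTABLISHED 2856080/ceph-osd
tcp6 0 0 fd01::2:34342 fd01::3:6800 ESTABLISHED 81962/ceph-mgr
"""

if __name__ == "__main__":
    for local, foreign, program in connections_on_floating_ips(SAMPLE):
        print(local, "->", foreign, program)
```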

How to configure the option by hand
===================
Just add the preferred_lft 0 option when you add a floating ip address to an interface:
sudo ip -6 addr add fd01::1/64 dev vlan10 preferred_lft 0
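
A small Python sketch of the same step, for illustration only: one helper builds the argument list for the command above (it does not run it), and another scans `ip -6 addr show` style output for addresses flagged as deprecated. Both function names are hypothetical, and the sample text is copied from the "Result" output in the earlier comment.

```python
def add_deprecated_addr_cmd(address, device):
    """Build the argv for adding an IPv6 address with preferred_lft 0.

    Mirrors the manual command shown above; the caller decides how to run it
    (for example with sudo through subprocess.run).
    """
    return ["ip", "-6", "addr", "add", address, "dev", device, "preferred_lft", "0"]


def deprecated_addresses(ip_addr_output):
    """Return the inet6 addresses that the output marks as 'deprecated'."""
    found = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "inet6" and "deprecated" in fields:
            found.append(fields[1])
    return found


# Copied from the "Result" output in the earlier comment.
SAMPLE = """\
6: vlan10@enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP group default qlen 1000
    link/ether 08:00:27:66:a6:e8 brd ff:ff:ff:ff:ff:ff
    inet6 fd01::1/64 scope global deprecated
       valid_lft forever preferred_lft 0sec
"""

if __name__ == "__main__":
    print(" ".join(add_deprecated_addr_cmd("fd01::1/64", "vlan10")))
    print(deprecated_addresses(SAMPLE))
```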

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/714812

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by Paul-Ionut Vaduva (<email address hidden>) on branch: master
Review: https://review.opendev.org/714812
Reason: Modify the puppet as opposed to ocf libs.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/715120

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/712117
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=bed7388b678b9eda0d06b4d16fb00711741f9ef0
Submitter: Zuul
Branch: master

commit bed7388b678b9eda0d06b4d16fb00711741f9ef0
Author: Paul Vaduva <email address hidden>
Date: Tue Mar 10 12:05:31 2020 -0400

    Release FDs when stuck peering recovery

    During stuck peering recovery if file descriptors are
    not released the state machine does not advance to
    OPERATIONAL state

    Partial-bug: 1856064

    Change-Id: I3fba7be661ebf223eac63608574323ad98d33b75
    Signed-off-by: Paul Vaduva <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716162

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/715120
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=07edad67cc55caf4726d3db3529c8e71fff6254e
Submitter: Zuul
Branch: master

commit 07edad67cc55caf4726d3db3529c8e71fff6254e
Author: Paul Vaduva <email address hidden>
Date: Thu Mar 26 03:09:47 2020 +0200

    Set preferred_lft to 0 for mgmt and nfs floating ips

    For ipv6 the only way to prefer the fixed ip for
    outgoing connection is to set preferred_lft to 0 for
    the floating ips

    Change-Id: I13573ac4628db1fc49146f353d7eb2c96eb1aff0
    Closes-bug: 1856064
    Signed-off-by: Paul Vaduva <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/716162
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=246f33226dbb50a4c5e86d497df745120ca9e0e4
Submitter: Zuul
Branch: f/centos8

commit a745a5b6f8a02b74f69f828f14960e97a758853c
Author: Jim Somerville <email address hidden>
Date: Fri Mar 20 10:36:14 2020 -0400

    Kernel: Workaround broken bios affecting iommu

    Problem:
    Broken bios creates inaccurate DMAR tables,
    reporting some bridges as having endpoint types.
    This causes IOMMU initialization to bail
    out early with an error code, the result of
    which is vfio not working correctly.
    This is seen on some Skylake based Wolfpass
    server platforms with up-to-date bios installed.

    Solution:
    Instead of just bailing out of IOMMU
    initialization when such a condition is found,
    we report it and continue. The IOMMU ends
    up successfully initialized anyway. We do this
    only on platforms that have the Skylake bridges
    where this issue has been seen.

    This change is inspired by a similar one posted by
    Lu Baolu of Intel Corp to lkml

    https://lkml.org/lkml/2019/12/24/15

    Change-Id: Ief2df7099b6118eab7f99d5531616926a7a3eb27
    Closes-Bug: 1847335
    Signed-off-by: Jim Somerville <email address hidden>

commit 1435fe178ab88aa2b77970a3c07e8a907477a654
Author: Jim Somerville <email address hidden>
Date: Mon Mar 16 16:16:20 2020 -0400

    Build mpt2sas and mpt3sas drivers as modules

    History:
    Back in the day, we didn't have an initramfs
    to allow us to load disk drivers as modules. All
    disk drivers had to be built-in. In CentOS 7.3,
    the mpt2sas and mpt3sas driver code was reorganized
    to allow for a common code base. But along with that,
    those drivers would only now build as modules. We
    created a patch which involved taking a snapshot of
    mpt driver code, and massaged it all into building
    as built-in drivers.

    Problem:
    That old code snapshot along with the fact
    that those two drivers initialize without their
    associated hardware being present (they are built-in),
    seems to cause interference with some other LSI raid
    controllers, namely Harpoon in AVAGO MR9460-8i via a
    Huawei enclosure.

    Solution:
    Simply revert to building those two mptsas drivers as
    modules, the way CentOS intended. They will reside
    on initramfs and be loaded automatically if the
    appropriate hardware is present. With these drivers now
    out of the way, the problematic raid controller works
    fine, driven by the megaraid_sas driver.

    Change-Id: I054c2396df4e659c324e70bffcf3940ad93c9354
    Closes-Bug: 1866293
    Signed-off-by: Jim Somerville <email address hidden>

commit bed7388b678b9eda0d06b4d16fb00711741f9ef0
Author: Paul Vaduva <email address hidden>
Date: Tue Mar 10 12:05:31 2020 -0400

    Release FDs when stuck peering recovery

    During stuck peering recovery if file descriptors are
    not released the state machine does not advance to
    OPERATIONAL state

    Partial-bug: 1856064

    Change-Id: I3fba7be661ebf22...

tags: added: in-f-centos8
Revision history for this message
Peng Peng (ppeng) wrote :

Verified on
Lab: WCP_71_75
Load: 2020-04-20_20-00-00

tags: removed: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Paul/Frank, This LP is marked as gating for stx.3.0. Please cherry-pick the code changes to the stx.3.0 branch if applicable or add a note explaining why it shouldn't be cherry-picked.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729825

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/729825
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=d4617fbad74a05f2af81ee85a47565083991e6f8
Submitter: Zuul
Branch: f/centos8

commit 4134023ab84d8a635b118d5e3ff26ade3bbe535b
Author: Sharath Kumar K <email address hidden>
Date: Thu May 7 10:08:11 2020 +0200

    Tox and Zuul job for the bandit code scan in stx/stx-puppet

    Setting up the bandit tool for the scanning of HIGH severity issues
    in the python codes under Starlingx/stx-puppet folder.
    Expecting this merge will enable zuul job for CI/CD of bandit scan.

    Configuration files:
    1. tox.ini for adding bandit environment and command.
    2. test-requirements.txt for adding bandit version.
    3. .zuul.yaml file for adding bandit job and configuring under
       check job to run code scan every time before code commit.

    Test:
    Run tox -e bandit command inside the fault folder to validate the
    bandit scan and result.

    Story: 2007541
    Task: 39687
    Depends-On: https://review.opendev.org/#/c/721294/

    Change-Id: I2982268db2b5e75feeb287bc95420fedc9b0d816
    Signed-off-by: Sharath Kumar K <email address hidden>

commit 65daac29e4635f32a57e80cd18f96fd59dc8ebe0
Author: Bin Qian <email address hidden>
Date: Tue May 12 22:39:21 2020 -0400

    DC cert manifest should only apply to controller nodes

    DC cert manifest should only apply to controller nodes on system
    controller.
    This fix is for DC with worker nodes in central cloud.

    Change-Id: I4233509a6f0afb3013c01e81dea6f655d9e15371
    Closes-Bug: 1878260
    Signed-off-by: Bin Qian <email address hidden>

commit 04a3cb8cbad9b1700286c5de67aa5d974cf54400
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 08:44:13 2020 +0000

    Changing permissions for conversion folder

    Adding writing permissions to '/opt/conversion' mountpoint
    so openstack image conversion can happen there.

    Change-Id: Id1a91db6570dcbed3b8068e79e72f5bb800f24ad
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>

commit 4e9153cf234e714e4bbc9a9eb3d9b55b2828145a
Author: Tao Liu <email address hidden>
Date: Mon May 4 14:30:30 2020 -0500

    Move subcloud audit to separate process

    Subcloud audit is being removed from the dcmanager-manager
    process and it is running in dcmanager-audit process.

    This update adds associated puppet config.

    Story: 2007267
    Task: 39640
    Depends-On: https://review.opendev.org/#/c/725627/

    Change-Id: Idd2e675126a01d6113597646ddd9eb4a0bc5be44
    Signed-off-by: Tao Liu <email address hidden>

commit b793518f65ae932f3974ff85b797f505b5ef1c2a
Author: Robert Church <email address hidden>
Date: Wed Apr 29 12:49:04 2020 -0400

    Ensure containerd binds to the loopback interface

    Set the stream_server_address to bind to the loopback interface with a
    value of "127.0.0.1" for IPv4 and "::1" for IPv6.

    Without setting the stream_server_address in config.toml, containerd was
    binding to the OAM interface. Under most situations this resulted in
    containe...

Revision history for this message
Bill Zvonar (billzvonar) wrote :

Paul/Frank - reminder: This LP is marked as gating for stx.3.0. Please cherry-pick the code changes to the stx.3.0 branch if applicable or add a note explaining why it shouldn't be cherry-picked.

tags: added: stx.cherrypickneeded
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/749482

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/749483

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (r/stx.3.0)

Reviewed: https://review.opendev.org/749482
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=d54b54b2cbdcec16beffa24b1e9418f0a1aad826
Submitter: Zuul
Branch: r/stx.3.0

commit d54b54b2cbdcec16beffa24b1e9418f0a1aad826
Author: Paul Vaduva <email address hidden>
Date: Thu Mar 26 03:09:47 2020 +0200

    Set preferred_lft to 0 for mgmt and nfs floating ips

    For ipv6 the only way to prefer the fixed ip for
    outgoing connection is to set preferred_lft to 0 for
    the floating ips

    Change-Id: I13573ac4628db1fc49146f353d7eb2c96eb1aff0
    Closes-bug: 1856064
    Signed-off-by: Paul Vaduva <email address hidden>
    (cherry picked from master commit 07edad67cc55caf4726d3db3529c8e71fff6254e)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.3.0)

Reviewed: https://review.opendev.org/749483
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=23ea1d7ed9da72742d65fb48945c100e00c1302b
Submitter: Zuul
Branch: r/stx.3.0

commit 23ea1d7ed9da72742d65fb48945c100e00c1302b
Author: Paul Vaduva <email address hidden>
Date: Tue Mar 10 12:05:31 2020 -0400

    Release FDs when stuck peering recovery

    During stuck peering recovery if file descriptors are
    not released the state machine does not advance to
    OPERATIONAL state

    Partial-bug: 1856064

    Change-Id: I3fba7be661ebf223eac63608574323ad98d33b75
    Signed-off-by: Paul Vaduva <email address hidden>
    (cherry picked from master commit bed7388b678b9eda0d06b4d16fb00711741f9ef0)

Bill Zvonar (billzvonar)
tags: removed: stx.cherrypickneeded