After DOR test, calico-node pods in CrashLoopBackOff never recovered

Bug #1844579 reported by Anujeyan Manokeran
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Joseph Richard

Bug Description

Brief Description
-----------------
   After a DOR (Dead Office Recovery) test, the calico-node pods went into CrashLoopBackOff as shown below.
A lighttpd alarm also appeared, as reported in https://bugs.launchpad.net/starlingx/+bug/1844456 .

+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | compute-2 | worker | unlocked | enabled | available |
| 5 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 log(keystone_admin)]$
controller-0:/var/log# cat lighttpd-error.log
2019-09-17 06:27:49: (server.c.1472) server started (lighttpd/1.4.52)
2019-09-17 06:40:13: (server.c.2067) server stopped by UID = 0 PID = 1
fm alarm-list
+----------+----------------------------------------------------------------------------+------------------------------+----------+---------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+----------------------------------------------------------------------------+------------------------------+----------+---------------------+
| 400.001 | Service group web-services degraded; lighttpd(enabled-active, failed) | service_domain=controller. | major | 2019-09-17T20:53:19 |
| | | service_group=web-services. | | .183650 |
| | | host=controller-0 | | |
| | | | | |
| 100.114 | NTP address 2607:5300:60:92e7::1 is not a valid or a reachable NTP server. | host=controller-0.ntp=2607: | minor | 2019-09-17T13:56:32 |
| | | 5300:60:92e7::1 | | .157376 |
| | | | | |
| 100.114 | NTP address 2607:5300:60:3308::1 is not a valid or a reachable NTP server. | host=controller-1.ntp=2607: | minor | 2019-09-17T13:35:00 |
| | | 5300:60:3308::1 | | .785675 |
| | | | | |
+----------+----------------------------------------------------------------------------+------------------------------+----------+---------------------+
kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-767467f9cf-jrx52 1/1 Running 1 4m24s dead:beef::8e22:765f:6121:eb4f controller-0 <none> <none>
calico-node-8f4l2 0/1 Running 5 13h face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
calico-node-9zwtk 1/1 Running 3 14h face::3 controller-0 <none> <none>
calico-node-d4tff 0/1 Running 4 13h face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
calico-node-hmrgr 1/1 Running 2 13h face::4 controller-1 <none> <none>
calico-node-lrbxb 0/1 Running 5 13h face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
ceph-pools-audit-1568752200-g68wf 0/1 Completed 0 24m dead:beef::a4ce:fec1:5423:e331 controller-1 <none> <none>
ceph-pools-audit-1568752500-xwrvw 0/1 Completed 0 19m dead:beef::8e22:765f:6121:eb4a controller-0 <none> <none>
ceph-pools-audit-1568753400-zw4bl 0/1 Completed 0 4m25s dead:beef::8e22:765f:6121:eb4e controller-0 <none> <none>
coredns-7cf476b5c8-4vkzr 1/1 Running 3 7h22m dead:beef::8e22:765f:6121:eb4c controller-0 <none> <none>
coredns-7cf476b5c8-qcxph 1/1 Running 5 7h53m dead:beef::a4ce:fec1:5423:e332 controller-1 <none> <none>
kube-apiserver-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-apiserver-controller-1 1/1 Running 4 13h face::4 controller-1 <none> <none>
kube-controller-manager-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-controller-manager-controller-1 1/1 Running 2 13h face::4 controller-1 <none> <none>
kube-multus-ds-amd64-5l4dj 1/1 Running 0 3m3s face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-multus-ds-amd64-9n2cj 1/1 Running 0 2m48s face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
kube-multus-ds-amd64-jltqk 1/1 Running 1 7h46m face::4 controller-1 <none> <none>
kube-multus-ds-amd64-n5hb8 1/1 Running 0 2m52s face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-multus-ds-amd64-p8zsf 1/1 Running 1 7h15m face::3 controller-0 <none> <none>
kube-proxy-8mc4r 1/1 Running 3 14h face::3 controller-0 <none> <none>
kube-proxy-b72qz 1/1 Running 2 13h face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
kube-proxy-g8k8n 1/1 Running 2 13h face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-proxy-gbvsx 1/1 Running 2 13h face::4 controller-1 <none> <none>
kube-proxy-l5qxx 1/1 Running 1 13h face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-scheduler-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-scheduler-controller-1 1/1 Running 2 13h face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-9hbm8 1/1 Running 0 2m33s face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-sriov-cni-ds-amd64-9q8zk 1/1 Running 0 2m33s face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-sriov-cni-ds-amd64-d8fhg 1/1 Running 1 7h15m face::3 controller-0 <none> <none>
kube-sriov-cni-ds-amd64-dkm9z 1/1 Running 1 7h46m face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-hn5qc 1/1 Running 0 2m33s face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
rbd-provisioner-65db585fd6-6w5r4 1/1 Running 4 7h53m dead:beef::a4ce:fec1:5423:e333 controller-1 <none> <none>
rbd-provisioner-65db585fd6-7kz9p 1/1 Running 3 7h22m dead:beef::8e22:765f:6121:eb4b controller-0 <none> <none>
tiller-deploy-7855f54f57-lrc6b 1/1 Running 1 7h22m face::4 controller-1 <none> <none>
kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-767467f9cf-jrx52 1/1 Running 1 32m dead:beef::8e22:765f:6121:eb4f controller-0 <none> <none>
calico-node-8f4l2 0/1 Running 12 13h face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
calico-node-9zwtk 1/1 Running 3 14h face::3 controller-0 <none> <none>
calico-node-d4tff 0/1 CrashLoopBackOff 12 13h face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
calico-node-hmrgr 1/1 Running 2 14h face::4 controller-1 <none> <none>
calico-node-lrbxb 0/1 CrashLoopBackOff 13 13h face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
ceph-pools-audit-1568754600-n8dtm 0/1 Completed 0 12m dead:beef::8e22:765f:6121:eb56 controller-0 <none> <none>
ceph-pools-audit-1568754900-dbdrq 0/1 Completed 0 7m19s dead:beef::a4ce:fec1:5423:e339 controller-1 <none> <none>
ceph-pools-audit-1568755200-pgbtq 0/1 Completed 0 2m19s dead:beef::a4ce:fec1:5423:e33a controller-1 <none> <none>
coredns-7cf476b5c8-4vkzr 1/1 Running 3 7h49m dead:beef::8e22:765f:6121:eb4c controller-0 <none> <none>
coredns-7cf476b5c8-qcxph 1/1 Running 5 8h dead:beef::a4ce:fec1:5423:e332 controller-1 <none> <none>
kube-apiserver-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-apiserver-controller-1 1/1 Running 4 14h face::4 controller-1 <none> <none>
kube-controller-manager-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-controller-manager-controller-1 1/1 Running 2 14h face::4 controller-1 <none> <none>
kube-multus-ds-amd64-5l4dj 1/1 Running 0 30m face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-multus-ds-amd64-9n2cj 1/1 Running 0 30m face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
kube-multus-ds-amd64-jltqk 1/1 Running 1 8h face::4 controller-1 <none> <none>
kube-multus-ds-amd64-n5hb8 1/1 Running 0 30m face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-multus-ds-amd64-p8zsf 1/1 Running 1 7h42m face::3 controller-0 <none> <none>
kube-proxy-8mc4r 1/1 Running 3 14h face::3 controller-0 <none> <none>
kube-proxy-b72qz 1/1 Running 2 13h face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
kube-proxy-g8k8n 1/1 Running 2 13h face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-proxy-gbvsx 1/1 Running 2 14h face::4 controller-1 <none> <none>
kube-proxy-l5qxx 1/1 Running 1 13h face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-scheduler-controller-0 1/1 Running 6 14h face::3 controller-0 <none> <none>
kube-scheduler-controller-1 1/1 Running 2 14h face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-9hbm8 1/1 Running 0 30m face::fccf:ae0b:600a:f8a4 compute-2 <none> <none>
kube-sriov-cni-ds-amd64-9q8zk 1/1 Running 0 30m face::29dd:f3c5:7eb9:1d21 compute-1 <none> <none>
kube-sriov-cni-ds-amd64-d8fhg 1/1 Running 1 7h42m face::3 controller-0 <none> <none>
kube-sriov-cni-ds-amd64-dkm9z 1/1 Running 1 8h face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-hn5qc 1/1 Running 0 30m face::d299:e6a8:7070:7bb0 compute-0 <none> <none>
rbd-provisioner-65db585fd6-6w5r4 1/1 Running 4 8h dead:beef::a4ce:fec1:5423:e333 controller-1 <none> <none>
rbd-provisioner-65db585fd6-7kz9p 1/1 Running 3 7h49m dead:beef::8e22:765f:6121:eb4b controller-0 <none> <none>
tiller-deploy-7855f54f57-lrc6b 1/1 Running 1 7h49m face::4 controller-1 <none> <none>
[sysadmin@controller-0 ~(keystone_admin)]$
Severity
--------
Major
Steps to Reproduce
------------------
1. Verify the health of the system and check for any alarms.
2. Power off all the nodes for 60 seconds.
3. Power on all the nodes.
4. Verify the nodes are up and available.
5. Verify the pods are up. As per the description, the calico-node pods go into CrashLoopBackOff (see the sketch after this list).
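
A minimal shell sketch of how steps 1, 4, and 5 are typically checked from the active controller. The commands mirror the ones already shown in the output above; the grep filter is only illustrative and not part of the original report.

# Step 1: check health and alarms before the test
source /etc/platform/openrc
fm alarm-list
system host-list

# Steps 4-5: after power-on, confirm hosts are available and watch the calico-node pods
system host-list
kubectl get pods -n kube-system -o wide | grep calico-node
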
System Configuration
--------------------
Regular system with IPv6 configuration
Expected Behavior
------------------
No crashes on the calico-node pods.

Actual Behavior
----------------
As per the description, the calico-node pods crash.
Reproducibility
---------------
100% reproducible. Seen in 2 different IPv6 labs: WCP71-75 & WCP63-66.
Not yet tested with IPv4.

Load
----
Build was on 2019-09-16_14-18-20
Last Pass
---------
Timestamp/Logs
--------------
2019-09-17T21:02:06
Test Activity
-------------
Regression test

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 gating - system doesn't fully recover after power outage test

description: updated
tags: added: stx.3.0 stx.containers stx.networking
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Joseph Richard (josephrichard)
status: New → Triaged
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

The DOR test was done on IP-33-36 (IPv4 configuration); this issue was not seen.

Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Joseph Richard (josephrichard) wrote :

Were logs collected?

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Incomplete as we need the logs to investigate this issue further.

Changed in starlingx:
status: Triaged → Incomplete
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
Revision history for this message
Joseph Richard (josephrichard) wrote :

From the logs:
{"log":"2019-09-17 21:04:19.272 [INFO][8] startup.go 331: Hit error connecting to datastore - retry error=Get https://[fd04::1]:443/api/v1/nodes/foo: dial tcp [fd04::1]:443: connect: network is unreachable\n","stream":"stdout","time":"2019-09-17T21:04:19.272839647Z"}

Looks to be a duplicate of https://bugs.launchpad.net/starlingx/+bug/1844192, which is fixed in the latest load.
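
A hedged sketch of how the same startup error could be pulled from one of the failing pods; the pod name is taken from the listing above, while the container name (calico-node) and the raw log path are assumptions rather than details from this report.

# Startup errors from a crashing calico-node pod (--previous shows the last crashed attempt)
kubectl logs -n kube-system calico-node-lrbxb -c calico-node --previous | grep "Hit error connecting to datastore"

# Or read the raw container log on the node itself (path assumed from the JSON log format quoted above)
grep "network is unreachable" /var/log/containers/calico-node-*.log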

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Duplicate bug is addressed by: https://review.opendev.org/685112
Merged on 2019-10-01

Marking as Fix Released

Changed in starlingx:
status: Incomplete → Fix Released
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

DOR verification failed; this issue was reproduced in load 2019-10-21_20-00-00 in lab wcp71-75.

Revision history for this message
Yang Liu (yliu12) wrote :

Removed the duplicate link and set this LP to Confirmed as per Jeyan's comments.
Joseph also confirmed that after the DOR, the default route was missing from the compute despite accept_ra being off.

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Joseph Richard (josephrichard) wrote :

In the retest failures, DHCP fails on the compute node and then retries in the background. Unfortunately, this results in ifup-post not being run, and so route6 is not run either.

This is unrelated to the previously seen issue with accept_ra (which also caused the default route to be missing), but the calico logs appear the same, as both cases are missing the default route.
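
A quick diagnostic sketch for telling these two causes apart on the affected compute node; the interface name enp0s8 is a placeholder assumption, and the route6 file path reflects the standard CentOS initscripts layout rather than anything quoted in this report.

# Confirm the IPv6 default route is missing
ip -6 route show default

# Confirm accept_ra is off on the management interface (interface name is a placeholder)
sysctl net.ipv6.conf.enp0s8.accept_ra

# The static route file that ifup-post (via ifup-routes) would normally apply
cat /etc/sysconfig/network-scripts/route6-enp0s8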

Revision history for this message
Ghada Khalil (gkhalil) wrote :

The workaround for this issue is to lock/unlock the worker nodes where the failed pods are residing.
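
A sketch of that workaround as it would typically be run from the active controller; compute-1 is taken from the pod listing above as an example, and the final check is illustrative.

# Lock and unlock the worker node hosting a failed calico-node pod
system host-lock compute-1
system host-unlock compute-1

# Once the node is unlocked and available again, confirm the pod recovered
kubectl get pods -n kube-system -o wide | grep calico-node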

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/692606

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/692606
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=8a3722089d821573ce6766e6fa40d78a0fdbaded
Submitter: Zuul
Branch: master

commit 8a3722089d821573ce6766e6fa40d78a0fdbaded
Author: Joseph Richard <email address hidden>
Date: Tue Oct 29 10:54:58 2019 -0400

    Drop initscripts patch running ipv6 dhcp as daemon

    This commit rebases initscripts patch set, dropping
    run-dhclient-as-daemon-for-ipv6.patch

    Currently, ifup-eth tries running ipv6 dhclient with the one-shot
    option, and if that fails, then retries indefinitely in the background.
    That has the side-effect of causing the ifup-post script to not be run
    if the first dhclient attempt fails, which will prevent routes on that
    interface from being created. This is especially problematic in the
    case of a DOR, where the compute nodes may come up before dnsmasq is up
    on the controller.
    This is different from upstream centos, which will only try running
    dhclient with the one-shot option for ipv6.
    By reverting the initscripts patch to run as a daemon, ipv6 dhclient now
    runs as one-shot only, and if it fails, ifup-eth script exits without
    getting an address, and then the node fails to come up and reboot.
    While this may result in the compute node having an extra reboot in a DOR,
    that is preferable to the compute coming up incorrectly and requiring a
    lock/unlock to recover.

    Closes-bug: 1844579
    Change-Id: I5b7f6b7c878dc4e4737d986f11fae3301585fb1c
    Signed-off-by: Joseph Richard <email address hidden>
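
A simplified, hedged illustration of the behavior change described in the commit message above; this is not the actual initscripts patch, and the flags follow ISC dhclient (-6 for DHCPv6, -1 for a single one-shot attempt, -nw to background immediately). $DEVICE stands for the interface being brought up.

# Behavior with the dropped patch (simplified): a failed one-shot attempt fell back to a
# background retry, so ifup-eth "succeeded" and ifup-post / route6-$DEVICE never ran
dhclient -6 -1 "$DEVICE" || dhclient -6 -nw "$DEVICE"

# Behavior after the fix (simplified): one-shot only; on failure ifup-eth exits,
# the node fails to come up cleanly and reboots instead of running half-configured
dhclient -6 -1 "$DEVICE"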

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Elio Martinez (elio1979) wrote :

This problem is not present in an IPv4 environment.

calico-node-8qvsg 1/1 Running 6 5d1h 10.10.70.11 controller-0 <none> <none>
calico-node-lwscd 1/1 Running 4 4d15h 10.10.70.12 controller-1 <none> <none>

###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20191122T023000Z"

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/698561

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/698561
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=9035cd1be8aa3138691c6c99219030dfbe77ebaf
Submitter: Zuul
Branch: f/centos8

commit 4aa661ce5666220d6beb2a3a3fac987cba4feb74
Author: Martin, Chen <email address hidden>
Date: Thu Nov 21 10:28:13 2019 +0800

    Build layering
    Rebase tarball for i40e Driver
    Rebase srpm for systemd 219-67.el7
    Rebase srpm for sudo
    Rebase srpm for ntp

    Depends-On: https://review.opendev.org/#/c/695061/
    Depends-On: https://review.opendev.org/#/c/695560/
    Depends-On: https://review.opendev.org/#/c/695637/
    Depends-On: https://review.opendev.org/#/c/695983/

    Story: 2006166
    Task: 37570

    Change-Id: I7f33e0fb1319df3421318c4927d2a5675a490273
    Signed-off-by: Martin, Chen <email address hidden>

commit 5d854355d873702b78ff6aa8c6fddc025c45be2d
Author: Jim Somerville <email address hidden>
Date: Mon Nov 25 16:07:17 2019 -0500

    Uprev ntp to version 4.2.6p5-29.el7

    This solves:
    ntp: Stack-based buffer overflow in ntpq and ntpdc allows
    denial of service or code execution (CVE-2018-12327)

    See the announcement link:

    https://lists.centos.org/pipermail/centos-cr-announce/2019-August/006016.html

    for more details.

    Here we refresh the meta patches and correct the crime of
    "name of patch file differs from git format-patch". We
    also clean up the commit short logs.

    Change-Id: I263465d85f06096296fdd478a302eb110ab1259c
    Closes-Bug: 1849197
    Depends-On: https://review.opendev.org/#/c/695983
    Signed-off-by: Jim Somerville <email address hidden>

commit 11fd5d9cd48a1539b9c7a4ebc8aaad69ed24ae5b
Author: Dan Voiculeasa <email address hidden>
Date: Thu Nov 21 15:01:36 2019 +0200

    ceph-init-wrapper: Detect stuck peering OSDs and restart them

    OSDs might become stuck peering.
    Recover from such state.

    Closes-bug: 1851287

    Change-Id: I2ef1a0e93d38c3d041ee0c5c1e66a4ac42785a68
    Signed-off-by: Dan Voiculeasa <email address hidden>

commit f30cb74fef4b97721010ca9bc6a6b6dde03c4add
Author: Robin Lu <email address hidden>
Date: Fri Nov 22 11:01:27 2019 +0800

    Update sudo srpm patch for CVE bug

    To fix below CVE, we will use sudo-1.8.23-4.el7_7.1.src.rpm
    And we have to update some patches according to new srpm.
    https://lists.centos.org/pipermail/centos-announce/2019-October/023499.html

    CVE bug: CVE-2019-14287: sudo: can bypass certain policy blacklists

    Closes-Bug: 1852825
    Depends-On: https://review.opendev.org/#/c/695637/
    Change-Id: Ifc0a3423464fafce06cd504d9b427fc3433fb756
    Signed-off-by: Robin Lu <email address hidden>

commit 0231aba5cdcb96b15106591acfff280159050366
Author: Jim Somerville <email address hidden>
Date: Thu Nov 21 15:54:15 2019 -0500

    Uprev systemd to version 219-67.el7

    This solves:
    systemd: line splitting via fgets() allows for state injection
    during daemon-reexec (CVE-2018-15686)

    along with some other less critical issues. See the security
    announcement link:

   ...


tags: added: in-f-centos8
Revision history for this message
ayyappa (mantri425) wrote :

Tested in IPv6 standard and IPv4 AIO-duplex plus labs; the issue is not seen in the latest build:
SW_VERSION="19.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2019-12-11_21-20-16"
SRC_BUILD_ID="18"

tags: removed: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Joseph, please cherry-pick this fix to the r/stx.3.0 branch at your earliest convenience.
