stx-monitor app ipv4 config does not work

Bug #1864193 reported by Kevin Smith
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Kevin Smith
Milestone: (none listed)

Bug Description

Brief Description
-----------------
For stx-monitor to work correctly with IPv6, the Java option override "-Djava.net.preferIPv6Addresses=true" was set for the elasticsearch node pods. This worked for IPv6 and did not break IPv4 configurations, so it was left as the default for both. At some point in the last month or two, however, the stx-monitor application stopped working in IPv4 configurations because of elasticsearch cluster discovery problems. The underlying change that caused IPv4 configurations to stop working is unknown.

Severity
--------
The stx-monitor application does not work in IPv4 configurations.

Steps to Reproduce
------------------
Apply the stx-monitor application in an IPv4 configuration, then verify that the elasticsearch nodes do not discover each other.

Expected Behavior
------------------
Proper discovery of the elasticsearch cluster and successful apply of the stx-monitor application.

Actual Behavior
----------------
The elasticsearch nodes do not discover each other, and the stx-monitor application does not work in IPv4 configurations.

Reproducibility
---------------
100%

System Configuration
--------------------
IPv4

Branch/Pull Time/Commit
-----------------------
Latest.

Last Pass
---------
Some time in January 2020 IPv4 configs stopped working.

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Workaround
----------
For IPv4 configurations, manually override esJavaOpts for the elasticsearch pods to remove the "-Djava.net.preferIPv6Addresses=true" setting while leaving the other options in place.
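
The string edit behind this workaround can be sketched as follows. This is only an illustration: the function name and the sample heap options are hypothetical, and the real esJavaOpts value on a given system should be read from the existing overrides before it is changed.

```python
# Hypothetical helper: drop only the IPv6 preference flag from a
# space-separated Java options string and keep every other option.
IPV6_FLAG = "-Djava.net.preferIPv6Addresses=true"


def strip_ipv6_preference(es_java_opts: str) -> str:
    """Return the options string without the IPv6 preference flag."""
    kept = [opt for opt in es_java_opts.split() if opt != IPV6_FLAG]
    return " ".join(kept)


# Sample value is an assumption, not taken from a real deployment.
example = "-Djava.net.preferIPv6Addresses=true -Xmx1024m -Xms1024m"
print(strip_ipv6_preference(example))  # -> -Xmx1024m -Xms1024m
```

The resulting string is what would be supplied as the esJavaOpts override for the elasticsearch pods in an IPv4 configuration.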

Changed in starlingx:
assignee: nobody → Kevin Smith (kevin.smith.wrs)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/709143

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority - issue introduced in recent weeks

tags: added: stx.4.0 stx.monitor
Changed in starlingx:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/709143
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8e2e5f7e82efde39407d34c1a26daffb97dbe26d
Submitter: Zuul
Branch: master

commit 8e2e5f7e82efde39407d34c1a26daffb97dbe26d
Author: Kevin Smith <email address hidden>
Date: Fri Feb 21 07:56:04 2020 -0500

    Set elasticsearch pod java options according to ip config

    The "-Djava.net.preferIPv6Addresses=true" java option was set
    for both ipv4 and ipv6 configurations which worked fine in both
    configs. At some point recently in ipv4 configurations, the
    stx-monitor application stopped applying successfully due to
    elasticsearch cluster discovery failure. Why the ipv4 failures
    are only recently occurring is unknown, but removal of this
    unnecessary java option for ipv4 eliminates the failures.

    This update will set the above java option for elasticsearch
    pods only if the cluster service network is ipv6.

    Closes-Bug: 1864193

    Change-Id: I2952f1c799b121d0812314156162af7696ebd6b0
    Signed-off-by: Kevin Smith <email address hidden>
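
The behaviour described in the commit message above, setting the option only when the cluster service network is IPv6, might look roughly like the sketch below. It is not the code from the merged change: the function name, its parameters, and the sample subnets are assumptions made for illustration, and only the Python standard library is used.

```python
import ipaddress
from typing import Sequence

IPV6_FLAG = "-Djava.net.preferIPv6Addresses=true"


def build_es_java_opts(cluster_service_subnet: str,
                       base_opts: Sequence[str]) -> str:
    """Build a Java options string for the elasticsearch pods.

    The IPv6 preference flag is added only when the given subnet
    (hypothetical input, e.g. the cluster service network) is IPv6.
    """
    network = ipaddress.ip_network(cluster_service_subnet, strict=False)
    opts = list(base_opts)
    if network.version == 6:
        opts.insert(0, IPV6_FLAG)
    return " ".join(opts)


# Sample subnets and heap option are assumptions for the example.
print(build_es_java_opts("10.96.0.0/12", ["-Xmx1024m"]))
print(build_es_java_opts("fd04::/112", ["-Xmx1024m"]))
```

With an IPv4 subnet the flag is omitted, which matches the manual workaround in the bug description; with an IPv6 subnet the flag is kept.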

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716137

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/716137
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=cb4cf4299c2ec10fb2eb03cdee3f6d78a6413089
Submitter: Zuul
Branch: f/centos8

commit 16477935845e1c27b4c9d31743e359b0aa94a948
Author: Steven Webster <email address hidden>
Date: Sat Mar 28 17:19:30 2020 -0400

    Fix SR-IOV runtime manifest apply

    When an SR-IOV interface is configured, the platform's
    network runtime manifest is applied in order to apply the virtual
    function (VF) config and restart the interface. This results in
    sysinv being able to determine and populate the puppet hieradata
    with the virtual function PCI addresses.

    A side effect of the network manifest apply is that potentially
    all platform interfaces may be brought down/up if it is determined
    that their configuration has changed. This will likely be the case
    for a system which configures SR-IOV interfaces before initial
    unlock.

    A few issues have been encountered because of this, with some
    services not behaving well when the interface they are communicating
    over suddenly goes down.

    This commit makes the SR-IOV VF configuration much more targeted
    so that only the operation of setting the desired number of VFs
    is performed.

    Closes-Bug: #1868584
    Depends-On: https://review.opendev.org/715669
    Change-Id: Ie162380d3732eb1b6e9c553362fe68cbc313ae2b
    Signed-off-by: Steven Webster <email address hidden>

commit 45c9fe2d3571574b9e0503af108fe7c1567007db
Author: Zhipeng Liu <email address hidden>
Date: Thu Mar 26 01:58:34 2020 +0800

    Add ipv6 support for novncproxy_base_url.

    For ipv6 address, we need url with below format
    [ip]:port

    Partial-Bug: 1859641

    Change-Id: I01a5cd92deb9e88c2d31bd1e16e5bce1e849fcc7
    Signed-off-by: Zhipeng Liu <email address hidden>

commit d119336b3a3b24d924e000277a37ab0b5f93aae1
Author: Andy Ning <email address hidden>
Date: Mon Mar 23 16:26:21 2020 -0400

    Fix timeout waiting for CA cert install during ansible replay

    During ansible bootstrap replay, the ssl_ca_complete_flag file is
    removed. It expects puppet platform::config::runtime manifest apply
    during system CA certificate install to re-generate it. So this commit
    updated conductor manager to run that puppet manifest even if the CA cert
    has already installed so that the ssl_ca_complete_flag file is created
    and makes ansible replay to continue.

    Change-Id: Ic9051fba9afe5d5a189e2be8c8c2960bdb0d20a4
    Closes-Bug: 1868585
    Signed-off-by: Andy Ning <email address hidden>

commit 24a533d800b2c57b84f1086593fe5f04f95fe906
Author: Zhipeng Liu <email address hidden>
Date: Fri Mar 20 23:10:31 2020 +0800

    Fix rabbitmq could not bind port to ipv6 address issue

    When we use Armada to deploy openstack service for ipv6, rabbitmq
    pod could not start listen on [::]:5672 and [::]:15672.
    For ipv6, we need an override for configuration file.

    Upstream patch link is:
    https://review.opendev.org/#/c/714027/

    Test pass for deploying rabbitmq service on both ipv...

tags: added: in-f-centos8