ussuri load only installs on simplex

Bug #1886003 reported by ruediger stock
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: zhipeng liu

Bug Description

Brief Description
-----------------
The load from
https://mirror.starlingx.cengn.ca/mirror/starlingx/ussuri/centos/monolithic/20200625T130609Z/outputs/iso/
only installs successfully on simplex. It fails for duplex and multinode setups.

This is also an issue with the master builds as Ussuri was merged on 2020-06-28:
http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/monolithic/20200701T054008Z/outputs/iso/

Severity
--------
Major

Steps to Reproduce
------------------
Install the load on a duplex or multi-node setup. Provisioning fails: it runs into a timeout while installing mariadb.

Expected Behavior
------------------
Provisioning finishes successfully.

Actual Behavior
----------------
Provisioning fails.

Reproducibility
---------------
Reproducible for all setup types except Simplex

System Configuration
--------------------
virtual multi node

Branch/Pull Time/Commit
-----------------------
https://mirror.starlingx.cengn.ca/mirror/starlingx/ussuri/centos/monolithic/20200625T130609Z

Last Pass
---------
This was the first "officially" available build containing Ussuri. The last passing test used a developer load from 05/29.

Timestamp/Logs
--------------

ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/stx-openstack/1.0-42-centos-stable-versioned/stx-openstack-stx-openstack.yaml with exit code 1. See /var/log/armada/stx-openstack-apply_2020-06-30-12-10-08.log for details.

From /var/log/armada/stx*log:
2020-06-30 12:10:11.307 726 INFO armada.handlers.armada [-] Processing ChartGroup: openstack-mariadb (Mariadb), sequenced=True
2020-06-30 12:10:11.307 726 INFO armada.handlers.chart_deploy [-] [chart=openstack-mariadb]: Processing Chart, release=osh-openstack-mariadb
2020-06-30 12:10:11.307 726 DEBUG armada.handlers.wait [-] [chart=openstack-mariadb]: Resolved `wait.resources` list: [{'type': 'job', 'required': False, 'labels': {'release_group': 'osh-openstack-mariadb'}}, {'type': 'pod', 'labels': {'release_group': 'osh-openstack-mariadb'}}] __init__ /usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py:89
2020-06-30 12:10:11.308 726 INFO armada.handlers.chartbuilder [-] [chart=openstack-mariadb]: Building dependency chart helm-toolkit for release openstack-mariadb.
2020-06-30 12:10:11.328 726 INFO armada.handlers.chart_deploy [-] [chart=openstack-mariadb]: Purging release osh-openstack-mariadb with status FAILED
2020-06-30 12:10:11.328 726 INFO armada.handlers.tiller [-] [chart=openstack-mariadb]: Delete osh-openstack-mariadb release with disable_hooks=False, purge=True, timeout=300 flags
2020-06-30 12:10:12.018 726 INFO armada.handlers.chart_deploy [-] [chart=openstack-mariadb]: Installing release osh-openstack-mariadb in namespace openstack, wait=True, timeout=1799s
2020-06-30 12:10:12.027 726 INFO armada.handlers.tiller [-] [chart=openstack-mariadb]: Helm install release: wait=True, timeout=1799

 2020-06-30 12:40:11.799 726 ERROR armada.handlers.tiller [-] [chart=openstack-mariadb]: Error while installing release osh-openstack-mariadb: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
   status = StatusCode.UNKNOWN
   details = "release osh-openstack-mariadb failed: timed out waiting for the condition"
   debug_error_string = "{"created":"@1593520811.799184138","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"release osh-openstack-mariadb failed: timed out waiting for the condition","grpc_status":2}"

Logfiles available at https://files.starlingx.kube.cengn.ca/launchpad/1886003
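
To check that the failure is the Helm wait timeout expiring rather than an earlier error, the gap between the "Helm install release" line and the tiller ERROR line can be compared with the logged timeout=1799. The following is a minimal sketch; the helper name elapsed_seconds is hypothetical, and the two timestamps are copied from the armada log excerpt above.

```python
from datetime import datetime

# Timestamp layout used at the start of each armada log line.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two armada log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Values taken from the log excerpt in this report.
install_started = "2020-06-30 12:10:12.027"  # "Helm install release: wait=True, timeout=1799"
install_failed = "2020-06-30 12:40:11.799"   # "Error while installing release osh-openstack-mariadb"

print(elapsed_seconds(install_started, install_failed))
```

The result is within one second of 1799, which is consistent with the release waiting for the full timeout before tiller reported "timed out waiting for the condition".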

Test Activity
-------------
Regression Testing

Workaround
----------
None

ruediger stock (rstock)
description: updated
Nicolae Jascanu (njascanu-intel) wrote :

We see the same error on the monolithic "20200701T054008Z" build.

2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller [-] [chart=openstack-mariadb]: Error while installing release osh-openstack-mariadb: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "release osh-openstack-mariadb failed: timed out waiting for the condition"
        debug_error_string = "{"created":"@1593614153.871875172","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"release osh-openstack-mariadb failed: timed out waiting for the condition","grpc_status":2}"
>
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller Traceback (most recent call last):
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 473, in install_release
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller metadata=self.metadata)
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 533, in __call__
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller return _end_unary_response_blocking(state, call, False, None)
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller raise _Rendezvous(state, None, None, deadline)
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller status = StatusCode.UNKNOWN
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller details = "release osh-openstack-mariadb failed: timed out waiting for the condition"
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller debug_error_string = "{"created":"@1593614153.871875172","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"release osh-openstack-mariadb failed: timed out waiting for the condition","grpc_status":2}"
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller >
2020-07-01 14:35:53.872 550 ERROR armada.handlers.tiller
2020-07-01 14:35:53.873 550 DEBUG armada.handlers.tiller [-] [chart=openstack-mariadb]: Helm getting release status for release=osh-openstack-mariadb, version=0 get_release_status /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:539
2020-07-01 14:35:54.029 550 DEBUG armada.handlers.tiller [-] [chart=openstack-mariadb]: GetReleaseStatus= name: "osh-openstack-mariadb"
info {
  status {
    code: FAILED
  }
  first_deployed {
    seconds: 1593612353
    nanos: 337681515
  }
  last_deployed {
    seconds: 1593612353
    nanos: 337681515
  }
  Description: "Release \"osh-openstack-mariadb\" failed: timed out waiting for the condition"
}
namespace: "openstack"
 get_release_status /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:547
2020-07-01 14:35:54.030 550 ERROR armada.handlers.armada [-] Chart ...


zhipeng liu (zhipengs) wrote :

According to the log, the second mariadb node could not join the cluster:
2020-07-02T11:48:41.232843694Z stderr F 2020-07-02 11:48:41,231 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:41 140031810905856 [Note] WSREP: Service thread queue flushed.'
2020-07-02T11:48:41.232850901Z stderr F 2020-07-02 11:48:41,231 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:41 140031772694272 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (85c6ecb7-bc59-11ea-bc4b-d7ca09aff26c): 1 (Operation not permitted)'
2020-07-02T11:48:41.233984948Z stderr F 2020-07-02 11:48:41,233 - OpenStack-Helm Mariadb - INFO - b'\t at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.'
2020-07-02T11:48:41.234002331Z stderr F 2020-07-02 11:48:41,233 - OpenStack-Helm Mariadb - INFO - b"2020-07-02 11:48:41 140031561815808 [Note] WSREP: Member 1.0 (mariadb-server-1.mariadb-discovery.openstack.svc.cluster.local) requested state transfer from '*any*'. Selected 0.0 (mariadb-server-0.mariadb-discovery.openstack.svc.cluster.local)(SYNCED) as donor."
2020-07-02T11:48:41.711004558Z stderr F 2020-07-02 11:48:41,710 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:41 140031561815808 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)'
2020-07-02T11:48:41.71172359Z stderr F 2020-07-02 11:48:41,710 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:41 140031772694272 [Note] WSREP: Requesting state transfer: success, donor: 0'
2020-07-02T11:48:41.711946836Z stderr F 2020-07-02 11:48:41,711 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:41 140031772694272 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 85c6ecb7-bc59-11ea-bc4b-d7ca09aff26c:0'
2020-07-02T11:48:41.71223561Z stderr F 2020-07-02 11:48:41,712 - OpenStack-Helm Mariadb - INFO - b'WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20200702 11:48:41.707)'
2020-07-02T11:48:41.727112835Z stderr F 2020-07-02 11:48:41,726 - OpenStack-Helm Mariadb - INFO - b'WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20200702 11:48:41.722)'
2020-07-02T11:48:41.731457828Z stderr F 2020-07-02 11:48:41,731 - OpenStack-Helm Mariadb - INFO - b'WSREP_SST: [INFO] Proceeding with SST (20200702 11:48:41.720)'
2020-07-02T11:48:41.731603336Z stderr F 2020-07-02 11:48:41,731 - OpenStack-Helm Mariadb - INFO - b'WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20200702 11:48:41.730)'
2020-07-02T11:48:41.767510019Z stderr F 2020-07-02 11:48:41,767 - OpenStack-Helm Mariadb - INFO - b'WSREP_SST: [INFO] Waiting for SST streaming to complete! (20200702 11:48:41.764)'
2020-07-02T11:48:49.549740892Z stderr F 2020-07-02 11:48:49,549 - OpenStack-Helm Mariadb - INFO - Updating grastate configmap
2020-07-02T11:48:51.75744306Z stderr F 2020-07-02 11:48:51,756 - OpenStack-Helm Mariadb - INFO - b'2020-07-02 11:48:51 140031561815808 [Warning] WSREP: 0.0 (mariadb-server-0.mariadb-discovery.openstack.svc.cluster.local): State transfer to 1.0 (mariadb-server-1.ma...
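
When scanning a long mariadb container log like the one above, it can help to filter out the routine [Note] lines and keep only WSREP warnings and errors. The following is a rough sketch, not part of any StarlingX tooling; the function name wsrep_events is hypothetical, and the sample lines are abbreviated copies of lines from this report.

```python
import re

# MariaDB marks each server log message with a level tag such as [Note] or [Warning].
WSREP_RE = re.compile(r"\[(?P<level>Note|Warning|ERROR)\] WSREP: (?P<msg>.*)")


def wsrep_events(lines, levels=("Warning", "ERROR")):
    """Return (level, message) pairs for WSREP lines at the requested levels."""
    events = []
    for line in lines:
        match = WSREP_RE.search(line)
        if match and match.group("level") in levels:
            # The chart logs each message as a Python bytes repr, so drop the closing quote.
            events.append((match.group("level"), match.group("msg").rstrip("'\"")))
    return events


# Abbreviated sample lines (container timestamp prefix removed).
sample = [
    "INFO - b'2020-07-02 11:48:41 140031810905856 [Note] WSREP: Service thread queue flushed.'",
    "INFO - b'2020-07-02 11:48:41 140031772694272 [Warning] WSREP: Failed to prepare for "
    "incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) "
    "does not match group state UUID (85c6ecb7-bc59-11ea-bc4b-d7ca09aff26c): 1 (Operation not permitted)'",
    "INFO - b'2020-07-02 11:48:41 140031561815808 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)'",
]

for level, message in wsrep_events(sample):
    print(level, "-", message)
```

On the sample, only the incremental state transfer warning is reported; the [Note] lines are skipped.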


Changed in starlingx:
assignee: nobody → zhipeng liu (zhipengs)
zhipeng liu (zhipengs) wrote :

I found that bind_address=:: in the mariadb config only works for IPv6.
The last passing Ussuri build used https://review.opendev.org/#/c/731461/10,
so I submitted a patch that reverts the change to the state of 731461/10:

https://review.opendev.org/#/c/739046/
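
The constraint described here, that bind_address=:: is only usable on IPv6 systems, can be illustrated by choosing the wildcard address from the IP family of the host address. This is only a sketch of the idea and not the code from the patch; the function name mariadb_bind_address and the example addresses are hypothetical, and using 0.0.0.0 as the IPv4 wildcard is an assumption.

```python
import ipaddress


def mariadb_bind_address(host_ip: str) -> str:
    """Pick a wildcard bind address that matches the IP family of host_ip."""
    version = ipaddress.ip_address(host_ip).version
    # "::" is applied only for IPv6; IPv4 falls back to the IPv4 wildcard
    # (assumed here to be 0.0.0.0).
    return "::" if version == 6 else "0.0.0.0"


# Hypothetical addresses, for illustration only.
print(mariadb_bind_address("192.168.206.2"))
print(mariadb_bind_address("fd00::2"))
```

Keeping the choice tied to the address family avoids applying the IPv6 wildcard on an IPv4 deployment, which is the situation this bug describes.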

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Changed in starlingx:
importance: Undecided → Critical
description: updated
zhipeng liu (zhipengs) wrote :

The test passed on duplex with the EB based on master-20200701T120139Z.
The mariadb issue is fixed and openstack applied successfully.

Zhipeng

zhipeng liu (zhipengs)
Changed in starlingx:
status: In Progress → Fix Committed
status: Fix Committed → Confirmed
Changed in starlingx:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/739046
Committed: https://git.openstack.org/cgit/starlingx/openstack-armada-app/commit/?id=2496a170fea73385ddfbfe8d7187311df2f2f236
Submitter: Zuul
Branch: master

commit 2496a170fea73385ddfbfe8d7187311df2f2f236
Author: Zhipeng Liu <email address hidden>
Date: Fri Jul 3 06:38:33 2020 +0800

    Fix the second mariadb node could not join cluster issue.

    This bug was introduced by below commit
    d3164c63dc24e0a74cf001a1366cb92dd6a7e396
    The update after PATCH SET 10 will cause the second mariadb could not
    join cluster. In this case, could not set bind_address=:: for ipv4. It
    only works for ipv6.

    As for conf.database.config_override, we can override it through
    system helm-override-update command, but could not use python
    plugin to dynamically override it as it will introduce a "-|" line
    in first line of config file.
    A user override for conf.database.config_override might break the IPv6
    system overrides, it need including ipv6 config for ipv6 case as well.

    Test pass on duplex setup. Openstack application applied successfully.

    Closes-Bug: 1886003

    Change-Id: I23c2fb6a7c8b5a38af1e046894d5fae247df2d6f
    Signed-off-by: Zhipeng Liu <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released