Applying stx-openstack on all configurations, stuck at 61% or 64% and provisioning terminate with timeout

Bug #1908117 reported by Alexandru Dimofte
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: chen haochuan

Bug Description

Brief Description
-----------------
During the Daily Sanity run, I observed the following failure on all configurations.
It happened sporadically in the past, but it now appears to affect 100% of runs on this image:
http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/flock/20201212T040027Z/
At the provision step, applying stx-openstack remains blocked at 61% on Simplex, Duplex and Standard, and at 64% on Standard External.
Because of this, the provision step exits with a timeout, for example:

 04:18:34 [2020-12-14T02:18:34.752Z] ==============================================================================
04:18:34 [2020-12-14T02:18:34.752Z] Provision :: Tests for provisioning and unlocking controllers, computes and
04:18:34 [2020-12-14T02:18:34.752Z] ==============================================================================
09:17:25 Cancelling nested steps due to timeout
09:17:25 [2020-12-14T07:17:25.168Z] Provisioning Standard Non-Storage System :: Validates provisioning... Sending interrupt signal to process
09:17:37 [2020-12-14T07:17:37.375Z] Terminated

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
For me, this always happens at the Provision step when I install StarlingX.

Expected Behavior
------------------
stx-openstack should apply normally.

Actual Behavior
----------------
Applying stx-openstack remains blocked at 61% or 64%.

Reproducibility
---------------

100% reproducible on the latest image (http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/flock/20201212T040027Z/).

System Configuration
--------------------
All configurations are affected.

Branch/Pull Time/Commit
-----------------------
Master Branch.

Last Pass
---------
The last pass was on 11-DEC-2020. I have tried to install several times since then.

Timestamp/Logs
--------------
S1_SIMPLEX:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| application              | version                        | manifest name                     | manifest file                          | status        | progress                                                           |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| cert-manager             | 1.0-6                          | cert-manager-manifest             | certmanager-manifest.yaml              | applied       | completed                                                          |
| nginx-ingress-controller | 1.0-0                          | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied       | completed                                                          |
| oidc-auth-apps           | 1.0-29                         | oidc-auth-manifest                | manifest.yaml                          | uploaded      | completed                                                          |
| platform-integ-apps      | 1.0-9                          | platform-integration-manifest     | manifest.yaml                          | applied       | completed                                                          |
| rook-ceph-apps           | app-version-placeholder        | manifest-placeholder              | tarfile-placeholder                    | upload-failed | None                                                               |
| stx-openstack            | 1.0-69-centos-stable-versioned | armada-manifest                   | stx-openstack.yaml                     | applying      | processing chart: osh-openstack-horizon, overall completion: 61.0% |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

S1_DUPLEX:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| application              | version                        | manifest name                     | manifest file                          | status        | progress                                                           |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| cert-manager             | 1.0-6                          | cert-manager-manifest             | certmanager-manifest.yaml              | applied       | completed                                                          |
| nginx-ingress-controller | 1.0-0                          | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied       | completed                                                          |
| oidc-auth-apps           | 1.0-29                         | oidc-auth-manifest                | manifest.yaml                          | uploaded      | completed                                                          |
| platform-integ-apps      | 1.0-9                          | platform-integration-manifest     | manifest.yaml                          | applied       | completed                                                          |
| rook-ceph-apps           | app-version-placeholder        | manifest-placeholder              | tarfile-placeholder                    | upload-failed | None                                                               |
| stx-openstack            | 1.0-69-centos-stable-versioned | armada-manifest                   | stx-openstack.yaml                     | applying      | processing chart: osh-openstack-horizon, overall completion: 61.0% |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

S1_STANDARD:
controller-0:~$ . /etc/platform/openrc
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+------------------------------------------------------------------------+
| application              | version                        | manifest name                     | manifest file                          | status        | progress                                                               |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+------------------------------------------------------------------------+
| cert-manager             | 1.0-6                          | cert-manager-manifest             | certmanager-manifest.yaml              | applied       | completed                                                              |
| nginx-ingress-controller | 1.0-0                          | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied       | completed                                                              |
| oidc-auth-apps           | 1.0-29                         | oidc-auth-manifest                | manifest.yaml                          | uploaded      | completed                                                              |
| platform-integ-apps      | 1.0-9                          | platform-integration-manifest     | manifest.yaml                          | applied       | completed                                                              |
| rook-ceph-apps           | app-version-placeholder        | manifest-placeholder              | tarfile-placeholder                    | upload-failed | None                                                                   |
| stx-openstack            | 1.0-69-centos-stable-versioned | armada-manifest                   | stx-openstack.yaml                     | applying      | processing chart: osh-openstack-fm-rest-api, overall completion: 61.0% |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+------------------------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

S1_STANDARD_EXT:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| application              | version                        | manifest name                     | manifest file                          | status        | progress                                                           |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
| cert-manager             | 1.0-6                          | cert-manager-manifest             | certmanager-manifest.yaml              | applied       | completed                                                          |
| nginx-ingress-controller | 1.0-0                          | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied       | completed                                                          |
| oidc-auth-apps           | 1.0-29                         | oidc-auth-manifest                | manifest.yaml                          | uploaded      | completed                                                          |
| platform-integ-apps      | 1.0-9                          | platform-integration-manifest     | manifest.yaml                          | applied       | completed                                                          |
| rook-ceph-apps           | app-version-placeholder        | manifest-placeholder              | tarfile-placeholder                    | upload-failed | None                                                               |
| stx-openstack            | 1.0-69-centos-stable-versioned | armada-manifest                   | stx-openstack.yaml                     | applying      | processing chart: osh-openstack-horizon, overall completion: 64.0% |
+--------------------------+--------------------------------+-----------------------------------+----------------------------------------+---------------+--------------------------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
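
For automated sanity runs, a stalled apply like the ones listed above could be flagged by parsing the text of the progress column. The following is only a minimal sketch: the helper names `parse_progress` and `is_stuck`, and the "same value on three consecutive polls" heuristic, are assumptions for illustration and are not part of any StarlingX tool.

```python
import re

# Matches the "progress" text shown in the application-list output above.
PROGRESS_RE = re.compile(
    r"processing chart: (?P<chart>[\w-]+), "
    r"overall completion: (?P<pct>\d+(?:\.\d+)?)%"
)


def parse_progress(progress):
    """Return (chart, percent) for an in-progress apply, or None otherwise."""
    match = PROGRESS_RE.search(progress)
    if match is None:
        return None
    return match.group("chart"), float(match.group("pct"))


def is_stuck(samples, min_samples=3):
    """Hypothetical heuristic: True if the last `min_samples` polls all
    report the same chart and the same completion percentage."""
    recent = [parse_progress(s) for s in samples[-min_samples:]]
    if len(recent) < min_samples or recent[0] is None:
        return False
    return all(item == recent[0] for item in recent)


sample = "processing chart: osh-openstack-horizon, overall completion: 61.0%"
print(parse_progress(sample))
print(is_stuck([sample, sample, sample]))
```

The polling interval and the number of identical samples that count as "stuck" would need tuning against the normal duration of each chart.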

Test Activity
-------------
Sanity

Workaround
----------
-

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

I also observed this issue:
2020-12-15 09:38:34.230 333 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 252, in sync
2020-12-15 09:38:34.230 333 ERROR armada.cli raise armada_exceptions.ChartDeployException(failures)
2020-12-15 09:38:34.230 333 ERROR armada.cli armada.exceptions.armada_exceptions.ChartDeployException: Exception deploying charts: ['openstack-horizon']
2020-12-15 09:38:34.230 333 ERROR armada.cli 
command terminated with exit code 1

See attached log.

Revision history for this message
Nicolae Jascanu (njascanu-intel) wrote :

On the image from "20201218T192132Z", we are seeing the same openstack-horizon error:

2020-12-22 14:03:00.231 457 ERROR armada.handlers.wait [-] [chart=openstack-horizon]: Timed out waiting for jobs (namespace=openstack, labels=(release_group=osh-openstack-horizon)). These jobs were not ready=['horizon-db-sync']
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada [-] Chart deploy [openstack-horizon] failed: armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for jobs (namespace=openstack, labels=(release_group=osh-openstack-horizon)). These jobs were not ready=['horizon-db-sync']
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada Traceback (most recent call last):
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 225, in handle_result
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada result = get_result()
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada return self.__get_result()
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada raise self._exception
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada result = self.fn(*self.args, **self.kwargs)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 214, in deploy_chart
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada chart, cg_test_all_charts, prefix, known_releases)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/chart_deploy.py", line 248, in execute
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada chart_wait.wait(timer)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 134, in wait
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada wait.wait(timeout=timeout)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 294, in wait
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada modified = self._wait(deadline)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 354, in _wait
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada raise k8s_exceptions.KubernetesWatchTimeoutException(error)
2020-12-22 14:03:00.232 457 ERROR armada.handlers.armada armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for jobs (namespace=openstack, labels=(release_group=osh-openstack-horizon)). These jobs were not ready=['horizon-db-sync']
2020-12-22 14:03:00.232 457 ERROR a...

Revision history for this message
chen haochuan (martin1982) wrote :

The pod fails to launch with the following error:

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl logs -n openstack horizon-db-sync-s7vtg -c horizon-db-sync
++ python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'
+ SITE_PACKAGES_ROOT=/var/lib/openstack/lib/python3.6/site-packages
+ rm -f /var/lib/openstack/lib/python3.6/site-packages/openstack_dashboard/local/local_settings.py
+ ln -s /etc/openstack-dashboard/local_settings /var/lib/openstack/lib/python3.6/site-packages/openstack_dashboard/local/local_settings.py
+ exec /tmp/manage.py migrate --noinput
/var/lib/openstack/lib64/python3.6/site-packages/scss/compiler.py:1430: DeprecationWarning: invalid escape sequence \:
  result = tb * (i + nesting) + "@media -sass-debug-info{filename{font-family:file\:\/\/%s}line{font-family:\\00003%s}}" % (filename, lineno) + nl
/var/lib/openstack/lib64/python3.6/site-packages/scss/cssdefs.py:516: DeprecationWarning: invalid escape sequence \s
  ''', re.VERBOSE)
/var/lib/openstack/lib64/python3.6/site-packages/scss/namespace.py:172: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()
  argspec = inspect.getargspec(function)
Traceback (most recent call last):
  File "/tmp/manage.py", line 19, in <module>
    execute_from_command_line(sys.argv)
  File "/var/lib/openstack/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/openstack/lib/python3.6/site-packages/django/core/management/__init__.py", line 357, in execute
    django.setup()
  File "/var/lib/openstack/lib/python3.6/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/var/lib/openstack/lib/python3.6/site-packages/django/apps/registry.py", line 114, in populate
    app_config.import_models()
  File "/var/lib/openstack/lib/python3.6/site-packages/django/apps/config.py", line 211, in import_models
    self.models_module = import_module(models_module_name)
  File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/lib/openstack/lib/python3.6/site-packages/django/contrib/auth/models.py", line 2, in <module>
    from django.contrib.auth.base_user import AbstractBaseUser, BaseUserManager
  File "/var/lib/openstack/lib/python3.6/site-packages/django/contrib/auth/base_user.py", line 47, in <module>
    class AbstractBaseUser(models.Model):
  File "/var/lib/openstack/lib/python3.6/site-packages/django/db/models/base.py", line 117, in __new__
    new_class.add_to_class('_meta', Options(meta, app_label))
  File "/var/lib/openstack/lib/python3.6/site-packages/django/...

Changed in starlingx:
assignee: nobody → chen haochuan (martin1982)
Revision history for this message
zhipeng liu (zhipengs) wrote :

This is the same issue as described at the link below:
https://stackoverflow.com/questions/55657752/django-installing-mysqlclient-error-mysqlclient-1-3-13-or-newer-is-required

We are using pymysql instead of mysqlclient, right?
However, 0.9.3 is already the latest version of pymysql.
It now reports the following error:
django.core.exceptions.ImproperlyConfigured: mysqlclient 1.3.13 or newer is required; you have 0.9.3
We could not upgrade pymysql, which is why we locked Django to version 2.1.5 during the Ussuri upgrade.
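
To illustrate the failure mode, here is a minimal, self-contained sketch of a version gate that produces the message above, together with one workaround commonly cited for this error: overriding the driver's reported `version_info` before Django inspects it. This is a simulation under stated assumptions, not the actual Django or pymysql code. `check_driver`, `gate_rejects` and the `fake_pymysql` stand-in are hypothetical names, the attribute values are illustrative, and the thread does not state which variant of the workaround was applied.

```python
from types import SimpleNamespace

MIN_VERSION = (1, 3, 13)  # minimum taken from the error message above


class ImproperlyConfigured(Exception):
    """Stand-in for django.core.exceptions.ImproperlyConfigured."""


def check_driver(database):
    """Simulated gate: reject a driver whose version_info is below the minimum."""
    if database.version_info < MIN_VERSION:
        raise ImproperlyConfigured(
            "mysqlclient 1.3.13 or newer is required; you have %s"
            % database.__version__
        )


def gate_rejects(database):
    """Return True if the simulated gate raises for this driver."""
    try:
        check_driver(database)
    except ImproperlyConfigured:
        return True
    return False


# Hypothetical stand-in for pymysql acting as the MySQLdb module.
# The attribute values are illustrative assumptions.
fake_pymysql = SimpleNamespace(
    version_info=(1, 3, 12, "final", 0),
    __version__="0.9.3",
)

rejected_before = gate_rejects(fake_pymysql)

# Workaround sketch: report a compatible version before the check runs.
fake_pymysql.version_info = (1, 3, 13, "final", 0)
rejected_after = gate_rejects(fake_pymysql)

print(rejected_before, rejected_after)
```

With a real pymysql installation, the equivalent override would be applied to the imported `pymysql` module before calling `pymysql.install_as_MySQLdb()`. Note that this only bypasses the version check and does not add any functionality the newer driver would provide.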

Martin,
You might check Don's proposal at the link below, or find a better solution.
https://bugs.launchpad.net/starlingx/+bug/1907290/comments/3

Thanks!
Zhipeng

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Changed in starlingx:
importance: Undecided → Critical
status: New → Triaged
tags: added: stx.5.0
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Critical since this has been causing red sanities since Dec 14.

Revision history for this message
chen haochuan (martin1982) wrote :
Revision history for this message
Alexandru Dimofte (adimofte) wrote :

Martin, your Gerrit change was merged. New Docker images were generated on 11 January, and the first master flock ISO image built afterwards, 20210112T000255Z, is GREEN again. I think this bug can be closed now. Thank you!

Revision history for this message
Austin Sun (sunausti) wrote :

Thanks Martin and Alexandru.

https://review.opendev.org/c/starlingx/upstream/+/769662

Changed to Fix Released.

Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to upstream (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/upstream/+/792215

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to upstream (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/upstream/+/792215
Committed: https://opendev.org/starlingx/upstream/commit/ab2f84da41b6865e6db05ce0e24bc1d4e0e379ae
Submitter: "Zuul (22348)"
Branch: f/centos8

commit a4046414b634e027f646be58502f3af2ea2329f0
Author: Andy Ning <email address hidden>
Date: Mon Apr 26 16:22:26 2021 -0400

    Enforce "cannot reuse the last 2 passwords" for ks users

    Currently the "unique_last_password_count" attribute in keystone
    configuration is set to "2", which enforces "cannot reuse the last
    1 passwords" in history instead of "cannot reuse the last 2 passwords"
    stated in security document.

    This update changed "unique_last_password_count" attribute to "3" so
    that keystone users password change rule complies with the document.

    Closes-Bug: 1924772
    Change-Id: I6a2de54336c7253022d49ecb118a315a7825c889
    Signed-off-by: Andy Ning <email address hidden>

commit 341eb6980c3a290f3633616bef0f32152a51b41f
Author: Daniel Pereira <email address hidden>
Date: Wed Feb 10 15:44:28 2021 -0300

    Update cinder directives build file

    Currently, cinder docker image doesn't contain nfs mount helpers
    installed, so trying to mount a NFS volume on a cinder-backup
    container fails.
    In order to enable support for NFS backend on cinder-backup, we
    need to install the nfs-utils package on cinder image, so that
    cinder-backup is able to mount NFS volumes.

    Task: 41796
    Story: 2008613
    Change-Id: Ib8e4675069292dc43f98ff55c25626a19ed37b12
    Signed-off-by: Daniel Pereira <email address hidden>

commit d7573c28f9257280239b37985f142cfd416e443c
Author: Chen, Haochuan Z <email address hidden>
Date: Thu Jan 7 13:33:22 2021 +0800

    WA to fix mysqlclient version conflict with Django

    https://stackoverflow.com/questions/55657752/django-installing-mysqlclient-error-mysqlclient-1-3-13-or-newer-is-required
    Fix with guide from stackoverflow. After openstack image
    upgrade to ussuri, Django upgrade to 2.2, which request
    mysqlclient newer than 1.3.13, conflict with version
    0.9.3 in current image. Fix with WA in above link, and
    currently we use pymysql not mysql.

    Closes-Bug: 1908117

    Change-Id: Ic7054c6736993394d92bb0aec25397fd22f84d31
    Signed-off-by: Chen, Haochuan Z <email address hidden>

commit 4a545ec5844cc24a942b5eafd90dfa69ff68a921
Author: Don Penney <email address hidden>
Date: Thu Dec 17 13:21:18 2020 -0500

    Add auto-version for remaining stx/upstream packages

    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.

    Story: 2008455
    Task: 41458
    Signed-off-by: Don Penney <email address hidden>
    Change-Id: Iaf71fdb3f9c79573ef64f6c82b1a2120d224d959

commit e96d8b71778413710cd369cc32c1f2a9ee95e986
Author: Zhipeng Liu <email address hidden>
Date: Fri Jul 10 18:37:25 2020 +0800

    Fix gnocchi-api could not start up issue

    After using python3 to build image, need change related
 ...

tags: added: in-f-centos8