stx-openstack failed to apply. Error while installing release osh-openstack-pci-irq-affinity-agent

Bug #1959558 reported by Alexandru Dimofte
Affects:     StarlingX
Status:      Fix Released
Importance:  High
Assigned to: Unassigned

Bug Description

Brief Description
-----------------
stx-openstack failed to apply. Error while installing release osh-openstack-pci-irq-affinity-agent
750.002 | Application Apply Failure | k8s_application=stx-openstack | major | 2022-01-29T12:22:11.913811
All configurations are affected.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
Install the latest master image (20220129T040220Z). The installation fails during the provisioning step; stx-openstack does not apply.

Expected Behavior
------------------
stx-openstack should apply successfully and provisioning should complete without errors.

Actual Behavior
----------------
The apply fails at the provisioning step, and the following alarm is raised:
750.002 | Application Apply Failure | k8s_application=stx-openstack | major | 2022-01-29T12:22:11.913811

and extracted from the logs:
cat /var/log/armada/stx-openstack-apply_2022-01-29-11-52-07.log
..
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller [-] [chart=openstack-pci-irq-affinity-agent]: Error while installing release osh-openstack-pci-irq-affinity-agent: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "release osh-openstack-pci-irq-affinity-agent failed: timed out waiting for the condition"
        debug_error_string = "{"created":"@1643458931.480579891","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"release osh-openstack-pci-irq-affinity-agent failed: timed out waiting for the condition","grpc_status":2}"
>
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller Traceback (most recent call last):
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 466, in install_release
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller metadata=self.metadata)
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 946, in __call__
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller return _end_unary_response_blocking(state, call, False, None)
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller raise _InactiveRpcError(state)
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller status = StatusCode.UNKNOWN
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller details = "release osh-openstack-pci-irq-affinity-agent failed: timed out waiting for the condition"
2022-01-29 12:22:11.480 682 ERROR armada.handlers.tiller debug_error_string = "{"created":"@1643458931.480579891","description":"Error received from peer ipv4:127.0.0.1:24134","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"release osh-openstack-pci-irq-affinity-agent failed: timed out waiting for the condition","grpc_status":2}"

..
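For reference, a minimal diagnostic sketch (not part of the original report) that lists the pods the Helm release could have been waiting on when it "timed out waiting for the condition". It assumes the kubernetes Python client is installed and a valid kubeconfig is available on the active controller, and that the stx-openstack charts are deployed in the "openstack" namespace.

    # List pods in the "openstack" namespace that are not Ready, to see what
    # the osh-openstack-pci-irq-affinity-agent release was waiting for.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for pod in v1.list_namespaced_pod("openstack").items:
        statuses = pod.status.container_statuses or []
        if not statuses or not all(cs.ready for cs in statuses):
            print(pod.metadata.name, pod.status.phase)

In this case the output would point at the pci-irq-affinity-agent pod on compute-0, whose probe failures are shown in the comments below.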

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
One node system, Two node system, Multi-node system, Dedicated storage

Branch/Pull Time/Commit
-----------------------
master branch: 20220129T040220Z

Last Pass
---------
20220128T035225Z

Timestamp/Logs
--------------
will be attached.

Test Activity
-------------
Sanity

Workaround
----------
-

Revision history for this message
Thiago Ribeiro Carvalho (trcarvalh) wrote :

Hi Alexandru - This issue seems to be a duplicate of https://bugs.launchpad.net/starlingx/+bug/1959480 (a fix has already been submitted for review). Can you please double-check if possible?

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

It is true, I can see that issue too, Thiago:
  Normal   Pulled     60m (x447 over 2d)     kubelet, compute-0  Container image "registry.local:9001/docker.io/starlingx/stx-pci-irq-affinity-agent:master-centos-stable-20220128T165659Z.0" already present on machine
  Warning  Unhealthy  36m (x450 over 2d)     kubelet, compute-0  Readiness probe failed: Traceback (most recent call last):
      File "/tmp/health-probe.py", line 29, in <module>
        from pci_irq_affinity.nova_provider import novaClient
      ImportError: cannot import name novaClient
  Warning  Unhealthy  6m28s (x1364 over 2d)  kubelet, compute-0  Liveness probe failed: Traceback (most recent call last):
      File "/tmp/health-probe.py", line 29, in <module>
        from pci_irq_affinity.nova_provider import novaClient
      ImportError: cannot import name novaClient
  Warning  BackOff    82s (x5447 over 47h)   kubelet, compute-0  Back-off restarting failed container

This is probably the reason why stx-openstack fails to apply. Once the fix is merged and integrated, I will re-test and confirm whether it fixes this bug too (I think you are right; we'll see in the next few days). Thanks!
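For illustration only (this is not the merged fix from the duplicate LP): the probe events above show /tmp/health-probe.py importing a symbol that the agent package no longer exports. A defensive import along these lines would at least surface a clearer probe failure; the fallback name get_nova_client is hypothetical and only stands in for whatever the package actually provides.

    # Illustrative sketch only; the fallback symbol name is hypothetical.
    try:
        # Import used by the shipped health probe (fails in this bug).
        from pci_irq_affinity.nova_provider import novaClient
    except ImportError:
        try:
            # Hypothetical alternative export name; adjust to the real one.
            from pci_irq_affinity.nova_provider import get_nova_client as novaClient
        except ImportError as exc:
            # Fail the probe with an explicit message instead of a bare traceback.
            raise SystemExit("health-probe: nova client unavailable: %s" % exc)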

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Fixing the status since this LP was resolved via the duplicate.

Changed in starlingx:
status: New → Fix Released
importance: Undecided → High
tags: added: stx.7.0 stx.distro.openstack