cert-mgr resource creation fails after swact on duplex ipv6 system

Bug #1881967 reported by ayyappa
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Sabeel Ansari
Milestone: (none listed)

Bug Description

Brief Description
-----------------
cert-manager resource creation fails after a swact on the wcp_78_79 IPv6 duplex system.

Severity
--------
Major

Steps to Reproduce
------------------
1) Create a stepca issuer
[sysadmin@controller-1 ~(keystone_admin)]$ kubectl get clusterissuers.cert-manager.io
NAME READY AGE
stepca-issuer True 78s

2) Create a certificate request with the following YAML file
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: cert-1
spec:
  secretName: cert-1
  issuerRef:
    name: stepca-issuer
    kind: ClusterIssuer
  organization:
  - Windriver
  dnsNames:
  - cgcs-wildcat-78-79.cumulus.wrs.com

3) The certificate is issued by stepca
4) Swact the active controller
5) Wait for the swact operation to succeed, then delete the stepca issuer
[sysadmin@controller-1 ~(keystone_admin)]$ kubectl delete -f stepca_issuer.yaml
clusterissuer.cert-manager.io "stepca-issuer" deleted

6) Creating the issuer again now fails with the following error
[sysadmin@controller-1 ~(keystone_admin)]$ kubectl create -f stepca_issuer.yaml
Error from server (InternalError): error when creating "stepca_issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cm-cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp [abcd:207::a15c]:443: connect: network is unreachable
[sysadmin@controller-1 ~(keystone_admin)]$

7) The same operation was tried on the r430_3_4 IPv6 system, where it works fine

System Configuration
--------------------
duplex system, wcp_78_79_ipv6 -- fails on this system
duplex system, wcp_r430_3_4_ipv6 -- passes on this system

Branch/Pull Time/Commit
-----------------------
2020-06-02_20-00-00

Last Pass
---------
Never on wcp_78_79

Timestamp/Logs
--------------
2020-06-03T19:08:18.175878694Z

Test Activity
-------------
Automation

Workaround
----------
Swact again, then repeat the same operation.
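
If the failure is a timing issue, as suggested later in this report, retrying the create until the webhook answers may be an alternative to a second swact. The following is a minimal sketch under that assumption, which this report does not confirm. The helper names `run_kubectl` and `create_with_retry`, and the attempt count and delay, are hypothetical; only the `kubectl create -f` command and the error substring come from the steps above.

```python
import subprocess
import time

# Substring taken from the error shown in step 6 of this report.
WEBHOOK_ERROR = 'failed calling webhook "webhook.cert-manager.io"'


def run_kubectl(args):
    """Run kubectl with the given arguments; return (exit_code, combined_output)."""
    proc = subprocess.run(
        ["kubectl"] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def create_with_retry(manifest, attempts=10, delay=15.0,
                      run=run_kubectl, sleep=time.sleep):
    """Retry `kubectl create -f manifest` while the webhook is unreachable.

    Returns the 1-based number of the attempt that succeeded. A failure
    that does not contain the webhook error is raised immediately, since
    retrying would not be expected to help in that case.
    """
    last_output = ""
    for attempt in range(1, attempts + 1):
        code, last_output = run(["create", "-f", manifest])
        if code == 0:
            return attempt
        if WEBHOOK_ERROR not in last_output:
            raise RuntimeError(last_output.strip())
        if attempt < attempts:
            sleep(delay)  # hypothetical delay; tune for the system under test
    raise TimeoutError(
        "webhook still unreachable after {} attempts: {}".format(
            attempts, last_output.strip()))
```

Calling `create_with_retry("stepca_issuer.yaml")` on the active controller would repeat step 6 until it succeeds or the attempts run out. The `run` and `sleep` parameters exist only so the logic can be exercised without a cluster.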

ayyappa (mantri425) wrote :
Ghada Khalil (gkhalil)
summary: - cm resource creation fails after swact on duplex ipv6 system
+ cert-mgr resource creation fails after swact on duplex ipv6 system
Changed in starlingx:
assignee: nobody → Sabeel Ansari (sansariwr)
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - issue appears to be seen on one system. Requires further investigation as it's not clear what's different about this system; perhaps a timing issue where the cert-mgr has not recovered after the swact operation (?)

description: updated
tags: added: stx.apps
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.4.0
Nimalini Rasa (nrasa) wrote :

Also seen in DC lab, apply failed after swacting with the following error:
get_results /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:215
2020-06-17 14:15:37.255 16 INFO armada.handlers.lock [-] Releasing lock
2020-06-17 14:15:37.259 16 ERROR armada.cli [-] Caught unexpected exception: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Name resolution failure"
        debug_error_string = "{"created":"@1592403336.274257011","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2721,"referenced_errors":[{"created":"@1592403336.274254598","description":"Name resolution failure","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3026,"grpc_status":14}]}"
>
2020-06-17 14:15:37.259 16 ERROR armada.cli Traceback (most recent call last):
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/__init__.py", line 38, in safe_invoke
2020-06-17 14:15:37.259 16 ERROR armada.cli self.invoke()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 213, in invoke
2020-06-17 14:15:37.259 16 ERROR armada.cli resp = self.handle(documents, tiller)
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/lock.py", line 81, in func_wrapper
2020-06-17 14:15:37.259 16 ERROR armada.cli return future.result()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2020-06-17 14:15:37.259 16 ERROR armada.cli return self.__get_result()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2020-06-17 14:15:37.259 16 ERROR armada.cli raise self._exception
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
2020-06-17 14:15:37.259 16 ERROR armada.cli result = self.fn(*self.args, **self.kwargs)
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 256, in handle
2020-06-17 14:15:37.259 16 ERROR armada.cli return armada.sync()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 189, in sync
2020-06-17 14:15:37.259 16 ERROR armada.cli known_releases = self.tiller.list_releases()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 252, in list_releases
2020-06-17 14:15:37.259 16 ERROR armada.cli releases = get_results()
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 220, in get_results
2020-06-17 14:15:37.259 16 ERROR armada.cli for message in response:
2020-06-17 14:15:37.259 16 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 364, in __next__
2020-06-17 14:15:37.259 16...


Nimalini Rasa (nrasa) wrote :

A reapply worked fine.

Ghada Khalil (gkhalil)
tags: added: stx.security
Ghada Khalil (gkhalil) wrote :

@Nimalini,
The issue you are reporting is not the same as the original issue in this LP. Unless you've seen the following webhook error, I believe this is a different issue:
Error from server (InternalError): error when creating "stepca_issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cm-cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp [abcd:207::a15c]:443: connect: network is unreachable

Please collect a new set of logs and open a new LP. There isn't enough information in your note to investigate the failure you saw any further.
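
For anyone triaging similar reports, a minimal sketch of how the webhook signature quoted above could be checked mechanically. The function name `parse_webhook_dial_error` and the returned field names are hypothetical; the pattern is written against the single error line quoted in this report and may not cover other kubectl or Go error formats.

```python
import ipaddress
import re

# Matches the webhook name and the "dial tcp HOST:PORT: reason" tail of the
# error line. A bracketed host is treated as an IPv6 literal.
_DIAL_RE = re.compile(
    r'failed calling webhook "(?P<webhook>[^"]+)":.*?'
    r'dial tcp (?:\[(?P<v6>[^\]]+)\]|(?P<host>[^:\s]+)):(?P<port>\d+): '
    r'(?P<reason>.*)'
)


def parse_webhook_dial_error(message):
    """Return details of a webhook dial failure, or None if no match."""
    match = _DIAL_RE.search(message)
    if match is None:
        return None
    host = match.group("v6") or match.group("host")
    try:
        ip_version = ipaddress.ip_address(host).version
    except ValueError:
        ip_version = None  # host is a name, not an IP literal
    return {
        "webhook": match.group("webhook"),
        "host": host,
        "port": int(match.group("port")),
        "ip_version": ip_version,
        "reason": match.group("reason").strip(),
    }
```

A log that yields `None` here, such as the armada "Name resolution failure" trace, does not carry the webhook signature and would point to a separate issue.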

Ghada Khalil (gkhalil) wrote :

The original test scenario is a very specific test case in which a cert-manager operation is triggered right after a swact, before cert-manager has fully recovered. This is a corner case, so it is being moved to stx.5.0.

tags: added: stx.5.0
removed: stx.4.0
Sabeel Ansari (sansariwr) wrote :

After the Calico binding issue fix went in, this bug seems to have been addressed.

Link to other LP: https://bugs.launchpad.net/starlingx/+bug/1885582

Ghada Khalil (gkhalil) wrote :

Marking as a duplicate / fix released based on the fix for https://bugs.launchpad.net/starlingx/+bug/1885582 which merged on 2020-07-03

Changed in starlingx:
status: Triaged → Fix Released