Comment 9 for bug 1833609

Lin Shuicheng (shuicheng) wrote :

Here is a detailed explanation of the issue.
During the first apply of the stx-openstack application, three jobs are executed for the osh-openstack-ceph-rgw chart. They appear in the job and pod lists below, and all of them are labelled with osh-openstack-ceph-rgw:
job:
NAME                COMPLETIONS   DURATION   AGE
ceph-ks-endpoints   1/1           31s        37m
ceph-ks-service     1/1           16s        37m
swift-ks-user       1/1           42s        37m
pod:
NAME                      READY   STATUS      RESTARTS   AGE
ceph-ks-endpoints-bcjts   0/3     Completed   0          36m
ceph-ks-service-wmsdc     0/1     Completed   0          36m
swift-ks-user-cdbc8       0/1     Completed   0          36m
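
For reference, here is a minimal sketch of how these resources can be listed by label with the kubernetes Python client. The label selector "release_group=osh-openstack-ceph-rgw" and the "openstack" namespace are assumptions for illustration only:

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
    batch = client.BatchV1Api()
    core = client.CoreV1Api()

    namespace = "openstack"                            # assumed namespace
    selector = "release_group=osh-openstack-ceph-rgw"  # assumed label key/value

    # List the chart's jobs and pods that carry the label.
    for job in batch.list_namespaced_job(namespace, label_selector=selector).items:
        print("job:", job.metadata.name, "succeeded:", job.status.succeeded)
    for pod in core.list_namespaced_pod(namespace, label_selector=selector).items:
        print("pod:", pod.metadata.name, "phase:", pod.status.phase)

After the controller-0 reboot described below, the jobs still report succeeded, but the pod query returns nothing.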

After swact and unlock, controller-0 is rebooted, and the pods that ran on controller-0 are destroyed.
When the application re-apply is triggered by the controller-0 unlock, armada waits on the chart's jobs and pods to confirm that the chart apply succeeded.
But the jobs are already completed, so they do not run again and no new pods are created, while the old pods were destroyed by the controller-0 reboot. As a result, armada gets stuck waiting for pods.
This is an armada issue: it assumes there will be at least one pod present for each chart apply. That holds in most cases, but not for a re-apply of a chart that contains only jobs.
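
To illustrate the assumption, here is a simplified sketch of that kind of pod wait. This is not armada's actual code, just a hypothetical loop that requires at least one matching pod before declaring success:

    import time
    from kubernetes import client, config

    def wait_for_pods(namespace, selector, timeout=1800):
        # Block until at least one pod matching the selector exists and is
        # Running or Succeeded. For a re-applied chart that contains only
        # already-completed jobs, no new pod is ever created and the old ones
        # were deleted by the reboot, so this loop can only time out.
        core = client.CoreV1Api()
        deadline = time.time() + timeout
        while time.time() < deadline:
            pods = core.list_namespaced_pod(namespace, label_selector=selector).items
            if pods and all(p.status.phase in ("Succeeded", "Running") for p in pods):
                return
            time.sleep(5)
        raise TimeoutError("no pods matched %r within %ss" % (selector, timeout))

    config.load_kube_config()
    wait_for_pods("openstack", "release_group=osh-openstack-ceph-rgw")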

It is hard to work around this in STX, so I will report an issue in the armada project.