Comment 2 for bug 1916620

Douglas Henrique Koerich (dkoerich-wr) wrote:

The issue is indeed caused by concurrency under high load between kubelet (launching the pods) and puppet (applying the worker manifest), as shown in the table below, obtained from tests on an AIO-SX lab with a small, generic pod:

Table 1: Elapsed time between relevant events vs. number of pods
+-----------------------------------+----------+----------+-----------+
| Event                             | 100 pods | 200 pods | 300 pods* |
+-----------------------------------+----------+----------+-----------+
| Finished with controller manifest | 0 sec    | 0 sec    | 0 sec     |
| Pods get launched                 | 26 sec   | 23 sec   | 27 sec    |
| Started with worker manifest      | 1 sec    | 3 sec    | <1 sec    |
| Triggered delete of sriovdp       | 23 sec   | 46 sec   | 62 sec    |
| sriovdp deleted                   | 18 sec   | 107 sec  | 245 sec   |
| Finished with worker manifest     | 1 sec    | 10 sec   | 1 sec     |
+-----------------------------------+----------+----------+-----------+
(*) System became unstable due to the heavy load from the pods
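
For reference, below is a minimal sketch of how the pod-launch part of such a test could be reproduced and timed, i.e. the "Pods get launched" row of Table 1. The pod name prefix, the busybox image, the pod count and the polling interval are illustrative assumptions, not taken from the actual test setup.

#!/usr/bin/env python3
# Sketch: launch N small generic pods and measure how long kubelet takes
# to get them all Running. Names, image and intervals are assumptions.
import subprocess
import time

POD_COUNT = 100          # vary: 100 / 200 / 300 as in Table 1
IMAGE = "busybox:1.36"   # small, generic container image (assumed)

start = time.monotonic()

# Create POD_COUNT pods that just sleep, to put load on kubelet.
for i in range(POD_COUNT):
    subprocess.run(
        ["kubectl", "run", f"load-test-{i}", f"--image={IMAGE}",
         "--restart=Never", "--command", "--", "sleep", "3600"],
        check=True)

# Poll until all pods report Running, then print the elapsed time.
while True:
    out = subprocess.run(
        ["kubectl", "get", "pods",
         "--field-selector=status.phase=Running", "--no-headers"],
        capture_output=True, text=True, check=True).stdout
    if len(out.splitlines()) >= POD_COUNT:
        break
    time.sleep(5)

print(f"All {POD_COUNT} pods Running after "
      f"{time.monotonic() - start:.0f} sec")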