Distributed Cloud: seeing QueuePool limit exception for dcorch audit
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | Medium | Yuxing | |
Bug Description
Brief Description
-----------------
While running a distributed cloud system with many subclouds, QueuePool limit exceptions are seen during the dcorch audit. When this happens, the audit cannot verify whether the affected subcloud resource is synced between the system controller and the subcloud until the next audit attempt.
Severity
--------
Minor
Steps to Reproduce
------------------
Set up a large distributed cloud system. Leave it running in steady state.
Expected Behavior
------------------
Observed behaviour of /var/log/
Actual Behavior
----------------
See QueuePool error
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
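The numbers in this message match the usual QueuePool behaviour: at most `size + overflow` connections (here 5 + 10 = 15) can be checked out at once, and a further request waits up to `timeout` seconds before failing. The following is a toy model of that behaviour using only the Python standard library. It is not SQLAlchemy code and not how dcorch is implemented; `ToyPool` and `count_failed_checkouts` are hypothetical names used for illustration only.

```python
import threading


class ToyPool:
    """Toy model of a bounded connection pool (not SQLAlchemy's implementation)."""

    def __init__(self, pool_size=5, max_overflow=10, timeout=30.0):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.timeout = timeout
        # At most pool_size + max_overflow connections may be checked out at once.
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "QueuePool limit of size %d overflow %d reached, "
                "connection timed out, timeout %g"
                % (self.pool_size, self.max_overflow, self.timeout)
            )

    def checkin(self):
        # Return a connection so another caller can use the slot.
        self._slots.release()


def count_failed_checkouts(requests, pool):
    """Check out `requests` connections without returning any; count timeouts."""
    failures = 0
    for _ in range(requests):
        try:
            pool.checkout()
        except TimeoutError:
            failures += 1
    return failures


# Short timeout so the demonstration finishes quickly.
pool = ToyPool(pool_size=5, max_overflow=10, timeout=0.05)
failed = count_failed_checkouts(20, pool)
print(failed)
```

With 20 requests against 15 slots and nothing checked back in, the requests beyond the fifteenth time out. In the real system the equivalent situation would arise when an audit holds more concurrent DB connections than the pool allows.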
Reproducibility
---------------
Intermittent: affects the 10-minute audit roughly 5% of the time. When it happens, multiple resources are affected during the same audit.
System Configuration
-------
Distributed Cloud - system controller
Branch/Pull Time/Commit
-------
2020-06-27_18-35-20
Last Pass
---------
none
Timestamp/Logs
--------------
2020-07-24 17:40:47.233 1576608 ERROR dcorch.
Test Activity
-------------
Distributed Cloud system testing
Workaround
----------
none
tags: | added: stx.distcloud |
Changed in starlingx: | |
assignee: | nobody → Yuxing (yuxing) |
We previously increased the DB connection pool limits for dcmanager, and it looks like we need to do the same for dcorch:
https://review.opendev.org/#/c/718490
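As a rough illustration of what raising the pool limits means, the sketch below parses a `[database]` section and reports the connection ceiling (pool size plus overflow). It assumes dcorch takes its pool limits from the oslo.db options `max_pool_size`, `max_overflow` and `pool_timeout`; the values shown are hypothetical and are not the ones used in the referenced review or in the eventual fix. `connection_ceiling` is an illustrative helper, not part of dcorch.

```python
import configparser

# Hypothetical values for illustration only; not taken from the actual fix.
EXAMPLE_CONF = """
[database]
max_pool_size = 20
max_overflow = 40
pool_timeout = 30
"""


def connection_ceiling(conf_text):
    """Return the most connections one process could hold: pool size plus overflow."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    db = parser["database"]
    # Fallbacks mirror the limits shown in the reported error (size 5, overflow 10).
    pool_size = db.getint("max_pool_size", fallback=5)
    overflow = db.getint("max_overflow", fallback=10)
    return pool_size + overflow


ceiling = connection_ceiling(EXAMPLE_CONF)
default_ceiling = connection_ceiling("[database]\n")
print(default_ceiling, ceiling)
```

Any real change would need to be sized against the number of subclouds audited concurrently and against the database server's own connection limit.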