Error with QueuePool limit of size

Bug #2067345 reported by Khoi
This bug affects 4 people

Affects: Magnum
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello.

I hit this problem after creating and deleting a cluster about 15 times in a row:

Ignore error [QueuePool limit of size 1 overflow 50 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)] when syncing up cluster status.: sqlalchemy.exc.TimeoutError: QueuePool limit of size 1 overflow 50 reached, connection timed out, timeout 30.00
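
For context, this error comes from SQLAlchemy's QueuePool: once pool_size + max_overflow connections are checked out and none are returned within pool_timeout seconds, the next checkout raises TimeoutError. A minimal standalone sketch (not Magnum code; the SQLite URL and short timeout are just for illustration):

    import sqlalchemy
    from sqlalchemy import create_engine

    # Mirror the numbers from the error: "QueuePool limit of size 1 overflow 50".
    engine = create_engine(
        "sqlite://",                          # illustrative DSN; Magnum talks to MariaDB
        poolclass=sqlalchemy.pool.QueuePool,
        pool_size=1,                          # "size 1"
        max_overflow=50,                      # "overflow 50"
        pool_timeout=3,                       # 30.00 in the real log
    )

    # Check out every available connection (pool_size + max_overflow = 51)
    # without ever returning one...
    held = [engine.connect() for _ in range(51)]

    # ...so the next checkout waits pool_timeout seconds, then raises
    # sqlalchemy.exc.TimeoutError, exactly like the log above.
    engine.connect()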

This is the full log:

https://paste.sh/gWg28T2s#zctJBJsW34bX9pG705VR5PMP

OpenStack: 2024.1
Ubuntu: 22.04
Deployment Tool: Kolla Ansible
Driver: Magnum-Cluster-API(0.17.0) from Vexxhost

Thank you very much.

Revision history for this message
Khoi (khoinh5) wrote :

Hello.
I believe this is a critical bug, because I created only 1 cluster and it still happened after 1 day without touching anything.

Revision history for this message
rasty94 (rasty94) wrote (last edit ):

I have the same problem with this version:

OpenStack: 2024.1
Rocky Linux 9
Deployment Tool: Kolla Ansible

We can get Magnum deployments to succeed by increasing the limits on the connection pool:
[database]
connection_recycle_time = 10
max_pool_size = 50
max_overflow = 100
max_retries = -1
pool_timeout = 180

But this is not a solution...
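
For reference, oslo.db hands most of these [database] options straight to SQLAlchemy's engine; a rough sketch of the mapping (the DSN is hypothetical, and oslo_db.sqlalchemy.engines has the authoritative code):

    from sqlalchemy import create_engine

    # Hypothetical connection URL, for illustration only.
    connection = "mysql+pymysql://magnum:secret@dbhost/magnum"

    engine = create_engine(
        connection,
        pool_recycle=10,     # connection_recycle_time
        pool_size=50,        # max_pool_size
        max_overflow=100,    # max_overflow
        pool_timeout=180,    # pool_timeout
    )
    # max_retries has no SQLAlchemy counterpart: oslo.db itself retries the
    # initial database connection, and -1 means retry forever.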

Revision history for this message
Khoi (khoinh5) wrote :

Thank you for sharing.

I wonder why it is fine with 2023.1; I didn't need to tune these parameters there.

Revision history for this message
rasty94 (rasty94) wrote :

I was reviewing what has changed from 2023.1 to 2024.1 in periodic.py, but I can't see anything that could cause this problem.

The only addition is:
            # Clean up trusts and certificates, if they still exist.
            os_client = clients.OpenStackClients(self.ctx)
            LOG.debug("Calling delete_trustee_and_trusts from periodic "
                      "DELETE_COMPLETE")
            trust_manager.delete_trustee_and_trust(os_client, self.ctx,
                                                   self.cluster)
            cert_manager.delete_certificates_from_cluster(self.cluster,
                                                          context=self.ctx)

Revision history for this message
Khoi (khoinh5) wrote :

Oh.

I hope there will be some advice on this. Even after tuning these parameters, I will just hit the QueuePool limit again after a period of time.

Revision history for this message
Khoi (khoinh5) wrote :

Which Magnum driver did you use?

Revision history for this message
Khoi (khoinh5) wrote :

I increased max_pool_size to 2 and it still happened again within two days.

Revision history for this message
Michel Jouvin (mijouvin) wrote :

Hi,

I am experiencing the same problem with Caracal Magnum (Kubernetes driver) and I have max_pool_size set to 5. I'm running AlmaLinux 9.4.

The problem happens on a test instance with a very limited number of clusters, whereas I have not seen it on the production Magnum instance running Antelope with ~20 active clusters.

Michel

Revision history for this message
Michel Jouvin (mijouvin) wrote :

In fact, setting max_pool_size to 0 (no limit) seems to work around the problem. Not sure what it means in the long run...

Michel
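
For anyone trying the same workaround: in SQLAlchemy's QueuePool a pool_size of 0 means no upper bound, so the equivalent [database] setting is simply:

    [database]
    max_pool_size = 0    # 0 = unlimited pool size in SQLAlchemy's QueuePool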

Revision history for this message
Khoi (khoinh5) wrote :

Hi.
I think it will make the system crash. I hope that we will find a good solution for it.

Revision history for this message
Michel Jouvin (mijouvin) wrote :

FYI, setting `max_pool_size` to 0 just delays the problem. At some point there is a very high number of open connections from Magnum to the MariaDB server and the errors appear again (maybe some resources are exhausted on the server side), and the only solution is to restart the magnum-conductor service. At the last occurrence of the problem, the number of open connections was ~385 on a test deployment with fewer than 10 clusters...

Would lowering conn_pool_ttl help to recycle connections?

Michel
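
As far as I know, conn_pool_ttl is an oslo.messaging option and governs the AMQP connection pool rather than the database one; the database-side recycling knob is connection_recycle_time, which oslo.db passes to SQLAlchemy as pool_recycle:

    [database]
    # Connections older than this many seconds are invalidated and replaced
    # the next time they are checked out.
    connection_recycle_time = 600

Note that pool_recycle only applies at checkout, so it cannot reclaim connections that are never checked back in, which is what the growing open-connection count above suggests is happening.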
