Sahara cluster can't be prepared due to the error "QueuePool limit of size 5 overflow 10 reached"

Bug #1497365 reported by Leontii Istomin
This bug affects 1 person
Affects             Status         Importance  Assigned to       Milestone
Mirantis OpenStack  Fix Released   High        Denis Egorenko
6.1.x               In Progress    High        Denis Egorenko
7.0.x               Fix Committed  High        Denis Meltsaykin
8.0.x               Fix Released   High        Denis Egorenko

Bug Description

A Sahara cluster (236 Hadoop nodes) couldn't be created. The following error occurred at the Preparing step:
2015-09-18 16:09:21.046 905 ERROR sahara.service.ops [-] Error during operating on cluster sahara-cluster-nnOwBBKKKzwFGH9lS4sj (reason: An error occurred in thread 'configure-instance-sahara-cluster-nnowbbkkkzwfgh9ls4sj-worker-ng-023': QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
Error ID: a7609278-7d53-4680-81c6-b3bf4c079a44)
http://paste.openstack.org/show/468150/

Cluster configuration:
Baremetal, Ubuntu, IBP, HA, Neutron-vxlan, Ceph-all, Nova-debug, Nova-quotas, Sahara, 7.0-296
Controllers: 3, Computes+Ceph: 47

api: '1.0'
astute_sha: 6c5b73f93e24cc781c809db9159927655ced5012
auth_required: true
build_id: '296'
build_number: '296'
feature_groups:
- mirantis
fuel-agent_sha: 082a47bf014002e515001be05f99040437281a2d
fuel-library_sha: f2eef7717b15c6c0a3e76ef98ad4c7c4532d56f9
fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
fuel-ostf_sha: 1f08e6e71021179b9881a824d9c999957fcc7045
fuelmain_sha: 6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85
nailgun_sha: 16a39d40120dd4257698795f12de4ae8200b1778
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 2864459e27b0510a0f7aedac6cdf27901ef5c481
release: '7.0'

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-09-18_16-18-22.tar.xz

Changed in mos:
assignee: nobody → MOS Sahara (mos-sahara)
milestone: none → 7.0-updates
importance: Undecided → High
description: updated
Changed in mos:
status: New → Confirmed
Leontii Istomin (listomin) wrote :

I've added the following lines to the sahara.conf file:
max_pool_size=60
max_retries=-1
max_overflow=120
After that the issue went away. Let's set them as default parameters for Sahara.
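
For reference, the numbers in the error message can be read as a capacity limit: a pool of size 5 with overflow 10 hands out at most 15 connections, and a caller that waits longer than the timeout (30 seconds in the log) gets the error. The sketch below is a toy model of that arithmetic using only the Python standard library. It is not SQLAlchemy's QueuePool implementation; `BoundedPool` and `count_failed_checkouts` are hypothetical names, and it assumes the worst case in which every worker holds its connection and never returns it.

```python
import threading


class BoundedPool:
    """Toy model of a connection pool: at most size + overflow checkouts."""

    def __init__(self, size, overflow, timeout):
        self.size = size
        self.overflow = overflow
        self.timeout = timeout
        # One slot per connection the pool is allowed to hand out.
        self._slots = threading.Semaphore(size + overflow)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "QueuePool limit of size %d overflow %d reached, "
                "connection timed out, timeout %s"
                % (self.size, self.overflow, self.timeout)
            )

    def checkin(self):
        # Return a slot to the pool.
        self._slots.release()


def count_failed_checkouts(pool, workers):
    """Check out `workers` connections without returning any; count timeouts."""
    failed = 0
    for _ in range(workers):
        try:
            pool.checkout()
        except TimeoutError:
            failed += 1
    return failed


if __name__ == "__main__":
    # A short timeout keeps the demo fast; the log above shows 30 seconds.
    defaults = BoundedPool(size=5, overflow=10, timeout=0.001)
    print(count_failed_checkouts(defaults, 236))  # 236 - 15 = 221

    raised = BoundedPool(size=60, overflow=120, timeout=0.001)
    print(count_failed_checkouts(raised, 236))    # 236 - 180 = 56
```

In this model the settings above raise the limit from 15 to 180 connections per process. A real service returns connections as transactions finish, so actual demand is lower than this worst case.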

affects: mos → fuel
Changed in fuel:
milestone: 7.0-updates → none
Dina Belova (dbelova)
tags: added: sahara
Changed in fuel:
assignee: MOS Sahara (mos-sahara) → MOS Puppet Team (mos-puppet)
Sergey Reshetnyak (sreshetniak) wrote :

Leontiy,

This fix should be discussed with the Sahara and oslo.db teams.

Roman Podoliaka (rpodolyaka) wrote :

Leontiy, that's not a good idea due to the DB API driver we use - MySQL-Python.

The problem is that MySQL-Python does not play well with eventlet's green-thread concurrency model: the context switch between green threads *does not* happen on queries to MySQL in our case, which means we hold 1 DB connection for the duration of a DB transaction and other threads *won't* get a chance to be executed during this time. This effectively means we can use at most 1 DB connection at a time and get no benefit from pooling DB connections.

I'd rather increase the number of Sahara API forks and either leave the number of connections in each fork's pool as is or decrease it.
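
The blocking behaviour described above can be illustrated with a small sketch. It uses the standard library's asyncio as a stand-in for eventlet (an assumption made for illustration only; neither eventlet nor MySQL-Python is involved), with `time.sleep` playing the role of a driver call that never yields and `asyncio.sleep` playing the role of one that does. The names `run_queries` and `peak_concurrency` are hypothetical.

```python
import asyncio
import time


async def run_queries(n_tasks, cooperative):
    """Run n_tasks fake queries and return the peak number in flight."""
    in_flight = 0
    peak = 0

    async def query():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        if cooperative:
            # Yields to the scheduler, so other tasks can start their query.
            await asyncio.sleep(0.01)
        else:
            # Blocks the whole event loop; no other task runs meanwhile.
            time.sleep(0.01)
        in_flight -= 1

    await asyncio.gather(*(query() for _ in range(n_tasks)))
    return peak


def peak_concurrency(n_tasks, cooperative):
    return asyncio.run(run_queries(n_tasks, cooperative))


if __name__ == "__main__":
    print(peak_concurrency(5, cooperative=False))  # blocking calls
    print(peak_concurrency(5, cooperative=True))   # yielding calls
```

With blocking calls only one query is ever in flight, so extra pooled connections would sit idle within a single process. With yielding calls all five overlap, which is the case where a larger pool can help.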

Roman Podoliaka (rpodolyaka) wrote :

Though, this might change in 8.0, if we decide to switch to pymysql (as upstream did): it's a pure Python DB API driver, which can be monkey-patched by eventlet and use connections concurrently.

affects: fuel → mos
Changed in mos:
assignee: MOS Puppet Team (mos-puppet) → MOS Sahara (mos-sahara)
Changed in mos:
milestone: none → 7.0-updates
Sergey Reshetnyak (sreshetniak) wrote :

We need to set up the DB parameters in Puppet.

Leontii Istomin (listomin) wrote :

Need to try to reproduce the following patch:
https://review.openstack.org/#/c/233672/

Denis Egorenko (degorenko) wrote :
Denis Egorenko (degorenko) wrote :
Vitaly Sedelnik (vsedelnik) wrote :

The fix for stable/7.0 has been merged - https://review.openstack.org/#/c/235979/

Timur Nurlygayanov (tnurlygayanov) wrote :

Hi,

It looks like we can reproduce and verify this issue only on scale environments, so we need to involve the QA Performance team in the verification.

Evgeny Sikachev (esikachev) wrote :
Vitaly Sedelnik (vsedelnik) wrote :

This seems to be another case of Jenkins missing events from Gerrit. Moved to 7.0-mu-2, as it's too late to merge it again.

Evgeny Sikachev (esikachev) wrote :

Reproduced on ISO 275.

Denis Egorenko (degorenko) wrote :

Waiting for scale testing of this bug. Moving to Fix Committed for now.

tags: added: area-sahara
removed: sahara
tags: added: 7.0-mu-2
Evgeny Sikachev (esikachev) wrote :

Verified on scale (ISO 529).
