While adding more than 25K policies - pycassa.pool.AllServersUnavailable

Bug #1354542 reported by Senthilnathan Murugappan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
Status tracked in Trunk
Trunk
Fix Committed
Low
Hampapur Ajay

Bug Description

During scale testing, observed the below errors while adding more than 25K policies distributed across 25 VNs.

INFO:pycassa.pool:Connection 319824848 (10.84.7.39:9160) in pool 53533136 failed: timed out
INFO:pycassa.pool:Connection 8916896 (10.84.7.39:9160) in pool 53533136 failed: timed out
INFO:pycassa.pool:Connection 8916896 (10.84.7.39:9160) in pool 53533136 failed: timed out
ERROR:vnc_cfg_api_server.vnc_cfg_api_server:Exception in REST api handler:
<class 'pycassa.pool.AllServersUnavailable'>
Python 2.7.3: /usr/bin/python
Fri Aug 8 09:41:40 2014

A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.

 /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_api_server.py in handler_trap_exception(*args=(), **kwargs={'id': 'dcf18aad-dcc0-4b62-8e88-18f08300c1f2'})
  399 def handler_trap_exception(*args, **kwargs):
  400 try:
  401 return handler(*args, **kwargs)
  402 except Exception as e:
  403 # don't log details of bottle.abort i.e handled error cases
handler = <bound method VncApiServer.project_http_get of <...i_server.vnc_cfg_api_server.VncApiServer object>>
args = ()
kwargs = {'id': 'dcf18aad-dcc0-4b62-8e88-18f08300c1f2'}

 /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/gen/vnc_api_server_gen.py in project_http_get(self=<vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>, id='dcf18aad-dcc0-4b62-8e88-18f08300c1f2')
15708 obj_fields = obj_fields + children
15709
15710 (ok, result) = db_conn.dbe_read('project', obj_ids, obj_fields)
15711 if not ok:
15712 # Not present in DB
ok = True
result = ''
db_conn = <vnc_cfg_api_server.vnc_cfg_ifmap.VncDbClient object>
db_conn.dbe_read = <bound method VncDbClient.dbe_read of <vnc_cfg_api_server.vnc_cfg_ifmap.VncDbClient object>>
obj_ids = {'uuid': 'dcf18aad-dcc0-4b62-8e88-18f08300c1f2'}
obj_fields = [u'quota', u'id_perms', u'display_name', u'namespace_refs', u'floating_ip_pool_refs']

 /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_ifmap.py in dbe_read(self=<vnc_cfg_api_server.vnc_cfg_ifmap.VncDbClient object>, obj_type='project', obj_ids={'uuid': 'dcf18aad-dcc0-4b62-8e88-18f08300c1f2'}, obj_fields=[u'quota', u'id_perms', u'display_name', u'namespace_refs', u'floating_ip_pool_refs'])
 1506 self._cassandra_db, "_cassandra_%s_read" % (method_name))
 1507 try:
 1508 (ok, cassandra_result) = method([obj_ids['uuid']], obj_fields)
 1509 except NoIdError as e:
 1510 return (False, str(e))
ok undefined
cassandra_result undefined
method = <bound method VncCassandraClient._cassandra_proj..._server.vnc_cfg_ifmap.VncCassandraClient object>>
obj_ids = {'uuid': 'dcf18aad-dcc0-4b62-8e88-18f08300c1f2'}
obj_fields = [u'quota', u'id_perms', u'display_name', u'namespace_refs', u'floating_ip_pool_refs']

 /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/gen/vnc_cassandra_client_gen.py in _cassandra_project_read(self=<vnc_cfg_api_server.vnc_cfg_ifmap.VncCassandraClient object>, obj_uuids=['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], field_names=[u'quota', u'id_perms', u'display_name', u'namespace_refs', u'floating_ip_pool_refs'])
10010 obj_rows = obj_uuid_cf.multiget(obj_uuids,
10011 column_count = 10000000,
10012 include_timestamp = True)
10013
10014 if (len(obj_uuids) == 1) and not obj_rows:
include_timestamp undefined
builtinTrue = True

 /usr/lib/python2.7/dist-packages/pycassa/columnfamily.py in multiget(self=<pycassa.columnfamily.ColumnFamily object>, keys=['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], columns=None, column_start='', column_finish='', column_reversed=False, column_count=10000000, include_timestamp=True, super_column=None, read_consistency_level=None, buffer_size=1024, include_ttl=False)
  770 while offset < len(packed_keys):
  771 new_keymap = self.pool.execute('multiget_slice', 772 packed_keys[offset:offset + buffer_size], cp, sp, consistency) 773 keymap.update(new_keymap) 774 offset += buffer_sizepacked_keys = ['dcf18aad-dcc0-4b62-8e88-18f08300c1f2']offset = 0
buffer_size = 1024
cp = ColumnParent(column_family='obj_uuid_table', super_column=None)
sp = SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish=''))consistency = 2
 /usr/lib/python2.7/dist-packages/pycassa/pool.py in execute(self=<pycassa.pool.ConnectionPool object>, f='multiget_slice', *args=(['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], ColumnParent(column_family='obj_uuid_table', super_column=None), SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish='')), 2), **kwargs={}) 575 try: 576 conn = self.get() 577 return getattr(conn, f)(*args, **kwargs)
  578 finally:
  579 if conn:
builtingetattr = <built-in function getattr>
conn = <pycassa.pool.ConnectionWrapper object>f = 'multiget_slice'args = (['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], ColumnParent(column_family='obj_uuid_table', super_column=None), SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish='')), 2)kwargs = {} /usr/lib/python2.7/dist-packages/pycassa/pool.py in new_f(self=<pycassa.pool.ConnectionWrapper object>, *args=(['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], ColumnParent(column_family='obj_uuid_table', super_column=None), SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish='')), 2), **kwargs={'reset': True}) 151 152 kwargs['reset'] = True 153 return new_f(self, *args, **kwargs) 154 155 new_f.__name__ = f.__name__new_f = <function multiget_slice>self = <pycassa.pool.ConnectionWrapper object>args = (['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], ColumnParent(column_family='obj_uuid_table', super_column=None), SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish='')), 2)kwargs = {'reset': True} /usr/lib/python2.7/dist-packages/pycassa/pool.py in new_f(self=<pycassa.pool.ConnectionWrapper object>, *args=(['dcf18aad-dcc0-4b62-8e88-18f08300c1f2'], ColumnParent(column_family='obj_uuid_table', super_column=None), SlicePredicate(column_names=None, slice_range=Sl...t=10000000, start='', reversed=False, finish='')), 2), **kwargs={}) 123 allow_retries = kwargs.pop('allow_retries', True) 124 if kwargs.pop('reset', False): 125 self._pool._replace_wrapper() # puts a new wrapper in the queue 126 self._replace(self._pool.get()) # swaps out transport
  127 result = f(self, *args, **kwargs)
self = <pycassa.pool.ConnectionWrapper object>self._pool = <pycassa.pool.ConnectionPool object>self._pool._replace_wrapper = <bound method ConnectionPool._replace_wrapper of <pycassa.pool.ConnectionPool object>>
 /usr/lib/python2.7/dist-packages/pycassa/pool.py in _replace_wrapper(self=<pycassa.pool.ConnectionPool object>) 456 """Try to replace the connection.""" 457 if not self._q.full(): 458 conn = self._create_connection() 459 conn._checkin() 460conn undefinedself = <pycassa.pool.ConnectionPool object>self._create_connection = <bound method ConnectionPool._create_connection of <pycassa.pool.ConnectionPool object>> /usr/lib/python2.7/dist-packages/pycassa/pool.py in _create_connection(self=<pycassa.pool.ConnectionPool object>) 429 raise AllServersUnavailable('An attempt was made to connect to each of the servers ' + 430 'twice, but none of the attempts succeeded. The last failure was %s: %s' % 431 (exc.__class__.__name__, exc)) 432 433 def fill(self):exc = timeout('timed out',)exc.__class__ = <class 'socket.timeout'>exc.__class__.__name__ = 'timeout'<class 'pycassa.pool.AllServersUnavailable'>: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out
    __class__ = <class 'pycassa.pool.AllServersUnavailable'>
    __delattr__ = <method-wrapper '__delattr__' of AllServersUnavailable object> __dict__ = {}
    __doc__ = 'Raised when none of the servers given to a pool can be connected to.'
    __format__ = <built-in method __format__ of AllServersUnavailable object> __getattribute__ = <method-wrapper '__getattribute__' of AllServersUnavailable object> __getitem__ = <method-wrapper '__getitem__' of AllServersUnavailable object> __getslice__ = <method-wrapper '__getslice__' of AllServersUnavailable object> __hash__ = <method-wrapper '__hash__' of AllServersUnavailable object> __init__ = <method-wrapper '__init__' of AllServersUnavailable object> __module__ = 'pycassa.pool'
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of AllServersUnavailable object>
    __reduce_ex__ = <built-in method __reduce_ex__ of AllServersUnavailable object>
    __repr__ = <method-wrapper '__repr__' of AllServersUnavailable object> __setattr__ = <method-wrapper '__setattr__' of AllServersUnavailable object>
    __setstate__ = <built-in method __setstate__ of AllServersUnavailable object>
    __sizeof__ = <built-in method __sizeof__ of AllServersUnavailable object>
    __str__ = <method-wrapper '__str__' of AllServersUnavailable object> __subclasshook__ = <built-in method __subclasshook__ of type object>
    __unicode__ = <built-in method __unicode__ of AllServersUnavailable object>
    __weakref__ = None
    args = ('An attempt was made to connect to each of the se...ucceeded. The last failure was timeout: timed out',)
    message = 'An attempt was made to connect to each of the se...ucceeded. The last failure was timeout: timed out'

The above is a description of an error in a Python program. Here is
the original traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 401, in handler_trap_exception
    return handler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/gen/vnc_api_server_gen.py", line 15710, in project_http_get
    (ok, result) = db_conn.dbe_read('project', obj_ids, obj_fields)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_ifmap.py", line 1508, in dbe_read
    (ok, cassandra_result) = method([obj_ids['uuid']], obj_fields)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/gen/vnc_cassandra_client_gen.py", line 10012, in _cassandra_project_read
    include_timestamp = True)
  File "/usr/lib/python2.7/dist-packages/pycassa/columnfamily.py", line 772, in multiget
    packed_keys[offset:offset + buffer_size], cp, sp, consistency)
  File "/usr/lib/python2.7/dist-packages/pycassa/pool.py", line 577, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pycassa/pool.py", line 153, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pycassa/pool.py", line 125, in new_f
    self._pool._replace_wrapper() # puts a new wrapper in the queue
  File "/usr/lib/python2.7/dist-packages/pycassa/pool.py", line 458, in _replace_wrapper
    conn = self._create_connection()
  File "/usr/lib/python2.7/dist-packages/pycassa/pool.py", line 431, in _create_connection
    (exc.__class__.__name__, exc))
AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bottle.py", line 764, in _handle
    return route.call(**args)
  File "/usr/lib/python2.7/dist-packages/bottle.py", line 1575, in wrapper
    rv = callback(*a, **ka)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 414, in handler_trap_exception
    raise e
AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out
127.0.0.1 - - [2014-08-08 09:41:40] "GET /project/dcf18aad-dcc0-4b62-8e88-18f08300c1f2?exclude_back_refs=True&exclude_children=True HTTP/1.1" 500 156 2.514230
80f-82dd-468b-8a9a-5081904490e2'}

Tags: config scale
tags: added: config
Revision history for this message
Senthilnathan Murugappan (msenthil) wrote :

Similar exceptions are observed while scaling different params like Virtual-Networks, Policy-rules

Changed in juniperopenstack:
assignee: nobody → Hampapur Ajay (hajay)
Revision history for this message
Senthilnathan Murugappan (msenthil) wrote :

Ajay had a look at the setup and tweaking pycassa connection timeout from 0.5 sec to like 5 secs resolved the issue.
Max time the cassandra threads were stopped is around 2 seconds.

discovery/disc_cassdb.py
vnc_cfg_api_server/vnc_cfg_ifmap.py

Also had to increase the zookeeper client timeout from 20 sec to 200 sec to fix the cfgm trashing issue during restart.

cfgm_common/zkclient.py

information type: Proprietary → Public
Revision history for this message
Sachin Bansal (sbansal) wrote :

Timeout has already been increased and zk timeout is configurable. This should be fixed now.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.