Always times out, all agents fail to join

Bug #1967184 reported by Ted Lum
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
shaker
New
Undecided
Unassigned

Bug Description

Always times out waiting for any agent to join, i.e.

2022-03-30 15:06:12.519 726899 ERROR shaker.engine.server Exception: Agents failed to join: {'shaker_mouzts_primary_4': 'lost', 'shaker_mouzts_minion_2': 'lost', 'shaker_mouzts_primary_2': 'lost', 'shaker_mouzts_minion_5': 'lost', 'shake
r_mouzts_primary_5': 'lost', 'shaker_mouzts_minion_6': 'lost', 'shaker_mouzts_minion_1': 'lost', 'shaker_mouzts_minion_3': 'lost', 'shaker_mouzts_primary_0': 'lost', 'shaker_mouzts_primary_1': 'lost', 'shaker_mouzts_primary_3': 'lost', '
shaker_mouzts_primary_7': 'lost', 'shaker_mouzts_minion_7': 'lost', 'shaker_mouzts_minion_0': 'lost', 'shaker_mouzts_primary_6': 'lost', 'shaker_mouzts_minion_4': 'lost'}

The shaker-agent never starts due to error in 'if' statement, and subsequently passing wrong variable type as argument to socket.setsockopt().

I would suggest the following patch, at the very least:

index 322fb60..b5a44c3 100644
--- a/shaker/agent/agent.py
+++ b/shaker/agent/agent.py
@@ -97,9 +97,9 @@ def sleep(seconds):
 def get_socket(context, endpoint):
     socket = context.socket(zmq.REQ)
     socket.setsockopt(zmq.LINGER, 0)
- if 'agent_socket_recv_timeout' in cfg.CONF:
+ if 'agent_socket_recv_timeout' in cfg.CONF and type(cfg.CONF.agent_socket_recv_timeout) == 'int':
         socket.setsockopt(zmq.RCVTIMEO, cfg.CONF.agent_socket_recv_timeout)
- if 'agent_socket_send_timeout' in cfg.CONF:
+ if 'agent_socket_send_timeout' in cfg.CONF and type(cfg.CONF.agent_socket_send_timeout) == 'int':
         socket.setsockopt(zmq.SNDTIMEO, cfg.CONF.agent_socket_send_timeout)
     socket.connect('tcp://%s' % endpoint)
     return socket

The "agent_socket_recv_timeout" and "agent_socket_send_timeout" keys are present, but are empty (none), so the present conditional allows bad types to get passed, which throws an exception. I don't know if there is an expectation that these keys be given a reasonable default elsewhere, so there may be more than one issue, but the existing conditional is insufficient.

This a excerpt of DEBUG output:

2022-03-30 22:01:17.625 1361 DEBUG shaker.engine.utils [-] ================================================================================ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2580
2022-03-30 22:01:17.625 1361 DEBUG shaker.engine.utils [-] agent_dir = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.625 1361 DEBUG shaker.engine.utils [-] agent_id = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.625 1361 DEBUG shaker.engine.utils [-] agent_socket_conn_retries = 10 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.625 1361 DEBUG shaker.engine.utils [-] agent_socket_recv_timeout = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] agent_socket_send_timeout = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] config_dir = [] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] config_file = [] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] config_source = [] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] debug = True log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.626 1361 DEBUG shaker.engine.utils [-] default_log_levels = ['amqp=WARN', 'amqplib=WARN', 'boto=WARN', 'qpid=WARN', 'sqlalchemy=WARN', 'suds=INFO', 'oslo.messaging=INFO', 'oslo_messaging=INFO', 'iso8601=WARN', 'requests.packages.urllib3.connectionpool=WARN', 'urllib3.connectionpool=WARN', 'websocket=WARN', 'requests.packages.urllib3.util.retry=WARN', 'urllib3.util.retry=WARN', 'keystonemiddleware=WARN', 'routes.middleware=WARN', 'stevedore=WARN', 'taskflow=WARN', 'oslo.cache=INFO', 'oslo_policy=INFO', 'dogpile.core.dogpile=INFO', 'pykwalify=INFO'] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] instance_format = [instance: %(uuid)s] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] instance_uuid_format = [instance: %(uuid)s] log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] log_config_append = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] log_date_format = %Y-%m-%d %H:%M:%S log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] log_dir = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.627 1361 DEBUG shaker.engine.utils [-] log_file = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] log_rotate_interval = 1 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] log_rotate_interval_type = days log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] log_rotation_type = none log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] logging_context_format_string = %(asctime)s.%(msecs)03d %(process)d %(levelname)s %(name)s [%(request_id)s %(user_identity)s] %(instance)s%(message)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] logging_debug_format_suffix = %(funcName)s %(pathname)s:%(lineno)d log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.628 1361 DEBUG shaker.engine.utils [-] logging_default_format_string = %(asctime)s.%(msecs)03d %(process)d %(levelname)s %(name)s [-] %(instance)s%(message)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.629 1361 DEBUG shaker.engine.utils [-] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.629 1361 DEBUG shaker.engine.utils [-] logging_user_identity_format = %(user)s %(tenant)s %(domain)s %(user_domain)s %(project_domain)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.629 1361 DEBUG shaker.engine.utils [-] max_logfile_count = 30 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.629 1361 DEBUG shaker.engine.utils [-] max_logfile_size_mb = 200 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] polling_interval = 10 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] publish_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] rate_limit_burst = 0 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] rate_limit_except_level = CRITICAL log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] rate_limit_interval = 0 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.630 1361 DEBUG shaker.engine.utils [-] server_endpoint = 192.168.4.45:55555 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] syslog_log_facility = LOG_USER log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] use_eventlog = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] use_journal = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] use_json = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] use_stderr = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] use_syslog = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.631 1361 DEBUG shaker.engine.utils [-] watch_log_file = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2589
2022-03-30 22:01:17.632 1361 DEBUG shaker.engine.utils [-] ******************************************************************************** log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2599

A workaround for this error would be to set --agent-socket-recv-timeout and --agent-socket-send-timeout on the command line.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.