I was setting up microservices-based Contrail. When I set SSL_ENABLE: True in contrail_configuration, I get the following errors from the collector and see Sandesh errors on all other services. I would like to know whether this is a bug or a misconfiguration in my setup. Thanks.
config/instances.yaml
--------------------------
global_configuration:
  CONTAINER_REGISTRY: opencontrailnightly
  REGISTRY_PRIVATE_INSECURE: True
provider_config:
  bms:
    ssh_pwd: admin
    ssh_user: root
    ntpserver: pool.ntp.org
    domainsuffix: local
instances:
  bms1:
    provider: bms
    ip: 10.240.133.114
    roles:
      config_database:
      config:
      control:
      analytics_database:
      analytics:
      webui:
  bms2:
    provider: bms
    ip: 10.240.133.113
    roles:
      vrouter:
openstack_nodes:
  ostack1:
    provider: bms
    ip: 10.240.133.112
contrail_configuration:
  CONTAINER_REGISTRY: opencontrailnightly
  CONTRAIL_VERSION: latest
  CLOUD_ORCHESTRATOR: openstack
  RABBITMQ_NODE_PORT: 5673
  VROUTER_GATEWAY: 10.240.133.254
  PHYSICAL_INTERFACE: ens3
  AUTH_MODE: keystone
  KEYSTONE_AUTH_HOST: 10.240.133.112
  KEYSTONE_AUTH_URL_VERSION: /v3
  KEYSTONE_AUTH_ADMIN_USER: admin
  KEYSTONE_AUTH_ADMIN_PASSWORD: admin
  imageManager_ip: 10.240.133.112
  ANALYTICSDB_NODES: 10.240.133.114
  CONTROLLER_NODES: 10.240.133.114
  WEBUI_NODES: 10.240.133.114
  ANALYTICS_NODES: 10.240.133.114
  CONTROL_NODES: 10.240.133.114
  CONFIGDB_NODES: 10.240.133.114
  SSL_ENABLE: True
(analytics-collector)[root@rh-3 /]$ tail -f /var/log/contrail/contrail-collector.log
2018-08-01 Wed 05:59:17:791.316 EDT rh-3 [Thread 140310955316992, Pid 1]: SANDESH: Send FAILED: 1533117557790996 [SYS_NOTICE]: SandeshModuleClientTrace: data= [ name = rh-3:Analytics:contrail-collector:0 client_info= [ status = ClientInit successful_connections = 1 pid = 1 http_port = 8089 start_time = 1533117557756874 collector_name = collector_ip = 127.0.0.1:8086 collector_list= [ [ (*_iter40) = 127.0.0.1:8086, ] ] rx_socket_stats= [ bytes = 0 calls = 0 average_bytes = 0 blocked_duration = 00:00:00 blocked_count = 0 average_blocked_duration = errors = 0 ] tx_socket_stats= [ bytes = 0 calls = 0 average_bytes = 0 blocked_duration = 00:00:00 blocked_count = 0 average_blocked_duration = errors = 0 ] ] msg_type_agg= [ [ _iter58->first = ConfigCassandraReadRetry [ messages_sent = 0 messages_sent_dropped_no_queue = 0 messages_sent_dropped_no_client = 0 messages_sent_dropped_no_session = 0 messages_sent_dropped_queue_level = 0 messages_sent_dropped_client_send_failed = 0 messages_sent_dropped_session_not_connected = 0 messages_sent_dropped_header_write_failed = 0 messages_sent_dropped_write_failed = 0 messages_sent_dropped_wrong_client_sm_state = 0 messages_sent_dropped_validation_failed = 0 messages_sent_dropped_rate_limited = 0 messages_sent_dropped_sending_disabled = 3 messages_sent_dropped_sending_to_syslog = 0 ], _iter58->first = NodeStatusUVE [ messages_sent = 0 messages_sent_dropped_no_queue = 0 messages_sent_dropped_no_client = 9 messages_sent_dropped_no_session = 0 messages_sent_dropped_queue_level = 0 messages_sent_dropped_client_send_failed = 0 messages_sent_dropped_session_not_connected = 0 messages_sent_dropped_header_write_failed = 0 messages_sent_dropped_write_failed = 0 messages_sent_dropped_wrong_client_sm_state = 5 messages_sent_dropped_validation_failed = 0 messages_sent_dropped_rate_limited = 0 messages_sent_dropped_sending_disabled = 0 messages_sent_dropped_sending_to_syslog = 0 ], _iter58->first = SandeshModuleClientTrace [ messages_sent = 0 
messages_sent_dropped_no_queue = 0 messages_sent_dropped_no_client = 1 messages_sent_dropped_no_session = 0 messages_sent_dropped_queue_level = 0 messages_sent_dropped_client_send_failed = 0 messages_sent_dropped_session_not_connected = 0 messages_sent_dropped_header_write_failed = 0 messages_sent_dropped_write_failed = 0 messages_sent_dropped_wrong_client_sm_state = 3 messages_sent_dropped_validation_failed = 0 messages_sent_dropped_rate_limited = 0 messages_sent_dropped_sending_disabled = 0 messages_sent_dropped_sending_to_syslog = 0 ], _iter58->first = TcpServerMessageLog [ messages_sent = 0 messages_sent_dropped_no_queue = 0 messages_sent_dropped_no_client = 0 messages_sent_dropped_no_session = 0 messages_sent_dropped_queue_level = 0 messages_sent_dropped_client_send_failed = 0 messages_sent_dropped_session_not_connected = 0 messages_sent_dropped_header_write_failed = 0 messages_sent_dropped_write_failed = 0 messages_sent_dropped_wrong_client_sm_state = 0 messages_sent_dropped_validation_failed = 0 messages_sent_dropped_rate_limited = 0 messages_sent_dropped_sending_disabled = 1 messages_sent_dropped_sending_to_syslog = 0 ], _iter58->first = TcpSessionMessageLog [ messages_sent = 0 messages_sent_dropped_no_queue = 0 messages_sent_dropped_no_client = 0 messages_sent_dropped_no_session = 0 messages_sent_dropped_queue_level = 0 messages_sent_dropped_client_send_failed = 0 messages_sent_dropped_session_not_connected = 0 messages_sent_dropped_header_write_failed = 0 messages_sent_dropped_write_failed = 0 messages_sent_dropped_wrong_client_sm_state = 0 messages_sent_dropped_validation_failed = 0 messages_sent_dropped_rate_limited = 0 messages_sent_dropped_sending_disabled = 1 messages_sent_dropped_sending_to_syslog = 0 ], ] ] tx_msg_agg= [ [ _iter59->first = dropped_client_send_failed _iter59->second = 0, _iter59->first = dropped_header_write_failed _iter59->second = 0, _iter59->first = dropped_no_client _iter59->second = 10, _iter59->first = dropped_no_queue 
_iter59->second = 0, _iter59->first = dropped_no_session _iter59->second = 0, _iter59->first = dropped_queue_level _iter59->second = 0, _iter59->first = dropped_rate_limited _iter59->second = 0, _iter59->first = dropped_sending_disabled _iter59->second = 5, _iter59->first = dropped_sending_to_syslog _iter59->second = 0, _iter59->first = dropped_session_not_connected _iter59->second = 0, _iter59->first = dropped_validation_failed _iter59->second = 0, _iter59->first = dropped_write_failed _iter59->second = 0, _iter59->first = dropped_wrong_client_sm_state _iter59->second = 8, _iter59->first = sent _iter59->second = 0, ] ] ]
2018-08-01 Wed 05:59:17:793.059 EDT rh-3 [Thread 140310976308992, Pid 1]: SANDESH: Send FAILED: 1533117557792896 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Control:contrail-dns:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557791871 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 05:59:17:793.143 EDT rh-3 [Thread 140310976308992, Pid 1]: SANDESH: Send FAILED: 1533117557792913 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Config:contrail-schema:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557791662 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 05:59:17:793.719 EDT rh-3 [Thread 140310972110592, Pid 1]: SANDESH: Send FAILED: 1533117557793245 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Database:contrail-database-nodemgr:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557791414 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 05:59:17:793.782 EDT rh-3 [Thread 140310972110592, Pid 1]: SANDESH: Send FAILED: 1533117557793295 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Config:contrail-device-manager:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557791618 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 05:59:17:794.025 EDT rh-3 [Thread 140310972110592, Pid 1]: SANDESH: Send FAILED: 1533117557793525 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Analytics:contrail-collector:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557792095 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 05:59:17:794.063 EDT rh-3 [Thread 140310972110592, Pid 1]: SANDESH: Send FAILED: 1533117557793784 [SYS_NOTICE]: SandeshModuleServerTrace: data= [ name = rh-3:Control:contrail-control:0 generator_info= [ [ [ hostname = rh-3 gen_attr= [ connects = 1 connect_time = 1533117557791190 resets = 0 reset_time = 0 in_clear = 0 ] ], ] ] ]
2018-08-01 Wed 06:00:04:949.729 EDT rh-3 [Thread 140310959515392, Pid 1]: ConfigClientManager [SYS_WARN]: ConfigClientFQNameCache: FQ Name Cache entry not found on UPDATE for: for object type: access_control_list with FQ name: default-domain:default-project:ip-fabric:ip-fabric and uuid: c87a04ef-c63a-4ae3-bcc0-a4f7666251b7 src/contrail-common/config-client-mgr/config_amqp_client.cc 316
2018-08-01 Wed 06:00:44:093.896 EDT rh-3 [Thread 140311215927296, Pid 1]: TCP [SYS_ERR]: TcpSessionMessageLog: Session 10.240.133.114:8089::10.240.133.114:33282 > SSL Handshake failed due to error: 336130315 category: asio.ssl message: wrong version number src/contrail-common/io/ssl_server.cc 92
2018-08-01 Wed 06:00:44:141.809 EDT rh-3 [Thread 140311215927296, Pid 1]: TCP [SYS_ERR]: TcpSessionMessageLog: Session 10.240.133.114:8089::10.240.133.114:33284 < Read failed due to error asio.ssl 335544539 : short read src/contrail-common/io/tcp_session.cc 487
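The "wrong version number" handshake error on port 8089 is commonly seen when one side of a connection speaks TLS and the other does not, although that has not been confirmed for this setup. One way to narrow it down is to check which listeners actually accept a TLS handshake. The following is a minimal Python sketch under that assumption; `probe_tls` is a hypothetical helper name and is not part of Contrail.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP endpoint as 'tls', 'no-tls', or 'unreachable'.

    Hypothetical diagnostic helper, not a Contrail tool.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or no route to the host.
        return "unreachable"

    # Certificate verification is disabled on purpose: the only question
    # here is whether the listener speaks TLS at all, not whether its
    # certificate is trusted.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with raw:
        try:
            with ctx.wrap_socket(raw) as tls:
                tls.version()  # handshake completed if we get here
                return "tls"
        except OSError:
            # ssl.SSLError is a subclass of OSError, so this also covers
            # handshake failures such as "wrong version number", as well
            # as resets and timeouts during the handshake.
            return "no-tls"
```

For example, calling `probe_tls("10.240.133.114", 8089)` and `probe_tls("10.240.133.114", 8086)` from the controller node would show whether the collector's introspect port and Sandesh port each accept TLS. The result only describes the listener; it does not say which client opened the failing session.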
(analytics-collector)[root@rh-3 /]$ tail -f /var/log/contrail/contrail-analytics-api.log
08/01/2018 05:58:58 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = rh-3:Analytics:contrail-analytics-api:0 client_info = << status = Idle successful_connections = 0 pid = 1 http_port = 8090 start_time = 1533117511843909 collector_name = collector_ip = 10.240.133.114:8086 collector_list = [ 10.240.133.114:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 6 >>
08/01/2018 05:59:07 AM [contrail-analytics-api] [ERROR]: Session Event: TCP Connect Fail
08/01/2018 05:59:07 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = rh-3 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Zookeeper name = OpServer server_addrs = [ 10.240.133.114:2182, ] status = Up description = >>, << type = Redis-UVE name = 10.240.133.114:6379 server_addrs = [ 10.240.133.114:6379, ] status = Initializing >>, << type = Collector name = server_addrs = [ 10.240.133.114:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Redis-UVE:10.240.133.114:6379[None], Collector connection down >>, ] >>
08/01/2018 05:59:07 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = rh-3 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Zookeeper name = OpServer server_addrs = [ 10.240.133.114:2182, ] status = Up description = >>, << type = Redis-UVE name = 10.240.133.114:6379 server_addrs = [ 10.240.133.114:6379, ] status = Initializing >>, << type = Collector name = server_addrs = [ 10.240.133.114:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, ] description = Redis-UVE:10.240.133.114:6379[None], Collector connection down >>, ] >>
08/01/2018 05:59:07 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = rh-3:Analytics:contrail-analytics-api:0 client_info = << status = Idle successful_connections = 0 pid = 1 http_port = 8090 start_time = 1533117511843909 collector_name = collector_ip = 10.240.133.114:8086 collector_list = [ 10.240.133.114:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 6 >>
08/01/2018 05:59:16 AM [contrail-analytics-api] [ERROR]: Session Event: TCP Connect Fail
08/01/2018 05:59:16 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = rh-3 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Zookeeper name = OpServer server_addrs = [ 10.240.133.114:2182, ] status = Up description = >>, << type = Redis-UVE name = 10.240.133.114:6379 server_addrs = [ 10.240.133.114:6379, ] status = Initializing >>, << type = Collector name = server_addrs = [ 10.240.133.114:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Redis-UVE:10.240.133.114:6379[None], Collector connection down >>, ] >>
08/01/2018 05:59:16 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = rh-3 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Zookeeper name = OpServer server_addrs = [ 10.240.133.114:2182, ] status = Up description = >>, << type = Redis-UVE name = 10.240.133.114:6379 server_addrs = [ 10.240.133.114:6379, ] status = Initializing >>, << type = Collector name = server_addrs = [ 10.240.133.114:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, ] description = Redis-UVE:10.240.133.114:6379[None], Collector connection down >>, ] >>
08/01/2018 05:59:16 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = rh-3:Analytics:contrail-analytics-api:0 client_info = << status = Idle successful_connections = 0 pid = 1 http_port = 8090 start_time = 1533117511843909 collector_name = collector_ip = 10.240.133.114:8086 collector_list = [ 10.240.133.114:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 6 >>
08/01/2018 05:59:20 AM [contrail-analytics-api] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = rh-3 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Zookeeper name = OpServer server_addrs = [ 10.240.133.114:2182, ] status = Up description = >>, << type = Redis-UVE name = 10.240.133.114:6379 server_addrs = [ 10.240.133.114:6379, ] status = Initializing >>, << type = Collector name = server_addrs = [ 10.240.133.114:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Redis-UVE:10.240.133.114:6379[None], Collector connection down >>, ] >>
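The analytics-api log above repeats the same pattern: each message is dropped with WrongClientSMState while the Collector connection cycles between Initializing and Down. To see at a glance which drop reasons and message types dominate a longer log, the lines can be tallied. This is a rough Python sketch; the regular expression is inferred from the lines shown here and `count_drops` is a hypothetical helper, not a Contrail tool.

```python
import re
from collections import Counter

# Matches e.g. "SANDESH: [DROP: WrongClientSMState] NodeStatusUVE:"
# (pattern inferred from the log excerpt above, not from Sandesh sources).
DROP_RE = re.compile(r"SANDESH: \[DROP: (?P<reason>\w+)\] (?P<msg_type>\w+):")


def count_drops(lines):
    """Count (drop reason, message type) pairs across log lines."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            counts[(match.group("reason"), match.group("msg_type"))] += 1
    return counts


def count_connect_failures(lines):
    """Count 'Session Event: TCP Connect Fail' lines."""
    return sum(1 for line in lines if "Session Event: TCP Connect Fail" in line)
```

Feeding it the file, for instance `count_drops(open("/var/log/contrail/contrail-analytics-api.log"))`, gives a per-reason tally that can be compared before and after changing SSL_ENABLE.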
I tried setting SSL_ENABLE: True and also saw the errors mentioned above, although contrail-status shows the services as up. Assigning to Ananth to confirm whether these log messages are real errors or can be ignored.
%<----- ------- -------
[srvr1] ~ # contrail-status
Pod              Service         Original Name                          State    Status
analytics        alarm-gen       contrail-analytics-alarm-gen           running  Up 15 minutes
analytics        api             contrail-analytics-api                 running  Up 15 minutes
analytics        collector       contrail-analytics-collector           running  Up 15 minutes
analytics        nodemgr         contrail-nodemgr                       running  Up 15 minutes
analytics        query-engine    contrail-analytics-query-engine        running  Up 15 minutes
analytics        snmp-collector  contrail-analytics-snmp-collector      running  Up 15 minutes
analytics        topology        contrail-analytics-topology            running  Up 15 minutes
config           api             contrail-controller-config-api         running  Up 17 minutes
config           device-manager  contrail-controller-config-devicemgr   running  Up 17 minutes
config           nodemgr         contrail-nodemgr                       running  Up 17 minutes
config           schema          contrail-controller-config-schema      running  Up 17 minutes
config           svc-monitor     contrail-controller-config-svcmonitor  running  Up 17 minutes
config-database  cassandra       contrail-external-cassandra            running  Up 18 minutes
config-database  nodemgr         contrail-nodemgr                       running  Up 18 minutes
config-database  rabbitmq        contrail-external-rabbitmq             running  Up 18 minutes
config-database  zookeeper       contrail-external-zookeeper            running  Up 18 minutes
control          control         contrail-controller-control-control    running  Up 16 minutes
control          dns             contrail-controller-control-dns        running  Up 16 minutes
control          named           contrail-controller-control-named      running  Up 16 minutes
control          nodemgr         contrail-nodemgr                       running  Up 16 minutes
database         cassandra       contrail-external-cassandra            running  Up 15 minutes
database         kafka           contrail-external-kafka                running  Up 15 minutes
database         nodemgr         contrail-nodemgr                       running  Up 15 minutes
database         zookeeper       contrail-external-zookeeper            running  Up 15 minutes
webui            job             contrail-controller-webui-job          running  Up 16 minutes
webui            web             contrail-controller-webui-web          running  Up 16 minutes
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: initializing (Disk for DB is too low. )
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail database ==
kafka: active
nodemgr: initializing (Disk for DB is too low. )
zookeeper: ...