API server fails to come up with all its connections after a service restart

Bug #1440695 reported by Ritam Gangopadhyay
Affects              Status                     Importance  Assigned to         Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed              High        Ritam Gangopadhyay
  Trunk              Fix Committed              High        Ritam Gangopadhyay

Bug Description

After the API service is restarted with "service contrail-api restart", the API server fails to come up properly: it starts, but its IF-MAP connection fails (see the last log line below).

Logs are at:
/cs-shared/test_runs/nodeb9/2015_04_02_06_45_23/logs/contrail/contrail-api-0-stdout.log
/cs-shared/test_runs/nodeb9/2015_04_02_06_45_23/logs/contrail/contrail-api.log

10.204.216.2 - - [2015-04-02 10:31:58] "GET /virtual-machine/9297ed92-1cfc-4ee8-8ac2-edac9766e7e7 HTTP/1.1" 200 1147 0.007341
10.204.216.2 - - [2015-04-02 10:31:58] "GET /virtual-machine-interface/5f744b23-db1d-45de-8a91-2ceb4da3a82c HTTP/1.1" 200 2565 0.008655
localhost.localdomain - - [02/Apr/2015 10:32:17] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3897
localhost.localdomain - - [02/Apr/2015 10:32:26] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 3897
/usr/lib/python2.6/site-packages/vnc_cfg_api_server/gen/vnc_api_server_gen.py:66: DeprecationWarning: object.__new__() takes no parameters
  obj = super(VncApiServerGen, cls).__new__(cls, *args, **kwargs)
04/02/2015 10:32:29 AM [nodeb9:contrail-api:Config:0]: SANDESH: CONNECT TO COLLECTOR: True
INFO:nodeb9:contrail-api:Config:0:SANDESH: CONNECT TO COLLECTOR: True
04/02/2015 10:32:29 AM [nodeb9:contrail-api:Config:0]: SANDESH: Logging: False -> 1
INFO:nodeb9:contrail-api:Config:0:SANDESH: Logging: False -> 1
04/02/2015 10:32:29 AM [nodeb9:contrail-api:Config:0]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
INFO:nodeb9:contrail-api:Config:0:SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
WARNING:keystoneclient.middleware.auth_token:Configuring auth_uri to point to the public identity endpoint is required; clients may not be able to authenticate against an admin endpoint
ERROR:nodeb9:contrail-api:Config:0:Starting Introspect on HTTP Port 8084
ERROR:nodeb9:contrail-api:Config:0:Cannot write http_port 8084 to /tmp/contrail-api.4721.http_port
WARNING:nodeb9:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Auth token fetched from keystone.
WARNING:nodeb9:contrail-api:Config:0:__default__ [SYS_NOTICE]: VncApiError: Connecting to ifmap on 10.204.216.2:8443 as api-server
Bottle v0.11.6 server starting up (using GeventServer())...
Listening on http://localhost:8095/
Hit Ctrl-C to quit.

ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server [Errno 104] Connection reset by peer

Tags: config
Changed in juniperopenstack:
assignee: nobody → Praneet Bachheti (praneetb)
information type: Proprietary → Public
tags: added: config
removed: api
Praneet Bachheti (praneetb) wrote :

I tried "service contrail-api restart" with R2.2-19, and all services came up fine.

The logs show that the api-server channel to IF-MAP was disconnected.
Did you use ifmap-view to connect to the IF-MAP server?

Can you please try it again and hold the setup for debugging?

Ritam Gangopadhyay (ritam) wrote :

Please use nodeg6, which is in the state where the issue is reproduced.

Praneet Bachheti (praneetb) wrote :

api-server is up on nodeg6.

Ritam Gangopadhyay (ritam) wrote :

As mentioned in my earlier emails, here are the errors seen in the API stdout log on nodeg6:

root@nodeg6:/var/log/contrail# grep -rn "cfgm_common.ifmap.client:Uknown error sending" ./
./contrail-api-0-stdout.log:1676:ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server
./contrail-api-0-stdout.log:1678:ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server [Errno 111] Connection refused
root@nodeg6:/var/log/contrail#
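The same check can be scripted in a test. A minimal Python 3 sketch, assuming the logs live under /var/log/contrail as on nodeg6; the helper name `count_ifmap_errors` is hypothetical. It counts the IF-MAP send errors per log file. The search string deliberately keeps the log's own spelling "Uknown", since correcting it would match nothing.

```python
import os

# contrail-api logs this message with the typo "Uknown"; keep it or nothing matches.
IFMAP_ERROR = "Uknown error sending IF-MAP message to server"


def count_ifmap_errors(log_dir="/var/log/contrail"):
    """Return {path: count} of IF-MAP send-error lines in files under log_dir.

    Hypothetical helper mirroring the `grep -rn` above; files without a match
    are left out of the result.
    """
    counts = {}
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps oddly encoded lines from aborting the scan
                with open(path, encoding="utf-8", errors="replace") as fh:
                    hits = sum(1 for line in fh if IFMAP_ERROR in line)
            except OSError:
                continue  # unreadable file (permissions, rotated away): skip it
            if hits:
                counts[path] = hits
    return counts


if __name__ == "__main__":
    for path, hits in sorted(count_ifmap_errors().items()):
        print(f"{path}: {hits}")
```

A non-empty result after a restart is a quick signal that the API server came up without a working IF-MAP channel.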

Since nodeg6 has been in this state since the 20th, the API server may have been restarted in the meantime. Also, the problem is not that the API server fails to come up after the restart: it comes up, but fails to establish its connections properly.

The nodeg6 setup is still available, and the logs from 20 May should show the issue.

Hi Ajay/Praneet,

        Could either of you please find some time to look at the setup? It had been in the problem state since the 18th but was rebooted in between, so I re-created it today.

        If you can take a look, it would be a big help. Thanks.

Regards,
Ritam.

_____________________________________________
From: Ritam Gangopadhyay
Sent: Monday, May 18, 2015 8:26 PM
To: Hampapur Ajay; Praneet Bachheti
Cc: Sudheendra Rao; Nagabhushana R
Subject: 1440695 :: reproduction setup.

Hi Ajay/Praneet,

     I have nodeg6, a single-node setup, in the issue state. Please take a look when you have time.

root@nodeg6:~# cat /var/log/contrail/contrail-api* | grep "Uknown error sending IF-MAP message to server"
ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server
ERROR:cfgm_common.ifmap.client:Uknown error sending IF-MAP message to server [Errno 111] Connection refused
root@nodeg6:~#

Regards,
Ritam.

Ritam Gangopadhyay (ritam) wrote :

Closing the bug. I will change the script to check the introspect page for the IF-MAP connection status and to sleep for 60 seconds, as advised by Prakash and Praneet.

http://nodeg6.englab.juniper.net:8084/Snh_SandeshUVECacheReq?x=NodeStatus
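A minimal Python 3 sketch of reading the IF-MAP connection status from the introspect page above. It rests on an assumption not confirmed in this bug: that the NodeStatus XML contains `ConnectionInfo` elements with `type`, `name` and `status` children, and that the IF-MAP entry has type "IFMap" and status "Up" when connected. Check this against what your build returns; the helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Introspect URL quoted in this bug; 8084 is the contrail-api introspect port.
NODE_STATUS_URL = "http://nodeg6.englab.juniper.net:8084/Snh_SandeshUVECacheReq?x=NodeStatus"


def fetch_node_status(url=NODE_STATUS_URL, timeout=5):
    """Return the raw NodeStatus XML (bytes) from the introspect port."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def connection_states(xml_doc):
    """Map (type, name) -> status for every ConnectionInfo element.

    ASSUMPTION: each ConnectionInfo has <type>, <name> and <status> children.
    """
    root = ET.fromstring(xml_doc)
    states = {}
    for info in root.iter("ConnectionInfo"):
        ctype = (info.findtext("type") or "").strip()
        name = (info.findtext("name") or "").strip()
        states[(ctype, name)] = (info.findtext("status") or "").strip()
    return states


def ifmap_is_up(xml_doc):
    """True only if at least one IF-MAP connection is listed and all are "Up"."""
    ifmap = [status for (ctype, _name), status in connection_states(xml_doc).items()
             if ctype.lower() == "ifmap"]
    return bool(ifmap) and all(status == "Up" for status in ifmap)
```

Requiring at least one IF-MAP entry means a page that does not list the connection yet (for example, while the daemon is still initializing) counts as "not up" rather than passing by default.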

Ritam,
   It depends on what your test is trying to achieve. You could wait for IF-MAP to restart and also use a higher timeout limit of, say, 60 seconds.
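Rather than a fixed sleep, a test can poll a readiness check until it passes or a timeout (60 seconds, as suggested above) expires. A minimal sketch; `wait_until` is a hypothetical helper, and `check` is any zero-argument probe you supply, for example one that reads the IF-MAP status from the introspect port.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    Returns True as soon as check() succeeds and False on timeout. An exception
    from check() (e.g. the introspect port not listening yet while the daemon
    restarts) is treated as "not ready yet" instead of aborting the wait.
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # daemon still restarting; try again after the interval
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval, remaining))  # never sleep past the deadline
```

A False return can be reported as a test failure, instead of proceeding with daemons whose connections are not up; when everything comes up quickly, the test also no longer pays for the full fixed sleep.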

Thanks,
- Praneet

From: Ritam Gangopadhyay <email address hidden>
Date: Thursday, May 28, 2015 at 11:52 AM
To: Praneet Bachheti <email address hidden>
Cc: Sudheendra Rao <email address hidden>, Nagabhushana R <email address hidden>
Subject: RE: Bug 1440695

Hi Praneet,

      In a test script, we run this sequence:
1. Vrouter-agent service restart
2. Contrail control service restart
3. Contrail Api service restart

    This is followed by a 10-second sleep. If the API restart also restarts IF-MAP, how can we make sure that everything has synced and that connections are up across the daemons after they come up? What should we check for, instead of just sleeping for a fixed number of seconds? Any pointer would be a great help. I will close the bug based on your answer.
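One way to replace the fixed 10-second sleep: restart the daemons in the same order, then wait for all of them against a single overall deadline and report which ones never became ready. A sketch under assumptions: the service names assume the standard Contrail init scripts, and the per-daemon probes are hypothetical callables you would write (for instance, checking connection status on each daemon's introspect port).

```python
import subprocess
import time

# Order used by the test above; names ASSUME the standard Contrail init scripts.
RESTART_ORDER = ("contrail-vrouter-agent", "contrail-control", "contrail-api")


def restart_services(services=RESTART_ORDER, run=subprocess.run):
    """Restart each service in order via `service <name> restart`."""
    for name in services:
        run(["service", name, "restart"], check=True)  # raise if a restart fails


def wait_for_all(probes, timeout=60.0, interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Wait until every probe returns True, sharing one overall deadline.

    `probes` maps a service name to a zero-argument callable (hypothetical).
    Returns the sorted names still not ready at the deadline; [] means all up.
    """
    pending = set(probes)
    deadline = clock() + timeout
    while pending:
        for name in sorted(pending):  # iterate over a copy so discard() is safe
            try:
                if probes[name]():
                    pending.discard(name)
            except Exception:
                pass  # daemon not answering yet; retry on the next round
        remaining = deadline - clock()
        if not pending or remaining <= 0:
            break
        sleep(min(interval, remaining))
    return sorted(pending)
```

Sharing one deadline keeps the worst-case wait bounded at `timeout` no matter how many daemons are checked, and the returned list tells the test exactly which daemon to blame.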

Thanks a lot,
Regards,
Ritam.

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11369
Submitter: Ritam Gangopadhyay (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11368
Submitter: Ritam Gangopadhyay (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11368
Committed: http://github.org/Juniper/contrail-test/commit/5032bc6b8815724794e9fff8a46d6d4f8f0f3e9a
Submitter: Zuul
Branch: master

commit 5032bc6b8815724794e9fff8a46d6d4f8f0f3e9a
Author: Ritam Gangopadhyay <email address hidden>
Date: Mon Jun 8 15:40:20 2015 +0530

Closes-Bug: 1440695 Adding connection verifications.

Adding connection verifications after agent, control and api
server restart to check everything has come up properly before
proceeding with the test.

Change-Id: I4bda13e6f2eb43f00264731e2de06e826de670d4

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11369
Committed: http://github.org/Juniper/contrail-test/commit/ddecbc95745190dacebcdfdf210fcccab060ca55
Submitter: Zuul
Branch: R2.20

commit ddecbc95745190dacebcdfdf210fcccab060ca55
Author: Ritam Gangopadhyay <email address hidden>
Date: Mon Jun 8 15:40:20 2015 +0530

Closes-Bug: 1440695 Adding connection verifications.

Adding connection verifications after agent, control and api
server restart to check everything has come up properly before
proceeding with the test.

Change-Id: I4bda13e6f2eb43f00264731e2de06e826de670d4
