Contrail - API time>50 secs causing Connection reset

Bug #1676983 reported by Deepak Jeyaraman
This bug report is a duplicate of:  Edit Remove
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Juniper Openstack
New
Undecided
Sachin Bansal

Bug Description

Observing contrail api time > 50 secs - due to which there a lot of heat operations failure (heat V2)

Setup:

3 contrail control nodes, 16 compute nodes.

root@ccra-17:~# contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-analytics 3.0.2.0-51 51
contrail-config 3.0.2.0-51 51
contrail-config-openstack 3.0.2.0-51 51
contrail-control 3.0.2.0-51 51
contrail-dns 3.0.2.0-51 51
contrail-docs 3.0.2.0-51 51
contrail-f5 3.0.2.0-51 51
contrail-fabric-utils 3.0.2.0-51 51
contrail-heat 3.0.2.0-51 51

Test:

scale upto 500 vsrx instances at 50 per iteration.

Logs:

root@ccra-17:/var/log/contrail# grep -n socket contrail-api-0.log | grep POST
39336:97.0.0.17 - - [2017-03-28 08:53:25] "POST /ref-update HTTP/1.1" socket - 73.988813
39697:97.0.0.17 - - [2017-03-28 08:53:31] "POST /ref-update HTTP/1.1" socket - 79.441704
40257:97.0.0.17 - - [2017-03-28 08:53:43] "POST /ref-update HTTP/1.1" socket - 91.056502
40580:97.0.0.17 - - [2017-03-28 08:53:46] "POST /ref-update HTTP/1.1" socket - 91.616593
40832:97.0.0.17 - - [2017-03-28 08:53:48] "POST /ref-update HTTP/1.1" socket - 95.972355
41071:97.0.0.17 - - [2017-03-28 08:53:48] "POST /ref-update HTTP/1.1" socket - 94.105240
41420:97.0.0.17 - - [2017-03-28 08:53:52] "POST /ref-update HTTP/1.1" socket - 97.865286
41702:97.0.0.17 - - [2017-03-28 08:53:54] "POST /ref-update HTTP/1.1" socket - 96.856899
41992:97.0.0.17 - - [2017-03-28 08:53:57] "POST /ref-update HTTP/1.1" socket - 98.459944
42422:97.0.0.17 - - [2017-03-28 08:54:04] "POST /ref-update HTTP/1.1" socket - 100.484664
42749:97.0.0.17 - - [2017-03-28 08:54:08] "POST /ref-update HTTP/1.1" socket - 107.162964
43000:97.0.0.17 - - [2017-03-28 08:54:09] "POST /ref-update HTTP/1.1" socket - 102.370817
43271:97.0.0.17 - - [2017-03-28 08:54:09] "POST /ref-update HTTP/1.1" socket - 100.639601
43554:97.0.0.17 - - [2017-03-28 08:54:13] "POST /ref-update HTTP/1.1" socket - 106.021679
43807:97.0.0.17 - - [2017-03-28 08:54:13] "POST /ref-update HTTP/1.1" socket - 100.504785
44179:97.0.0.17 - - [2017-03-28 08:54:18] "POST /ref-update HTTP/1.1" socket - 97.419570
44460:97.0.0.17 - - [2017-03-28 08:54:20] "POST /ref-update HTTP/1.1" socket - 98.502710
44912:97.0.0.17 - - [2017-03-28 08:54:26] "POST /ref-update HTTP/1.1" socket - 79.879059
45161:97.0.0.17 - - [2017-03-28 08:54:26] "POST /ref-update HTTP/1.1" socket - 79.403091
45395:97.0.0.17 - - [2017-03-28 08:54:27] "POST /ref-update HTTP/1.1" socket - 77.856576
45634:97.0.0.17 - - [2017-03-28 08:54:27] "POST /ref-update HTTP/1.1" socket - 78.348658
45874:97.0.0.17 - - [2017-03-28 08:54:29] "POST /ref-update HTTP/1.1" socket - 79.848005
46177:97.0.0.17 - - [2017-03-28 08:54:32] "POST /ref-update HTTP/1.1" socket - 75.647759
46447:97.0.0.17 - - [2017-03-28 08:54:34] "POST /ref-update HTTP/1.1" socket - 62.705114
46708:97.0.0.17 - - [2017-03-28 08:54:34] "POST /ref-update HTTP/1.1" socket - 56.943514
112065:97.0.0.17 - - [2017-03-28 09:32:17] "POST /neutron/subnet HTTP/1.1" socket - 114.207590
117360:97.0.0.17 - - [2017-03-28 09:33:27] "POST /neutron/subnet HTTP/1.1" socket - 67.087183
118066:97.0.0.17 - - [2017-03-28 09:33:34] "POST /neutron/subnet HTTP/1.1" socket - 60.053293
125194:97.0.0.17 - - [2017-03-28 09:36:11] "POST /neutron/subnet HTTP/1.1" socket - 52.694634
131268:97.0.0.17 - - [2017-03-28 09:38:07] "POST /neutron/subnet HTTP/1.1" socket - 61.057328
149899:97.0.0.17 - - [2017-03-28 09:42:39] "POST /neutron/subnet HTTP/1.1" socket - 52.098861
151194:97.0.0.17 - - [2017-03-28 09:42:50] "POST /neutron/subnet HTTP/1.1" socket - 50.425593
root@ccra-17:/var/log/contrail#

127.0.0.1 - - [2017-03-28 09:42:50] "GET /virtual-networks?count=False&obj_uuids=8b304607-3a0f-4745-ae7f-aae0fc1270c4%2C82f2c82e-7143-400d-a1fc-496bcb3b8003%2C506c7efd-4e7c-4cfc-b8cd-c47c1bf5465d&detail=True HTTP/1.1" 200 42497 0.070993
127.0.0.1 - - [2017-03-28 09:42:50] "POST /useragent-kv HTTP/1.1" 200 172 0.008971
<pre>Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_interface.py", line 263, in plugin_get_subnet
    subnet_info = cfgdb.subnet_read(subnet['id'])
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2325, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2623, in subnet_read
    subnet_id=subnet_id)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 194, in _raise_contrail_exception
    bottle.abort(400, json.dumps(exc_info))
  File "/usr/lib/python2.7/dist-packages/bottle.py", line 2310, in abort
    raise HTTPError(code, text)
HTTPError
</pre>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 508, in handle_one_response
    self.run_application()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 495, in run_application
    self.process_result()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 486, in process_result
    self.write(data)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 380, in write
    self._write_with_headers(data)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 400, in _write_with_headers
    self._sendall(towrite)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 355, in _sendall
    self.socket.sendall(data)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 458, in sendall
    data_sent += self.send(_get_memory(data, data_sent), flags)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 435, in send
    return sock.send(data, flags)
error: [Errno 104] Connection reset by peer

Revision history for this message
Deepak Jeyaraman (jdeepak) wrote :
information type: Proprietary → Public
description: updated
Revision history for this message
C K Ravi (ckravi) wrote :
Download full text (5.7 KiB)

We are still seeing timeouts in heat from vnc_api. Please see the log below.

2017-04-03 15:31:06.461 47874 INFO heat.engine.resource [-] CREATE: AttachPolicy "ns-entry" Stack "NSO-d10789f7-ba21-4a0e-bec0-0e974c463e2a" [edfa7c59-496d-4adf-9f91-e3ebd1892875]
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource Traceback (most recent call last):
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/heat/engine/resource.py", line 525, in _action_recorder
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource yield
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/heat/engine/resource.py", line 595, in _do_action
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource yield self.action_handler_task(action, args=handler_args)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/heat/engine/scheduler.py", line 296, in wrapper
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource step = next(subtask)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/heat/engine/resource.py", line 566, in action_handler_task
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource handler_data = handler(*args)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/vnc_api/gen/heat/resources/attach_policy.py", line 86, in handle_create
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource 'network-policy', policy_obj.uuid, None, 'ADD', policy_order)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 39, in wrapper
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource return func(self, *args, **kwargs)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 871, in ref_update
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource content = self._request_server(rest.OP_POST, uri, data=json_body)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 712, in _request_server
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource retry_count=retry_count)
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource File "/usr/lib/python2.7/dist-packages/vnc_api/vnc_api.py", line 760, in _request
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource raise TimeOutError('Gateway Timeout 504')
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource TimeOutError: Timed out due to: Gateway Timeout 504
2017-04-03 15:31:06.461 47874 TRACE heat.engine.resource
2017-04-03 15:31:06.524 47874 DEBUG heat.engine.scheduler [-] Task update_task from Stack "NSO-d10789f7-ba21-4a0e-bec0-0e974c463e2a" [edfa7c59-496d-4adf-9f91-e3ebd1892875] sleeping _sleep /usr/lib/python2.7/dist-packages/heat/engine/scheduler.py:164
2017-04-03 15:31:06.931 47921 DEBUG neutronclient.client [req-bb8b8a53-fdb9-4e85-8885-eb7b8b4e4e89 e54f4104921a457d88835972125493ba e81dedd775c74c96939bd3d7f866fc65]
REQ: curl -i http://10.102.2...

Read more...

Revision history for this message
C K Ravi (ckravi) wrote :
Download full text (3.3 KiB)

 There are several exceptions seen in Contrail API. I ve captured the different types of errors seen below.

127.0.0.1 - - [2017-04-03 14:30:14] "GET /domains?count=False&obj_uuids=ab3f6aa6-41b7-49e0-b11f-bbb49a282168&detail=False HTTP/1.1" 200 280 0.006902
97.0.0.17 - - [2017-04-03 14:30:14] "GET /domain/ab3f6aa6-41b7-49e0-b11f-bbb49a282168?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 817 0.019539
127.0.0.1 - - [2017-04-03 14:30:14] "GET /virtual-machine-interface/a673ad75-489e-465e-b286-00648bc45b84?fields=logical_router_back_refs%2Cinstance_ip_back_refs%2Cfloating_ip_back_refs HTTP/1.1" 200 2267 0.007323
127.0.0.1 - - [2017-04-03 14:30:14] "GET /virtual-network/506c7efd-4e7c-4cfc-b8cd-c47c1bf5465d?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 2014 0.008068
127.0.0.1 - - [2017-04-03 14:30:14] "GET /instance-ip/fe6041f1-8b43-48d6-a322-f327c723dec6?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 1642 0.005853
97.0.0.17 - - [2017-04-03 14:30:14] "POST /neutron/port HTTP/1.1" 200 813 0.042795
127.0.0.1 - - [2017-04-03 14:30:14] "POST /fqname-to-id HTTP/1.1" 404 181 0.004663
<pre>Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_interface.py", line 1270, in plugin_get_virtual_router
    vr_info = cfgdb.virtual_router_read(virtual_router['id'])
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2265, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 4156, in virtual_router_read
    vrouter_id=vrouter_id)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 193, in _raise_contrail_exception

127.0.0.1 - - [2017-04-03 16:47:47] "PUT /virtual-network/fe05a704-940d-4599-9c63-a63be134ada1 HTTP/1.1" 409 167 0.041116
<pre>Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_interface.py", line 303, in plugin_delete_subnet
    cfgdb.subnet_delete(subnet['id'])
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2265, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2689, in subnet_delete
    self._virtual_network_update(net_obj)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 306, in _virtual_network_update

---

127.0.0.1 - - [2017-04-03 19:31:35] "GET /service-instances?count=False&fields=logical_router_back_refs&detail=True HTTP/1.1" 200 272740 0.101832
97.0.0.17 - - [2017-04-03 19:31:35] "POST /neutron/port HTTP/1.1" 200 2475 0.241841
127.0.0.1 - - [2017-04-03 19:31:36] "POST /useragent-kv HTTP/1.1" 404 182 0.004875
<pre>Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_interface.py", line 263, in plugin_get_subnet
    subnet_info = cfgdb.subnet_read(subnet['id'])
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron_plugin_db.py", line 2265, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vnc_openstack/neutron...

Read more...

Revision history for this message
C K Ravi (ckravi) wrote :
Download full text (10.6 KiB)

The ref-update operation from Heat engine is sent to Contrail API.
Contrail API succeeds in processing the request (doing the ref-update to ADD in the DB), but fails when it tries to send the response back to heat engine.
Now, Heat thinks that the operation has failed, and starts rollback. As a part of rollback it tries to delete the Virtual Network, but the operation fails in Contrail since the network policy reference exists.

Log from: root@ccra-14
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 508, in handle_one_response
    self.run_application()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 495, in run_application
    self.process_result()
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 486, in process_result
    self.write(data)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 380, in write
    self._write_with_headers(data)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 400, in _write_with_headers
    self._sendall(towrite)
  File "/usr/lib/python2.7/dist-packages/gevent/pywsgi.py", line 355, in _sendall
    self.socket.sendall(data)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 458, in sendall
    data_sent += self.send(_get_memory(data, data_sent), flags)
  File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 435, in send
    return sock.send(data, flags)
error: [Errno 104] Connection reset by peer
{'CONTENT_LENGTH': '261',
 'CONTENT_TYPE': 'application/json; charset="UTF-8"',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': '*/*',
 'HTTP_ACCEPT_ENCODING': 'gzip, deflate, compress',
 'HTTP_HOST': '97.0.0.40:8082',
 'HTTP_USER_AGENT': 'python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-85-generic',
 'HTTP_X_AUTH_TOKEN': '82f2a54312ab4e75a36efbd5d6c5e830',
 'HTTP_X_CONTRAIL_USERAGENT': 'ccra-16:/usr/bin/heat-engine',
 'HTTP_X_DOMAIN_ID': None,
 'HTTP_X_DOMAIN_NAME': None,
 'HTTP_X_IDENTITY_STATUS': 'Confirmed',
 'HTTP_X_PROJECT_DOMAIN_ID': 'default',
 'HTTP_X_PROJECT_DOMAIN_NAME': 'Default',
 'HTTP_X_PROJECT_ID': u'abd6287e1990448a8f5ea2e0adc1a2b2',
 'HTTP_X_PROJECT_NAME': u'admin',
 'HTTP_X_ROLE': u'admin',
 'HTTP_X_ROLES': u'admin’,
 'HTTP_X_SERVICE_CATALOG': '[{"endpoints": [{"adminURL": "http://10.102.28.148:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2", "region": "RegionOne", "internalURL": "http://10.102.28.148:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2", "id": "78147981addb42848dc381a986592411", "publicURL": "http://10.102.28.148:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2"}, {"adminURL": "http://10.102.29.60:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2", "region": "RegionTwo", "internalURL": "http://10.102.29.60:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2", "id": "43835e2dd07246da883aacd801fef302", "publicURL": "http://10.102.29.60:8774/v2/abd6287e1990448a8f5ea2e0adc1a2b2"}], "endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints": [{"adminURL": "http://10.102.28.148:9696/", "region": "RegionOne", "internalURL": "http://10.102.28.148:9696/", "id": "53b8a4ca9f1144db9233310590915f11", "publicURL": "http://10.102.28.148:9696/"}, {"adminURL": "http://10.102.29.60:9696/", "region": "R...

Rudra Rugge (rrugge)
Changed in juniperopenstack:
assignee: nobody → Sachin Bansal (sbansal)
milestone: none → r3.2.6.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.
  • Duplicate of a private bug Remove

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.