contrail-schema unable to clean old routing instance objects

Bug #1665486 reported by Piyush Srivastava
This bug affects 1 person
Affects              Status          Importance   Assigned to     Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.21.x            Fix Committed   Undecided    Sachin Bansal
  R3.0               Fix Committed   Undecided    Unassigned
  R3.1               Fix Committed   Undecided    Unassigned
  R3.2               Fix Committed   Undecided    Unassigned
  Trunk              Fix Committed   Undecided    Unassigned
OpenContrail         Fix Committed   Undecided    Unassigned

Bug Description

Contrail version: 2.21.3-56

After a reboot of the Contrail services, contrail-schema appears to try to clean up stale routing instance objects. We saw many errors from contrail-schema failing to delete routing instance objects. The impact of this issue is that new virtual machine interfaces do not receive a routing instance object.

02/10/2017 08:54:41 PM [contrail-schema]: Error while deleting routing instance default-domain:wd5-ttint.az2.eng.pdx.wd:e2:e2: HTTP Status: 500 Content: Internal Server Error
02/10/2017 08:54:43 PM [contrail-schema]: Error while deleting routing instance default-domain:wd5-ttprod.az2.eng.pdx.wd:e2:e2: HTTP Status: 500 Content: Internal Server Error
02/10/2017 08:59:34 PM [contrail-schema]: Starting Introspect on HTTP Port 8087
02/10/2017 08:59:34 PM [contrail-schema]: Cannot write http_port 8087 to /tmp/contrail-schema.2826.http_port
02/10/2017 08:59:39 PM [contrail-schema]: Error while deleting routing instance default-domain:wd5-ttint.az2.eng.pdx.wd:e2:e2: HTTP Status: 500 Content: Internal Server Error
02/10/2017 08:59:39 PM [contrail-schema]: Error while deleting routing instance default-domain:wd5-ttprod.az2.eng.pdx.wd:e2:e2: HTTP Status: 500 Content: Internal Server Error

On closer inspection, we found that the routing instance objects were missing the 'fq_name' attribute, which caused contrail-schema to throw exceptions and crash. As a side effect, tap interfaces for new VMs on OpenStack did not receive a VRF and showed up in ERROR state. To work around this problem, we added the following patch to:

/usr/lib/python2.6/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py

obj_dict = self._db_conn.uuid_to_obj_dict(uuid)
if 'fq_name' not in obj_dict:  # patched line
    return (True, '')  # patched line
parent_fq_name = json.loads(obj_dict['fq_name'])[:-1]
try:
    parent_uuid = self._db_conn.fq_name_to_uuid(
        parent_type, parent_fq_name)
except NoIdError:
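
The effect of the workaround can be reproduced outside the API server with a small stand-in. This is only a sketch, not the Contrail code: `FakeDb`, `check_parent`, and the sample UUIDs and names are hypothetical. The report does not show what follows `except NoIdError:`, so the return values in that branch and at the end of the function are assumptions.

```python
import json


class NoIdError(Exception):
    """Stand-in for the API server's 'no such id' exception."""


class FakeDb(object):
    """Hypothetical in-memory replacement for self._db_conn."""

    def __init__(self, objects, fq_name_index):
        self._objects = objects              # uuid -> obj_dict
        self._fq_name_index = fq_name_index  # (type, fq_name tuple) -> uuid

    def uuid_to_obj_dict(self, uuid):
        return self._objects[uuid]

    def fq_name_to_uuid(self, obj_type, fq_name):
        try:
            return self._fq_name_index[(obj_type, tuple(fq_name))]
        except KeyError:
            raise NoIdError(':'.join(fq_name))


def check_parent(db, uuid, parent_type):
    obj_dict = db.uuid_to_obj_dict(uuid)
    if 'fq_name' not in obj_dict:
        # Workaround from the report: the stored object is corrupted,
        # so skip the parent lookup instead of raising KeyError.
        return (True, '')
    parent_fq_name = json.loads(obj_dict['fq_name'])[:-1]
    try:
        parent_uuid = db.fq_name_to_uuid(parent_type, parent_fq_name)
    except NoIdError:
        # Assumed handling; the original snippet is cut off here.
        return (False, 'parent %s not found' % ':'.join(parent_fq_name))
    return (True, parent_uuid)


db = FakeDb(
    objects={
        'ri-ok': {'fq_name': json.dumps(['default-domain', 'proj', 'vn', 'vn'])},
        'ri-bad': {},  # corrupted object: fq_name is missing
    },
    fq_name_index={
        ('virtual_network', ('default-domain', 'proj', 'vn')): 'vn-uuid',
    },
)
ok_result = check_parent(db, 'ri-ok', 'virtual_network')
bad_result = check_parent(db, 'ri-bad', 'virtual_network')
```

With the guard in place, the corrupted object no longer raises and the delete can proceed, but the guard hides the corruption rather than explaining it.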

Why are the routing instances getting into a corrupted state, and what is the proper fix for this issue?

Tags: config
description: updated
Sachin Bansal (sbansal) wrote :

Already fixed from 3.0 onwards by this change: https://review.opencontrail.org/13258
I will backport the relevant part of the fix to R2.21.x.

Changed in opencontrail:
status: New → Fix Committed
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/29365
Submitter: Sachin Bansal (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29365
Committed: http://github.org/Juniper/contrail-controller/commit/2334fde1adde933fbd6021388d850ad2f18032ee
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 2334fde1adde933fbd6021388d850ad2f18032ee
Author: Sachin Bansal <email address hidden>
Date: Mon Mar 6 15:43:10 2017 -0800

Do not use fq_name from obj_dict for parent_fq_name

We already know the fq_name of the object. From that, we can
identify parent_fq_name. There is no need to rely on fq_name
being present in obj_dict.

Change-Id: I0ef61e1e481dfc7bea29c2e94c9825200511d403
Closes-Bug: 1665486
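
The idea in the commit message can be sketched as follows. This is an illustration, not the committed change: `parent_fq_name_of` is a hypothetical helper, and it assumes an fq_name is a list of name components whose last element is the object's own name.

```python
def parent_fq_name_of(fq_name):
    """Return the parent's fq_name: every component except the last."""
    if len(fq_name) < 2:
        raise ValueError('fq_name %r has no parent' % (fq_name,))
    return list(fq_name[:-1])


# Routing instance name taken from the log lines in this report,
# split on ':' into its components.
ri_fq_name = 'default-domain:wd5-ttint.az2.eng.pdx.wd:e2:e2'.split(':')
vn_fq_name = parent_fq_name_of(ri_fq_name)
```

Because the parent name is computed from the fq_name the caller already holds, nothing is read from the stored obj_dict, so a stored object with a missing 'fq_name' no longer matters for this lookup.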

Jeba Paulaiyan (jebap)
tags: added: config