[vDNS]: Records lost on named restart if scaled configuration is present

Bug #1583566 reported by Pulkit Tandon
This bug affects 1 person
Affects              Status           Importance   Assigned to   Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed    High         Nipa
  R2.21.x            Fix Committed    High         Nipa
  R2.22.x            Fix Committed    High         Nipa
  R3.0               Fix Committed    High         Nipa
  R3.0.2.x           Fix Committed    High         Nipa
  Trunk              Fix Committed    High         Nipa

Bug Description

BUG Template : Ubuntu

OS version: 3.13.0-40-generic #69-Ubuntu
Contrail Version: 3.0.2.0-38

Setup details:
Multi node setup.
Testbed file attached

Configurations and Description:
The setup has 4 control nodes. I kept contrail-dns permanently down on 2 of them so that the test exercised only the remaining 2 nodes.
Nodes under test: nodeg12 and nodec28

Step 1: Configured 1 VDNS server.
Step 2: Configured 5000 records for that server.
Step 3: Verified that all records are present in the zone file for that domain.
Step 4: Stopped and started contrail-named on nodec28.
Step 5: Verified after some time on nodec28. Only 1082 records were recovered; the rest were lost.
Step 6: Ran the same test on nodeg12. The test passed on that node: all 5000 records were recovered. (Possibly because the node types are different.)
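
The check in Steps 3 and 5 can be scripted. What follows is only a rough sketch: it assumes the configured records are A records and that the zone file is in the usual BIND text format, neither of which is stated in this report. The function names and the sample zone text are hypothetical.

```python
def count_a_records(zone_text: str) -> int:
    """Count lines that look like A records in BIND-style zone text."""
    count = 0
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("$"):  # skip blanks and $TTL/$ORIGIN
            continue
        fields = line.split()
        # Assumes the owner name is the first field and "A" appears
        # as the record type somewhere after it.
        if "A" in fields[1:]:
            count += 1
    return count


def missing_records(expected: int, zone_text: str) -> int:
    """Number of expected records that are not present in the zone text."""
    return max(expected - count_a_records(zone_text), 0)


# Hypothetical sample; a real zone file would be read from disk instead.
SAMPLE_ZONE = """\
$TTL 86400
; two host records and one NS record
host1 IN A 10.0.0.1
host2 IN A 10.0.0.2
@ IN NS ns1.example.
"""

print(count_a_records(SAMPLE_ZONE))
```

With the numbers from Step 5, 5000 expected and 1082 found means 3918 records were missing on nodec28.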

Logs:
Server : <email address hidden>
Path: /home/bhushana/Documents/technical/bugs/<bug-id>

Tags: vdns
Revision history for this message
Pulkit Tandon (pulkitt) wrote :
Revision history for this message
Nipa (nipak) wrote :

We will implement an exponential backoff to retry record updates from the agent.

Pulkit Tandon (pulkitt)
summary: - [vDNS0]: Records lost on named restart if scaled configuration is
- present
+ [vDNS]: Records lost on named restart if scaled configuration is present
Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/20645
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
Nipa (nipak) wrote :

Records are lost because named cannot handle the bulk send from contrail-dns. We will rate-limit updates per second based on the hardware thread count, which is equivalent to the number of UDP listen interfaces on named.
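
For illustration, the rate-limiting idea can be sketched in Python as below. This is not the contrail-dns code (the actual fix is in the contrail-controller commit referenced later in this report). The function names, the injectable `sleep`, and the use of `os.cpu_count()` as the hardware thread count are assumptions made for the sketch.

```python
import os
import time
from typing import Callable, Iterator, Optional, Sequence


def batches(records: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_rate_limited(
    records: Sequence[str],
    send: Callable[[str], None],
    batch_size: Optional[int] = None,
    interval_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one batch per interval instead of everything in one burst.

    Returns the number of batches sent.
    """
    # Assumption: one record per hardware thread per interval.
    size = batch_size or os.cpu_count() or 1
    sent_batches = 0
    for index, batch in enumerate(batches(records, size)):
        if index:  # wait between batches, not before the first one
            sleep(interval_sec)
        for record in batch:
            send(record)
        sent_batches += 1
    return sent_batches
```

Passing `sleep` as a parameter keeps the pacing testable without real delays; in a real daemon a timer callback would be the more natural fit.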

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/20645
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
Nipa (nipak) wrote :

The rate limit is set to the number of CPU threads per second; for example, on a 32-core CPU it is 32 DNS updates/sec.
Experiments show that draining 64000 records takes about 1 hour 6 minutes and completes within 9 retries.
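
As a sanity check on these numbers, here is a small calculation. It assumes the experiment ran at the 32 updates/sec used as the example above, which the comment does not actually state. The function name is hypothetical.

```python
import math


def single_pass_seconds(records: int, updates_per_sec: int) -> int:
    """Minimum time to send every record once at a constant update rate."""
    if updates_per_sec <= 0:
        raise ValueError("updates_per_sec must be positive")
    return math.ceil(records / updates_per_sec)


seconds = single_pass_seconds(64000, 32)
print(seconds, round(seconds / 60, 1))  # 2000 seconds, about 33.3 minutes
```

Under that assumption a single pass takes roughly 33 minutes, so the reported 1 hour 6 minutes would correspond to about twice as many sends as there are records, which would be consistent with a share of the records needing retransmission.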

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/20645
Committed: http://github.org/Juniper/contrail-controller/commit/397e9e0786a93fd9eec34e9215243b9b5ad461f1
Submitter: Zuul
Branch: master

commit 397e9e0786a93fd9eec34e9215243b9b5ad461f1
Author: Nipa Kumar <email address hidden>
Date: Mon May 23 16:42:13 2016 -0700

Send DNS records to named in batches instead of bulk send.

DNS record sends and retries to named were done in a burst and did not succeed once
they reached named's limit for handling DNS update rates.

DNS records are now bunched in groups whose size equals the number of hardware threads
and sent at periodic intervals (default = 1000 msec).
DNS Update Rate = No. of Hardware Threads/sec
DNS Update Rate is maintained constant.

o Added introspect command Snh_ShowBindPendingList to show the list of pending records
  that have failed the configurable number of retries.
o Added end-of-config from IfMapServer to indicate sync to oper DB and initiate
  writes to contrail-named.
o New config knobs are added to tune DNS record send to named:
  o DEFAULT.named_max_retransmissions - Maximum number of retries to send records to named
    (default set to 12 retries)
  o DEFAULT.named_retransmission_interval - Retransmission interval in msec
    (default set to 1 sec)
o Also, DEFAULT.named_max_cache_size is set to 32M as this is purely for caching queries.

Change-Id: I9f23f9d2208fbd8dfd92e0ad737762ed50d762ba
Closes-Bug:1583566
Closes-Bug:1574454
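
To make the knobs in the commit message above concrete, here is a rough sketch of how they might look in an INI-style config and how the retry limit could feed a pending list. The knob names and defaults come from the commit message. The file layout, the value syntax (`1000` for 1 sec, `32M`), and the `load_named_knobs` and `RetryTracker` names are assumptions, and the actual contrail-dns retry accounting may differ.

```python
import configparser

# Assumed on-disk form of the knobs; only names and defaults are from the commit.
SAMPLE_CONF = """\
[DEFAULT]
named_max_retransmissions = 12
named_retransmission_interval = 1000
named_max_cache_size = 32M
"""


def load_named_knobs(text: str) -> dict:
    """Parse the three named-related knobs, falling back to the stated defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    return {
        "max_retransmissions": section.getint("named_max_retransmissions", fallback=12),
        "retransmission_interval_msec": section.getint(
            "named_retransmission_interval", fallback=1000
        ),
        "max_cache_size": section.get("named_max_cache_size", fallback="32M"),
    }


class RetryTracker:
    """Move a record to a pending list once it has failed the allowed retries."""

    def __init__(self, max_retransmissions: int) -> None:
        self.max_retransmissions = max_retransmissions
        self.failures: dict = {}
        self.pending: list = []

    def on_failure(self, record: str) -> bool:
        """Return True if the record should be retried, False if it is now pending."""
        if record in self.pending:
            return False
        count = self.failures.get(record, 0) + 1
        self.failures[record] = count
        if count >= self.max_retransmissions:
            self.pending.append(record)  # what a pending-list introspect would show
            return False
        return True
```

For example, `RetryTracker(load_named_knobs(SAMPLE_CONF)["max_retransmissions"])` would give up on a record after its 12th failure.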

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/21313
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/21314
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/21315
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/21316
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/21464
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21313
Committed: http://github.org/Juniper/contrail-controller/commit/0181096058428ab3d79da0144c8b92aa136c980a
Submitter: Zuul
Branch: R3.0

commit 0181096058428ab3d79da0144c8b92aa136c980a
Author: Nipa Kumar <email address hidden>
Date: Mon May 23 16:42:13 2016 -0700

Send DNS records to named in batches instead of bulk send.

Change-Id: I9f23f9d2208fbd8dfd92e0ad737762ed50d762ba
Closes-Bug:1583566
Closes-Bug:1574454

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/21314
Committed: http://github.org/Juniper/contrail-controller/commit/1dedfa65a3f3863d7b2769560c41f51412aa1701
Submitter: Zuul
Branch: R2.20

commit 1dedfa65a3f3863d7b2769560c41f51412aa1701
Author: Nipa Kumar <email address hidden>
Date: Mon May 23 16:42:13 2016 -0700

Send DNS records to named in batches instead of bulk send.

Change-Id: I9f23f9d2208fbd8dfd92e0ad737762ed50d762ba
Closes-Bug:1583566
Closes-Bug:1574454

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/21464
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/21315
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21464
Committed: http://github.org/Juniper/contrail-controller/commit/f87a53f4697efdb489e2d985372bb4252452b8c3
Submitter: Zuul
Branch: R2.21.x

commit f87a53f4697efdb489e2d985372bb4252452b8c3
Author: Nipa Kumar <email address hidden>
Date: Mon Jun 27 15:14:58 2016 -0700

Send DNS records to named in batches instead of bulk send.

Change-Id: Ie489e2d3014954f2990b7d8f50ace582d541360d
Closes-Bug:1583566
Closes-Bug:1574454

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/21315
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.2.x

Review in progress for https://review.opencontrail.org/21625
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21625
Committed: http://github.org/Juniper/contrail-controller/commit/9ff8b81153c627ab314626b7367950c4a1992a4b
Submitter: Zuul
Branch: R3.0.2.x

commit 9ff8b81153c627ab314626b7367950c4a1992a4b
Author: Nipa Kumar <email address hidden>
Date: Mon May 23 16:42:13 2016 -0700

Send DNS records to named in batches instead of bulk send.

Change-Id: I9f23f9d2208fbd8dfd92e0ad737762ed50d762ba
Closes-Bug:1583566
Closes-Bug:1574454

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/22047
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/22047
Committed: http://github.org/Juniper/contrail-controller/commit/3661c074d691e8b979a90f2925fb05ebf2f2e160
Submitter: Zuul
Branch: R2.22.x

commit 3661c074d691e8b979a90f2925fb05ebf2f2e160
Author: Nipa Kumar <email address hidden>
Date: Mon May 23 16:42:13 2016 -0700

Send DNS records to named in batches instead of bulk send.

Change-Id: I9f23f9d2208fbd8dfd92e0ad737762ed50d762ba
Closes-Bug:1583566
Closes-Bug:1574454

vDNS limits to 64k pending records.

o Implement low and high watermarks to allow draining of vDNS records.
  Default high watermark = 32k and low watermark = 8k.
o Use a bitset vector to keep track of allocated DNS transaction IDs.

Change-Id: Iaf1520685dde627670a5b5f25dba1679be80385c
Closes-Bug:1593895
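
The two mechanisms named in this last commit message can be sketched as follows. This is only an illustration under stated assumptions, not the contrail-dns implementation. The class names are hypothetical, "32k" and "8k" are read as multiples of 1024, and the link between the 64k limit and the 16-bit DNS transaction ID field is an inference rather than something the commit states.

```python
class WatermarkGate:
    """Stop accepting work at the high watermark, resume at the low one."""

    def __init__(self, high: int = 32 * 1024, low: int = 8 * 1024) -> None:
        if low >= high:
            raise ValueError("low watermark must be below high watermark")
        self.high, self.low = high, low
        self.pending = 0
        self.accepting = True

    def add(self, count: int = 1) -> bool:
        """Queue `count` records; returns False while the gate is closed."""
        if not self.accepting:
            return False
        self.pending += count
        if self.pending >= self.high:
            self.accepting = False  # let the queue drain before taking more
        return True

    def drain(self, count: int = 1) -> None:
        """Mark `count` records as completed."""
        self.pending = max(self.pending - count, 0)
        if self.pending <= self.low:
            self.accepting = True


class TxnIdAllocator:
    """Track allocated transaction IDs with one bit per ID."""

    def __init__(self, size: int = 1 << 16) -> None:  # DNS IDs are 16 bits wide
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def allocate(self):
        """Return the lowest free ID, or None if all are in use."""
        for txn_id in range(self.size):  # linear scan, fine for a sketch
            byte, mask = txn_id >> 3, 1 << (txn_id & 7)
            if not self.bits[byte] & mask:
                self.bits[byte] |= mask
                return txn_id
        return None

    def release(self, txn_id: int) -> None:
        """Clear the bit so the ID can be reused."""
        self.bits[txn_id >> 3] &= ~(1 << (txn_id & 7)) & 0xFF
```

The gap between the two watermarks gives hysteresis: once the gate closes it stays closed until a good part of the backlog has drained, instead of flapping open and shut around a single threshold.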
