Bug #1719236 “Contrail analytics response time varies based on t...” : Bugs : Juniper Openstack

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-09-25:

#1

sv-25_log-large_sv-24_down.tar.gz Edit (3.7 MiB, application/x-tar)

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-09-25:

#2

sv-26_log-large_sv-24_down.tar.gz Edit (3.9 MiB, application/x-tar)

vijaya kumar shankaran (vijayks) on 2017-09-25

information type:	Proprietary → Private
information type:	Private → Public

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-10-05:

#3

Hi,

Any Update?

Best Regards,
Vijay Kumar

Revision history for this message

Sundaresan Rajangam (srajanga) wrote on 2017-10-06: Re: [Bug 1719236] Contrail analytics response time varies based on the number of VN/VMI when one of the control node fails

#4

Download full text (5.2 KiB)

Hi Vijay,

Can you please confirm the ubuntu version?
UVE aggregation doesn’t use kafka if the ubuntu version is 12.X
So, I need to know the ubuntu version to look at the right code path.

Thanks,
Sundar
> On Oct 4, 2017, at 8:24 PM, vijaya kumar shankaran <email address hidden> wrote:
>
> Hi,
>
> Any Update?
>
> Best Regards,
> Vijay Kumar
>
> --
> You received this bug notification because you are a member of Contrail
> Systems engineering, which is subscribed to Juniper Openstack.
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_bugs_1719236&d=DwIFaQ&c=HAkYuh63rsuhr6Scbfh0UjBXeMK-ndb3voDTXcWzoCI&r=LPHaOrEhcHUkaXTIgszI3jGHWJ2DkgIMvg2FajOezdI&m=hTKGHt_jLdtJnSnpw5RdXV3SE5w0_3VAg1LLTf7jFjo&s=vuECLcAMoAuQ6Iru8-E9Zcps8Bmg82Dw53EUy7hwJK8&e=
>
> Title:
> Contrail analytics response time varies based on the number of VN/VMI
> when one of the control node fails
>
> Status in Juniper Openstack:
> New
>
> Bug description:
> Customer is testing analytics response time when one of the control
> node fails. Response time varies based on the number of VN and VMI’s.
> Greater the number VN’s & VMI it takes longer for the response.
>
> Customer setup is ass below
> 3 Control, config
> 3 collector
> 3 DB
> 1 openstack
> 6 compute nodes
> 2 TSN nodes
>
>
> /etc/contrail/contrail-vrouter-agent.conf is modified to point to collector nodes on each compute node.
> Customer has provided scripts to create VN & VMI and to query the analytics. They shutdown one of the control node and note down the time. They see a large difference in correct response for the analytics queries based on the number of VN’s and VMI
> VN VMI Response time
> 303 600 5 Sec
> 1500 3000 50 secs
> 3000 6000 approx 2 min With one control node shutdown
>
> The above delta time doubles when two control nodes are shutdown.
> Is this intended behavior?
> Why is this difference noticed in clustered scenario when collector nodes stop responding (to replicate the nodes are shutdown). The DB nodes are all up and running when performing this test.
> Can the response time be reduced & consistent irrespective number of interfaces.
>
> I could replicate the issue in lab up to 1500 VN and 3000 VMI. Due to
> resource constraints couldn’t scale this higher.
>
> When querying fro VMI we were getting Http 200 K as response but
> nothing pertaining to interface or network (output of script)
>
> Valid response
> 10.204.74.242:8081 default-domain:mock:vmi_ntt-comp5_0100_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
>
> Invalid response
> 10.204.74.242:8081 default-domain:mock:vmi_ntt-comp6_0100_01 200 {}
>
>
> From LogsFrom contrail-alarm-gen.log sv-25_log-large_sv-24_down
>
> 09/05/2017 11:46:02 AM [contrail-alarm-gen]: -uve-3 An exception of
> type LeaderNotAvailableError occured. Arguments:
>
> LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is
> thrown if we are in the middle of a leadership election and there is
> currently no leader for this partition and hence it is unavailable for
> writes...

Hi Vijay,

Can you please confirm the ubuntu version?
UVE aggregation doesn’t use kafka if the ubuntu version is 12.X
So, I need to know the ubuntu version to look at the right code path.

Thanks, 
Sundar
> On Oct 4, 2017, at 8:24 PM, vijaya kumar shankaran <1719236@bugs.launchpad.net> wrote:
> 
> Hi,
> 
> Any Update?
> 
> Best Regards,
> Vijay Kumar
> 
> -- 
> You received this bug notification because you are a member of Contrail
> Systems engineering, which is subscribed to Juniper Openstack.
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_bugs_1719236&d=DwIFaQ&c=HAkYuh63rsuhr6Scbfh0UjBXeMK-ndb3voDTXcWzoCI&r=LPHaOrEhcHUkaXTIgszI3jGHWJ2DkgIMvg2FajOezdI&m=hTKGHt_jLdtJnSnpw5RdXV3SE5w0_3VAg1LLTf7jFjo&s=vuECLcAMoAuQ6Iru8-E9Zcps8Bmg82Dw53EUy7hwJK8&e= 
> 
> Title:
>  Contrail analytics response time varies based on the number of VN/VMI
>  when one of the control node fails
> 
> Status in Juniper Openstack:
>  New
> 
> Bug description:
>  Customer is testing analytics response time when one of the control
>  node fails. Response time varies based on the number of VN and VMI’s.
>  Greater the number VN’s & VMI it takes longer for the response.
> 
>  Customer setup is ass below
>  3 Control, config
>  3 collector
>  3 DB
>  1 openstack 
>  6 compute nodes
>  2 TSN nodes
> 
> 
>  /etc/contrail/contrail-vrouter-agent.conf is modified to point to collector nodes on each compute node.
>  Customer has provided scripts to create VN & VMI and to query the analytics. They shutdown one of the control node and note down the time. They see a large difference in correct response for the analytics queries based on the number of VN’s and VMI
>  VN		VMI		Response time
>  303		600		5 Sec
>  1500		3000		50 secs
>  3000		6000		approx 2 min		With one control node shutdown
> 
>  The above delta time doubles when two control nodes are shutdown.
>  Is this intended behavior? 
>  Why is this difference noticed in clustered scenario when collector nodes stop responding (to replicate the nodes are shutdown). The DB nodes are all up and running when performing this test. 
>  Can the response time be reduced & consistent irrespective number of interfaces.
> 
>  I could replicate the issue in lab up to 1500 VN and 3000 VMI. Due to
>  resource constraints couldn’t scale this higher.
> 
>   When querying fro VMI we were getting Http 200 K as response but
>  nothing pertaining to interface or network  (output of script)
> 
>  Valid response
>  10.204.74.242:8081   default-domain:mock:vmi_ntt-comp5_0100_01   200      {"UveVMInterfaceAgent": {"ip6_active": f
> 
>  Invalid response
>  10.204.74.242:8081   default-domain:mock:vmi_ntt-comp6_0100_01   200      {}
> 
> 
>  From LogsFrom contrail-alarm-gen.log  sv-25_log-large_sv-24_down
> 
>  09/05/2017 11:46:02 AM [contrail-alarm-gen]: -uve-3 An exception of
>  type LeaderNotAvailableError occured. Arguments:
> 
>  LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is
>  thrown if we are in the middle of a leadership election and there is
>  currently no leader for this partition and hence it is unavailable for
>  writes.
> 
>  09/05/2017 11:46:07 AM [contrail-alarm-gen]: -uve-23 An exception of type LeaderNotAvailableError occured. Arguments:
>  LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes.
> 
>  09/05/2017 11:46:07 AM [contrail-alarm-gen]: Error: Consumer Failure LeaderNotAvailableError occured. Arguments:
>  LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writ
> 
>  09/05/2017 11:51:00 AM [contrail-alarm-gen]: redis-uve failed Error connecting to 192.168.0.124:6379. timed out. for key ObjectVRouter:sv-39: (u'192.168.0.124', 6379, 1149) tb Traceback (most recent call last):
>  ConnectionError: Error connecting to 192.168.0.124:6379. timed out.
> 
> 
>  09/05/2017 11:51:00 AM [contrail-alarm-gen]: redis-uve failed Error connecting to 192.168.0.124:6379. timed out. for key ObjectGeneratorInfo:sv-21:Control:contrail-dns:0: (u'192.168.0.124', 6379, 1149) tb Traceback (most recent call last):
> 
>  ConnectionError: Error connecting to 192.168.0.124:6379. timed out.
> 
>  09/05/2017 11:51:00 AM [contrail-alarm-gen]: Exception KeyError in notif worker. Arguments:
>  ((u'192.168.0.124', 6379, 1149),) : traceback Traceback (most recent call last):
> 
> 
>  09/05/2017 11:51:04 AM [contrail-alarm-gen]: -uve-12 An exception of type KeyError occured. Arguments:
> 
> 
>  09/05/2017 11:51:40 AM [contrail-alarm-gen]: Starting part 2
>  collectors [u'192.168.0.126:6379', u'192.168.0.125:6379']
> 
> To manage notifications about this bug go to:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_juniperopenstack_-2Bbug_1719236_-2Bsubscriptions&d=DwIFaQ&c=HAkYuh63rsuhr6Scbfh0UjBXeMK-ndb3voDTXcWzoCI&r=LPHaOrEhcHUkaXTIgszI3jGHWJ2DkgIMvg2FajOezdI&m=hTKGHt_jLdtJnSnpw5RdXV3SE5w0_3VAg1LLTf7jFjo&s=Sc6qHGHmHO6jxZXFet4GwQWQMIJKh9AxZ4g3wZcELyk&e=

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-10-24:

#5

Hi Sundar,

Customer is testing this on Ubuntu 14.04.5.

Best Regards,
Vijay Kumar

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-01:

#6

Hi Sundar,

Could you please provide us an update?

Best Regards,
Vijay Kumar

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-06:

#7

Hi Sundar,

Any updates?

Best Regards,
Vijay Kumar

Sundaresan Rajangam (srajanga) on 2017-11-06

Changed in juniperopenstack:
assignee:	nobody → Arvind (arvindv)

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-08:

#8

Hi Arvind,

This is one of the long pending issue and customer is looking forward for an update.

Could you please provide us an update or an ETA?

Best Regards,
Vijay Kumar

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-13:

#9

Hi Arvind,

Any updates?

Best Regards,
Vijay Kumar

Revision history for this message

Arvind (arvindv) wrote on 2017-11-14:

#10

Hi Vijaya Kumar,

sorry for the delayed response.
Trying to understand the setup and queries issued more...

1) Are we having control and config on the same node or are they on different nodes?
2) Can you explain a bit about the script they are using to query analytics ?
We want to understand how are the queries being issued. Are the issuing individual GET request
for the VN and VMI and then timing the entire operation. Do they wait for individual requests to succeed and then issue the newer one ? Are they trying to determine the time taken to issue (300,600) GET requests vs (1500,3000) GET requests vs (3000, 6000) GET requests. [If there are more API requests more time will be taken, so can u clarify if my understanding of the issue is wrong]

3) Do they give time after each of the control-node shutdown and issuing query against analytics
This can be checked by looking at the redis.log in analytics node and make sure no deletes are happening before issuing the query against analytics.

4) Also the messages u have reported are unrelated to the issue reported by the customer. You are experiencing connectivity issues in analytics to redis. Before trying to issue queries, make sure contrail-status on analytics is ok.
Thanks
Arvind

Revision history for this message

Arvind (arvindv) wrote on 2017-11-14:

#11

can u please let us know the contrail-version

vijaya kumar shankaran (vijayks) on 2017-11-15

description:

updated

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-15:

#12

Download full text (11.6 KiB)

Hi Arvind,

Please find answers for your queries inline

1) Are we having control and config on the same node or are they on different nodes?
yes (attaching testbed.py for your reference)

2) Can you explain a bit about the script they are using to query analytics ?
We want to understand how are the queries being issued. Are the issuing individual GET request
for the VN and VMI and then timing the entire operation. Do they wait for individual requests to succeed and then issue the newer one ? Are they trying to determine the time taken to issue (300,600) GET requests vs (1500,3000) GET requests vs (3000, 6000) GET requests. [If there are more API requests more time will be taken, so can u clarify if my understanding of the issue is wrong]
Customer has created script get_port_uve.py to query against the api servers.

Customer runs the query as the following parameter
python get_port_uve.py --api_ips 10.0.0.100:8081 10.0.0.124:9081 10.0.0.125:9081 10.0.0.126:9081 --search 0 0500_01

In the above example
10.0.0.124:8081 sv-24 Collector
10.0.0.125:9081 sv-25 Collector
10.0.0.126:9081 sv-26 Collector

Customer has created dummy VN amd VMI. VMI (port) is created with following syntax

vmi_<hostname>_<VN id)_<port id>

In the above command customer is searching for VMI ending with port 0500_01. This is a sequential search.

Output of the script and the issue is as mentioned in the output below
1) The last time when all 3 nodes respond with non-zero-length json contents.
2) The time when remaining 2 nodes start to respond with non-zero-length json contents stably.

For example, in case of large_sv-24_down.log, the timestamp of 1) before I shutdown sv-24 was 11:49:57.

-------------------------------------------------------------------------------------------------------------
virtual-machine-interfaces at 2017/09/05 11:49:57
-------------------------------------------------------------------------------------------------------------
node object status result
---------------- --------------------------------------- ------- -----------------------------------------
10.0.0.100:8081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active":...

Hi Arvind,

Please find answers for your queries inline

1) Are we having control and config on the same node or are they on different nodes?
yes (attaching testbed.py for your reference)

2) Can you explain a bit about the script they are using to query analytics ?
We want to understand how are the queries being issued. Are the issuing individual GET request
for the VN and VMI and then timing the entire operation. Do they wait for individual requests to succeed and then issue the newer one ? Are they trying to determine the time taken to issue (300,600) GET requests vs (1500,3000) GET requests vs (3000, 6000) GET requests. [If there are more API requests more time will be taken, so can u clarify if my understanding of the issue is wrong]
Customer has created script get_port_uve.py to query against the api servers.

Customer runs the query as the following parameter
python get_port_uve.py --api_ips 10.0.0.100:8081 10.0.0.124:9081 10.0.0.125:9081 10.0.0.126:9081 --search 0 0500_01

In the above example
10.0.0.124:8081  sv-24  Collector
10.0.0.125:9081	 sv-25  Collector
10.0.0.126:9081  sv-26  Collector

Customer has created dummy VN amd VMI. VMI (port) is created with following syntax

vmi_<hostname>_<VN id)_<port id>

In the above command customer is searching for VMI ending with port 0500_01. This is a sequential search.

Output of the script and the issue is as mentioned in the output below
1) The last time when all 3 nodes respond with non-zero-length json contents.
2) The time when remaining 2 nodes start to respond with non-zero-length json contents stably.
 
For example, in case of large_sv-24_down.log, the timestamp of 1) before I shutdown sv-24 was 11:49:57.
 
-------------------------------------------------------------------------------------------------------------
virtual-machine-interfaces at 2017/09/05 11:49:57
-------------------------------------------------------------------------------------------------------------
node object status result
---------------- --------------------------------------- ------- -----------------------------------------
10.0.0.100:8081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
 
After shutting down sv-24 (10.0.0.124), request to that collector node timed out. This is expected and normal.
However, request for some UVEs toward VIP (10.0.0.100) or remaining collectors (10.0.0.125/10.0.0.125) also timed out or result in zero-length json body for a while.
From the analytics API point of view, this response is not collect and I count this as downtime.
 
---------------------------------------------------------------------------------------------------------------------------------
virtual-machine-interfaces at 2017/09/05 11:52:22
---------------------------------------------------------------------------------------------------------------------------------
node object status result
---------------- --------------------------------------- ------- -------------------------------------------------------------
10.0.0.100:8081 default-domain:mock6:vmi_sv-39_0500_01 200 {} <<<<< This is not expected.
10.0.0.100:8081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock6:vmi_sv-39_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock3:vmi_sv-36_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock4:vmi_sv-37_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock1:vmi_sv-34_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock2:vmi_sv-35_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock5:vmi_sv-38_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.125:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {} <<<<< This is not expected.
10.0.0.125:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {} <<<<< This is not expected.
10.0.0.126:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
 
So, the timestamp of 2) in this case is 11:52:27 and the downtime is 11:52:27 - 11:49:57 = 00:02:30.
 
---------------------------------------------------------------------------------------------------------------------------------
virtual-machine-interfaces at 2017/09/05 11:52:27
---------------------------------------------------------------------------------------------------------------------------------
node object status result
---------------- --------------------------------------- ------- -------------------------------------------------------------
10.0.0.100:8081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.100:8081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.124:9081 default-domain:mock6:vmi_sv-39_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock3:vmi_sv-36_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock4:vmi_sv-37_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock1:vmi_sv-34_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock2:vmi_sv-35_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.124:9081 default-domain:mock5:vmi_sv-38_0500_01 n/a Connection to 10.0.0.124:9081 timed out. (connect timeout=2)
10.0.0.125:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.125:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock6:vmi_sv-39_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock3:vmi_sv-36_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock4:vmi_sv-37_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock1:vmi_sv-34_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock2:vmi_sv-35_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
10.0.0.126:9081 default-domain:mock5:vmi_sv-38_0500_01 200 {"UveVMInterfaceAgent": {"ip6_active": f

3) Do they give time after each of the control-node shutdown and issuing query against analytics
This can be checked by looking at the redis.log in analytics node and make sure no deletes are happening before issuing the query against analytics. 
What they are looking for is the response with 200 OK and not out put in it like the below
10.0.0.100:8081 default-domain:mock6:vmi_sv-39_0500_01 200 {} <<<<< This is not expected.

The query keeps on running and in the above example we can see conenction timeout which indicates that the server is dowm.

4) Also the messages u have reported are unrelated to the issue reported by the customer. You are experiencing connectivity issues in analytics to redis. Before trying to issue queries, make sure contrail-status on analytics is ok.

I was not able to replicate the issue in our lab. My setup was on VM and couldn't scale up for the number of VN and VMI.

5) contrail-version
3.1.3.0-75

Scripts attached here
Best Regards,
Vijay Kumar

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-15:

#13

mock_port.tar Edit (30.0 KiB, application/x-tar)

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-15:

#14

testbed.py Edit (5.7 KiB, text/x-python)

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-20:

#15

Hi Arvind,

Any update?

Best Regards,
Vijay Kumar

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-11-27:

#16

Hi Arvind,

Do we have any update?

Best Regards,
Vijay Kumar

Revision history for this message

Arvind (arvindv) wrote on 2017-11-28:

#17

Thanks Vijay Kumar.
1) do u have the get_port_uve.py script. Do you send the UVE requests both to VIP as well as the
internal IP's ? [would like to understand how u are issuing the requests to analytics api's]
2) Can I access your testbed, I would like to debug it. We don't have multinode with release u are trying.
Thanks
Arvind

Revision history for this message

Arvind (arvindv) wrote on 2017-12-04:

#18

Hi VijayKumar,

With regards to your concern about not being able to read the UVE's while doing HA, it is by design.
We will not be able to read the UVE's(until the generators connect to the collectors in the new nodes) if they belonged to a partition that is owned by the node that went down.
In your case, the reason why there is a temporary empty json returned by the analytics-api for your queries are because the UVE's owned by the node that went down took a min to show up in the other nodes (surviving).

So when comparing the time taken by analytics-api to return the API's we cannot include this downtime. So let me know if you have concern in fetch times for various configuration(500, 1000, 1500 VN) outside this window.
Thanks
Arvind

Revision history for this message

vijaya kumar shankaran (vijayks) wrote on 2017-12-05:

#19

Hi Arvind,

when you mention generators do you mean contrail-alarm-gen?

I am trying to follow up the update
In your case, the reason why there is a temporary empty json returned by the analytics-api for your queries are because the UVE's owned by the node that went down took a min to show up?

Does the time depend on the number of VN/VNI?

Best Regards,
Vijay Kumar

vivekananda shenoy (vshenoy83) on 2018-02-14

Changed in juniperopenstack:
importance:	Undecided → Critical

Vineet Gupta (vineetrf) on 2018-03-19

tags:

added: nttc

vijaya kumar shankaran (vijayks) on 2018-03-21

tags:	added: 2017-0905-0258 jtac
tags:	added: config
tags:	added: analytics removed: config

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-17: [Review update] R3.1

#20

Review in progress for https://review.opencontrail.org/41986
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-17:

#21

Review in progress for https://review.opencontrail.org/41990
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-17:

#22

Review in progress for https://review.opencontrail.org/41991
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-17:

#24

Review in progress for https://review.opencontrail.org/41992
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-17:

#25

Review in progress for https://review.opencontrail.org/41986
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R3.2

#30

Review in progress for https://review.opencontrail.org/42319
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R4.0

#33

Review in progress for https://review.opencontrail.org/42338
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R4.1

#35

Review in progress for https://review.opencontrail.org/42341
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R4.0

#37

Review in progress for https://review.opencontrail.org/42338
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R3.1

#39

Review in progress for https://review.opencontrail.org/41986
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-21: [Review update] R3.2

#41

Review in progress for https://review.opencontrail.org/42319
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

vivekananda shenoy (vshenoy83) wrote on 2018-04-23:

#43

Hi Zhiqiang,

What is the ETA for this issue ?

Regards,
Vivek

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-25: [Review update] R3.1

#44

Review in progress for https://review.opencontrail.org/41986
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-25: [Review update] R3.2

#46

Review in progress for https://review.opencontrail.org/42319
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-25: [Review update] R4.0

#48

Review in progress for https://review.opencontrail.org/42338
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-25: [Review update] R4.1

#50

Review in progress for https://review.opencontrail.org/42341
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-04-25: A change has been merged

#52

Reviewed: https://review.opencontrail.org/41986
Committed: http://github.com/Juniper/contrail-controller/commit/666fe371caced9db7c7206dda95b5c744ef72fcc
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 666fe371caced9db7c7206dda95b5c744ef72fcc
Author: zcui <email address hidden>
Date: Mon Apr 16 17:55:34 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

Change-Id: If39df7650ec514d4645e3eae30edc7b8ed0b5d1d
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-05-02:

#53

Reviewed: https://review.opencontrail.org/42341
Committed: http://github.com/Juniper/contrail-controller/commit/a101ea508f9d63662fae243b4928bc3a170e2c8d
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit a101ea508f9d63662fae243b4928bc3a170e2c8d
Author: zcui <email address hidden>
Date: Fri Apr 20 14:51:19 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

Change-Id: Id2efdb4ec6a600697442e081042a903a01812dd9
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-05-09:

#54

Reviewed: https://review.opencontrail.org/42319
Committed: http://github.com/Juniper/contrail-controller/commit/fdda437edceb65ffce60efe3d40fe2e084fff664
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit fdda437edceb65ffce60efe3d40fe2e084fff664
Author: zcui <email address hidden>
Date: Mon Apr 16 17:55:34 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

Change-Id: If39df7650ec514d4645e3eae30edc7b8ed0b5d1d
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-05-09:

#55

Reviewed: https://review.opencontrail.org/42338
Committed: http://github.com/Juniper/contrail-controller/commit/8f184ca728d3812c85a78f6404a68885cb841729
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 8f184ca728d3812c85a78f6404a68885cb841729
Author: zcui <email address hidden>
Date: Fri Apr 20 14:34:30 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

Change-Id: I600e2c33671a1c16545732cd64d1493678095198
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-21: [Review update] R5.0

#56

Review in progress for https://review.opencontrail.org/44035
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] master

#64

Review in progress for https://review.opencontrail.org/44207
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] R5.0

#66

Review in progress for https://review.opencontrail.org/44035
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] master

#68

Review in progress for https://review.opencontrail.org/44207
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] R4.1

#70

Review in progress for https://review.opencontrail.org/44220
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] R4.0

#72

Review in progress for https://review.opencontrail.org/44221
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] R3.2

#74

Review in progress for https://review.opencontrail.org/44222
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-28: [Review update] R3.1

#76

Review in progress for https://review.opencontrail.org/44223
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-29: [Review update] R4.1

#78

Review in progress for https://review.opencontrail.org/44220
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-29: [Review update] R4.0

#80

Review in progress for https://review.opencontrail.org/44221
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-29: [Review update] R3.2

#82

Review in progress for https://review.opencontrail.org/44222
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-06-29: [Review update] R3.1

#84

Review in progress for https://review.opencontrail.org/44223
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-03: A change has been merged

#86

Reviewed: https://review.opencontrail.org/44207
Committed: http://github.com/Juniper/contrail-analytics/commit/9abcdec4c72d86d7a2e345b1611d6ef8cf137bb8
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 9abcdec4c72d86d7a2e345b1611d6ef8cf137bb8
Author: zcui <email address hidden>
Date: Thu Jun 21 16:17:06 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

For R5.0, we delete the way to read from DB7, and read from DB1 directly

Change-Id: I064a5d38d8ae04770e033251ab12ee7b329054cc
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-03:

#87

Reviewed: https://review.opencontrail.org/44220
Committed: http://github.com/Juniper/contrail-controller/commit/f8167648d12ebb2340b334d84798c141ca43d991
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit f8167648d12ebb2340b334d84798c141ca43d991
Author: zcui <email address hidden>
Date: Thu Jun 28 15:19:03 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

This commit fix bug: legacy get_alarm do not use _usecache flag

Closes-Bug: #1719236

Change-Id: I03dcee1a1b4103587197774cccd2901e4faa3a42

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-03:

#88

Reviewed: https://review.opencontrail.org/44035
Committed: http://github.com/Juniper/contrail-analytics/commit/f7daaa6e8d8ecd617c4aba218f647bf2bfc4112e
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit f7daaa6e8d8ecd617c4aba218f647bf2bfc4112e
Author: zcui <email address hidden>
Date: Thu Jun 21 16:17:06 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

For R5.0, we delete the way to read from DB7, and read from DB1 directly

Change-Id: I064a5d38d8ae04770e033251ab12ee7b329054cc
Closes-Bug: #1719236

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-04:

#89

Reviewed: https://review.opencontrail.org/44222
Committed: http://github.com/Juniper/contrail-controller/commit/5901f91c41ce26add952bb8f2922f72fc4ac53b8
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 5901f91c41ce26add952bb8f2922f72fc4ac53b8
Author: zcui <email address hidden>
Date: Thu Jun 28 15:19:03 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

This commit fix bug: legacy get_alarm do not use _usecache flag

Closes-Bug: #1719236

Change-Id: I03dcee1a1b4103587197774cccd2901e4faa3a42

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-04:

#91

Reviewed: https://review.opencontrail.org/44221
Committed: http://github.com/Juniper/contrail-controller/commit/0deb61632a1217446132d9453afb0e70646b465f
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 0deb61632a1217446132d9453afb0e70646b465f
Author: zcui <email address hidden>
Date: Thu Jun 28 15:19:03 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

This commit fix bug: legacy get_alarm do not use _usecache flag

Closes-Bug: #1719236

Change-Id: I03dcee1a1b4103587197774cccd2901e4faa3a42

Revision history for this message

OpenContrail Admin (ci-admin-f) wrote on 2018-07-06:

#92

Reviewed: https://review.opencontrail.org/44223
Committed: http://github.com/Juniper/contrail-controller/commit/36ce23cd579cf03cb3edbd4f2cbbe77b4e510751
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 36ce23cd579cf03cb3edbd4f2cbbe77b4e510751
Author: zcui <email address hidden>
Date: Thu Jun 28 15:19:03 2018 -0700

Add new config option use_aggregated_uve_db

Provide an option to enable/disable serve UVE queries from the aggregated
UVE db. In a scale setup, HA event (alarm-gen down), alarm-gen may take a
long time to aggregate the UVEs depending on the number of UVEs, number
of partitions it owns, etc., So, enabling this option would reduce the
down time in case of HA events (collector down - time taken for generators
to connect to the new collector and resync the UVEs, alarm-gen down - no
impact on UVE queries, only the alarms would be reevaluated)

This commit fix bug: legacy get_alarm do not use _usecache flag

Closes-Bug: #1719236

Change-Id: I03dcee1a1b4103587197774cccd2901e4faa3a42

	Status	Importance	Assigned to	Milestone
Juniper Openstack	Status tracked in Trunk
R3.1	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r3.1.4.0
R3.2	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r3.2.12.0
R4.0	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r4.0.3.0
R4.1	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r4.1.2.0
R5.0	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r5.0.1
Trunk	Fix Committed	High	Zhiqiang Cui	Juniper Openstack r5.1.0

Juniper Openstack

Contrail analytics response time varies based on the number of VN/VMI when one of the control node fails

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches