R3.2-4 : if_stats details not sent to analytics sometimes

Bug #1644125 reported by Ankit Jain
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  (status tracked in Trunk)
  R3.2             Fix Committed  Medium      Ashok Singh
  Trunk            Fix Committed  Medium      Ashok Singh

Bug Description

Output from agent introspect:

http://nodeh3:8085/Snh_SandeshUVECacheReq?x=UveVMInterfaceAgent
UveVMInterfaceAgentTrace

raw_if_stats
    in_pkts   = 0
    in_bytes  = 0
    out_pkts  = 0
    out_bytes = 0
    drop_pkts = 0

if_stats
    in_pkts   = 0
    in_bytes  = 0
    out_pkts  = 0
    out_bytes = 0
    drop_pkts = 0

Output from analytics UVE:

http://nodeg13:8081/analytics/uves/virtual-machine-interface/default-domain:admin:02b2cba7-c8ce-4f07-bd3e-de83b1428cc9?flat

"UveVMInterfaceAgent": {
"ip6_active": false,
"if_stats": {
"out_bytes": 42,
"in_pkts": 19,
"drop_pkts": 18,
"out_pkts": 1,
"in_bytes": 1806
},

Issue: Per the output above, the if_stats values from the agent introspect do not match the if_stats values in the analytics UVE.

1) When the packet drops stopped (flow_action_drop in this case), the agent did not send updated if_stats to analytics, so analytics continued to show the old if_stats values.

2) All fields of raw_if_stats became zero (the vrouter agent was not restarted).

The contrail-logs output below suggests that VmInterfaceStats was not sent to analytics in the later update (message 4688), and hence analytics did not update if_stats:

2016 Nov 23 12:49:35.196190 nodeh3 [Compute:contrail-vrouter-agent:0:None][INVALID] : UveVMInterfaceAgentTrace:4686 [UveVMInterfaceAgent: name = default-domain:admin:02b2cba7-c8ce-4f07-bd3e-de83b1428cc9, virtual_network = default-domain:admin:vn1, vm_name = vm1, [VmInterfaceStats: in_pkts = 19, in_bytes = 1806, out_pkts = 1, out_bytes = 42, drop_pkts = 18], [AnomalyResult: samples = 2398, algo = EWM, config = 0.1, state: {mean: 21.4668, stddev: 13.0776, }, sigma = -0.188625], [AnomalyResult: samples = 2398, algo = EWM, config = 0.1, state: {mean: 0.746259, stddev: 0.435512, }, sigma = 0.582628], vm_uuid = c5fe2df4-25b3-4c48-bada-489ca3b94a85, in_bw_usage = 481, [VrouterFlowRate: added_flows = 36, max_flow_adds_per_second = 2, min_flow_adds_per_second = 0, deleted_flows = 36, max_flow_deletes_per_second = 2, min_flow_deletes_per_second = 0, active_flows = 0], [AnomalyResult: samples = 2398, algo = EWM, config = 0.1, state: {mean: 41.441, stddev: 25.3233, }, sigma = -0.214863], [AnomalyResult: samples = 2398, algo = EWM, config = 0.1, state: {mean: 41.4569, stddev: 25.3834, }, sigma = -0.21498], [AnomalyResult: samples = 2398, algo = EWM, config = 0.1, state: {mean: 0.1429, stddev: 0.51515, }, sigma = -0.277395], drop_stats: {flow_action_drop: 18, }, drop_stats_1h: {cksum_err: 0, cloned_original: 0, discard: 0, duplicated: 0, flow_action_drop: 107, flow_action_invalid: 0, flow_invalid_protocol: 0, flow_nat_no_rflow: 0, flow_no_memory: 0, flow_queue_limit_exceeded: 0, flow_table_full: 0, flow_unusable: 0, frag_err: 0, head_alloc_fail: 0, interface_drop: 0, interface_rx_discard: 0, interface_tx_discard: 0, invalid_arp: 0, invalid_if: 0, invalid_label: 0, invalid_mcast_source: 0, invalid_nh: 0, invalid_packet: 0, invalid_protocol: 0, invalid_source: 0, invalid_vnid: 0, l2_no_route: 0, mcast_clone_fail: 0, mcast_df_bit: 0, misc: 0, no_fmd: 0, nowhere_to_go: 0, pcow_fail: 0, pull: 0, push: 0, rewrite_fail: 0, trap_no_if: 0, ttl_exceeded: 0, vlan_fwd_enq: 0, vlan_fwd_tx: 0, }]
2016 Nov 23 12:49:37.340638 nodeh3 [Compute:contrail-vrouter-agent:0:None][INVALID] : UveVMInterfaceAgentTrace:4687 [UveVMInterfaceAgent: name = default-domain:admin:02b2cba7-c8ce-4f07-bd3e-de83b1428cc9, virtual_network = default-domain:admin:vn1, vm_name = vm1, vm_uuid = c5fe2df4-25b3-4c48-bada-489ca3b94a85, [sg_rule_stats: [rule = 00000000-0000-0000-0000-000000000001, count = 32] [rule = 00000000-0000-0000-0000-000000000003, count = 0]]]
2016 Nov 23 12:50:05.196455 nodeh3 [Compute:contrail-vrouter-agent:0:None][INVALID] : UveVMInterfaceAgentTrace:4688 [UveVMInterfaceAgent: name = default-domain:admin:02b2cba7-c8ce-4f07-bd3e-de83b1428cc9, virtual_network = default-domain:admin:vn1, vm_name = vm1, [AnomalyResult: samples = 2399, algo = EWM, config = 0.1, state: {mean: 19.3201, stddev: 13.9784, }, sigma = -1.38214], [AnomalyResult: samples = 2399, algo = EWM, config = 0.1, state: {mean: 0.671633, stddev: 0.46992, }, sigma = -1.42925], vm_uuid = c5fe2df4-25b3-4c48-bada-489ca3b94a85, in_bw_usage = 0, out_bw_usage = 0, [VrouterFlowRate: added_flows = 0, max_flow_adds_per_second = 0, min_flow_adds_per_second = 0, deleted_flows = 0, max_flow_deletes_per_second = 0, min_flow_deletes_per_second = 0, active_flows = 0], [AnomalyResult: samples = 2399, algo = EWM, config = 0.1, state: {mean: 37.2969, stddev: 27.0501, }, sigma = -1.37881], [AnomalyResult: samples = 2399, algo = EWM, config = 0.1, state: {mean: 37.3112, stddev: 27.1028, }, sigma = -1.37665], [AnomalyResult: samples = 2399, algo = EWM, config = 0.1, state: {mean: 0.12861, stddev: 0.490591, }, sigma = -0.262153], drop_stats_1h: {cksum_err: 0, cloned_original: 0, discard: 0, duplicated: 0, flow_action_drop: 78, flow_action_invalid: 0, flow_invalid_protocol: 0, flow_nat_no_rflow: 0, flow_no_memory: 0, flow_queue_limit_exceeded: 0, flow_table_full: 0, flow_unusable: 0, frag_err: 0, head_alloc_fail: 0, interface_drop: 0, interface_rx_discard: 0, interface_tx_discard: 0, invalid_arp: 0, invalid_if: 0, invalid_label: 0, invalid_mcast_source: 0, invalid_nh: 0, invalid_packet: 0, invalid_protocol: 0, invalid_source: 0, invalid_vnid: 0, l2_no_route: 0, mcast_clone_fail: 0, mcast_df_bit: 0, misc: 0, no_fmd: 0, nowhere_to_go: 0, pcow_fail: 0, pull: 0, push: 0, rewrite_fail: 0, trap_no_if: 0, ttl_exceeded: 0, vlan_fwd_enq: 0, vlan_fwd_tx: 0, }]
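
One way to check which trace messages actually carried interface counters is to scan each line for a VmInterfaceStats group. The following is a minimal sketch, not part of contrail-logs: the function name extract_if_stats is hypothetical, and the regular expression assumes the bracketed "key = value" layout seen in the lines above.

```python
import re

# Matches the "[VmInterfaceStats: k = v, ...]" group inside a
# UveVMInterfaceAgentTrace line (layout assumed from the logs above).
_STATS_RE = re.compile(r"\[VmInterfaceStats: ([^\]]*)\]")


def extract_if_stats(log_line):
    """Return the VmInterfaceStats counters as a dict, or None if absent."""
    match = _STATS_RE.search(log_line)
    if match is None:
        return None
    stats = {}
    for pair in match.group(1).split(","):
        key, _, value = pair.partition("=")
        stats[key.strip()] = int(value)
    return stats


# Shortened versions of messages 4686 and 4688 from the logs above.
msg_4686 = ("UveVMInterfaceAgentTrace:4686 [UveVMInterfaceAgent: vm_name = vm1, "
            "[VmInterfaceStats: in_pkts = 19, in_bytes = 1806, out_pkts = 1, "
            "out_bytes = 42, drop_pkts = 18], in_bw_usage = 481]")
msg_4688 = ("UveVMInterfaceAgentTrace:4688 [UveVMInterfaceAgent: vm_name = vm1, "
            "in_bw_usage = 0, out_bw_usage = 0]")

print(extract_if_stats(msg_4686))
print(extract_if_stats(msg_4688))
```

Applied to the lines above, message 4686 yields the counters that analytics still shows, while message 4688 yields nothing, which is consistent with the stale if_stats in the UVE.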

Tags: vrouter
Anish Mehta (amehta00) wrote :

From: Ashok Singh R <email address hidden>
Date: Wednesday, November 23, 2016 at 4:56 PM
To: Anish Mehta <email address hidden>
Cc: "Ankit Jain (MVI)" <email address hidden>, Hari Prasad Killi <email address hidden>
Subject: Stats behavior query

Hi Anish,

I have some questions regarding bug https://bugs.launchpad.net/bugs/1644125

(1) We have many fields with names starting with raw_ in which we send aggregate values. The exception is UveVMInterfaceAgent.raw_if_stats, where we send diff stats and which does not have the metric="agg" annotation. Shouldn't we add the metric="agg" annotation to this field and send only aggregate values, at least for consistency with the other raw_ attribute fields?

This is really up to you as the client of the library. The end-user does not see the “raw” fields.
You may want to send aggregate values to simplify your code. You can treat it as code-cleanup activity, along with other features of cleanup.
Or, you may want to maintain the current behavior of sending diffs, if you do not need to change code that is not broken.
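
The two conventions discussed here (sending diffs versus sending aggregate values) carry the same information. The sketch below only illustrates the relationship between them; the function names are hypothetical and this is not the Sandesh library's code.

```python
from itertools import accumulate


def diffs_from_aggregates(readings):
    """Turn cumulative counter readings into per-interval deltas."""
    deltas = []
    previous = 0
    for value in readings:
        deltas.append(value - previous)
        previous = value
    return deltas


def aggregates_from_diffs(deltas):
    """Turn per-interval deltas back into cumulative counter readings."""
    return list(accumulate(deltas))


# Example: a drop counter that rises to 18 and then stops increasing.
aggregate_readings = [0, 5, 18, 18]
print(diffs_from_aggregates(aggregate_readings))   # [0, 5, 13, 0]
print(aggregates_from_diffs([0, 5, 13, 0]))        # [0, 5, 18, 18]
```

With aggregates, a sender that keeps reporting an unchanged counter makes "no new drops" explicit as a delta of zero on the receiving side.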

(2) Should the derived stat field UveVMInterfaceAgent.if_stats, which is derived from UveVMInterfaceAgent.raw_if_stats, hold the last non-zero sample sent by the agent in the UVE (as seen in analytics)?

Yes, this is the behavior.
Basically, if_stats is useful for recording statistics, but it does not represent a snapshot of the latest state of this attribute (unlike most UVE attributes).
If we want to expose the latest snapshot, maybe we could show the activity over the last hour.
(The UI team will like this approach: they show this number in network monitoring and will be able to get it from the UVE instead of doing a Stats Query. I'm cc'ing Abhishek.)

35: optional VmInterfaceStats if_stats_1h (stats="raw_if_stats:DSSum:3600")

Actually, there is an issue with the usage of "DSSum" in vrouter.sandesh and interface.sandesh.
I changed the interface in R3.2 and missed updating you about it: the number after DSSum now indicates the number of seconds, not the number of samples.
So, your usage of "DSSum:120" should be replaced with "DSSum:3600".
Can you please change it, or do you want me to?

(3) When we add the attribute metric="agg" to, say, the field UveVMInterfaceAgent.raw_if_stats, will it have any impact on the analytics modules? Is it enough to replace just the newly built agent, or should we build the analytics modules as well?

There is no impact on analytics modules.
Building new vrouter-agent is sufficient.

Regards,
Ashok
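
To illustrate why the DSSum argument matters, here is a minimal sketch of a sum over a time window measured in seconds. It is an illustration only, not the Sandesh implementation: the class name WindowSum is hypothetical, and the 30-second reporting interval is an assumption read off the timestamps of the stats messages in the bug description (12:49:35 and 12:50:05). Under that assumption, 120 samples x 30 s = 3600 s, so an argument of 120 covered one hour when it meant samples, but covers only two minutes when it means seconds.

```python
from collections import deque


class WindowSum:
    """Sum of the samples received within the last window_seconds seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp_seconds, value) pairs, oldest first
        self._total = 0

    def add(self, timestamp, value):
        """Record a sample and return the sum over the current window."""
        self._samples.append((timestamp, value))
        self._total += value
        # Drop samples that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            _, expired = self._samples.popleft()
            self._total -= expired
        return self._total


# 18 drops at t=0, then nothing, sampled every 30 seconds (assumed interval).
two_minutes = WindowSum(120)
one_hour = WindowSum(3600)
for t, drops in [(0, 18), (30, 0), (60, 0), (90, 0), (120, 0)]:
    short_sum = two_minutes.add(t, drops)
    long_sum = one_hour.add(t, drops)

print(short_sum)  # 0: the 18 drops have already left the 120-second window
print(long_sum)   # 18: still inside the 3600-second window
```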

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/26716
Submitter: Ashok Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/26718
Submitter: Ashok Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/26718
Committed: http://github.org/Juniper/contrail-controller/commit/78a311a28be285020bec7e603c3595ccd195786e
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 78a311a28be285020bec7e603c3595ccd195786e
Author: ashoksingh <email address hidden>
Date: Fri Dec 2 15:39:28 2016 +0530

Fix usage of DSSum attribute in Agent UVEs

Earlier this attribute used to take number of samples as argument. Now it takes number of seconds
Updated the argument for DSSum to reflect the number of seconds so that the field represents value for 1 hour.

Also, update the following
1. Use metric="agg" for raw_if_stats of UveVMInterfaceAgent
2. Send last 1 hour drop-stats as UVE for vhost interface.
3. Update comments

Closes-Bug: #1644125
(cherry picked from commit 847459e4198032e4441545bf8d60296885709cb3)

Change-Id: I5f3a69116cd67c2d9fa25cf2c0d89d57b8c47ecc

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/26716
Committed: http://github.org/Juniper/contrail-controller/commit/847459e4198032e4441545bf8d60296885709cb3
Submitter: Zuul (<email address hidden>)
Branch: master

commit 847459e4198032e4441545bf8d60296885709cb3
Author: ashoksingh <email address hidden>
Date: Fri Dec 2 15:39:28 2016 +0530

Fix usage of DSSum attribute in Agent UVEs

Earlier this attribute used to take number of samples as argument. Now it takes number of seconds
Updated the argument for DSSum to reflect the number of seconds so that the field represents value for 1 hour.

Also, update the following
1. Use metric="agg" for raw_if_stats of UveVMInterfaceAgent
2. Send last 1 hour drop-stats as UVE for vhost interface.
3. Update comments

Change-Id: I7cbe038663dd1f3d2680ab830b69f3b24a1a23e9
Closes-Bug: #1644125
