Build 2696: Alarms: When agent process state becomes down, all types of vrouter alarms are being raised

Bug #1533158 reported by Ankit Jain
This bug affects 4 people
Affects                        Status        Importance  Assigned to  Milestone
Juniper Openstack
  (status tracked in Trunk)
  Trunk                        Fix Released  Medium      Anish Mehta

Bug Description

1) Alarms of types VrouterInterface, PartialSysinfoCompute, and AddressMismatchCompute are raised along with the process status alarm when the vrouter agent process is down.

2) For alarm type VrouterInterface, the rule info and operand values are also missing, as shown below:

"any_of": [
    {
        "all_of": [ ]
    }
],
"severity": 4,
"ack": false,
"timestamp": 1452596857815859,
"token": "eyJ0aW1lc3RhbXAiOiAxNDUyNTk2ODU3ODE1ODU5LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNi4xNyJ9",
"type": "VrouterInterface"

Output of http://nodeg13:8081/analytics/uves/vrouter/nodeg20?flat:

{
    "NodeStatus": {
        "deleted": false,
        "disk_usage_info": [
            {
                "partition_space_available_1k": 72376656,
                "partition_space_used_1k": 5433516,
                "partition_name": "/dev/mapper/nodeg20--vg-root",
                "partition_type": "ext4"
            },
            {
                "partition_space_available_1k": 160929,
                "partition_space_used_1k": 67602,
                "partition_name": "/dev/sda1",
                "partition_type": "ext2"
            }
        ],
        "process_status": [
            {
                "instance_id": "0",
                "module_id": "contrail-vrouter-nodemgr",
                "state": "Functional",
                "description": null
            }
        ],
        "process_info": [
            {
                "process_name": "contrail-vrouter-agent",
                "start_count": 1,
                "process_state": "PROCESS_STATE_STOPPED",
                "last_stop_time": "1452596858060193",
                "core_file_list": [ ],
                "last_start_time": "1452596789356888",
                "stop_count": 1,
                "last_exit_time": null,
                "exit_count": 0
            },
            {
                "process_name": "contrail-vrouter-nodemgr",
                "start_count": 1,
                "process_state": "PROCESS_STATE_RUNNING",
                "last_stop_time": null,
                "core_file_list": [ ],
                "last_start_time": "1452596784346269",
                "stop_count": 0,
                "last_exit_time": null,
                "exit_count": 0
            },
            {
                "process_name": "openstack-nova-compute",
                "start_count": 1,
                "process_state": "PROCESS_STATE_RUNNING",
                "last_stop_time": null,
                "core_file_list": [ ],
                "last_start_time": "1452596783283656",
                "stop_count": 0,
                "last_exit_time": null,
                "exit_count": 0
            }
        ]
    },
    "ContrailConfig": {
        "elements": [
            [
                {
                    "fq_name": "[\"default-global-system-config\", \"nodeg20\"]",
                    "uuid": "\"bc915855-5e94-44ea-bd4e-48bcb6fa00a7\"",
                    "parent_uuid": "\"f355badd-33d7-4382-96fc-557b19af2a21\"",
                    "parent_href": "\"http://0.0.0.0:9100/global-system-config/f355badd-33d7-4382-96fc-557b19af2a21\"",
                    "parent_type": "\"global-system-config\"",
                    "perms2": "{\"owner\": \"ff6dc5c4bedb43c787f036f305013f07\", \"owner_access\": 7, \"global_access\": 0, \"share\": []}",
                    "virtual_router_type": "[]",
                    "display_name": "\"nodeg20\"",
                    "id_perms": "{\"enable\": true, \"description\": null, \"creator\": null, \"created\": \"2016-01-12T06:33:08.866429\", \"user_visible\": true, \"last_modified\": \"2016-01-12T10:39:52.831130\", \"permissions\": {\"owner\": \"admin\", \"owner_access\": 7, \"other_access\": 7, \"group\": \"admin\", \"group_access\": 7}, \"uuid\": {\"uuid_mslong\": 13587738674435736810, \"uuid_lslong\": 13640920296712700071}}",
                    "virtual_router_ip_address": "\"10.204.217.60\""
                },
                "nodeg20:Config:contrail-api:0",
                "nodea21:Config:contrail-api:0"
            ],
            [
                {
                    "fq_name": "[\"default-global-system-config\", \"nodeg20\"]",
                    "uuid": "\"bc915855-5e94-44ea-bd4e-48bcb6fa00a7\"",
                    "parent_href": "\"http://0.0.0.0:9100/global-system-config/f355badd-33d7-4382-96fc-557b19af2a21\"",
                    "parent_type": "\"global-system-config\"",
                    "perms2": "{\"owner\": \"ff6dc5c4bedb43c787f036f305013f07\", \"owner_access\": 7, \"global_access\": 0, \"share\": []}",
                    "virtual_router_type": "[]",
                    "display_name": "\"nodeg20\"",
                    "virtual_router_ip_address": "\"10.204.217.60\"",
                    "id_perms": "{\"enable\": true, \"uuid\": {\"uuid_mslong\": 13587738674435736810, \"uuid_lslong\": 13640920296712700071}, \"created\": \"2016-01-12T06:33:08.866429\", \"description\": null, \"creator\": null, \"user_visible\": true, \"last_modified\": \"2016-01-12T06:33:08.866429\", \"permissions\": {\"owner\": \"admin\", \"owner_access\": 7, \"other_access\": 7, \"group\": \"admin\", \"group_access\": 7}}",
                    "parent_uuid": "\"f355badd-33d7-4382-96fc-557b19af2a21\""
                },
                "nodeg13:Config:contrail-api:0"
            ]
        ]
    },
    "UVEAlarms": {
        "alarms": [
            {
                "any_of": [
                    {
                        "all_of": [
                            {
                                "json_operand1_value": "null",
                                "rule": {
                                    "oper": "==",
                                    "operand1": {
                                        "keys": [
                                            "VrouterAgent",
                                            "build_info"
                                        ]
                                    },
                                    "operand2": {
                                        "json_value": "null"
                                    }
                                }
                            }
                        ]
                    }
                ],
                "severity": 4,
                "ack": false,
                "timestamp": 1452596857815492,
                "token": "eyJ0aW1lc3RhbXAiOiAxNDUyNTk2ODU3ODE1NDkyLCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNi4xNyJ9",
                "type": "PartialSysinfoCompute"
            },
            {
                "any_of": [
                    {
                        "all_of": [
                            {
                                "json_operand2_value": "null",
                                "json_operand1_value": "\"10.204.217.60\"",
                                "rule": {
                                    "oper": "not in",
                                    "operand1": {
                                        "keys": [
                                            "ContrailConfig",
                                            "elements",
                                            "virtual_router_ip_address"
                                        ],
                                        "json": 2
                                    },
                                    "operand2": {
                                        "keys": [
                                            "VrouterAgent",
                                            "self_ip_list"
                                        ]
                                    }
                                }
                            },
                            {
                                "json_operand2_value": "null",
                                "json_operand1_value": "\"10.204.217.60\"",
                                "rule": {
                                    "oper": "!=",
                                    "operand1": {
                                        "keys": [
                                            "ContrailConfig",
                                            "elements",
                                            "virtual_router_ip_address"
                                        ],
                                        "json": 2
                                    },
                                    "operand2": {
                                        "keys": [
                                            "VrouterAgent",
                                            "control_ip"
                                        ]
                                    }
                                }
                            }
                        ]
                    }
                ],
                "severity": 3,
                "ack": false,
                "timestamp": 1452596857815775,
                "token": "eyJ0aW1lc3RhbXAiOiAxNDUyNTk2ODU3ODE1Nzc1LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNi4xNyJ9",
                "type": "AddressMismatchCompute"
            },
            {
                "any_of": [
                    {
                        "all_of": [ ]
                    }
                ],
                "severity": 4,
                "ack": false,
                "timestamp": 1452596857815859,
                "token": "eyJ0aW1lc3RhbXAiOiAxNDUyNTk2ODU3ODE1ODU5LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNi4xNyJ9",
                "type": "VrouterInterface"
            },
            {
                "any_of": [
                    {
                        "all_of": [
                            {
                                "json_operand1_value": "\"PROCESS_STATE_STOPPED\"",
                                "rule": {
                                    "oper": "!=",
                                    "operand1": {
                                        "keys": [
                                            "NodeStatus",
                                            "process_info",
                                            "process_state"
                                        ]
                                    },
                                    "operand2": {
                                        "json_value": "\"PROCESS_STATE_RUNNING\""
                                    }
                                },
                                "json_vars": {
                                    "NodeStatus.process_info.process_name": "contrail-vrouter-agent"
                                }
                            }
                        ]
                    }
                ],
                "severity": 3,
                "ack": false,
                "timestamp": 1452596858840714,
                "token": "eyJ0aW1lc3RhbXAiOiAxNDUyNTk2ODU4ODQwNzE0LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNi4xNyJ9",
                "type": "ProcessStatus"
            }
        ]
    }
}
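
For anyone triaging a similar UVE, the dump above can be summarized with a few lines of Python. This is only a sketch: the function name `summarize_vrouter_uve` and the trimmed `sample_uve` are illustrative assumptions, not part of Contrail, and the input is assumed to be the JSON-decoded body of the `?flat` URL shown above.

```python
def summarize_vrouter_uve(uve):
    """Summarize alarm types, rule-less alarms, and non-running processes.

    `uve` is assumed to be the decoded JSON of a vrouter UVE (?flat view).
    """
    alarms = uve.get("UVEAlarms", {}).get("alarms", [])
    alarm_types = sorted(a.get("type", "") for a in alarms)

    # An any_of term with an empty all_of list carries no rule or operand
    # values, which is the VrouterInterface symptom reported in this bug.
    missing_rule_info = sorted(
        a.get("type", "")
        for a in alarms
        if any(not term.get("all_of") for term in a.get("any_of", []))
    )

    processes = uve.get("NodeStatus", {}).get("process_info", [])
    not_running = [
        p.get("process_name")
        for p in processes
        if p.get("process_state") != "PROCESS_STATE_RUNNING"
    ]
    return {
        "alarm_types": alarm_types,
        "missing_rule_info": missing_rule_info,
        "not_running": not_running,
    }


# Trimmed-down version of the output above (illustrative only).
sample_uve = {
    "NodeStatus": {
        "process_info": [
            {"process_name": "contrail-vrouter-agent",
             "process_state": "PROCESS_STATE_STOPPED"},
            {"process_name": "contrail-vrouter-nodemgr",
             "process_state": "PROCESS_STATE_RUNNING"},
        ]
    },
    "UVEAlarms": {
        "alarms": [
            {"type": "PartialSysinfoCompute",
             "any_of": [{"all_of": [{"json_operand1_value": "null"}]}]},
            {"type": "VrouterInterface", "any_of": [{"all_of": []}]},
            {"type": "ProcessStatus",
             "any_of": [{"all_of": [
                 {"json_operand1_value": "\"PROCESS_STATE_STOPPED\""}]}]},
        ]
    },
}

summary = summarize_vrouter_uve(sample_uve)
print(summary)
```

On the trimmed sample, only VrouterInterface is flagged as missing rule info, and only contrail-vrouter-agent is reported as not running.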

== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent inactive
contrail-vrouter-nodemgr active

== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager backup
contrail-discovery:0 active
contrail-schema backup
contrail-svc-monitor backup
ifmap active

== Contrail Database ==
contrail-database: active
supervisor-database: active
contrail-database-nodemgr active
kafka active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

Anish Mehta (amehta00) wrote :

I see the problem with the AddressMismatchCompute alarm.
It should not be raised when the VrouterAgent struct is absent.

For VrouterInterface, I think the plugin is crashing. There should be a message in contrail-alarm-gen.log or contrail-alarm-gen-stdout.log. I will take a look.

Under the current alarm definition, ProcessStatus and PartialSysinfo are valid alarms.
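
The guard described in this comment can be sketched as follows. This is a minimal illustration, not the actual alarm plugin code: the function names are hypothetical, and the UVE is assumed to be a plain dict in which `ContrailConfig.elements` maps attribute names to JSON-encoded strings. The two comparisons are combined with "and" because both rules sit in the same `all_of` list in the alarm output above.

```python
import json


def configured_ip(uve):
    """Return the configured vrouter IP, or None if it is not present."""
    elements = uve.get("ContrailConfig", {}).get("elements", {})
    raw = elements.get("virtual_router_ip_address")
    # Values under ContrailConfig.elements are JSON-encoded strings.
    return json.loads(raw) if raw is not None else None


def address_mismatch_compute(uve):
    """Hypothetical check: True when the alarm should be raised."""
    agent = uve.get("VrouterAgent")
    if agent is None:
        # The agent has not reported (for example, the process is down),
        # so there is nothing to compare against: do not raise the alarm.
        return False
    ip = configured_ip(uve)
    if ip is None:
        return False
    not_in_self_ips = ip not in (agent.get("self_ip_list") or [])
    not_control_ip = ip != agent.get("control_ip")
    return not_in_self_ips and not_control_ip


config = {"elements": {"virtual_router_ip_address": "\"10.204.217.60\""}}

agent_down = {"ContrailConfig": config}
mismatch = {"ContrailConfig": config,
            "VrouterAgent": {"self_ip_list": ["10.0.0.1"],
                             "control_ip": "10.0.0.1"}}
match = {"ContrailConfig": config,
         "VrouterAgent": {"self_ip_list": ["10.204.217.60"],
                          "control_ip": "10.204.217.60"}}

print(address_mismatch_compute(agent_down),
      address_mismatch_compute(mismatch),
      address_mismatch_compute(match))
```

Without the `agent is None` check, a missing VrouterAgent struct makes both operand2 values null, so both comparisons succeed and the alarm fires, which matches the behavior reported in this bug.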

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16436
Submitter: Anish Mehta (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16436
Committed: http://github.org/Juniper/contrail-controller/commit/61cafbf3e7f8377fb5917f684a4753b2e73992f2
Submitter: Zuul
Branch: master

commit 61cafbf3e7f8377fb5917f684a4753b2e73992f2
Author: Anish Mehta <email address hidden>
Date: Fri Jan 22 15:49:55 2016 -0800

Alarmgen should ensure that we are always able to catch the gevent kill exception in worker gevents.
We go into a sleep after a kafka error. We should sleep in a section that can catch exceptions.

AddressMismatchCompute and VrouterInterface alarms should not be raised when the VrouterAgent struct is absent.
AddressMismatchControl alarms should not be raised when the BgpRouterState struct is absent

When we check the object-type in api-server (during ContrailConfig insertion), we must replace "-" by "_"
Use "oper" instead of obj_dict to decide whether the object is being deleted. See:
https://github.com/Juniper/contrail-controller/commit/bf6cba2bb5b061bfd56fa99c72b7b2323f08486f

Change-Id: I2d3f8f66ac8ad8df13db816936e98ed7320c54a0
Closes-Bug: #1533158
Closes-Bug: #1536085
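
The last two points of the commit message (normalizing the object type and keying deletion on "oper") can be illustrated with a small sketch. Everything here is an assumption made for illustration: the notification layout (`oper`, `type`, `obj_dict` keys), the `"DELETE"` value, the helper names, and the `CONFIG_UVE_TYPES` set are hypothetical and are not taken from the api-server source.

```python
# Hypothetical set of object types that get a ContrailConfig struct,
# keyed with underscores.
CONFIG_UVE_TYPES = {"virtual_router", "bgp_router"}


def normalize_object_type(obj_type):
    """Replace "-" by "_" so hyphenated type names match the lookup set."""
    return obj_type.replace("-", "_")


def wants_contrail_config(notification):
    """True if this object type should carry a ContrailConfig struct."""
    obj_type = normalize_object_type(notification.get("type", ""))
    return obj_type in CONFIG_UVE_TYPES


def is_delete(notification):
    """Decide deletion from "oper", not from the presence of obj_dict."""
    return notification.get("oper") == "DELETE"


# A delete notification that still carries an obj_dict (assumed layout).
msg = {
    "oper": "DELETE",
    "type": "virtual-router",
    "obj_dict": {"display_name": "nodeg20"},
}
print(wants_contrail_config(msg), is_delete(msg))
```

A check that inferred deletion from an empty `obj_dict` would treat the message above as an update, which is the kind of mistake that keying on "oper" avoids.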
