tacker

vnf monitoring using ping is not consistent

Bug #1497474 reported by Santosh on 2015-09-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	tacker	Fix Released	Critical	Bob Haddleton

Bug Description

1. Create a VNFD with the following data:
template_name: sample-vnfd-nonparam
description: demo-example

service_properties:
  Id: sample-vnfd
  vendor: tacker
  version: 1

vdus:
  vdu1:
    id: vdu1
    vm_image: cirros-0.3.4-x86_64-uec
    instance_type: m1.tiny

    network_interfaces:
      management:
        network: net_mgmt
        management: true
      pkt_in:
        network: net0
      pkt_out:
        network: net1

    placement_policy:
      availability_zone: nova

    auto-scaling: noop
    monitoring_policy: ping
    failure_policy: respawn

    config:
      param0: key0
      param1: key1

2. Create a VNF based on the above VNFD.
3. Once the VNF instance is fully up, run "sudo ifdown eth0" so that the ping checks fail (verify in tacker.log).
4. The VNF is respawned.
5. After repeating this test several times, the system reaches a state where pings no longer happen and a VNF that is unreachable is no longer respawned.

Tags:

Santosh (ksantosh-cs) wrote on 2015-09-18:

tacker.log (749.5 KiB, text/html)

Bob Haddleton (bob-haddleton) wrote on 2015-09-24:

This is being fixed as part of the monitor driver framework spec implementation. The issue is caused by the monitor thread holding a lock while it loops through the hosting_devices to run the ping check. When it detects a failure and respawns the device, it is still holding the lock, and it calls delete_device(), which calls delete_hosting_device(), which tries to obtain the same lock. The thread is then blocked waiting for itself to release the lock. Since that will never happen, all monitoring stops.

The solution is to use RLock instead of Lock so that the lock is smart enough to recognize when the requesting thread already holds the lock it is requesting. That allows delete_hosting_device() to proceed and monitoring of the new device (and all other devices) can continue.
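
A minimal sketch of the behaviour described above, not the actual tacker code: the Monitor class, its run_once() method, and the device ids are hypothetical stand-ins, and only the delete_hosting_device() name is borrowed from the comment. A timeout is used on acquire() so the example reports the self-block rather than hanging as the real monitor thread did.

```python
import threading


class Monitor:
    """Toy stand-in for the monitoring thread; names are hypothetical."""

    def __init__(self, lock):
        self._lock = lock
        # device id -> did the last ping succeed?
        self.hosting_devices = {"vnf-1": True, "vnf-2": False}

    def delete_hosting_device(self, device_id, timeout=0.5):
        # Needs the same lock that the monitoring pass already holds.
        # Without the timeout, a plain Lock would block here forever.
        if not self._lock.acquire(timeout=timeout):
            return False
        try:
            self.hosting_devices.pop(device_id, None)
            return True
        finally:
            self._lock.release()

    def run_once(self):
        """One monitoring pass; returns ids whose cleanup was blocked."""
        blocked = []
        with self._lock:
            # Iterate over a copy because cleanup mutates the dict.
            for device_id, ping_ok in list(self.hosting_devices.items()):
                if not ping_ok and not self.delete_hosting_device(device_id):
                    blocked.append(device_id)
        return blocked


plain = Monitor(threading.Lock())       # non-reentrant: blocks on itself
reentrant = Monitor(threading.RLock())  # reentrant: same thread may re-acquire

blocked_with_lock = plain.run_once()
blocked_with_rlock = reentrant.run_once()
print("Lock blocked on:", blocked_with_lock)
print("RLock blocked on:", blocked_with_rlock)
```

With threading.Lock the nested acquire from the same thread cannot succeed, so the failed device is never cleaned up. With threading.RLock the owning thread re-acquires the lock, and each acquire is balanced by a release.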

Changed in tacker:
assignee:	nobody → Bob Haddleton (bob-haddleton)
status:	New → In Progress

Sridhar Ramaswamy (srics-r) on 2015-09-25

Changed in tacker:
importance:	Undecided → Critical

OpenStack Infra (hudson-openstack) on 2015-10-02

Changed in tacker:
assignee:	Bob Haddleton (bob-haddleton) → bharaththiruveedula (bharath-ves)

bharaththiruveedula (bharath-ves) on 2015-10-02

Changed in tacker:
assignee:	bharaththiruveedula (bharath-ves) → Bob Haddleton (bob-haddleton)

Sridhar Ramaswamy (srics-r) on 2015-10-08

tags:

added: vnf-health-monitoring

Sridhar Ramaswamy (srics-r) on 2015-10-08

tags:

added: liberty-critical

OpenStack Infra (hudson-openstack) wrote on 2015-10-13: Fix merged to tacker (master)

Reviewed: https://review.openstack.org/224384
Committed: https://git.openstack.org/cgit/stackforge/tacker/commit/?id=1afd26a13b40bd2f8319911690f25103955bb976
Submitter: Jenkins
Branch: master

commit 1afd26a13b40bd2f8319911690f25103955bb976
Author: Bob HADDLETON <email address hidden>
Date: Wed Sep 16 21:03:35 2015 -0500

    Implement Monitoring Framework

    * Changes the monitor function to use a loadable driver

    * Changes the monitoring thread to use a re-entrant lock
      (RLock()) to prevent it from blocking itself during
      recovery actions

    Change-Id: Icf40ffd3123f3b804de16c88164d84077fbf28e2
    Implements: blueprint health-monitoring
    Closes-Bug: 1497474

Changed in tacker:
status:	In Progress → Fix Committed

dharmendra (dharmendra-kushwaha) on 2017-01-27

Changed in tacker:
status:	Fix Committed → Fix Released
