vnf monitoring using ping is not consistent

Bug #1497474 reported by Santosh
This bug affects 1 person
Affects: tacker
Status: Fix Released
Importance: Critical
Assigned to: Bob Haddleton
Milestone: (none)

Bug Description

1. Create a VNFD with the following data:
template_name: sample-vnfd-nonparam
description: demo-example

service_properties:
  Id: sample-vnfd
  vendor: tacker
  version: 1

vdus:
  vdu1:
    id: vdu1
    vm_image: cirros-0.3.4-x86_64-uec
    instance_type: m1.tiny

    network_interfaces:
      management:
        network: net_mgmt
        management: true
      pkt_in:
        network: net0
      pkt_out:
        network: net1

    placement_policy:
      availability_zone: nova

    auto-scaling: noop
    monitoring_policy: ping
    failure_policy: respawn

    config:
      param0: key0
      param1: key1

2. Create a VNF based on the above VNFD.
3. Once the VNF instance is fully up, run "sudo ifdown eth0" on it so that the ping checks fail (verify in tacker.log).
4. The VNF is respawned.
5. After repeating this test several times, the system reaches a state where pings no longer happen and an unreachable VNF is no longer respawned.

Bob Haddleton (bob-haddleton) wrote :

This is being fixed as part of the monitor driver framework spec implementation. The issue is caused by the monitor thread holding a lock while it loops through the hosting_devices to run the ping check. When it detects a failure and respawns the device, it is still holding the lock. It then calls delete_device(), which calls delete_hosting_device(), which tries to obtain the same lock. The thread is then blocked waiting for itself to release the lock. Since that will never happen, all monitoring stops.
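
The self-blocking described above can be reproduced in isolation. The following is only a minimal sketch, not tacker's code: the Monitor class, run_once(), the is_alive callback, and the device data are hypothetical stand-ins, and a short timeout replaces the indefinite wait so the script terminates.

```python
import threading


class Monitor:
    """Hypothetical stand-in for the monitor; not tacker's actual class."""

    def __init__(self):
        self._lock = threading.Lock()  # plain, non-re-entrant lock
        self._hosting_devices = {"vdu1": "10.0.0.5"}  # made-up device data
        self.blocked = False

    def delete_hosting_device(self, device_id):
        # In the real bug this acquire waits forever. The timeout is an
        # assumption added here so the sketch can finish and report.
        if not self._lock.acquire(timeout=0.2):
            self.blocked = True
            return
        try:
            self._hosting_devices.pop(device_id, None)
        finally:
            self._lock.release()

    def run_once(self, is_alive):
        # The monitor loop holds the lock for the whole pass over the devices.
        with self._lock:
            for device_id in list(self._hosting_devices):
                if not is_alive(device_id):
                    # Failure action runs while the lock is still held.
                    self.delete_hosting_device(device_id)


monitor = Monitor()
monitor.run_once(lambda device_id: False)  # simulate a failed ping
print(monitor.blocked)
```

With a plain threading.Lock, the inner acquire cannot succeed because the same thread already holds the lock, so the device is never removed.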

The solution is to use RLock instead of Lock. An RLock is re-entrant: it recognizes when the requesting thread already holds the lock and lets that thread acquire it again without blocking. That allows delete_hosting_device() to proceed, and monitoring of the new device (and all other devices) can continue.
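
As a rough illustration of the re-entrant behaviour (again a sketch with hypothetical names and made-up device data, not the merged change), the snippet below nests an acquisition in one thread and checks that a second thread is still kept out while the lock is held:

```python
import threading

lock = threading.RLock()          # re-entrant lock
devices = {"vdu1": "10.0.0.5"}    # made-up device data
other_thread_got_lock = []


def delete_hosting_device(device_id):
    # Second acquisition by the thread that already holds the lock succeeds.
    with lock:
        devices.pop(device_id, None)


def other_thread():
    # A different thread must not get the lock while it is held.
    got = lock.acquire(blocking=False)
    other_thread_got_lock.append(got)
    if got:
        lock.release()


with lock:  # plays the role of the monitor loop holding the lock
    delete_hosting_device("vdu1")
    t = threading.Thread(target=other_thread)
    t.start()
    t.join()

print(devices)
print(other_thread_got_lock)
```

The nested call completes and removes the device, while the non-blocking attempt from the other thread fails, so mutual exclusion between threads is preserved.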

Changed in tacker:
assignee: nobody → Bob Haddleton (bob-haddleton)
status: New → In Progress
Changed in tacker:
importance: Undecided → Critical
Changed in tacker:
assignee: Bob Haddleton (bob-haddleton) → bharaththiruveedula (bharath-ves)
Changed in tacker:
assignee: bharaththiruveedula (bharath-ves) → Bob Haddleton (bob-haddleton)
tags: added: vnf-health-monitoring
tags: added: liberty-critical
OpenStack Infra (hudson-openstack) wrote : Fix merged to tacker (master)

Reviewed: https://review.openstack.org/224384
Committed: https://git.openstack.org/cgit/stackforge/tacker/commit/?id=1afd26a13b40bd2f8319911690f25103955bb976
Submitter: Jenkins
Branch: master

commit 1afd26a13b40bd2f8319911690f25103955bb976
Author: Bob HADDLETON <email address hidden>
Date: Wed Sep 16 21:03:35 2015 -0500

    Implement Monitoring Framework

     * Changes the monitor function to use a loadable driver

     * Changes the monitoring thread to use a re-entrant lock
       (RLock()) to prevent it from blocking itself during
        recovery actions

    Change-Id: Icf40ffd3123f3b804de16c88164d84077fbf28e2
    Implements: blueprint health-monitoring
    Closes-Bug: 1497474

Changed in tacker:
status: In Progress → Fix Committed
Changed in tacker:
status: Fix Committed → Fix Released