vnf monitoring using ping is not consistent
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tacker | Fix Released | Critical | Bob Haddleton | | |
Bug Description
1. Create a VNFD with the following data:
```yaml
template_name: sample-
description: demo-example
service_properties:
  Id: sample-vnfd
  vendor: tacker
  version: 1
vdus:
  vdu1:
    id: vdu1
    vm_image: cirros-
    instance_type: m1.tiny
    network_
      management:
        network: net_mgmt
        management: true
      pkt_in:
        network: net0
      pkt_out:
        network: net1
    placement_
      availabil
    auto-scaling: noop
    monitoring_
    failure_policy: respawn
    config:
      param0: key0
      param1: key1
```
2. Create a VNF based on the above VNFD.
3. Once the VNF instance is completely up, run "sudo ifdown eth0" so that the ping checks fail (verify this in tacker.log).
4. The VNF is respawned.
5. If the test is repeated several times, the system eventually reaches a state where pings no longer happen and the VNF is no longer respawned when it becomes unreachable.
Changed in tacker: | |
importance: | Undecided → Critical |
Changed in tacker: | |
assignee: | Bob Haddleton (bob-haddleton) → bharaththiruveedula (bharath-ves) |
Changed in tacker: | |
assignee: | bharaththiruveedula (bharath-ves) → Bob Haddleton (bob-haddleton) |
tags: | added: vnf-health-monitoring |
tags: | added: liberty-critical |
Changed in tacker: | |
status: | Fix Committed → Fix Released |
This is being fixed as part of the monitor driver framework spec implementation. The issue is caused by the monitor thread holding a lock while it loops through the hosting_devices to run the ping check. When it detects a failure and respawns the device, it is still holding the lock, and it calls delete_device(), which calls delete_hosting_device(), which tries to obtain the same lock. The thread is then blocked waiting for itself to release the lock. Since that will never happen, all monitoring stops.
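A minimal sketch of this self-deadlock, assuming a plain Python `threading.Lock`. The functions `monitor_pass()` and `delete_hosting_device()` below are hypothetical stand-ins for the monitor loop and the respawn path, not Tacker's actual code. The inner `acquire()` uses a timeout only so the demo returns instead of hanging forever.

```python
import threading


def delete_hosting_device(lock):
    """Hypothetical stand-in: tries to take the lock the monitor already holds.

    A real blocking acquire() would wait forever here; the timeout lets the
    demo report the failure instead.
    """
    acquired = lock.acquire(timeout=0.5)
    if acquired:
        lock.release()
    return acquired


def monitor_pass(lock):
    """One pass of a hypothetical monitor loop that respawns a failed device."""
    with lock:  # lock is held while iterating over the devices
        # ... a ping check fails, and the respawn path re-enters the lock
        return delete_hosting_device(lock)


if __name__ == "__main__":
    # A plain Lock is not reentrant: the same thread cannot acquire it twice.
    print("inner acquire succeeded:", monitor_pass(threading.Lock()))
```

Passing `threading.RLock()` to `monitor_pass()` instead makes the inner acquire succeed, so the function returns `True`.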
The solution is to use an RLock (reentrant lock) instead of a Lock. An RLock tracks which thread owns it, so when the owning thread requests it again, the acquisition succeeds immediately instead of blocking. That allows delete_hosting_device() to proceed, and monitoring of the new device (and all other devices) can continue.
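The RLock-based fix can be illustrated with the sketch below. It rests on some assumptions: `DeviceMonitor`, `run_once()`, `respawn()` and the device dictionary are all hypothetical, and a boolean stands in for the result of the ping check. Only the choice of lock type mirrors the fix described above.

```python
import threading


class DeviceMonitor:
    """Hypothetical ping monitor guarded by a reentrant lock (names illustrative)."""

    def __init__(self, devices):
        self._lock = threading.RLock()  # reentrant: the owning thread may re-acquire
        self._devices = dict(devices)   # device_id -> last ping check succeeded?
        self.respawned = []

    def delete_hosting_device(self, device_id):
        # Takes the same lock the monitor loop already holds. With an RLock the
        # owning thread just increments the recursion count instead of blocking.
        with self._lock:
            self._devices.pop(device_id, None)

    def respawn(self, device_id):
        self.delete_hosting_device(device_id)
        new_id = device_id + "-respawned"
        with self._lock:
            self._devices[new_id] = True  # assume the new instance is reachable
        self.respawned.append(new_id)

    def run_once(self):
        with self._lock:
            # Iterate over a snapshot because respawn() mutates the dict.
            for device_id, reachable in list(self._devices.items()):
                if not reachable:  # stand-in for a failed ping check
                    self.respawn(device_id)


if __name__ == "__main__":
    monitor = DeviceMonitor({"vnf1": False, "vnf2": True})
    monitor.run_once()  # completes instead of deadlocking
    print(monitor.respawned)  # ['vnf1-respawned']
```

With `threading.Lock()` in `__init__`, the same `run_once()` call would block forever inside `delete_hosting_device()`, which matches the symptom reported above.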