Improve logic that determines liveliness of dhcp agent

Bug #1417708 reported by Eugene Nikanorov
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Eugene Nikanorov

Bug Description

Under load (when many networks are being processed), the DHCP agent fails to send heartbeat updates, and neutron-server starts to consider it dead.

The network failover feature works around this by checking whether the DHCP agent has just started up and, if so, giving it an amount of time proportional to the number of networks scheduled to it.

The scheduling logic, however, has no such additional liveliness condition, so a starting agent may be considered dead and a network may be scheduled to an additional agent beyond the configured limit.

To avoid this negative side effect, the agent liveliness check logic needs to be unified.
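
As a rough illustration of the unified check described above, here is a minimal Python sketch. It is not the neutron implementation: the function names, the constants `AGENT_DOWN_TIME` and `STARTUP_TIME_PER_NETWORK`, and their values are all assumptions made for this example. The idea is that an agent with a stale heartbeat is still treated as eligible if it started recently, with the grace period growing with the number of networks it hosts.

```python
from datetime import datetime, timedelta

# Hypothetical values, chosen only for illustration.
AGENT_DOWN_TIME = timedelta(seconds=75)
STARTUP_TIME_PER_NETWORK = timedelta(seconds=2)


def is_agent_down(heartbeat, now):
    """Plain heartbeat check: the last report is older than the threshold."""
    return now - heartbeat > AGENT_DOWN_TIME


def startup_grace(network_count):
    """Grace period after start, proportional to the number of networks.

    Never shorter than the regular heartbeat threshold.
    """
    return max(AGENT_DOWN_TIME, network_count * STARTUP_TIME_PER_NETWORK)


def is_eligible_agent(started_at, heartbeat, network_count, now):
    """Unified liveliness check, usable by both failover and scheduling."""
    if not is_agent_down(heartbeat, now):
        return True
    # Heartbeat is stale: tolerate it only while the agent is still starting.
    return now - started_at < startup_grace(network_count)


if __name__ == "__main__":
    now = datetime(2015, 2, 4, 12, 0, 0)
    just_started = now - timedelta(seconds=100)
    # Stale heartbeat, but the agent started 100 s ago and hosts 100 networks.
    print(is_eligible_agent(just_started, just_started, 100, now))
```

With these assumed values, an agent hosting 100 networks gets a 200-second grace period, while one hosting 10 networks falls back to the 75-second threshold.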

Tags: l3-ipam-dhcp
yong sheng gong (gongysh) wrote :

What do you mean by 'network failover feature'? Where is the feature?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/152891

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/152891
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=28fe7e85cc26d1a0254990ebee1bececae06f374
Submitter: Jenkins
Branch: master

commit 28fe7e85cc26d1a0254990ebee1bececae06f374
Author: Eugene Nikanorov <email address hidden>
Date: Wed Feb 4 15:05:36 2015 +0300

    Unify logic that determines liveliness of DHCP agent

    For DHCP agents sometimes it's not enough to check agent's last heartbeat
    time because in its starting period the agent may fail to send state reports
    because it's busy processing networks.
    In rescheduling logic such DHCP agent is given additional time after start.
    Additional time is proportional to amount of networks the agent is hosting.
    Need to apply the same logic to DHCP agent scheduler to avoid a case
    when starting agent is considered dead and a network gets more hosting
    agents than configured.

    Change-Id: I0fe6244c7d2ed42e4744351be34f251318322c54
    Closes-Bug: #1417708
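
The over-scheduling case mentioned in the commit message can be sketched as follows. This is a simplified model, not neutron's scheduler: `agents_to_add`, its parameters, and the agent names are hypothetical, and the liveliness predicate is passed in so that the two behaviours can be compared side by side.

```python
def agents_to_add(hosting_agents, candidates, agents_per_network, is_alive):
    """Return the extra agents a scheduler would assign to one network.

    hosting_agents: agents already hosting the network.
    candidates: all known agents.
    agents_per_network: configured number of agents per network.
    is_alive: predicate deciding whether an agent counts as alive.
    """
    live_hosts = [a for a in hosting_agents if is_alive(a)]
    missing = agents_per_network - len(live_hosts)
    if missing <= 0:
        return []
    hosting = set(hosting_agents)
    free = [a for a in candidates if a not in hosting and is_alive(a)]
    return free[:missing]


if __name__ == "__main__":
    hosting = ["agent-1"]  # agent-1 is starting up and has missed heartbeats
    everyone = ["agent-1", "agent-2"]

    # Heartbeat-only check: the starting agent looks dead.
    heartbeat_only = lambda agent: agent != "agent-1"
    # Unified check: the starting agent is still within its grace period.
    unified = lambda agent: True

    print(agents_to_add(hosting, everyone, 1, heartbeat_only))  # ['agent-2']
    print(agents_to_add(hosting, everyone, 1, unified))         # []
```

With the heartbeat-only predicate the network ends up on two agents although only one is configured; with the unified predicate no extra agent is assigned.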

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0