dhcp-agent: too many pending messages for inactive node

Bug #1433940 reported by Han Zhou
This bug affects 1 person
Affects: neutron
Status: New
Importance: Medium
Assigned to: Han Zhou

Bug Description

The Neutron server keeps sending notification messages (e.g. port create/update/delete, network create/update/delete, etc.) to all dhcp-agents, including inactive ones. As a result, after the server has been running for a long time, a huge number of pending messages can accumulate in rabbitmq for those inactive nodes.

The consequence is that when an inactive node becomes active again, it gets stuck if there are too many messages to handle (how many depends on how long the agent stayed inactive), and it sometimes fails to start properly due to errors like:

ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 104] Connection reset by peer

So the proposal is to send notifications to active nodes only.

Tags: l3-ipam-dhcp
Han Zhou (zhouhan) wrote :

In the code dhcp_rpc_agent_api.py, the logic is:
        enabled_agents = [x for x in agents if x.admin_state_up]
        active_agents = [x for x in agents if x.is_active]
        len_enabled_agents = len(enabled_agents)
        len_active_agents = len(active_agents)
        if len_active_agents < len_enabled_agents:
            LOG.warn(_LW("Only %(active)d of %(total)d DHCP agents associated "
                         "with network '%(net_id)s' are marked as active, so "
                         "notifications may be sent to inactive agents."),
                     {'active': len_active_agents,
                      'total': len_enabled_agents,
                      'net_id': network_id})

I wonder why the code only prints a warning instead of simply not sending to those nodes. Alternatively, we could send only to nodes that are both enabled and active.
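To make the suggestion concrete, here is a minimal sketch of the filtering idea. It is not the actual Neutron code: the `Agent` class and the `select_notification_targets` helper are hypothetical stand-ins, and only the `admin_state_up` and `is_active` attribute names are taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Agent:
    """Hypothetical stand-in for a DHCP agent record (not the Neutron model)."""
    host: str
    admin_state_up: bool  # administratively enabled
    is_active: bool       # agent is currently considered alive


def select_notification_targets(agents: Iterable[Agent]) -> List[Agent]:
    """Return only the agents that are both enabled and active.

    Assumption: skipping inactive agents is acceptable because an agent
    that comes back can resynchronize its state instead of replaying a
    backlog of queued notifications.
    """
    return [a for a in agents if a.admin_state_up and a.is_active]


if __name__ == "__main__":
    agents = [
        Agent("node-1", admin_state_up=True, is_active=True),
        Agent("node-2", admin_state_up=True, is_active=False),
        Agent("node-3", admin_state_up=False, is_active=True),
    ]
    for agent in select_notification_targets(agents):
        print(agent.host)
```

With this kind of filter, a notification for the network would be cast only to node-1, so no messages would pile up in the queues of node-2 or node-3 while they are down or disabled.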

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/165749

Changed in neutron:
assignee: nobody → Han Zhou (zhouhan)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Han Zhou (<email address hidden>) on branch: master
Review: https://review.openstack.org/165749

Changed in neutron:
importance: Undecided → Medium
status: In Progress → New
Gary Kotton (garyk) wrote :

This looks like it may be related to https://launchpad.net/bugs/1505166.
