neutron-dhcp-agent does not recover known ports cache after restart

Bug #1470612 reported by Shraddha Pandhe
This bug affects 4 people
Affects   Status   Importance  Assigned to  Milestone
neutron   Expired  Medium      Unassigned

Bug Description

When the agent restarts, it loses its previous network cache. As soon as the agent starts, as part of "__init__", it rebuilds that cache [1]. But it does not put the ports in there [2].

In sync_state, the agent tries to enable/disable networks by checking the diff between Neutron's state and the network cache that it just built [3]. It enables any NEW networks and disables any DELETED networks, but it does nothing for PREVIOUSLY KNOWN NETWORKS, so their subnets and ports remain empty lists.

Now, if such a port is deleted, [4] will return None and the port will never get deleted from the config.

Filing this bug based on my conversation with Kevin Benton on IRC [5]

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L68
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L79-L86
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L154-L171
[4] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L349
[5] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2015-07-01.log.html
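
The failure mode described above can be illustrated with a minimal, self-contained sketch. `ToyNetworkCache`, `Network`, `Port` and `handle_port_delete` are hypothetical stand-ins invented for illustration, not Neutron's actual classes. The sketch only assumes what the description states: the cache is rebuilt at startup with empty port lists, and the port-delete path looks the port up in that cache.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Port:
    id: str
    network_id: str


@dataclass
class Network:
    id: str
    ports: List[Port] = field(default_factory=list)


class ToyNetworkCache:
    """Hypothetical stand-in for the agent's in-memory network cache."""

    def __init__(self) -> None:
        self._networks: Dict[str, Network] = {}
        self._port_index: Dict[str, Port] = {}

    def put(self, network: Network) -> None:
        # Drop index entries left over from an older copy of this network.
        old = self._networks.get(network.id)
        if old is not None:
            for port in old.ports:
                self._port_index.pop(port.id, None)
        self._networks[network.id] = network
        for port in network.ports:
            self._port_index[port.id] = port

    def get_port_by_id(self, port_id: str) -> Optional[Port]:
        return self._port_index.get(port_id)


def handle_port_delete(cache: ToyNetworkCache, port_id: str,
                       config_entries: Set[str]) -> bool:
    """Remove a port's config entry only if the cache knows the port."""
    port = cache.get_port_by_id(port_id)
    if port is None:
        return False  # unknown port: the stale entry is left behind
    config_entries.discard(port.id)
    return True


# State left over from before the restart: port "p1" has a config entry.
config_entries = {"p1"}

# After the restart, the cache is rebuilt with an empty port list (assumed).
cache = ToyNetworkCache()
cache.put(Network(id="net1", ports=[]))

removed = handle_port_delete(cache, "p1", config_entries)
print(removed, config_entries)
```

In this toy model the delete handler returns without touching the config, because the lookup finds nothing. That is the symptom reported here, under the assumption that nothing repopulates the ports between startup and the delete event.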

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
Revision history for this message
shihanzhang (shihanzhang) wrote :

I have checked this on the master branch. It is a bug, and it is easy to reproduce.

Changed in neutron:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
venkata anil (anil-venkata) wrote :

@shihanzhang

Could you kindly explain the steps you used to reproduce this bug?

Thanks
Anil

Revision history for this message
Shraddha Pandhe (shraddha-pandhe) wrote :

@Anil,

I had a discussion about this with Kevin and we had planned on working on this together. Assigning back to me.

Changed in neutron:
assignee: venkata anil (anil-venkata) → Shraddha Pandhe (shraddha-pandhe)
Revision history for this message
venkata anil (anil-venkata) wrote :

no problem, thanks

Revision history for this message
venkata anil (anil-venkata) wrote :

Please also check whether this change is related: https://review.openstack.org/#/c/197937/

Thanks
Anil

Revision history for this message
shihanzhang (shihanzhang) wrote :

@venkata anil
Steps to reproduce:
1. Create a network and a subnet.
2. Create VMs in this network.
3. Restart the DHCP agent for this network.
4. Delete a VM in this network.

You will find that the deleted VM's IP and MAC are still in the DHCP leases file.
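
One way to check the outcome of step 4 is to search the agent's DHCP files for the deleted VM's addresses. The helper below is only a sketch: `find_stale_entries` is a hypothetical name, and the sample text is made-up data in a dnsmasq-style leases layout (expiry, MAC, IP, hostname, client id), which is an assumption about what the file looks like on a given deployment. It works on text that has already been read, so no file location is assumed.

```python
import re
from typing import List


def find_stale_entries(file_text: str, mac: str, ip: str) -> List[str]:
    """Return the lines that still mention the given MAC or IP address."""
    mac = mac.lower()
    ip = ip.lower()
    stale = []
    for line in file_text.splitlines():
        # Split on whitespace and commas so that space-separated and
        # comma-separated layouts are both handled, and compare whole
        # fields so that 10.0.0.3 does not match 10.0.0.30.
        fields = [f.lower() for f in re.split(r"[\s,]+", line.strip()) if f]
        if mac in fields or ip in fields:
            stale.append(line)
    return stale


# Made-up sample content, for illustration only.
sample = (
    "1436000000 fa:16:3e:00:00:01 10.0.0.3 host-10-0-0-3 *\n"
    "1436000000 fa:16:3e:00:00:02 10.0.0.30 host-10-0-0-30 *\n"
)

matches = find_stale_entries(sample, "FA:16:3E:00:00:01", "10.0.0.3")
print(matches)
```

An empty result after deleting the VM would suggest the entry was cleaned up; a non-empty result matches the behaviour reported in this comment.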

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Shraddha Pandhe, have you submitted a patch for this bug?

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Shraddha Pandhe, have you submitted a patch for this bug recently?

Revision history for this message
Sridhar Gaddam (sridhargaddam) wrote :

@Shraddha, I'm working on a related patch [1] and this bug came up during the review phase.
If you plan to submit a patch for this issue, let me know. Otherwise, I can look into it.

[1] https://review.openstack.org/#/c/205888/3/neutron/agent/dhcp/agent.py

Revision history for this message
Shraddha Pandhe (shraddha-pandhe) wrote :

Hi @shihanzhang, @Sridhar,

I am working on the patch. Looking at your bug and related patch, it seems this bug is about the dhcp port and not instance ports. Also the scenario is not about agent restart.

It seems your patch is only slightly related to this bug. I will talk to you on IRC if I need any input from you. Thanks!

Revision history for this message
Sridhar Gaddam (sridhargaddam) wrote :

Sure @Shraddha, please go ahead and propose a patch.

Revision history for this message
Shraddha Pandhe (shraddha-pandhe) wrote :

@shihanzhang, you said you were able to reproduce the issue, right?

Looking at the code one more time, I see that the agent recovers all the networks, and not just the known networks. Following the code,

1. __init__ gets called which stores subnets = [] and ports = [] for each network
2. init_host gets called which calls sync_state with networks=None
3. In sync_state, looking at [1], only_nets is going to be [], which is going to trigger safe_configure_dhcp_for_network for all the networks, and not just the existing ones.
4. This call eventually leads to calling spawn_process on the driver, which refreshes all the config files - host, opts, lease etc.

Am I missing something? When I initially saw this issue, I was using a patched version of Juno. I haven't tried using Master yet.

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L166-L167
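
The selection logic described in step 3 can be paraphrased as a small pure function. This is a sketch of the behaviour as described in this comment and in the bug description, not a copy of the agent's code; `networks_to_configure` and its parameters are hypothetical names.

```python
from typing import Iterable, Optional, Set


def networks_to_configure(
    active_ids: Iterable[str],
    known_ids: Iterable[str],
    networks: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Pick the networks a sync pass would (re)configure.

    An empty or missing `networks` argument is treated as "resync
    everything" (an assumption taken from the walkthrough above).
    """
    only_nets = set(networks or [])
    known = set(known_ids)
    selected = set()
    for net_id in active_ids:
        if not only_nets or net_id not in known or net_id in only_nets:
            selected.add(net_id)
    return selected


# At startup the cache already lists both networks, with empty port lists.
known = {"net1", "net2"}
active = {"net1", "net2", "net3"}

# networks=None: every active network is selected, known or not.
full_sync = networks_to_configure(active, known)
print(sorted(full_sync))

# A targeted resync picks the requested network plus any unknown one.
targeted = networks_to_configure(active, known, ["net1"])
print(sorted(targeted))
```

Under these assumptions, a startup sync with no specific networks selects the previously known networks as well, which is the point made in the walkthrough.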

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Shraddha Pandhe, maybe it was a problem with my devstack environment when I tested it in comment #6. Today I tested with the latest master code and the problem does not occur. I then read the code in /dhcp/agent.py carefully, and I think your analysis is correct: the sync_state function calls 'configure_dhcp_for_network', and when the driver's 'enable' call succeeds, it puts the network, which includes the 'network, subnet, ports' info, in the cache with 'self.cache.put(network)':
    def configure_dhcp_for_network(self, network):
        if not network.admin_state_up:
            return

        enable_metadata = self.dhcp_driver_cls.should_enable_metadata(
                self.conf, network)
        dhcp_network_enabled = False

        for subnet in network.subnets:
            if subnet.enable_dhcp:
                if self.call_driver('enable', network):
                    dhcp_network_enabled = True
                    self.cache.put(network)
                break
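
The effect of `self.cache.put(network)` on the placeholder entry can be sketched as follows. `ToyCache` is a hypothetical dictionary-backed stand-in, not Neutron's cache class; the only assumption is the one stated in this comment, namely that the network object passed to `put` carries its subnets and ports.

```python
class ToyCache:
    """Hypothetical stand-in for the agent's network cache."""

    def __init__(self):
        self.networks = {}     # network id -> network dict
        self.port_lookup = {}  # port id -> network id

    def put(self, network):
        # Replace any earlier entry for this network, including its ports.
        old = self.networks.get(network["id"])
        if old:
            for port in old["ports"]:
                self.port_lookup.pop(port["id"], None)
        self.networks[network["id"]] = network
        for port in network["ports"]:
            self.port_lookup[port["id"]] = network["id"]

    def get_port_by_id(self, port_id):
        net_id = self.port_lookup.get(port_id)
        if net_id is None:
            return None
        for port in self.networks[net_id]["ports"]:
            if port["id"] == port_id:
                return port
        return None


cache = ToyCache()

# Placeholder entry created at startup: no subnets, no ports.
cache.put({"id": "net1", "subnets": [], "ports": []})
before = cache.get_port_by_id("p1")

# Entry stored again after a successful 'enable', now with full details.
cache.put({"id": "net1", "subnets": [{"id": "s1"}], "ports": [{"id": "p1"}]})
after = cache.get_port_by_id("p1")

print(before, after)
```

In this toy model the second `put` replaces the empty placeholder, so a later lookup of the port succeeds.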

Revision history for this message
Shraddha Pandhe (shraddha-pandhe) wrote :

@shihanzhang, should I mark this bug as Invalid then?

Revision history for this message
Kyle Mestery (mestery) wrote :

Moving priority down a bit while Shraddha determines if this is a valid bug or not.

Changed in neutron:
importance: High → Medium
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 172 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
assignee: Shraddha Pandhe (shraddha-pandhe) → nobody
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired