port_update from nova could potentially occur after get_device_details/update_device_up

Bug #1274160 reported by Mathieu Rohon
This bug affects 3 people
Affects   Status    Importance   Assigned to     Milestone
neutron   Invalid   Medium       Mathieu Rohon

Bug Description

During live migration, if neutron processes port_update(binding:host) after processing the RPC messages get_device_details/update_device_up, the port will remain in the BUILD state.

This is a race condition that I have never seen, but I think it could potentially occur, since the agent could send its RPC messages before neutron receives the API call from nova.
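
The ordering problem can be illustrated with a small toy model. This is only a sketch, not neutron code: the `Port` class and the three functions are hypothetical stand-ins named after the calls mentioned above, and their behavior (a details request moves the port to BUILD, an "up" report from a host other than the bound one is ignored, a binding change leaves the status alone) is an assumption taken from the description in this report.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Toy stand-in for a neutron port; not the real ML2 model."""
    bound_host: str
    status: str = "ACTIVE"


def get_device_details(port: Port, agent_host: str) -> None:
    # Assumption: a details request moves the port to BUILD.
    port.status = "BUILD"


def update_device_up(port: Port, agent_host: str) -> None:
    # Assumption: an "up" report only counts when it comes from the bound host.
    if agent_host == port.bound_host:
        port.status = "ACTIVE"


def port_update(port: Port, new_host: str) -> None:
    # Assumption: changing binding:host does not touch the status by itself.
    port.bound_host = new_host


def migrate(api_call_first: bool) -> str:
    """Run a live migration from 'source' to 'dest' in one of two orders."""
    port = Port(bound_host="source")
    if api_call_first:
        port_update(port, "dest")
    # The agent on the destination host notices the plugged device.
    get_device_details(port, "dest")
    update_device_up(port, "dest")
    if not api_call_first:
        # The API call from nova is processed only after the agent's RPCs.
        port_update(port, "dest")
    return port.status


print(migrate(api_call_first=True))   # ACTIVE
print(migrate(api_call_first=False))  # BUILD
```

Under these assumptions, the port only ends up stuck in BUILD when the binding change is processed after the agent's RPC messages.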

Tags: ml2
Changed in neutron:
assignee: nobody → Mathieu Rohon (mathieu-rohon)
tags: added: ml2
Eugene Nikanorov (enikanorov) wrote :

Mathieu, is there any progress with this bug?

Changed in neutron:
importance: Undecided → Medium
status: New → Incomplete
Sean Dague (sdague) wrote :

Marking as invalid as it's a theory yet to be seen.

Sean Dague (sdague)
no longer affects: nova
Mathieu Rohon (mathieu-rohon) wrote :

@haruka was able to reproduce this bug using pdb.

I think we should make sure that nova's call to neutron, update_port(binding:new_host), is handled before nova-compute plugs the port into the bridge.

Terry Wilson (otherwiseguy) wrote :

I may have hit this here: http://logs.openstack.org/36/143236/8/gate/gate-tempest-dsvm-neutron-full-2/90f4a61/console.html

port_state remained in the BUILD state when it was expected to be ACTIVE, anyway.

Mathieu Rohon (mathieu-rohon) wrote :

@Terry: I don't know how interface attachment works, but it seems that the same scenario might occur: the agent sends update_device_up for the attached interface before nova has sent its port_update(binding:host).
This would result in leaving the port in the BUILD state.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/163178

Changed in neutron:
status: Incomplete → In Progress
Changed in neutron:
assignee: Mathieu Rohon (mathieu-rohon) → Assaf Muller (amuller)
Assaf Muller (amuller)
Changed in neutron:
assignee: Assaf Muller (amuller) → nobody
Mike Kolesnik (mkolesni) wrote :

I saw a manifestation of the bug using HA router ports.

For me it happened in a setup with 2 network nodes, where the sequence of events was as follows:
1. Port is bound to net node A
2. Port is bound to net node B (overwriting A's binding info)
3. B requests the port info
3.1. Port's status updated to BUILD
4. A requests the port info (port status not changed since it's already BUILD)
5. B reports port as UP
5.1. Port status updated to ACTIVE
6. B requests the port info (ancillary port treatment)
6.1. Port's status updated to BUILD
7. B reports port as UP (ancillary port treatment)
7.1. Port status updated to ACTIVE
8. A reports port as UP (ignored since it's "bound" on B)
9. A requests the port info (ancillary port treatment)
9.1. Port's status updated to BUILD
10. A reports port as UP (ancillary port treatment, ignored since it's "bound" on B)

Seems that the fix proposed by Mathieu will solve this.
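
The sequence above can be replayed with a small toy model. This is a sketch only, not neutron code: the `Port` class, its method names, and the initial DOWN status are hypothetical, and the rules it encodes (an info request sets BUILD unless the port is already in BUILD, an UP report is honored only from the bound node) are read off the numbered steps.

```python
class Port:
    """Toy port that follows the rules implied by the numbered steps."""

    def __init__(self):
        self.bound_host = None
        self.status = "DOWN"  # assumed initial status

    def bind(self, host):
        self.bound_host = host

    def request_info(self, host):
        # Any node asking for the port info moves the port to BUILD.
        if self.status != "BUILD":
            self.status = "BUILD"

    def report_up(self, host):
        # Reports from a node other than the bound one are ignored.
        if host == self.bound_host:
            self.status = "ACTIVE"


# Steps 1 to 10 from the sequence, as (action, net node) pairs.
EVENTS = [
    ("bind", "A"), ("bind", "B"),
    ("request_info", "B"), ("request_info", "A"),
    ("report_up", "B"),
    ("request_info", "B"), ("report_up", "B"),
    ("report_up", "A"),
    ("request_info", "A"), ("report_up", "A"),
]


def replay(events):
    """Apply the events in order and record the status after each one."""
    port = Port()
    history = []
    for action, host in events:
        getattr(port, action)(host)
        history.append(port.status)
    return history


print(replay(EVENTS)[-1])  # BUILD
```

In this model the last info request from A (step 9) puts the port back in BUILD, and A's UP report (step 10) cannot clear it because the port is bound on B.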

Changed in neutron:
assignee: nobody → Mathieu Rohon (mathieu-rohon)
Mathieu Rohon (mathieu-rohon) wrote :

Actually, once nova sends the port_update() message that sets the new host of the port, ML2 will send update_port() to every agent.
The agent that is hosting the port will then resend get_device_details()/update_device_up(), and the port will not remain in the BUILD state.

That's the reason why I'm invalidating the bug.
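
The resend behavior described in this comment can be sketched with a toy model. This is not neutron code: the `Server` and `Agent` classes, the `on_port_update` callback, and the port id "p1" are hypothetical, and the model assumes that only the destination agent has the device plugged when the notification arrives.

```python
class Server:
    """Toy stand-in for the neutron server side; not the real ML2 plugin."""

    def __init__(self):
        self.bound_host = {"p1": "source"}
        self.status = {"p1": "ACTIVE"}
        self.agents = []

    def get_device_details(self, port_id, host):
        # Assumption: a details request moves the port to BUILD.
        self.status[port_id] = "BUILD"

    def update_device_up(self, port_id, host):
        # Assumption: only the bound host can move the port to ACTIVE.
        if self.bound_host[port_id] == host:
            self.status[port_id] = "ACTIVE"

    def port_update(self, port_id, new_host):
        self.bound_host[port_id] = new_host
        # Every agent is notified about the updated port.
        for agent in self.agents:
            agent.on_port_update(self, port_id)


class Agent:
    def __init__(self, host, local_devices):
        self.host = host
        self.local_devices = set(local_devices)

    def on_port_update(self, server, port_id):
        # Only the agent that hosts the device resends its RPC messages.
        if port_id in self.local_devices:
            server.get_device_details(port_id, self.host)
            server.update_device_up(port_id, self.host)


def run_racy_migration():
    server = Server()
    source = Agent("source", [])
    dest = Agent("dest", ["p1"])
    server.agents = [source, dest]
    # Racy order: the destination agent's RPCs arrive before the API call.
    server.get_device_details("p1", "dest")
    server.update_device_up("p1", "dest")  # ignored, still bound to source
    server.port_update("p1", "dest")       # triggers the resend
    return server.status["p1"]


print(run_racy_migration())  # ACTIVE
```

With the notification and resend in place, the port reaches ACTIVE in this model even when the agent's first RPC messages arrive too early.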

The behavior described by Mike in comment #7 will be tracked by bug 1416933.

Changed in neutron:
status: In Progress → Invalid