[RFE] Port status update

Bug #1598081 reported by Carlos Goncalves on 2016-07-01
Affects: neutron
Importance: Wishlist
Assigned to: Carlos Goncalves

Bug Description

The Neutron port 'status' field represents the current status of a port in the cloud infrastructure. The field can take one of the following values: 'ACTIVE', 'DOWN', 'BUILD' or 'ERROR'.

At present, if a network event occurs in the data plane (e.g. a virtual or physical switch or one of its ports fails, a cable gets pulled unintentionally, the infrastructure topology changes, etc.), connectivity to logical ports may be affected and tenants' services interrupted. When tenants or cloud administrators look up the status of their resources (e.g. Nova instances and the services running in them, network ports, etc.), everything will wrongly appear to be fine, because Neutron keeps reporting the port 'status' as 'ACTIVE'.

Many SDN controllers managing network elements have the ability to detect and report network events to upper layers, allowing their users to be notified of changes and react accordingly. Neutron could consume such information to update the 'status' field of the affected logical ports and, additionally, emit a notification on the message bus.

However, Neutron lacks a way to receive such information, whether through e.g. an ML2 driver or the REST API (the 'status' field is read-only). Both of these approaches, as well as others, have pros and cons. This RFE intends to trigger a discussion on how Neutron could be improved to receive fault/change events from SDN controllers, or even from third parties not in charge of controlling the network (e.g. monitoring systems, human admins).
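The mismatch described above can be sketched in a few lines of Python. This is purely illustrative: port_view() is a made-up helper, not any Neutron API; it only shows that a data-plane fault leaves the control-plane 'status' untouched.

```python
# Sketch of the gap this RFE describes: the control-plane status a
# client sees vs. the actual data-plane connectivity.  PORT_STATUSES
# mirrors the values named in this report; port_view() is hypothetical.

PORT_STATUSES = {"ACTIVE", "DOWN", "BUILD", "ERROR"}

def port_view(control_plane_status, data_plane_ok):
    """Return what an API client sees next to what is actually true."""
    assert control_plane_status in PORT_STATUSES
    return {
        "reported_status": control_plane_status,  # what Neutron returns
        "actual_connectivity": data_plane_ok,     # what the tenant experiences
    }

# A switch port fails in the data plane; Neutron still reports ACTIVE.
view = port_view("ACTIVE", data_plane_ok=False)
print(view["reported_status"], view["actual_connectivity"])
```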

Changed in neutron:
importance: Undecided → Wishlist
Changed in neutron:
status: New → Confirmed

Incidentally, there is a public Python API being worked on [1] to give ML2 drivers the ability to set the port status. Its drawback is that only ML2 drivers can call it, according to their own logic for detecting faults, etc.

In reality, Neutron is considered the source of truth for Neutron ports, and allowing REST clients to force a status change on a port clearly violates that. An API client can always force an admin_state_up change, which 'disables' the port from the data plane.

[1] https://review.openstack.org/#/c/336068/

Some food for thought: ideally, what you need can always be realized via some sort of callback/ML2 mechanism that accesses the internals of the Neutron logical model. That obviously has the drawback of tighter integration with Neutron, but if you care about forcing a status change of Neutron ports you are coupled with Neutron anyway :)

I'd lean more towards providing a Python API for triggering a port status change than changing the REST API, but I am happy to hear other opinions.

Changed in neutron:
status: Confirmed → Triaged
status: Triaged → Confirmed
Carlos Goncalves (cgoncalves) wrote :

Yes, [1] is in line with this RFE. I may be missing how exactly a mech driver would be triggered to invoke set_status(), as no hints are given there or in bug #1587401. I have some thoughts on that and very briefly mentioned two of many possible approaches in this RFE description. A bit more --verbose:

1) allow read-write access to the Port 'status' field through the REST API (currently read-only)
   + easy to implement
   + SDN controllers and 3rd parties could call as REST API clients
   - conceptually, the call should flow bottom-up

2) introduce a 'force-down' parameter on Port -- similar to what was done in Nova (force mark host down)
   + easy to implement
   + SDN controllers and 3rd parties could call as REST API clients
   - could potentially conflict with admin_state_up (its role is actually not that well defined/formalized, but...)
   - conceptually, the call should flow bottom-up

3) extend each mech driver in such a way that it can receive callbacks (e.g. via REST, RPC, etc.) from SDN controllers; perhaps that's the idea [1] has in mind?
   + each mech driver could have its own, different implementation
   - lack of a common approach across mech drivers, or duplicated code if many implement it the same way
   - N listeners for N mech drivers, plus authentication and authorization would be needed to validate the caller
   - tightly coupled with mech drivers, supporting updates only from them

4) a new Neutron plugin receiving callbacks from SDN controllers, generating a notification message and dispatching it via the Neutron callback system [2]. An internal Neutron component (e.g. a mech driver) subscribed to that event (e.g. type == 'port_status_update' and source == 'sdn_controller_X') would be called back
   + the plugin could be extended to support REST, RPC, etc. calls
   + 3rd parties could also invoke it
   + works both bottom-up and top-down; I'm calling this the "roller coaster" flow approach :-)
   + not tightly coupled to a particular Neutron component (e.g. mech driver)
   +/- multiple Neutron components could be called back simultaneously; possibly beneficial for cases like neutron servers (FWaaS, LBaaS) that could use such information to update their service status quickly/in parallel with the Neutron port status (just "thinking out loud"...)
   - confusing?

[1] https://review.openstack.org/#/c/336068/
[2] http://docs.openstack.org/developer/neutron/devref/callbacks.html
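Approach 4 above can be sketched with a toy publish/subscribe registry standing in for the Neutron callback system [2]. Everything here is an illustrative stand-in, not neutron-lib's actual API: subscribe()/notify(), the 'port_status_update' event and the 'sdn_controller_X' source are taken from the comment text, and the listener plays the role of a subscribed internal component.

```python
# Toy registry mimicking the dispatch flow of approach 4: an external
# fault report comes in, is published as an event, and any subscribed
# Neutron component (e.g. a mech driver) is called back.
from collections import defaultdict

_subscribers = defaultdict(list)

def subscribe(callback, resource, event):
    _subscribers[(resource, event)].append(callback)

def notify(resource, event, **payload):
    for cb in _subscribers[(resource, event)]:
        cb(resource, event, **payload)

received = []

def mech_driver_listener(resource, event, **payload):
    # A subscribed component reacting only to events from a known source.
    if payload.get("source") == "sdn_controller_X":
        received.append((payload["port_id"], payload["status"]))

subscribe(mech_driver_listener, "port", "port_status_update")

# The hypothetical service plugin receives a fault report from an SDN
# controller (or monitoring system) and dispatches it to subscribers.
notify("port", "port_status_update",
       source="sdn_controller_X", port_id="p1", status="DOWN")
print(received)  # [('p1', 'DOWN')]
```

Note how the listener filters on the event source: a subscriber uninterested in 'sdn_controller_X' would simply ignore the callback, which is the loose coupling the approach aims for.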

Carlos Goncalves (cgoncalves) wrote :

Comment #3 item 4): s/like neutron servers/like neutron services/

So far it seems I am the only one voicing an opinion.

Having said that, the mechanism to invoke the port status change is left to the mech driver/plugin managing that port, i.e. it is implementation specific. For instance, a plugin that uses an SDN controller may already have some sort of sync mechanism that could be piggybacked to reflect status changes on the port.

This means that only internal parts of a Neutron system can affect the port status, and that's OK because only Neutron has authority over a Neutron port; no external entity does. An API consumer may disable a port via admin_state_up. Now, if you think the admin_state_up behavior is not well defined for a port, rather than adding a new attribute I'd invite you to fix the one we already have :)

Changed in neutron:
status: Confirmed → Triaged

Agreed with @armax on that one. Drivers should be responsible for port states. We should provide them with a way to report back to Neutron, if they have such monitoring ability. Indeed, a boolean may not be enough to reflect the different states of a port. If so, we should consider exposing another, more flexible read-only attribute on the resource that could be populated by the core plugin.

Carlos Goncalves (cgoncalves) wrote :

I agree once more that drivers should be responsible for port states, hence approach 3 (in comment #3). However, I still see a gap that prevents other systems (e.g. monitoring systems) from reporting failures to Neutron, and thus approach 4 (as a new Neutron service plugin) could help fill that gap. Infrastructures may use those external systems because the SDN controller may not support reporting of failures, or its Neutron driver may not support receiving those notifications (i.e. does not implement approach 3), or even because the faulty network infrastructure may not be managed by an SDN controller at all.

It does not need to be an either-or situation between the mentioned approaches. I am saying there is potentially space and need for both approaches 3 and 4.

Look at bug 1575146 for further details. As of now, the team suggests developing a spec so that we can discuss more formally with a wider audience.

Fix proposed to branch: master
Review: https://review.openstack.org/351675

Changed in neutron:
assignee: nobody → Carlos Goncalves (cgoncalves)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/351675
Committed: https://git.openstack.org/cgit/openstack/neutron-specs/commit/?id=e919701c29d556eb9eeb014180d854c04bb46775
Submitter: Jenkins
Branch: master

commit e919701c29d556eb9eeb014180d854c04bb46775
Author: Carlos Goncalves <email address hidden>
Date: Fri Aug 5 13:46:33 2016 +0200

    Port data plane status

    Partial-Bug: #1598081
    Partial-Bug: #1575146

    Change-Id: I1a832c1e3dd0f8ce27780518d5a4876c0e19dd16

Akihiro Motoki (amotoki) on 2017-01-20
tags: added: rfe-approved
removed: rfe

Reviewed: https://review.openstack.org/424868
Committed: https://git.openstack.org/cgit/openstack/neutron-lib/commit/?id=ee74cb2a5ccdc13e8bf137d7387f01c6b202c150
Submitter: Jenkins
Branch: master

commit ee74cb2a5ccdc13e8bf137d7387f01c6b202c150
Author: Carlos Goncalves <email address hidden>
Date: Tue Jan 24 21:52:27 2017 +0000

    API definition and reference for data plane status extension

    Related-Bug: #1598081
    Related-Bug: #1575146

    Partial-Implements: blueprint port-data-plane-status

    Change-Id: I04eef902b3310f799b1ce7ea44ed7cf77c74da04

Reviewed: https://review.openstack.org/424340
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=89de63de05e296af583032cb17a3d76b4b4d6a40
Submitter: Jenkins
Branch: master

commit 89de63de05e296af583032cb17a3d76b4b4d6a40
Author: Carlos Goncalves <email address hidden>
Date: Mon Jan 23 19:53:04 2017 +0000

    Port data plane status extension implementation

    Implements the port data plane status extension. Third parties
    can report via Neutron API issues in the underlying data plane
    affecting connectivity from/to Neutron ports.

    Supported statuses:
      - None: no status being reported; default value
      - ACTIVE: all is up and running
      - DOWN: no traffic can flow from/to the Neutron port

    Setting attribute available to admin or any user with specific role
    (default role: data_plane_integrator).

    ML2 extension driver loaded on request via configuration:

      [ml2]
      extension_drivers = data_plane_status

    Related-Bug: #1598081
    Related-Bug: #1575146

    DocImpact: users can get status of the underlying port data plane;
    attribute writable by admin users and users granted the
    'data-plane-integrator' role.
    APIImpact: port now has data_plane_status attr, set on port update

    Implements: blueprint port-data-plane-status

    Depends-On: I04eef902b3310f799b1ce7ea44ed7cf77c74da04
    Change-Id: Ic9e1e3ed9e3d4b88a4292114f4cb4192ac4b3502
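Based on the commit message above, an integrator's port update request might be built as follows. This is a hedged sketch: build_update_body() is an illustrative helper, not part of any Neutron client library; the allowed values mirror the statuses listed in the commit (None, ACTIVE, DOWN).

```python
# Build the JSON body for a port update carrying data_plane_status,
# as a third-party integrator might send it to the Neutron API.
import json

ALLOWED_DPS = {None, "ACTIVE", "DOWN"}  # per the commit message above

def build_update_body(data_plane_status):
    if data_plane_status not in ALLOWED_DPS:
        raise ValueError("unsupported data_plane_status: %r" % data_plane_status)
    return json.dumps({"port": {"data_plane_status": data_plane_status}})

# e.g. the body of a PUT to /v2.0/ports/{port_id}, authenticated as an
# admin or a user holding the data_plane_integrator role:
print(build_update_body("DOWN"))  # {"port": {"data_plane_status": "DOWN"}}
```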

Changed in neutron:
status: In Progress → Fix Committed