Diagnosing Neutron connectivity issues

Bug #1537686 reported by Hynek Mlnarik
This bug affects 1 person
Affects: neutron
Status: New
Importance: Wishlist
Assigned to: Unassigned

Bug Description

One of the common questions seen on ask.openstack.org and the mailing lists is "Why can't I ping my floating IP address?". Answering it usually involves the same diagnostic steps: pinging the instance, checking security group settings, and so on. Currently, these steps need to be performed manually.

The neutron-debug command seems to be a perfect fit for performing these diagnostics, yet it currently supports only commands related to probe port manipulation and a ping-all command.

This RFE proposes extending neutron-debug with commands that help automate the diagnostics. Commands would be implemented for pinging both the floating and the related fixed IP address, checking whether a given TCP port is open, validating namespaces (checking that the namespace exists), and indicatively diagnosing security group settings.

Proposed Change
===============
This RFE suggests enhancing neutron-debug with a set of more
fine-grained commands for diagnosing the networking setup.
Initially, each command is expected to operate either from a network
or a compute node (depending on the context of the command),
similarly to the current ping-all command of neutron-debug.

Each command outputs a list of tests and whether each one passed or failed. The list of commands and their tests follows:

* Diagnose validity of security groups for ping/TCP/UDP traffic

  | Command: ``diagnose-secgroups-ping <port-id>``
  | Command: ``diagnose-secgroups-tcp --tcp-port <NN> <port-id>``
  | Command: ``diagnose-secgroups-udp --udp-port <NN> <port-id>``

  Should be run from: any node

  This command uses API calls to check whether security groups allow external traffic to go through to the given port.

* Diagnose validity of router settings and verify that the target IP can be pinged directly from the router. The target-IP can be either fixed or floating.

  Command: ``diagnose-router <router-id> <target-IP>``

  Should be run from: router node

  This command depends on the actual networking implementation; the Linux/OVS implementation is used here to illustrate the tests.

  Tests:

  * Check if namespace is configured on L3 agent host
  * Ping the *target-IP* from DHCP namespace
  * Ping the *target-IP* from router namespace
  * Check that floating IP is assigned at router [only when target-IP is floating]
  * Check that both the gateway port and the LAN-facing port have admin_state_up enabled
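
The security group check described for the ``diagnose-secgroups-*`` commands can be sketched as a pure function over rule data. This is only an illustration, not the neutron-debug implementation: the function names ``rule_allows`` and ``diagnose_secgroups`` and the sample source address are hypothetical, and the rules are assumed to be plain dicts carrying the usual Neutron security group rule fields (``direction``, ``ethertype``, ``protocol``, ``port_range_min``, ``port_range_max``, ``remote_ip_prefix``, ``remote_group_id``), already fetched through the API. Only IPv4 ingress is considered.

```python
import ipaddress

ICMP_ECHO_REQUEST = 8  # ICMP type sent by ping


def rule_allows(rule, protocol, port=None, source="203.0.113.10"):
    """Return True if one ingress rule admits the described IPv4 packet.

    `source` is a hypothetical external address used for the check.
    """
    if rule.get("direction") != "ingress":
        return False
    if rule.get("ethertype", "IPv4") != "IPv4":
        return False
    # A rule whose protocol is None matches every protocol.
    if rule.get("protocol") not in (None, protocol):
        return False
    # Rules scoped to a remote security group do not cover external hosts.
    if rule.get("remote_group_id"):
        return False
    prefix = rule.get("remote_ip_prefix") or "0.0.0.0/0"
    network = ipaddress.ip_network(prefix, strict=False)
    if ipaddress.ip_address(source) not in network:
        return False
    low = rule.get("port_range_min")
    high = rule.get("port_range_max")
    if protocol in ("tcp", "udp") and port is not None:
        if low is not None and high is not None and not low <= port <= high:
            return False
    if protocol == "icmp" and low is not None and low != ICMP_ECHO_REQUEST:
        # For ICMP rules, port_range_min holds the ICMP type.
        return False
    return True


def diagnose_secgroups(rules, protocol, port=None):
    """Return a (test description, passed) pair, as the RFE's output suggests."""
    allowed = any(rule_allows(rule, protocol, port) for rule in rules)
    label = protocol if port is None else f"{protocol}/{port}"
    return (f"ingress {label} allowed by security groups", allowed)
```

A rule set that opens only TCP port 22 would pass the ``tcp``/22 test and fail both the ``tcp``/80 and the ping test, which is the kind of pass/fail list the proposed commands would print.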

The initial release will support only Linux hosts and OVS switches.
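
On a Linux host, the first router tests listed above (namespace existence and a ping from the router namespace) might look roughly like the sketch below. It assumes the L3 agent's usual ``qrouter-<router-id>`` namespace naming and the standard ``ip netns`` and ``ping`` tools; the function names are hypothetical and do not come from neutron-debug. The DHCP namespace ping, the floating IP assignment check, and the admin_state_up check are left out. The command runner is injectable so the logic can be exercised without root privileges.

```python
import subprocess


def router_namespace(router_id):
    # Assumed naming convention: "qrouter-<router-id>".
    return f"qrouter-{router_id}"


def namespace_exists(name, netns_output):
    """Look for `name` in `ip netns list` output.

    Newer iproute2 versions append " (id: N)" to each line, so only the
    first whitespace-separated token is compared.
    """
    return any(line.split()[0] == name
               for line in netns_output.splitlines() if line.strip())


def ping_command(namespace, target_ip, count=1, timeout=2):
    # Build the ping invocation to run inside the given namespace.
    return ["ip", "netns", "exec", namespace,
            "ping", "-c", str(count), "-W", str(timeout), target_ip]


def diagnose_router(router_id, target_ip, run=subprocess.run):
    """Return a list of (test description, passed) pairs."""
    namespace = router_namespace(router_id)
    listing = run(["ip", "netns", "list"], capture_output=True, text=True)
    exists = namespace_exists(namespace, listing.stdout)
    results = [(f"namespace {namespace} exists", exists)]
    if exists:
        # Pinging only makes sense when the namespace is present.
        ping = run(ping_command(namespace, target_ip),
                   capture_output=True, text=True)
        results.append((f"ping {target_ip} from {namespace}",
                        ping.returncode == 0))
    return results
```

Running ``diagnose_router`` for real requires root on the node hosting the router, which is the limitation the future work on remote execution is meant to address.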

**Future work**

A weakness of the neutron-debug command is that it cannot be executed remotely. After this RFE is completed, the next step would be to transform the current single-node CLI, which executes local commands, into a CLI and an agent. The CLI would then be able to invoke commands remotely via RPC, eliminating the need to gain access to the affected system beforehand.

This architecture would also create a basis upon which GUI diagnostic tools can operate. Demand for this tool can be seen e.g. in
`bug 1507499 <https://bugs.launchpad.net/neutron/+bug/1507499>`_.

Tags: rfe
Revision history for this message
Manjeet Singh Bhatia (manjeet-s-bhatia) wrote :

It will be interesting to have this feature if a command can tell what the issue is.

description: updated
Hynek Mlnarik (hmlnarik-s) wrote :

I have modified the description to include more details on the aim of this work. Under the current plan, it is not exactly about telling what the issue is, but about providing diagnostics that can be used to reveal the cause. Automatically suggesting a list of potential causes is a logical next step that would be possible to implement, but it is out of scope of this RFE.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/273075

Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Wishlist
Armando Migliaccio (armando-migliaccio) wrote :

There are a number of BPs/RFEs related to this issue, most recently [1]. We should try to keep the conversation in one place.

[1] https://bugs.launchpad.net/neutron/+bug/1507499

Armando Migliaccio (armando-migliaccio) wrote :

Marking as duplicate just to make sure we can all focus on one dashboard. I am thinking of raising the profile of this topic at the forthcoming mid-cycle.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Hynek Mlnarik (<email address hidden>) on branch: master
Review: https://review.openstack.org/273075
Reason: The POC has served its purpose

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/308973

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-specs (master)

Reviewed: https://review.openstack.org/308973
Committed: https://git.openstack.org/cgit/openstack/neutron-specs/commit/?id=dc11da5109759d13636aaaef35420fa4ac1d88d6
Submitter: Jenkins
Branch: master

commit dc11da5109759d13636aaaef35420fa4ac1d88d6
Author: Boden R <email address hidden>
Date: Wed Feb 15 15:47:07 2017 -0700

    Neutron resource diagnostics

    This spec proposes the introduction of a neutron diagnostics framework
    and API extension capable collecting resource diagnostics across
    neutron API and agent nodes. To keep the spec containable, the proposal
    suggests only providing a sample diagnostic check and reiterating on
    concrete diagnostics once we get the plumbing in place.

    While this spec has some inspiration from nova diagnostics [1],
    the approach herein is more generic and extensible supporting a
    broader set of use cases longer term.

    Finally it seeks to pave the way for supporting use case / features
    proposed in the related bugs.

    [1] https://wiki.openstack.org/wiki/Nova_VM_Diagnostics

    Related-Bug: #1507499
    Related-Bug: #1519537
    Related-Bug: #1537686
    Related-Bug: #1563538

    Change-Id: Id534acb1593f1fe210c561b1451656dce69514db
