Add validation to disallow OVS round-robin bonding

Bug #1612786 reported by Assaf Muller
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Expired
Undecided
Unassigned

Bug Description

OVS balance-tcp mode bonding is known to cause packet loss issues in modern versions of OVS. More information may be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1267291.

This request is for TripleO, perhaps os-net-config to validate input so that this particular bonding mode is forbidden, and that TripleO will throw an error instead of configuring it having users complain of connectivity issues much later.

Dan Sneddon (dsneddon)
summary: - Add validation to disallow OVS round-robin bonding
+ Add validation to disallow OVS balance-tcp bonding
summary: - Add validation to disallow OVS balance-tcp bonding
+ Add validation to disallow OVS round-robin bonding
Revision history for this message
Dan Sneddon (dsneddon) wrote :

The applicable resource is the BondInterfaceOvsOptions parameter. When this is set to configure OVS for "balance-tcp" mode, a packet-loss bug is triggered in OVS 2.4.

Today this data is usually set in the parameter_defaults section of the network-environment.yaml, and that parameter is used in the various NIC config templates that are assigned to each role.

There is no handling of this parameter within the TripleO Heat Templates, except to pass that parameter to the NetworkConfig stack. The value is not written to /etc/puppet/hieradata, nor is it used by Puppet, so the validation can't happen within Puppet.

Since this is a freeform string parameter, we will need to use a regular expression or simple pattern match for "balance-tcp" in this parameter. We might be able to include this in the TripleO validations. I hesitate to add a check within os-net-config, simply because by the time os-net-config has run the deployment is already more than halfway done. We should ideally do this validation before kicking off the node deployments.

Revision history for this message
Brent Eagles (beagles) wrote :

Agreed. I'm testing using allowed-pattern constraints on the template parameter to see if we can block it there.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/355073

Changed in tripleo:
assignee: nobody → Brent Eagles (beagles)
status: New → In Progress
Changed in tripleo:
importance: Undecided → Medium
milestone: none → newton-3
Revision history for this message
Brent Eagles (beagles) wrote :

Pulled from review comments (good points, thanks bnemec!)

"... this constraint needs to be added to our nic-configs since those are where the bond options get applied to a system. Actually, I still think we need a proper external validation for this since it involves templates that users will be editing, and thus may have existing versions without this constraint that will be carried forward into this release, but I'm good with this as a partial fix until the validations framework is fully integrated.
Note that we also need to fix the examples in http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/network_isolation.html to include the constraint, besides the sample nic-configs we ship in t-h-t."

Steven Hardy (shardy)
Changed in tripleo:
milestone: newton-3 → newton-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/355073
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6298b7c35ed4a7411d8e8da1008efc0f585ffb75
Submitter: Jenkins
Branch: master

commit 6298b7c35ed4a7411d8e8da1008efc0f585ffb75
Author: Brent Eagles <email address hidden>
Date: Fri Aug 12 18:07:24 2016 -0230

    Add constraint to prohibit balance-tcp from BondInterfaceOvsOptions

    This patch adds an allowed_pattern contraint that uses a negative
    lookahead assertion to only allow options strings that do not contain
    the 'balance-tcp' option.

    Change-Id: Icf8874e4e585f9a42d38091f8b38c3685f403cf1
    Partial-Bug: #1612786

Revision history for this message
Brent Eagles (beagles) wrote :

Where this is only a partial fix, I propose bumping the bug to Ocata.

Revision history for this message
Brent Eagles (beagles) wrote :

Actually re-reading Ben's comments, there is still doc and example clean up work that should be done for this release.

Changed in tripleo:
milestone: newton-rc1 → newton-rc2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-docs (master)

Fix proposed to branch: master
Review: https://review.openstack.org/372590

Changed in tripleo:
assignee: Brent Eagles (beagles) → Emilien Macchi (emilienm)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-docs (master)

Reviewed: https://review.openstack.org/372590
Committed: https://git.openstack.org/cgit/openstack/tripleo-docs/commit/?id=b4892e6f3d1269983621d8c827cf4a98e42b222a
Submitter: Jenkins
Branch: master

commit b4892e6f3d1269983621d8c827cf4a98e42b222a
Author: Brent Eagles <email address hidden>
Date: Mon Sep 19 13:11:00 2016 -0230

    Remove references to balance-tcp

    Due to repeated reports of package loss using the balance-tcp bond mode
    with OVS, we've added a constraint* to prohibit its use. This document
    update removes references to that bond mode in the network isolation
    configuration examples.

    * https://review.openstack.org/#/c/355073/

    Change-Id: I42a22b01d4e3c1fdb76441a4ca9b7c666a96fada
    Partial-Bug: #1612786

Revision history for this message
Brent Eagles (beagles) wrote :

The patches that are appropriate for newton have merged (hot validation and docs). Checks via validation framework are deferred to Ocata.

Changed in tripleo:
milestone: newton-rc2 → ocata-1
assignee: Emilien Macchi (emilienm) → Brent Eagles (beagles)
Steven Hardy (shardy)
Changed in tripleo:
milestone: ocata-1 → ocata-2
Changed in tripleo:
milestone: ocata-2 → ocata-3
Changed in tripleo:
milestone: ocata-3 → ocata-rc1
Changed in tripleo:
milestone: ocata-rc1 → ocata-rc2
Changed in tripleo:
milestone: ocata-rc2 → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Revision history for this message
Alex Schultz (alex-schultz) wrote :

This appears to not have any open patches and hasn't been updated in some time. Moving backed to triaged.

Changed in tripleo:
status: In Progress → Triaged
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Revision history for this message
Emilien Macchi (emilienm) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (FUTURE, PIKE, QUEENS, ROCKY, STEIN).
  Valid example: CONFIRMED FOR: FUTURE

Changed in tripleo:
assignee: Brent Eagles (beagles) → nobody
importance: Medium → Undecided
status: Triaged → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.