neutron_l3_agent and neutron_l3_metadata groups include physical host

Bug #1645979 reported by Miguel Alejandro Cantu
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
openstack-ansible
Fix Released
Undecided
Logan V

Bug Description

I'm running an OpenStack-Ansible AIO off the stable/newton branch with the default configurations.

The physical host is included in the neutron_l3_agent and neutron_metadata_agent group, as shown here in this sample of my openstack_inventory.json file:
https://gist.github.com/alextricity25/d509574d094ed8104534c381a57b0ccd

This wasn't the behavior in Mitaka, and I find it strange that this was introduced in Newton since the physical host(aio1) is not running the l3 or metadata agents.

Am I missing anything?

-Alex

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Indeed, neutron_l3_agent and metadata_agent are extended in N vs M.

I'll investigate why.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Ok, the env.d/nova has now the following (N/master):

  nova_compute_container:
    belongs_to:
      - compute_containers
    contains:
      - neutron_linuxbridge_agent
      - neutron_openvswitch_agent
      - neutron_l3_agent
      - neutron_metadata_agent
      - nova_compute
    properties:
      is_metal: true
      service_name: nova

instead of (M):

  nova_compute_container:
    belongs_to:
      - compute_containers
    contains:
      - neutron_linuxbridge_agent
      - nova_compute
    properties:
      is_metal: true
      service_name: nova
      container_release: trusty

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

This was introduced for ovs dvr support in the commit: 43f585a3

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

There is therefore a valid bug IMO: Why does default neutron lb implementation mentions (in inventory) components on hosts where they don't belong...

Revision history for this message
Travis Truman (travis-truman) wrote :
Revision history for this message
Miguel Alejandro Cantu (miguel-cantu) wrote :

Sorry for not being a little more thorough in the description. This is causing a problem for the playbooks that we run to deploy code that checks the status of the l3_agent(note: we are running the linuxbridge plugin). Our playbooks target the neutron_l3_agent and neutron_metadata group then deploys code that targets the hosts in said groups to see if the agents are running in these hosts. Since the compute node now is part of these groups, our code tries to see if the l3_agent is running on the compute node and fails because there is no l3 agent running on the compute node. I personally still feel like this is incorrect behavior. The compute nodes are not running the l3 or metadata agent, but are still part of that group. IMO that shouldn't be happening. Only containers/hosts that are running the l3 agent should be part of the neutron_l3_agent group, similarly for the containers/hosts running the metadata agent.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Travis, I agree with you, osa doesn't deploy where it's not needed, and it's fine by itself. However, the inventory is unclean.

Let me answer here with a few workarounds and solutions, keeping in mind that N is now stable.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Here are our options:

Option 1: Return to pre-N linux bridge behavior with code, other branches are informed with docs.

Fix the current env.d/nova entries in master by setting them to [1]:

  nova_compute_container:
    belongs_to:
      - compute_containers
    contains:
      - neutron_linuxbridge_agent
      - neutron_openvswitch_agent
      - nova_compute
    properties:
      is_metal: true
      service_name: nova

This change should also introduce the documentation in the neutron role in the "Using OVS", to properly state that if the user wants dvr he needs to create an env.d/nova with the following content [2]:

  nova_compute_container:
    belongs_to:
      - compute_containers
    contains:
      - neutron_linuxbridge_agent
      - neutron_openvswitch_agent
      - neutron_l3_agent
      - neutron_metadata_agent
      - nova_compute
    properties:
      is_metal: true
      service_name: nova

Previous branches, in order to not change the stable behavior, would require a release note with a known issue: Existing deployers that are impacted by this issue would basically have to ship their own env.d override like above [1].

Pros/Cons:
+ The inventory makes sense and is consistent with what's deployed
+ No change in stable/ branches
+ Lxb deployers have the same behavior they use to have
- It requires work for OVS deployers when upgrading and onwards

Option 2: Introduce a conditional behavior of env.d in master, other branches are informed with docs.

This would require work on the inventory, to check user_variables* and see if ovs/lxb was used, and then change the env.d accordingly.

Pros:
+ Deployers of ovs/lxb would automatically get what they need without changes
- It's a lot of wiring for something that could be reduced as a doc change, and maintaining an env.d file.

Option 3: Do nothing.

Because the issue is an edge case (problem of inventory definition/usage), ppl could workaround it the way it's done for neutron role, like Travis explained before. (Basically workarounding by having conditional behavior based on inventory).

Revision history for this message
Nolan Brubaker (nolan-brubaker) wrote :

> Option 2: Introduce a conditional behavior of env.d in master, other branches are informed with docs.
>
> This would require work on the inventory, to check user_variables* and see if ovs/lxb was used, and then change the env.d accordingly.
>
> Pros:
> + Deployers of ovs/lxb would automatically get what they need without changes
> - It's a lot of wiring for something that could be reduced as a doc change, and maintaining an env.d file.

While this is certainly something we could explore, I'm hesitant to do so because of the uncertain future of the env.d mechanisms, specifically as discussed in https://bugs.launchpad.net/openstack-ansible/+bug/1643955. While env.d won't be going away in the near term, my preference would be to avoid adding too many behaviors to it that might introduce barriers to migrating away.

I'd also agree that forcing deployers to carry separate environment files for OVS would be annoying, but not necessarily out of line with how we generally do things. LXB is still our default. The downside is that to flex the OVS functionality in the gates some (more) env.d manipulation would have to be put in place.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

sorry for the misclick

Changed in openstack-ansible:
importance: Undecided → Wishlist
importance: Wishlist → Undecided
Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

This is gonna be discussed in the next community meeting.

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

For now - it might be an idea to simply remove those options and document it better.

This will result in the following:

* New OVS-DVR deploys will need to know (and read the documentation we add) to create a custom env.d file.
* Existing OVS-DVR deploys will continue as normal, EXCEPT when adding a new compute host (assuming they first updated their OSA deploy and then tried to deploy to the new host) - a release note will be added.

This is because the dynamic_inventory doesn't actively remove the hosts, since they haven't changed, as such the groupings remain the same.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Agreed.

Revision history for this message
Nolan Brubaker (nolan-brubaker) wrote :

I agree with Andy's suggestion.

Revision history for this message
Logan V (loganv) wrote :

For right now, it seems like Option #1 of reverting the environment, known issue the stable branches, carrying documentation to implement the override, and requiring operators to drop in the overrides when using OVS+DVR makes sense and follows the OSA conventions used for deploying other non-LXB neutron plugins.

Revision history for this message
Nolan Brubaker (nolan-brubaker) wrote :

There's also a 4th option:

Use Ansible patterns to target more specifically, something like

`neutron_l3_agent:neutron_metadata_agent:!compute_hosts`

Revision history for this message
Nolan Brubaker (nolan-brubaker) wrote :

So, to bring this full circle, after looking more closely at the tasks Alex is trying to execute, I don't think that the host pattern will work. The playbook and roles are highly generalized, such that changing the patterns leaves out most of the hosts and containers. There are conditionals in the task files that might be modified, but they're also very general and hard to get specific with.

In light of that, I think the short term fix for Alex is to carry an override file, and OSA should update its environment and documentation.

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

Is there another approach:

* Remove the neutron_l3_agent and neutron_metadata_agent from the group.
* conditionally apply any l3/metadata_agent bits on hosts that are also in group nova_compute WHEN using ml2.ovs.dvr - or are in the l3_agent/metadata_agent group.

No change in functionality for ovs.dvr but the grouping stays correct.

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

Or another option from odyssey4me:

"can always just use an add_host task to add the host to the group conditionally
ie if you're using dvr, then add the compute host into the group"

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

I personally prefer the add_host way + the cleanup of the env.d (to be closer to std way we deploy things) + release note to indicate the change.

With the add_host, we conditionally build a dynamic group for execution of the work, which seems what we should have done all along!

Changed in openstack-ansible:
assignee: nobody → Logan V (loganv)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/411348
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=4ba6bc315ed9c94b5088b4efdd2734f5c9476196
Submitter: Jenkins
Branch: master

commit 4ba6bc315ed9c94b5088b4efdd2734f5c9476196
Author: Logan V <email address hidden>
Date: Thu Dec 15 08:37:59 2016 -0600

    Dynamically group Neutron OVS DVR to compute

    Revert I3aaf8ced1ac406069f2d6e3b52b8d59eddda7eac to fix a bug where
    deployments not using OVS+DVR contain unnecessary group mappings
    that are filtered out within the Neutron role. Instead, we stop
    adding these services to the compute host groups by default, and
    conditionally add them when DVR is selected as the Neutron plugin.

    Closes-Bug: #1645979
    Change-Id: I543d1b5d3d50b6e7936605f436805d2f0ad9b3e9

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 15.0.0.0b3

This issue was fixed in the openstack/openstack-ansible 15.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/459443

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/newton)

Reviewed: https://review.openstack.org/459443
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=00773520740caa8f57600399db1717dca5b2abeb
Submitter: Jenkins
Branch: stable/newton

commit 00773520740caa8f57600399db1717dca5b2abeb
Author: Major Hayden <email address hidden>
Date: Tue Apr 25 08:38:58 2017 -0500

    Dynamically group Neutron OVS DVR to compute

    Revert I3aaf8ced1ac406069f2d6e3b52b8d59eddda7eac to fix a bug where
    deployments not using OVS+DVR contain unnecessary group mappings
    that are filtered out within the Neutron role. Instead, we stop
    adding these services to the compute host groups by default, and
    conditionally add them when DVR is selected as the Neutron plugin.

    This is a manual cherry pick from the stable/ocata commit
    4ba6bc315ed9c94b5088b4efdd2734f5c9476196.

    Related to RPC-O PR: rcbops/rpc-openstack#2187
    Closes-Bug: #1645979
    Change-Id: I543d1b5d3d50b6e7936605f436805d2f0ad9b3e9

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 14.2.2

This issue was fixed in the openstack/openstack-ansible 14.2.2 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers