Eavesdropping private traffic

Bug #1734320 reported by Paul Peereboom
This bug affects 4 people
Affects                      Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)     Fix Released  Undecided   sean mooney
OpenStack Security Advisory  Won't Fix     Undecided   Unassigned
neutron                      Fix Released  High        Rodolfo Alonso
os-vif                       Fix Released  High        sean mooney

Bug Description

Eavesdropping private traffic
=============================

Abstract
--------

We've discovered a security issue that allows end users within their own private network to receive from, and send traffic to, other private networks on the same compute node.

Description
-----------

During live migration there is a small time window in which the ports of instances are untagged. In that window an instance's port is effectively trunked to the integration bridge and receives 802.1Q-tagged private traffic from other tenants.

If the port is administratively down during live migration, the port will remain in trunk mode indefinitely.

Traffic is possible between ports that are administratively down, even across tenants' self-service networks.

Conditions
----------

The following conditions are necessary.

* Open vSwitch self-service networks
* An OpenStack administrator or an automated process needs to schedule a live migration

We tested this on Newton.

Issues
------

This outcome is the result of multiple independent issues. We will list the most important first, and follow with bugs that create a fragile situation.

Issue #1 Initially creating a trunk port

When the port is initially created, it is in trunk mode. This creates a fail-open situation.
See: https://github.com/openstack/os-vif/blob/newton-eol/vif_plug_ovs/linux_net.py#L52
Recommendation: create ports in the port_dead state; don't leave them dangling in trunk mode. Add a drop flow initially.

Issue #2 Order of creation

The instance is actually migrated before the (networking) configuration is completed.

Recommendation: delay completing the live migration until the underlying network configuration has been fully applied.
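
To illustrate the recommendation, here is a minimal, hypothetical sketch; the helper name wait_for_port_configured and the get_port_tag callable are illustrative, not nova's real API. The idea is that migration would only be finalized once the destination port carries a real VLAN tag:

```python
import time

DEAD_VLAN_TAG = 4095  # neutron's dead VLAN tag


def wait_for_port_configured(get_port_tag, timeout=30.0, interval=0.5):
    """Poll the port's OVS tag until it carries a real VLAN tag.

    get_port_tag: callable returning the current tag, or None if untagged.
    Returns True once the port is safely tagged, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        tag = get_port_tag()
        # Untagged ports and dead-VLAN ports are not yet fully configured.
        if tag is not None and tag != DEAD_VLAN_TAG:
            return True
        time.sleep(interval)
    return False
```

Live migration would call something like this before signalling completion, and fail the migration rather than leave the port untagged.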

Issue #3 Not closing the port when it is down

Neutron calls the port_dead function to ensure the port is down. It sets the tag to 4095 and adds a "drop" flow if (and only if) there is already another tag on the port. The port_dead function will keep untagged ports untagged.

https://github.com/openstack/neutron/blob/stable/newton/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L995

Recommendation: Make port_dead also shut the port if no tag is found. Log a warning if this happens.
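
Reduced to its essence, the flaw is a single conditional. The following standalone sketch (DEAD_VLAN_TAG mirrors neutron's constant; the function names are ours, not neutron's) contrasts the shipped check with the recommended one:

```python
DEAD_VLAN_TAG = 4095  # neutron's dead VLAN tag


def should_kill_port(cur_tag):
    """Shipped logic: an untagged port (cur_tag falsy) is left alone,
    so it keeps receiving trunked traffic."""
    return bool(cur_tag) and cur_tag != DEAD_VLAN_TAG


def should_kill_port_fixed(cur_tag):
    """Recommended logic: also shut the port when no tag is found."""
    return not cur_tag or cur_tag != DEAD_VLAN_TAG
```

The only behavioral difference is the untagged case, which is exactly the state a port is in right after live migration.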

Issue #4 Putting the port administratively down actually puts the port on a VLAN shared across the compute node

Instances from different projects on different private networks can talk to each other if they put their ports down. The code does install an openflow "drop" rule but it has a lower priority (2) than the allow rules.

Recommendation: Increase the port_dead openflow drop rule priority to MAX
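
The priority problem can be seen with a toy model of OpenFlow flow selection (the flow layout below is illustrative, not neutron's actual flow table): among the entries that match a packet, the highest-priority one wins, so a priority-2 drop loses to any higher-priority forwarding rule on the same port.

```python
def pick_action(flows, in_port):
    """flows: list of (priority, match_in_port_or_None, action) tuples,
    where None matches any port. Returns the action of the
    highest-priority matching flow, per OpenFlow precedence."""
    matching = [f for f in flows if f[1] in (None, in_port)]
    return max(matching, key=lambda f: f[0])[2]
```

With flows [(2, 7, 'drop'), (3, None, 'forward')], packets on port 7 are forwarded despite the drop rule; raising the drop to 65535 (the OpenFlow maximum) makes it take precedence.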

Timeline
--------

 2017-09-14 Discovery eavesdropping issue
 2017-09-15 Verify workaround.
 2017-10-04 Discovery port-down-traffic issue
 2017-11-24 Vendor Disclosure to Openstack

Steps to reproduce
------------------

1. Attach an instance to two networks:

admin$ openstack server create --nic net-id=<net-uuid1> --nic net-id=<net-uuid2> --image <image_id> --flavor <flavor_id> instance_temp

2. Attach a FIP to the instance to be able to log in to this instance

3. Verify:

admin$ openstack server show -c name -c addresses fe28a2ee-098f-4425-9d3c-8e2cd383547d

+-----------+------------------------------------------------------+
| Field     | Value                                                |
+-----------+------------------------------------------------------+
| addresses | network1=192.168.99.8, <FIP>; network2=192.168.80.14 |
| name      | instance_temp                                        |
+-----------+------------------------------------------------------+

4. Ssh to the instance using network1 and run a tcpdump on the other port network2

[root@instance_temp]$ tcpdump -eeenni eth1

5. Get port-id of network2

admin$ nova interface-list fe28a2ee-098f-4425-9d3c-8e2cd383547d
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| Port State | Port ID                              | Net ID                               | IP addresses  | MAC Addr          |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| ACTIVE     | a848520b-0814-4030-bb48-49e4b5cf8160 | d69028f7-9558-4f14-8ce6-29cb8f1c19cd | 192.168.80.14 | fa:16:3e:2d:8b:7b |
| ACTIVE     | fad148ca-cf7a-4839-aac3-a2cd8d1d2260 | d22c22ae-0a42-4e3b-8144-f28534c3439a | 192.168.99.8  | fa:16:3e:60:2c:fa |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+

6. Force port down on network 2

admin$ neutron port-update a848520b-0814-4030-bb48-49e4b5cf8160 --admin-state-up False

7. Port gets tagged with vlan 4095, the dead vlan tag, which is normal:

compute1# grep a848520b-0814-4030-bb48-49e4b5cf8160 /var/log/neutron/neutron-openvswitch-agent.log | tail -1
INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-e008feb3-8a35-4c97-adac-b48ff88165b2 - - - - -] VIF port: a848520b-0814-4030-bb48-49e4b5cf8160 admin state up disabled, putting on the dead VLAN

8. Verify the port is tagged with vlan 4095

compute1# ovs-vsctl show | grep -A3 qvoa848520b-08
      Port "qvoa848520b-08"
          tag: 4095
          Interface "qvoa848520b-08"

9. Now live-migrate the instance:

admin# nova live-migration fe28a2ee-098f-4425-9d3c-8e2cd383547d

10. Verify the tag is gone on compute2, and take a deep breath

compute2# ovs-vsctl show | grep -A3 qvoa848520b-08
      Port "qvoa848520b-08"
          Interface "qvoa848520b-08"
      Port...
compute2# echo "Wut!"

11. Now traffic of all other self-service networks present on compute2 can be sniffed from instance_temp

[root@instance_temp] tcpdump -eenni eth1
13:14:31.748266 fa:16:3e:6a:17:38 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 10.103.12.160 tell 10.103.12.152, length 28
13:14:31.804573 fa:16:3e:e8:a2:d2 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.70, length 28
13:14:31.810482 fa:16:3e:95:ca:3a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.154, length 28
13:14:31.977820 fa:16:3e:6f:f4:9b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.150, length 28
13:14:31.979590 fa:16:3e:0f:3d:cc > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 9, p 0, ethertype ARP, Request who-has 10.103.9.163 tell 10.103.9.1, length 28
13:14:32.048082 fa:16:3e:65:64:38 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.101, length 28
13:14:32.127400 fa:16:3e:30:cb:b5 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 10.103.12.160 tell 10.103.12.165, length 28
13:14:32.141982 fa:16:3e:96:cd:b0 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.100, length 28
13:14:32.205327 fa:16:3e:a2:0b:76 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.153, length 28
13:14:32.444142 fa:16:3e:1f:db:ed > 01:00:5e:00:00:12, ethertype 802.1Q (0x8100), length 58: vlan 72, p 0, ethertype IPv4, 192.168.99.212 > 224.0.0.18: VRRPv2, Advertisement, vrid 50, prio 103, authtype none, intvl 1s, length 20
13:14:32.449497 fa:16:3e:1c:24:c0 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.20, length 28
13:14:32.476015 fa:16:3e:f2:3b:97 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33, p 0, ethertype ARP, Request who-has 10.0.1.9 tell 10.0.1.22, length 28
13:14:32.575034 fa:16:3e:44:fe:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 10.103.12.160 tell 10.103.12.163, length 28
13:14:32.676185 fa:16:3e:1e:92:d7 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 10.103.12.160 tell 10.103.12.150, length 28
13:14:32.711755 fa:16:3e:99:6c:c8 > 01:00:5e:00:00:12, ethertype 802.1Q (0x8100), length 62: vlan 10, p 0, ethertype IPv4, 10.103.12.154 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 49, authtype simple, intvl 1s, length 24
13:14:32.711773 fa:16:3e:f5:23:d5 > 01:00:5e:00:00:12, ethertype 802.1Q (0x8100), length 58: vlan 12, p 0, ethertype IPv4, 10.103.15.154 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 49, authtype simple, intvl 1s, length 20

Workaround
----------

We temporarily worked around this issue by forcing the dead VLAN tag on port creation on compute nodes:

/usr/lib/python2.7/site-packages/vif_plug_ovs/linux_net.py:

def _create_ovs_vif_cmd(bridge, dev, iface_id, mac,
                        instance_id, interface_type=None,
                        vhost_server_path=None):
+   # ODCN: initialize port as dead
+   # ODCN: TODO: set drop flow
    cmd = ['--', '--if-exists', 'del-port', dev, '--',
           'add-port', bridge, dev,
+          'tag=4095',
           '--', 'set', 'Interface', dev,
           'external-ids:iface-id=%s' % iface_id,
           'external-ids:iface-status=active',
           'external-ids:attached-mac=%s' % mac,
           'external-ids:vm-uuid=%s' % instance_id]
    if interface_type:
        cmd += ['type=%s' % interface_type]
    if vhost_server_path:
        cmd += ['options:vhost-server-path=%s' % vhost_server_path]
    return cmd

https://github.com/openstack/neutron/blob/stable/newton/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L995

    def port_dead(self, port, log_errors=True):
        '''Once a port has no binding, put it on the "dead vlan".

        :param port: an ovs_lib.VifPort object.
        '''
        # Don't kill a port if it's already dead
        cur_tag = self.int_br.db_get_val("Port", port.port_name, "tag",
                                         log_errors=log_errors)
+       # ODCN GM 20170915
+       if not cur_tag:
+           LOG.error('port_dead(): port %s has no tag', port.port_name)
+       # ODCN AJS 20170915
-       if cur_tag and cur_tag != constants.DEAD_VLAN_TAG:
+       if not cur_tag or cur_tag != constants.DEAD_VLAN_TAG:
+           LOG.info('port_dead(): put port %s on dead vlan', port.port_name)
            self.int_br.set_db_attribute("Port", port.port_name, "tag",
                                         constants.DEAD_VLAN_TAG,
                                         log_errors=log_errors)
            self.int_br.drop_port(in_port=port.ofport)

plugins/ml2/drivers/openvswitch/agent/openflow/ovs_ofctl/ovs_bridge.py

    def drop_port(self, in_port):
+       # ODCN AJS 20171004:
-       self.install_drop(priority=2, in_port=in_port)
+       self.install_drop(priority=65535, in_port=in_port)

Regards,

ODC Noord.
Gerhard Muntingh
Albert Siersema
Paul Peereboom


Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Armando Migliaccio (armando-migliaccio) wrote :

Looking.

Armando Migliaccio (armando-migliaccio) wrote :

My fear is that this vulnerability might be present on releases newer than Newton.

Kevin Benton (kevinbenton) wrote :

The other thing that surprised me about this is that there are rules forwarding traffic to the port when it doesn't have a VLAN tag. Is the behavior of the NORMAL action to flood to all untagged ports as if they were trunks carrying all VLANs?

The other suggested fixes all seem to be working under the assumption that traffic will flow to ports if no rules are installed. I think it would be safest to ensure that a newly added port to br-int doesn't receive any traffic until flows are setup. Going that route would also avoid having to ban all earlier versions of os-vif because a VLAN tag would not need to be set on plugging.

Tristan Cacqueray (tristan-cacqueray) wrote :

Should we subscribe the Nova project for the issue #2 (Order of creation)?

Also, isn't instance migration an admin-only operation? If so, can a regular user abuse this bug without admin cooperation?

Armando Migliaccio (armando-migliaccio) wrote :

@Tristan: perhaps looping in the nova project is not a bad idea, and yes, you're right that migration can only be performed by an admin according to the default policy.json.

Armando Migliaccio (armando-migliaccio) wrote :

Today I was looking at this a bit to see if 'resizing' would expose the same behavior, as that is available to regular users. I shall be able to find more time tomorrow to assess what's going on in Pike and Ocata.

Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Armando for the prompt feedback. I've subscribed nova-coresec to discuss the scope of this vulnerability, in particular the issue #2.

@nova-coresec, is it correct to assume migration operations are restricted to admins? If a deployment does authorize regular users to perform migration operations, wouldn't it be vulnerable to other unexpected issues anyway?

Paul Peereboom (peereb) wrote :

@Tristan: yes, nova live-migration is by default an admin-only operation. But forcing the port down can be done by a user as well. Then it's just a matter of waiting for an admin to live-migrate the instance. We live-migrate instances regularly to empty a compute node and patch the compute node.

Armando Migliaccio (armando-migliaccio) wrote :

Block migration would also be affected by the same behavior.

Tristan Cacqueray (tristan-cacqueray) wrote :

@Paul, well I tend to agree this is a vulnerability (class A according to the VMT taxonomy: http://security.openstack.org/vmt-process.html#incident-report-taxonomy ). However, since a malicious user can't control the condition of exploitation, other VMT member may disagree with issuing an advisory for this issue.

I guess it boils down to how likely a user can snoop sensitive traffic.

Armando Migliaccio (armando-migliaccio) wrote :

That's why I am trying to verify that the resize operation has the same behavior, because if that's the case, the user is very much in control!

Paul Peereboom (peereb) wrote :

Agreed, let's see if we can reproduce without admin credentials.

Armando Migliaccio (armando-migliaccio) wrote :

I used a Pike-based deployment as a testing platform and I can confirm that the resize operation exposes the same behavior as live/block migration (as described in this report), i.e. the port connected to the VM shows up without a local VLAN tag on the target host.

I still have to look at the extent of the damage that the user can inflict. So, it looks like the malicious user can be in full control of the condition of exploitation.

Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Armando,

so the next question is, can this be fixed across all the supported stable releases (e.g. Pike and Ocata)?
And is there something to fix in the os-vif and nova project too?

Armando Migliaccio (armando-migliaccio) wrote :

Hi Tristan,

I wanted to look a bit deeper at what happens on the datapath, but I keep getting sidetracked. The report and analysis from Paul have been excellent, and I welcome that level of detail very much, but I want to see this through a bit more before formulating a recommendation.

Thanks to everyone involved!

Armando Migliaccio (armando-migliaccio) wrote :

OK, I finally had some uninterrupted time in which I could look into this more carefully. The TL;DR summary is: the behavior is exposed in live-migrate/block-migrate/resize operations. That said, I don't believe this vulnerability is serious, at least under default conditions. Here's why:

1) live/block migration is an admin operation: if the admin avoids turning a port down, the vulnerability is not exposed. Turning a port down is not strictly necessary. Furthermore, block migration is disruptive, in that users lose console and network connectivity to the instance.

2) resize is a user allowed operation, but the operation leads to traffic disruption. Even though a user can explicitly turn a port DOWN and resize the instance without admin intervention, there's no way she can keep connectivity to the instance while the operation is in progress. Turning the port back UP will reinstate the local VLAN tag and restore connectivity. The loophole at that point is closed.

For these reasons, I would cautiously say that this vulnerability is not easily exploitable, but we do want to warn the admin of the potential loophole (OSSN B2?), and eventually address the neutron OVS agent codebase to ensure that a port is always tagged even when it's in ADMIN_DOWN after a migration/resize operation.

I'd recommend against changing os-vif to start a port on the dead vlan for two reasons:

a) neutron use cases go beyond just spinning up VMs. So if possible we should find a fix confined within neutron alone.

b) I feel that the os-vif fix as proposed is ineffective in closing any timing window because while the port is untagged, it's practically unlikely that a user can get hold of the console remotely.

I am open to feedback on my recommendation, but in the meantime I would like to thank the reporter for the very detailed, well-thought-out and challenging report. I had fun cracking this one. Keep them coming!

Needless to say, open to further discussion.

@Tristan: what are the next steps? Is this OSSA then marked CONFIRMED? I would like to involve Ihar and Miguel for what comes next. Unless my rationale is flawed, I consider my triage complete and I'd like to push the ball into someone else's court.

Cheers,
Armando

Changed in neutron:
status: New → Triaged
Changed in nova:
status: New → Confirmed
Changed in neutron:
importance: Undecided → Low
Tristan Cacqueray (tristan-cacqueray) wrote :

Thank you Armando for this thorough investigation.

You mention console access as a requirement, but couldn't the user run a nohup task in the background to capture the traffic, and then retrieve the data later?

To answer your last question, if this is triaged as B2, then we would subscribe the ossg-coresec group so that an OSSN could be prepared and possibly sent to the downstream stakeholders through the embargo-notice mailing list before disclosing this bug. Though the live-migration case may be bad enough (e.g. for operators going through that process often) that we may want to issue an advisory instead.

Please feel free to subscribe more neutron or nova developers to this bug, at least to check for unforeseen scenarios.

Gerhard Muntingh (gerhard-1) wrote :

Hi Armando, I'm also impressed by the thorough investigation. Especially the resize insight.

Please carefully read point two below. I think the issue is more serious because of it.

1) Turning the port down (as a user) is not necessary, but it changes the trunk-mode time from a couple of seconds to indefinitely.

2) There's no need to put the port (eth0) back up, if the attacker creates a secondary network interface (eth1) to access the instance. No console access is needed. This is the described scenario in the original report.

Thanks,
Gerhard.

Armando Migliaccio (armando-migliaccio) wrote :

I didn't mean that console would be a requirement for the exploit, and indeed I overlooked the two-NICs scenario. Thanks for your feedback...I feared I was missing something ;)

That said, I have been thinking about whether ensuring a tag on the port after a resize/migration operation suffices to mitigate the exploitation.

Ihar identified the snippet of code at [1] as potentially contentious, and I wonder whether it, together with location [2], would be the only two areas that need to be hardened.

Thoughts?
Thanks,
Armando

[1] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1426-L1436
[2] https://github.com/openstack/neutron/blob/newton-eol/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L995

Gerhard Muntingh (gerhard-1) wrote :

Hardening both locations doesn't fix the issue. It brings the time window back to a couple of seconds.

Out of curiosity, can't nova-compute skip the os-vif step, and have neutron configure the port entirely?

Looping will also work.

Also, don't forget the less serious issue #4.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

I agree with Gerhard that hardening on the neutron side won't completely eliminate the issue in a nova/neutron setup. We should also have a fix on the os-vif side where we enforce the dead tag on newly created ports. AFAIU os-vif may not have been used before recent releases of Nova, so the fix may need to land in both os-vif and nova stable branches.

As for #4, I think it will not be an issue if we enforce dead tag when putting a port down. (The problem is currently we don't do it for unbound ports.)

If I try to capture all the things we may need to patch, I get the following:
- (neutron) port_dead to always set dead tag on new unbound ports;
- (os-vif/nova) enforce dead tag for new ovs ports.

The first patch fixes the issue on the neutron side for setups that are not using Nova / os-vif. In those setups, components calling into Neutron should make sure newly created ports are dead; otherwise they are still exposed to a (short) vulnerable window. (Should we include this info in a release note / security report? Should we reach out to Neutron API consumers that may be affected?)

The second fix will completely close the short vulnerable window in neutron/nova setups discussed above.

(I actually feel that those two issues are independent and should be treated as two separate CVEs targeting different components.)

Then additional hardening patches could also be:
- (neutron) use drop rule for normal action on br-int; (may be invasive / breaking external code);
(and/or)
- (neutron) bump priority for drop flow rule set for dead ports.

The hardening may happen in public since neither of those issues should expose anything as long as os-vif/neutron are patched as described above.

Does it make sense?

Is anyone from Nova team aware of the issue?

Armando Migliaccio (armando-migliaccio) wrote :

Ihar, I agree with your assessment, thanks for pushing this forward; I am a bit uneasy about the implications of adding a drop rule for normal actions on br-int for the reasons you've stated (maybe that breaks Tap as a Service or SFC or other stuff?). We should also make sure that the change on the os-vif side will not break things like OVN/ODL or other drivers that rely on OVS because I don't believe there's any local VLAN tagging involved for those. Have you considered these scenarios?

To answer Gerhard's question about os-vif, iirc the rationale of having nova do the plugging is to enable scenarios where there's no network agent involved on the neutron side and/or to abstract the networking service from hypervisor internals.

Gerhard Muntingh (gerhard-1) wrote :

Thanks, that makes sense.

I suspect adding a drop rule for normal action on br-int will require quite some explicit flows to deal with all the other bridges (br-ex), but we'll see.

On neutron hardening: Bumping priority for drop flow rule set for dead ports will fix issue #4.

Miguel Lavalle (minsel) wrote :

Thanks to everybody who has helped in diagnosing this issue. It seems we have two sets of actions:

1) Fixing: that involves port_dead to always set dead tag on new unbound ports in Neutron and enforce dead tag for new ovs ports in nova / os-vif

2) Hardening of Neutron code

I suggest that as the very next step we concentrate on the fixing part. For this, the Neutron side seems uncontroversial. The only sticking point left is the potential breakage of other back-ends by the os-vif fix. Can we involve someone from OVN to see what the implications might be?

Jeremy Stanley (fungi) wrote :

Please subscribe anyone you think may be able to help identify a safe solution.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

I subscribed Miguel Angel Ajo from networking-ovn team to assess whether proposed os-vif fix would break them.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

(something we discussed with Armando yesterday)

if for some reason the suggested change to os-vif to mark all new ports with dead tag is breaking other projects, we may roll it in as opt-in, where consumer passes some flag through binding profile dictionary that would instruct nova to also mark a new port as dead; then the flag would be proxied into os-vif binding code that would enact the request. If the flag was not set / passed disabled, then nova would not request port death, and os-vif would do the same action as it does right now.

There are several compatibility issues to be concerned about here.

1. we effectively change the API payload for the binding extension; but because we already treat the profile dict as a grab bag of arbitrary data passed between neutron and nova, unpatched nova should already be prepared to receive a dict with unknown keys.

2. it may be that os-vif is not patched while nova is, which would potentially lead to binding call to os-vif failing because of unknown flag passed by nova. I don't think we can bump minimal supported os-vif version in requirements.txt, so nova will need to find a way to deal with older releases. If nothing else, we can always catch TypeError: ... expected at least X arguments, got Y error and fall back to flag-less call to os-vif (plus a warning).
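
That fallback could look roughly like this; plug_port and the mark_dead keyword below are hypothetical stand-ins for the real os-vif call, used only to sketch the TypeError-and-retry pattern described above:

```python
import logging

LOG = logging.getLogger(__name__)


def plug_port_compat(plug_port, vif, mark_dead=True):
    """Call os-vif with the new flag; retry without it on older releases."""
    try:
        return plug_port(vif, mark_dead=mark_dead)
    except TypeError:
        # Installed os-vif predates the (hypothetical) mark_dead parameter.
        LOG.warning('os-vif does not support mark_dead; plugging without it')
        return plug_port(vif)
```

Real code would want to inspect the TypeError message, so that genuine type errors raised inside the plug call are not silently retried without the flag.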

Ihar Hrachyshka (ihar-hrachyshka) wrote :

This ^ would of course imply that instead of having one/two places to patch (os-vif + maybe neutron), we will have three (neutron -> nova -> os-vif).

Miguel Angel Ajo (mangelajo) wrote :

Hi,

Thank you very much for looping me in,

I'm not sure if networking-ovn, or other plugins that don't use tags + NORMAL rules in OVS, would work with the os-vif change (which, as far as I understood, creates the ports right away on the dead VLAN).

Could we also loop in Numan Siddique to verify how OVN behaves in this case? Tomorrow we have a bank holiday in Spain, and I will be away from my computer until late.

The question is whether the "vlan tag" will simply be ignored given the OpenFlow rules in use, or what happens exactly.

Jeremy Stanley (fungi) wrote :

Miguel: my request from comment #26 definitely applies to you as well. Please subscribe anyone you think may be able to help identify a safe solution.

Miguel Angel Ajo (mangelajo) wrote :

Sorry, I didn't know that I had permission to add more people. I added Numan, and I will verify now that I'm back from PTO.

Miguel Angel Ajo (mangelajo) wrote :

[root@primary ~]# ovs-vsctl show
a6b77a09-2647-47d7-8815-ef4d4f689ce8
    Bridge br-int
        fail_mode: secure
        Port "tap5a27427b-22"
            Interface "tap5a27427b-22"
        Port br-int
            Interface br-int
                type: internal
        Port "patch-br-int-to-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e"
            Interface "patch-br-int-to-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e"
                type: patch
                options: {peer="patch-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e-to-br-int"}
        Port "tapa4e1ef4d-40"
            tag: 4095
            Interface "tapa4e1ef4d-40"
        Port "tap4b56611b-99"
            tag: 4095
            Interface "tap4b56611b-99"
        Port "tap473919fe-31"
            tag: 4095
            Interface "tap473919fe-31"
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "patch-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e-to-br-int"
            Interface "patch-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e-to-br-int"
                type: patch
                options: {peer="patch-br-int-to-provnet-d556080a-799f-4621-bb2d-d4ac9b8bb32e"}
[root@primary ~]# ip netns exec ovnmeta-a4e1ef4d-47ce-4b92-8043-70d88237eff1 ssh cirros@10.0.0.4
cirros@10.0.0.4's password:

[root@primary ~]# ip netns exec ovnmeta-a4e1ef4d-47ce-4b92-8043-70d88237eff1 ssh cirros@10.0.0.6
The authenticity of host '10.0.0.6 (10.0.0.6)' can't be established.
RSA key fingerprint is SHA256:cDLkQEB0LfxZfIvpd084MucUa4uohUd0COf3ArPa1A0.
RSA key fingerprint is MD5:cd:4e:f7:f5:e1:bb:61:ea:a7:4d:46:f8:67:43:20:00.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.6' (RSA) to the list of known hosts.
cirros@10.0.0.6's password:

[root@primary ~]# ip netns exec ovnmeta-a4e1ef4d-47ce-4b92-8043-70d88237eff1 ssh cirros@10.0.0.8
The authenticity of host '10.0.0.8 (10.0.0.8)' can't be established.
RSA key fingerprint is SHA256:XUX8EfLF2oRJLqDChEEw3smHGeHm7zcQapdpayZcb0Y.
RSA key fingerprint is MD5:33:00:41:fb:96:25:a4:79:b5:2e:63:c8:03:8f:2e:be.
Are you sure you want to continue connecting (yes/no)? ^C
[root@primary ~]#
[root@primary ~]#

Sorry for the delay. I've verified that the solution would be fine for OVN (and I suspect also for other OpenFlow-based solutions that don't use the "NORMAL" rule).

Miguel Lavalle (minsel) wrote :

@Tocayo Miguel Ajo,

Thanks for the testing. Much appreciated

Based on this I think we can move ahead with the "fixing" part, which involves:

1) port_dead to always set dead tag on new unbound ports in Neutron
2) Enforce dead tag for new ovs ports in nova / os-vif

What are the mechanics of creating, reviewing and merging these two patches under the embargo conditions of this CVE? I am available to work on this.

I'll still loop in someone from ODL to make sure they are fine with the proposed approach

Jeremy Stanley (fungi) wrote :

If the issue is considered widespread, easy enough to exploit and able to do a sufficient amount of damage or obtain sensitive information with some certainty, then we would want to keep this report privately embargoed while fixes are developed, attached to this bug and tested/reviewed locally by the bug's subscribers. Once working fixes for master are identified and backports are created and attached to this bug for all supported stable branches, we will get a CVE identifier assigned, propose a disclosure schedule and provide advance notice and copies of the patches privately to downstream stakeholders.

If the risk from this bug is of questionable severity, we should subscribe the ossg-coresec team to get their opinion on the possibility of switching to our (simpler) public workflow for this report instead.

As a reminder, our process for both of these options is described here: https://security.openstack.org/vmt-process.html

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I tried to reach out to Matt (Nova PTL) via email, so far no luck. I think their involvement is crucial here.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This is the neutron piece of the puzzle to make ovs agent mark all untagged ports as dead. I am considering adding a new functional or fullstack test for the scenario too, but posting what I have for now to start review.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

In nova, there is also some code in nova/network/linux_net.py (specifically, class LinuxOVSInterfaceDriver) that has a plug() method that calls add-port and would benefit from the dead tag setting too, but I failed to find a place where this code is hooked into actual services, so I wonder if it is dead code that will be cleaned up in due course after the os-vif switch. I will leave it up to the nova team to decide.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Backporting for neutron seems simple: the patch applies cleanly to stable/ocata which is the oldest version upstream supports.

Since I also have an interest in earlier releases that may still be supported by Red Hat, I also checked that the patch applies with some adjustments up to liberty. As for earlier releases, they are probably not affected because they don't include https://review.openstack.org/#/q/I5ef9665770df3a9bbaf79049b219fadd73e20309 that made neutron ovs agent skip tagging ports as dead if they don't have a tag in the first place.

Which makes me think that we should make sure the CVE fix doesn't break the assumptions made by the patch I linked to above. The concern there, as far as I understand, is that drop flows are left on the bridge even after the port is gone, which may hinder performance etc.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

As for nova / os-vif backporting, someone from the nova team should assess that too. I believe that because of the os-vif switch in recent cycles, there may be some work needed to adapt the change I posted to older branches.

Changed in neutron:
importance: Low → High
Revision history for this message
Matt Riedemann (mriedem) wrote :

Sorry for being late, I just read up on all of this.

Just to clarify from the original description:

> Issue #2 Order of creation.
>
> The instance is actually migrated before the (networking) configuration is completed.
>
> Recommendation: wait with finishing the live migration until the underlying configuration has been applied completely.

We don't need to try something here on the nova side ^? And what exactly is "networking configuration is completed"? Does that mean waiting for the vif-plugged event from neutron when nova plugs the vifs on the destination compute before starting the live migration in the hypervisor? If so, that's something we've been talking about doing for awhile anyway, and is part of this spec for using the neutron port binding API during live migration to reduce network down time:

https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/neutron-new-port-binding-api.html

Waiting for vif-plugged events during live migration would be a backportable change, that entire spec would not.

The vif plugging dance is a bit more fuzzy to me wrt cold migrate / resize. I see that the libvirt driver doesn't wait for the vif-plugged event on the destination host:

https://github.com/openstack/nova/blob/bd3da5d763d2dcb0426e9acaa419e9e9574569ca/nova/virt/libvirt/driver.py#L8088-L8098

I haven't seen anyone really touch on this part of the issue since the original description though so if it's not critical to the fix here, ignore my questions.

--

Regarding the changes to os-vif, I'm definitely no expert in that code. I'll add Sean Mooney, Jay Pipes and Dan Smith to this review. I'm not sure what the backport impacts might be, but nova has been using os-vif for several releases now (since newton: https://review.openstack.org/#/c/269672/ ).

As Ihar noted earlier, we probably can't bump the minimum required version of os-vif in global-requirements on stable branches, so we might have to make the nova code deal with a scenario where the change to os-vif doesn't exist (not sure if that would show up as a TypeError or AttributeError or what from os-vif - would need to be tested).

Revision history for this message
sean mooney (sean-k-mooney) wrote :

reading through the recommendations, i am not sure that using a drop rule and the null vlan (4095) is the correct approach.

referring to http://openvswitch.org/support/dist-docs/ovs-ofctl.8.txt, using

ovs-ofctl mod-port <bridge> <port> down|up

would appear to be more correct, as it disables the tx and rx queues on the interface.

a drop rule would only prevent the vm from transmitting packets; it would not prevent the vm from receiving traffic, as openflow does not allow you to match on output port.

my only concern with ovs-ofctl mod-port down is that odl/ovn and any other ml2 driver that uses vif_ovs would also need to run ovs-ofctl mod-port up when they are finished.
granted, they would likely also need to be modified to fix the 4095 vlan and remove the drop rule too.

so to resolve issue #1 (initially creating a trunk port):
i would suggest that when we create the port, we initially set it down and have the ovs agent set it up as part of wiring up the port.

issue #2 (order of creation) should be resolved by the multiple port binding spec, i believe.

issue #3: not closing the port when it is down.

i think this can be resolved by updating neutron to use ovs-ofctl mod-port down|up to actually disable the port in ovs instead of using the null vlan.

issue #4: putting the port administratively down actually puts the port on a compute-node-shared vlan.
issue #4 would be resolved by moving to ovs-ofctl mod-port down|up.
however, if you are currently able to transmit/receive packets in a vm that is attached to vlan 4095, i think you have an ovs bug: vlan 4095 is reserved as the null/dead vlan, and ovs should drop all packets to/from interfaces with a tag of 4095.

my main concern, though, is that solving this issue in os-vif this way will not solve it for all use cases. when the neutron vif_type is ovs but hybrid_plug is set to false, os-vif is not used to plug the port. this configuration is used when kernel ovs is installed on the host and neutron is configured to use either the noop security group driver or the openvswitch conntrack driver.

when vif_type=ovs and hybrid_plug=false, libvirt is used to plug the vif into ovs directly.
libvirt does not set the state of the port or the mtu (that's a different bug). if we want to cover all edge cases, we need to finally convert this code path to use os-vif to handle the vif plugging when vif_type=ovs and hybrid_plug=false.
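For concreteness, the mod-port sequence described above could look like the following (a sketch only; br-int is the usual integration bridge name and the tap device name is illustrative):

```shell
# take the port down in ovs: disables its tx and rx queues, so the vm
# can neither send nor receive while the port is being (re)wired
ovs-ofctl mod-port br-int tap903894b4-1f down

# bring it back up once the agent has finished wiring up the port
ovs-ofctl mod-port br-int tap903894b4-1f up
```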

Revision history for this message
Gerhard Muntingh (gerhard-1) wrote :

any news on this issue?

Revision history for this message
Jeremy Stanley (fungi) wrote :

Given this discussion has been going on for 5 months already, I've lost much hope of seeing consensus achieved in private across three different involved codebases plus coordinating agreeable backports with stable branch maintainers. I feel like switching to our public workflow and getting more eyes on it is going to be the only way forward.

Revision history for this message
Gerhard Muntingh (gerhard-1) wrote :

Switching to the public workflow sounds good, but let's make sure the workarounds are good enough and properly documented, so openstack admins get a fair chance of closing the vulnerability.

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Let's move to the public workflow.

There's also a bug in openvswitch that sometimes makes it "peg" the CPU when untagged ports exist on the system...

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :
Revision history for this message
Jeremy Stanley (fungi) wrote :

Given a week with no objections from the (numerous) subscribers to this bug report as well as the other bug linked by Miguel which sort of makes this issue public already, I'm going to switch it to public triaging the vulnerability report as class B2 for now. If it's determined that sufficient fixes can be implemented in the stable branches of the services/libraries/drivers involved we can revisit as a possible class A vulnerability with accompanying advisory.

description: updated
Changed in ossa:
status: Incomplete → Won't Fix
information type: Private Security → Public
tags: added: security
Revision history for this message
Miguel Lavalle (minsel) wrote :

@Jeremy,

I just went through the entire series of comments in this bug. I also looked at the bug that Miguel Ajo filed a few days ago (https://bugs.launchpad.net/neutron/+bug/1767422) and the associated information in https://bugzilla.redhat.com/show_bug.cgi?id=1558336. My comments / questions are the following:

1) The work that Miguel Ajo is doing as a consequence of https://bugs.launchpad.net/neutron/+bug/1767422 has to do with the agents and performance issues. Although it is related to what we have been discussing in this bug, it doesn't address it

2) Carefully reading Sean's comment in #43 above, I notice that he states the following: "if you are currently able to transmit/receive packets in a vm that is attached to vlan 4095 i think you have an ovs bug. vlan 4095 is reserved as the null/dead vlan and ovs should drop all packets to/from interfaces with a tag of 4095...". So he seems to have written his comments (specifically the recommendation about ovs-ofctl mod-port down|up) under the assumption that one of the problems to be solved is that VMs can receive traffic even when their interfaces have been put on the dead vlan (4095). As far as I can tell, nobody has reported that here, which means that the patches proposed by Ihar in #37 and #38 are worth trying. Ihar also reports in #40 that the Neutron patch backports as-is to Ocata, and all the way to Liberty with some adjustments. So even though, according to Sean, there are use cases that might not be covered, the patches in #37 and #38 start us in the right direction of mitigating the problem. Once they are in place, we can start nibbling at os-vif to work on the use cases that, according to Sean, are not fixed. I'd be happy to take a stab at that.

3) Am I correct in my understanding that at this point we can submit the patches in #37 and #38 under the public / normal process in Gerrit? As soon a I get a confirmation of this, I will submit the patches for review

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

I agree with @Miguel's line of thought here, and on the related patch I'm working on I will add a table 0 rule to drop all VLAN 4095 traffic, although I could keep that as a separate patch if I see that the whole patch doesn't backport far.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Since I posted some patches here, to everyone: feel free to take them over and post to gerrit, w/ or w/o attribution. I won't have time to post and polish them for merge. (I am open for doing some reviews though if needed.)

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

> "I will add a table 0 rule to drop all VLAN 4095 traffic" <-- this doesn't work for traffic handled by NORMAL, yikes (thanks Jakub for the pointer).

thanks @Ihar! ;)

Revision history for this message
Miguel Lavalle (minsel) wrote :

I tested:

sudo ovs-ofctl mod-port br-int tap903894b4-1f down

and it still receives traffic. So I think the route suggested by Miguel Ajo in #53 is the way to go

Revision history for this message
Gerhard Muntingh (gerhard-1) wrote :

Any progress on this (now public) security issue?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (master)

Fix proposed to branch: master
Review: https://review.openstack.org/594118

Changed in os-vif:
assignee: nobody → Slawek Kaplonski (slaweq)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-vif (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/594118
Reason: That isn't good approach for sure. We will have to find something better

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Yesterday I talked with Brian about this issue and we thought about a possible solution in Neutron only.
Maybe a change to the OpenFlow rules in br-int can fix this issue.
We can try to use something like:
1. Set the default action in br-int to DROP.
2. For each lvm id (local vlan id used for each network on the host), set an OF rule with ACTION=normal to process such packets in the same way as it is done now for all rules.

What do you think about such a solution?
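A minimal sketch of that flow layout (an illustration only; br-int is the standard integration bridge name, and the local vlan ids 101/102 are made up):

```shell
# 1. default action in br-int: drop anything that matches no other rule
ovs-ofctl add-flow br-int "priority=0,actions=drop"

# 2. for each local vlan id in use on the host, explicitly allow
#    normal mac-learning switching for that vlan's traffic
ovs-ofctl add-flow br-int "priority=10,dl_vlan=101,actions=NORMAL"
ovs-ofctl add-flow br-int "priority=10,dl_vlan=102,actions=NORMAL"
```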

Revision history for this message
Miguel Lavalle (minsel) wrote : Re: [Bug 1734320] Re: Eavesdropping private traffic

Hi Slawek,

You kind of read my mind, although I was thinking of a slightly broader
solution. If you read the notes in the bug, Sean Mooney left a couple of
extensive ones. In a sense, they complicated solving the issue, because he
shone a light on the entire problem, i.e., Nova and os-vif. Why don't we
have a conversation with him in Denver and try to put together a plan to
fix the entire thing?

Best regards

Miguel

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (master)

Fix proposed to branch: master
Review: https://review.openstack.org/602384

Changed in os-vif:
assignee: Slawek Kaplonski (slaweq) → sean mooney (sean-k-mooney)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/602432

Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by sean mooney (<email address hidden>) on branch: master
Review: https://review.openstack.org/602432
Reason: this should not be needed

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/609850

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/609851

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Note: a CVE has been issued for this bug on 2018-07-27: http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-14636

as a quick status update on this bug:

first, i want to mention that a second variant of this issue has also been highlighted to me.
i will be filing a separate private bug for that variant, as it will require changes outside of os-vif
to address and will be more involved. the second variant is also not covered by the existing cve description, so i want to track it separately.

regarding the state of this bug, i have proposed
https://review.openstack.org/602384 to os-vif master and a stable/rocky backport https://review.openstack.org/609850. i am in the process of rebasing that backport
and expect both to be viable to merge later today.

there are some limitations, however, with this fix:
- first, for it to work correctly this will require the multiple port binding support added to nova in rocky. as such, on stable/queens and older branches the backport is only a partial mitigation, as nova will not wait for neutron to signal that it has finished wiring up the port before resuming the guest on the destination node.

- second, if the neutron ovs ml2 agent crashes after the ml2 driver binds the port on the destination node but before the ml2 agent wires up the port added to ovs by os-vif, then we cannot detect this from nova/os-vif and the mitigation will not be effective. as the neutron control plane would be in an undefined state if its agents crashed/exited on the compute node, i feel it's fair to declare this limitation out of scope of this bug.

it may be possible to address one or both of these limitations as part of the second variant's mitigation; however, that will require more extensive modifications to nova, neutron and os-vif, which are unlikely to be easily backportable as they may require a minor backwards-compatible api extension.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (master)

Fix proposed to branch: master
Review: https://review.openstack.org/612534

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/616285

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-vif (stable/queens)

Change abandoned by sean mooney (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/609851
Reason: I'll backport a separate review as master has changed significantly from this version.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/616609

Changed in neutron:
assignee: nobody → sean mooney (sean-k-mooney)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-vif (master)

Reviewed: https://review.openstack.org/602384
Committed: https://git.openstack.org/cgit/openstack/os-vif/commit/?id=165ed325917e5deadb274ad9c122db157c0b55b2
Submitter: Zuul
Branch: master

commit 165ed325917e5deadb274ad9c122db157c0b55b2
Author: Sean Mooney <email address hidden>
Date: Thu Sep 13 16:50:33 2018 +0100

    always create ovs port during plug

    - This change modifies the ovs plugin to always
      create the ovs interface in the ovs db.
    - This change enables the neutron l2 agent to configure
      the ovs interface by assigning a vlan tag and
      installing openflow rules as appropriate.
    - This change will reduce the live migration
      time for kernel ovs ports with hybrid plug false
      by creating the ovs port as part of plug before
      the migration starts.
    - This change adds the privsep decorator
      to delete_net_dev to account for it new usage
      via _unplug_vif_generic and address bug #1801072

    Change-Id: Iaf15fa7a678ec2624f7c12f634269c465fbad930
    Partial-Bug: #1734320
    Closes-Bug: #1801072

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/612534
Committed: https://git.openstack.org/cgit/openstack/os-vif/commit/?id=d291213f1ea62f93008deef5224506fb5ea5ee0d
Submitter: Zuul
Branch: master

commit d291213f1ea62f93008deef5224506fb5ea5ee0d
Author: Sean Mooney <email address hidden>
Date: Tue Oct 23 00:10:38 2018 +0100

    add isolate_vif config option

    - This change add a new isolate_vif config
      option to the OVS plugin.

    - The isolate_vif option defaults to False
      for backwards compatiblity with SDN-based
      deployments.

    - This change is a partial mitigation of bug
      1734320, when isolate_vif is set to True
      os-vif will assign VIFs to the neutron
      l2 agent dead VLAN 4095. This should only
      be set when using the ml2/ovs neutron
      backend.

    Change-Id: I87ee9626cc6b4a01465a6b1908bc66bc7be0a4bc
    Partial-Bug: #1734320
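For operators, enabling the option would look something like the following nova.conf fragment (the [os_vif_ovs] group name is my assumption of where the os-vif OVS plugin reads its options; check the release notes of your os-vif version):

```ini
[os_vif_ovs]
# only safe with the ml2/ovs neutron backend; defaults to False
isolate_vif = True
```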

Changed in os-vif:
importance: Undecided → High
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/616609
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e3dc447b908f57e9acc0378111b8e09cbd88ddc5
Submitter: Zuul
Branch: master

commit e3dc447b908f57e9acc0378111b8e09cbd88ddc5
Author: Sean Mooney <email address hidden>
Date: Thu Nov 8 16:07:55 2018 +0000

    raise priority of dead vlan drop

    - This change adds a max priority flow to drop
      all traffic that is associated with the
      DEAD VLAN 4095.
    - This change is part of a partial mitigation of
      bug 1734320. Without this change vlan 4095 traffic
      will be dropped via a low priority flow after being
      processed by part/all of the openflow pipeline.
      By raising the priorty and droping in table 0
      we drop invalid packets as soon as they enter
      the pipeline.

    Change-Id: I3482c7c4f00942828cc9396cd2f3d646c9e8c9d1
    Partial-Bug: #1734320
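The flow this change installs is roughly equivalent to the following (a sketch only; 65535 is the maximum OpenFlow priority):

```shell
# drop all traffic tagged with the dead vlan at the top of table 0,
# before it can be processed by the rest of the pipeline
ovs-ofctl add-flow br-int "table=0,priority=65535,dl_vlan=4095,actions=drop"
```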

tags: added: neutron-proactive-backport-potential
Changed in os-vif:
status: Fix Committed → Fix Released
Changed in neutron:
status: In Progress → Fix Committed
Changed in nova:
status: In Progress → Won't Fix
Revision history for this message
sean mooney (sean-k-mooney) wrote :

hi, just updating this.
the os-vif fix was released in 1.13.0, but that had a file handle leak, so
1.13.1 is the minimum version that should be used to receive this fix.

the neutron change has been committed to master but not released.
both the neutron and os-vif changes should be backported.

the nova change was not required, so i have set it to won't fix.

the code that is currently committed should be sufficient to resolve the bug; however, there are a number of related changes that i have not written yet that would make this more robust in general.
those changes will likely have to wait until Train to implement, as i likely won't have time in stein.

the two additional changes in neutron would be: extending neutron to record, in the neutron port binding details, the ml2 driver that bound the port, and whether that ml2 driver sends the vif-plug event on port bind or on interface plug.

in nova, i would like to pass this info to os-vif so that it can be used by the os-vif plugins to dynamically enable isolation when ml2/ovs is used.
i would also like to use the info regarding when vif-plug events are sent so we can handle them more correctly in nova.

none of these additional changes are required to fix this bug; however, they will harden the interface between nova and neutron and should help prevent related bugs from being introduced in the future.

Changed in nova:
status: Won't Fix → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/635083

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/640258

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/640258
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-vif (stable/queens)

Change abandoned by sean mooney (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/609851

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-vif (stable/rocky)

Change abandoned by sean mooney (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/609850

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by sean mooney (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/616285

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by sean mooney (<email address hidden>) on branch: master
Review: https://review.opendev.org/635083
Reason: we have changed the direction somewhat so this is not needed in it current form.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/686345

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/686346

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/686347

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/686345
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ffee956d44da34e38e62f82a3bc676cdd6179f48
Submitter: Zuul
Branch: stable/rocky

commit ffee956d44da34e38e62f82a3bc676cdd6179f48
Author: Sean Mooney <email address hidden>
Date: Thu Nov 8 16:07:55 2018 +0000

    raise priority of dead vlan drop

    - This change adds a max priority flow to drop
      all traffic that is associated with the
      DEAD VLAN 4095.
    - This change is part of a partial mitigation of
      bug 1734320. Without this change vlan 4095 traffic
      will be dropped via a low priority flow after being
      processed by part/all of the openflow pipeline.
      By raising the priorty and droping in table 0
      we drop invalid packets as soon as they enter
      the pipeline.

    Change-Id: I3482c7c4f00942828cc9396cd2f3d646c9e8c9d1
    Partial-Bug: #1734320
    (cherry picked from commit e3dc447b908f57e9acc0378111b8e09cbd88ddc5)

tags: added: in-stable-rocky
Revision history for this message
Oleg Bondarev (obondarev) wrote :

Should this bug be reopened for os-vif now when revert https://review.opendev.org/#/c/631829 was merged?

@sean-k-mooney can you please share that status of this bug? Is there anything except neutron https://review.opendev.org/#/c/640258 and nova https://review.opendev.org/#/c/602432/ changes pending?

Revision history for this message
sean mooney (sean-k-mooney) wrote :

there is a mitigation for this bug in all cases except when the ovs firewall driver is used with kernel ovs.
https://review.opendev.org/#/c/631829 has not been reverted and there are no plans to revert it.

https://review.opendev.org/#/c/612534/ was introduced to add a config option to enable isolation.
https://review.opendev.org/#/c/636061/ allows the caller of os-vif to determine if os-vif should plug the interface into the network backend.

the nova change uses that ability to delegate the plugging to os-vif instead of libvirt:
https://review.opendev.org/#/c/602432/13/nova/network/os_vif_util.py

but we cannot do that until the neutron change is merged: https://review.opendev.org/#/c/640258

i am not really actively working on either patch right now.
i tried to reproduce the dvr failure locally, but in my env it seems to work fine.
we know from the upstream testing that in a non-dvr env this seems to work fine.
if some neutron dvr folks can try and fix the dvr issue, that would move things forward,
but a few neutron reviewers have looked and we are not sure why this is broken.

i think the issue is here: https://review.opendev.org/#/c/640258/15/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py@404
we are not looking at the correct field; however, i have not had time to actually debug this, as i have been working on other issues.

so to summarize the status:
from an os-vif point of view, i consider this bug to be fixed.
the nova fix is currently blocked by dvr support in the neutron patch.

if you are using a configuration other than kernel ovs with the ovs firewall driver, we believe this bug is fixed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/686346
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9b0919e64809729f4c0186e9be63b1082510bbb1
Submitter: Zuul
Branch: stable/queens

commit 9b0919e64809729f4c0186e9be63b1082510bbb1
Author: Sean Mooney <email address hidden>
Date: Thu Nov 8 16:07:55 2018 +0000

    raise priority of dead vlan drop

    - This change adds a max priority flow to drop
      all traffic that is associated with the
      DEAD VLAN 4095.
    - This change is part of a partial mitigation of
      bug 1734320. Without this change vlan 4095 traffic
      will be dropped via a low priority flow after being
      processed by part/all of the openflow pipeline.
      By raising the priority and dropping in table 0
      we drop invalid packets as soon as they enter
      the pipeline.

    Conflicts:
        neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/openflow/native/test_br_int.py

    Change-Id: I3482c7c4f00942828cc9396cd2f3d646c9e8c9d1
    Partial-Bug: #1734320
    (cherry picked from commit e3dc447b908f57e9acc0378111b8e09cbd88ddc5)
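The mitigation in the commit above boils down to a single OpenFlow rule: a maximum-priority flow in table 0 that drops all dead-VLAN traffic on entry. A minimal sketch, assuming the standard br-int integration bridge; the helper function here is illustrative only, since the neutron OVS agent programs the equivalent flow through its native OpenFlow driver:

```python
# Sketch of the dead-VLAN drop flow described in the commit above.
# The flow string uses real OpenFlow match syntax; how it is installed
# (ovs-ofctl vs. the agent's native driver) is an implementation detail.
DEAD_VLAN = 4095  # VLAN neutron reserves for ports that are not yet wired up

def dead_vlan_drop_flow() -> str:
    """Max-priority table-0 flow: drop dead-VLAN traffic as it enters."""
    return f"table=0,priority=65535,dl_vlan={DEAD_VLAN},actions=drop"

# Usage (illustrative): ovs-ofctl add-flow br-int "<flow string>"
```

Before this change, dead-VLAN packets could traverse part or all of the pipeline before a low-priority flow dropped them; dropping in table 0 closes that window.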

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/686347
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=797a379c51a05fca356d2f2575bfc386b0287afd
Submitter: Zuul
Branch: stable/pike

commit 797a379c51a05fca356d2f2575bfc386b0287afd
Author: Sean Mooney <email address hidden>
Date: Thu Nov 8 16:07:55 2018 +0000

    raise priority of dead vlan drop

    - This change adds a max priority flow to drop
      all traffic that is associated with the
      DEAD VLAN 4095.
    - This change is part of a partial mitigation of
      bug 1734320. Without this change vlan 4095 traffic
      will be dropped via a low priority flow after being
      processed by part/all of the openflow pipeline.
      By raising the priority and dropping in table 0
      we drop invalid packets as soon as they enter
      the pipeline.

    Conflicts:
        neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/openflow/native/test_br_int.py

    Change-Id: I3482c7c4f00942828cc9396cd2f3d646c9e8c9d1
    Partial-Bug: #1734320
    (cherry picked from commit e3dc447b908f57e9acc0378111b8e09cbd88ddc5)

tags: added: in-stable-pike
Revision history for this message
Oleg Bondarev (obondarev) wrote :

Copying here my comment on the neutron patch https://review.opendev.org/640258/: not sure the failures are related to DVR: it seems we only run the full tempest set with DVR enabled (with OVS). Test failures are mostly about VMs not becoming ACTIVE (stuck in BUILD), I guess due to the neutron VM port not becoming ACTIVE. This should not be affected by DVR in any way. Also, in the OVS agent logs I see errors on ovs-vsctl operations with ports having ofport -1.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Yes, as I commented before,
the issue is here: https://review.opendev.org/#/c/640258/15/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py@404
We are not reading the correct field, so once we fix that we expect it to work in the DVR case.
That is noted in the Gerrit review. The failures are all caused by the fact that we try to read a field that
does not exist; this raises an error and the port does not become active. We then reach the timeout on the nova side and roll back the VM.

It is not an error for a port to have an ofport of -1 on the OVS side;
that is what we want to allow.
An ofport of -1 means the port is declared in the control plane, i.e.
in the OVS DB, but has not yet been attached to the data plane.
When the tap device is actually created in the kernel, it will automatically be
added to the OVS data plane and an ofport will be assigned.

So to be clear: we expect OVS to assign an ofport of -1, and it may also raise a warning in the OVS DB,
but that is normal and the intended behavior. This is what happens with vhost-user, by the way, as the ofport does not get assigned until the VM starts; until that point it will have an ofport of -1, not [].
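The ofport semantics described above can be captured in a small predicate. This is a hedged sketch with stand-in types, not ovs_lib's actual model: an ofport of -1 means the port is declared in the OVS DB but not yet attached to the data plane, while a positive ofport means the kernel device exists and has been wired up.

```python
from dataclasses import dataclass

@dataclass
class OvsPort:
    name: str
    ofport: int  # -1 while the port exists only in the OVS DB (control plane)

def classify(port: OvsPort) -> str:
    """Classify a port the way the fixed agent should: ofport -1 is a
    pending port awaiting its kernel device, not a removed one."""
    return "attached" if port.ofport > 0 else "pending"

# A vhost-user or tap port starts at -1 until the VM/device appears.
assert classify(OvsPort("qvo-example", -1)) == "pending"
assert classify(OvsPort("qvo-example", 7)) == "attached"
```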

Revision history for this message
Paul Peereboom (peereb) wrote :

Hi Sean,

I've finally managed to set up a Stein environment to test the isolate_vif option. I tried to set isolate_vif in nova.conf, but instances then fail to spawn with the following in nova-compute.log:

2019-12-18 13:32:07.180 30660 ERROR vif_plug_ovs.ovsdb.impl_vsctl [req-303dc31c-3c1e-446e-8f1e-83b604cee0c6 68910e67a8cb49e1af2d2bb3b4e3fa49 d5d22a5d7e194a0789a07a3c4d2affd6 - default default] Unable to execute ['ovs-vsctl', '--timeout=120', '--oneline', '--format=json', '--db=tcp:127.0.0.1:6640', '--', '--may-exist', 'add-port', 'br-int', 'qvo5639e4bf-52', '--', 'set', 'Interface', 'qvo5639e4bf-52', 'external_ids:iface-id=5639e4bf-52d7-4674-9084-e258acb47fae', 'external_ids:iface-status=active', 'external_ids:attached-mac=fa:16:3e:d8:ff:c8', 'external_ids:vm-uuid=1a6cfd1e-7414-4ac3-8c5d-81918a445ab5', 'tag=4095']. Exception: Unexpected error while running command.
Command: ovs-vsctl --timeout=120 --oneline --format=json --db=tcp:127.0.0.1:6640 -- --may-exist add-port br-int qvo5639e4bf-52 -- set Interface qvo5639e4bf-52 external_ids:iface-id=5639e4bf-52d7-4674-9084-e258acb47fae external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:d8:ff:c8 external_ids:vm-uuid=1a6cfd1e-7414-4ac3-8c5d-81918a445ab5 tag=4095
Exit code: 1
Stdout: ''
Stderr: 'ovs-vsctl: Interface does not contain a column whose name matches "tag"\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2019-12-18 13:32:07.254 30660 INFO os_vif [req-303dc31c-3c1e-446e-8f1e-83b604cee0c6 68910e67a8cb49e1af2d2bb3b4e3fa49 d5d22a5d7e194a0789a07a3c4d2affd6 - default default] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:d8:ff:c8,bridge_name='qbr5639e4bf-52',has_traffic_filtering=True,id=5639e4bf-52d7-4674-9084-e258acb47fae,network=Network(c4197b2d-f568-4bd2-9b0a-eb0fa05e7428),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5639e4bf-52')
2019-12-18 13:32:07.275 30660 ERROR vif_plug_ovs.ovsdb.impl_vsctl [req-be4f125b-e4e5-427d-ba84-b0fb86478807 68910e67a8cb49e1af2d2bb3b4e3fa49 d5d22a5d7e194a0789a07a3c4d2affd6 - default default] Unable to execute ['ovs-vsctl', '--timeout=120', '--oneline', '--format=json', '--db=tcp:127.0.0.1:6640', '--', '--may-exist', 'add-port', 'br-int', 'qvo8e2e0cb9-27', '--', 'set', 'Interface', 'qvo8e2e0cb9-27', 'external_ids:iface-id=8e2e0cb9-2785-4d09-835d-f20897035cf2', 'external_ids:iface-status=active', 'external_ids:attached-mac=fa:16:3e:41:40:43', 'external_ids:vm-uuid=4773263b-d538-4e1b-b405-7c39c96eb7e2', 'tag=4095']. Exception: Unexpected error while running command.
Command: ovs-vsctl --timeout=120 --oneline --format=json --db=tcp:127.0.0.1:6640 -- --may-exist add-port br-int qvo8e2e0cb9-27 -- set Interface qvo8e2e0cb9-27 external_ids:iface-id=8e2e0cb9-2785-4d09-835d-f20897035cf2 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:41:40:43 external_ids:vm-uuid=4773263b-d538-4e1b-b405-7c39c96eb7e2 tag=4095
Exit code: 1
Stdout: ''
Stderr: 'ovs-vsctl: Interface does not contain a column whose name matches "tag"\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.

Open...


Revision history for this message
Paul Peereboom (peereb) wrote :

Aha, I think I got it; it looks like the placement of the tag argument is wrong:

Fails:
ovs-vsctl --timeout=120 --oneline --format=json --db=tcp:127.0.0.1:6640 -- --may-exist add-port br-int qvo8e2e0cb9-27 -- set Interface qvoc94a40f4-a3 external_ids:iface-id=8e2e0cb9-2785-4d09-835d-f20897035cf2 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:41:40:43 external_ids:vm-uuid=4773263b-d538-4e1b-b405-7c39c96eb7e2 tag=4095

Works:
ovs-vsctl --timeout=120 --oneline --format=json --db=tcp:127.0.0.1:6640 -- --may-exist add-port br-int qvo8e2e0cb9-27 tag=4095 -- set Interface qvoc94a40f4-a3 external_ids:iface-id=8e2e0cb9-2785-4d09-835d-f20897035cf2 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:41:40:43 external_ids:vm-uuid=4773263b-d538-4e1b-b405-7c39c96eb7e2
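The underlying cause is the OVS DB schema: `tag` is a column of the Port table, not the Interface table, so it must be passed with `add-port` (which targets the Port record) rather than inside `set Interface`. A sketch of a command builder that keeps the two tables apart; this is a hypothetical helper for illustration, not os-vif's actual code:

```python
def build_add_port_cmd(bridge: str, port: str, iface_id: str,
                       mac: str, vm_uuid: str, tag: int = 4095) -> list[str]:
    """Build an ovs-vsctl argv with 'tag' on the Port record (where the
    column exists) and external_ids on the Interface record."""
    return [
        "ovs-vsctl", "--timeout=120", "--oneline", "--format=json",
        "--", "--may-exist", "add-port", bridge, port,
        f"tag={tag}",                    # Port-table column: set it here
        "--", "set", "Interface", port,  # Interface-table columns follow
        f"external_ids:iface-id={iface_id}",
        "external_ids:iface-status=active",
        f"external_ids:attached-mac={mac}",
        f"external_ids:vm-uuid={vm_uuid}",
    ]

cmd = build_add_port_cmd("br-int", "qvo-example", "iface-1",
                         "fa:16:3e:00:00:01", "vm-1")
# 'tag' precedes the 'set Interface' section, as in the working command.
assert cmd.index("tag=4095") < cmd.index("set")
```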

tags: removed: neutron-proactive-backport-potential
ABDULLAH (l7kx)
Changed in neutron:
status: Fix Committed → Confirmed
status: Confirmed → New
Changed in nova:
status: In Progress → Fix Released
Changed in neutron:
status: New → Incomplete
status: Incomplete → Confirmed
Changed in neutron:
assignee: sean mooney (sean-k-mooney) → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/640258
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7fd2725cb14b81f442eb57a38755829270ff2c43
Submitter: Zuul
Branch: master

commit 7fd2725cb14b81f442eb57a38755829270ff2c43
Author: Sean Mooney <email address hidden>
Date: Fri Mar 1 04:43:20 2019 +0000

    Do not skip ports with ofport unset or invalid

    This change removes the "_check_ofport" function and its use from
    the ovs_lib.py file.

    By skipping ports without a unique ofport in the "get_vifs_by_ids"
    and "get_vifs_by_id" functions, the OVS agent incorrectly treated
    newly added ports with an ofport of -1 as removed ports in the
    "treat_devices_added_or_updated" function.

    Co-Authored-By: Rodolfo Alonso Hernandez <email address hidden>

    Change-Id: I79158baafbb99bee99a1d687039313eb454d3a9b
    Partial-Bug: #1734320
    Partial-Bug: #1815989

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/748296

tags: added: neutron-proactive-backport-potential
Revision history for this message
sean mooney (sean-k-mooney) wrote :

By the way, while the neutron patch is now merged, the nova patch to delegate the plugging to os-vif is blocked, as the neutron API seems to be sending network-vif-plugged events at the wrong time when we are evacuating and during some other lifecycle operations like revert. Until we figure that out, backporting the most recent neutron patch will do nothing to help address this.

The way nova is using neutron has not changed, so libvirt will still delete the port on OVS and recreate it, defeating the work that has been done to validate the networking is set up before we migrate.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

With that said, it is safe to backport the neutron patch that recently merged, and it will have to be backported before the nova patch is. However, we recommend waiting for the nova patch and any other neutron patches that might be required to merge first before starting the backports;
we only want to backport a fully working solution.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by sean mooney (<email address hidden>) on branch: master
Review: https://review.opendev.org/748296

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/754475

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/753314
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c8a819aff4db62e58192af0a272a7f1ce7923146
Submitter: Zuul
Branch: master

commit c8a819aff4db62e58192af0a272a7f1ce7923146
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Sep 24 09:44:47 2020 +0000

    Filter out port with invalid ofport in OVS firewall

    Since [1], "get_vif_port_by_id" is also returning ports with an
    invalid ofport. OVS firewall cannot set an OpenFlow rule for a port
    without a valid ofport. "get_ovs_port" should filter out those ports.

    Related-Bug: #1815989
    Related-Bug: #1734320

    [1]https://review.opendev.org/#/c/640258/

    Change-Id: Id12486b3127ab4ac8ad9ef2b3641da1b79a25a50

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/756478

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/756478
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=24dd977c2217bb10baf2cf226eab2bd0e5b47d62
Submitter: Zuul
Branch: stable/victoria

commit 24dd977c2217bb10baf2cf226eab2bd0e5b47d62
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Sep 24 09:44:47 2020 +0000

    Filter out port with invalid ofport in OVS firewall

    Since [1], "get_vif_port_by_id" is also returning ports with an
    invalid ofport. OVS firewall cannot set an OpenFlow rule for a port
    without a valid ofport. "get_ovs_port" should filter out those ports.

    Related-Bug: #1815989
    Related-Bug: #1734320

    [1]https://review.opendev.org/#/c/640258/

    Change-Id: Id12486b3127ab4ac8ad9ef2b3641da1b79a25a50
    (cherry picked from commit c8a819aff4db62e58192af0a272a7f1ce7923146)

tags: added: in-stable-victoria
Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/602432
Committed: https://opendev.org/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446
Submitter: "Zuul (22348)"
Branch: master

commit a62dd42c0dbb6b2ab128e558e127d76962738446
Author: Stephen Finucane <email address hidden>
Date: Fri Apr 30 12:51:35 2021 +0100

    libvirt: Delegate OVS plug to os-vif

    os-vif 1.15.0 added the ability to create an OVS port during plugging
    by specifying the 'create_port' attribute in the 'port_profile' field.
    By delegating port creation to os-vif, we can rely on its 'isolate_vif'
    config option [1] that will temporarily configure the VLAN to 4095
    (0xfff), which is reserved for implementation use [2] and is used by
    neutron as a dead VLAN [3]. By doing this, we ensure VIFs are plugged
    securely, preventing guests from accessing other tenants' networks
    before the neutron OVS agent can wire up the port.

    This change requires a little dance as part of the live migration flow.
    Since we can't be certain the destination host has a version of os-vif
    that supports this feature, we need to use a sentinel to indicate when
    it does. Typically we would do so with a field in
    'LibvirtLiveMigrateData', such as the 'src_supports_numa_live_migration'
    and 'dst_supports_numa_live_migration' fields used to indicate support
    for NUMA-aware live migration. However, doing this prevents us
    backporting this important fix since o.vo changes are not backportable.
    Instead, we (somewhat evilly) rely on the free-form nature of the
    'VIFMigrateData.profile_json' string field, which stores JSON blobs and
    is included in 'LibvirtLiveMigrateData' via the 'vifs' attribute, to
    transport this sentinel. This is a hack but is necessary to work around
    the lack of a free-form "capabilities" style dict that would allow us
    to do backportable fixes to live migration features.

    Note that this change has the knock-on effect of modifying the XML
    generated for OVS ports: when hybrid plug is false, these will now be of
    type 'ethernet' rather than 'bridge' as before. This explains the larger
    than expected test damage but should not affect users.

    [1] https://opendev.org/openstack/os-vif/src/tag/2.4.0/vif_plug_ovs/ovs.py#L90-L93
    [2] https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
    [3] https://answers.launchpad.net/neutron/+question/231806

    Change-Id: I11fb5d3ada7f27b39c183157ea73c8b72b4e672e
    Depends-On: Id12486b3127ab4ac8ad9ef2b3641da1b79a25a50
    Closes-Bug: #1734320
    Closes-Bug: #1815989
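The profile_json sentinel trick described in the commit above (smuggling a capability flag through an existing free-form JSON string field, because adding a new o.vo field cannot be backported) can be sketched generically. Field and key names here are illustrative, not nova's exact ones:

```python
import json

SENTINEL = "os_vif_delegation"  # illustrative key name, not nova's actual one

def mark_capability(profile_json: str) -> str:
    """Source host: encode a capability sentinel into a JSON-blob field."""
    profile = json.loads(profile_json) if profile_json else {}
    profile[SENTINEL] = True
    return json.dumps(profile)  # write the whole blob back as a string

def has_capability(profile_json: str) -> bool:
    """Destination host: check the blob before relying on the feature."""
    return bool(profile_json) and json.loads(profile_json).get(SENTINEL) is True

blob = mark_capability('{"interface_name": "tap0"}')
assert has_capability(blob)
assert not has_capability('{"interface_name": "tap0"}')
```

Because the payload is just a string, no object-version bump is needed, which is exactly what makes the fix backportable.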

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/790447

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/797142

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/797144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/797291

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/797316

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/797428

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/797142
Committed: https://opendev.org/openstack/nova/commit/99cf5292c782828ea33a3448ae0e7da028d5d176
Submitter: "Zuul (22348)"
Branch: master

commit 99cf5292c782828ea33a3448ae0e7da028d5d176
Author: Stephen Finucane <email address hidden>
Date: Fri Jun 18 17:55:50 2021 +0100

    objects: Fix VIFMigrateData.supports_os_vif_delegation setter

    We're using a 'profile' getter/setter in the 'VIFMigrateData' object to
    ensure we transform the JSON encoded string we store in the database to
    an actual dictionary on load and vice versa on save. However, because
    the getter is returning a new object (constructed from 'json.loads')
    rather than a reference to the original data (which is a string),
    modifying this object doesn't actually change the underlying data in the
    object. We were relying on this broken behavior to set the
    'supports_os_vif_delegation' attribute of the 'VIFMigrateData' object
    and trigger the delegated port creation introduced in change
    I11fb5d3ada7f27b39c183157ea73c8b72b4e672e, which means that code isn't
    actually doing anything yet. Resolve this.

    Change-Id: I362deb1088c88cdcd8219922da9dc9a01b10a940
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1734320
    Related-Bug: #1815989
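The bug fixed here is a classic Python property pitfall: a getter that decodes a stored string returns a brand-new object each call, so mutating that object never touches the underlying data. A minimal reproduction with a generic class (not nova's actual VIFMigrateData):

```python
import json

class MigrateData:
    def __init__(self) -> None:
        self.profile_json = "{}"  # what is actually persisted

    @property
    def profile(self) -> dict:
        # Returns a NEW dict each call, not a reference to stored state.
        return json.loads(self.profile_json)

    @profile.setter
    def profile(self, value: dict) -> None:
        self.profile_json = json.dumps(value)

md = MigrateData()
md.profile["supports_os_vif_delegation"] = True  # mutates a throwaway dict
assert md.profile == {}                          # the change was lost

# The fix: read, modify, then write back through the setter.
p = md.profile
p["supports_os_vif_delegation"] = True
md.profile = p
assert md.profile["supports_os_vif_delegation"] is True
```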

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/nova/+/797428
Committed: https://opendev.org/openstack/nova/commit/fa0fb2fe3d61de1cb871c48ee97053cf2fb5827a
Submitter: "Zuul (22348)"
Branch: master

commit fa0fb2fe3d61de1cb871c48ee97053cf2fb5827a
Author: Stephen Finucane <email address hidden>
Date: Tue Jun 22 11:37:22 2021 +0100

    libvirt: Always delegate OVS plug to os-vif

    In change I11fb5d3ada7f27b39c183157ea73c8b72b4e672e, we started
    delegating plugging of OVS ports to os-vif to work around a number of
    bugs. However, this was only introduced for live migration. Plugging is
    still handled by libvirt for spawn. This results in an odd situation,
    whereby an interface of type 'bridge' will be used when creating the
    instance initially, only for this to change to 'ethernet' on live
    migration. Resolve this by *always* delegating plugging to os-vif. This
    is achieved by consistently setting the 'delegate_create' attribute of
    'nova.network.model.VIF' to 'True', which will later get transformed to
    the 'create_port' attribute of the 'os_vif.objects.vif.VIFOpenVSwitch'
    object(s) created in 'nova.network.os_vif_util._nova_to_osvif_vif_ovs'
    and ultimately result in delegate port creation.

    Note that we don't need to worry about making the setting of
    'delegate_create' conditional on whether we're looking at an OVS port or
    not: this will be handled by '_nova_to_osvif_vif_ovs'. We also don't
    need to worry about unsetting this attribute before a live migration:
    the 'delegate_create' attribute is always overridden as part of
    'nova.objects.migrate_data.VIFMigrateData.get_dest_vif'.

    Change-Id: I014c5a81752f86c6b99d19d769c42f318e18e676
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1734320
    Related-Bug: #1815989
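The always-delegate behavior described above reduces to setting one flag on the nova-side VIF model and mapping it to os-vif's port-profile attribute during translation. The classes below are simplified stand-ins for the real nova and os-vif objects, for illustration only:

```python
from dataclasses import dataclass

@dataclass
class NovaVIF:  # stand-in for nova.network.model.VIF
    id: str
    delegate_create: bool = True  # now set unconditionally for OVS ports

@dataclass
class OsVifPortProfile:  # stand-in for VIFPortProfileOpenVSwitch
    create_port: bool = False

def nova_to_osvif_profile(vif: NovaVIF) -> OsVifPortProfile:
    """Mirror the translation step: delegate_create -> create_port, so
    os-vif (not libvirt) creates the OVS port, pre-tagged with VLAN 4095."""
    return OsVifPortProfile(create_port=vif.delegate_create)

assert nova_to_osvif_profile(NovaVIF("vif-1")).create_port is True
```

With the flag set consistently at spawn and at live migration, the interface type no longer flips between 'bridge' and 'ethernet' across operations.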

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/828148

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/797316

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/797291

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/797144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/790447

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/754475
Reason: If needed, please restore the patch and address the comments.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/790447
Committed: https://opendev.org/openstack/nova/commit/23a4b27dc0c156ad6cbe5260d518da3fd62294b8
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 23a4b27dc0c156ad6cbe5260d518da3fd62294b8
Author: Stephen Finucane <email address hidden>
Date: Fri Apr 30 12:51:35 2021 +0100

    libvirt: Delegate OVS plug to os-vif

    os-vif 1.15.0 added the ability to create an OVS port during plugging
    by specifying the 'create_port' attribute in the 'port_profile' field.
    By delegating port creation to os-vif, we can rely on its 'isolate_vif'
    config option [1] that will temporarily configure the VLAN to 4095
    (0xfff), which is reserved for implementation use [2] and is used by
    neutron as a dead VLAN [3]. By doing this, we ensure VIFs are plugged
    securely, preventing guests from accessing other tenants' networks
    before the neutron OVS agent can wire up the port.

    This change requires a little dance as part of the live migration flow.
    Since we can't be certain the destination host has a version of os-vif
    that supports this feature, we need to use a sentinel to indicate when
    it does. Typically we would do so with a field in
    'LibvirtLiveMigrateData', such as the 'src_supports_numa_live_migration'
    and 'dst_supports_numa_live_migration' fields used to indicate support
    for NUMA-aware live migration. However, doing this prevents us
    backporting this important fix since o.vo changes are not backportable.
    Instead, we (somewhat evilly) rely on the free-form nature of the
    'VIFMigrateData.profile_json' string field, which stores JSON blobs and
    is included in 'LibvirtLiveMigrateData' via the 'vifs' attribute, to
    transport this sentinel. This is a hack but is necessary to work around
    the lack of a free-form "capabilities" style dict that would allow us
    to do backportable fixes to live migration features.

    Note that this change has the knock-on effect of modifying the XML
    generated for OVS ports: when hybrid plug is false, these will now be of
    type 'ethernet' rather than 'bridge' as before. This explains the larger
    than expected test damage but should not affect users.

    Changes:
      lower-constraints.txt
      requirements.txt
      nova/network/os_vif_util.py
      nova/tests/unit/virt/libvirt/test_vif.py
      nova/tests/unit/virt/libvirt/test_driver.py
      nova/virt/libvirt/driver.py

    NOTE(stephenfin): Change I362deb1088c88cdcd8219922da9dc9a01b10a940
    ("objects: Fix VIFMigrateData.supports_os_vif_delegation setter") which
    contains an important fix for the original change, is squashed into this
    change. In addition, the os-vif version bump we introduced in the
    original version of this patch is not backportable and as a result, we
    must introduce two additional checks. Both checks ensure we have a
    suitable version of os-vif and skip the new code paths if not. The first
    check is in the libvirt driver's 'check_can_live_migrate_destination'
    function, which as the name suggests runs on the destination host early
    in the live migration process. If os-v...


tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova wallaby-eom

This issue was fixed in the openstack/nova wallaby-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/828148
Reason: stable/wallaby branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/wallaby if you want to further work on this patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/923035

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-vif (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-vif/+/923036
