Comment 0 for bug 1798904

sean mooney (sean-k-mooney) wrote :

This bug is a second variant of https://bugs.launchpad.net/neutron/+bug/1734320

The original bug, which is now public, was limited to the case where a vm is live
migrated, resulting in a short window where the tenant instance could receive vlan
tagged traffic on the destination node before the neutron ml2 agent wires up the
port on the ovs bridge.

Note that while the original bug implied that the vm was only able to eavesdrop
on traffic, it was also possible for the vm to send traffic to a different tenant
network by creating a vlan subport which corresponded to a vlan in use for tenant
isolation on the br-int.

The original bug was determined to be a result of the fact that during live
migration, if the vif-type was ovs and ovs_hybrid_plug=false, the VIF was plugged
into the ovs bridge by the hypervisor when the vm was started on the destination
node, instead of pre-plugging it and waiting for neutron to signal that it had
completed wiring up the port before migrating the instance.

Since live migration is an admin-only operation unless intentionally changed by
the operator, the scope of this initial vector was limited.

The second vector to create a running vm with an untagged port does not require
admin privileges.

If a user creates a neutron port and sets the admin-state-up field to False

openstack port create --disable --network <my network> <port name>

and then either boots a vm with this port

openstack server create --flavor <flavor id> --image <image id> --port <port name> <vm name>

or attaches the port to an existing vm

openstack server add port <vm name> <port name>

then this will similarly create a window where the port is attached to the guest but
neutron has not yet wired up the interface.
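
One way to observe this window on the compute node is to check whether the ovs port has been given a vlan tag yet. `ovs-vsctl get Port <name> tag` prints `[]` while the tag column is empty and the integer tag once it is set. The sketch below only parses that output; the function name is made up for illustration and is not part of nova, neutron or os-vif, and an empty tag is only a hint that the agent has not wired the port.

```python
def parse_port_tag(raw):
    """Parse the output of `ovs-vsctl get Port <name> tag`.

    Returns the vlan tag as an int, or None when the column is empty
    (printed as "[]"), i.e. the port has not been tagged yet.
    """
    value = raw.strip()
    if value in ("", "[]"):
        return None
    return int(value)
```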

Note that this was reported to me for queens with ml2/ovs and the iptables firewall.
I have not personally validated how to recreate it, but I intend to
reproduce this on master next week and report back.

I believe there are a few ways that this can be mitigated.
The mitigations for the live migration variant will narrow the window
in which this variant will be viable and in general may be sufficient in the
cases where the neutron agent is running correctly.

But a more complete fix would involve modifications to nova, neutron and os-vif.

From a neutron perspective we could extend the neutron port bindings to contain 2 additional
fields.

ml2_driver_names:
    an ordered, comma-separated list of the agents that bound this port.
    Note: this will be used by os-vif to determine if it should perform additional
    actions such as tagging the port, or setting its tx/rx queues down,
    to mitigate this issue.

ml2_port_events:
    a list of the times at which port status events are emitted by an ml2 driver,
    or an enum.
    Note: currently ml2/ovs signals nova that it has completed wiring
    up the port only when the agent has configured the vswitch, but odl sends the
    notification when the port is bound in the ml2 driver, before the vswitch is
    configured. To be able to use these more effectively within nova we need
    to be able to know if the event is sent only
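
To make the proposal a bit more concrete, here is a sketch of how a consumer such as os-vif might read the two proposed fields. Neither field exists in neutron today; the enum values, the shape of the dict and the function name are assumptions chosen only for illustration.

```python
import enum


class PortEventTiming(enum.Enum):
    # Hypothetical values for the proposed ml2_port_events field.
    ON_BIND = "bind"    # event sent when the ml2 driver binds the port
    ON_WIRED = "wired"  # event sent once the vswitch has been configured


def parse_binding_hints(vif_details):
    """Extract the proposed fields from a binding details dict.

    Returns (driver_names, event_timing). Missing fields yield an
    empty list and None, so an older neutron that does not send the
    fields is still handled.
    """
    raw_names = vif_details.get("ml2_driver_names", "")
    names = [n.strip() for n in raw_names.split(",") if n.strip()]
    raw_timing = vif_details.get("ml2_port_events")
    timing = PortEventTiming(raw_timing) if raw_timing else None
    return names, timing
```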

Additionally, changes to os-vif and nova will be required to process this new info.

On the nova side, if we know that a backend will send an event when the port is
wired up on the vswitch, we may be able to make attach wait until that has been
done.
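
As a rough illustration of the nova side, the attach path could block on the plug event only for backends known to send it after wiring. The sketch uses a plain `threading.Event` as a stand-in for nova's external event handling; the function and parameter names are hypothetical.

```python
import threading


def wait_for_wiring(plugged_event, event_means_wired, timeout):
    """Return True if it is safe to continue the attach.

    plugged_event: threading.Event set when the plug notification arrives.
    event_means_wired: True only if the backend sends the notification
        after the vswitch has been configured.
    timeout: seconds to wait before giving up.
    """
    if not event_means_wired:
        # The event carries no wiring guarantee, so waiting adds nothing.
        return True
    return plugged_event.wait(timeout)
```

A caller would fail or roll back the attach when this returns False rather than leaving an unwired port in the guest.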

If os-vif knew the ovs plugin was being used with ml2/ovs and the ovs l2 agent, it could
also conditionally wait for the interface to be tagged by neutron.
This could be done via a config option; however, since the plugin is shared with
sdn controllers that manage ovs such as odl, ovn, onos and dragonflow, it would
have to default to not waiting, as these other backends do not use vlans for
tenant isolation.
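
The conditional wait could look roughly like this. `get_tag` stands in for whatever os-vif would use to read the port's tag from ovsdb, and `wait_for_tag` plays the role of the config option; both names are assumptions, not existing os-vif options.

```python
import time


def wait_until_tagged(get_tag, wait_for_tag=False, timeout=10.0, interval=0.1):
    """Poll until the port has a vlan tag, if waiting is enabled.

    get_tag: callable returning the current tag, or None if unset.
    wait_for_tag: defaults to False so that backends which never tag
        ports are not blocked.
    Returns True if tagged or waiting is disabled, False on timeout.
    """
    if not wait_for_tag:
        return True
    deadline = time.monotonic() + timeout
    while True:
        if get_tag() is not None:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```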

Similarly, instead of waiting we could have os-vif apply a drop rule and vlan 4095
based on a config option. Again, this would have to default to false, or insecure,
to not break sdn based deployments.
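
For the drop-rule variant, the commands os-vif would need to issue at plug time are roughly the following. 4095 is the vlan neutron's ovs agent already uses as its dead vlan; the helper name, the flow priority and the use of the CLI rather than ovsdb are illustrative assumptions. Whatever wires the port up later would also have to remove the flow and retag the port.

```python
DEAD_VLAN_TAG = 4095  # same value neutron's ovs agent uses for its dead vlan


def isolation_commands(bridge, port_name, ofport, isolate=False):
    """Build (but do not run) the commands that isolate a new port.

    isolate defaults to False so that sdn backends, which do not rely
    on vlan tags for isolation, keep their current behaviour.
    """
    if not isolate:
        return []
    return [
        # Put the port on the dead vlan so it sees no tenant traffic.
        ["ovs-vsctl", "set", "Port", port_name, "tag=%d" % DEAD_VLAN_TAG],
        # Drop everything the guest sends until the port is wired up.
        ["ovs-ofctl", "add-flow", bridge,
         "priority=65535,in_port=%d,actions=drop" % ofport],
    ]
```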

If we combine one of the config options with the ml2_driver_names change,
we can backport the fix with the config option only for stable releases and use
the ml2_driver_names from the vif details, if present, for stein to
dynamically enable the mitigation when informed by neutron that it is required.
This will minimise the upgrade impact and make it secure by default going forward
without breaking compatibility for stable branches.
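
The combined behaviour could be decided along these lines. The option name `isolate_vif_option` and the driver name `openvswitch` used as the trigger are assumptions for the sake of the example, and `ml2_driver_names` is the proposed field, not something neutron sends today.

```python
def should_isolate(isolate_vif_option, vif_details):
    """Decide whether to apply the mitigation when plugging a port.

    isolate_vif_option: operator config flag (the stable-branch knob).
    vif_details: binding details from neutron; may lack the new field.
    """
    names = vif_details.get("ml2_driver_names")
    if names is None:
        # Neutron did not send the field: rely on the config option only.
        return bool(isolate_vif_option)
    # Neutron told us who bound the port: decide dynamically.
    drivers = [n.strip() for n in names.split(",")]
    return "openvswitch" in drivers
```

With this shape, stable branches only ever take the first path, while a release that understands the new field ignores the option whenever neutron provides the driver names.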