Comment 6 for bug 1915282

melanie witt (melwitt) wrote :

We discussed this issue the other day and Sean walked me through the details. I'm attempting to summarize here and Sean will correct me if I've got something wrong.

Tenant network isolation will not be enforced if (1) "'physical_network':null" is specified in the nova [pci]passthrough_whitelist and (2) any of the following deployment configurations are used:

  * Tunneled network + PF

  * Tunneled network + VF with vif_type=hw_veb (used by SRIOV NIC agent) + NIC that does not have features required for isolation (only possible with out-of-tree neutron driver)
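As a rough illustration only, the conditions above can be written as a small Python predicate. This is not nova or neutron code; the function name and all parameter names are hypothetical and exist just to make the combinations explicit. A False result only means the combination is not one of the cases listed in this comment.

```python
def isolation_at_risk(physical_network, tunneled, device_type,
                      vif_type=None, nic_supports_isolation=True):
    """Return True if the inputs match one of the unsafe cases listed above.

    All names here are hypothetical; this only restates the comment's
    conditions and is not how nova or neutron evaluate a request.
    """
    # Condition (1): the whitelist entry carries "physical_network": null,
    # modelled here as Python None.
    if physical_network is not None:
        return False
    # Condition (2) only concerns tunneled networks.
    if not tunneled:
        return False
    # Tunneled network + PF.
    if device_type == "PF":
        return True
    # Tunneled network + VF with vif_type=hw_veb on a NIC lacking the
    # features required for isolation.
    if device_type == "VF":
        return vif_type == "hw_veb" and not nic_supports_isolation
    return False


if __name__ == "__main__":
    print(isolation_at_risk(None, True, "PF"))
    print(isolation_at_risk(None, True, "VF", "hw_veb", nic_supports_isolation=False))
    print(isolation_at_risk(None, True, "VF", "hw_veb", nic_supports_isolation=True))
```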

The in-tree neutron driver (ml2) will only offload traffic to the NIC if the NIC has the features required to enforce tenant isolation; otherwise the driver falls back to not offloading the traffic. The ml2 driver is safe because it does this validation itself.

The use of "'physical_network':null" in the nova [pci]passthrough_whitelist has been documented by neutron [1] and some operators are already using it in production. These operators are assumed to be using the ml2 driver with supported NICs.

Tunneled network + PF is something that doesn't make logical sense to configure and we expect zero operators to be doing this today.

Tunneled network + VF SRIOV + ml2 in-tree driver is something we expect operators are using today, and it is safe.

Tunneled network + VF SRIOV + out-of-tree driver + unsupported NICs is something that operators _could_ be doing, and these are the operators who should be warned.

Action we can take now:

  * Block and reject tunneled network + PF when detected, as it is dangerous and does not make sense. No upgrade impact is expected, as we don't expect anyone is currently doing this. The block would just explicitly reject such an invalid config going forward.

  * Add a warning/known issue to our release notes and to the [pci]passthrough_whitelist config option help to explain the possible danger with the out-of-tree driver config described earlier. Maybe also post to the ML with the [ops] tag.

Future action we could take (Xena release):

  * Block and reject tunneled network + VF by default and force operator opt-in via [workarounds] config option until we (nova + neutron) formally support this unintended feature of hardware offloading
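To make the two proposals concrete, here is a minimal Python sketch of the check they describe: always reject tunneled network + PF, and reject tunneled network + VF unless the operator has opted in. It is only a sketch under stated assumptions, not nova code. The exception class, the function, and the `allow_tunneled_vf` flag (standing in for a not-yet-defined [workarounds] option) are all hypothetical names.

```python
class UnsupportedPciNetworkConfig(Exception):
    """Hypothetical error raised for a rejected device/network combination."""


def check_tunneled_pci_request(tunneled, device_type, allow_tunneled_vf=False):
    """Raise if the requested combination would be blocked.

    allow_tunneled_vf is a stand-in for a hypothetical [workarounds]
    opt-in; no such option name is defined in this comment.
    """
    if not tunneled:
        # Only tunneled networks are affected by the proposals.
        return
    if device_type == "PF":
        # Proposed now: always reject tunneled network + PF.
        raise UnsupportedPciNetworkConfig(
            "tunneled network + PF is not a supported configuration")
    if device_type == "VF" and not allow_tunneled_vf:
        # Possible future default: reject unless the operator opts in.
        raise UnsupportedPciNetworkConfig(
            "tunneled network + VF requires explicit operator opt-in")


if __name__ == "__main__":
    check_tunneled_pci_request(True, "VF", allow_tunneled_vf=True)  # accepted
    try:
        check_tunneled_pci_request(True, "PF")
    except UnsupportedPciNetworkConfig as exc:
        print("rejected:", exc)
```

With the default of `allow_tunneled_vf=False`, an operator who is safely using the in-tree ml2 driver would have to change configuration on upgrade to keep a working deployment.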

Now that I [hopefully] understand the situation better, I lean toward thinking that blocking and rejecting tunneled network + VF by default is too hostile. It would require operators who are safely using the in-tree ml2 driver today to deploy a [nova] configuration change when they upgrade to Xena in order to avoid breaking their deployments. Operators in this boat probably got there by following the neutron documentation [1].

[1] https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html#configure-nodes-vxlan-configuration