[OVN] The router GW is "restoring" the NAT rule that disables SNAT

Bug #2033083 reported by Rodolfo Alonso
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
Ironic    Invalid   Undecided    Unassigned
neutron   Invalid   Undecided    Unassigned

Bug Description

When the router GW info is updated with "enable_snat=False", the Neutron API issues a ``DeleteNATRuleInLRouterCommand`` command to remove the NAT rule.
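For reference, a minimal sketch of the request body that triggers this path, assuming the standard Neutron router update call (PUT /v2.0/routers/{router_id}). The helper name and the network ID placeholder are hypothetical and only serve the illustration; they are not taken from the job logs.

```python
import json
from typing import Any, Dict


def gateway_update_body(network_id: str, enable_snat: bool) -> Dict[str, Any]:
    """Build the JSON body for a router update that changes the GW info.

    Hypothetical helper for illustration; only the key names follow the
    Neutron API ("router", "external_gateway_info", "enable_snat").
    """
    return {
        "router": {
            "external_gateway_info": {
                "network_id": network_id,
                "enable_snat": enable_snat,
            }
        }
    }


# "EXTERNAL_NETWORK_ID" is a placeholder, not a value from the bug report.
body = gateway_update_body("EXTERNAL_NETWORK_ID", enable_snat=False)
print(json.dumps(body, indent=2))
```

Sending this body with "enable_snat" set to false is what should lead the OVN driver to remove the router's SNAT rule.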

In an Ironic job [1], the SNAT rule is restored again. The problem is that we still don't know when or why this is happening. I'll update this bug once I have finished my initial investigation.

[1]https://f65b0483f01fbf97cb5c-1988f1bc3d637497f7692396b58d77ce.ssl.cf2.rackcdn.com/885087/49/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool/0990cd1/controller/logs/screen-q-svc.txt

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

As a note, I'm going to revert that specific job to get an initial OVN change into a "green" state for Ironic's CI, and then I'll create a specific change to switch a job over which exercises the rescue functionality which seemed to tickle this issue.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The issue is related to an iPXE request that should be received from a private IP [1]. However, at 22:10:41, the request comes from an external IP:
  2023-08-24 22:10:41.560 [-] 172.24.5.109 "GET /boot.ipxe HTTP/1.1" 200 1004

The router GW is updated with "enable_snat=False" [2]. However, just before the iPXE messages, a new FIP is created with fixed IP 10.1.0.12 and external IP 172.24.5.109 [3]. This new FIP adds a new NAT rule, similar to the NAT rule deleted when the GW port was updated with "enable_snat=False".

[1]https://f65b0483f01fbf97cb5c-1988f1bc3d637497f7692396b58d77ce.ssl.cf2.rackcdn.com/885087/49/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool/0990cd1/controller/logs/apache/ipxe_access_log.txt
[2]https://paste.opendev.org/show/b06qEnS8T33M6DydP2tV/
[3]https://paste.opendev.org/show/bqP5pH5aYF1odKHlHbVo/
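To illustrate the sequence described above, here is a simplified model of the router's NAT rules. It is only a sketch, not the OVN implementation: the rule types ("snat", "dnat_and_snat") follow the OVN Northbound NAT table, the FIP addresses come from this report, and the subnet CIDR and gateway address (SUBNET_CIDR, GW_EXTERNAL_IP) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class NatRule:
    type: str         # "snat" or "dnat_and_snat"
    logical_ip: str   # private address, or CIDR for a subnet-wide SNAT rule
    external_ip: str  # address seen on the external network


def source_seen_externally(rules: List[NatRule], src_ip: str) -> str:
    """Return the source address an external host sees for src_ip.

    Simplified assumption: an exact-match FIP rule wins over subnet SNAT.
    """
    src = ip_address(src_ip)
    for rule in rules:
        if rule.type == "dnat_and_snat" and ip_address(rule.logical_ip) == src:
            return rule.external_ip
    for rule in rules:
        if rule.type == "snat" and src in ip_network(rule.logical_ip):
            return rule.external_ip
    return src_ip  # no translation: the private address is preserved


SUBNET_CIDR = "10.1.0.0/24"    # hypothetical, not from the logs
GW_EXTERNAL_IP = "172.24.5.1"  # hypothetical, not from the logs

rules = [NatRule("snat", SUBNET_CIDR, GW_EXTERNAL_IP)]

# enable_snat=False: the subnet-wide SNAT rule is deleted.
rules = [r for r in rules if r.type != "snat"]
before_fip = source_seen_externally(rules, "10.1.0.12")

# A FIP is created for 10.1.0.12, which adds a 1:1 NAT rule.
rules.append(NatRule("dnat_and_snat", "10.1.0.12", "172.24.5.109"))
after_fip = source_seen_externally(rules, "10.1.0.12")

print(before_fip, after_fip)
```

In this model the node keeps its private source address after SNAT is disabled, and is translated to the FIP address again as soon as the FIP exists, which matches the address seen in the iPXE access log.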

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

The root cause is that the tempest job creates a FIP to enable SSH to the node, and that creates a 1:1 NAT record. This was the case with OVS, but with OVS we could forward TFTP traffic.

Since we can't forward TFTP through OVN (at this point, maybe ever?), this sort of shoots Ironic in the foot completely on NAT-enabled networking.

Ironic should discuss/explore possible options for operators who wish to use OVN with bare metal network boot operations.

Changed in neutron:
status: New → Invalid
Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

Marking invalid: this is how it worked with OVS as well; it just means we can't pass TFTP traffic through OVN.

Changed in ironic:
status: New → Invalid