Iptables ordering in fuel-devops is not deterministic. Networks defined by fuel-devops must have production-like connectivity

Bug #1554177 reported by Sergey Yudin
This bug affects 1 person
Affects             Status     Importance  Assigned to      Milestone
Fuel for OpenStack  Confirmed  High        Dennis Dmitriev
Mitaka              Confirmed  High        Dennis Dmitriev

Bug Description

For now devops defines networks in random order. Depending on which network is created first, routing between public and management may or may not be enabled, because the icmp-port-unreachable REJECT rule may be injected before or after the rules of another network definition.

When the pub network is created before mgmt, we get
-A FORWARD -s <pub_subnet> -i <pub_fuelbr> -j ACCEPT
<cut>
-A FORWARD -i <admin_fuelbr> -j REJECT --reject-with icmp-port-unreachable

When the mgmt iface is created first, we get
-A FORWARD -i <admin_fuelbr> -j REJECT --reject-with icmp-port-unreachable
<cut>
-A FORWARD -s <pub_subnet> -i <pub_fuelbr> -j ACCEPT

which leads to different behavior.

The expected behavior is that there is never access from the public network to the management network.

Tags: area-qa
description: updated
Changed in fuel:
assignee: nobody → Fuel DevOps (fuel-devops)
milestone: none → 9.0
summary: - networks defined by devops must have production-like connectivity
+ networks defined by fuel-devops must have production-like connectivity
Changed in fuel:
assignee: Fuel DevOps (fuel-devops) → Fuel QA Team (fuel-qa)
tags: added: area-qa
Changed in fuel:
status: New → Confirmed
Changed in fuel:
importance: Undecided → High
Alexandr Kostrikov (akostrikov-mirantis) wrote : Re: networks defined by fuel-devops must have production-like connectivity

There is a defined order for networks: https://github.com/openstack/fuel-devops/blob/master/devops/models/network.py#L202

And through the order of network names there is an order of interfaces on nodes:
https://github.com/openstack/fuel-devops/blob/master/devops/models/node.py
http://paste.openstack.org/show/490209/

Can you clarify which network you mean? The interface on the admin node?

Changed in fuel:
status: Confirmed → Incomplete
Sergey Yudin (tsipa740) wrote :

I mean the management and public networks, what else can that description mean?

It also affects all other networks which are NOT under NAT; apparently they will be routable because the restrictions in iptables will not affect them.

I believe this method works with libvirt asynchronously, so this "ordering" has no real effect on the iptables rules. I'm 99% sure that if you put a sleep there it will work as expected, but a proper fix will need someone to put in more effort and look at what is going on under the hood.

And yeah, if you're too lazy to verify that the bug exists, please go to the env where the jobs are running and run something like

env_name=deploy_lcp_idc_edc_single.716.2016-03-10_11-23-50
q="" ;for f in admin management private public storage; do q="$q|`virsh net-dumpxml ${env_name}_$f | grep fuelbr | sed -e 's|.*\(fuelbr[0-9]*\).*|\1|g'`" ; done; iptables-save | grep -E "${q#|} "

on different envs and check that the firewall rule order differs between envs. If you read the rules (or the topic message) carefully, you'll notice that the rule order has an impact on routing.
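The comparison described above can also be sketched in Python. This is only an illustration, not part of fuel-devops: the helper name `forward_bridge_order` is hypothetical, and it assumes the text passed in is the output of `iptables-save` captured on the host. It reports the order in which bridges first appear in the FORWARD chain, which is the thing that differs between envs.

```python
import re

# Matches the interface name that follows a standalone -i or -o option.
IFACE_RE = re.compile(r"(?:^|\s)-[io]\s+(\S+)")


def forward_bridge_order(iptables_save_text):
    """Return bridge names in the order they first appear in FORWARD rules."""
    order = []
    for line in iptables_save_text.splitlines():
        line = line.strip()
        if not line.startswith("-A FORWARD"):
            continue  # only the FORWARD chain matters for routing between networks
        for bridge in IFACE_RE.findall(line):
            if bridge not in order:
                order.append(bridge)
    return order


# A few lines in the iptables-save format, used as sample input.
SAMPLE = """
-A FORWARD -i fuelbr4551 -o fuelbr4551 -j ACCEPT
-A FORWARD -o fuelbr4551 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -s 10.0.213.0/24 -i fuelbr4550 -j ACCEPT
-A OUTPUT -o fuelbr4547 -p udp -m udp --dport 68 -j ACCEPT
"""

print(forward_bridge_order(SAMPLE))  # ['fuelbr4551', 'fuelbr4550']
```

Running this against the saved rules of two envs and comparing the lists against the bridges reported by `virsh net-dumpxml` shows whether the networks were injected in the same order.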

Changed in fuel:
status: Incomplete → New
Sergey Yudin (tsipa740) wrote :

env_name=deploy_lcp_idc_edc_single.716.2016-03-10_11-23-50

is just a placeholder; you have to go to any of your nodes and put your env names there.

Changed in fuel:
status: New → Confirmed
Alexandr Kostrikov (akostrikov-mirantis) wrote :

It is possible that libvirt does its work asynchronously and the iptables rules are applied concurrently.

The iptables rules are not deterministic: [0] and [1] are ordered differently. And order in iptables is meaningful.

I am assigning this to the framework team lead, because this should be fixed properly in the fuel-devops framework.

[0] http://paste.openstack.org/show/490229/
[1] http://paste.openstack.org/show/490228/

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Dennis Dmitriev (ddmitriev)
summary: - networks defined by fuel-devops must have production-like connectivity
+ Iptables ordering in fuel-devops is not deterministic. Networks defined
+ by fuel-devops must have production-like connectivity
Dennis Dmitriev (ddmitriev) wrote :

@Sergey, can you please clarify exactly which networks you mean?

In the bug description, you showed 'admin_fuelbr' and 'pub_subnet', but point to the management and public networks. So, does 'management' mean the Fuel admin network 'admin/PXE'? Or is it the OpenStack management network?

Anyway, this is the libvirt behaviour: networks are created in such a way that there *must* be connectivity between different networks through the host's routing.

The rules in the description with '--reject-with icmp-port-unreachable' are not for network isolation. They just filter out packets going from/to the bridge of a network that do not belong to the network's CIDR:

# Allow forward *to* libvirt network only for existing connections
-A FORWARD -d 10.109.0.0/24 -o virbr112 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# Allow forward *from* libvirt network 10.109.0.0/24
-A FORWARD -s 10.109.0.0/24 -i virbr112 -j ACCEPT

# Allow any packets only inside the libvirt network
-A FORWARD -i virbr112 -o virbr112 -j ACCEPT

# Reject any other packets.
-A FORWARD -o virbr112 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr112 -j REJECT --reject-with icmp-port-unreachable

So, if you try to access virbr112 directly *from* another libvirt network virbr113, your packets will be rejected by the rules for virbr113.

*But* if you try to access 10.109.0.0/24 addresses from another network, they *will* be accessible via the host's routing table:
10.109.0.0/24 dev virbr112 proto kernel scope link src 10.109.0.1
10.109.2.0/24 dev virbr113 proto kernel scope link src 10.109.2.1

Please confirm that this issue is connected exactly to iptables rules, and provide more details:
 - how the networks are created/started (with fuel-qa system tests or with dos.py, or with some custom scripts that use virsh)
 - how exactly you check for the presence of the issue (to let us reproduce the check)
 - version of libvirt-bin package.

Sergey Yudin (tsipa740) wrote :

Hi guys, I've double-checked and apparently I can't reproduce the issue anymore.

I've re-checked the original issue, and it seems that originally it was a problem with two NATed networks: public and admin (PXE).

ADMIN(PXE) network rules:
-A FORWARD -d 10.109.0.0/24 -o fuelbr10805 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.109.0.0/24 -i fuelbr10805 -j ACCEPT
-A FORWARD -i fuelbr10805 -o fuelbr10805 -j ACCEPT
-A FORWARD -o fuelbr10805 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr10805 -j REJECT --reject-with icmp-port-unreachable

PUB network:
-A FORWARD -d 10.109.3.0/24 -o fuelbr10808 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.109.3.0/24 -i fuelbr10808 -j ACCEPT
-A FORWARD -i fuelbr10808 -o fuelbr10808 -j ACCEPT
-A FORWARD -o fuelbr10808 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr10808 -j REJECT --reject-with icmp-port-unreachable

If the ADMIN net is injected earlier than PUB,
packets from 10.109.0.1 to 10.109.3.1 will be processed by
-A FORWARD -s 10.109.0.0/24 -i fuelbr10805 -j ACCEPT

Otherwise, when PUB is injected before ADMIN, the same packets will be processed by
-A FORWARD -o fuelbr10808 -j REJECT --reject-with icmp-port-unreachable

That leads to false positives in OSTF tests that were supposed to fail because there is no connectivity from the PXE net to the PUB net.
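The effect of the ordering can be reproduced with a small first-match simulation in Python. This is a simplified model, not iptables itself: the `Rule` class, the `verdict` function and the packet dictionary are hypothetical names, the packet is assumed to be a new (not ESTABLISHED) connection from a guest at 10.109.0.2 to a guest at 10.109.3.2, and only the five per-network FORWARD rules listed above are modelled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    target: str
    src: Optional[str] = None
    dst: Optional[str] = None
    in_if: Optional[str] = None
    out_if: Optional[str] = None
    established_only: bool = False

    def matches(self, pkt):
        if self.established_only and not pkt["established"]:
            return False
        if self.in_if and self.in_if != pkt["in_if"]:
            return False
        if self.out_if and self.out_if != pkt["out_if"]:
            return False
        if self.src and ipaddress.ip_address(pkt["src"]) not in ipaddress.ip_network(self.src):
            return False
        if self.dst and ipaddress.ip_address(pkt["dst"]) not in ipaddress.ip_network(self.dst):
            return False
        return True


def nat_network_rules(cidr, bridge):
    """The five FORWARD rules shown above for one NATed libvirt network."""
    return [
        Rule("ACCEPT", dst=cidr, out_if=bridge, established_only=True),
        Rule("ACCEPT", src=cidr, in_if=bridge),
        Rule("ACCEPT", in_if=bridge, out_if=bridge),
        Rule("REJECT", out_if=bridge),
        Rule("REJECT", in_if=bridge),
    ]


def verdict(chain, pkt, policy="ACCEPT"):
    """The first matching rule decides; the chain policy applies otherwise."""
    for rule in chain:
        if rule.matches(pkt):
            return rule.target
    return policy


ADMIN = nat_network_rules("10.109.0.0/24", "fuelbr10805")
PUB = nat_network_rules("10.109.3.0/24", "fuelbr10808")

# New connection from the admin (PXE) net to the public net.
pkt = {"src": "10.109.0.2", "dst": "10.109.3.2",
       "in_if": "fuelbr10805", "out_if": "fuelbr10808", "established": False}

print(verdict(ADMIN + PUB, pkt))  # ACCEPT
print(verdict(PUB + ADMIN, pkt))  # REJECT
```

The same packet gets a different verdict depending only on which block of rules comes first in the chain, which matches the two cases described above.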

Apparently I can't see the issue anywhere now. Let's close the bug for now; I will report back when I see an env with this problem.

Changed in fuel:
milestone: 9.0 → 10.0
Sergey Yudin (tsipa740) wrote :

Here is the unexpected behavior: I can't ping 10.0.213.3 (public) from 10.0.210.2 (admin)

(venv-fuel-devops-main-9.0-2.9.19)root@dl380-108:~# source /home/jenkins/venv-fuel-devops-main-9.0-2.9.19/bin/activate; dos.py net-list iso_800_multirack_0
NETWORK NAME    IP NET
--------------  -------------
admin           10.0.210.0/24
management      10.0.211.0/24
storage         10.0.212.0/24
public          10.0.213.0/24
private         10.0.214.0/24
private2        10.0.215.0/24
management2     10.0.216.0/24
admin2          10.0.217.0/24
public2         10.0.218.0/24
private3        10.0.219.0/24
admin3          10.0.220.0/24
management3     10.0.221.0/24
public3         10.0.222.0/24

-A FORWARD -d 10.0.214.0/24 -o fuelbr4551 -j ACCEPT
-A FORWARD -s 10.0.214.0/24 -i fuelbr4551 -j ACCEPT
-A FORWARD -i fuelbr4551 -o fuelbr4551 -j ACCEPT
-A FORWARD -o fuelbr4551 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4551 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -d 10.0.213.0/24 -o fuelbr4550 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.0.213.0/24 -i fuelbr4550 -j ACCEPT
-A FORWARD -i fuelbr4550 -o fuelbr4550 -j ACCEPT
-A FORWARD -o fuelbr4550 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4550 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -d 10.0.212.0/24 -o fuelbr4549 -j ACCEPT
-A FORWARD -s 10.0.212.0/24 -i fuelbr4549 -j ACCEPT
-A FORWARD -i fuelbr4549 -o fuelbr4549 -j ACCEPT
-A FORWARD -o fuelbr4549 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4549 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -d 10.0.211.0/24 -o fuelbr4548 -j ACCEPT
-A FORWARD -s 10.0.211.0/24 -i fuelbr4548 -j ACCEPT
-A FORWARD -i fuelbr4548 -o fuelbr4548 -j ACCEPT
-A FORWARD -o fuelbr4548 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4548 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -d 10.0.210.0/24 -o fuelbr4547 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.0.210.0/24 -i fuelbr4547 -j ACCEPT
-A FORWARD -i fuelbr4547 -o fuelbr4547 -j ACCEPT
-A FORWARD -o fuelbr4547 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4547 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o fuelbr4551 -p udp -m udp --dport 68 -j ACCEPT
-A OUTPUT -o fuelbr4550 -p udp -m udp --dport 68 -j ACCEPT
-A OUTPUT -o fuelbr4549 -p udp -m udp --dport 68 -j ACCEPT
-A OUTPUT -o fuelbr4548 -p udp -m udp --dport 68 -j ACCEPT
-A OUTPUT -o fuelbr4547 -p udp -m udp --dport 68 -j ACCEPT

------------------------------
Expected behavior: I can ping 10.0.146.3 (public) from 10.0.143.2 (admin)

(venv-fuel-devops-main-9.0-2.9.19)root@dl380-107:~# source /home/jenkins/venv-fuel-devops-main-9.0-2.9.19/bin/activate; dos.py net-list deploy_aic_contrail_large_ha_env_ssl_single.248.2016-04-29_07-51-19
NETWORK NAME    IP NET
--------------  -------------
public3         10.0.155.0/24
management3     10.0.154.0/24
admin3          10.0.153.0/24
private3        10.0.152.0/24
public2         10.0.151.0/24
admin2          10.0.150.0/24
management2     10.0.149.0/24
private2        10.0.148.0/24
private         10.0.147.0/24
public          10....


Sergey Yudin (tsipa740) wrote :

https://github.com/openstack/fuel-devops/blob/master/devops/models/environment.py#L158

This code is actually causing the problem. libvirt works asynchronously, and starting networks in parallel creates a race condition here. My suggestion is to add a sleep there, or to implement something like a 'status' method for the driver and wait for that status; for libvirt that may be iptables-save | grep bridge or something like that.
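A rough idea of what such a status check might look like, as a Python sketch. It is not existing fuel-devops code: the function name `bridge_rules_present` is hypothetical, the input is assumed to be the text printed by `iptables-save`, and a network is assumed to count as ready once both REJECT rules for its bridge are visible in the FORWARD chain.

```python
def bridge_rules_present(iptables_save_text, bridge):
    """Return True once both FORWARD REJECT rules (-i and -o) exist for the bridge."""
    seen = set()
    for line in iptables_save_text.splitlines():
        parts = line.split()
        if parts[:2] != ["-A", "FORWARD"] or "REJECT" not in parts:
            continue
        # Look at each option together with the value that follows it.
        for flag, value in zip(parts, parts[1:]):
            if flag in ("-i", "-o") and value == bridge:
                seen.add(flag)
    return seen == {"-i", "-o"}


RULES = """
-A FORWARD -s 10.0.210.0/24 -i fuelbr4547 -j ACCEPT
-A FORWARD -i fuelbr4547 -o fuelbr4547 -j ACCEPT
-A FORWARD -o fuelbr4547 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i fuelbr4547 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -o fuelbr4548 -j REJECT --reject-with icmp-port-unreachable
"""

print(bridge_rules_present(RULES, "fuelbr4547"))  # True
print(bridge_rules_present(RULES, "fuelbr4548"))  # False
```

A driver could poll a check like this after starting each network instead of sleeping for a fixed time.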

Sergey Yudin (tsipa740) wrote :

Sorry, I meant this code: https://github.com/openstack/fuel-devops/blob/2.9.20/devops/models/environment.py#L127

In master something has changed in the code (it seems like minor changes), and I'm not 100% sure the bug is still there.

Sergey Yudin (tsipa740) wrote :

The first problem is that self.get_networks() is a query to the devops db and it is not ordered, so it may return networks in random order.

The second problem is the asynchronous creation of networks by libvirt.

So my suggestion is to use self.get_networks().order_by('pk')

and to add a sleep after start.
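The two suggestions can be combined in a sketch like the one below. It is not the fuel-devops implementation: `DemoNetwork`, `wait_for` and `start_networks_in_order` are hypothetical stand-ins, `sorted(..., key=pk)` plays the role of `order_by('pk')`, and the `rules_present` callback stands for whatever readiness check the driver would provide. A bounded wait is used here in place of a fixed sleep.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_networks_in_order(networks, rules_present, timeout=10.0, interval=0.1):
    """Start networks one by one, sorted by primary key, waiting for each to be ready."""
    started = []
    for net in sorted(networks, key=lambda n: n.pk):
        net.start()
        if not wait_for(lambda: rules_present(net), timeout, interval):
            raise TimeoutError(f"firewall rules for {net.name} did not appear")
        started.append(net.name)
    return started


class DemoNetwork:
    """Stand-in for a devops network object; it only records start() calls."""

    def __init__(self, pk, name, log):
        self.pk, self.name, self._log = pk, name, log

    def start(self):
        self._log.append(self.name)


log = []
nets = [DemoNetwork(3, "public", log), DemoNetwork(1, "admin", log),
        DemoNetwork(2, "management", log)]
print(start_networks_in_order(nets, lambda n: n.name in log))
# ['admin', 'management', 'public']
```

Starting the next network only after the previous one reports ready removes the race, and sorting by primary key makes the resulting rule order the same on every env.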
