vifs not reachable when VM has multiple vifs on same tenant network

Bug #1790941 reported by Senthil Mukundakumar
Affects    Status   Importance  Assigned to      Milestone
StarlingX  Invalid  High        Patrick Bonnell

Bug Description

Brief Description
-----------------
Only the first vif is reachable when a VM has multiple vifs on the same tenant network.

Severity
--------
Major

Steps to Reproduce
------------------
1. Launch VM with management + 15 vifs.
[2018-09-02 01:11:43,528] 690 INFO MainThread vm_helper.boot_vm :: Booting VM tenant2-if_attach-tis-centos-guest-image-16...
[2018-09-02 01:11:43,529] 691 INFO MainThread vm_helper.boot_vm :: nova boot --image 71deccb7-9e64-4160-b8f4-649086623581 --key-name keypair-tenant2 --flavor df57e14f-b0ba-4819-be24-b6dae850aa20 --nic net-id=221ed54b-df98-4875-a6e3-c179f02f1c5c,vif-model=virtio --nic port-id=50ef7208-edd0-466a-8493-0e851d9729bb,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio --nic net-id=0b736d87-413a-4fb8-a6c0-58800f9ba0c4,vif-model=virtio tenant2-if_attach-tis-centos-guest-image-16 --poll
2. Verify all vnics are pingable from natbox

Expected Behavior
------------------
All interfaces should be pingable

Actual Behavior
----------------
tenant2-if-attach-tis-centos-guest-image-16:~#
[2018-09-02 01:41:58,063] 428 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-09-02 01:41:58,063] 264 DEBUG MainThread ssh.send :: Send 'ip route'
[2018-09-02 01:41:58,169] 391 DEBUG MainThread ssh.expect :: Output:
default via 192.168.185.1 dev eth0
169.254.169.254 via 172.18.1.128 dev eth15 proto static
172.18.1.0/24 dev eth1 proto kernel scope link src 172.18.1.131
172.18.1.0/24 dev eth2 proto kernel scope link src 172.18.1.133
172.18.1.0/24 dev eth3 proto kernel scope link src 172.18.1.142
172.18.1.0/24 dev eth4 proto kernel scope link src 172.18.1.143
172.18.1.0/24 dev eth5 proto kernel scope link src 172.18.1.137
172.18.1.0/24 dev eth6 proto kernel scope link src 172.18.1.139
172.18.1.0/24 dev eth7 proto kernel scope link src 172.18.1.134
172.18.1.0/24 dev eth8 proto kernel scope link src 172.18.1.144
172.18.1.0/24 dev eth9 proto kernel scope link src 172.18.1.129
172.18.1.0/24 dev eth10 proto kernel scope link src 172.18.1.146
172.18.1.0/24 dev eth11 proto kernel scope link src 172.18.1.147
172.18.1.0/24 dev eth12 proto kernel scope link src 172.18.1.140
172.18.1.0/24 dev eth13 proto kernel scope link src 172.18.1.135
172.18.1.0/24 dev eth14 proto kernel scope link src 172.18.1.149
172.18.1.0/24 dev eth15 proto kernel scope link src 172.18.1.132
192.168.185.0/27 dev eth0 proto kernel scope link src 192.168.185.10
192.168.185.32/27 dev eth0 proto static scope link
192.168.185.64/27 dev eth0 proto static scope link

E Details: Ping unsuccessful from vm (logged in via 192.168.185.3): {'172.18.1.149': 100, '172.18.1.142': 100, '172.18.1.137': 100, '172.18.1.129': 100, '172.18.1.144': 100, '172.18.1.135': 100, '172.18.1.147': 100, '172.18.1.134': 100, '172.18.1.139': 100, '172.18.1.146': 100, '172.18.1.133': 100, '172.18.1.131': 0, '172.18.1.143': 100, '172.18.1.140': 100, '172.18.1.132': 100}
Only the first vif (172.18.1.131 on eth1) is reachable when the VM has multiple vifs on the same tenant network; all other vifs show 100% packet loss.
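
To make the failure pattern easier to read, the `ip route` output and the ping results above can be cross-referenced. The following is a minimal sketch, not part of the test framework: the helper names (`src_to_dev`, `reachable_devs`) are hypothetical, and the sample data is a three-interface subset copied from the log above.

```python
import re

# Matches kernel link routes such as:
#   172.18.1.0/24 dev eth1 proto kernel scope link src 172.18.1.131
ROUTE_RE = re.compile(r"^(\S+) dev (\S+) .*\bsrc (\S+)")

def src_to_dev(ip_route_output):
    """Map each interface source address to its device name."""
    mapping = {}
    for line in ip_route_output.splitlines():
        m = ROUTE_RE.match(line.strip())
        if m:
            mapping[m.group(3)] = m.group(2)
    return mapping

def reachable_devs(ip_route_output, ping_loss):
    """Return the devices whose address answered with 0% packet loss."""
    devs = src_to_dev(ip_route_output)
    return sorted(devs[ip] for ip, loss in ping_loss.items()
                  if loss == 0 and ip in devs)

# Subset of the data from this report (hypothetical variable names).
SAMPLE_ROUTES = """\
default via 192.168.185.1 dev eth0
172.18.1.0/24 dev eth1 proto kernel scope link src 172.18.1.131
172.18.1.0/24 dev eth2 proto kernel scope link src 172.18.1.133
172.18.1.0/24 dev eth15 proto kernel scope link src 172.18.1.132
"""
SAMPLE_LOSS = {"172.18.1.131": 0, "172.18.1.133": 100, "172.18.1.132": 100}

if __name__ == "__main__":
    print(reachable_devs(SAMPLE_ROUTES, SAMPLE_LOSS))
```

With the sample data, only eth1 is reported, which is the interface whose 172.18.1.0/24 route is listed first in the routing table.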

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
StarlingX master as of 2018-08-31_20-18-00

Ghada Khalil (gkhalil)
tags: added: stx.networking
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Steven Webster (swebster-wr)
Ghada Khalil (gkhalil) wrote :

Issue seems to be reproducible in a number of automated runs. Marking for stx.2018.10

tags: added: stx.2018.10
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Steven Webster (swebster-wr) → Patrick Bonnell (pbonnell)
Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
importance: Undecided → High
Patrick Bonnell (pbonnell) wrote :

This issue isn't actually a bug; this is how Linux behaves.
For this to make more sense, please consider the following information:

rp_filter - INTEGER
 0 - No source validation.
 1 - Strict mode as defined in RFC3704 Strict Reverse Path
     Each incoming packet is tested against the FIB and if the interface
     is not the best reverse path the packet check will fail.
     By default failed packets are discarded.
 2 - Loose mode as defined in RFC3704 Loose Reverse Path
     Each incoming packet's source address is also tested against the FIB
     and if the source address is not reachable via any interface
     the packet check will fail.

 Current recommended practice in RFC3704 is to enable strict mode
 to prevent IP spoofing from DDos attacks. If using asymmetric routing
 or other complicated routing, then loose mode is recommended.

 The max value from conf/{all,interface}/rp_filter is used
 when doing source validation on the {interface}.

 Default value is 0. Note that some distributions enable it
 in startup scripts.

arp_ignore - INTEGER
 Define different modes for sending replies in response to
 received ARP requests that resolve local target IP addresses:
 0 - (default): reply for any local target IP address, configured
 on any interface
 1 - reply only if the target IP address is local address
 configured on the incoming interface
 2 - reply only if the target IP address is local address
 configured on the incoming interface and both with the
 sender's IP address are part from same subnet on this interface
 3 - do not reply for local addresses configured with scope host,
 only resolutions for global and link addresses are replied
 4-7 - reserved
 8 - do not reply for all local addresses

 The max value from conf/{all,interface}/arp_ignore is used
 when ARP request is received on the {interface}
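
Both settings quoted above use the rule "the max value from conf/{all,interface}". As a rough illustration of that rule, here is a small sketch that reads the two values from procfs and returns the larger one. The function names and the `root` parameter are hypothetical (the parameter exists only so the sketch can be pointed at a test directory); `/proc/sys/net/ipv4/conf` is the standard location of these settings on Linux.

```python
from pathlib import Path

def effective_value(all_value, iface_value):
    """Apply the documented rule: the larger of the two settings wins."""
    return max(all_value, iface_value)

def effective_conf(param, iface, root="/proc/sys/net/ipv4/conf"):
    """Read conf/all/<param> and conf/<iface>/<param>; return the max.

    `root` is a hypothetical parameter added for testability.
    """
    base = Path(root)
    all_value = int((base / "all" / param).read_text().strip())
    iface_value = int((base / iface / param).read_text().strip())
    return effective_value(all_value, iface_value)

if __name__ == "__main__":
    # Example: inspect the loopback interface on the local machine.
    for name in ("rp_filter", "arp_ignore"):
        print(name, effective_conf(name, "lo"))
```

Run inside the guest, calling `effective_conf("rp_filter", "eth1")` would show which mode actually applies to a given vNIC.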

Now consider this system:

 -----------      10.10.0.5      ----                     -----------
|       eth0|-------------------|    |                   |           |
|           | 00:00:00:00:00:05 |    |    10.10.0.12     |           |
|   VM-1    |                   | br |-------------------|   VM-2    |
|           |     10.10.0.6     |    | 00:00:00:00:00:12 |           |
|       eth1|-------------------|    |                   |           |
 -----------  00:00:00:00:00:06  ----                     -----------

In Linux, IP addresses belong to the node, not to an individual interface.
By default, each vNIC is configured with rp_filter=1: when an ARP request is
received on an interface, the kernel must be able to find a route back to the
source IP through that same interface; otherwise no ARP response is sent.
In addition, each interface is configured with arp_ignore=0, meaning that any
interface can respond to the ARP request as long as the node holds the target
IP.
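
The strict reverse-path check described above can be modeled in a few lines. This is a simplified sketch, not the kernel's implementation: it assumes both VM-1 interfaces sit on a 10.10.0.0/24 subnet (the prefix length is not stated in this report) and that, among routes with the same prefix, the one listed first is chosen as the best reverse path. All function and variable names are hypothetical.

```python
import ipaddress

def best_route_dev(routes, addr):
    """Longest-prefix match; ties go to the earliest route (assumption)."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None

def passes_strict_rp_filter(routes, src_addr, in_dev):
    """rp_filter=1 model: accept only if the best route back to the
    source leaves through the interface the packet arrived on."""
    return best_route_dev(routes, src_addr) == in_dev

# VM-1 routing table in the example system (assumed /24 subnet).
VM1_ROUTES = [("10.10.0.0/24", "eth0"), ("10.10.0.0/24", "eth1")]

if __name__ == "__main__":
    for dev in ("eth0", "eth1"):
        ok = passes_strict_rp_filter(VM1_ROUTES, "10.10.0.12", dev)
        print(dev, "accepts" if ok else "drops")
```

Under these assumptions, a request from 10.10.0.12 passes the check only on eth0, because the first matching route always points there. The same reasoning applied to the routing table in the bug description would leave eth1 as the only interface that passes.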

This is what happens when 10.10.0.12 (VM-2) pings 10.10.0.6 (VM-1):
 - An ARP request is broadcast by 10.10.0.12 asking for the MAC address
  of 10.10.0.6.
 - eth1 (10.10.0.6) sees the request and checks the routing table to determine
  if a route back to the source IP address can be found (this is a security
  mechanism put in place with rp...


Patrick Bonnell (pbonnell) wrote :

The example system did not copy and paste properly.

VM-1
  - eth0 at 00:00:00:00:00:05 with IP 10.10.0.5
  - eth1 at 00:00:00:00:00:06 with IP 10.10.0.6

VM-2
  - one interface at 00:00:00:00:00:12 with IP 10.10.0.12

VM-1 and VM-2 are connected to a vbridge

Changed in starlingx:
status: Triaged → Invalid
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10