OSTF test 'Check network connectivity from instance via floating IP' fails on Neutron VLAN and bond in balance-rr mode.

Bug #1538463 reported by Alexander Zatserklyany
This bug affects 3 people
Affects             Status   Importance  Assigned to   Milestone
Fuel for OpenStack  Invalid  Medium      Fuel QA Team
8.0.x               Invalid  Medium      Fuel QA Team

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "478"
---------------------

Steps to reproduce
------------------
1. Create cluster with Neutron VLAN network provider
2. Add 3 nodes with controller and mongo roles
3. Add a node with compute and cinder roles
4. Set up bonding for the public, management, private and storage interfaces in balance-rr mode; repeat for all nodes
5. Run network verification
6. Deploy the changes
7. Run network verification
8. Run OSTF

Expected results:
All tests pass

Actual results:
1. 'Check network connectivity from instance via floating IP' (661.4 s) VM connectivity doesn`t function properly.
2. 'Launch instance, create snapshot, launch instance from snapshot' (229.7 s) Snapshot of an instance can not be created.

Note:
Everything works as expected if the bonded interfaces are configured in active-backup mode.

Changed in fuel:
assignee: nobody → MOS Neutron (mos-neutron)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexander, could you please provide a diagnostic snapshot with logs?

Changed in fuel:
assignee: MOS Neutron (mos-neutron) → Alexander Zatserklyany (zatserklyany)
status: New → Incomplete
tags: added: area-neutron
Revision history for this message
Maksim Malchuk (mmalchuk) wrote :
Changed in fuel:
status: Incomplete → Confirmed
importance: Undecided → High
milestone: 8.0 → 9.0
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Please attach the snapshot

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Alexander Zatserklyany (zatserklyany) wrote :
Changed in fuel:
status: Incomplete → In Progress
Revision history for this message
Alexander Zatserklyany (zatserklyany) wrote :

AssertionError: Failed 3 OSTF tests; should fail 0 tests. Names of failed tests:
  - Create volume and boot instance from it (failure) Failed to get to expected status. In error state. Please refer to OpenStack logs for more details.
  - Check network connectivity from instance via floating IP (failure) VM connectivity doesn`t function properly. Please refer to OpenStack logs for more details.
  - Launch instance with file injection (failure) Execution command on Instance fails with unexpected result. Please refer to OpenStack logs for more details.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Sorry, I was confused by Max's comment and started changing the status.

Max, the fix for your problem has been merged; thanks for the quick notification.

Changed in fuel:
assignee: Alexander Zatserklyany (zatserklyany) → Tatyanka (tatyana-leontovich)
status: In Progress → Incomplete
assignee: Tatyanka (tatyana-leontovich) → Alexander Zatserklyany (zatserklyany)
Changed in fuel:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-ostf (master)

Reviewed: https://review.openstack.org/273101
Committed: https://git.openstack.org/cgit/openstack/fuel-ostf/commit/?id=730483f43b6fc285f095c11bffe0c17bb7bbcf0b
Submitter: Jenkins
Branch: master

commit 730483f43b6fc285f095c11bffe0c17bb7bbcf0b
Author: tatyana-leontovich <email address hidden>
Date: Wed Jan 27 15:20:30 2016 +0000

    Revert "Fix a call to retry_command function"

    This patch broke bvt with:
    http://paste.openstack.org/show/485157/
    This reverts commit befc27e62da5ca9e3ca4da85ab3e8f86cb0eb89a.

    Closes-Bug:#1538463
    Change-Id: I0b5699a59d1e2f621a7eb2ec2952a6358dbedccf

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
assignee: Alexander Zatserklyany (zatserklyany) → MOS Neutron (mos-neutron)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexander, I've taken a look at the logs and there are multiple errors indicating that the env is way too slow:

http://paste.openstack.org/show/485296/

which caused multiple VM boot failures.

Still, connectivity errors seem to be unrelated.

Have you tested this on VMs or on baremetal? Could you please give us access to your lab, so that we can troubleshoot the connectivity issue?

Changed in fuel:
status: Confirmed → Incomplete
assignee: MOS Neutron (mos-neutron) → Alexander Zatserklyany (zatserklyany)
Revision history for this message
Alexander Zatserklyany (zatserklyany) wrote :

Roman, I tested 'network connectivity from instance via floating IP' manually and waited long enough for ping.
I tested on KVM and did not test on baremetal.

Revision history for this message
Alexander Zatserklyany (zatserklyany) wrote :

Env reverted on cz5163.bud.mirantis.net

Changed in fuel:
status: Incomplete → Confirmed
assignee: Alexander Zatserklyany (zatserklyany) → MOS Neutron (mos-neutron)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

I tried to boot a few VMs on the env Alexander provided, and see the following:

- floating IPs work as expected: the Neutron router creates an interface, listens on the specified floating IP address, and DNATs network traffic correctly

- the problem is that VMs don't receive fixed IP leases:

Initializing random number generator... done.
Starting acpid: OK
cirros-ds 'local' up at 6.59
no results found for mode=local. up 7.11. searched: nocloud configdrive ec2
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
Usage: /sbin/cirros-dhcpc <up|down>
No lease, failing
WARN: /etc/rc3.d/S40-network failed
cirros-ds 'net' up at 188.54
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 188.82. request failed
failed 2/20: up 191.39. request failed
failed 3/20: up 193.64. request failed
failed 4/20: up 195.82. request failed

tcpdump on br-prv (on the compute node) shows there are no replies to DHCP requests:

07:52:23.587684 IP6 :: > ff02::1:ff64:489b: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe64:489b, length 24
07:52:23.588235 IP6 :: > ff02::1:ff64:489b: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe64:489b, length 24
07:52:23.588266 IP6 :: > ff02::1:ff64:489b: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe64:489b, length 24
07:52:23.588289 IP6 :: > ff02::1:ff64:489b: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe64:489b, length 24
07:52:23.675068 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:23.675724 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:23.675759 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:23.675773 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:24.584216 ARP, Request who-has 10.109.1.2 (Broadcast) tell 10.109.1.2, length 46
07:52:24.584362 ARP, Request who-has 10.109.1.2 (Broadcast) tell 10.109.1.2, length 46
07:52:24.584211 ARP, Request who-has 10.109.1.2 (Broadcast) tell 10.109.1.2, length 46
07:52:24.584469 ARP, Request who-has 10.109.1.2 (Broadcast) tell 10.109.1.2, length 46
07:52:25.087188 ARP, Reply 10.109.1.3 is-at 1e:33:1c:55:51:9f (oui Unknown), length 46
07:52:25.087205 ARP, Reply 10.109.1.3 is-at 1e:33:1c:55:51:9f (oui Unknown), length 46
07:52:25.087210 ARP, Reply 10.109.1.3 is-at 1e:33:1c:55:51:9f (oui Unknown), length 46
07:52:25.087256 ARP, Reply 10.109.1.3 is-at 1e:33:1c:55:51:9f (oui Unknown), length 46
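
As a rough way to double-check a capture like the one above, the following Python sketch counts DHCP requests and replies in tcpdump text output. It relies only on the "BOOTP/DHCP, Request" and "BOOTP/DHCP, Reply" markers that tcpdump prints; the helper name count_dhcp is made up for this illustration and is not part of any Fuel or OSTF tooling.

```python
from collections import Counter


def count_dhcp(capture_text):
    """Count DHCP requests and replies in tcpdump text output.

    Hypothetical helper: it only looks for the "BOOTP/DHCP, Request" and
    "BOOTP/DHCP, Reply" markers, so ARP requests/replies are ignored.
    """
    counts = Counter(request=0, reply=0)
    for line in capture_text.splitlines():
        if "BOOTP/DHCP, Request" in line:
            counts["request"] += 1
        elif "BOOTP/DHCP, Reply" in line:
            counts["reply"] += 1
    return counts


# Three lines copied from the br-prv capture above.
sample = """\
07:52:23.675068 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:23.675724 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:64:48:9b (oui Unknown), length 290
07:52:24.584216 ARP, Request who-has 10.109.1.2 (Broadcast) tell 10.109.1.2, length 46
"""

counts = count_dhcp(sample)
print("requests:", counts["request"], "replies:", counts["reply"])
```

A reply count of zero next to a non-zero request count matches what the capture shows: the requests leave, but nothing comes back on this bridge.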

At the same time I can see both requests and replies in dnsmasq logs:

<30>Jan 30 07:54:23 node-1 dnsmasq-dhcp[8851]: DHCPDISCOVER(tap7452d06b-85) fa:16:3e:64:48:9b
<30>Jan 30 07:54:23 node-1 dnsmasq-dhcp[8851]: DHCPOFFER(tap7452d06b-85) 192.168.111.12 fa:16:3e:64:48:9b
<30>Jan 30 07:54:23 node-1 dnsmasq-dhcp[8851]: DHCPDISCOVER(tap7452d06b-85) fa:16:3e:64:48:9b
<30>Jan 30 07:54:23 node-1 dnsmasq-dhcp[8851]: DHCPOFFER(tap7452d06b-85) 192.168.111.12 fa:16:3e:64:48:9b
<30>Jan 30 07:54:23 node-1 dnsmasq-dhcp[8851]: DHCPDISC...

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Bridges configuration:

root@node-2:~# ovs-vsctl show
f8329eee-a007-497d-a9f6-dd01bc9f265e
    Bridge br-prv
        Port "p_e52381cd-0"
            Interface "p_e52381cd-0"
                type: internal
        Port br-prv
            Interface br-prv
                type: internal
        Port phy-br-prv
            Interface phy-br-prv
                type: patch
                options: {peer=int-br-prv}
    Bridge br-int
        fail_mode: secure
        Port "qvocac24715-63"
            tag: 8
            Interface "qvocac24715-63"
        Port br-int
            Interface br-int
                type: internal
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
    ovs_version: "2.3.1"
root@node-2:~# brctl show
bridge name     bridge id           STP enabled  interfaces
br-aux          8000.5687fbc82a46   no           bond0
                                                 p_e52381cd-0
br-fw-admin     8000.6447cbcf4ba3   no           bond1
br-mgmt         8000.6406ee223358   no           bond0.101
br-storage      8000.6406ee223358   no           bond0.102
qbrcac24715-63  8000.ba67870a8ab5   no           qvbcac24715-63
                                                 tapcac24715-63

Bonding:

root@node-2:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: enp0s8
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 64:06:ee:22:33:58
Slave queue ID: 0

Slave Interface: enp0s5
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 64:b5:51:f6:db:4a
Slave queue ID: 0

Slave Interface: enp0s6
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 64:f1:ae:7d:b0:74
Slave queue ID: 0

Slave Interface: enp0s7
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 64:46:6c:7f:51:6f
Slave queue ID: 0
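
For anyone who wants to check the bond mode on several nodes without reading the file by eye, here is a minimal Python sketch that pulls the bonding mode and the slave names out of text in the /proc/net/bonding format shown above. The function name parse_bonding is hypothetical, and the sample is an abbreviated copy of the bond0 output from this comment.

```python
def parse_bonding(text):
    """Extract the bonding mode and slave names from /proc/net/bonding text.

    Hypothetical helper: it only understands the "Bonding Mode:" and
    "Slave Interface:" lines and ignores everything else.
    """
    info = {"mode": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


# Abbreviated copy of the bond0 status shown above.
sample = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: enp0s8
MII Status: up
Permanent HW addr: 64:06:ee:22:33:58

Slave Interface: enp0s5
MII Status: up
Permanent HW addr: 64:b5:51:f6:db:4a
"""

info = parse_bonding(sample)
print(info["mode"], info["slaves"])
```

On a real node the text would come from reading /proc/net/bonding/bond0 instead of the inline sample.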

Revision history for this message
Oleg Bondarev (obondarev) wrote :

I see DHCP replies on bond0 and br-aux:

root@node-2:~# tcpdump -nlei br-aux port 67 or port 68
tcpdump: WARNING: br-aux: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-aux, link-type EN10MB (Ethernet), capture size 65535 bytes
11:49:23.940680 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:f1:16:dd, length 290
11:49:23.941019 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:f1:16:dd, length 290
11:49:23.941062 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:f1:16:dd, length 290
11:49:23.941080 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:f1:16:dd, length 290
11:49:23.943101 fa:16:3e:55:88:3a > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.2.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.943102 fa:16:3e:55:88:3a > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.2.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.943368 fa:16:3e:55:88:3a > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.2.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.943369 fa:16:3e:55:88:3a > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.2.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.944348 fa:16:3e:94:3f:5e > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.3.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.944350 fa:16:3e:94:3f:5e > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.3.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.944351 fa:16:3e:94:3f:5e > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.3.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328
11:49:23.944500 fa:16:3e:94:3f:5e > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.3.67 > 192.168.111.16.68: BOOTP/DHCP, Reply, length 328

but there are no replies on p_e52381cd-0:

root@node-2:~# tcpdump -nlei p_e52381cd-0 port 67 or port 68
tcpdump: WARNING: p_e52381cd-0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on p_e52381cd-0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:49:23.940680 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Req...
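
The comparison above (replies visible on br-aux, none on p_e52381cd-0) can be repeated hop by hop with a small Python sketch. It assumes captures taken with tcpdump -e, as in this comment, and returns the source MACs that sent DHCP replies in each capture. The function name dhcp_reply_sources is made up for this illustration. The capture on p_e52381cd-0 is truncated in this thread, so the sample for that port is a stand-in containing a single request line.

```python
import re

# Matches a "tcpdump -e" line that carries a DHCP reply; group 1 is the source MAC.
REPLY_RE = re.compile(r"^\S+ ([0-9a-f:]{17}) > [0-9a-f:]{17},.*BOOTP/DHCP, Reply")


def dhcp_reply_sources(capture_text):
    """Return the set of source MACs that sent DHCP replies in a capture."""
    sources = set()
    for line in capture_text.splitlines():
        match = REPLY_RE.match(line)
        if match:
            sources.add(match.group(1))
    return sources


REQUEST = (
    "11:49:23.940680 fa:16:3e:f1:16:dd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), "
    "length 336: vlan 1000, p 0, ethertype IPv4, 0.0.0.0.68 > 255.255.255.255.67: "
    "BOOTP/DHCP, Request from fa:16:3e:f1:16:dd, length 290"
)
REPLY_1 = (
    "11:49:23.943101 fa:16:3e:55:88:3a > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), "
    "length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.2.67 > 192.168.111.16.68: "
    "BOOTP/DHCP, Reply, length 328"
)
REPLY_2 = (
    "11:49:23.944348 fa:16:3e:94:3f:5e > fa:16:3e:f1:16:dd, ethertype 802.1Q (0x8100), "
    "length 374: vlan 1000, p 0, ethertype IPv4, 192.168.111.3.67 > 192.168.111.16.68: "
    "BOOTP/DHCP, Reply, length 328"
)

captures = {
    "br-aux": "\n".join([REQUEST, REPLY_1, REPLY_2]),
    # Stand-in: the original capture on this port is truncated in the thread.
    "p_e52381cd-0": REQUEST,
}

for name, text in captures.items():
    print(name, sorted(dhcp_reply_sources(text)))
```

The first interface along the path that returns an empty set is where the replies get lost, which in this case is the port that connects br-aux to br-prv.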

Changed in fuel:
assignee: MOS Neutron (mos-neutron) → Sergey Vasilenko (xenolog)
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This will probably not make 8.0 HCF. Moving to 8.0 updates

tags: added: move-to-mu
Revision history for this message
Sergey Vasilenko (xenolog) wrote :

It looks like a round-robin (balance-rr) bond is incompatible with floating and private networks.
I will investigate this case in more depth.

tags: added: area-docs release-notes
Revision history for this message
Sergey Vasilenko (xenolog) wrote :

fuel-tests should be refactored for all cases related to bonding.
Details will be provided later.

Changed in fuel:
status: Confirmed → Triaged
assignee: Sergey Vasilenko (xenolog) → Fuel QA Team (fuel-qa)
importance: High → Medium
tags: added: area-qa
removed: area-docs area-neutron move-to-mu release-notes
tags: added: area-docs non-release
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

Should we move it to Invalid for 8.0 and reassign to fuel-qa for 9.0?

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Because of its Medium priority, the issue has been moved to 9.0.

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

Currently we can't test 'balance-rr' bonding with fuel-devops without additional tweaks, because this type of bond requires port aggregation on the switch/bridge side. Here is a quote from the docs:

"The balance-rr, balance-xor and broadcast modes generally
require that the switch have the appropriate ports grouped together.
The nomenclature for such a group differs between switches, it may be
called an "etherchannel" (as in the Cisco example, above), a "trunk
group" or some other similar variation. For these modes, each switch
will also have its own configuration options for the switch's transmit
policy to the bond. Typical choices include XOR of either the MAC or
IP addresses. The transmit policy of the two peers does not need to
match. For these three modes, the bonding mode really selects a
transmit policy for an EtherChannel group; all three will interoperate
with another EtherChannel group. "
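
To illustrate why the peer has to group the ports, here is a toy Python model of a round-robin transmit policy. It is a simplification for illustration only, not the kernel implementation: consecutive frames are simply handed to successive slaves. The slave names are taken from the bond0 output earlier in this thread, and the function name round_robin_tx is made up.

```python
from itertools import cycle


def round_robin_tx(slaves, frame_count):
    """Return the slave chosen for each of frame_count consecutive frames.

    Toy model (an assumption for illustration): frames are assigned to
    slaves strictly in turn, starting with the first slave.
    """
    chooser = cycle(slaves)
    return [next(chooser) for _ in range(frame_count)]


slaves = ["enp0s8", "enp0s5", "enp0s6", "enp0s7"]
for frame_no, slave in enumerate(round_robin_tx(slaves, 6), start=1):
    print(f"frame {frame_no} -> {slave}")
```

Since the slaves of a bond normally share one MAC address, a switch or bridge whose ports are not grouped sees the same source MAC arriving on several ports in quick succession, which is the situation the quoted documentation warns about.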

There are no automatic system tests which use 'balance-rr' mode. Closing this bug as invalid.

Changed in fuel:
status: Triaged → Invalid