Neutron packets with 1500 MTU cannot go through a VM

Bug #1888287 reported by Yvonne Ding
Affects: StarlingX | Status: Incomplete | Importance: Medium | Assigned to: Yi Wang

Bug Description

Brief Description
-----------------
Traffic loss is observed when sending 1500-byte frames over a network configured with MTU 1500.

Severity
--------
Major

Steps to Reproduce
------------------
1. Launch a pair of test VMs
2. Configure the VMs with MTU=1500
3. Configure the traffic frameSize to 1500
4. Start traffic and verify loss (a manual-reproduction sketch follows below)

TC-name:
test_jumbo_frames[virtio]
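
A rough manual equivalent of the Ixia traffic step, for anyone reproducing by hand: ping the peer VM with the don't-fragment bit set so an unfragmented 1500-byte IP packet is generated. This is only a sketch; the peer address and the use of Linux iputils ping inside the VM are assumptions, not part of the original test.

    import subprocess

    PEER_IP = "192.168.101.10"  # hypothetical address of the peer VM

    def frame_passes(frame_size: int) -> bool:
        """Send unfragmentable pings whose on-wire IP packet is frame_size bytes."""
        payload = frame_size - 28  # 20-byte IP header + 8-byte ICMP header
        result = subprocess.run(
            ["ping", "-c", "5", "-M", "do", "-s", str(payload), PEER_IP],
            capture_output=True,
        )
        return result.returncode == 0

    # With the VM NICs and the tenant network both at MTU 1500, a
    # 1500-byte packet should pass; in this bug it shows 100% loss.
    print("1500-byte frame ok:", frame_passes(1500))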

Expected Behavior
-----------------
No traffic loss when sending frames <= 1500 bytes

Actual Behavior
----------------
Traffic loss (100% observed) when sending frames <= 1500 bytes

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Regular standard 2+2

Lab-name:
wcp_7_10

Branch/Pull Time/Commit
-----------------------
BUILD_ID="r/stx.4.0"

Timestamp/Logs
--------------
[2020-07-17 05:59:27,193] 701 INFO MainThread ixia_helper.traffic_regenerate:: Regenerating all trafficItems
[2020-07-17 05:59:27,776] 712 INFO MainThread ixia_helper.traffic_apply:: Applying all traffic to hardware
[2020-07-17 05:59:30,940] 742 INFO MainThread ixia_helper.traffic_start:: Starting all traffic
[2020-07-17 05:59:48,965] 941 INFO MainThread ixia_helper.get_statistics:: matched with view ::ixNet::OBJ-/statistics/view:"Traffic Item Statistics"
[2020-07-17 05:59:49,973] 722 INFO MainThread test_qos_security.test_jumbo_frames:: Observed Loss %: 100
[2020-07-17 05:59:49,975] 761 INFO MainThread ixia_helper.traffic_stop:: Stopping all traffic

[2020-07-17 06:00:02,140] 61 DEBUG MainThread conftest.update_results:: ***Failure at test call: /home/svc-cgcsauto/wassp-repos.new/testcases/cgcs/CGCSAuto/testcases/functional/neutron/test_qos_security.py:727: AssertionError: expected no loss when sending frame <= provider_mtu
***Details: vm_type = 'virtio', check_avs_pattern = None, update_network_quotas = None
......

                if mtu < flow_mtu:
                    assert loss >= 99, "expected loss when sending frame > provider_mtu"
                else:
> assert loss <= 1, "expected no loss when sending frame <= provider_mtu"
E AssertionError: expected no loss when sending frame <= provider_mtu
E assert 100 <= 1
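
For context, here is a self-contained sketch of the check that fails. It is reconstructed from the traceback above; only mtu, flow_mtu, loss, and the two assertions appear in the original, the wrapper function is illustrative.

    def check_loss(mtu, flow_mtu, loss):
        if mtu < flow_mtu:
            # frames larger than the provider MTU are expected to be dropped
            assert loss >= 99, "expected loss when sending frame > provider_mtu"
        else:
            # frames at or below the provider MTU must get through
            assert loss <= 1, "expected no loss when sending frame <= provider_mtu"

    # The failing case: frame size 1500 <= provider MTU 1500, yet Ixia
    # reported 100% loss.
    check_loss(mtu=1500, flow_mtu=1500, loss=100)  # raises AssertionError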

The .tar logs and the automation log are available at:
https://files.starlingx.kube.cengn.ca/launchpad/1888171

Test Activity
-------------
Neutron functional testing on a system with OpenStack installed

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Ghada Khalil (gkhalil) wrote :

Assigning to the distro.openstack team for review

Changed in starlingx:
assignee: nobody → yong hu (yhu6)
yong hu (yhu6)
Changed in starlingx:
importance: Undecided → Medium
assignee: yong hu (yhu6) → Yi Wang (wangyi4)
Ghada Khalil (gkhalil) wrote :

Issue found in openstack regression testing after the rebase to Ussuri; possibly related to the rebase. Marking as stx.4.0 gating for now, pending further investigation by the distro.openstack team.

tags: added: stx.4.0
Changed in starlingx:
status: New → Triaged
Yi Wang (wangyi4) wrote :

@Yvonne Ding, I can't reproduce this issue in my environment. Can you share more information about your test? For example, network configuration, test app, etc.

Below is my configuration:

1. I deployed a multi-node (2+2) system with stx.4.0-20200717T143848Z.
2. vm network
| mtu | 1500 |
| name | public-net0 |
| port_security_enabled | True |
| project_id | 3f7e6801f7b140f49b060952e32eba83 |
| provider:network_type | vlan |
| provider:physical_network | physnet0 |
| provider:segmentation_id | 400 |

3. subnet
controller-0:~$ openstack subnet show public-subnet0
| allocation_pools | 192.168.101.2-192.168.101.254 |
| cidr | 192.168.101.0/24 |
| created_at | 2020-07-24T03:09:22Z |
| description | |
| dns_nameservers | |
| enable_dhcp | True |
| gateway_ip | 192.168.101.1 |
| host_routes | |
| id...

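For anyone trying to replicate this configuration, an equivalent network and subnet can be created with the openstacksdk roughly as below. This is a minimal sketch assuming admin credentials are available via clouds.yaml or OS_* environment variables; the names, VLAN ID, and CIDR are copied from the listing above.

    import openstack

    conn = openstack.connect()  # reads clouds.yaml / OS_* environment variables

    net = conn.network.create_network(
        name="public-net0",
        mtu=1500,
        provider_network_type="vlan",
        provider_physical_network="physnet0",
        provider_segmentation_id=400,
    )
    conn.network.create_subnet(
        name="public-subnet0",
        network_id=net.id,
        ip_version=4,
        cidr="192.168.101.0/24",
        gateway_ip="192.168.101.1",
        enable_dhcp=True,
    )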

Yvonne Ding (yding) wrote :

Hi Yi Wang,

Could you please find details in https://files.starlingx.kube.cengn.ca/launchpad/1888171?
The issue was found in automation regression.

Yi Wang (wangyi4) wrote :

@Yvonne Ding, I checked "TIS_AUTOMATION.log" and found a piece of code there about the test case. From it, I can tell:

1. One VM was created from "Tenant.get_primary()", the other from "Tenant.get_secondary()".
2. The MTU was read from the second NIC of the two VMs.
3. Some routing was configured.
4. A session was created by "vm_helper.traffic_between_vms()" for the traffic test.

Without the rest of the code, I have no idea how the network was set up or how the traffic test was performed.
Could you share the manual test steps with me?

Yvonne Ding (yding) wrote :

@Yi Wang,

The issue was reported by automation. I would assume the manual test steps are the same as the automation steps.

Yi Wang (wangyi4) wrote :

@Yvonne Ding,

I already tried different tenant network configurations, but I can't reproduce it. I don't have your automation test code; that's why I need you to share the manual test steps.

Changed in starlingx:
status: Triaged → Incomplete
Yvonne Ding (yding) wrote :

@Yi Wang,

Given you don't have automation test code, here are some of my thoughts for your reference.

1) Identify the root cause from the logs, which include the .tgz tarball and the automation log. Provide me a patch and I will run the automation test case for you.

2) If the above is not applicable, reproduce manually. To me, the manual test steps would be the same as the steps extracted from TIS_AUTOMATION.log. Running them on your side or mine makes no difference, since TIS_AUTOMATION.log is provided.

Thanks,

Yi Wang (wangyi4) wrote :

@Yvonne Ding,

1. As I already said, I can't reproduce it on my side.
2. Some test steps were performed by your automation code instead of openstack commands, so I don't know exactly what they did from the log.

Yi Wang (wangyi4) wrote :

So far, I have tried two network configurations: one with the two VMs connected to the same network, the other with the two VMs connected to different networks (which is the case in this automation test). I didn't reproduce the issue with either.

Yi Wang (wangyi4) wrote :

https://bugs.launchpad.net/starlingx/+bug/1888300

LP 1888300 has the same log as this one. I think the two bugs were found in the same round of testing. As Yan commented in LP 1888300 #4, "And see from the whole log, the traffic statistics never passed, it seems that the connection is always failing."

1. Can you help check whether the test config is right?
2. If frameSize is less than 1500, did you see packet loss? (See the sketch below.)
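
To answer question 2 quickly, a frame-size sweep between the two VMs would show where loss starts. This reuses the hypothetical ping approach from the reproduction sketch above; the peer address is again an assumption.

    import subprocess

    PEER_IP = "192.168.101.10"  # hypothetical peer VM address

    for frame in (1000, 1200, 1400, 1472, 1500):
        payload = frame - 28  # subtract IP (20) + ICMP (8) header bytes
        ok = subprocess.run(
            ["ping", "-c", "3", "-M", "do", "-s", str(payload), PEER_IP],
            capture_output=True,
        ).returncode == 0
        print(f"frame {frame}: {'pass' if ok else 'LOSS'}")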
