tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames is failing

Bug #1714660 reported by Arx Cruz
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sagi (Sergey) Shnaidman

Bug Description

The tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames test started to fail in the master periodic job:

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha-tempest-oooq-master/842e791/logs/tempest.html.gz

Arx Cruz (arxcruz)
Changed in tripleo:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Arx Cruz (arxcruz)
Changed in tripleo:
milestone: none → pike-rc2
Changed in tripleo:
milestone: pike-rc2 → queens-1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/500271
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=35c0b6c5274fcf3ee9f2188c8ed0d3a755892f5e
Submitter: Jenkins
Branch: master

commit 35c0b6c5274fcf3ee9f2188c8ed0d3a755892f5e
Author: Arx Cruz <email address hidden>
Date: Sat Sep 2 13:20:00 2017 +0200

    Adding test_mtu_sized_frames to skip list

    Bug #1714660 adding to skip list while it's being investigated

    Related-Bug: #1714660
    Change-Id: I9794027f05d42ddc004af64ac9da50a3568fca16

Arx Cruz (arxcruz)
Changed in tripleo:
assignee: Arx Cruz (arxcruz) → nobody
tags: added: promotion-blocker
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Several observations:

0. It's a legacy L3 setup with a single controller carrying all floating IPs. Open vSwitch is used for L2 connectivity.

1. The test case configures a server, assigns a floating IP, waits until it becomes ACTIVE, and then connects to the instance over SSH using the floating IP. So the data plane involves both Open vSwitch agents (compute and controller) as well as the L3 agent on the controller node.

2. Service logs don't reveal any relevant errors around the time when the floating IP is configured and transitions to ACTIVE (I checked the OVS agents and the L3 agent). Sadly, the logs don't have debug enabled, so we can't see a lot of vital information, such as whether the L3 agent executed arping at the right moment, or whether it configured the floating IP address on the qg (gateway) device.

3. When the test case fails, it usually logs the console output of the failed instance via the _log_console_output method. Sadly, tempest in the job is not configured to use this feature, so all we get is "Console output not supported, cannot log". We should probably enable the feature by setting the CONF.compute_feature_enabled.console_output option in tempest.conf. This would help reveal any issues with the instance boot.

4. The intent of the test case is to validate that carefully crafted frames whose size matches the MTU of the network can pass through the L3 layer without being fragmented. I think we could enhance the test to first try simple connectivity (where fragmentation is allowed) to the instance under test before going on to the more advanced MTU-limited check. If the former check passes, we would at least know that the instance is not dead (for example, because of a kernel crash) and that the issue is in the MTU-specific check. This wouldn't solve the issue but would give us some clue about what is happening there.

To recap, I don't think we have enough logs to reason meaningfully about the failure. To get to a happier place, we would need to 1) enable debug logs for neutron services; 2) enable console_output in tempest; 3) extend the test to first sanity-check simple connectivity. Note that none of this will fix anything by itself, but it will at least give us a better chance to spot the root cause.
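
The ordering suggested in step 3 of the recap (plain connectivity first, MTU-sized check second) could be sketched roughly as follows. This is not tempest code: build_ping_command and connectivity_checks are hypothetical helpers, the address is a documentation placeholder, and the sketch assumes Linux iputils ping (where -M do forbids fragmentation and -s sets the ICMP payload size) and IPv4 with a 20-byte IP header and an 8-byte ICMP header.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def build_ping_command(address, mtu=None, count=1):
    """Return an iputils ping command as an argument list (hypothetical helper).

    Without mtu: a plain ping with the default payload size.
    With mtu: a ping whose IP packet is exactly mtu bytes long, with
    fragmentation prohibited (-M do).
    """
    cmd = ["ping", "-c", str(count)]
    if mtu is not None:
        payload = mtu - IPV4_HEADER - ICMP_HEADER
        if payload < 0:
            raise ValueError("mtu too small for IPv4 + ICMP headers")
        cmd += ["-M", "do", "-s", str(payload)]
    cmd.append(address)
    return cmd


def connectivity_checks(address, mtu):
    """Order the checks: simple connectivity first, then the MTU-sized one."""
    return [
        ("simple", build_ping_command(address)),
        ("mtu-sized", build_ping_command(address, mtu=mtu)),
    ]


# 192.0.2.10 is a placeholder address, not taken from the job logs.
for name, command in connectivity_checks("192.0.2.10", mtu=1500):
    print(name, " ".join(command))
```

If the "simple" command succeeds while the "mtu-sized" one fails, the instance is alive and the problem is specific to the MTU-sized frames.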

Emilien Macchi (emilienm) wrote :
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Emilien, I am asking for debug logs for services, not tempest.

chandan kumar (chkumar246) wrote :

Patch to enable console_output for tests https://review.openstack.org/#/c/508071
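
For reference, the option discussed earlier in this thread lives in the compute-feature-enabled section of tempest.conf. The sketch below only renders that fragment with Python's configparser; it is not the content of the linked patch, and render_console_output_setting is a hypothetical helper name.

```python
import configparser
import io


def render_console_output_setting(enabled=True):
    """Render the tempest.conf fragment that toggles console output logging."""
    config = configparser.ConfigParser()
    config["compute-feature-enabled"] = {"console_output": str(enabled)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_console_output_setting())
```

With the option enabled, a failing test should be able to log the instance console instead of "Console output not supported, cannot log".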

tags: added: networking
Ihar Hrachyshka (ihar-hrachyshka) wrote :

The failure happens because commit 0c40d0e in openstack/tripleo-quickstart-extras sets the infrastructure MTU to 1350 (1500 seems to cause problems in our infra), and we never tell neutron about it, so it happily creates networks with MTUs of up to 1500, depending on their type.

To fix it, we should set global_physnet_mtu in neutron.conf on the overcloud to 1350. This can be achieved with the NeutronGlobalPhysnetMtu TripleO parameter.
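
The mismatch can be illustrated with some arithmetic. This is a sketch under stated assumptions: a VXLAN tenant network over IPv4 with 50 bytes of encapsulation overhead (the thread does not say which network type the job uses) and a neutron default global_physnet_mtu of 1500. tenant_network_mtu is a hypothetical helper, not a neutron function.

```python
VXLAN_OVERHEAD_IPV4 = 50  # bytes; assumption: VXLAN over IPv4
INFRA_MTU = 1350          # what the infrastructure carries, per this bug


def tenant_network_mtu(global_physnet_mtu, overhead=VXLAN_OVERHEAD_IPV4):
    """MTU a tunnelled tenant network can offer on top of the physical network."""
    return global_physnet_mtu - overhead


for physnet_mtu in (1500, 1350):
    advertised = tenant_network_mtu(physnet_mtu)
    # Size of a full-sized tenant packet once the tunnel overhead is added back
    encapsulated = advertised + VXLAN_OVERHEAD_IPV4
    fits = encapsulated <= INFRA_MTU
    print(f"global_physnet_mtu={physnet_mtu}: network MTU {advertised}, "
          f"encapsulated size {encapsulated}, fits in {INFRA_MTU}: {fits}")
```

Under these assumptions, a network created while neutron believes the physical MTU is 1500 advertises 1450 to instances, so MTU-sized frames exceed what the 1350-byte infrastructure can carry; with global_physnet_mtu set to 1350 the advertised MTU drops to 1300 and the encapsulated packets fit.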

wes hayutin (weshayutin) wrote :

<ihrachys> still testing it
<weshay> rlandy, https://github.com/openstack/instack-undercloud/blob/master/instack_undercloud/undercloud.py#L208-L210
<weshay> rlandy, would just need to pass a variable to define that for ovb
<weshay> and for the overcloud
* weshay looks
<weshay> param: NeutronGlobalPhysnetMtu ?
<ihrachys> weshay, yes

Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/509660

Changed in tripleo:
status: Triaged → In Progress
yatin (yatinkarel) wrote :

For information, here is the issue I faced on an OVB setup on rdocloud; the fix is the same as the one proposed here: https://bugs.launchpad.net/tripleo/+bug/1718655 (review: https://review.openstack.org/#/c/506152/)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/509660
Reason: https://review.openstack.org/#/c/506152

Changed in tripleo:
assignee: wes hayutin (weshayutin) → Ihar Hrachyshka (ihar-hrachyshka)
Changed in tripleo:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Sagi (Sergey) Shnaidman (sshnaidm)
wes hayutin (weshayutin) wrote :

https://review.openstack.org/#/c/506152/ closing, please reopen if I'm incorrect

Changed in tripleo:
status: In Progress → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/507642
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=3f431e3fe3c3199ec9153c6007a09c0ad242f038
Submitter: Zuul
Branch: master

commit 3f431e3fe3c3199ec9153c6007a09c0ad242f038
Author: Ihar Hrachyshka <email address hidden>
Date: Tue Sep 26 18:24:09 2017 +0000

    Revert "Adding test_mtu_sized_frames to skip list"

    This reverts commit 35c0b6c5274fcf3ee9f2188c8ed0d3a755892f5e.

    Closes-Bug: #1714660
    Change-Id: I623eb70babb22b1b5c42d845ffdcdb7848809041

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/509660
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=dbefdc2995d68e8480802eb42c93a62eba14c65a
Submitter: Zuul
Branch: master

commit dbefdc2995d68e8480802eb42c93a62eba14c65a
Author: Wes Hayutin <email address hidden>
Date: Wed Oct 4 18:10:03 2017 -0400

    Add config to set neutron mtu in undercloud/overcloud

    In some environments we are setting the mtu at the
    operating system level without setting the mtu in neutron.
    This can lead to errors and failed tempest tests.

    Closes-Bug: #1714660
    Depends-On: I16f8f49b8e4ea35407ab2f2794d5dce8f2c03019
    Change-Id: Iedd4cfbb0e1c9471cb1ae53b8b6acc266273463f

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
