Scenario tests from neutron_tempest_plugin.scenario.test_port_forwardings.PortForwardingTestJSON failing due to ssh failure

Bug #1896735 reported by Slawek Kaplonski
This bug affects 2 people
Affects   Status         Importance   Assigned to        Milestone
neutron   Fix Released   Critical     Slawek Kaplonski

Bug Description

This happens mostly when the scenario job runs on Ubuntu Focal, and there seems to be a race between spawning the guest VM and checking its hostname during the tests.

Example of the error:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_port_forwardings.py", line 136, in test_port_forwarding_editing_and_deleting_tcp_rule
    self.check_servers_hostnames(server)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/base.py", line 473, in check_servers_hostnames
    ssh_client.exec_command('hostname'))
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 329, in wrapped_f
    return self.call(f, *args, **kw)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 409, in call
    do = self.iter(retry_state=retry_state)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 356, in iter
    return fut.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 412, in call
    result = fn(*args, **kwargs)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/ssh.py", line 178, in exec_command
    return super(Client, self).exec_command(cmd=cmd, encoding=encoding)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 158, in exec_command
    ssh = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 126, in _get_ssh_connection
    raise exceptions.SSHTimeout(host=self.host,
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.12 via SSH timed out.
User: cirros, Password: None

Tags: gate-failure
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-tempest-plugin (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753552

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Bernard Cafarelli (bcafarel)
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

After some more investigation, it seems that there is a problem with how the L3 agent configures iptables rules, and that is causing this issue:

Sep 24 05:11:31.013868 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-ab6a74e7-3d3f-4442-ba8d-910bf49347c1', 'ip6tables-save'] {{(pid=81761) execute_rootwrap_daemon /opt/stack/neutron/neutron/agent/linux/utils.py:103}}
Sep 24 05:11:31.029396 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: DEBUG neutron.agent.linux.iptables_manager [-] IPTablesManager.apply completed with success. 6 iptables commands were issued {{(pid=81761) _apply_synchronized /opt/stack/neutron/neutron/agent/linux/iptables_manager.py:625}}
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: ERROR neutron.agent.linux.iptables_manager [-] IPTables Rules did not converge. Diff: # Generated by iptables_manager
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: *nat
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: :neutron-l3-agent-fip-pf - [0:0]
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: -I neutron-l3-agent-PREROUTING 1 -j neutron-l3-agent-fip-pf
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: COMMIT
Sep 24 05:11:31.029652 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: # Completed by iptables_manager
Sep 24 05:11:31.030209 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: DEBUG oslo_concurrency.lockutils [-] Releasing lock "iptables-qrouter-ab6a74e7-3d3f-4442-ba8d-910bf49347c1" {{(pid=81761) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:282}}
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: ERROR neutron.agent.l3.router_info [-] IPTables Rules did not converge. Diff: # Generated by iptables_manager
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: *nat
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: :neutron-l3-agent-fip-pf - [0:0]
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: -I neutron-l3-agent-PREROUTING 1 -j neutron-l3-agent-fip-pf
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: COMMIT
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: # Completed by iptables_manager: neutron_lib.exceptions.l3.IpTablesApplyException: IPTables Rules did not converge. Diff: # Generated by iptables_manager
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: *nat
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: :neutron-l3-agent-fip-pf - [0:0]
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: -I neutron-l3-agent-PREROUTING 1 -j neutron-l3-agent-fip-pf
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: COMMIT
Sep 24 05:11:31.037953 ubuntu-focal-rax-dfw-0020004694 neutron-l3-agent[81761]: # Completed by iptables_manager
Sep 24 05:11:31.0...

Changed in neutron:
importance: High → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-tempest-plugin (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/753552
Reason: This is not gonna fix the problem

Changed in neutron:
assignee: Bernard Cafarelli (bcafarel) → Slawek Kaplonski (slaweq)
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

On openvswitch-based jobs I also see errors like:

neutron_lib.exceptions.ProcessExecutionError: Exit code: 255; Cmd: ['ip', 'netns', 'exec', 'qrouter-52946583-4366-4c54-ac83-449968b0c989', 'conntrack', '-D', '-d', '172.24.5.205', '-p', 'udp', '--dport', 1036]; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-52946583-4366-4c54-ac83-449968b0c989": No such file or directory
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib Traceback (most recent call last):
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 378, in delete_socket_conntrack_state
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib ip_wrapper.netns.execute(cmd, check_exit_code=True,
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 721, in execute
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib return utils.execute(cmd, check_exit_code=check_exit_code,
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib File "/opt/stack/neutron/neutron/agent/linux/utils.py", line 148, in execute
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib raise exceptions.ProcessExecutionError(msg,
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib neutron_lib.exceptions.ProcessExecutionError: Exit code: 255; Cmd: ['ip', 'netns', 'exec', 'qrouter-52946583-4366-4c54-ac83-449968b0c989', 'conntrack', '-D', '-d', '172.24.5.205', '-p', 'udp', '--dport', 1036]; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-52946583-4366-4c54-ac83-449968b0c989": No such file or directory
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: ERROR neutron.agent.linux.ip_lib
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: DEBUG oslo_concurrency.lockutils [None req-03a816ae-d43e-4b16-adf3-5d8007db4152 tempest-PortForwardingTestJSON-811925916 tempest-PortForwardingTestJSON-811925916] Acquired lock "iptables-qrouter-52946583-4366-4c54-ac83-449968b0c989" {{(pid=82060) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:266}}
Sep 23 22:33:23.187565 ubuntu-focal-ovh-gra1-0020000672 neutron-l3-agent[82060]: DEBUG oslo_concurrency.lockutils [None req-03a816ae-d43e-4b16-adf3-5d8007db4152 tempest-PortForwardingTestJSON-811925916 tempest-PortForwardingTestJSON-811925916] Acquired external semaphore "iptables-qrouter-52946583-4366-4c54-ac83-449968b0c989" {{(pid=82060) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:272}}
Sep 23 22:33:23.187565 ubuntu-...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/754114

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.opendev.org/748367
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=de8e503274a223f6fb3a79d61e49d8ee47362302
Submitter: Zuul
Branch: master

commit de8e503274a223f6fb3a79d61e49d8ee47362302
Author: Slawek Kaplonski <email address hidden>
Date: Thu Aug 27 09:12:43 2020 +0200

    Migrate CI jobs to Ubuntu Focal

    Jobs for the master branch are moved to run on Ubuntu Focal.
    All jobs for Stein, Train and Ussuri will still run on Ubuntu
    Bionic.

    We also need to switch to the legacy ebtables implementation in the
    linuxbridge job because the ebtables-nft implementation doesn't
    support the syntax for source and destination IPv4 addresses in arp
    tables. Please check bug [1] for more details.

    Additionally, scenario tests for port forwarding are now marked as
    unstable as there is a problem with port forwarding on Ubuntu Focal.
    See [2] for details.

    The test test_floating_ip_update is also now marked as unstable as it
    fails quite often on Ubuntu Focal. See [3] for details.

    This patch also changes the ovn hash installed on the nodes
    in the ovn scenario job for the Ussuri release, as this job still runs
    on Ubuntu Bionic and the hash needs to be bumped there.

    This patch additionally switches neutron-tempest-plugin-bgpvpn-bagpipe
    jobs for master and ussuri to be non-voting due to bug [4].

    This patch also switches neutron-tempest-plugin-designate-scenario
    to be non-voting due to bug [5].

    [1] https://bugs.launchpad.net/neutron/+bug/1889779
    [2] https://bugs.launchpad.net/neutron/+bug/1896735
    [3] https://bugs.launchpad.net/neutron/+bug/1897326
    [4] https://bugs.launchpad.net/networking-bagpipe/+bug/1897408
    [5] https://bugs.launchpad.net/neutron/+bug/1891309

    Related-Bug: #1896735

    Change-Id: I9252b6a8786c43524ba0ebaa59b480ef8e489ff1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-tempest-plugin (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756114

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/756107
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2325ad1950412d3eef0cde848c56bb94fbbb7495
Submitter: Zuul
Branch: master

commit 2325ad1950412d3eef0cde848c56bb94fbbb7495
Author: Slawek Kaplonski <email address hidden>
Date: Mon Oct 5 17:07:43 2020 +0200

    Add locks for methods which sets nat rules in router

    The Router_info class and the port_forwarding L3 extension use the same
    instance of the iptables manager class, so it could happen that the
    method which sets address scope rules and the method which sets
    port forwarding nat rules ran at almost the same time, and
    one of them added rules which were not expected to be added.
    Because of that, port forwarding rules were not configured properly.

    This patch fixes that by adding a lock for the methods which change
    rules in the iptables_manager's nat table in both router_info and
    the port_forwarding extension.

    Change-Id: Ic1d5f893a81b7b841745da82f38b7583e47e468d
    Closes-Bug: #1896735
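
The race described in this commit message can be illustrated with a minimal, self-contained sketch. This is not the neutron code: FakeNatTable, NAT_LOCK, the rule strings and both functions are hypothetical stand-ins, and the real patch may well use a different locking primitive. The sketch only shows the general idea that two callers sharing one rule manager must hold the same lock across the whole "add rules, then apply" sequence, so that neither applies the other's half-built rule set.

```python
import threading


class FakeNatTable:
    """Hypothetical stand-in for a shared iptables manager nat table."""

    def __init__(self):
        self.pending = []   # rules queued but not yet applied
        self.applied = []   # one list of rules per apply() call

    def add_rule(self, rule):
        self.pending.append(rule)

    def apply(self):
        # Apply everything that is pending as one batch, then clear it.
        batch, self.pending = self.pending, []
        self.applied.append(batch)
        return batch


# Assumption: a single lock shared by every method that edits the nat table.
NAT_LOCK = threading.Lock()


def set_address_scope_rules(table):
    # Hold the lock from the first add_rule() until apply() has finished.
    with NAT_LOCK:
        table.add_rule("scope-rule-1")
        table.add_rule("scope-rule-2")
        return table.apply()


def set_port_forwarding_rules(table):
    # Same lock, so this batch can never interleave with the one above.
    with NAT_LOCK:
        table.add_rule("pf-chain")
        table.add_rule("pf-jump")
        return table.apply()
```

Without the shared lock, one thread could call apply() while the other had queued only part of its rules, so a batch would contain rules its caller did not expect, which is the kind of mismatch the "IPTables Rules did not converge" errors above point to.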

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.opendev.org/756114
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=8079b53afcdd48ba266c11773b23b047d14aa069
Submitter: Zuul
Branch: master

commit 8079b53afcdd48ba266c11773b23b047d14aa069
Author: Slawek Kaplonski <email address hidden>
Date: Mon Oct 5 17:16:16 2020 +0200

    Unmark port_forwarding tests as unstable

    Those tests were marked as unstable after the migration of CI to
    Ubuntu Focal due to the related bug.
    Now this bug should be fixed by the depends-on patch, so let's mark
    those tests as stable again.

    Depends-On: https://review.opendev.org/756107

    Change-Id: I35aebbc67d75ef609c4a8015deb8126be230bf2b
    Related-Bug: #1896735

tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.3.0

This issue was fixed in the openstack/neutron 16.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.1.0

This issue was fixed in the openstack/neutron 17.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.3.2

This issue was fixed in the openstack/neutron 15.3.2 release.

tags: removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.0.0.0rc1

This issue was fixed in the openstack/neutron 18.0.0.0rc1 release candidate.
