Comment 8 for bug 1735154

Miguel Angel Ajo (mangelajo) wrote : Re: [Bug 1735154] Re: Tempest jobs failing with SSH timeout sometimes

You nailed it down dude!!! :)

On Mon, Sep 17, 2018 at 6:31 PM Daniel Alvarez <email address hidden> wrote:

> After analyzing other logs, I've confirmed that at least the gateway port
> is not present in the router:
>
>
> - The switch has 3 ports (the metadata port, the VM's logical port, and the
> router port):
>
> switch a8168e8d-49e5-4032-849b-2a7107e93610 (neutron-6cbe7647-d46f-4ad6-a193-fc59fdc352f3) (aka tempest-network-smoke--988884713)
>     port c9998671-bfa3-44f3-8837-2977f257481b
>         type: localport
>         addresses: ["fa:16:3e:95:24:2b 10.1.0.2"]
>     port a02af8d6-890f-4a44-ac94-32d2df9700e6
>         type: router
>         router-port: lrp-a02af8d6-890f-4a44-ac94-32d2df9700e6
>     port 3dcf857a-5c6a-4f36-a509-c55ec4d52b87
>         addresses: ["fa:16:3e:a5:25:2d 10.1.0.3"]
> switch 9f07b6f6-0f09-459d-b614-9c0ba25d47d9 (neutron-46e7378b-0b97-455e-977d-3daead3bf546) (aka tempest-test-network--878432264)
>
>
> - The router is missing the external network port:
>
>
> router 9612b8f5-b2d6-4cc0-a671-cabe59cf177b (neutron-cde28807-1542-4329-a314-20406e4c3e92) (aka tempest-TestNetworkBasicOps-router-580083611)
>     port lrp-a02af8d6-890f-4a44-ac94-32d2df9700e6
>         mac: "fa:16:3e:a0:b8:e1"
>         networks: ["10.1.0.1/28"]
>     nat be099503-5bbd-48e4-9fac-fd68f2db1663
>         external ip: "172.24.5.30"
>         logical ip: "10.1.0.0/28"
>         type: "snat"
>     nat d3e69a5e-bbfc-42f1-8d82-c9f8f5b391a5
>         external ip: "172.24.5.25"
>         logical ip: "10.1.0.3"
>         type: "dnat_and_snat"
>
>
> Need to investigate why. Will update this soon.
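The check done by eye above (a router that has NAT entries but no port on the external network) can be sketched in Python. This is a minimal, heuristic sketch assuming `ovn-nbctl show`-style text with the `>` quote markers stripped; `parse_routers` and `routers_missing_gateway` are hypothetical helpers, not part of networking-ovn or OVN, and the rule "every NAT external IP should fall inside some router port network" is an assumption about what a healthy gateway router looks like.

```python
import ipaddress
import re

QUOTED = re.compile(r'"([^"]+)"')


def parse_routers(show_output):
    """Collect port networks and NAT external IPs per router from
    `ovn-nbctl show`-style text (quote markers already stripped)."""
    routers = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("switch "):
            current = None  # switch blocks are not inspected here
        elif line.startswith("router "):
            current = routers.setdefault(
                line.split()[1], {"networks": [], "nat_external_ips": []})
        elif current is None:
            continue
        elif line.startswith("networks:"):
            current["networks"].extend(QUOTED.findall(line))
        elif line.startswith("external ip:"):
            current["nat_external_ips"].extend(QUOTED.findall(line))
    return routers


def routers_missing_gateway(routers):
    """Heuristic: flag routers with a NAT external IP that no router port
    network covers, i.e. the gateway port seems to be absent."""
    missing = []
    for uuid, info in routers.items():
        nets = [ipaddress.ip_interface(n).network for n in info["networks"]]
        if any(not any(ipaddress.ip_address(ip) in net for net in nets)
               for ip in info["nat_external_ips"]):
            missing.append(uuid)
    return missing


# Router block quoted above, re-indented as ovn-nbctl prints it.
SAMPLE = """\
router 9612b8f5-b2d6-4cc0-a671-cabe59cf177b (neutron-cde28807-1542-4329-a314-20406e4c3e92) (aka tempest-TestNetworkBasicOps-router-580083611)
    port lrp-a02af8d6-890f-4a44-ac94-32d2df9700e6
        mac: "fa:16:3e:a0:b8:e1"
        networks: ["10.1.0.1/28"]
    nat be099503-5bbd-48e4-9fac-fd68f2db1663
        external ip: "172.24.5.30"
        logical ip: "10.1.0.0/28"
        type: "snat"
    nat d3e69a5e-bbfc-42f1-8d82-c9f8f5b391a5
        external ip: "172.24.5.25"
        logical ip: "10.1.0.3"
        type: "dnat_and_snat"
"""

print(routers_missing_gateway(parse_routers(SAMPLE)))
# ['9612b8f5-b2d6-4cc0-a671-cabe59cf177b']
```

Both NAT external IPs (172.24.5.30 and 172.24.5.25) sit outside the only router port network (10.1.0.0/28), which is exactly the symptom of the missing gateway port.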
>
> --
> https://bugs.launchpad.net/bugs/1735154
>
> Title:
> Tempest jobs failing with SSH timeout sometimes
>
> Status in networking-ovn:
> New
>
> Bug description:
> We've lately been observing that some tempest tests time out when trying
> to SSH into the FIP of an instance. After debugging the failures, we can
> tell that the cause is that ARP replies for the FIP aren't coming through.
> This doesn't always happen, and I can't reproduce it myself in devstack.
>
>
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh [-] Failed to establish authenticated ssh connection to cirros@172.24.5.10 after 16 attempts: NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.24.5.10
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh Traceback (most recent call last):
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh   File "tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh     sock=proxy_chan)
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh   File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/paramiko/client.py", line 357, in connect
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh     raise NoValidConnectionsError(errors)
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.24.5.10
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh
>
>
> Instance IP is 10.1.0.3.
> NAT entry [0]:
>
> OVSDB JSON 802 48c3f9ccd173a7722c22dea80c779ffc567fc162
> {"_date":1511790428367,"Logical_Router":{"f89bbcb7-063b-42d7-b943-5825ee8a839e":{"nat":["set",[["uuid","5ab4b7f5-da9a-4aeb-aaa8-4f471633d724"],["uuid","b2681871-58a3-485f-838a-672d6e92e55e"]]]}},"NAT":{"b2681871-58a3-485f-838a-672d6e92e55e":{"external_ip":"172.24.5.10","logical_ip":"10.1.0.3","type":"dnat_and_snat"}},"Logical_Switch":{"e55f3c0e-f59e-4d25-b566-8c4c63f8a6b6":{"ports":["set",[["uuid","0135db16-8a8c-4dc4-990d-3e5921cf79ee"],["uuid","31aa3b52-821b-415c-9770-8dad54c13d12"],["uuid","4557f804-96b7-4035-933e-90d8c79f2332"],["uuid","d18bd12c-ab2d-4002-856c-9fb8cc111f26"],["uuid","d78d9e54-14fe-46df-8341-031d68816cef"],["uuid","dbd6621b-60ff-4466-b995-213a03edf175"],["uuid","e89ec11b-7b7e-490b-b8ab-69e85597d536"]]]}},"Logical_Switch_Port":{"2ec1ede6-ec94-48d6-8680-895ba3d7b117":null}}
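To pull the FIP mapping out of an OVSDB log record like the one above, a small Python sketch can decode the JSON body. It assumes the record text that follows the `OVSDB JSON <length> <sha1>` header line; `rows`, `deleted_rows` and `fips_for` are hypothetical helper names, and treating a `null` row as a deletion is an assumption about the OVSDB on-disk log format rather than something stated in this bug.

```python
import json

# NAT-related part of the record quoted above (other tables trimmed).
RECORD = ('{"_date":1511790428367,'
          '"NAT":{"b2681871-58a3-485f-838a-672d6e92e55e":{"external_ip":"172.24.5.10",'
          '"logical_ip":"10.1.0.3","type":"dnat_and_snat"}},'
          '"Logical_Switch_Port":{"2ec1ede6-ec94-48d6-8680-895ba3d7b117":null}}')


def rows(record_json, table):
    """Return {uuid: row} for one table in an OVSDB log record.

    A null row is kept as None; assumed to mark a row deleted by this
    transaction.
    """
    return json.loads(record_json).get(table, {})


def deleted_rows(record_json, table):
    """UUIDs of rows in `table` that this record deletes (null values)."""
    return [uuid for uuid, row in rows(record_json, table).items() if row is None]


def fips_for(record_json, logical_ip):
    """External IPs of dnat_and_snat NAT rows pointing at `logical_ip`."""
    return [row["external_ip"]
            for row in rows(record_json, "NAT").values()
            if row is not None
            and row.get("type") == "dnat_and_snat"
            and row.get("logical_ip") == logical_ip]


print(fips_for(RECORD, "10.1.0.3"))                 # ['172.24.5.10']
print(deleted_rows(RECORD, "Logical_Switch_Port"))  # ['2ec1ede6-ec94-48d6-8680-895ba3d7b117']
```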
>
>
> Searching for "Request who-has 172.24.5.10" in [1] shows no replies
> coming back.
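Checking a tcpdump capture such as [1] for unanswered ARP requests can be automated along these lines. A minimal sketch, assuming tcpdump's one-line ARP output (`Request who-has X tell Y` / `Reply X is-at MAC`); `unanswered_arp` is a hypothetical helper, and the sample lines are made up for illustration, not copied from the job log.

```python
import re
from collections import Counter

REQUEST_RE = re.compile(r"Request who-has (\S+)")
REPLY_RE = re.compile(r"Reply (\S+) is-at")


def unanswered_arp(lines):
    """Return {ip: request_count} for IPs that were requested but never answered."""
    requests, replied = Counter(), set()
    for line in lines:
        req = REQUEST_RE.search(line)
        if req:
            requests[req.group(1)] += 1
            continue
        rep = REPLY_RE.search(line)
        if rep:
            replied.add(rep.group(1))
    return {ip: n for ip, n in requests.items() if ip not in replied}


# Hypothetical lines in tcpdump's ARP format (not taken from the job log).
SAMPLE_LINES = [
    "ARP, Request who-has 172.24.5.10 tell 172.24.5.1, length 28",
    "ARP, Request who-has 172.24.5.10 tell 172.24.5.1, length 28",
    "ARP, Request who-has 172.24.5.25 tell 172.24.5.1, length 28",
    "ARP, Reply 172.24.5.25 is-at fa:16:3e:00:00:01, length 28",
]
print(unanswered_arp(SAMPLE_LINES))  # {'172.24.5.10': 2}

# Against a downloaded copy of the log, e.g.:
# with open("screen-br-ex-tcpdump.txt") as f:
#     print(unanswered_arp(f))
```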
>
> The logical flows look good to me (at first glance), so it could be an
> ovn-controller/ovs-vswitchd bug. We'll try to include periodic dumps of
> the OpenFlow flows and see what they look like when this failure occurs.
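One way to collect periodic OpenFlow dumps and spot flows that appear or vanish between two of them is sketched below. It assumes `ovs-ofctl` is on the PATH and that `br-int` is the integration bridge; the helper names, the 30-second interval, and the set of counter fields stripped before comparing are illustrative assumptions, not something the CI job already does.

```python
import re
import subprocess
import time
from datetime import datetime, timezone

# Per-flow counters that change between dumps even when the flow itself does not.
VOLATILE = re.compile(r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,\s*")


def dump_flows(bridge="br-int"):
    """Run `ovs-ofctl dump-flows <bridge>` and return its text output."""
    return subprocess.run(["ovs-ofctl", "dump-flows", bridge],
                          capture_output=True, text=True, check=True).stdout


def collect_dumps(run, count, interval, sleep=time.sleep):
    """Call run() `count` times, `interval` seconds apart.

    Returns a list of (UTC ISO timestamp, output) pairs.
    """
    dumps = []
    for i in range(count):
        dumps.append((datetime.now(timezone.utc).isoformat(), run()))
        if i < count - 1:
            sleep(interval)
    return dumps


def normalize(dump):
    """Set of flow lines with volatile counters removed (header lines dropped)."""
    return {VOLATILE.sub("", line).strip()
            for line in dump.splitlines() if "actions=" in line}


def diff_flows(before, after):
    """Return (added, removed) flows between two dump-flows outputs."""
    old, new = normalize(before), normalize(after)
    return new - old, old - new


# Usage on a host running ovn-controller (not executed here):
# dumps = collect_dumps(dump_flows, count=10, interval=30)
# for (t0, a), (t1, b) in zip(dumps, dumps[1:]):
#     added, removed = diff_flows(a, b)
#     print(t1, "added:", len(added), "removed:", len(removed))
```

Counters are stripped before comparing so that only flows that were actually installed or removed show up in the diff, rather than every flow whose packet count moved.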
>
> [0]
> http://logs.openstack.org/97/523097/2/check/networking-ovn-tempest-dsvm-ovs-release/9a70bc0/logs/ovs_dbs/ovnnb.txt.gz
> [1]
> http://logs.openstack.org/97/523097/2/check/networking-ovn-tempest-dsvm-ovs-release/9a70bc0/logs/screen-br-ex-tcpdump.txt.gz
>
>