On Mon, Sep 17, 2018 at 6:31 PM Daniel Alvarez <email address hidden> wrote:
> After analyzing other logs, I confirmed that at least the gateway port is
> missing from the router:
>
>
> - The switch has 3 ports (metadata port, logical port for the VM and the
> router port):
>
> switch a8168e8d-49e5-4032-849b-2a7107e93610 (neutron-6cbe7647-d46f-4ad6-a193-fc59fdc352f3) (aka tempest-network-smoke--988884713)
>     port c9998671-bfa3-44f3-8837-2977f257481b
>         type: localport
>         addresses: ["fa:16:3e:95:24:2b 10.1.0.2"]
>     port a02af8d6-890f-4a44-ac94-32d2df9700e6
>         type: router
>         router-port: lrp-a02af8d6-890f-4a44-ac94-32d2df9700e6
>     port 3dcf857a-5c6a-4f36-a509-c55ec4d52b87
>         addresses: ["fa:16:3e:a5:25:2d 10.1.0.3"]
> switch 9f07b6f6-0f09-459d-b614-9c0ba25d47d9 (neutron-46e7378b-0b97-455e-977d-3daead3bf546) (aka tempest-test-network--878432264)
>
>
> - The router is missing the external network port:
>
>
> router 9612b8f5-b2d6-4cc0-a671-cabe59cf177b (neutron-cde28807-1542-4329-a314-20406e4c3e92) (aka tempest-TestNetworkBasicOps-router-580083611)
>     port lrp-a02af8d6-890f-4a44-ac94-32d2df9700e6
>         mac: "fa:16:3e:a0:b8:e1"
>         networks: ["10.1.0.1/28"]
>     nat be099503-5bbd-48e4-9fac-fd68f2db1663
>         external ip: "172.24.5.30"
>         logical ip: "10.1.0.0/28"
>         type: "snat"
>     nat d3e69a5e-bbfc-42f1-8d82-c9f8f5b391a5
>         external ip: "172.24.5.25"
>         logical ip: "10.1.0.3"
>         type: "dnat_and_snat"
>
>
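One way to catch this condition automatically in a saved `ovn-nbctl show` dump is to check, for each router, whether every NAT external IP falls inside a network on one of the router's own ports. In the listing above, both external IPs (172.24.5.30 and 172.24.5.25) lie outside 10.1.0.0/28, which is what a missing gateway port looks like. This is only a heuristic sketch: `routers_missing_gateway` is a hypothetical helper (not part of OVN or networking-ovn), and it assumes the indented `ovn-nbctl show` text layout shown above.

```python
import ipaddress
import re

QUOTED = re.compile(r'"([^"]+)"')  # values appear as "..." in ovn-nbctl show


def routers_missing_gateway(show_output):
    """Return UUIDs of routers that have a NAT external IP not covered by
    any of the router's port networks (heuristic: gateway port missing)."""
    routers = []  # (uuid, [port networks], [NAT external IPs])
    in_router = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("router "):
            routers.append((line.split()[1], [], []))
            in_router = True
        elif line.startswith("switch "):
            in_router = False  # switch sections have no router networks/NAT
        elif in_router and line.startswith("networks:"):
            routers[-1][1].extend(
                ipaddress.ip_network(c, strict=False) for c in QUOTED.findall(line))
        elif in_router and line.startswith("external ip:"):
            routers[-1][2].extend(
                ipaddress.ip_address(a) for a in QUOTED.findall(line))
    # Flag a router if any NAT external IP is outside every port network.
    return [uuid for uuid, nets, nat_ips in routers
            if any(not any(ip in net for net in nets) for ip in nat_ips)]
```

Run against the router listing above (without the "> " quote prefix), it would return `["9612b8f5-b2d6-4cc0-a671-cabe59cf177b"]`.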
> Need to investigate why. Will update this soon.
>
> --
> You received this bug notification because you are subscribed to
> networking-ovn.
> https://bugs.launchpad.net/bugs/1735154
>
> Title:
> Tempest jobs failing with SSH timeout sometimes
>
> Status in networking-ovn:
> New
>
> Bug description:
> We've been observing lately that some tempest tests fail with a timeout
> when trying to SSH into the FIP of an instance. After debugging the
> failures, we found that the cause is that ARP replies from the FIP
> aren't coming through. This doesn't always happen, and I can't
> reproduce it myself in devstack.
>
>
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh [-] Failed to establish authenticated ssh connection to cirros@172.24.5.10 after 16 attempts: NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.24.5.10
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh Traceback (most recent call last):
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh   File "tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh     sock=proxy_chan)
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh   File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/paramiko/client.py", line 357, in connect
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh     raise NoValidConnectionsError(errors)
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.24.5.10
> 2017-11-27 13:50:38.477 10303 ERROR tempest.lib.common.ssh
>
>
> Instance IP is 10.1.0.3.
> NAT entry [0]:
>
> OVSDB JSON 802 48c3f9ccd173a7722c22dea80c779ffc567fc162
> {"_date":1511790428367,
>  "Logical_Router":{"f89bbcb7-063b-42d7-b943-5825ee8a839e":{"nat":["set",[["uuid","5ab4b7f5-da9a-4aeb-aaa8-4f471633d724"],["uuid","b2681871-58a3-485f-838a-672d6e92e55e"]]]}},
>  "NAT":{"b2681871-58a3-485f-838a-672d6e92e55e":{"external_ip":"172.24.5.10","logical_ip":"10.1.0.3","type":"dnat_and_snat"}},
>  "Logical_Switch":{"e55f3c0e-f59e-4d25-b566-8c4c63f8a6b6":{"ports":["set",[["uuid","0135db16-8a8c-4dc4-990d-3e5921cf79ee"],["uuid","31aa3b52-821b-415c-9770-8dad54c13d12"],["uuid","4557f804-96b7-4035-933e-90d8c79f2332"],["uuid","d18bd12c-ab2d-4002-856c-9fb8cc111f26"],["uuid","d78d9e54-14fe-46df-8341-031d68816cef"],["uuid","dbd6621b-60ff-4466-b995-213a03edf175"],["uuid","e89ec11b-7b7e-490b-b8ab-69e85597d536"]]]}},
>  "Logical_Switch_Port":{"2ec1ede6-ec94-48d6-8680-895ba3d7b117":null}}
>
>
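Records like this one can be decoded programmatically when digging through a long database dump. A minimal sketch, assuming the input is the JSON text that follows an `OVSDB JSON` header line; the helper names are hypothetical. In the database file a row maps to `null` when it was deleted (as for the `Logical_Switch_Port` entry above), and modified rows carry only the columns that changed, so the code uses `.get()`.

```python
import json


def decode_uuid_set(value):
    """Decode an OVSDB set of UUIDs, e.g. ["set", [["uuid", "..."], ...]].
    A one-element set may also appear as a bare ["uuid", "..."] atom."""
    if value[0] == "set":
        return [uuid for _tag, uuid in value[1]]
    return [value[1]]


def summarize_nat_txn(record):
    """Return (router -> NAT row UUIDs, NAT UUID -> (external_ip,
    logical_ip, type) or None if deleted) for one transaction record."""
    txn = json.loads(record)
    router_nats = {
        router: decode_uuid_set(row["nat"])
        for router, row in txn.get("Logical_Router", {}).items()
        if row is not None and "nat" in row
    }
    nat_rows = {}
    for uuid, row in txn.get("NAT", {}).items():
        # A null row records a deletion; otherwise keep the NAT fields.
        nat_rows[uuid] = None if row is None else (
            row.get("external_ip"), row.get("logical_ip"), row.get("type"))
    return router_nats, nat_rows
```

For the record above, this would map router `f89bbcb7-...` to both NAT UUIDs and show `b2681871-...` as the `dnat_and_snat` entry for 10.1.0.3 -> 172.24.5.10.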
> Search for "Request who-has 172.24.5.10" in [1] and you won't see any
> replies.
>
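The same search can be scripted so that every unanswered ARP target in a capture shows up at once. A sketch under stated assumptions: the capture uses tcpdump's usual ARP text (`Request who-has <ip> ...` and `Reply <ip> is-at <mac>`), taken with `-n` so addresses are numeric; the regexes tolerate the extra link-layer prefix that `-e` adds. `unanswered_arp_requests` is a hypothetical helper name.

```python
import re
from collections import Counter

# Match tcpdump's ARP text with or without the "-e" link-layer prefix.
ARP_REQUEST = re.compile(r"Request who-has (\S+)")
ARP_REPLY = re.compile(r"Reply (\S+) is-at")


def unanswered_arp_requests(lines):
    """Return {target_ip: request_count} for ARP targets that never got a
    reply anywhere in the capture."""
    requests = Counter()
    answered = set()
    for line in lines:
        request = ARP_REQUEST.search(line)
        if request:
            requests[request.group(1)] += 1
            continue
        reply = ARP_REPLY.search(line)
        if reply:
            answered.add(reply.group(1))
    return {ip: n for ip, n in requests.items() if ip not in answered}
```

Fed the br-ex capture line by line (e.g. `unanswered_arp_requests(open(path))`), an affected run should list 172.24.5.10 with its request count.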
> Logical flows look good to me (at first glance), so it could be an
> ovn-controller/ovs-vswitchd bug. We'll try to include periodic dumps of
> the OpenFlow flows and see what they look like when this failure occurs.
>
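A minimal sketch of such periodic dumps, assuming it runs on the node that hosts `br-int` with permission to call `ovs-ofctl`. `ovs-ofctl dump-flows <bridge>` lists a bridge's OpenFlow flows; depending on the bridge's configured protocols you may need to add `-O OpenFlow13`. The helper name, file naming scheme, and defaults are illustrative assumptions, not something the networking-ovn gate jobs provide; `run` is injectable so the loop can be exercised without OVS.

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path


def dump_flows_periodically(bridge, out_dir, interval_s=30.0, count=10, run=None):
    """Save `ovs-ofctl dump-flows <bridge>` output every interval_s seconds,
    one timestamped file per snapshot; return the written paths."""
    if run is None:
        def run(cmd):
            # Raise if ovs-ofctl fails instead of silently saving nothing.
            return subprocess.run(cmd, check=True, capture_output=True,
                                  text=True).stdout
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
        # The index keeps names unique even if two stamps collide.
        path = out / f"{bridge}-flows-{i:03d}-{stamp}.txt"
        path.write_text(run(["ovs-ofctl", "dump-flows", bridge]))
        written.append(path)
        if i + 1 < count:
            time.sleep(interval_s)
    return written
```

Snapshots taken around the SSH timeout can then be diffed against earlier ones to see whether the flows answering ARP for the FIP ever get installed.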
> [0] http://logs.openstack.org/97/523097/2/check/networking-ovn-tempest-dsvm-ovs-release/9a70bc0/logs/ovs_dbs/ovnnb.txt.gz
> [1] http://logs.openstack.org/97/523097/2/check/networking-ovn-tempest-dsvm-ovs-release/9a70bc0/logs/screen-br-ex-tcpdump.txt.gz
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/networking-ovn/+bug/1735154/+subscriptions
>
You nailed it down dude!!! :)