OVS jobs randomly fails as Guest VMs not(or delayed) configured with DHCP

Bug #2045549 reported by yatin
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Confirmed
High
Unassigned

Bug Description

Seen couple of hits recently, Tests fails as:-
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection
    ssh.connect(self.host, port=self.port, username=self.username,
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/paramiko/client.py", line 409, in connect
    raise NoValidConnectionsError(errors)
paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.24.5.37

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 278, in test_create_list_show_delete_interfaces_by_fixed_ip
    server, ifs, _ = self._create_server_get_interfaces()
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 88, in _create_server_get_interfaces
    self._wait_for_validation(server, validation_resources)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 73, in _wait_for_validation
    linux_client.validate_authentication()
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 123, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 245, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 155, in _get_ssh_connection
    raise exceptions.SSHTimeout(host=self.host,
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.37 via SSH timed out.
User: cirros, Password: password

From Console logs of the vm, eth0 interface is not configured correctly as it have IPv4LL address instead:-
Nov 25 08:44:45 cirros daemon.info dhcpcd[316]: eth0: using IPv4LL address 169.254.203.238

From DHCP aggent logs, there were DHCPDISCOVER/DHCPOFFER but no DHCPREQUEST/DHCPACK
Nov 25 08:44:35.898432 np0035864589 dnsmasq-dhcp[98241]: DHCPDISCOVER(tapb331ed5f-9e) fa:16:3e:fa:8b:d7
Nov 25 08:44:35.898464 np0035864589 dnsmasq-dhcp[98241]: DHCPOFFER(tapb331ed5f-9e) 10.1.0.14 fa:16:3e:fa:8b:d7

For in other failures it was seen differently like dhcp took time to configure and in meanwhile metadata failed(failed 20/20: up 56.38. request failed):-
Nov 22 06:08:28.142795 np0035837630 dnsmasq-dhcp[104598]: DHCPDISCOVER(tap2f7f2c03-6d) fa:16:3e:10:67:5f no address available
Nov 22 06:08:33.071307 np0035837630 dnsmasq-dhcp[104598]: DHCPDISCOVER(tap2f7f2c03-6d) fa:16:3e:10:67:5f no address available
Nov 22 06:08:42.063921 np0035837630 dnsmasq-dhcp[104598]: DHCPDISCOVER(tap2f7f2c03-6d) fa:16:3e:10:67:5f no address available
Nov 22 06:09:29.752568 np0035837630 dnsmasq-dhcp[104598]: DHCPDISCOVER(tap2f7f2c03-6d) fa:16:3e:10:67:5f
Nov 22 06:09:29.752593 np0035837630 dnsmasq-dhcp[104598]: DHCPOFFER(tap2f7f2c03-6d) 10.1.0.26 fa:16:3e:10:67:5f
Nov 22 06:09:29.756191 np0035837630 dnsmasq-dhcp[104598]: DHCPREQUEST(tap2f7f2c03-6d) 10.1.0.26 fa:16:3e:10:67:5f
Nov 22 06:09:29.756218 np0035837630 dnsmasq-dhcp[104598]: DHCPACK(tap2f7f2c03-6d) 10.1.0.26 fa:16:3e:10:67:5f tempest-server-test-761246100

Example builds:-
https://zuul.openstack.org/build/3e27967397a64fb587d8aae2ff215d10
https://zuul.openstack.org/build/fb288858b1e14d4893bc3710abca38d8
https://zuul.openstack.org/build/3f16434fef6a4ae5846bd11d49aab8ad
https://zuul.openstack.org/build/b7722266e9134fb2ae38484d284bacd3
https://zuul.openstack.org/build/abd7aada5fd84371a78182effb7f0050

Opensearch:- https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22:%20using%20IPv4LL%20address%22'),sort:!())

Tags: gate-failure
tags: added: gate-failure
Revision history for this message
Lajos Katona (lajos-katona) wrote :

severity can be increased if frequency increases

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
yatin (yatinkarel) wrote :
Download full text (3.8 KiB)

Another variant seen:-
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b51/914260/2/gate/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/b51575b/testr_results.html

DHCP request:-
Apr 15 10:32:46.217414 np0037286317 dnsmasq-dhcp[104433]: DHCPDISCOVER(tap6978d05e-c0) fa:16:3e:fb:c4:fa
Apr 15 10:32:46.217428 np0037286317 dnsmasq-dhcp[104433]: DHCPOFFER(tap6978d05e-c0) 10.1.0.11 fa:16:3e:fb:c4:fa
Apr 15 10:32:46.218501 np0037286317 dnsmasq-dhcp[104433]: DHCPREQUEST(tap6978d05e-c0) 10.1.0.11 fa:16:3e:fb:c4:fa
Apr 15 10:32:46.218515 np0037286317 dnsmasq-dhcp[104433]: DHCPACK(tap6978d05e-c0) 10.1.0.11 fa:16:3e:fb:c4:fa tempest-internal-dns-test-vm-1365155710

Metadata request:-
Apr 15 10:33:10.806011 np0037286317 neutron-metadata-agent[60802]: INFO eventlet.wsgi.server [-] ::ffff:10.1.0.11,<local> "GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 341 time: 0.0893147

firth ssh attempt:-
2024-04-15 10:32:48,931 85216 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.45:22' as 'cirros' with public key authentication
2024-04-15 10:32:50,538 85216 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2024-04-15 10:32:50,598 85216 INFO [paramiko.transport] Authentication (publickey) failed.
Last ssh attempt:-
2024-04-15 10:46:12,508 85216 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.45 after 38 attempts. Proxy client: no proxy client

Even after keys injected, why ssh still fails? i recall reading some bug report related to this but can't find it, any one recall and know that issue?

Another variant:-
https://a2cd5747e9df40c0c253-4cf4bb096d4774f141f32d30757edf40.ssl.cf5.rackcdn.com/915467/1/check/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/b6aae17/testr_results.html
https://fe84552b8a9ac6c570c5-2c52d5c43a05db941913513785ce0245.ssl.cf2.rackcdn.com/916603/1/gate/neutron-tempest-plugin-openvswitch/d4e5c7d/testr_results.html

test: test_qos
cirros console:-
Apr 11 18:27:44 cirros daemon.info dhcpcd[250]: eth0: soliciting a DHCP lease
Apr 11 18:27:49 cirros daemon.info dhcpcd[250]: eth0: probing for an IPv4LL address
Apr 11 18:27:54 cirros daemon.info dhcpcd[250]: eth0: using IPv4LL address 169.254.219.22

No DHCP request seen in dhcp agent logs

Another variant:-
https://a2cd5747e9df40c0c253-4cf4bb096d4774f141f32d30757edf40.ssl.cf5.rackcdn.com/915467/1/check/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/b6aae17/testr_results.html

test:- test_reuse_ip_address_with_other_fip_on_other_router
Apr 11 17:30:10.163365 np0037266354 dnsmasq-dhcp[101823]: DHCPDISCOVER(tapb4f76e0b-f9) fa:16:3e:45:f4:61
Apr 11 17:30:10.163389 np0037266354 dnsmasq-dhcp[101823]: DHCPOFFER(tapb4f76e0b-f9) 10.1.0.14 fa:16:3e:45:f4:61
Apr 11 17:30:10.164179 np0037266354 dnsmasq-dhcp[101823]: DHCPREQUEST(tapb4f76e0b-f9) 10.1.0.14 fa:16:3e:45:f4:61
Apr 11 17:30:10.164193 np0037266354 dnsmasq-dhcp[101823]: DHCPACK(tapb4f76e0b-f9) 10.1.0.14 fa:16:3e:45:f4:61 tempest-server-test-911756074

no metadata request seen around that, seen at later time but may be different...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.