neutron

SSH timeouts due to problems with metadata server in ML2/OVN backend

Bug #2052787 reported by Slawek Kaplonski on 2024-02-09

This bug report is a duplicate of: Bug #2007166: [tempest] CI job "neutron-ovn-tempest-ipv6-only-ovs-release" unstable, most of the times because of "test_hotplug_nic".

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	neutron	In Progress	Critical	Slawek Kaplonski

Bug Description

It has already been seen in a couple of jobs that random tempest scenario tests fail due to a timeout while SSHing to the guest VM.
The VM's console log clearly shows a problem with reaching the metadata server.

Looking at the neutron-ovn-metadata-agent logs and then the journal log, it seems to me that those requests are never delivered to the haproxy spawned in the ovnmeta-xxx namespace: there is no log entry with the log-tag configured in haproxy for that network.
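
The check described above can be sketched as a small filter over journal lines. This is only an illustration: the function name, the tag string, and the sample lines are hypothetical placeholders, not output from the failing jobs. The real tag would have to be read from the haproxy configuration generated for the network.

```python
from typing import Iterable, List


def lines_with_log_tag(journal_lines: Iterable[str], log_tag: str) -> List[str]:
    """Return the journal lines that mention the given haproxy log-tag."""
    return [line for line in journal_lines if log_tag in line]


# Hypothetical tag and journal lines, for illustration only.
tag = "haproxy-example-tag-net-a"
sample = [
    "host haproxy-example-tag-net-a[123]: GET /latest/meta-data/ 200",
    "host haproxy-example-tag-net-b[456]: GET /latest/meta-data/ 200",
    "host ovs-vswitchd[789]: bridge br-int: added interface tap0",
]

hits = lines_with_log_tag(sample, tag)
# An empty result for a network would match the symptom in this report:
# no request ever reached the haproxy serving that network.
print(len(hits))
```

An empty list for the affected network, while other networks do produce matches, would point at the traffic path rather than at the agent.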

Examples of failures like that:
https://3c8c3cc132d3ca41c1a0-8df332a8f6cbb54ee498032ff97f9d17.ssl.cf1.rackcdn.com/882350/2/check/cinder-plugin-ceph-tempest-mn-aa/df2995a/job-output.txt
https://ac3deee033df2f80309a-9b1010a8ed0ed23e4a7e66dfa043a295.ssl.cf5.rackcdn.com/907418/2/check/tempest-slow-py3/6dff044/job-output.txt

OpenStack Infra (hudson-openstack) wrote on 2024-02-22: Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/909848

Slawek Kaplonski (slaweq) on 2024-02-22

Changed in neutron:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2024-02-27: Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/909848
Committed: https://opendev.org/openstack/neutron/commit/2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b
Submitter: "Zuul (22348)"
Branch: master

commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    In both neutron-metadata and neutron-ovn-metadata agents we should
    ensure that haproxy service spawned for network/router is actually
    active before moving on.
    This patch adds that check and this is similar to what was already
    implemented some time ago for the dnsmasq process spawned by the dhcp
    agent.

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
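
The idea in the commit message, blocking until the spawned process reports that it is active, can be sketched as a generic polling helper. This is not the merged patch: `wait_until_active`, its parameters, and the `FakeProcess` class are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_until_active(is_active: Callable[[], bool],
                      timeout: float = 5.0,
                      interval: float = 0.1) -> None:
    """Poll is_active() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not is_active():
        if time.monotonic() >= deadline:
            raise TimeoutError("process did not become active in time")
        time.sleep(interval)


class FakeProcess:
    """Stand-in for a process manager; reports active on the third check."""

    def __init__(self) -> None:
        self.checks = 0

    def active(self) -> bool:
        self.checks += 1
        return self.checks >= 3


proc = FakeProcess()
# Returns once the fake process reports active; raises TimeoutError otherwise.
wait_until_active(proc.active, timeout=2.0, interval=0.01)
```

Raising on timeout, rather than returning silently, lets the caller decide whether to retry the spawn or report the failure.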

OpenStack Infra (hudson-openstack) wrote on 2024-02-27: Related fix proposed to neutron (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/910308

OpenStack Infra (hudson-openstack) wrote on 2024-02-27: Related fix proposed to neutron (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/910309

OpenStack Infra (hudson-openstack) wrote on 2024-02-27: Related fix proposed to neutron (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/910335

OpenStack Infra (hudson-openstack) wrote on 2024-02-28: Related fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910308
Committed: https://opendev.org/openstack/neutron/commit/32af674783c69c87d0feed622434c6839938a141
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 32af674783c69c87d0feed622434c6839938a141
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)

OpenStack Infra (hudson-openstack) wrote on 2024-02-28: Related fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910335
Committed: https://opendev.org/openstack/neutron/commit/aedb872e4f29b00d7658faccc0e664e00f2d2613
Submitter: "Zuul (22348)"
Branch: stable/zed

commit aedb872e4f29b00d7658faccc0e664e00f2d2613
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    Conflicts:
        neutron/tests/unit/agent/dhcp/test_agent.py

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)
    (cherry picked from commit 0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f)

tags:	added: in-stable-zed

OpenStack Infra (hudson-openstack) wrote on 2024-02-28: Related fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910309
Committed: https://opendev.org/openstack/neutron/commit/0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)

Slawek Kaplonski (slaweq) wrote on 2024-03-07:

After the neutron patch was merged, I investigated a similar failure again (https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_703/910366/1/check/cinder-plugin-ceph-tempest-mn-aa/703b88a/testr_results.html), and it seems that haproxy is spawned without any problems, so the issue is somewhere else.

What I have found so far in this job is that there is no OF rule in br-int for this metadata port (checked in the journal log from compute1), whereas for other metadata ports I see a rule like:

cookie=0x9e5e61a7, duration=19.734s, table=65, n_packets=0, n_bytes=0, idle_age=19, priority=100,reg15=0x1,metadata=0x9 actions=output:288
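
A check for such a rule can be sketched by parsing flow dump lines. The sketch below is an illustration only: `parse_output_flow` and `has_output_flow` are hypothetical helpers, the regular expression covers just the fields visible in the sample rule above, and real dump output may use other field orderings.

```python
import re
from typing import Dict, Iterable, Optional

# Matches only the fields seen in the sample rule: table, reg15,
# metadata and a single output action.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+),.*?"
    r"reg15=(?P<reg15>0x[0-9a-fA-F]+),"
    r"metadata=(?P<metadata>0x[0-9a-fA-F]+)\s+"
    r"actions=output:(?P<port>\d+)"
)


def parse_output_flow(line: str) -> Optional[Dict[str, str]]:
    """Extract table, reg15, metadata and output port from one flow line."""
    match = FLOW_RE.search(line)
    return match.groupdict() if match else None


def has_output_flow(lines: Iterable[str], reg15: str, metadata: str) -> bool:
    """Report whether any line is an output rule for this reg15/metadata pair."""
    for line in lines:
        flow = parse_output_flow(line)
        if flow and flow["reg15"] == reg15 and flow["metadata"] == metadata:
            return True
    return False


# The rule quoted in the comment above, split only for line length.
flow_line = ("cookie=0x9e5e61a7, duration=19.734s, table=65, n_packets=0, "
             "n_bytes=0, idle_age=19, priority=100,reg15=0x1,metadata=0x9 "
             "actions=output:288")
parsed = parse_output_flow(flow_line)
print(parsed)
```

Calling `has_output_flow` with the reg15 and metadata values expected for the failing port would return `False` if the rule is missing, which is the condition described in this comment.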

I proposed the patch https://review.opendev.org/c/openstack/tempest/+/911673 to add OVS and OVN logs to the job logs of all jobs based on the devstack-tempest job. Let's see if that helps to understand the issue.

OpenStack Infra (hudson-openstack) wrote on 2024-04-25: Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917019

OpenStack Infra (hudson-openstack) wrote on 2024-04-26: Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917019
Reason: not needed anymore
