SSH timeouts due to problems with metadata server in ML2/OVN backend

Bug #2052787 reported by Slawek Kaplonski
This bug affects 1 person
Affects   Status        Importance   Assigned to        Milestone
neutron   In Progress   Critical     Slawek Kaplonski

Bug Description

Several CI runs have already shown random tempest scenario jobs failing with a timeout while SSHing to the guest VM.
The VM's console log clearly shows a problem reaching the metadata server:

2024-02-02 17:37:28.665832 | controller | forked to background, child pid 250
2024-02-02 17:37:28.665857 | controller | OK
2024-02-02 17:37:28.665883 | controller | checking http://169.254.169.254/2009-04-04/instance-id
2024-02-02 17:37:28.665908 | controller | failed 1/20: up 26.07. request failed
2024-02-02 17:37:28.665933 | controller | failed 2/20: up 28.37. request failed
2024-02-02 17:37:28.665958 | controller | failed 3/20: up 30.67. request failed
2024-02-02 17:37:28.665983 | controller | failed 4/20: up 32.96. request failed
2024-02-02 17:37:28.666008 | controller | failed 5/20: up 82.24. request failed
2024-02-02 17:37:28.666033 | controller | failed 6/20: up 131.56. request failed
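
The guest-side check visible in this console log is a bounded retry loop against the metadata URL. The following Python sketch approximates that behaviour for reproducing the symptom by hand. It is not the guest image's actual script: the function names, the timeout, and the delay are assumptions, and only the URL and the limit of 20 attempts come from the log above.

```python
import time
import urllib.error
import urllib.request

# URL taken from the console log above.
METADATA_URL = "http://169.254.169.254/2009-04-04/instance-id"


def fetch_instance_id(url=METADATA_URL, timeout=2.0):
    """Return the instance ID as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except (urllib.error.URLError, OSError):
        return None


def wait_for_metadata(fetch=fetch_instance_id, attempts=20, delay=1.0,
                      sleep=time.sleep):
    """Poll the metadata endpoint up to `attempts` times.

    Returns (instance_id, failed_attempts). instance_id is None when
    every attempt failed. `fetch` and `sleep` are injectable so the loop
    can be exercised without a network (hypothetical design choice).
    """
    for failed in range(attempts):
        instance_id = fetch()
        if instance_id is not None:
            return instance_id, failed
        sleep(delay)
    return None, attempts
```

Run inside a guest, `wait_for_metadata()` returning `(None, 20)` would correspond to the situation reported here, where every request fails.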

Looking at the neutron-ovn-metadata-agent logs and then the journal log, it seems to me that those requests are never delivered to the haproxy spawned in the ovnmeta-xxx namespace, as there is no log entry with the log-tag configured in haproxy for that network.

Examples of failures like that:
https://3c8c3cc132d3ca41c1a0-8df332a8f6cbb54ee498032ff97f9d17.ssl.cf1.rackcdn.com/882350/2/check/cinder-plugin-ceph-tempest-mn-aa/df2995a/job-output.txt
https://ac3deee033df2f80309a-9b1010a8ed0ed23e4a7e66dfa043a295.ssl.cf5.rackcdn.com/907418/2/check/tempest-slow-py3/6dff044/job-output.txt

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/909848

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/909848
Committed: https://opendev.org/openstack/neutron/commit/2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b
Submitter: "Zuul (22348)"
Branch: master

commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    In both neutron-metadata and neutron-ovn-metadata agents we should
    ensure that haproxy service spawned for network/router is actually
    active before moving on.
    This patch adds that check and this is similar to what was already
    implemented some time ago for the dnsmasq process spawned by the dhcp
    agent.

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
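
The check described in this commit message amounts to waiting for a spawned process to report itself active before continuing. A minimal Python sketch of that pattern follows. It is not the code from the patch: `wait_until_active`, `ProcessNotActive`, and the timeout values are hypothetical names and numbers chosen for illustration.

```python
import time


class ProcessNotActive(RuntimeError):
    """Raised when a spawned process does not become active in time."""


def wait_until_active(is_active, timeout=5.0, interval=0.1,
                      clock=time.monotonic, sleep=time.sleep):
    """Block until is_active() returns True or `timeout` seconds elapse.

    `is_active` is any zero-argument callable, for example a check that
    the haproxy PID file exists and the PID is alive (assumption: the
    caller supplies that check).
    """
    deadline = clock() + timeout
    while True:
        if is_active():
            return
        if clock() >= deadline:
            raise ProcessNotActive(
                "process not active after %.1f seconds" % timeout)
        sleep(interval)
```

A caller would start the proxy first and then call `wait_until_active(check)`, so that a proxy that failed to start surfaces as an error instead of as silently dropped metadata requests.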

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/910308

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/910309

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/910335

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910308
Committed: https://opendev.org/openstack/neutron/commit/32af674783c69c87d0feed622434c6839938a141
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 32af674783c69c87d0feed622434c6839938a141
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    In both neutron-metadata and neutron-ovn-metadata agents we should
    ensure that haproxy service spawned for network/router is actually
    active before moving on.
    This patch adds that check and this is similar to what was already
    implemented some time ago for the dnsmasq process spawned by the dhcp
    agent.

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910335
Committed: https://opendev.org/openstack/neutron/commit/aedb872e4f29b00d7658faccc0e664e00f2d2613
Submitter: "Zuul (22348)"
Branch: stable/zed

commit aedb872e4f29b00d7658faccc0e664e00f2d2613
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    In both neutron-metadata and neutron-ovn-metadata agents we should
    ensure that haproxy service spawned for network/router is actually
    active before moving on.
    This patch adds that check and this is similar to what was already
    implemented some time ago for the dnsmasq process spawned by the dhcp
    agent.

    Conflicts:
        neutron/tests/unit/agent/dhcp/test_agent.py

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)
    (cherry picked from commit 0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/910309
Committed: https://opendev.org/openstack/neutron/commit/0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 0dfe8dedd63aba2bf5b75ad8494b0ead4ba1b79f
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 22 10:06:58 2024 +0100

    Ensure that haproxy spawned by the metadata agents is active

    In both neutron-metadata and neutron-ovn-metadata agents we should
    ensure that haproxy service spawned for network/router is actually
    active before moving on.
    This patch adds that check and this is similar to what was already
    implemented some time ago for the dnsmasq process spawned by the dhcp
    agent.

    Related-Bug: #2052787
    Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
    (cherry picked from commit 2f7f7c2fc29d0ac26b5ff9d82867952a40f0fa1b)

Slawek Kaplonski (slaweq) wrote :

After the neutron patch was merged, I investigated a similar failure again (https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_703/910366/1/check/cinder-plugin-ceph-tempest-mn-aa/703b88a/testr_results.html) and it seems that haproxy is spawned without any problems, so the issue is somewhere else.

What I have found so far in this job is that there is no OF rule in br-int for the affected metadata port (checked in the journal log from compute1), whereas for other metadata ports I see a rule like:

cookie=0x9e5e61a7, duration=19.734s, table=65, n_packets=0, n_bytes=0, idle_age=19, priority=100,reg15=0x1,metadata=0x9 actions=output:288
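
When comparing a failing port with a working one, it can help to search a flow dump for rules of this shape automatically. The sketch below parses lines in the format quoted above and filters for table 65 rules that have an output action. The function names are hypothetical, and the parser is an assumption that covers only the simple `key=value` layout shown in this report, not the full flow syntax.

```python
import re


def parse_flow(line):
    """Split one flow dump line into (match_fields, actions)."""
    match_part, _, action_part = line.strip().partition(" actions=")
    fields = {}
    for token in re.split(r"[,\s]+", match_part):
        if not token:
            continue
        key, sep, value = token.partition("=")
        # Tokens without "=" are recorded as boolean flags.
        fields[key] = value if sep else True
    actions = action_part.split(",") if action_part else []
    return fields, actions


def find_output_flows(lines, table="65", metadata=None):
    """Return parsed flows in `table` that contain an output action.

    If `metadata` is given (for example "0x9"), only flows matching
    that metadata value are returned.
    """
    found = []
    for line in lines:
        fields, actions = parse_flow(line)
        if fields.get("table") != table:
            continue
        if metadata is not None and fields.get("metadata") != metadata:
            continue
        if any(action.startswith("output:") for action in actions):
            found.append((fields, actions))
    return found
```

An empty result for the metadata value of the failing network, alongside non-empty results for other networks, would match the observation described in this comment.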

I proposed patch https://review.opendev.org/c/openstack/tempest/+/911673 to add OVS and OVN logs to the job logs for all jobs based on the devstack-tempest job. Let's see if that helps to understand this issue.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917019

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917019
Reason: not needed anymore
