"test_add_remove_fixed_ip" failing in "grenade-dvr-multinode" CI job

Bug #1920778 reported by Rodolfo Alonso
This bug affects 1 person
Affects   Status        Importance  Assigned to       Milestone
neutron   Fix Released  Critical    Slawek Kaplonski
Changed in neutron:
importance: Undecided → Critical
tags: added: gate-failure grenade
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi:

Checking the job executions [1], not all of them are failing, but this test should be marked as unstable. Many executions are failing just because of this test.

Regards.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=neutron-grenade-dvr-multinode
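Marking a test as unstable generally means reporting its failure as a skip instead of a failure, so an intermittent problem stops failing the whole job while still showing up in the results. The idea can be sketched as a decorator. This is a minimal hypothetical example: `unstable_test` and `ExampleTest` are illustrative names, and this is not Tempest's or neutron's code.

```python
import functools
import unittest


def unstable_test(reason):
    """Hypothetical decorator: report any failure of the wrapped test as a
    skip that cites `reason` (e.g. a bug number)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except unittest.SkipTest:
                raise  # already a skip, pass it through unchanged
            except Exception as exc:
                # Convert the failure into a skip so it does not fail the run.
                raise unittest.SkipTest(
                    "Marked as unstable (%s), failure: %s" % (reason, exc))
        return wrapper
    return decorator


class ExampleTest(unittest.TestCase):
    @unstable_test("bug 1920778")
    def test_flaky(self):
        # Stand-in for an intermittent failure such as an SSH auth error.
        self.fail("simulated SSH authentication failure")
```

Running `ExampleTest` with a unittest runner reports `test_flaky` as skipped, with the bug reference in the skip reason, so the failure stays visible without turning the job red.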

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Lajos Katona (lajos-katona) wrote :
Lajos Katona (lajos-katona) wrote :

The patch to make the job non-voting: https://review.opendev.org/c/openstack/neutron/+/782289

Lajos Katona (lajos-katona) wrote :
Slawek Kaplonski (slaweq) wrote :

I checked the provided logs and so far have found only a couple of things:
1. The test is failing on the "old" release.
2. It seems to me that connectivity is fine: the SSH connection is always established, but authentication fails, which suggests a potential problem with metadata.
3. In the neutron-metadata-agent's logs there are no requests at all from the IP address of the failed VM. That may be a clue worth checking more deeply.

So far I don't know anything more. I will continue this investigation in the next few days.
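Point 3 boils down to searching the agent log for the VM's fixed IP. Below is a small sketch of that search. It assumes only that the address appears as plain text in the relevant log lines. The function name is hypothetical, and no particular neutron-metadata-agent log format is assumed. A naive substring search would wrongly match 10.1.0.6 inside 10.1.0.60, which is why the pattern uses lookarounds.

```python
import re


def lines_mentioning_ip(log_lines, ip):
    """Return the log lines that mention `ip` as a complete IPv4 address.

    The lookarounds stop 10.1.0.6 from matching inside 10.1.0.60 or
    110.1.0.6, while still allowing a trailing full stop.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\.?\d)")
    return [line for line in log_lines if pattern.search(line)]
```

An empty result for the failed VM's address corresponds to the observation in point 3.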

Oleg Bondarev (obondarev) wrote :

From console log:

 Starting network: udhcpc: started, v1.29.3
 udhcpc: sending discover
 udhcpc: sending select for 10.1.0.6
 udhcpc: lease of 10.1.0.6 obtained, lease time 86400
 route: SIOCADDRT: File exists
 WARN: failed: route add -net "0.0.0.0/0" gw "10.1.0.1"
 OK
 checking http://169.254.169.254/2009-04-04/instance-id
 successful after 1/20 tries: up 9.52. iid=i-01ce8ea1
 failed to get http://169.254.169.254/2009-04-04/user-data
 warning: no ec2 metadata for user-data

So DHCP and metadata seem to work fine from the neutron side. This looks like an issue with the metadata itself: warning: no ec2 metadata for user-data
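The reading of the console log above can be made explicit with a small parser. This is a hypothetical helper: it only recognises the message formats quoted above and makes no claim about other guest images. It pulls out the DHCP lease, the fetched instance-id, and whether the user-data request failed.

```python
import re


def summarize_console(log_text):
    """Hypothetical helper: extract DHCP and metadata facts from a guest
    console log in the format quoted above."""
    lease = re.search(r"lease of (\S+) obtained", log_text)
    iid = re.search(r"iid=(\S+)", log_text)
    return {
        # udhcpc obtained a lease, so DHCP answered.
        "dhcp_ip": lease.group(1) if lease else None,
        # An instance-id was read from 169.254.169.254.
        "instance_id": iid.group(1) if iid else None,
        # The follow-up user-data request failed.
        "user_data_failed": re.search(r"failed to get \S+/user-data", log_text) is not None,
    }


CONSOLE = """\
udhcpc: lease of 10.1.0.6 obtained, lease time 86400
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 9.52. iid=i-01ce8ea1
failed to get http://169.254.169.254/2009-04-04/user-data
warning: no ec2 metadata for user-data
"""

print(summarize_console(CONSOLE))
# {'dhcp_ip': '10.1.0.6', 'instance_id': 'i-01ce8ea1', 'user_data_failed': True}
```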

Slawek Kaplonski (slaweq) wrote :

Together with Rodolfo we probably found the problem: the iptables nat rule which redirects packets from port 80 to port 9697 was missing.
Patches proposed https://review.opendev.org/c/openstack/neutron/+/782677 and https://review.opendev.org/c/openstack/neutron/+/782690
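One way to check for the missing rule is to dump the router namespace's nat table and look for the redirect, e.g. with `ip netns exec qrouter-<router-id> iptables-save -t nat`, where `<router-id>` is a placeholder. The sketch below scans such a dump. It is hypothetical and assumes the rule matches on destination 169.254.169.254, `--dport 80` and `-j REDIRECT --to-ports 9697`. Chain names and the other match options are not checked because they are assumptions here.

```python
METADATA_IP = "169.254.169.254"


def _option(tokens, flag):
    """Return the token following `flag`, or None if absent."""
    try:
        return tokens[tokens.index(flag) + 1]
    except (ValueError, IndexError):
        return None


def has_metadata_redirect(nat_dump, dport=80, to_port=9697):
    """Return True if an '-A' rule in `iptables-save -t nat` output redirects
    TCP traffic for 169.254.169.254:<dport> to local port <to_port>."""
    for line in nat_dump.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "-A":
            continue  # skip table headers, chain declarations, COMMIT
        if (_option(tokens, "-d") in (METADATA_IP, METADATA_IP + "/32")
                and _option(tokens, "--dport") == str(dport)
                and _option(tokens, "-j") == "REDIRECT"
                and _option(tokens, "--to-ports") == str(to_port)):
            return True
    return False
```

Matching on individual options rather than on a full rule string keeps the check independent of option order in the dump.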

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/782679
Committed: https://opendev.org/openstack/neutron/commit/ae07a9d9f61f9ae5539bfb074dcb24f618c495bb
Submitter: "Zuul (22348)"
Branch: master

commit ae07a9d9f61f9ae5539bfb074dcb24f618c495bb
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 24 10:53:22 2021 +0100

    [CI] Enable debugging iptables rule in the L3 agent

    It should help us understand why, in some (rare) cases, the rule to
    redirect packets sent to 169.254.169.254:80 to port 9697 isn't
    installed in the qrouter namespace.

    Change-Id: I644ea3d6767db36bfe7f4122ec2c2afe9888dd07
    Related-Bug: #1920778

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/785600
Committed: https://opendev.org/openstack/neutron/commit/c028839647d6900997cf38b5eec63b7698515dec
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit c028839647d6900997cf38b5eec63b7698515dec
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 24 12:02:14 2021 +0100

    Add locks for setting iptables rules in l3 and metadata agents

    The Router_info class and the metadata agent's driver use the same
    instance of the iptables manager class, so it could happen that a
    rule, e.g. the nat rule which redirects packets sent to
    169.254.169.254:80 to port 9697 so haproxy can process them, is
    missed because it is overwritten by the Router_info class
    manipulating other rules in the same 'nat' rules list.

    This patch fixes that by adding locks to the methods which change
    rules in iptables_manager's nat table in both router_info and
    the metadata agent's driver.

    Closes-Bug: #1920778
    Change-Id: Ic3a324c0e608c7afc4b15dbc8becd33b75ee78f6
    (cherry picked from commit af3c1b84427cbe4c9d3dce8fc901ad0b099c5917)
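The commit message describes a lost update. Two components share one iptables manager instance, and one of them writes back the shared 'nat' list from a stale copy, dropping the other's redirect rule. The sketch below is a simplified model of that failure and of the locking fix. `SharedNatTable`, its methods and the demo functions are invented for illustration and are not neutron code. It uses Python's `threading.Lock` and does not show which locking primitive the actual patch uses.

```python
import threading


class SharedNatTable:
    """Hypothetical stand-in for the 'nat' rule list of one iptables
    manager instance shared by two writers (e.g. router code and the
    metadata driver). Rules are stored as (owner, rule) tuples."""

    def __init__(self):
        self.rules = []
        self.lock = threading.Lock()

    def replace_owned_rules(self, owner, new_rules, between=None):
        """Unlocked read-modify-write of this owner's rules.

        `between` is an optional callback run after the read and before
        the write; the demo uses it to force the race deterministically.
        """
        kept = [r for r in self.rules if r[0] != owner]  # read
        if between is not None:
            between()
        self.rules = kept + [(owner, r) for r in new_rules]  # write

    def replace_owned_rules_locked(self, owner, new_rules):
        """Same operation, serialized so no writer works on a stale copy."""
        with self.lock:
            self.replace_owned_rules(owner, new_rules)


def lost_update_demo():
    """The router update reads the list, the metadata driver then adds its
    redirect rule, and the router writes back its stale copy."""
    table = SharedNatTable()
    table.replace_owned_rules(
        "router", ["snat"],
        between=lambda: table.replace_owned_rules("metadata", ["redirect 80->9697"]))
    return table.rules


def locked_demo(writers=20):
    """Many concurrent writers using the locked method; none is lost."""
    table = SharedNatTable()
    threads = [
        threading.Thread(target=table.replace_owned_rules_locked,
                         args=("owner-%d" % i, ["rule-%d" % i]))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.rules


print(lost_update_demo())  # [('router', 'snat')]: the redirect rule is gone
print(len(locked_demo()))  # 20
```

Holding one lock across the whole read-modify-write is what makes the fix work. Locking only the final assignment would still let a writer overwrite the list from a stale copy.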

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/785649
Committed: https://opendev.org/openstack/neutron/commit/7af0b713ff21e27889f7b322b736cab4b0dadaf9
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 7af0b713ff21e27889f7b322b736cab4b0dadaf9
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 24 12:02:14 2021 +0100

    Add locks for setting iptables rules in l3 and metadata agents

    The Router_info class and the metadata agent's driver use the same
    instance of the iptables manager class, so it could happen that a
    rule, e.g. the nat rule which redirects packets sent to
    169.254.169.254:80 to port 9697 so haproxy can process them, is
    missed because it is overwritten by the Router_info class
    manipulating other rules in the same 'nat' rules list.

    This patch fixes that by adding locks to the methods which change
    rules in iptables_manager's nat table in both router_info and
    the metadata agent's driver.

    Conflicts:
        neutron/agent/metadata/driver.py

    Closes-Bug: #1920778
    Change-Id: Ic3a324c0e608c7afc4b15dbc8becd33b75ee78f6
    (cherry picked from commit af3c1b84427cbe4c9d3dce8fc901ad0b099c5917)
    (cherry picked from commit c028839647d6900997cf38b5eec63b7698515dec)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/785599
Committed: https://opendev.org/openstack/neutron/commit/51163ffa46426c1b805aee9c191ac66a718abdd9
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 51163ffa46426c1b805aee9c191ac66a718abdd9
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 24 12:02:14 2021 +0100

    Add locks for setting iptables rules in l3 and metadata agents

    The Router_info class and the metadata agent's driver use the same
    instance of the iptables manager class, so it could happen that a
    rule, e.g. the nat rule which redirects packets sent to
    169.254.169.254:80 to port 9697 so haproxy can process them, is
    missed because it is overwritten by the Router_info class
    manipulating other rules in the same 'nat' rules list.

    This patch fixes that by adding locks to the methods which change
    rules in iptables_manager's nat table in both router_info and
    the metadata agent's driver.

    Closes-Bug: #1920778
    Change-Id: Ic3a324c0e608c7afc4b15dbc8becd33b75ee78f6
    (cherry picked from commit af3c1b84427cbe4c9d3dce8fc901ad0b099c5917)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/785652
Committed: https://opendev.org/openstack/neutron/commit/657dccc566bc5699c2eea5b7ae6c10c24329b8b2
Submitter: "Zuul (22348)"
Branch: stable/train

commit 657dccc566bc5699c2eea5b7ae6c10c24329b8b2
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 24 12:02:14 2021 +0100

    Add locks for setting iptables rules in l3 and metadata agents

    The Router_info class and the metadata agent's driver use the same
    instance of the iptables manager class, so it could happen that a
    rule, e.g. the nat rule which redirects packets sent to
    169.254.169.254:80 to port 9697 so haproxy can process them, is
    missed because it is overwritten by the Router_info class
    manipulating other rules in the same 'nat' rules list.

    This patch fixes that by adding locks to the methods which change
    rules in iptables_manager's nat table in both router_info and
    the metadata agent's driver.

    Conflicts:
        neutron/agent/metadata/driver.py

    Closes-Bug: #1920778
    Change-Id: Ic3a324c0e608c7afc4b15dbc8becd33b75ee78f6
    (cherry picked from commit af3c1b84427cbe4c9d3dce8fc901ad0b099c5917)
    (cherry picked from commit c028839647d6900997cf38b5eec63b7698515dec)
    (cherry picked from commit 7af0b713ff21e27889f7b322b736cab4b0dadaf9)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/stein)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/stein
Review: https://review.opendev.org/c/openstack/neutron/+/785653

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/rocky)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/rocky
Review: https://review.opendev.org/c/openstack/neutron/+/785654

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/queens)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/neutron/+/785655

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.3.4

This issue was fixed in the openstack/neutron 15.3.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.3.2

This issue was fixed in the openstack/neutron 16.3.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.1.2

This issue was fixed in the openstack/neutron 17.1.2 release.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.1.0

This issue was fixed in the openstack/neutron 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/827419

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/827419
Committed: https://opendev.org/openstack/neutron/commit/2d6b33445c2124a04f84d4d4e8d27675ba9d0b57
Submitter: "Zuul (22348)"
Branch: master

commit 2d6b33445c2124a04f84d4d4e8d27675ba9d0b57
Author: Slawek Kaplonski <email address hidden>
Date: Wed Feb 2 08:11:04 2022 +0100

    Revert "Switch neutron-grenade-dvr-multinode to be non voting temporary"

    This reverts commit 04e5a42f70605b0d1e5a3cc33bf4ee09a0da1052.

    The related bug is fixed and the grenade dvr job has been stable for a
    long time, so we can make it voting again.

    Related-Bug: #1920778
    Change-Id: Ica3bb4d44133e5d2d58f5bbb4a3747cf3d8811e2
