Fail to get default route device in CI jobs

Bug #1902002 reported by Slawek Kaplonski
This bug affects 2 people
Affects    Status         Importance   Assigned to     Milestone
devstack   Fix Released   Medium       Nate Johnston

Bug Description

For the past few days I have been seeing errors like "[ERROR] /opt/stack/devstack/functions-common:230 Failure retrieving default route device" quite often in various CI jobs.

Example: https://e02da289fa5cd71d2848-a802bb880ba142924be00bfc16ee185a.ssl.cf5.rackcdn.com/759947/2/check/neutron-tempest-plugin-api/97e7a2f/controller/logs/devstacklog.txt

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failure%20retrieving%20default%20route%20device%5C%22

According to Logstash, it started happening around 23 or 24 October 2020.
I'm not sure whether this is a Devstack issue. To me it looks more like a change in the Ubuntu OS used on the CI nodes, but I don't know exactly where to report such an "infra" issue, so IMHO devstack is hopefully a good starting point.

Dr. Jens Harbott (j-harbott) wrote :

CI nodes should normally always have an IPv4 address and a default route, but sometimes this seems to be broken.

I proposed a workaround in devstack that queries the default route device only when the linuxbridge agent needs it: https://review.opendev.org/760325. The code could perhaps also be extended to check for an IPv6 default route when no IPv4 route is found.

Changed in devstack:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Dr. Jens Harbott (j-harbott)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (master)

Reviewed: https://review.opendev.org/760325
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=47f76acbbac350ea18df6a9463876d38c3a13539
Submitter: Zuul
Branch: master

commit 47f76acbbac350ea18df6a9463876d38c3a13539
Author: Jens Harbott <email address hidden>
Date: Thu Oct 29 10:42:38 2020 +0000

    Determine default IPv4 route device only when needed

    Sometimes instances don't have an IPv4 default route, so only check for
    it when we actually need it. In a followup patch we could extend the
    code to check for an IPv6 default route instead or in addition.

    Related-Bug: 1902002
    Change-Id: Ie6cd241721f6b1f8e030960921a696939b2dab10
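
A minimal sketch of the "only when needed" idea from this commit message, written in POSIX sh so it also runs under devstack's Bash. AGENT, default_v4_route_dev and configure_agent are hypothetical names for illustration only; this is not the devstack code from change 760325.

```shell
#!/bin/sh
# Hypothetical sketch: look up the default IPv4 route device lazily,
# only in the code path (here: the linuxbridge agent) that needs it.
# AGENT and configure_agent are illustrative names, not devstack's.

AGENT=${AGENT:-openvswitch}

# Print the device of the first IPv4 default route, if there is one.
# The device is taken as the word following "dev" on the "default" line.
default_v4_route_dev() {
    ip -4 route 2>/dev/null | awk '$1 == "default" {
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

configure_agent() {
    if [ "$AGENT" = "linuxbridge" ]; then
        # Only this branch needs the route device, so only it can fail
        # on a node without an IPv4 default route.
        route_dev=$(default_v4_route_dev)
        if [ -z "$route_dev" ]; then
            echo "Failure retrieving default route device" >&2
            return 1
        fi
        echo "linuxbridge: using $route_dev"
    else
        echo "$AGENT: default route device not needed"
    fi
}
```

With AGENT=openvswitch the function never calls `ip`, so a node without an IPv4 default route no longer aborts that job; only a linuxbridge run still reports the failure.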

melanie witt (melwitt) wrote :

I've hit another bug nearly identical to this when the tempest-ipv6-only job runs [1]:

2020-11-04 20:52:04.052336 | controller | /opt/stack/devstack/lib/neutron_plugins/services/l3:104:die_if_not_set
2020-11-04 20:52:04.052371 | controller | /opt/stack/devstack/functions-common:223:die
2020-11-04 20:52:04.055164 | controller | [ERROR] /opt/stack/devstack/functions-common:104 Failure retrieving default IPv4 route devices
2020-11-04 20:52:05.058441 | controller | Error on exit

where die_if_not_set is triggered in a different place in the code, because the VM in this IPv6-only job has no default IPv4 route configured:

https://opendev.org/openstack/devstack/src/commit/47f76acbbac350ea18df6a9463876d38c3a13539/lib/neutron_plugins/services/l3#L103-L104

[1] https://zuul.opendev.org/t/openstack/build/7fff839dd0bc4214ba6470ef40cf78fd/log/job-output.txt#1875

melanie witt (melwitt) wrote :
Changed in devstack:
assignee: Dr. Jens Harbott (j-harbott) → Nate Johnston (nate-johnston)
Nate Johnston (nate-johnston) wrote :

Posted an additional fix: https://review.opendev.org/#/c/761178

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/761739

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/761869

OpenStack Infra (hudson-openstack) wrote : Change abandoned on devstack (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/761869
Reason: Not needed as https://review.opendev.org/#/c/761178/ should fix it

OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (stable/victoria)

Reviewed: https://review.opendev.org/761739
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=bbcd56fbef88a3f6c774904fa6d9b9a8acb477c8
Submitter: Zuul
Branch: stable/victoria

commit bbcd56fbef88a3f6c774904fa6d9b9a8acb477c8
Author: Jens Harbott <email address hidden>
Date: Thu Oct 29 10:42:38 2020 +0000

    Determine default IPv4 route device only when needed

    Sometimes instances don't have an IPv4 default route, so only check for
    it when we actually need it. In a followup patch we could extend the
    code to check for an IPv6 default route instead or in addition.

    Related-Bug: 1902002
    Change-Id: Ie6cd241721f6b1f8e030960921a696939b2dab10
    (cherry picked from commit 47f76acbbac350ea18df6a9463876d38c3a13539)

tags: added: in-stable-victoria
sean mooney (sean-k-mooney) wrote :

By the way, this fails on OVS jobs too, even with https://github.com/openstack/devstack/commit/47f76acbbac350ea18df6a9463876d38c3a13539 applied, as can be seen in

https://zuul.opendev.org/t/openstack/build/bd65ccd7f7724156a53ddbab3e267b13
In that case, the openstacksdk-functional-devstack job, which uses OVS, failed on limestone because the node did not have an IPv4 address at all.

sean mooney (sean-k-mooney) wrote :

Oh, I see that https://review.opendev.org/#/c/761178/ will fix that, and it's already approved.

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/761178
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=efc04eec00bef94059a0e5b6f457263fc84876c1
Submitter: Zuul
Branch: master

commit efc04eec00bef94059a0e5b6f457263fc84876c1
Author: Nate Johnston <email address hidden>
Date: Tue Nov 3 10:04:26 2020 -0500

    Look for ipv6 routes so ipv6-only jobs will not fail

    For change 739139 [1] PS 12, the
    neutron-tempest-plugin-scenario-linuxbridge died in devstack with
    "/opt/stack/devstack/functions-common:237 Failure retrieving default
    route device", which comes from
    "/opt/stack/devstack/lib/neutron-legacy:237:die_if_not_set".

    Looking at the worlddump.txt for that job [2] I see that there is a
    default ipv6 route; the vm was not configured with ipv4 networking.

        ip route
        --------

        ip -6 route
        -----------

        ::1 dev lo proto kernel metric 256 pref medium
        2607:ff68:100:54::/64 dev ens3 proto kernel metric 256 expires 86380sec pref medium
        fe80::/64 dev ens3 proto kernel metric 256 pref medium
        default via fe80::f816:3eff:fe77:b05c dev ens3 proto ra metric 1024 expires 280sec hoplimit 64 pref medium

    Looking at the devstack code that throws the error [3] it looks like
    it only looks for a default route in the output of `ip route`, which
    does not include ipv6 information. This change should look in both
    the ipv4 and ipv6 route table. A similar check in the L3 setup code
    is also updated.

    [1] https://review.opendev.org/#/c/739139/
    [2] https://d4eb7e3efe98cba79a4b-f4d168cdb20f40841821e4b213645c0f.ssl.cf2.rackcdn.com/739139/12/gate/neutron-tempest-plugin-scenario-linuxbridge/9a6b4f7/controller/logs/worlddump-latest.txt
    [3] https://opendev.org/openstack/devstack/src/branch/master/lib/neutron-legacy#L236

    Closes-Bug: #1902002
    Change-Id: I839e8c222368df98fec308cf41248a9dd0a8c187
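
A minimal sketch of the lookup this commit message describes: check the IPv4 route table first and fall back to the IPv6 table, so an IPv6-only node like the one in the worlddump still yields a device. It is written in POSIX sh; default_dev_from_table and get_default_route_dev are hypothetical names, and this is an illustration rather than a copy of the change merged in https://review.opendev.org/761178.

```shell
#!/bin/sh
# Hypothetical sketch: find the default route device, trying the IPv4
# table first and falling back to the IPv6 table for IPv6-only nodes.

# Read `ip route`-style output on stdin and print the word following
# "dev" on the first line that starts with "default". Scanning for "dev"
# instead of using a fixed column also handles routes with no "via".
default_dev_from_table() {
    awk '$1 == "default" {
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Print the default route device, or fail if neither table has one.
# "dev" is a global variable because POSIX sh has no "local".
get_default_route_dev() {
    dev=$(ip -4 route 2>/dev/null | default_dev_from_table)
    if [ -z "$dev" ]; then
        dev=$(ip -6 route 2>/dev/null | default_dev_from_table)
    fi
    if [ -z "$dev" ]; then
        echo "Failure retrieving default route device" >&2
        return 1
    fi
    echo "$dev"
}
```

Piping the `ip -6 route` output from the worlddump in the commit message above into default_dev_from_table prints `ens3`, taken from the `default via fe80::f816:3eff:fe77:b05c dev ens3 ...` line; the other lines do not start with `default` and are skipped.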

Changed in devstack:
status: In Progress → Fix Released
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

This problem is still happening in all "neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid" CI jobs. E.g.: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_104/758646/4/gate/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/104fdea/job-output.txt

We should also consider making the IPv4 default route check not mandatory.

I'll propose a patch to address this.

Regards.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Sorry for c#13: this is already addressed in https://review.opendev.org/#/c/761178/5/lib/neutron_plugins/services/l3

Thanks Nate!
