Race condition in os-net-config during overcloud deploy (OVB)

Bug #1814250 reported by Quique Llorente
This bug affects 1 person
Affects: tripleo
Status: Incomplete
Importance: High
Assigned to: Quique Llorente
Milestone: (none listed)

Bug Description

The overcloud deploy log [1] shows the following:

[2019/02/01 09:37:15 AM] [INFO] nic1 mapped to: eth0
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth0
[2019/02/01 09:37:15 AM] [INFO] adding custom route for interface: eth0
[2019/02/01 09:37:15 AM] [INFO] adding bridge: br-ex
[2019/02/01 09:37:15 AM] [WARNING] ifcfg format supports max 2 resolvers.
[2019/02/01 09:37:15 AM] [INFO] adding custom route for interface: br-ex
[2019/02/01 09:37:15 AM] [INFO] adding route rules for interface: br-ex
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth1
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth2
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth3
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth4
[2019/02/01 09:37:15 AM] [INFO] adding bridge: br-tenant
[2019/02/01 09:37:15 AM] [WARNING] ifcfg format supports max 2 resolvers.
[2019/02/01 09:37:15 AM] [INFO] adding interface: eth5
[2019/02/01 09:37:15 AM] [INFO] applying network configs...
[2019/02/01 09:37:15 AM] [INFO] Running ip route add 0.0.0.0/0 via 10.0.0.1 dev br-ex
[2019/02/01 09:37:15 AM] [WARNING] Error in 'ip route add 0.0.0.0/0 via 10.0.0.1 dev br-ex', restarting br-ex:
Unexpected error while running command.
Command: /sbin/ip route add 0.0.0.0/0 via 10.0.0.1 dev br-ex
Exit code: 1
Stdout: u''
Stderr: u'Object \"route add 0.0.0.0/0 via 10.0.0.1 dev br-ex\" is unknown, try \"ip help\".\
'

However, the ovs-vswitchd log [2] shows that OVS only brought up the br-ex bridge about 5 seconds later, after the route command had already run:

2019-02-01T09:37:20.408Z|00034|bridge|INFO|bridge br-ex: added interface br-ex on port 65534
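
If the failure really were a race between os-net-config and OVS, one possible mitigation would be to wait for the device to appear before adding the route. The following is only a minimal sketch of that idea, not os-net-config code: `wait_for_device` and its parameters are hypothetical names, and it assumes a Linux host where network devices show up under /sys/class/net.

```python
import os
import time


def wait_for_device(name, timeout=10.0, interval=0.5,
                    sysfs_root="/sys/class/net"):
    """Poll until a network device exists or the timeout expires.

    Hypothetical helper for illustration only. Returns True if the
    device directory appeared in time, False otherwise.
    """
    path = os.path.join(sysfs_root, name)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Only attempt the route change once br-ex is actually present.
    if wait_for_device("br-ex", timeout=10.0):
        print("br-ex is present, safe to add routes")
    else:
        print("br-ex did not appear within the timeout")
```

The `sysfs_root` parameter exists only so the helper can be exercised against a temporary directory.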

We also found a small bug in the code that adds rules to the bridge: it was checking the wrong flag. See https://review.openstack.org/#/c/634399/

[1] http://logs.rdoproject.org/45/560445/236/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/b5bc7e1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

[2] http://logs.rdoproject.org/45/560445/236/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/b5bc7e1/logs/overcloud-controller-1/var/log/openvswitch/ovs-vswitchd.log.txt.gz

Dan Sneddon (dsneddon) wrote :

This bug should be fixed by https://review.openstack.org/#/c/634399/

Gabriele Cerami (gcerami) wrote :

I don't think the bridge is the cause of the error. If the bridge were missing, the error would be 'Cannot find device "br-ex"'.
I could reproduce the reported error by passing the arguments as a single quoted string:

# ip 'route 0.0.0.0/0 via 10.0.0.1 dev br-ex'
Object "route 0.0.0.0/0 via 10.0.0.1 dev br-ex" is unknown, try "ip help".

So it looks like we are passing the arguments to ip incorrectly.

Gabriele Cerami (gcerami) wrote :

This error happens in all the jobs, even the ones that succeed, so it's not fatal

Gabriele Cerami (gcerami) wrote :

I investigated a bit more. I think that when os-net-config builds the command and calls oslo_concurrency.processutils.execute with the string that forms the command, the parameters after "ip" may be getting escaped or quoted as a single string, so ip never receives them as separate arguments and the command never runs correctly.
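
The difference between the two ways of passing the arguments can be sketched as follows. This is an illustration of the hypothesis above, not the actual os-net-config code: `build_argv` is a hypothetical helper, and nothing here is executed against a real `ip` binary. The argument lists only show what the process would receive.

```python
import shlex


def build_argv(binary, args):
    """Return an argv list with one element per command word.

    Hypothetical helper: accepts either a single string (split with
    shell-like rules) or an iterable of already separate words.
    """
    if isinstance(args, str):
        return [binary] + shlex.split(args)
    return [binary] + list(args)


route_cmd = "route add 0.0.0.0/0 via 10.0.0.1 dev br-ex"

# Suspected failure mode: the whole sub-command arrives as ONE argument,
# so ip treats the entire string as the name of an unknown object.
bad_argv = ["/sbin/ip", route_cmd]

# Each word as its own argument, which is what ip expects.
good_argv = build_argv("/sbin/ip", route_cmd)

print(len(bad_argv), bad_argv)
print(len(good_argv), good_argv)
```

If this is indeed the cause, the fix would be to hand the executor each word as a separate argument rather than one pre-joined string.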

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete