Cloudinit takes ~6 mins to run after a reboot of a control node

Bug #1314573 reported by Cian O'Driscoll
This bug affects 3 people
Affects | Status       | Importance | Assigned to       | Milestone
tripleo | Fix Released | High       | Jon-Paul Sullivan |

Bug Description

boot.log shows the following:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
Cloud-init v. 0.7.3 running 'init-local' at Fri, 25 Apr 2014 12:52:33 +0000. Up 2.13 seconds.
cloud-init-nonet[2.52]: waiting 10 seconds for network device
cloud-init-nonet[12.52]: waiting 120 seconds for network device
cloud-init-nonet[132.53]: gave up waiting for a network device.
Cloud-init v. 0.7.3 running 'init' at Fri, 25 Apr 2014 12:54:44 +0000. Up 132.67 seconds.
ci-info: +++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | 192.0.2.32 | 255.255.255.0 | 00:1a:92:89:b8:18 |
ci-info: +--------+------+------------+---------------+-------------------+
ci-info: ++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++
ci-info: +-------+-------------+-----------+---------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+-----------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 192.0.2.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 192.0.2.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: +-------+-------------+-----------+---------------+-----------+-------+
2014-04-25 12:54:44,731 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: bad status code [404]
2014-04-25 12:54:45,864 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: bad status code [404]
........
...........
..........
Cloud-init v. 0.7.3 running 'modules:config' at Fri, 25 Apr 2014 12:58:55 +0000. Up 383.52 seconds.
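
To see where the time goes, the "Up N seconds" value on each cloud-init stage banner can be compared with the next one. The following is a minimal sketch, not part of the original report: the function names and the regular expression are assumptions based only on the three banner lines quoted above, and other cloud-init versions may format the banner differently.

```python
import re

# Matches the stage banner lines shown in the log above, e.g.
# "Cloud-init v. 0.7.3 running 'init' at ... Up 132.67 seconds."
STAGE_RE = re.compile(
    r"Cloud-init v\. \S+ running '([^']+)' at .*? Up ([\d.]+) seconds\."
)


def stage_uptimes(log_text):
    """Return [(stage, uptime_seconds), ...] in the order the banners appear."""
    return [(m.group(1), float(m.group(2))) for m in STAGE_RE.finditer(log_text)]


def stage_gaps(stages):
    """Return [(from_stage, to_stage, seconds_between), ...] for adjacent stages."""
    return [(a[0], b[0], round(b[1] - a[1], 2)) for a, b in zip(stages, stages[1:])]


# The three banner lines copied from the boot.log excerpt in this report.
SAMPLE = """\
Cloud-init v. 0.7.3 running 'init-local' at Fri, 25 Apr 2014 12:52:33 +0000. Up 2.13 seconds.
Cloud-init v. 0.7.3 running 'init' at Fri, 25 Apr 2014 12:54:44 +0000. Up 132.67 seconds.
Cloud-init v. 0.7.3 running 'modules:config' at Fri, 25 Apr 2014 12:58:55 +0000. Up 383.52 seconds.
"""

if __name__ == "__main__":
    for src, dst, gap in stage_gaps(stage_uptimes(SAMPLE)):
        print(f"{src} -> {dst}: {gap:.2f} s")
```

On the excerpt above this gives roughly 130 s between 'init-local' and 'init' (the cloud-init-nonet wait) and roughly 251 s between 'init' and 'modules:config' (the period in which the metadata requests are failing).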

Roman Podoliaka (rpodolyaka) wrote :

Hmm, seeing this too. Maybe openvswitch-switch just starts too late? (or at least cloud-init-nonet starts before it) Once cloud-init-nonet times out, openvswitch-switch starts almost immediately and all network interfaces get configured correctly.

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
Jon-Paul Sullivan (jonpaul-sullivan) wrote :

The System-V startup scripts are called by rc-sysinit.conf, which starts them on:

start on (filesystem and static-network-up) or failsafe-boot

The cloud-init-nonet job runs before static-network-up is emitted, times out waiting for it, and then continues.

The configuration written on the VMs is as follows:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d
auto eth0
allow-br-ex eth0
iface eth0 inet manual
    ovs_bridge br-ex
    ovs_type OVSPort
auto br-ex
allow-ovs br-ex
iface br-ex inet dhcp
    pre-up ip addr flush dev eth0
    ovs_type OVSBridge
    ovs_ports eth0

So we have a chicken-and-egg problem here: the openvswitch-switch System-V init script will not run until static-network-up has been emitted, but static-network-up cannot be emitted until openvswitch-switch has brought up br-ex, because the br-ex device is in the auto list.
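
A quick way to spot this pattern in an image is to look for Open vSwitch bridges that are also listed under `auto`. The sketch below is only a heuristic written for this discussion: `auto_ovs_bridges` is a hypothetical helper, it understands just the `auto`, `iface` and `ovs_type` keywords seen in the file above, and it does not follow `source-directory` includes.

```python
def auto_ovs_bridges(interfaces_text):
    """Return the names of OVSBridge interfaces that are also listed under `auto`.

    Heuristic only: handles the `auto`, `iface` and `ovs_type` keywords and
    ignores everything else in the interfaces(5) file.
    """
    auto_names = set()
    bridges = set()
    current = None  # name of the iface stanza being read
    for raw in interfaces_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "auto":
            auto_names.update(words[1:])
        elif words[0] == "iface" and len(words) > 1:
            current = words[1]
        elif words[0] == "ovs_type" and current and words[1:] == ["OVSBridge"]:
            bridges.add(current)
    return sorted(auto_names & bridges)


# The configuration quoted in this report.
SAMPLE = """\
source-directory /etc/network/interfaces.d
auto eth0
allow-br-ex eth0
iface eth0 inet manual
    ovs_bridge br-ex
    ovs_type OVSPort
auto br-ex
allow-ovs br-ex
iface br-ex inet dhcp
    pre-up ip addr flush dev eth0
    ovs_type OVSBridge
    ovs_ports eth0
"""

if __name__ == "__main__":
    print(auto_ovs_bridges(SAMPLE))
```

For the configuration in this report the helper flags only br-ex; eth0 is an OVSPort, not a bridge, so it is not reported.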

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96221

Changed in tripleo:
assignee: nobody → Jon-Paul Sullivan (jonpaul-sullivan)
status: Triaged → In Progress
Jon-Paul Sullivan (jonpaul-sullivan) wrote :

openvswitch-switch should not be configured ahead of networking and remote filesystems. The change required is therefore to make the persistent configuration available to openvswitch-switch when it starts, without networking startup treating it as required. That means the auto br-ex line needs to go.

Jon-Paul Sullivan (jonpaul-sullivan) wrote :

I tried to find a configuration that would bring eth0 up during boot and still let the bridge configure correctly later in the boot sequence, but these attempts were unsuccessful.

Given this, the simpler workaround is to start openvswitch-switch earlier than cloud-init. This allows the static-network-up event to trigger and ensures that the metadata can be retrieved successfully.
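
The ordering argument in this thread can be written down as a small dependency graph. This is a toy model, not the upstart configuration from the fix: the node names are taken from the comments above, the edges are my reading of them, and `boot_order` is a hypothetical helper built on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(deps):
    """Return a valid start order for deps ({node: set of prerequisites}),
    or None if the prerequisites form a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Reported behaviour: br-ex is in the auto list, so static-network-up waits
# for it; br-ex needs openvswitch-switch; openvswitch-switch is started by
# rc-sysinit, which waits for static-network-up.
REPORTED = {
    "static-network-up": {"br-ex"},
    "br-ex": {"openvswitch-switch"},
    "openvswitch-switch": {"static-network-up"},
    "cloud-init": {"static-network-up"},
}

# Workaround (modelled): openvswitch-switch no longer waits for
# static-network-up because it is started early.
WORKAROUND = {**REPORTED, "openvswitch-switch": set()}

if __name__ == "__main__":
    print("reported:  ", boot_order(REPORTED))
    print("workaround:", boot_order(WORKAROUND))
```

In the model the reported graph has no valid order, while the workaround graph does. On the real system the cycle does not hang forever; it is broken by the cloud-init-nonet timeout seen in boot.log, which is where the delay comes from.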

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/96221
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=7781738f9644b27ae0b8c4f29b9dc2bf32374946
Submitter: Jenkins
Branch: master

commit 7781738f9644b27ae0b8c4f29b9dc2bf32374946
Author: Jon-Paul Sullivan <email address hidden>
Date: Wed May 28 19:42:25 2014 +0100

    Start openvswitch-switch early in upstart

    When there is a bridge configuration in the interfaces definition the
    openvswitch-switch service needs to be started early.

    This ensures that the metadata server is available for cloud-init when
    it runs.

    Change-Id: Ieac3703dc910ccb420e1a5a1cdc40105bd4f067a
    Closes-bug: #1314573

Changed in tripleo:
status: In Progress → Fix Committed
Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released