cloud-init should give up on 404 errors

Bug #1298921 reported by Juerg Haefliger
28
This bug affects 6 people
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Fix Released
Low
Unassigned

Bug Description

On OpenStack, the CloudStack handler is retyring even if the response is 404, resulting in a 2 min timeout.

IMHO, readurl() should abort in this case.

ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+----------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.0.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: +-------+-------------+----------+---------------+-----------+-------+
2014-03-28 04:56:40,733 - url_helper.py[WARNING]: Calling 'http://10.0.0.1//latest/meta-data/instance-id' failed [0/120s]: bad status code [404]
2014-03-28 04:56:41,738 - url_helper.py[WARNING]: Calling 'http://10.0.0.1//latest/meta-data/instance-id' failed [1/120s]: bad status code [404]

<SNIP>

2014-03-28 04:58:32,946 - url_helper.py[WARNING]: Calling 'http://10.0.0.1//latest/meta-data/instance-id' failed [112/120s]: bad status code [404]
2014-03-28 04:58:39,953 - url_helper.py[WARNING]: Calling 'http://10.0.0.1//latest/meta-data/instance-id' failed [119/120s]: request error [HTTPConnectionPool(host='10.0.0.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 115] Operation now in progress)]
2014-03-28 04:58:46,959 - DataSourceCloudStack.py[CRITICAL]: Giving up on waiting for the metadata from ['http://10.0.0.1//latest/meta-data/instance-id'] after 126 seconds

Related branches

Revision history for this message
Scott Moser (smoser) wrote :

It would seem you've mis-configured something if you're hitting this.
Default settings put OpenStack before CloudStack, so it should only do this if you've somehow not found a openstack datasource.

can you post your whole cloud-init.log ?

Revision history for this message
Juerg Haefliger (juergh) wrote :
Download full text (95.2 KiB)

Mar 31 10:00:05 hlinux [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.2 running 'single' at Mon, 31 Mar 2014 10:00:05 +0000. Up 3090.59 seconds.
Mar 31 10:00:05 hlinux [CLOUDINIT] importer.py[DEBUG]: Looking for modules ['cc_apt_pipelining', 'cloudinit.config.cc_apt_pipelining'] that have attributes ['handle']
Mar 31 10:00:05 hlinux [CLOUDINIT] importer.py[DEBUG]: Found cc_apt_pipelining with attributes ['handle'] in ['cloudinit.config.cc_apt_pipelining']
Mar 31 10:00:05 hlinux [CLOUDINIT] importer.py[DEBUG]: Looking for modules ['debian', 'cloudinit.distros.debian'] that have attributes ['Distro']
Mar 31 10:00:05 hlinux [CLOUDINIT] importer.py[DEBUG]: Found debian with attributes ['Distro'] in ['cloudinit.distros.debian']
Mar 31 10:00:05 hlinux [CLOUDINIT] stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.debian.Distro'>
Mar 31 10:00:05 hlinux [CLOUDINIT] util.py[DEBUG]: Writing to /var/lib/cloud/sem/config_apt_pipelining.once - wb: [420] 20 bytes
Mar 31 10:00:05 hlinux [CLOUDINIT] helpers.py[DEBUG]: Running config-apt-pipelining using lock (<FileLock using file '/var/lib/cloud/sem/config_apt_pipelining.once'>)
Mar 31 10:00:05 hlinux [CLOUDINIT] util.py[DEBUG]: Writing to /etc/apt/apt.conf.d/90cloud-init-pipelining - wb: [420] 80 bytes
Mar 31 10:00:05 hlinux [CLOUDINIT] cc_apt_pipelining.py[DEBUG]: Wrote /etc/apt/apt.conf.d/90cloud-init-pipelining with apt pipeline depth setting 0
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.2 running 'init' at Mon, 31 Mar 2014 10:01:10 +0000. Up 3155.70 seconds.
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Writing to /var/log/cloud-init.log - ab: [420] 0 bytes
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Changing the ownership of /var/log/cloud-init.log to 0:4
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Running command ['ifconfig', '-a'] with allowed return codes [0] (shell=False, capture=True)
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Running command ['route', '-n'] with allowed return codes [0] (shell=False, capture=True)
Mar 31 10:01:10 hlinux [CLOUDINIT] cloud-init[DEBUG]: Checking to see if files that we need already exist from a previous run that would allow us to stop early.
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Reading from /var/lib/cloud/data/no-net (quiet=False)
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
Mar 31 10:01:10 hlinux [CLOUDINIT] importer.py[DEBUG]: Looking for modules ['debian', 'cloudinit.distros.debian'] that have attributes ['Distro']
Mar 31 10:01:10 hlinux [CLOUDINIT] importer.py[DEBUG]: Found debian with attributes ['Distro'] in ['cloudinit.distros.debian']
Mar 31 10:01:10 hlinux [CLOUDINIT] stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.debian.Distro'>
Mar 31 10:01:10 hlinux [CLOUDINIT] __init__.py[DEBUG]: Looking for for data source in: ['NoCloud', 'AltCloud', 'CloudStack', 'ConfigDrive', 'Ec2', 'MAAS', 'OVF', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
Mar 31 ...

Revision history for this message
Scott Moser (smoser) wrote :

Juergh,
  Thanks for posting the log.
Relevant info from the log:

Mar 31 10:00:05 hlinux [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.2 running 'single' at Mon, 31 Mar 2014 10:00:05 +0000. Up 3090.59 seconds.
pt.conf.d/90cloud-init-pipelining with apt pipeline depth setting 0
...
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.2 running 'init' at Mon, 31 Mar 2014 10:01:10 +0000. Up 3155.70 seconds.
Mar 31 10:01:10 hlinux [CLOUDINIT] __init__.py[DEBUG]: Looking for for data source in: ['NoCloud', 'AltCloud', 'CloudStack', 'ConfigDrive', 'Ec2', 'MAAS', 'OVF', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
...
Mar 31 10:01:10 hlinux [CLOUDINIT] __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceCloudStack.DataSourceCloudStack'>
Mar 31 10:01:10 hlinux [CLOUDINIT] DataSourceCloudStack.py[DEBUG]: Using /var/lib/dhcp lease directory
Mar 31 10:01:10 hlinux [CLOUDINIT] DataSourceCloudStack.py[DEBUG]: Found DHCP identifier 192.168.122.1
Mar 31 10:01:10 hlinux [CLOUDINIT] DataSourceCloudStack.py[DEBUG]: Found DHCP identifier 192.168.122.1
Mar 31 10:01:10 hlinux [CLOUDINIT] util.py[DEBUG]: Reading from /var/lib/cloud/seed/cs/meta-data (quiet=False)
Mar 31 10:01:10 hlinux [CLOUDINIT] url_helper.py[DEBUG]: [0/1] open 'http://192.168.122.1//latest/meta-data/instance-id' with {'url': 'http://192.168.122.1//latest/meta-data/instance-id', 'headers': {'User-Agent': 'Cloud-Init/0.7.2'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 50.0} configuration
Mar 31 10:01:10 hlinux [CLOUDINIT] url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='192.168.122.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 111] Connection refused)]

So, this is 0.7.2, and I believe there must be a config file in /etc/cloud/ that enables the CloudStack datasource, and lists it before the ConfigDrive datasource, as the built in config (in cloudinit/settings.py) for 0.7.2 has:
    'datasource_list': [
        'NoCloud',
        'ConfigDrive',
        'AltCloud',
        'OVF',
        'MAAS',
        'Ec2',
        'CloudStack',
        # At the end to act as a 'catch' when none of the above work...
        'None',
    ],

In one way or another, the CloudStack datasource was explicitly enabled.
Additionally, its not a 404 that we get as the error, but a Connection Refused.

You or something explicitly configured this datasource on, and to be run before the Ec2 datasource, so generally its functioning as designed.

I don't generally disagree with the idea, and don't like the polling behavior either.
But when CloudStack was added, it was added with this annoying polling behavior (The Ec2 datasource also has this behavior, to account for the possibility that the metadata server "just isnt there yet").

Before removing the poll/timeout behavior, I'd need to get some ACK from someone involved with CloudStack that this would not break any situations.

Revision history for this message
Juerg Haefliger (juergh) wrote :

> In one way or another, the CloudStack datasource was explicitly enabled.
> You or something explicitly configured this datasource on, and to be run before the Ec2 datasource, so generally its functioning > as designed.

Well it wasn't explicitly disabled :-)

> Additionally, its not a 404 that we get as the error, but a Connection Refused.

Weird. I know I got 404 errors at one point. Need to redo my testing when I'm back in the office.

> there must be a config file in /etc/cloud/ that enables the CloudStack datasource, and lists it before the ConfigDrive datasource,

Yes there is. It seems that the list in Debian is in the wrong order. However, I think the actual ordering should be enforced by cloud-init if it is that crucial and breaks otherwise (as seen in this case).

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

CloudStack was add to default list at revno 217 during the Quantal cycle. So its been there for a while.

Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
importance: Undecided → Low
Revision history for this message
Thiago Martins (martinx) wrote :

Any news on this guys?

I'm trying to create a QCoW2 image with Cloud Init, using Packer, everything seems t be okay but, I'm seeing the following problem:

---
2015-10-10 02:27:39,689 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [20/30s]: unexpected error ['NoneType' object has no attribute 'status_code']
2015-10-10 02:27:48,704 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [29/30s]: unexpected error ['NoneType' object has no attribute 'status_code']
2015-10-10 02:27:49,705 - DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 30 seconds
2015-10-10 02:27:49,717 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [0/120s]: bad status code [404]
2015-10-10 02:27:50,730 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [1/120s]: bad status code [404]
2015-10-10 02:27:51,740 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [2/120s]: bad status code [404]
2015-10-10 02:27:52,750 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [3/120s]: bad status code [404]
2015-10-10 02:27:53,760 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [4/120s]: bad status code [404]
2015-10-10 02:27:54,772 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [5/120s]: bad status code [404]
2015-10-10 02:27:56,783 - url_helper.py[WARNING]: Calling 'http://192.168.122.1//latest/meta-data/instance-id' failed [7/120s]: bad status code [404]
---

I'm seeing this problem, inside by testing environment, where I do NOT have any Metadata server...

I tried to reduce the timeout, by creating the following file:

10_ec2_timeouts.cfg:

---
datasource:
  MAAS:
    timeout : 50
    max_wait : 120
  OpenStack:
    timeout : 20
    max_wait : 30
  Ec2:
    timeout : 20
    max_wait : 30
---

But, it didn't worked!

The first Metadaserver "http://169.254.169.254" was reduced but, the second Metadata server "http://192.168.122.1" still have 120s of timeout!

How to reduce the timeout for all tries that cloud-init will execute?

Thanks!
Thiago

Revision history for this message
Scott Moser (smoser) wrote :

Thiago,
  the CloudSigma datasource is the one that is doing the http requests listed above. That datasource cannot be configured with timeout. It should be a fairly easy fix to add that.

Note, though that if you're hitting the cloudsigma datasource, you are likely running on cloudsigma or there is another bug.

the cloudsigma code in trunk at the moment does a 'is_running_cloudsigma' check that should have very few false positives. See:
  http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/DataSourceCloudSigma.py#L45

it should only return True if 'system-product-name' from the host-provided dmi data has 'cloudsigma' in it.

Revision history for this message
Thiago Martins (martinx) wrote :

Scott,

This is the first time I'm hearing about "cloudsigma", I'm pretty sure that I'm not running there, since this is a KVM Guest running at my own Ubuntu physical host, that I installed using Ubuntu ISO.

So, this might be another bug... Or maybe not, because it is not giving up on 404 errors...

Do you know any way to disable this second datasource that it uses automatically as a fall back ?

Thanks!

Revision history for this message
Imran Khakoo (imrankhakoo) wrote :

Any news on this? I'm also seeing 404 errors and a 2 minute timeout.

Revision history for this message
Kurt Garloff (kgarloff) wrote :
Revision history for this message
Chad Smith (chad.smith) wrote :

An upstream commit landed for this bug.

To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=097a2967

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 18.1-35-ge0f644b7-0ubuntu1

---------------
cloud-init (18.1-35-ge0f644b7-0ubuntu1) bionic; urgency=medium

  * debian/cloud-init.templates: enable IBMCloud datasource.
  * New upstream snapshot.
    - IBMCloud: Initial IBM Cloud datasource.
    - tests: remove jsonschema from xenial tox environment.
    - tests: Fix newly added schema unit tests to skip if no jsonschema.
    - ec2: Adjust ec2 datasource after exception_cb change.
    - Reduce AzurePreprovisioning HTTP timeouts.
      [Douglas Jordan] (LP: #1752977)
    - Revert the logic of exception_cb in read_url.
      [Kurt Garloff] (LP: #1702160, #1298921)
    - ubuntu-advantage: Add new config module to support
      ubuntu-advantage-tools
    - Handle global dns entries in netplan (LP: #1750884)
    - Identify OpenTelekomCloud Xen as OpenStack DS.
      [Kurt Garloff] (LP: #1756471)

 -- Chad Smith <email address hidden> Fri, 23 Mar 2018 17:20:47 -0600

Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.