OpenStack datasource should not retry user-data on 404

Bug #1702160 reported by Chad Smith on 2017-07-03
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
cloud-init
Medium
Unassigned

Bug Description

When receiving a 404 from user_data attempts, cloud-init should not attempt to retry the errant user_data file as it is wasted effort. If the metadata service is already up, then any user-data should already be present, so a 404 at this time means that no user data exists so retries will not result in finding user-data at a later time.

Excerpt of logs indicating metadata is up but user-data does not exist:

2017-07-03 18:26:50,408 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStack'>
2017-07-03 18:26:50,558 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'method': 'GET', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack'} configuration
2017-07-03 18:26:51,434 - url_helper.py[DEBUG]: Read from http://169.254.169.254/openstack (200, 83b) after 1 attempts
...
2017-07-03 18:26:51,445 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack/latest/user_data' with {'timeout': 10.0, 'method': 'GET', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack/latest/user_data'} configuration
2017-07-03 18:26:52,514 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
...2017-07-03 18:26:53,957 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
2017-07-03 18:26:59,989 - openstack.py[DEBUG]: Failed reading optional path http://169.254.169.254/openstack/latest/user_data due to: 404 Client Error: Not Found for url: http://169.254.169.254/openstack/latest/user_data

Validated as well on cmdline minutes later:
wget http://169.254.169.254/openstack/latest/user_data
--2017-07-03 18:51:33-- http://169.254.169.254/openstack/latest/user_data
Connecting to 169.254.169.254:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-07-03 18:51:34 ERROR 404: Not Found.

Related branches

Chad Smith (chad.smith) on 2017-07-03
description: updated
Chad Smith (chad.smith) on 2017-07-03
summary: - OpenStack datasource should not retry user-data on 404s
+ OpenStack datasource should not retry user-data on 404
Chad Smith (chad.smith) on 2017-07-03
Changed in cloud-init:
importance: Undecided → Medium
assignee: nobody → Chad Smith (chad.smith)
status: New → In Progress
Kurt Garloff (kgarloff) wrote :

sources/helpers/openstack.py has the exception_cb logic reversed.

Attached patch fixes it.

Note: I special cased 'meta_data.json' as this is required and we might want to have retries for this file even though we have a response code >= 400. This is not because I believe that this is the best possible approach, but because there might be environments out there that relied on the unintended retrials and I want to avoid this patch possibly causing a regression anywhere.

Patch is against 0.7.9 from Xenial (where I tested this), let me know if you need this rediffed against some current git tree.

Kurt Garloff (kgarloff) wrote :

Two more notes:
1. This saves up to 18s boot time (when no user_data, network_data.json and vendor_data.json is provided)
2. The logic in url_helper.readurl() was different before, so one might argue that it should be reverted back there and the helper/openstack.py logic be left as is ... Would need to check DataSourceScaleway.py and ec2_utils.py to ensure we don't break anything. I did not go through the commit logs to understand what the author intended with this reversion and whether it was intentional.

Scott Moser (smoser) wrote :

Curt,
Thanks for the patch.
In order to accept it you will need to sign the Canonical Contributors Agreement, see http://cloudinit.readthedocs.io/en/latest/topics/hacking.html for more information.

Also, a git merge proposal is ideal rather than a patch on the bug (see also in HACKING)

thanks!

Paul Graydon (pgraydon-oracle) wrote :

Kurt,

Any updates on signing the CCA? Would love to see this patch land!

Paul

Kurt Garloff (kgarloff) wrote :

commit e9e86164198993aca13148872afdeebaae751c2c
Author: Scott Moser <email address hidden>
Date: Tue Sep 29 17:17:49 2015 -0400

reverted the logic in url_helper.py and broke the OpenStack datasource rety logic that was working just fine before.

It references LP: #1499869.
but that does not give me a clue why the logic was reverted in readurl().

Kurt Garloff (kgarloff) wrote :

When e9e8616 was committed, the only user of the exception_cb parameter in url_helper.readurl() was the OpenStack DS.

It called it's cb should_retry_cb, clearly assuming that a True ish return value would result in a retry and a False in an abort (reraising the exception).

This was true until e9e8616 broke it with
- if exception_cb and not exception_cb(req_args.copy(), excps[-1]):
+ if exception_cb and exception_cb(req_args.copy(), excps[-1]):
+ # if an exception callback was given it should return None
+ # a true-ish value means to break and re-raise the exception
                 break

The old logic aborted the retry loop on a False return value, the new one on True.

I thus conclude that the commit was doing the reversion by mistake as it broke the only user of it back then.

Kurt Garloff (kgarloff) wrote :

These days, there is one other user of exception_cb: DataSourceScaleway.
It uses
            exception_cb=lambda _, exc: exc.code == 404 or (
                isinstance(exc.cause, requests.exceptions.ConnectionError)
            )
which returns True on ConnectionErrors and on 404, causing it NOT to retry on 404 according to the current logi in readurl() which is probably the intended behavior. (A file not found error won't change by retrying ...)

Given the history, my preferred resolution of the issue would be to leave the OpenStack helper alone, reinstate the old logic in url_helper.readurl, update the commnet and revert the logic in DataSourceScaleway.

Chad Smith (chad.smith) on 2018-03-23
Changed in cloud-init:
assignee: Chad Smith (chad.smith) → nobody
Chad Smith (chad.smith) wrote :

An upstream commit landed for this bug.

To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=097a2967

Changed in cloud-init:
status: In Progress → Fix Committed

This bug is believed to be fixed in cloud-init in 18.2. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers