get_data in DataSourceOpenStack.py can time out if metadata service is slow

Bug #1657130 reported by Lars Kellogg-Stedman on 2017-01-17
This bug affects 1 person
Affects                Importance  Assigned to
cloud-init             Medium      Unassigned
cloud-init (Ubuntu)    Medium      Unassigned
  Xenial               Medium      Unassigned
  Yakkety              Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
On heavily loaded OpenStack metadata services, cloud-init may hit a timeout
and not properly retry, even when waiting longer or retrying would allow it to
succeed.

cloud-init contained a setting to configure this, but it was not used in all
cases. The change here enables use of the configured timeout and retries in
get_data.

[Test Case]
1. Launch an instance on openstack.
2. Verify inconsistent use of 'timeout' in /var/log/cloud-init.log
  $ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
  2017-03-03 16:51:23,824 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 10.0} configuration
  2017-03-03 16:51:24,384 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 5.0} configuration

3. enable proposed, update, upgrade
4. clean
   rm -Rf /var/lib/cloud /var/log/cloud-init*
5. reboot
6. re-check step 2; expect to see that 'timeout' is consistent.
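
The check in steps 2 and 6 could also be scripted. The following is a minimal sketch, assuming the log line format shown in step 2; the function names are hypothetical and are not part of cloud-init. Unlike the grep above, it looks at every first-attempt request rather than only the first two.

```python
import re

URL = "http://169.254.169.254/openstack"
# Matches the 'timeout' entry in the request dict that url_helper logs.
TIMEOUT_RE = re.compile(r"'timeout': ([0-9.]+)")


def first_attempt_timeouts(lines, url=URL):
    """Collect the timeout of every first-attempt ("[0/N]") request to url."""
    timeouts = []
    for line in lines:
        if url not in line or "[0/" not in line:
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(float(match.group(1)))
    return timeouts


def timeouts_consistent(lines):
    """True when at least one request was logged and all share one timeout."""
    seen = set(first_attempt_timeouts(lines))
    return len(seen) == 1
```

Passing the open file object for /var/log/cloud-init.log to timeouts_consistent should return False before the upgrade and True after it, if the fix behaves as described.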

[Regression Potential]
Low chance of regression. Boot times may be slower, but booting is more reliable
with a non-performant metadata service.

=== End SRU Template ===

cloud-init sometimes times out and fails to fetch metadata in the OpenStack environment when the Controller node is under high workload.

The default timeout value is 5 seconds and it may be too small in some cases where the Controller node is too busy to respond to the metadata request from the instance in time.

There is a 'timeout' configuration setting, as in...

  datasource:
    OpenStack:
      timeout: 30

...but this value is not used by the get_data method in cloudinit/sources/DataSourceOpenStack.py, because get_data is called from cloudinit/sources/__init__.py with no keyword arguments:

                LOG.debug("Seeing if we can get any data from %s", cls)
                s = cls(sys_cfg, distro, paths)
                if s.get_data():
                    myrep.message = "found %s data from %s" % (mode, name)
                    return (s, type_utils.obj_name(cls))
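
One way a datasource could honour these settings without receiving keyword arguments is to read them from its own config section inside get_data. The following is a minimal sketch of that idea only; resolve_url_settings, DEFAULT_RETRIES, and the plain-dict config are assumptions made for illustration, not cloud-init's actual code or the merged fix.

```python
# Hypothetical sketch: the names below are illustrative, not cloud-init's API.
DEFAULT_TIMEOUT = 5.0  # seconds; the default cited in this report
DEFAULT_RETRIES = 5    # assumed default, chosen for illustration


def resolve_url_settings(ds_cfg):
    """Return (timeout, retries) from a datasource config dict.

    Falls back to the defaults when a key is missing, unparsable, or negative.
    """
    settings = {"timeout": DEFAULT_TIMEOUT, "retries": DEFAULT_RETRIES}
    casts = {"timeout": float, "retries": int}
    for key, cast in casts.items():
        raw = ds_cfg.get(key)
        if raw is None:
            continue
        try:
            value = cast(raw)
        except (TypeError, ValueError):
            continue  # keep the default on unparsable input
        if value >= 0:
            settings[key] = value
    return settings["timeout"], settings["retries"]


# The configuration from this report, represented as a plain dict.
sys_cfg = {"datasource": {"OpenStack": {"timeout": 30}}}
ds_cfg = sys_cfg.get("datasource", {}).get("OpenStack", {})
timeout, retries = resolve_url_settings(ds_cfg)
```

With the configuration above, every metadata request made from get_data would then use the same 30-second timeout instead of a mix of configured and hard-coded values.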

Scott Moser (smoser) on 2017-03-03
Changed in cloud-init:
importance: Undecided → Medium
status: New → Fix Released
Changed in cloud-init (Ubuntu):
status: New → Confirmed
status: Confirmed → Fix Released
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in cloud-init:
status: Fix Released → Fix Committed
Scott Moser (smoser) on 2017-03-03
description: updated

Hello Lars, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-48-g1c795b9-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed
Chris Halse Rogers (raof) wrote :

Hello Lars, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-48-g1c795b9-0ubuntu1~16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
Scott Moser (smoser) wrote :

$ dpkg-query --show cloud-init
cloud-init 0.7.9-0ubuntu1~16.04.2
$ lsb_release -sc
xenial
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:49:21,111 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'url': 'http://169.254.169.254/openstack', 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'allow_redirects': True} configuration
2017-03-08 19:49:21,580 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'timeout': 5.0, 'url': 'http://169.254.169.254/openstack', 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'allow_redirects': True} configuration
$ rel=$(lsb_release -sc)
$ line=$(awk '$1 == "deb" && $2 ~ /ubuntu.com/ {
> printf("%s %s %s-proposed main universe\n", $1, $2, rel); exit(0) };
> ' "rel=$rel" /etc/apt/sources.list)
$ echo "$line" | sudo tee /etc/apt/sources.list.d/proposed.list
sudo: unable to resolve host xenial-20170308-194839
deb http://nova.clouds.archive.ubuntu.com/ubuntu/ xenial-proposed main universe
$ sudo apt-get update -q && sudo apt-get install cloud-init -q
...
Setting up cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.04.1) ...

$ dpkg-query --show cloud-init
cloud-init 0.7.9-48-g1c795b9-0ubuntu1~16.04.1

$ sudo rm -Rf /var/log/cloud-init* /var/lib/cloud && sudo reboot
$ sudo reboot

## go back in. Notice the difference: here we have the 'timeout' of 10.0 in
## both requests.
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:54:55,726 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'allow_redirects': True} configuration
2017-03-08 19:54:56,243 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'allow_redirects': True} configuration

Scott Moser (smoser) wrote :

## Show the original system, and the issue.
## See that the second request below shows a 'timeout' of 5.0.
## Both of the requests listed should have used 10.0.

$ dpkg-query --show cloud-init
cloud-init 0.7.9-0ubuntu1~16.10.1
$ lsb_release -sc
yakkety
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:49:18,879 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'allow_redirects': True, 'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 10.0} configuration
2017-03-08 19:49:19,350 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'allow_redirects': True, 'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 5.0} configuration
$ rel=$(lsb_release -sc)
$ line=$(awk '$1 == "deb" && $2 ~ /ubuntu.com/ { printf("%s %s %s-proposed main universe\n", $1, $2, rel); exit(0) };' rel=$rel /etc/apt/sources.list)
$ echo "$line" | sudo tee /etc/apt/sources.list.d/proposed.list
deb http://nova.clouds.archive.ubuntu.com/ubuntu/ yakkety-proposed main universe
$ sudo apt-get update -q && sudo apt-get install cloud-init -q
...
Setting up cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.10.1) ...

$ sudo rm -Rf /var/lib/cloud/ /var/log/cloud-init* && sudo reboot

### go back in and see that both requests had timeout 10.0
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:59:43,702 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'timeout': 10.0, 'allow_redirects': True} configuration
2017-03-08 19:59:44,160 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'timeout': 10.0, 'allow_redirects': True} configuration

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-48-g1c795b9-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/rules: install Z99-cloudinit-warnings.sh to /etc/profile.d
  * debian/patches/ds-identify-behavior-xenial.patch: adjust default
    behavior of ds-identify for SRU (LP: #1669675, #1660385).
  * New upstream snapshot.
    - Support warning if the used datasource is not in ds-identify's list
      (LP: #1669675).
    - DatasourceEc2: add warning message when not on AWS. (LP: #1660385)
    - Z99-cloudinit-warnings: Add profile.d script for showing warnings on
    - Z99-cloud-locale-test.sh: convert tabs to spaces, remove unneccesary
      execute bit in permissions.
    - (RedHat) net: correct errors in cloudinit/net/sysconfig.py
      [Lars Kellogg-Stedman]
    - ec2_utils: fix MetadataLeafDecoder that returned bytes on empty
    - Fix eni rendering of multiple IPs per interface [Ryan Harper]
      (LP: #1657940)
    - Add 3 ecdsa-sha2-nistp* ssh key types now that they are standardized
      [Lars Kellogg-Stedman]
    - EC2: Do not cache security credentials on disk [Andrew Jorgensen]
      (LP: #1638312)
    - OpenStack: Use timeout and retries from config in get_data.
      [Lars Kellogg-Stedman] (LP: #1657130)
    - Fixed Misc issues related to VMware customization. [Sankar Tanguturi]
    - (RedHat) Use dnf instead of yum when available [Lars Kellogg-Stedman]
    - Get early logging logged, including failures of cmdline url.
    - test / doc / build environment changes
      - Remove style checking during build and add latest style checks to
        tox [Joshua Powers]
      - code-style: make master pass pycodestyle (2.3.1) cleanly, currently
        [Joshua Powers]
      - Fix small typo and change iso-filename for consistency
      - tools/mock-meta: support python2 or python3 and ipv6 in both.
      - tests: remove executable bit on test_net, so it runs, and fix it.
      - tests: No longer monkey patch httpretty for python 3.4.2
      - reset httppretty for each test [Lars Kellogg-Stedman]
      - build: fix running Make on a branch with tags other than master
      - doc: Fix typos and clarify some aspects of the part-handler
        [Erik M. Bray]
      - doc: add some documentation on OpenStack datasource.
      - Fix minor docs typo: perserve > preserve [Jeremy Bicha]
      - validate-yaml: use python rather than explicitly python3

 -- Scott Moser <email address hidden> Mon, 06 Mar 2017 16:34:10 -0500

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-48-g1c795b9-0ubuntu1~16.10.1

---------------
cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.10.1) yakkety; urgency=medium

  * debian/rules: install Z99-cloudinit-warnings.sh to /etc/profile.d
  * debian/patches/ds-identify-behavior-yakkety.patch: adjust default
    behavior of ds-identify for SRU (LP: #1669675, #1660385).
  * New upstream snapshot.
    - Support warning if the used datasource is not in ds-identify's list
      (LP: #1669675).
    - DatasourceEc2: add warning message when not on AWS. (LP: #1660385)
    - Z99-cloudinit-warnings: Add profile.d script for showing warnings on
    - Z99-cloud-locale-test.sh: convert tabs to spaces, remove unneccesary
      execute bit in permissions.
    - (RedHat) net: correct errors in cloudinit/net/sysconfig.py
      [Lars Kellogg-Stedman]
    - ec2_utils: fix MetadataLeafDecoder that returned bytes on empty
    - Fix eni rendering of multiple IPs per interface [Ryan Harper]
      (LP: #1657940)
    - Add 3 ecdsa-sha2-nistp* ssh key types now that they are standardized
      [Lars Kellogg-Stedman]
    - EC2: Do not cache security credentials on disk [Andrew Jorgensen]
      (LP: #1638312)
    - OpenStack: Use timeout and retries from config in get_data.
      [Lars Kellogg-Stedman] (LP: #1657130)
    - Fixed Misc issues related to VMware customization. [Sankar Tanguturi]
    - (RedHat) Use dnf instead of yum when available [Lars Kellogg-Stedman]
    - Get early logging logged, including failures of cmdline url.
    - test / doc / build environment changes
      - Remove style checking during build and add latest style checks to
        tox [Joshua Powers]
      - code-style: make master pass pycodestyle (2.3.1) cleanly, currently
        [Joshua Powers]
      - Fix small typo and change iso-filename for consistency
      - tools/mock-meta: support python2 or python3 and ipv6 in both.
      - tests: remove executable bit on test_net, so it runs, and fix it.
      - tests: No longer monkey patch httpretty for python 3.4.2
      - reset httppretty for each test [Lars Kellogg-Stedman]
      - build: fix running Make on a branch with tags other than master
      - doc: Fix typos and clarify some aspects of the part-handler
        [Erik M. Bray]
      - doc: add some documentation on OpenStack datasource.
      - Fix minor docs typo: perserve > preserve [Jeremy Bicha]
      - validate-yaml: use python rather than explicitly python3

 -- Scott Moser <email address hidden> Mon, 06 Mar 2017 16:37:28 -0500

Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Fix Released

This bug is believed to be fixed in cloud-init 17.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released