cloud-init metadata fallback broken

Bug #1870228 reported by James Denton
This bug affects 1 person
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

I came across a case today where a user was having trouble connecting to the metadata service at 169.254.169.254. For a long time, cloud-init has had a fallback mechanism that allowed it to contact the metadata service at http://<dhcp server ip>/latest/meta-data if http://169.254.169.254/latest/meta-data was unavailable, like so:

[ 157.574921] cloud-init[1313]: 2020-03-31 09:53:24,158 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [50/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa9e07c5890>, 'Connection to 169.254.169.254 timed out. (connect timeout=50.0)'))]
[ 208.629083] cloud-init[1313]: 2020-03-31 09:54:15,214 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa9e07c5350>, 'Connection to 169.254.169.254 timed out. (connect timeout=50.0)'))]
[ 226.639267] cloud-init[1313]: 2020-03-31 09:54:33,224 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa9e07c5250>, 'Connection to 169.254.169.254 timed out. (connect timeout=17.0)'))]
[ 227.640812] cloud-init[1313]: 2020-03-31 09:54:34,225 - DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 120 seconds
[ 227.651134] cloud-init[1313]: 2020-03-31 09:54:34,236 - url_helper.py[WARNING]: Calling 'http://10.19.48.2/latest/meta-data/instance-id' failed [0/120s]: request error [('Connection aborted.', error(111, 'Connection refused'))]
[ 228.655226] cloud-init[1313]: 2020-03-31 09:54:35,240 - url_helper.py[WARNING]: Calling 'http://10.19.48.2/latest/meta-data/instance-id' failed [1/120s]: request error [('Connection aborted.', error(111, 'Connection refused'))]
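
The fallback order visible in the log above can be sketched roughly as follows. This is only an illustration of the idea, not cloud-init's actual code: the function names `candidate_metadata_urls` and `first_reachable` are hypothetical, and the `probe` callable stands in for whatever HTTP check (with timeouts and retries) the real client performs.

```python
from typing import Callable, List, Optional

# Well-known link-local metadata address from the description above.
LINK_LOCAL_METADATA = "http://169.254.169.254"


def candidate_metadata_urls(dhcp_server_ip: str) -> List[str]:
    """Return metadata base URLs in the order they would be tried.

    Hypothetical helper: the link-local address first, then the
    DHCP server's IP as the fallback.
    """
    return [
        f"{LINK_LOCAL_METADATA}/latest/meta-data",
        f"http://{dhcp_server_ip}/latest/meta-data",
    ]


def first_reachable(urls: List[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which probe(url) is true, else None."""
    for url in urls:
        if probe(url):
            return url
    return None
```

With the listener bound only to 169.254.169.254:80, a probe of the second URL gets "Connection refused", so a client that cannot reach the first URL ends up with no usable metadata endpoint.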

In this Stein environment, isolated metadata is enabled, and the qdhcp namespace has a listener at 169.254.169.254:80. Previous versions of Neutron had the listener on 0.0.0.0:80, which helped facilitate the fallback mechanism described above. The bug/patch where this was changed is here:

[1] https://bugs.launchpad.net/neutron/+bug/1745618

Having this functionality back would be nice. Thoughts?

Slawek Kaplonski (slaweq) wrote :

It was changed to be like that in https://review.opendev.org/#/c/600421/ which fixed https://bugs.launchpad.net/neutron/+bug/1745618
So basically this was done on purpose, and I don't think we would want to go back to the old solution of binding it to 0.0.0.0.

Slawek Kaplonski (slaweq) wrote :

Sorry, I see you already wrote that at the bottom of the bug description :)
I'm not sure exactly why we should change that back. Can't you simply use the "force_metadata" option (IIRC that's the name of this option) to add a static route to 169.254.169.254 via your DHCP port's IP address on the VMs?

Changed in neutron:
importance: Undecided → Medium
James Denton (james-denton) wrote :

@Slawek - The route wasn't the issue (it was present), but client access to the metadata service was timing out. I noticed cloud-init falling back to the DHCP server IP, where the connection was refused, and that led me to the patch you referenced.

In the end, we found the client had aggressively modified the egress security group rules, which inadvertently removed access to 169.254.169.254. If anything, maybe a docs update mentioning this scenario would be more appropriate. Thanks again.
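
For anyone hitting the same scenario, a quick check along these lines might help spot it. This is a rough sketch under stated assumptions, not a Neutron tool: `egress_allows_metadata` is a hypothetical helper, the rules are plain dicts using the usual security group rule field names, only IPv4 rules with a `tcp` or unset protocol are considered, and rules that reference a remote security group are ignored.

```python
import ipaddress
from typing import Any, Iterable, Mapping

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def egress_allows_metadata(rules: Iterable[Mapping[str, Any]],
                           port: int = 80) -> bool:
    """Return True if any egress rule permits TCP to 169.254.169.254:port.

    Simplified on purpose: numeric protocol values and remote-group
    rules are not handled.
    """
    for rule in rules:
        if rule.get("direction") != "egress":
            continue
        if rule.get("ethertype", "IPv4") != "IPv4":
            continue
        if rule.get("remote_group_id"):
            continue  # group-based rules are out of scope for this sketch
        if rule.get("protocol") not in (None, "tcp"):
            continue
        # An unset prefix is treated as "any destination".
        prefix = rule.get("remote_ip_prefix") or "0.0.0.0/0"
        if METADATA_IP not in ipaddress.ip_network(prefix, strict=False):
            continue
        lo, hi = rule.get("port_range_min"), rule.get("port_range_max")
        if lo is None or hi is None or lo <= port <= hi:
            return True
    return False
```

A typical allow-all egress rule passes this check, while a rule set that only allows, say, TCP 443 to 10.0.0.0/8 does not, which would match the timeouts seen in the log.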
