underscores should be stripped from hostnames generated for apt config

Bug #1868232 reported by Tom Haddon on 2020-03-20
30
This bug affects 5 people
Affects Status Importance Assigned to Milestone
cloud-init
High
Dan Watkins
cloud-init (Ubuntu)
Undecided
Unassigned
Focal
Undecided
Unassigned

Bug Description

In a ticket filed in the Ubuntu RT instance we were made aware of an issue where if a cloud is configured with an “_” in the region name, cloud-init will generate an apt configuration that also includes that “_” in the name.

So for example if the region name is zone_01, apt will be configured to use zone_01.clouds.archive.ubuntu.com.

On Friday March 13th we deployed some new archive servers on 18.04 using Apache 2.4.29-1ubuntu4.13. This version of apache has more strict protocol options than previous versions, per https://httpd.apache.org/docs/2.4/mod/core.html#httpprotocoloptions and the result is that a request to zone_01.clouds.archive.ubuntu.com returns a 400 Bad Request.

Could cloud-init be updated to remove non-permitted characters including “_” per https://tools.ietf.org/html/rfc3986#section-3.2.2 ?

Dan Watkins (oddbloke) wrote :

The current behaviour is that cloud-init will use the patterns defined in /etc/cloud/cloud.cfg:

 primary:
   - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
   - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
   - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/

to determine the mirror to use. In this case, it will be one of the two latter patterns, depending on exactly how the data source in question presents "zone_01". Either way, the problem is the same.

Once this mirror URL is generated, cloud-init tests that it _resolves_ before using it. This is where the problem lies: *.clouds.archive.ubuntu.com will always resolve, but the newly-deployed Apache servers will no longer serve every domain that resolves. Arguably this is a misconfiguration of the archive servers (why resolve something that you can't serve?), but cloud-init should handle this case gracefully regardless.

There are (at least) a couple of ways in which we could address this issue in cloud-init:

(a) rewrite the generated URL (or the variables which we are substituting into the pattern) to only include valid URI characters
(b) modify cloud-init to check that mirrors are accessible via HTTP (rather than simply resolvable)

While both of these would address the immediate issue, only implementing (b) would mean that all instances in such zones would fallback to using archive.ubuntu.com, so I think we should do some form of (a) regardless.

One obvious downside to (b) is that it will introduce an additional HTTP request to each boot on a Debian/Ubuntu host; this could be a concern both from a client boot speed perspective, but perhaps more importantly from a server load perspective. (My gut feel is that the cost in both cases wouldn't be significantly noticeable: most Debian/Ubuntu instances that come up will perform many HTTP requests to the archive hosts, so one additional one isn't likely to be noticed. We should consider this more deeply before we implement this, however.)

(As an aside, we should do some research to confirm that the non-ASCII encoding described in the linked RFC 3986 section won't be affected by our filtering. For example, if we currently rely on the libraries we use to convert non-ASCII hostnames to the defined percent-encoding, then we would regress non-ASCII hostnames by applying a naive filter before we pass the name to those libraries.)

Changed in cloud-init:
status: New → Triaged
importance: Undecided → High
Dan Watkins (oddbloke) wrote :

I've just done some research (by stepping through with a debugger), and socket.getaddrinfo _does_ perform the encoding of non-ASCII characters:

  In [7]: socket.getaddrinfo('www.\u2603.com', None)[0][4][0]
  Out[7]: '185.53.178.7'

It does so using the 'idna' encoding:

  In [2]: "www.☃.com".encode('idna')
  Out[2]: b'www.xn--n3h.com'

which (unsurprisingly, given this bug) doesn't do anything to underscores:

  In [4]: "www_foo.☃.com".encode('idna')
  Out[4]: b'www_foo.xn--n3h.com'

So I believe the correct implementation of (a) would be to encode the URL ourselves, and then drop any invalid characters out. (We should check if there is any stdlib/requests functionality that already does this.)

Dan Watkins (oddbloke) wrote :

The cloud-init team chatted just now and we plan to implement (a), and reach out to the security and archive teams to confirm that they are comfortable with this path forward. I'll reiterate that proposal here for ease of review by those teams:

The Background:

To enable the provisioning of cloud-specific mirrors via only DNS changes, cloud-init configures apt with a <REGION>.clouds.archive.ubuntu.com archive URL by default when launching Ubuntu instances. The specific "<REGION>" name comes from the cloud (in slightly different ways depending on the cloud in question, but they all boil down to region/availability zone names). There is a *.clouds.archive.ubuntu.com DNS rule which resolves all of these archive URLs to the archive.ubuntu.com servers (in the absence of specific DNS rules for a region/cloud).

The Problem:

When instances are launched in regions/availability zones containing characters that are invalid in URIs (e.g. zone_01), cloud-init will currently generate and configure apt host names which are invalid URIs (e.g. zone_01.clouds.archive.ubuntu.com). These invalid URIs are now rejected by the archive.ubuntu.com web hosts (which were recently upgraded to bionic and its Apache), which means that new instances launched in these zones are configured with non-functional mirrors (the configured mirrors will return 400 for all requests).

The Proposed Solution:

cloud-init will remove invalid URI characters from the generated hostnames before configuring apt with them. (For example, zone_01.clouds.archive.ubuntu.com would be rewritten to zone01.clouds.archive.ubuntu.com.) Special care will be taken to ensure that IDNA names are not regressed.

Dan Watkins (oddbloke) on 2020-03-23
Changed in cloud-init:
assignee: nobody → Dan Watkins (daniel-thewatkins)
status: Triaged → In Progress
Seth Arnold (seth-arnold) wrote :

It would be nice to address the wildcard DNS entries, as those have a potential for abuse, and can be endlessly confusing if you're not prepared for them.

In the meantime though, this plan sounds good to me.

I'm worried about collisions, where multiple providers may use us_west_2, us%west2, uswest~2, etc.

Some phrases read differently if the spacing is removed. The usual examples are powergen_italia and experts_exchange, but perhaps there's more realistic phrases for region names. (This seems quite small problem compared to the overall wildcard DNS entries, though.)

Reversible transformations are usually better but since we're presumably doing this with business partners, the trouble cases may fall under "don't do that" kinds of categories.

Are these actual problems? Probably it's fine but I thought I'd mention them just in case someone else with more context or creativity can make more of them.

Thanks

Junien Fridrick (axino) wrote :

I do wonder about the actual risks of using "HttpProtocolOptions Unsafe" on the archive servers, which are only serving public, static files.

On Tue, Mar 24, 2020 at 02:58:49AM -0000, Seth Arnold wrote:
> It would be nice to address the wildcard DNS entries, as those have a
> potential for abuse, and can be endlessly confusing if you're not
> prepared for them.

The wildcard DNS entries are required for the system as-designed to
work, I believe. cloud-init will configure region-based mirror names
based _only_ on the metadata available to it in the instance (so if
RandomCloud set up a eu-random-1 region, cloud-init would configure
eu-random-1.clouds.archive.ubuntu.com as the mirror for instances in
that new region). So we need some guarantee that all possible
region-named mirrors will be listened on, hence the wildcard DNS
entries.

(We generate per-region mirrors for, I believe, a couple of reasons.
Firstly, it allows us, or the cloud, to spin up in-region mirrors and
have them used by ~all already-deployed Ubuntu instances just by adding
non-wildcard DNS entries pointing at the new mirrors. And, secondly, it
means that clouds don't have an incentive to DNS-hijack
archive.ubuntu.com if they decide they want to host in-cloud mirrors, so
cloud _users_ will have an easy way around the in-cloud mirrors if they
so desire.)

> In the meantime though, this plan sounds good to me.

OK, good!

> I'm worried about collisions, where multiple providers may use
> us_west_2, us%west2, uswest~2, etc.

To some extent, we do have this problem to solve already, as clouds
could have regions named identically. That said, this certainly does
increase the chance of collision.

> Some phrases read differently if the spacing is removed. The usual
> examples are powergen_italia and experts_exchange, but perhaps there's
> more realistic phrases for region names. (This seems quite small problem
> compared to the overall wildcard DNS entries, though.)
>
> Reversible transformations are usually better but since we're presumably
> doing this with business partners, the trouble cases may fall under
> "don't do that" kinds of categories.

It wouldn't be reversible, but we could convert invalid characters into,
say, "--" which is relatively unlikely to be used in real region
names/URIs. That would at least mean that "useast_1" and "useast1"
wouldn't collapse to the same mirror hostname (although it wouldn't do
anything about useast^1 and useast_1 colliding).

(I wonder if there are URI/hostname length boundaries that we would risk
running into if we replace single characters with anything more than a
single character, though.)

> Are these actual problems? Probably it's fine but I thought I'd mention
> them just in case someone else with more context or creativity can make
> more of them.

The input is certainly welcome!

Dimitri John Ledkov (xnox) wrote :

in punnycode -- is the encoding of -, thus i wouldn't want to use that.

In [36]: "-_☃.com".encode('idna')
Out[36]: b'xn---_-gsx.com'

I wish that idna module would do something sensible, and allow invalid chars to be encoded into xn-* string as unicode codepoints, but nothing seems to do that in python. Maybe time for an RFC update?!

Maybe it is worth talking to the cloud admins about using idna-safe zone names.

Most clouds use '-' for region name separators. We don't need round trip safety, as one can always query the zone name from the cloud-init metadata.

Thus cloud init imho, should be replacing any invalid chars with '-' (slightly better than dropping them, as the string length remains the same)

In terms of applying "HttpProtocolOptions Unsafe" on the cloud-mirror side, we should do this to support all the already launched instances, and those that are getting launched with cloud-init versions that generate unsafe urls, because otherwise those instances do not receive any updates. (I hope i understand the option right, and support strategy)

Dan Watkins (oddbloke) wrote :

> in punnycode -- is the encoding of -, thus i wouldn't want to use that.
>
> In [36]: "-_☃.com".encode('idna')
> Out[36]: b'xn---_-gsx.com'

This isn't the case; "xn--" is the prefix used to indicate a domain name label is encoded with Punycode. I don't believe Punycode modifies ASCII characters in its input (though it of course does change their position (and order relative to non-ASCII characters)):

In [17]: "-\u3061".encode('idna') == "a\u3061".encode('idna').replace(b'a', b'-')
Out[17]: True

Regardless, I think increasing the length of the string is asking for problems, so I think "--" would a mistake.

Replacing them with "-" also poses a problem: "-" isn't a valid character at the start or end of domain name labels; rewriting "foo_.example.com" to "foo-.example.com" isn't any more valid (and, in fact, as "-"s are _explicitly_ disallowed there, I bet it's actually worse in practical terms as naive implementations may reject the latter but not the former).

I think I'm going to go for "replace non-LDH characters with a single dash, then strip leading/trailing dashes". (It won't be too costly to modify, so if someone has a better idea please let me know.)

Chad Smith (chad.smith) wrote :

+1 on this case. I worry about extending hostnames too as that typically has been a problem in the past at 63 chars disappear quickly.
In reviewing the RFC on hostnames[1] it makes sense to single minus/dash replace non-conforming characters and make sure first and last hostname chars are stripped if '-'.

The only alternative I could think of was replacing a leading or trailing '-' with an arbitrary 0 to avoid collisions with other hostnames that don't start or end with '-' if we strip characters.

Paul Collins (pjdc) wrote :

We've made some changes to the archive.ubuntu.com environment, and now requests to e.g. under_score.clouds.archive.ubuntu.com are directed to a dedicated vhost that has "HTTPProtocolOptions unsafe" enabled.

Chad Smith (chad.smith) on 2020-04-15
Changed in cloud-init:
status: In Progress → Fix Committed
Changed in cloud-init (Ubuntu):
status: New → Fix Committed

This bug is believed to be fixed in cloud-init in version 20.2. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (8.0 KiB)

This bug was fixed in the package cloud-init - 20.2-20-gd10ce3ec-0ubuntu1

---------------
cloud-init (20.2-20-gd10ce3ec-0ubuntu1) groovy; urgency=medium

  * drop the following cherry-picks now included:
    + cpick-6600c642-ec2-render-network-on-all-NICs-and-add-secondary-IPs-as
    + cpick-986f37b0-cloudinit-move-to-pytest-for-running-tests-211
    + cpick-4fb6fd8a-net-ubuntu-focal-prioritize-netplan-over-eni-even-if
    + cpick-04771d75-cc_disk_setup-fix-RuntimeError-270
    + cpick-c5e949c0-distros-tests-test_init-add-tests-for
    + cpick-2566fdbe-net-introduce-is_ip_address-function-288
    + cpick-4f825b3e-cloudinit-refactor-util.is_ipv4-to-net.is_ipv4_address
    + cpick-c478d0bf-distros-replace-invalid-characters-in-mirror-URLs-with
    + cpick-1bbc4908-distros-drop-leading-trailing-hyphens-from-mirror-URL
    + cpick-09fea85f-net-ignore-renderer-key-in-netplan-config-306
    + fix-cpick-4fb6fd8a-net-ubuntu-focal-prioritize-netplan-over-eni.patch
    + cpick-9d7b35ce-cc_mounts-fix-incorrect-format-specifiers-316
    + cpick-0c5c7367-test_mounts-expand-happy-path-test-for-both-happy-paths
  * New upstream snapshot.
    - analyze/dump: refactor shared string into variable (#350)
    - doc: update boot.rst with correct timing of runcmd (#351)
    - HACKING.rst: change contact info to Rick Harding (#359) [lucasmoura]
    - HACKING.rst: guide people to add themselves to the CLA file (#349)
    - HACKING.rst: more unit testing documentation (#354)
    - .travis.yml: don't run lintian during integration test package builds
      (#352)
    - Add test to ensure docs examples are valid cloud-init configs (#355)
      [James Falcon] (LP: #1876414)
    - make suse and sles support 127.0.1.1 (#336) [chengcheng-chcheng]
    - Create tests to validate schema examples (#348)
      [lucasmoura] (LP: #1876412)
    - analyze/dump: add support for Amazon Linux 2 log lines (#346)
      (LP: #1876323)
    - bsd: upgrade support (#305) [Gonéri Le Bouder]
    - Add lucasmoura as contributor (#345) [lucasmoura]
    - Add "therealfalcon" as contributor (#344) [James Falcon]
    - Adapt the package building scripts to use Python 3 (#231)
      [Paride Legovini]
    - DataSourceEc2: use metadata's NIC ordering to determine route-metrics
      (#342) (LP: #1876312)
    - .travis.yml: introduce caching (#329)
    - cc_locale: introduce schema (#335)
    - doc/rtd/conf.py: bump copyright year to 2020 (#341)
    - yum_add_repo: Add Centos to the supported distro list (#340)
    - Release 20.2 (#337) (LP: #1875951)
    - doc/format: reference make-mime.py instead of an inline script (#334)
    - Add docs about creating parent folders (#330) [Adrian Wilkins]
    - DataSourceNoCloud/OVF: drop claim to support FTP (#333) (LP: #1875470)
    - schema: ignore spurious pylint error (#332)
    - schema: add json schema for write_files module (#152)
    - BSD: find_devs_with_ refactoring (#298) [Gonéri Le Bouder]
    - nocloud: drop work around for Linux 2.6 (#324) [Gonéri Le Bouder]
    - cloudinit: drop dependencies on unittest2 and contextlib2 (#322)
    - distros: handle a potential mirror filtering error case (#328)
    - log: remove unnecessary import fallback logic...

Read more...

Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
Seth Arnold (seth-arnold) wrote :

I believe this is fixed via https://github.com/canonical/cloud-init/commit/c478d0bff412c67280dfe8f08568de733f9425a1

I read through this patch and liked what I saw, thanks.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers