RFC1918 IPs in public address in AWS after 2.9.32 upgrade

Bug #1980731 reported by Gareth Woolridge
This bug affects 3 people
Affects: Canonical Juju
Status: Won't Fix
Importance: Undecided
Assigned to: Heather Lanigan
Milestone: (none listed)

Bug Description

We recently upgraded a controller and models in AWS from 2.9.18 to 2.9.32.

Following the upgrade, the "Public address" for applications deployed in the one real model on this controller switched from showing actual public cloud IPs to internal RFC1918 instance networking IPs.

This has broken our monitoring, because it caused us to register RFC1918 addresses, which are unroutable from our company network, in Prometheus.
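
As a rough illustration of how a registration step could guard against this, the sketch below rejects RFC1918 addresses before they are registered. It is a minimal example and not how promreg or Juju actually work; `is_rfc1918` and `routable_targets` are hypothetical names introduced here.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if address is an IPv4 literal inside an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP literal (for example a hostname)
    if ip.version != 4:
        return False  # RFC 1918 only covers IPv4
    return any(ip in net for net in RFC1918_NETWORKS)


def routable_targets(targets):
    """Hypothetical helper: split targets into (kept, rejected) lists."""
    kept = [t for t in targets if not is_rfc1918(t)]
    rejected = [t for t in targets if is_rfc1918(t)]
    return kept, rejected
```

A caller could alert on a non-empty `rejected` list instead of silently registering those addresses.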

We have upgraded about 6 of our AWS controllers (all hosting similar applications) and this is the only model/controller (so far) showing this behaviour.

I've attached broken and non-broken juju status outputs as examples.

Gareth Woolridge (moon127) wrote :
Tom Haddon (mthaddon)
tags: added: canonical-is-upgrades
John A Meinel (jameinel) wrote : Re: [Bug 1980731] Re: RFC1918 IPs in public address in AWS after 2.9.32 upgrade

Can you also include the output of `juju status --format=yaml`? It should
list every address that we know about for a machine, not just the single
'best' address.

It does look incorrect, though.
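
To make that comparison easier across many machines, here is a minimal sketch that scans a parsed status document and flags machines with no globally routable address at all. It assumes the JSON form of the status output (`juju status --format=json`) and assumes that each entry under `machines` carries an `ip-addresses` list; the key names and the function name `machines_without_public_ip` are assumptions to check against real output.

```python
import ipaddress
import json


def machines_without_public_ip(status: dict) -> list:
    """Return IDs of machines whose known addresses are all non-global."""
    flagged = []
    for machine_id, machine in status.get("machines", {}).items():
        has_public = False
        # "ip-addresses" is an assumed key name; adjust to the real output.
        for addr in machine.get("ip-addresses", []):
            try:
                if ipaddress.ip_address(addr).is_global:
                    has_public = True
            except ValueError:
                continue  # skip anything that is not an IP literal
        if not has_public:
            flagged.append(machine_id)
    return flagged


# Made-up sample data in the assumed shape, for illustration only.
sample = json.loads("""
{"machines": {
  "14": {"ip-addresses": ["10.0.0.112"]},
  "15": {"ip-addresses": ["54.10.20.30", "10.0.0.113"]}
}}
""")
flagged = machines_without_public_ip(sample)
```

With the sample above, only machine "14" is flagged, which matches the symptom in this report: a machine whose only known address is a private one.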

Gareth Woolridge (moon127) wrote :

juju status --format=yaml of the affected model attached

Gareth Woolridge (moon127) wrote :

Also for comparison here is another controller and application model also in AWS which doesn't show the behaviour and the public IPs are present!

Heather Lanigan (hmlanigan) wrote :

Interesting. I'm unable to reproduce this by bootstrapping a 2.9.18 controller, deploying units and upgrading to 2.9.32.
Update: I hear that this didn't happen until approximately 15 hours after the upgrade.

@Gareth, what is different between the two models in the output in comment #1?

Gareth Woolridge (moon127) wrote :

The two models in #1 are on different controllers in different AWS regions but otherwise were both upgraded from 2.9.18 to 2.9.32 yesterday.

I cannot say what version they were initially deployed with, or whether they have been through identical version upgrades en route to 2.9.18 and then 2.9.32.

I would also note that our tool for registering with Prometheus, "promreg", runs against juju status reports taken hourly. It took more than 15 hours for an alert to fire as a result of these instances being updated to RFC1918 IPs, so it is also possible the issue did not manifest immediately after the upgrade.

I upgraded 6 or so AWS cloud mirror controllers and models yesterday from varying versions up to 2.9.32; so far this is the only one to exhibit this behaviour.

Heather Lanigan (hmlanigan) wrote :

db snippet for machine 14

- _id: cb1ff1db-2ec1-48b1-8421-f4b9ccc6ce40:14
  addresses: []  # should not be empty; expected 2 entries
  agent-started-at: "2022-07-04T15:11:31.957Z"
  clean: false
  containertype: ""
  hostname: ip-10-x-x-112
  jobs:
  - 1
  life: 0
  machineaddresses:  # missing the public ip entry
  - addresstype: ipv4
    networkscope: local-cloud
    origin: machine
    value: 10.x.x.112
  - addresstype: ipv4
    networkscope: local-machine
    origin: machine
    value: 127.0.0.1
  - addresstype: ipv6
    networkscope: local-machine
    origin: machine
    value: ::1
  machineid: "14"
  model-uuid: cb1ff1db-2ec1-48b1-8421-f4b9ccc6ce40
  nonce: machine-0:a37fc6a6-99bb-446f-8efa-0846daad5306
  preferredprivateaddress:
    addresstype: ipv4
    networkscope: local-cloud
    origin: provider
    spaceid: "0"
    value: 10.x.x.112
  preferredpublicaddress:
    addresstype: ipv4
    networkscope: local-cloud
    origin: machine  # expected "provider"
    spaceid:  # missing data
    value: 10.x.x.112  # expected the public ip here
  principals:
  - content-cache/0
  series: bionic
  supportedcontainers:
  - lxd
  supportedcontainersknown: true
  tools:
    sha256: ""
    size: 0
    url: ""
    version: 2.9.32-ubuntu-amd64
  txn-queue: []
  txn-revno: 132

The above explains the juju status output, but not how we got there.
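
The inconsistencies annotated in the snippet can be expressed as a small check over a machine document. This is a sketch only: the field names are taken from the snippet above, the assumption that a healthy public address has scope `public` and origin `provider` comes from the annotations, and `check_machine_doc` is a hypothetical helper, not part of Juju.

```python
def check_machine_doc(doc: dict) -> list:
    """Return a list of problems found in a machine document."""
    problems = []
    # Provider-sourced addresses live in "addresses"; empty means the
    # provider never reported any (or they were cleared).
    if not doc.get("addresses"):
        problems.append("addresses is empty")
    public = doc.get("preferredpublicaddress") or {}
    # Assumption: a real public address is scoped "public".
    if public.get("networkscope") != "public":
        problems.append("preferredpublicaddress scope is not public")
    if public.get("origin") != "provider":
        problems.append("preferredpublicaddress origin is not provider")
    if not public.get("spaceid"):
        problems.append("preferredpublicaddress has no spaceid")
    return problems


# The relevant fields of machine 14, as shown in the snippet above.
machine_14 = {
    "addresses": [],
    "preferredpublicaddress": {
        "addresstype": "ipv4",
        "networkscope": "local-cloud",
        "origin": "machine",
        "spaceid": None,
        "value": "10.x.x.112",
    },
}
problems = check_machine_doc(machine_14)
```

For machine 14 all four checks fail, which lines up with the four annotations in the snippet.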

Heather Lanigan (hmlanigan) wrote :

Looking at the controller logs, two errors occur frequently.

The instance poller is restarting a lot:

2022-07-04 15:10:07 ERROR juju.worker.dependency engine.go:693 "instance-poller" manifold worker returned unexpected error: enumerating network interface list for instances: cannot get instance "i-094631358b23ca7e0" network interfaces: instance "i-094631358b23ca7e0" has no NIC attachment yet, retrying...

We're making an invalid AWS API call:

2022-07-04 15:10:09 ERROR juju.worker.dependency engine.go:693 "firewaller" manifold worker returned unexpected error: cannot respond to units changes for "machine-0": cannot open ports: operation error EC2: AuthorizeSecurityGroupIngress, https response error StatusCode: 400, RequestID: 52282227-b927-4878-b196-2140add95756, api error InvalidParameter: ipv6-ranges is not a valid parameter. IPv6 rules can only be specified for VPC security groups.

Both errors are bubbling up from the ec2 provider.

Heather Lanigan (hmlanigan) wrote :

Looking at the AWS instances in the console, none have an interface attached.

The instance-poller errors started after the upgrade to 2.9.32.

The firewaller errors started on 2021-12-16, with no obvious event to match them to.

Colin Misare (cmisare) wrote :

This appears to only be happening with EC2 instances that are running in an EC2-Classic setup (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-classic-platform.html).

Heather Lanigan (hmlanigan) wrote :

Juju broke EC2-Classic instances with 2 recent changes. EC2-Classic instances are EOL in August, so this won't be fixed.

tags: added: ec2-provider
Changed in juju:
status: New → Won't Fix
assignee: nobody → Heather Lanigan (hmlanigan)