easyrsa install hook fails on public address not found

Bug #1924780 reported by Joshua Genet on 2021-04-16
This bug affects 1 person
Affects        Status     Importance  Assigned to  Milestone
EasyRSA Charm  Triaged    High        Unassigned
juju           Won't Fix  Undecided   Unassigned

Bug Description

Test run here:
https://solutions.qa.canonical.com/testruns/testRun/1116d69e-10c4-4d5c-bceb-2e0b532a0c68

Artifacts/Logs/Bundles:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/index.html

Juju controller crashdump:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/generated/generated/juju_aws_controller/juju-crashdump-controller-2021-04-15-17.23.46.tar.gz

Juju model crashdump:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-04-15-17.23.53.tar.gz

---

This occurred on 3/6 of our Juju release runs on AWS last night.
The other 3 succeeded just fine. All 3 of these failing runs were within 1hr, which made us a bit suspicious that something went wrong with AWS. But juju status shows that we were able to get a public IP just fine.

---

Install hook Traceback:

2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 ERROR public address not found
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 Traceback (most recent call last):
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/charm/hooks/install", line 22, in <module>
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 main()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 bus.dispatch(restricted=restricted_mode)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 _invoke(other_handlers)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 handler.invoke()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 self._action(*args)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/charm/reactive/easyrsa.py", line 201, in create_certificate_authority
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 cn = hookenv.unit_public_ip()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 859, in unit_public_ip
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return unit_get('public-address')
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 93, in wrapper
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 res = func(*args, **kwargs)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 852, in unit_get
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return json.loads(subprocess.check_output(_args).decode('UTF-8'))
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 512, in run
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 raise CalledProcessError(retcode, process.args,
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 subprocess.CalledProcessError: Command '['unit-get', '--format=json', 'public-address']' returned non-zero exit status 1.

John A Meinel (jameinel) on 2021-04-20
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9-rc12
Ian Booth (wallyworld) wrote :

The easyrsa unit looks to be running on machine 1.
Logs from machine 1 show the machine agent starting. One of the things that happens is that the IP addresses of the machine are recorded. This is done via a call to Go's net.InterfaceAddrs() API. The logs show this:

2021-04-15 17:22:33 DEBUG juju.worker.dependency engine.go:564 "machiner" manifold worker started at 2021-04-15 17:22:33.977186529 +0000 UTC
2021-04-15 17:22:33 DEBUG juju.network network.go:142 no lxc bridge addresses to filter for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:178 cannot get "lxdbr0" addresses: route ip+net: no such network interface (ignoring)
2021-04-15 17:22:33 DEBUG juju.network network.go:178 cannot get "virbr0" addresses: route ip+net: no such network interface (ignoring)
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-machine:127.0.0.1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-cloud:172.31.43.239 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-fan:252.43.239.1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-machine:::1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:196 addresses after filtering: [local-machine:127.0.0.1 local-cloud:172.31.43.239 local-fan:252.43.239.1 local-machine:::1]
2021-04-15 17:22:33 INFO juju.worker.machiner machiner.go:162 setting addresses for "machine-1" to [local-machine:127.0.0.1 local-cloud:172.31.43.239 local-fan:252.43.239.1 local-machine:::1]
2021-04-15 17:22:34 INFO juju.worker.machiner machiner.go:112 "machine-1" started

There's no public IP address reported initially on startup. Compare this to machine 0, the controller, where these addresses are known when these logs were recorded:

2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address public:34.239.49.54 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-cloud:172.31.39.150 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-fan:252.39.150.1 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-machine:127.0.0.1 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-machine:::1 for machine

Note the 34.* public address.

Juju does periodically update the machine addresses if they change. But it appears no public IP address becomes known prior to the easyrsa unit asking for the public address, hence the install hook error reporting that such an address is not found.

The fact that status has the public address means that it became available a short time later, after the charm had initially asked for it. Charms need to be resilient to such scenarios. Juju will fire a config-changed hook when the host machine of a charm gets updated addresses. The charm needs to wait until the public address is available. Also, the charm should be using network-get and not unit-get public-address, which is deprecated.
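
A minimal sketch of the resilience described above, assuming the charm wraps the same `unit-get --format=json public-address` command that appears in the traceback. The helper name `try_public_address` and its injectable `run` parameter are hypothetical; they are not part of charmhelpers or the easyrsa charm.

```python
import json
import subprocess


def try_public_address(run=subprocess.check_output):
    """Return the unit's public address, or None if Juju has none yet.

    `run` defaults to subprocess.check_output. It is a parameter only so
    the function can be exercised outside a Juju hook environment.
    """
    cmd = ['unit-get', '--format=json', 'public-address']
    try:
        output = run(cmd)
    except subprocess.CalledProcessError:
        # unit-get exits non-zero ("public address not found") when no
        # public address has been recorded for the machine yet.
        return None
    # Treat an empty string the same as a missing address.
    return json.loads(output.decode('UTF-8')) or None
```

A handler that gets `None` back could return early and try again on a later hook, for example the config-changed hook that Juju fires when the machine's addresses are updated.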

Ian Booth (wallyworld) wrote :

I'd like to flag this as a charm issue rather than Juju if we're agreed on the reasoning above.

Joshua Genet (genet022) wrote :

Agreed. I moved it to Easyrsa.
Thanks for digging in to this Ian!

affects: juju → charm-easyrsa
Changed in charm-easyrsa:
milestone: 2.9-rc12 → none
George Kraft (cynerva) on 2021-04-22
Changed in charm-easyrsa:
status: Triaged → New
importance: High → Undecided

Do you know if you are running this charm inside a VPC? (either a default
VPC or because you provide vpc-id?)
At least in my testing on AWS (deploying easyrsa 10 times concurrently,
tearing the units down, and then doing it again), the public IP shows up
at the same time as the private IP, which makes it odd that we are seeing
tests where this isn't happening.
I'm wondering if it is
a) a subnet configuration issue (AWS does let you set per-subnet whether
they default to having a Public IP or not)
b) a race condition with initialization
c) something to do with EC2 Classic vs EC2 VPC

George Kraft (cynerva) wrote :

Adding Juju again. I think it's weird that Juju is starting the unit before the machine is really "ready" and I'd like to see further investigation into why that's happening.

I will leave this open against the charm because I do agree that the charm should be made more resilient to this. But:

> Also, the charm should be using network-get and not unit-get public-address which is deprecated.

I'm confused about this. I know that `unit-get private-address` was deprecated some time ago, but as far as I know, `unit-get public-address` is still valid. Given that network-get doesn't provide the public address (except in cross-model situations), how else are we supposed to get the public address?

Changed in charm-easyrsa:
importance: Undecided → High
status: New → Triaged
Joshua Genet (genet022) wrote :

Yes, looks like we're using the default EC2 VPC. Not EC2 Classic.
Is EC2 VPC what you suggest? Or should we be using EC2 Classic?

John A Meinel (jameinel) wrote :

VPC is what we'd recommend, as it has been the standard for a couple of
years.

John A Meinel (jameinel) wrote :

FWIW, just deploying easyrsa with my VPC I didn't see this failure in about
20 deploys.

Ian Booth (wallyworld) wrote :

network-get provides ingress addresses (akin to what public used to be) and binding addresses (akin to what was private address).

The concept of public and private addresses is insufficient to properly model what's needed. The charm needs to know "what do other services connect to me on" (typically sent across a relation) and "what do I bind to to listen to incoming connections". Juju will use the underlying network model, taking account of cross model relations, shadow addresses etc to determine what to put into network-get.

Ian Booth (wallyworld) wrote :

RE: the machine being ready.
Whether the machine even gets a "public" address or not is not something Juju can predetermine. NICs can be added/removed at any time - Juju will monitor the addresses on a machine and record them in the model. Whenever an address changes, the charm gets a config-changed hook. Juju informs the charm about the environment on which it runs; it's up to the charm to use hook commands like network-get to figure out what to do. If a charm needs to advertise an ingress address and one isn't available yet, the charm needs to signal this by setting the status to blocked or whatever until one becomes available.
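
One way a charm could signal this is sketched below, assuming it publishes its state through the `status-set` hook command. The function names and the status messages are hypothetical, and `run` is injectable only so the logic can be tried without a hook environment.

```python
import subprocess


def status_for_address(address):
    """Map an optional ingress address to a (status, message) pair."""
    if not address:
        return ('blocked', 'Waiting for an ingress address')
    return ('active', 'Ingress address: {}'.format(address))


def report_status(address, run=subprocess.check_call):
    """Publish the unit status with the status-set hook command."""
    status, message = status_for_address(address)
    run(['status-set', status, message])
    return status
```

Calling `report_status` from the config-changed handler would clear the blocked state once an address shows up.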

Changed in juju:
status: New → Won't Fix
George Kraft (cynerva) wrote :

> The charm needs to know "what do other services connect to me on" (typically sent across a relation)

In some cases, the charm also needs to know how end-users will be connecting to them. A good example of this is that charms hosting HTTPS servers must include their public address in the SANs of their TLS certificate, otherwise end-user browsers will reject the certificate.
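
To illustrate the SAN requirement, here is a small sketch that sorts candidate names into IP and DNS subject alternative names using only the standard library. `build_sans` is a hypothetical helper, not the easyrsa charm's code, and the hostname in the usage note is a placeholder.

```python
import ipaddress


def build_sans(names):
    """Split candidate names into IP and DNS subject alternative names."""
    ip_sans, dns_sans = [], []
    for name in names:
        if not name:
            continue  # skip addresses that are not known yet
        try:
            ipaddress.ip_address(name)
        except ValueError:
            target = dns_sans  # not an IP literal, treat it as a DNS name
        else:
            target = ip_sans
        if name not in target:
            target.append(name)  # keep first-seen order, drop duplicates
    return {'ip': ip_sans, 'dns': dns_sans}
```

For example, `build_sans([public_address, private_address, 'easyrsa.example.com'])` would leave the public address out of the IP list while it is still unknown (`None`), which is exactly the case where end-user browsers would later reject the certificate.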

There is no relation modeling the network connection between the charm and the end-user. How do you propose we use network-get to obtain a good ingress address for end-users?

Ian Booth (wallyworld) wrote :

The way to do it is to bind the endpoint in question to a space, where the space has been set up to contain the relevant subnets for which you want to expose the endpoint. Juju 2.9 introduces much better capability in this area as well; see juju expose --to-spaces

https://discourse.charmhub.io/t/granular-control-of-application-expose-parameters-in-the-upcoming-2-9-juju-release/3597

George Kraft (cynerva) wrote :

On Juju 2.9.0:

$ juju version
2.9.0-ubuntu-amd64
$ juju show-model | grep agent-version
  agent-version: 2.9.0

If I do a simple deploy of easyrsa on AWS:

$ juju deploy cs:~containers/easyrsa

The unit comes up with a public address:

$ juju run --unit easyrsa/0 -- unit-get public-address
34.215.45.91

That address is *not* visible with network-get:

$ juju run --unit easyrsa/0 -- network-get client
bind-addresses:
- mac-address: 06:c0:ef:72:54:17
  interface-name: ens5
  addresses:
  - hostname: ""
    address: 172.31.32.190
    cidr: 172.31.32.0/20
  macaddress: 06:c0:ef:72:54:17
  interfacename: ens5
- mac-address: b2:bd:96:fb:b9:61
  interface-name: fan-252
  addresses:
  - hostname: ""
    address: 252.32.190.1
    cidr: 252.32.0.0/12
  macaddress: b2:bd:96:fb:b9:61
  interfacename: fan-252
egress-subnets:
- 172.31.32.190/32
ingress-addresses:
- 172.31.32.190
- 252.32.190.1
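
The gap can be checked mechanically. The sketch below hard-codes the values from the two commands above and uses Python's `ipaddress` module to ask which addresses it classifies as globally routable. That classification is the standard library's, not Juju's own address scoping.

```python
import ipaddress

# Values copied from the unit-get and network-get output above.
public_address = '34.215.45.91'
ingress_addresses = ['172.31.32.190', '252.32.190.1']


def globally_routable(addresses):
    """Return the addresses that the ipaddress module treats as global."""
    return [a for a in addresses if ipaddress.ip_address(a).is_global]


# Is the public address among the ingress addresses?
print(public_address in ingress_addresses)
# Which ingress addresses could an end-user outside the VPC reach?
print(globally_routable(ingress_addresses))
```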

You're saying I need to create a space that contains the relevant subnets for which I want to expose the endpoint, right? There isn't an existing space that covers it:

$ juju spaces
Name Space ID Subnets
alpha 0 172.31.0.0/20
                 172.31.16.0/20
                 172.31.32.0/20
                 172.31.48.0/20
                 252.0.0.0/12
                 252.16.0.0/12
                 252.32.0.0/12
                 252.48.0.0/12
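
A quick sketch of why no existing space covers the public address: none of the subnets in the alpha space contains it. The subnet list is copied from the `juju spaces` output above; `containing_subnets` is a hypothetical helper.

```python
import ipaddress

# Subnets in the alpha space, copied from the `juju spaces` output above.
ALPHA_SUBNETS = [
    '172.31.0.0/20', '172.31.16.0/20', '172.31.32.0/20', '172.31.48.0/20',
    '252.0.0.0/12', '252.16.0.0/12', '252.32.0.0/12', '252.48.0.0/12',
]


def containing_subnets(address, cidrs):
    """Return the CIDRs from `cidrs` that contain `address`."""
    addr = ipaddress.ip_address(address)
    return [c for c in cidrs if addr in ipaddress.ip_network(c)]


# The unit's private address falls in one alpha subnet.
print(containing_subnets('172.31.32.190', ALPHA_SUBNETS))
# The public address reported by unit-get is checked the same way.
print(containing_subnets('34.215.45.91', ALPHA_SUBNETS))
```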

So, you want me to take the 279 public subnets for us-west-2 that are defined in https://ip-ranges.amazonaws.com/ip-ranges.json and create a space with them. Okay. Let me try one:

$ juju add-space public 34.208.0.0/12
ERROR cannot add space "public": subnet "34.208.0.0/12" not found

Yep, Juju isn't aware of the public subnets:

$ juju subnets
subnets:
  172.31.0.0/20:
    type: ipv4
    provider-id: subnet-931b23c8
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2c
  172.31.16.0/20:
    type: ipv4
    provider-id: subnet-a7235bc1
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2b
  172.31.32.0/20:
    type: ipv4
    provider-id: subnet-4c1b8204
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2a
  172.31.48.0/20:
    type: ipv4
    provider-id: subnet-ff8d38d4
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2d
  252.0.0.0/12:
    type: ipv4
    provider-id: subnet-931b23c8-INFAN-172-31-0-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2c
  252.16.0.0/12:
    type: ipv4
    provider-id: subnet-a7235bc1-INFAN-172-31-16-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2b
  252.32.0.0/12:
    type: ipv4
    provider-id: subnet-4c1b8204-INFAN-172-31-32-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2a
  252.48.0.0/12:
    type: ipv4
    provider-id: subnet-ff8d38d4-INFAN-172-31-48-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2d

I can't use `juju add-subnet` either:

$ juju add-subnet 34.208.0.0/...

Ian Booth (wallyworld) wrote :

To explain a bit about what's happening...

I don't think we have a fully formed solution yet. Shadow address support is still in progress.

From Joe:

It's quite common if using spaces, because a provider-assigned public IP
is not usually in a known subnet (like the AWS public IP). This means
that it will never be returned for bound endpoints.

The solution I discussed a while back with John was that we might
include an address where `IsShadow` is true if it is on the same NIC as
a private address that was determined to be in the space.

The question though is which address to return, or in what order. The local-cloud
address is usually suitable for relations in the same model, but it's
hard to know when the public address should be preferred even in CMRs.

There's also the issue on AWS where a local-cloud address might be
suitable to return for a relation with a unit in a different subnet,
when the other side is simply in a different AZ/subnet that is routable.
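
The proposal above can be sketched roughly as follows. This is an illustration of the idea only, not Juju's implementation (Juju is written in Go): the `Address` class, its fields, and the choice to list shadow addresses after in-space addresses are all assumptions, and the ordering is precisely the open question mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    value: str
    nic: str
    in_space: bool = False   # private address determined to be in the space
    is_shadow: bool = False  # provider-assigned address with no known subnet


def addresses_for_space(addresses):
    """Return in-space addresses plus shadow addresses sharing their NIC."""
    nics_in_space = {a.nic for a in addresses if a.in_space}
    selected = [a for a in addresses if a.in_space]
    # Order is arbitrary here: shadows are simply appended at the end.
    selected += [a for a in addresses
                 if a.is_shadow and not a.in_space and a.nic in nics_in_space]
    return selected
```

With the unit from earlier in this report, a shadow `34.215.45.91` on `ens5` would be returned alongside `172.31.32.190`, while a shadow address on a NIC with no in-space address would be left out.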

George Kraft (cynerva) wrote :

Ok, thanks for the explanation. It sounds like I should expect the public address to be obtainable via network-get as a shadow address in a future release of Juju.

Until then, please don't tell charm authors that `unit-get public-address` is deprecated. It's going to confuse people who need their charms to work today.
