install hook fails on public or private address not found

Bug #1924780 reported by Joshua Genet
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Won't Fix
Undecided
Unassigned
Ceph RADOS Gateway Charm
In Progress
Undecided
Unassigned
EasyRSA Charm
Fix Released
High
Gabriel Cocenza
Elasticsearch Charm
Fix Released
High
Unassigned
Etcd Charm
Fix Released
High
Gabriel Cocenza
OpenStack Nova Cloud Controller Charm
New
Undecided
Unassigned
OpenStack RabbitMQ Server Charm
New
Undecided
Unassigned
memcached-charm
Invalid
Undecided
Unassigned

Bug Description

Test run here:
https://solutions.qa.canonical.com/testruns/testRun/1116d69e-10c4-4d5c-bceb-2e0b532a0c68

Artifacts/Logs/Bundles:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/index.html

Juju controller crashdump:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/generated/generated/juju_aws_controller/juju-crashdump-controller-2021-04-15-17.23.46.tar.gz

Juju model crashdump:
https://oil-jenkins.canonical.com/artifacts/1116d69e-10c4-4d5c-bceb-2e0b532a0c68/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-04-15-17.23.53.tar.gz

---

This occurred on 3/6 of our Juju release runs on AWS last night.
The other 3 succeeded just fine. All 3 of these failing runs were within 1hr which made us a bit suspicious something went wrong with AWS. But juju status shows that we were able to get a public IP just fine.

---

Install hook Traceback:

2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 ERROR public address not found
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 Traceback (most recent call last):
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/charm/hooks/install", line 22, in <module>
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 main()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 bus.dispatch(restricted=restricted_mode)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 _invoke(other_handlers)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 handler.invoke()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 self._action(*args)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/charm/reactive/easyrsa.py", line 201, in create_certificate_authority
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 cn = hookenv.unit_public_ip()
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 859, in unit_public_ip
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return unit_get('public-address')
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 93, in wrapper
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 res = func(*args, **kwargs)
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 852, in unit_get
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return json.loads(subprocess.check_output(_args).decode('UTF-8'))
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 512, in run
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 raise CalledProcessError(retcode, process.args,
2021-04-15 16:33:58 WARNING unit.easyrsa/0.install logger.go:60 subprocess.CalledProcessError: Command '['unit-get', '--format=json', 'public-address']' returned non-zero exit status 1.

Related branches

John A Meinel (jameinel)
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9-rc12
Revision history for this message
Ian Booth (wallyworld) wrote :

The easyrsa unit looks to be running on machine 1.
Logs from machine 1 show the machine agent starting. One of the things that happens is that the IP addresses of the machine are recorded. This is done via a call to Go's net.InterfaceAddrs() API. The logs show this:

2021-04-15 17:22:33 DEBUG juju.worker.dependency engine.go:564 "machiner" manifold worker started at 2021-04-15 17:22:33.977186529 +0000 UTC
2021-04-15 17:22:33 DEBUG juju.network network.go:142 no lxc bridge addresses to filter for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:178 cannot get "lxdbr0" addresses: route ip+net: no such network interface (ignoring)
2021-04-15 17:22:33 DEBUG juju.network network.go:178 cannot get "virbr0" addresses: route ip+net: no such network interface (ignoring)
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-machine:127.0.0.1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-cloud:172.31.43.239 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-fan:252.43.239.1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:127 including address local-machine:::1 for machine
2021-04-15 17:22:33 DEBUG juju.network network.go:196 addresses after filtering: [local-machine:127.0.0.1 local-cloud:172.31.43.239 local-fan:252.43.239.1 local-machine:::1]
2021-04-15 17:22:33 INFO juju.worker.machiner machiner.go:162 setting addresses for "machine-1" to [local-machine:127.0.0.1 local-cloud:172.31.43.239 local-fan:252.43.239.1 local-machine:::1]
2021-04-15 17:22:34 INFO juju.worker.machiner machiner.go:112 "machine-1" started

There's no public IP address reported initially on startup. Compare this to machine 0, the controller, where these addresses are known when these logs were recorded:

2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address public:34.239.49.54 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-cloud:172.31.39.150 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-fan:252.39.150.1 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-machine:127.0.0.1 for machine
2021-04-15 17:22:32 DEBUG juju.network network.go:127 including address local-machine:::1 for machine

Note the 34.* public address.

Juju does periodically update the machine addresses if they change. But it appears no public IP address becomes known prior to the easyrsa unit asking for the public address, hence the install hook error reporting that such an address is not found.

The fact that status has the public address means that it became available at some short time later, after the charm had initially asked for it. Charms need to be resilient to such scenarios. Juju will fire a config-changed hook when the host machine of a charm gets updated addresses. The charm needs to wait until the public address is available. Also, the charm should be using network-get and not unit-get pubic-address which is deprecated.

Revision history for this message
Ian Booth (wallyworld) wrote :

I'd like to flag this as a charm issue rather than Juju if we're agreed on the reasoning above.

Revision history for this message
Joshua Genet (genet022) wrote :

Agreed. I moved it to Easyrsa.
Thanks for digging in to this Ian!

affects: juju → charm-easyrsa
Changed in charm-easyrsa:
milestone: 2.9-rc12 → none
George Kraft (cynerva)
Changed in charm-easyrsa:
status: Triaged → New
importance: High → Undecided
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1924780] Re: easyrsa install hook fails on public address not found

Do you know if you are running this charm inside a VPC? (either a default
VPC or because you provide vpc-id?)
At least in my testing on AWS (trying to deploy easyrsa 10 times
concurrently, and then tearing them down, and then doing it again). In my
testing, the public IP shows up at the same time as the private IP, which
makes it odd that we are seeing tests where this isn't happening.
I'm wondering if it is
a) a subnet configuration issue (AWS does let you set per-subnet whether
they default to having a Public IP or not)
b) a race condition with initialization
c) something to do with EC2 Classic vs EC2 VPC

On Thu, Apr 22, 2021 at 3:06 PM Joshua Genet <email address hidden>
wrote:

> Agreed. I moved it to Easyrsa.
> Thanks for digging in to this Ian!
>
> ** Project changed: juju => charm-easyrsa
>
> ** Changed in: charm-easyrsa
> Milestone: 2.9-rc12 => None
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1924780
>
> Title:
> easyrsa install hook fails on public address not found
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/charm-easyrsa/+bug/1924780/+subscriptions
>

Revision history for this message
George Kraft (cynerva) wrote : Re: easyrsa install hook fails on public address not found

Adding Juju again. I think it's weird that Juju is starting the unit before the machine is really "ready" and I'd like to see further investigation into why that's happening.

I will leave this open against the charm because I do agree that the charm should be made more resilient to this. But:

> Also, the charm should be using network-get and not unit-get pubic-address which is deprecated.

I'm confused about this. I know that `unit-get private-address` was deprecated some time ago, but as far as I know, `unit-get public-address` is still valid. Given that network-get doesn't provide the public address (except in cross-model situations), how else are we supposed to get the public address?

Changed in charm-easyrsa:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Joshua Genet (genet022) wrote :

Yes, looks like we're using the default EC2 VPC. Not EC2 Classic.
Is EC2 VPC what you suggest? Or should we be using EC2 Classic?

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1924780] Re: easyrsa install hook fails on public address not found

VPC is what we'd recommend, as it has been the standard for a couple of
years.

On Tue, Apr 27, 2021 at 2:11 PM Joshua Genet <email address hidden>
wrote:

> Yes, looks like we're using the default EC2 VPC. Not EC2 Classic.
> Is EC2 VPC what you suggest? Or should we be using EC2 Classic?
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1924780
>
> Title:
> easyrsa install hook fails on public address not found
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/charm-easyrsa/+bug/1924780/+subscriptions
>

Revision history for this message
John A Meinel (jameinel) wrote :

FWIW, just deploying easyrsa with my VPC I didn't see this failure in about
20 deploys.

On Tue, Apr 27, 2021 at 3:37 PM John Meinel <email address hidden> wrote:

> VPC is what we'd recommend, as it has been the standard for a couple of
> years.
>
>
> On Tue, Apr 27, 2021 at 2:11 PM Joshua Genet <email address hidden>
> wrote:
>
>> Yes, looks like we're using the default EC2 VPC. Not EC2 Classic.
>> Is EC2 VPC what you suggest? Or should we be using EC2 Classic?
>>
>> --
>> You received this bug notification because you are subscribed to juju.
>> Matching subscriptions: juju bugs
>> https://bugs.launchpad.net/bugs/1924780
>>
>> Title:
>> easyrsa install hook fails on public address not found
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/charm-easyrsa/+bug/1924780/+subscriptions
>>
>

Revision history for this message
Ian Booth (wallyworld) wrote : Re: easyrsa install hook fails on public address not found

network-get provides ingress addresses (akin to what public used to be) and binding addresses (akin to what was private address).

The concept of public and private addresses is insufficient to properly model what's needed. The charm needs to know "what do other services connect to me on" (typically sent across a relation) and "what do I bind to to listen to incoming connections". Juju will use the underlying network model, taking account of cross model relations, shadow addresses etc to determine what to put into network-get.

Revision history for this message
Ian Booth (wallyworld) wrote :

RE: the machine being ready.
Whether the machine even gets a "public" address or not is not something Juju can predetermine. NICS can be added/removed at any time - Juju will monitor the addresses on a machine and record them in the model. Whenever an address changes, the charm gets a config-changed hook. Juju informs the charm about the environment on which it runs; it's up to the charm to use hook commands like network-get to figure out what to do. If a charm needs to advertise an ingress address and one isn't available yet, the charm needs to signal this by setting the status to blocked or whatever until one becomes available.

Changed in juju:
status: New → Won't Fix
Revision history for this message
George Kraft (cynerva) wrote :

> The charm needs to know "what do other services connect to me on" (typically sent across a relation)

In some cases, the charm also needs to know how end-users will be connecting to them. A good example of this is that charms hosting HTTPS servers must include their public address in the SANs of their TLS certificate, otherwise end-user browsers will reject the certificate.

There is no relation modeling the network connection between the charm and the end-user. How do you propose we use network-get to obtain a good ingress address for end-users?

Revision history for this message
Ian Booth (wallyworld) wrote :

The way to do it is to bind the endpoint in question to a space, where the space has been set up to contain the relevant subnets for which you want to expose the endpoint. Juju 2.9 introduces much better capability in this area as well; see juju expose --to-spaces

https://discourse.charmhub.io/t/granular-control-of-application-expose-parameters-in-the-upcoming-2-9-juju-release/3597

Revision history for this message
George Kraft (cynerva) wrote :
Download full text (3.8 KiB)

On Juju 2.9.0:

$ juju version
2.9.0-ubuntu-amd64
$ juju show-model | grep agent-version
  agent-version: 2.9.0

If I do a simple deploy of easyrsa on AWS:

$ juju deploy cs:~containers/easyrsa

The unit comes up with a public address:

$ juju run --unit easyrsa/0 -- unit-get public-address
34.215.45.91

That address is *not* visible with network-get:

$ juju run --unit easyrsa/0 -- network-get client
bind-addresses:
- mac-address: 06:c0:ef:72:54:17
  interface-name: ens5
  addresses:
  - hostname: ""
    address: 172.31.32.190
    cidr: 172.31.32.0/20
  macaddress: 06:c0:ef:72:54:17
  interfacename: ens5
- mac-address: b2:bd:96:fb:b9:61
  interface-name: fan-252
  addresses:
  - hostname: ""
    address: 252.32.190.1
    cidr: 252.32.0.0/12
  macaddress: b2:bd:96:fb:b9:61
  interfacename: fan-252
egress-subnets:
- 172.31.32.190/32
ingress-addresses:
- 172.31.32.190
- 252.32.190.1

You're saying I need to create a space that contains the relevant subnets for which I want to expose the endpoint, right? There isn't an existing space that covers it:

$ juju spaces
Name Space ID Subnets
alpha 0 172.31.0.0/20
                 172.31.16.0/20
                 172.31.32.0/20
                 172.31.48.0/20
                 252.0.0.0/12
                 252.16.0.0/12
                 252.32.0.0/12
                 252.48.0.0/12

So, you want me to take the 279 public subnets for us-west-2 that are defined in https://ip-ranges.amazonaws.com/ip-ranges.json and create a space with them. Okay. Let me try one:

$ juju add-space public 34.208.0.0/12
ERROR cannot add space "public": subnet "34.208.0.0/12" not found

Yep, Juju isn't aware of the public subnets:

$ juju subnets
subnets:
  172.31.0.0/20:
    type: ipv4
    provider-id: subnet-931b23c8
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2c
  172.31.16.0/20:
    type: ipv4
    provider-id: subnet-a7235bc1
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2b
  172.31.32.0/20:
    type: ipv4
    provider-id: subnet-4c1b8204
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2a
  172.31.48.0/20:
    type: ipv4
    provider-id: subnet-ff8d38d4
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2d
  252.0.0.0/12:
    type: ipv4
    provider-id: subnet-931b23c8-INFAN-172-31-0-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2c
  252.16.0.0/12:
    type: ipv4
    provider-id: subnet-a7235bc1-INFAN-172-31-16-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2b
  252.32.0.0/12:
    type: ipv4
    provider-id: subnet-4c1b8204-INFAN-172-31-32-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2a
  252.48.0.0/12:
    type: ipv4
    provider-id: subnet-ff8d38d4-INFAN-172-31-48-0-20
    provider-network-id: vpc-ea4c7a8c
    status: in-use
    space: alpha
    zones:
    - us-west-2d

I can't use `juju add-subnet` either:

$ juju add-subnet 34.208.0.0/...

Read more...

Revision history for this message
Ian Booth (wallyworld) wrote :

To explain a bit about what's happening...

We don't yet have a fully formed solution I don't think. Shadow address support is still in progress.

From Joe:

It's quite common if using spaces, because a provider-assigned public IP
is not usually in a known subnet (like the AWS public IP). This means
that it will never be returned for bound endpoints.

The solution I discussed a while back with John was that we might
include an address where `IsShadow` is true if it is on the same NIC as
a private address that was determined to be in the space.

The question though is which or what order to return. The local-cloud
address is usually suitable for relations in the same model, but it's
hard to know when the public address should be preferred even in CMRs.

There's also the issue on AWS where a local-cloud address might be
suitable to return for a relation with a unit in a different subnet,
when the other side is simply in a different AZ/subnet that is routable.

Revision history for this message
George Kraft (cynerva) wrote :

Ok, thanks for the explanation. It sounds like I should expect the public address to be obtainable via network-get as a shadow address in a future release of Juju.

Until then, please don't tell charm authors that `unit-get public-address` is deprecated. It's going to confuse people who need their charms to work today.

Revision history for this message
Alexander Balderson (asbalderson) wrote :

SQA has also had a case where this happened on the etcd charm.
2021-08-24 23:22:30 ERROR unit.etcd/2.juju-log server.go:314 Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-etcd-2/charm/reactive/etcd.py", line 239, in rerender_config
    bag = EtcdDatabag()
  File "/var/lib/juju/agents/unit-etcd-2/charm/lib/etcd_databag.py", line 44, in __init__
    self.public_address = unit_get('public-address')
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 93, in wrapper
    res = func(*args, **kwargs)
  File "/var/lib/juju/agents/unit-etcd-2/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 870, in unit_get
    return json.loads(subprocess.check_output(_args).decode('UTF-8'))
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['unit-get', '--format=json', 'public-address']' returned non-zero exit status 1.

that run can be found here:
https://solutions.qa.canonical.com/testruns/testRun/ab6224ac-c5d2-4a5f-b6bf-844bc09a406f
crashdump at:
https://oil-jenkins.canonical.com/artifacts/ab6224ac-c5d2-4a5f-b6bf-844bc09a406f/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-08-24-23.25.23.tar.gz

Revision history for this message
Joshua Genet (genet022) wrote :

Added charm-etcd project due to Alex's comment above

Revision history for this message
Alvaro Uria (aluria) wrote :
Download full text (5.6 KiB)

Today, using cs:~containers/easyrsa (which uses cs:~containers/easyrsa-408), I hit this bug:

unit-easyrsa-0: 09:32:41 INFO unit.easyrsa/0.juju-log Reactive main running for hook install
unit-easyrsa-0: 09:32:41 INFO unit.easyrsa/0.juju-log Initializing Leadership Layer (is leader)
unit-easyrsa-0: 09:32:41 INFO unit.easyrsa/0.juju-log Invoking reactive handler: reactive/easyrsa.py:32:install
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Invoking reactive handler: hooks/relations/tls-certificates/provides.py:63:broken:client
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Invoking reactive handler: reactive/easyrsa.py:91:set_easyrsa_version
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Invoking reactive handler: reactive/easyrsa.py:98:configure_easyrsa
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Configuring OpenSSL to copy extensions.
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Configuring X509 server extensions with clientAuth.
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Updating /var/lib/juju/agents/unit-easyrsa-0/charm/EasyRSA/x509-types/server
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Invoking reactive handler: reactive/easyrsa.py:156:create_certificate_authority
unit-easyrsa-0: 09:32:42 INFO unit.easyrsa/0.juju-log Creating new CA
unit-easyrsa-0: 09:32:42 WARNING unit.easyrsa/0.install ERROR public address not found
unit-easyrsa-0: 09:32:42 ERROR unit.easyrsa/0.juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-easyrsa-0/charm/reactive/easyrsa.py", line 201, in create_certificate_authority
    cn = hookenv.unit_public_ip()
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 877, in unit_public_ip
    return unit_get('public-address')
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 93, in wrapper
    res = func(*args, **kwargs)
  File "/var/lib/juju/agents/unit-easyrsa-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 870, in unit_get
    return json.loads(subprocess.check_output(_args).decode('UTF-8'))
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['unit-get', '--format=json', 'public-address']' returned non-zero exit status 1.

unit-easyrsa-0: 09:32:42 WARNING unit.easyrsa/0.install Traceback (most recent call l...

Read more...

Changed in charm-easyrsa:
assignee: nobody → Gabriel Angelo Sgarbi Cocenza (gabrielcocenza)
status: Triaged → In Progress
Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :

Looking at the first crashdump debug_log.txt it's quite confusing to see the IP used in machine-1, unit-easyrsa-0 and controller changes 3 times in a short period of time. The fact is that the final public address in juju-status is 34.229.60.140 for machine-1 and easyrsa.

The time line that I saw is:

Line 1396:
machine-1: 2021-04-15 17:22:32 DEBUG juju.network including address public:34.239.49.54 for machine
Note: It receives a public address in the beginning

Line 1528:
 machine-1: 2021-04-15 17:22:33 DEBUG juju.network including address local-cloud:172.31.43.239 for machine
Note: public address disappears

Line 1684:
controller-0: 2021-04-15 17:22:36 INFO juju.worker.instancepoller machine "1" (instance ID "i-0b2d8dd9ad6b30451") has new addresses: [local-cloud:172.31.43.239@alpha public:34.229.60.140@alpha]
Note: controller gives the final public address for the machine and easyrsa

Line 1691
machine-1: 2021-04-15 17:22:36 DEBUG juju.worker.machiner observed network config updated for "machine-1"...
Note: does not contain the final public address and contains the IP from line 1528

Line 3276:
machine-1: 2021-04-15 17:23:16 ERROR unit.easyrsa/0.juju-log Hook error:

My two cents is that when the hook is triggered the private address is at a transition which causes the error. I think that those transitions might give a false trigger that the machine is "ready" to start the unit when it actually it's not .

I would like some opinion on how could we make the charms more resilient to this. Currently I see two options:

1) Change charm-helpers
Changing the function unit_public_ip [1] on charm-helpers and add the decorator [2] retry_on_exception.

This might be simple and effective to solve the problem in charms that are facing this problem, but OTOH I don't know if would be a good match with Juju and operator framework wouldn't benefit from this.

2) Treat the error in each charm
Knowing that we might face a concurrency, every charm should have a logic to try and except when dealing with private address. The charms would need to log the problem and wait for the next hook to run. In this case we will need to document in some place that it's a expected behavior and the charm should be prepared if this issue happens

[1] https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/hookenv.py#L874-L876
[2] https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/decorators.py#L30

Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :
tags: added: review-needed
George Kraft (cynerva)
summary: - easyrsa install hook fails on public address not found
+ install hook fails on public address not found
Changed in charm-easyrsa:
status: In Progress → Fix Committed
tags: added: backport-needed
removed: review-needed
Changed in charm-easyrsa:
milestone: none → 1.22+ck3
Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote : Re: install hook fails on public address not found

Just to synthesize, using unit-get is not totally deprecated to get the public IP. Depending on the cloud, like AWS, it won't be show into ingress-addresses. In the case of the layer-easyrsa it was not necessary to use the public address as the EasyRSA root certificate's common name.

The bottom line is the juju network model is currently insufficient for all use cases and charms should be resilient in case the IP is not ready independently which method is used.

Changed in charm-easyrsa:
milestone: 1.22+ck3 → 1.23
tags: removed: backport-needed
Harry Pidcock (hpidcock)
tags: added: network
Revision history for this message
George Kraft (cynerva) wrote :

For etcd, there are three calls to unit-get public-address:

The first one[1] is in etcd_databag and does not appear to be referenced elsewhere. It can be removed.

The second one[2] is in prepare_tls_certificates, used to set the common name for the certificate. We can change that to use a network-get ingress address like we did for EasyRSA.

The third one[3] is also in prepare_tls_certificates, used to add the public IP to the certificate's Subject Alternative Names (SANs). We need to keep this one until Juju provides a suitable replacement. Adding a try/except around it would be appropriate.

[1]: https://github.com/charmed-kubernetes/layer-etcd/blob/77eef0c0a49507b74fc90cec0864fdd85555f982/lib/etcd_databag.py#L44
[2]: https://github.com/charmed-kubernetes/layer-etcd/blob/77eef0c0a49507b74fc90cec0864fdd85555f982/reactive/etcd.py#L155
[3]: https://github.com/charmed-kubernetes/layer-etcd/blob/77eef0c0a49507b74fc90cec0864fdd85555f982/reactive/etcd.py#L157

Changed in charm-etcd:
importance: Undecided → High
status: New → Triaged
George Kraft (cynerva)
Changed in charm-easyrsa:
status: Fix Committed → Fix Released
Changed in charm-etcd:
milestone: none → 1.24
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

We are running into this bug again with etcd on the release tests 2.9.23:
https://solutions.qa.canonical.com/testruns/testRun/14574d08-64af-4efa-bed4-77838bba1403

Revision history for this message
Alexander Balderson (asbalderson) wrote :
Download full text (7.9 KiB)

Bumped into this same thing, although with the private address, for unit.ceph-radosgw:

unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install ERROR private address not found
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install Traceback (most recent call last):
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/core/hookenv.py", line 1377, in network_get_primary_address
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install response = subprocess.check_output(
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/usr/lib/python3.8/subprocess.py", line 516, in run
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install raise CalledProcessError(retcode, process.args,
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'admin']' returned non-zero exit status 1.
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install During handling of the above exception, another exception occurred:
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install Traceback (most recent call last):
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 584, in get_relation_ip
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install address = network_get_primary_address(interface)
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/core/hookenv.py", line 1183, in inner_translate_exc2
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install return f(*args, **kwargs)
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/core/hookenv.py", line 1382, in network_get_primary_address
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install raise NoNetworkBinding("No network binding for {}"
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install charmhelpers.core.hookenv.NoNetworkBinding: No network binding for admin
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install During handling of the above exception, another exception occurred:
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install Traceback (most recent call last):
unit-ceph-radosgw-0: 00:17:20 WARNING unit.ceph-radosgw/0.install File "/var/lib/juju/agents/unit-ceph-rad...

Read more...

summary: - install hook fails on public address not found
+ install hook fails on public or private address not found
Changed in charm-etcd:
assignee: nobody → Gabriel Angelo Sgarbi Cocenza (gabrielcocenza)
Changed in charm-etcd:
status: Triaged → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote :

Worth a note that I made a slight change to acquire the public address as late as possible during hook execution. Not a catch-all, but if nothing else should improve performance generally:
https://github.com/juju/juju/pull/13643

Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :
tags: added: review-needed
Xav Paice (xavpaice)
Changed in charm-elasticsearch:
importance: Undecided → High
Revision history for this message
Xav Paice (xavpaice) wrote :
Download full text (4.3 KiB)

Added charm-elasticsearch. Primarily because functional tests fail due to the charmhelpers Ansible module collecting unit_public_ip as part of the context initiation for Ansible runs.

2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 Traceback (most recent call last):
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/install", line 215, in <module>
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 hooks.execute(sys.argv)
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/charmhelpers/contrib/ansible/__init__.py", line 282, in execute
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 charmhelpers.contrib.ansible.apply_playbook(
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/charmhelpers/contrib/ansible/__init__.py", line 196, in apply_playbook
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 charmhelpers.contrib.templating.contexts.juju_state_to_yaml(
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/charmhelpers/contrib/templating/contexts.py", line 105, in juju_state_to_yaml
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 config['unit_public_address'] = charmhelpers.core.hookenv.unit_get(
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/charmhelpers/core/hookenv.py", line 85, in wrapper
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 res = func(*args, **kwargs)
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/charmhelpers/core/hookenv.py", line 870, in unit_get
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 return json.loads(subprocess.check_output(_args).decode('UTF-8'))
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
2022-01-25 20:34:08 [ERROR] unit-elasticsearch-0.log: 2022-01-25 20:34:00 WARNING unit.elasticsearch/0.install logger.go:60 return ru...

Read more...

Changed in charm-ceph-radosgw:
assignee: nobody → Gabriel Angelo Sgarbi Cocenza (gabrielcocenza)
status: New → In Progress
Revision history for this message
Gabriel Cocenza (gabrielcocenza) wrote :

Regarding the charm-ceph-radosgw it seems that the error occurs when calling assess_status(CONFIGS) after successfully running the install hook. This might be an issue for all classic charms and not from charm-ceph-radosgw isolated.

assess_status calls network_get_primary_address and Juju might not be ready to give this information which causes the CalledProcessError. I will talk with the OpenStack team about this, but I think the approach will be using try and except and block the unit if this error occurs.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (master)
Changed in charm-etcd:
status: In Progress → Fix Committed
Changed in charm-etcd:
status: Fix Committed → Fix Released
Changed in charm-ceph-radosgw:
assignee: Gabriel Angelo Sgarbi Cocenza (gabrielcocenza) → nobody
Changed in charm-elasticsearch:
milestone: none → 22.04
status: New → Fix Released
Revision history for this message
Eric Chen (eric-chen) wrote :

It seems not related to memcache, so I mark it as invalid.

Changed in charm-memcached:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.