[2.9.37] Single unit cannot find binding for an endpoint, but endpoint has binding set in juju

Bug #1996218 reported by Alexander Balderson
This bug affects 1 person
Affects          Status    Importance  Assigned to  Milestone
Canonical Juju   Triaged   Low         Unassigned

Bug Description

On a deployment testing Juju 2.9.37 with Jammy Yoga, a single barbican (primary) unit and its barbican-vault (subordinate) unit failed to get the network address for the internal space:

unit-barbican-2: 07:11:30 ERROR unit.barbican/2.juju-log shared-db:59: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 1374, in network_get_primary_address
    response = subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'shared-db']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-barbican-2/charm/reactive/layer_openstack_api.py", line 33, in default_setup_database
    database.configure(**db)
  File "/var/lib/juju/agents/unit-barbican-2/charm/hooks/relations/mysql-shared/requires.py", line 56, in configure
    hostname = hookenv.network_get_primary_address(
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 1180, in inner_translate_exc2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-barbican-2/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 1379, in network_get_primary_address
    raise NoNetworkBinding("No network binding for {}"
charmhelpers.core.hookenv.NoNetworkBinding: No network binding for shared-db

However, the juju show-unit output for that unit shows that the endpoint has an address set and the binding is in place:

  - relation-id: 59
    endpoint: shared-db
    related-endpoint: shared-db
    application-data: {}
    related-units:
      barbican-mysql-router/0:
        in-scope: true
        data:
          egress-subnets: 192.168.33.188/32
          ingress-address: 192.168.33.188
          private-address: 192.168.33.188
      barbican-mysql-router/1:
        in-scope: true
        data:
          egress-subnets: 192.168.33.171/32
          ingress-address: 192.168.33.171
          private-address: 192.168.33.171
      barbican-mysql-router/2:
        in-scope: true
        data:
          ingress-address: 10.246.65.100
          private-address: 10.246.65.100

The testrun can be found at:
https://solutions.qa.canonical.com/v2/testruns/48b95cac-f227-4474-8935-6d44d6bdc9e7/
and the crashdump can be found at:
https://oil-jenkins.canonical.com/artifacts/48b95cac-f227-4474-8935-6d44d6bdc9e7/generated/generated/openstack/juju-crashdump-openstack-2022-11-10-07.13.14.tar.gz
For each unit, the crashdump contains the output of juju show-status-log, juju show-machine, and juju show-unit, where the bindings and addresses can be seen.

Ian Booth (wallyworld) wrote :

When network-get is run, it is not guaranteed that Juju has collated all the address info (link-layer devices, instance IPs, etc.) for the host machine. The API call may therefore return zero network info records, which results in the charmhelpers "NoNetworkBinding" error.

The Juju agent on the host machine gathers the link layer device info, and the controller polls for cloud allocated host instance addresses, so the network info does eventually become available.
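To illustrate the "zero network info records" case, here is a minimal sketch of how a charm might tell "no address info yet" apart from a usable result. The key names ("bind-addresses", "ingress-addresses", "addresses") are an assumption about the output of network-get with --format json and may differ between Juju versions; has_address_info and parse_network_get are hypothetical helpers, not part of charmhelpers.

```python
import json


def has_address_info(network_info):
    """Return True if a parsed network-get result carries at least one address.

    The key names used here are assumptions about the network-get output.
    """
    if not network_info:
        return False
    if network_info.get("ingress-addresses"):
        return True
    for device in network_info.get("bind-addresses") or []:
        if device.get("addresses"):
            return True
    return False


def parse_network_get(raw_json):
    """Parse network-get JSON text and report whether it is usable yet."""
    info = json.loads(raw_json) if raw_json.strip() else {}
    return info, has_address_info(info)
```

A charm could treat a False result as "not ready yet" rather than as a missing binding.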

The charm needs to be resilient to the fact that the address info may not immediately be known. It needs to take account of that error and try again a short time later. It needs to set its status to "Waiting" with a suitable message. And if after several attempts the address info is not available, it should set its status to "Blocked".
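The behaviour described above could be sketched roughly as follows. This is not the charm's actual code: NoAddressYet, fetch_primary_address, set_status and wait_for_address are hypothetical names, and the attempt count and delay are arbitrary. The only hook tools used are network-get (with the same arguments as in the traceback) and status-set.

```python
import subprocess
import time


class NoAddressYet(Exception):
    """Raised when Juju has no address for the binding yet (hypothetical)."""


def fetch_primary_address(binding):
    """Ask Juju for the binding's primary address via the network-get hook tool."""
    try:
        out = subprocess.check_output(["network-get", "--primary-address", binding])
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        raise NoAddressYet(binding) from exc
    address = out.decode("utf-8").strip()
    if not address:
        raise NoAddressYet(binding)
    return address


def set_status(state, message):
    """Report workload status through the status-set hook tool."""
    subprocess.check_call(["status-set", state, message])


def wait_for_address(binding, fetch=fetch_primary_address, report=set_status,
                     attempts=5, delay=10.0, sleep=time.sleep):
    """Return the address for binding, or None after `attempts` failed tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(binding)
        except NoAddressYet:
            if attempt == attempts:
                break
            # Address info may simply not be collated yet: wait and try again.
            report("waiting", "No address for {} yet (attempt {}/{})".format(
                binding, attempt, attempts))
            sleep(delay)
    # Every attempt failed: surface the problem to the operator.
    report("blocked", "No address for {} after {} attempts".format(binding, attempts))
    return None
```

A handler would call wait_for_address("shared-db") and skip database configuration when it returns None. Whether to sleep inside the hook or defer to a later hook is a design choice this sketch does not settle.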

If this error happens, can you retry the hook? It should run OK the next time, assuming the address info has been populated. That would confirm the above theory and indicate that the charm needs to be fixed.
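For a test harness, retrying the hook could be scripted along these lines. This is only a sketch: retry_failed_hook is a hypothetical helper, and it assumes the juju client is on the PATH and that the unit is in an error state. By default, juju resolved re-runs the failed hook.

```python
import subprocess


def retry_failed_hook(unit, run=subprocess.run):
    """Mark the unit's error as resolved so that Juju re-runs the failed hook."""
    # `juju resolved <unit>` retries the failed hook by default;
    # `--no-retry` would skip the re-run, so it is deliberately not passed.
    command = ["juju", "resolved", unit]
    run(command, check=True)
    return command
```

For the unit in this report, the call would be retry_failed_hook("barbican/2").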

Changed in juju:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Moises Emilio Benzan Mora (moisesbenzan) wrote :
Changed in juju:
status: Expired → New
Changed in juju:
milestone: none → 2.9-backlog
importance: Undecided → Low
status: New → Triaged
Changed in juju:
milestone: 2.9-backlog → none
Jeffrey Chang (modern911) wrote :

SolQA saw this on 2 juju 3.1.6 candidate runs over the weekend.

testrun 1: https://solutions.qa.canonical.com/testruns/1639761c-78bf-40fc-9941-c0d154fb0ff8
crashdump 1: https://oil-jenkins.canonical.com/artifacts/1639761c-78bf-40fc-9941-c0d154fb0ff8/generated/generated/openstack/juju-crashdump-openstack-2023-10-01-18.33.41.tar.gz
testrun 2: https://solutions.qa.canonical.com/testruns/ef1a6e6e-cf3e-40ce-a07e-46cad78300c7
crashdump 2: https://oil-jenkins.canonical.com/artifacts/ef1a6e6e-cf3e-40ce-a07e-46cad78300c7/generated/generated/openstack/juju-crashdump-openstack-2023-10-02-00.11.13.tar.gz

Error logs
2023-10-01 15:29:37 DEBUG unit.rabbitmq-server/2.juju-log server.go:325 Not updating clients: leader node is ready:False, client node is ready:False
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 Traceback (most recent call last):
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/install.real", line 1243, in <module>
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 rabbit.assess_status(rabbit.ConfigRenderer(rabbit.CONFIG_FILES()))
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 210, in __init__
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 ctxt.update(svc_context())
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbitmq_context.py", line 146, in __call__
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 ssl_mode, external_ca = ssl_utils.get_ssl_mode()
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/ssl_utils.py", line 64, in get_ssl_mode
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 relation_certs = get_relation_cert_data()
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/ssl_utils.py", line 59, in get_relation_cert_data
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 _, hostname = get_unit_amqp_endpoint_data()
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/ssl_utils.py", line 47, in get_unit_amqp_endpoint_data
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 ip = get_relation_ip(
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/contrib/network/ip.py", line 571, in get_relation_ip
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 address = network_get_primary_address(interface)
2023-10-01 15:29:37 WARNING unit.rabbitmq-server/2.install logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/core/hookenv.py", line 1180, in inner_translate_exc2
2023-10-01 15:29:37 WARN...


Bas de Bruijne (basdbruijne) wrote :

I have another 2 occurrences on juju 3.3-beta2. The symptom is different: the run did not fail on the network-get command, but on a missing address in one of the spaces. For example:

https://solutions.qa.canonical.com/testruns/5eaab1be-81cd-41db-8927-9db0055c80cb/ does not get an address on the ceph-public-address space for machine 2/lxd/2. Crashdumps and configs are here: https://oil-jenkins.canonical.com/artifacts/5eaab1be-81cd-41db-8927-9db0055c80cb/index.html
