[2.8.2-rc1] Hook failure: "No network binding for <endpoint>" when endpoint binding configured in bundle

Bug #1891044 reported by Michael Skalka
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Canonical Juju | Fix Released | High | Joseph Phillips |

Bug Description

As seen in this test run: https://solutions.qa.canonical.com/qa/testRun/2c9e93ba-9053-43f1-ac65-7fc61655f186
Bundle here: https://oil-jenkins.canonical.com/artifacts/2c9e93ba-9053-43f1-ac65-7fc61655f186/config/config/bundle.yaml
Controller crashdump: https://oil-jenkins.canonical.com/artifacts/2c9e93ba-9053-43f1-ac65-7fc61655f186/generated/generated/juju_maas_controller/juju-crashdump-controller-2020-08-10-08.54.49.tar.gz
Model crashdump: https://oil-jenkins.canonical.com/artifacts/2c9e93ba-9053-43f1-ac65-7fc61655f186/generated/generated/openstack/juju-crashdump-openstack-2020-08-10-08.55.32.tar.gz

In the linked bundle, the endpoint binding for "cluster" is clearly set. However, during hook execution the unit is unable to run "network-get" to read the binding configuration, which results in the hook error below.

Hook error:

var/log/juju/unit-mysql-0.log:
...
2020-08-10 08:52:27 DEBUG config-changed Traceback (most recent call last):
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/charmhelpers/core/hookenv.py", line 1360, in network_get_primary_address
2020-08-10 08:52:27 DEBUG config-changed stderr=subprocess.STDOUT).decode('UTF-8').strip()
2020-08-10 08:52:27 DEBUG config-changed File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
2020-08-10 08:52:27 DEBUG config-changed **kwargs).stdout
2020-08-10 08:52:27 DEBUG config-changed File "/usr/lib/python3.6/subprocess.py", line 438, in run
2020-08-10 08:52:27 DEBUG config-changed output=stdout, stderr=stderr)
2020-08-10 08:52:27 DEBUG config-changed subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'cluster']' returned non-zero exit status 1.
2020-08-10 08:52:27 DEBUG config-changed
2020-08-10 08:52:27 DEBUG config-changed During handling of the above exception, another exception occurred:
2020-08-10 08:52:27 DEBUG config-changed
2020-08-10 08:52:27 DEBUG config-changed Traceback (most recent call last):
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 1148, in <module>
2020-08-10 08:52:27 DEBUG config-changed main()
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 1138, in main
2020-08-10 08:52:27 DEBUG config-changed hooks.execute(sys.argv)
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/charmhelpers/core/hookenv.py", line 943, in execute
2020-08-10 08:52:27 DEBUG config-changed self._hooks[hook_name]()
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
2020-08-10 08:52:27 DEBUG config-changed return f(*args, **kwargs)
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 536, in config_changed
2020-08-10 08:52:27 DEBUG config-changed hosts = get_cluster_hosts()
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/percona_utils.py", line 255, in get_cluster_hosts
2020-08-10 08:52:27 DEBUG config-changed local_cluster_address = get_cluster_host_ip()
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/percona_utils.py", line 893, in get_cluster_host_ip
2020-08-10 08:52:27 DEBUG config-changed cluster_addr = network_get_primary_address('cluster')
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/charmhelpers/core/hookenv.py", line 1164, in inner_translate_exc2
2020-08-10 08:52:27 DEBUG config-changed return f(*args, **kwargs)
2020-08-10 08:52:27 DEBUG config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/charmhelpers/core/hookenv.py", line 1364, in network_get_primary_address
2020-08-10 08:52:27 DEBUG config-changed .format(binding))
2020-08-10 08:52:27 DEBUG config-changed charmhelpers.core.hookenv.NoNetworkBinding: No network binding for cluster
...
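For context on the chained traceback above: the charm calls a charmhelpers wrapper that runs `network-get --primary-address cluster` and converts one specific failure into `NoNetworkBinding`. The following is a simplified Python sketch of that pattern, modelled on the frames in the log rather than copied from charmhelpers. The exact error text matched, the stand-in exception class, and the injectable `run` parameter (added so the function can be exercised outside a hook) are assumptions for illustration.

```python
import subprocess
from typing import Callable


class NoNetworkBinding(Exception):
    """Stand-in for charmhelpers.core.hookenv.NoNetworkBinding (illustrative)."""


def network_get_primary_address(
    binding: str,
    run: Callable[..., bytes] = subprocess.check_output,
) -> str:
    """Return the primary address that network-get reports for `binding`."""
    cmd = ['network-get', '--primary-address', binding]
    try:
        return run(cmd, stderr=subprocess.STDOUT).decode('UTF-8').strip()
    except subprocess.CalledProcessError as exc:
        output = (exc.output or b'').decode('UTF-8')
        # Assumed marker text: only a "no config for this binding" failure
        # is relabelled; any other network-get failure is re-raised as-is.
        if 'no network config found for binding' in output:
            # Raised inside the except block without "from", so Python
            # chains it implicitly ("During handling of the above exception").
            raise NoNetworkBinding('No network binding for {}'.format(binding))
        raise
```

Because the new exception is raised inside the `except` block without `from`, Python prints the "During handling of the above exception, another exception occurred" banner seen in the log. The underlying `network-get` failure is the real signal; `NoNetworkBinding` only relabels it.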

Michael Skalka (mskalka)
description: updated
Pen Gale (pengale) wrote :

This might be the issue that the OpenStack team is seeing in bug #1849901.

Pen Gale (pengale) wrote :

This might be fixed by the work Joe did to fix the AWS bindings issue. It is probably a good idea to double-check that before digging into this one.

Joseph Phillips (manadart) wrote :

I don't believe this is unrelated to the AWS bindings/constraints issue.

Changed in juju:
status: New → Triaged
assignee: nobody → Joseph Phillips (manadart)
Joseph Phillips (manadart) wrote :

This *might* be a particular manifestation of the bug fixed here:
https://github.com/juju/juju/pull/11921

It could also be an HA-induced race.

In any case, I am applying polling more broadly in the network-get logic. This should increase resilience in this scenario, where network-get should always find and return an address in IAAS models.
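The fix described above lives in Juju's own network-get logic (written in Go), which is not reproduced here. As a charm-side illustration of the same polling idea, here is a minimal Python sketch, assuming a hypothetical helper `network_get_with_retry` that re-runs `network-get --primary-address` a fixed number of times with a fixed delay. The helper name, the `attempts`/`delay` defaults, and the injectable `run`/`sleep` parameters are all illustrative assumptions, not part of Juju or charmhelpers.

```python
import subprocess
import time
from typing import Callable


def network_get_with_retry(
    binding: str,
    attempts: int = 5,
    delay: float = 2.0,
    run: Callable[..., bytes] = subprocess.check_output,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: poll network-get until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    cmd = ['network-get', '--primary-address', binding]
    # All but the last attempt: swallow the failure and wait before retrying.
    for _ in range(attempts - 1):
        try:
            return run(cmd, stderr=subprocess.STDOUT).decode('UTF-8').strip()
        except subprocess.CalledProcessError:
            sleep(delay)
    # Final attempt: any error propagates to the caller unchanged.
    return run(cmd, stderr=subprocess.STDOUT).decode('UTF-8').strip()
```

A fixed delay keeps the sketch simple; a real implementation might prefer exponential backoff or an overall deadline so that a genuinely missing binding still fails in bounded time.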

Joseph Phillips (manadart) wrote :
Changed in juju:
milestone: none → 2.8.2
importance: Undecided → High
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → In Progress
Joseph Phillips (manadart) wrote :

A reproducer has been observed for this bug.

It has to do with asynchronous provisioner tasks and the set of host network changes required to land containers in spaces.

If the same device requires bridging for multiple containers *and* that device happens to be a VLAN, recreating the bridge causes it to get a new hardware address. This causes issues with correctly applying machine-sourced link-layer updates.

This does not happen when bridging a regular Ethernet device, because the bridge will share the same MAC address.
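To make the failure mode concrete, here is a toy Python model of why a changed hardware address breaks update matching. It is not Juju's reconciliation code (which is in Go); the device names, MAC values, and the rule of matching reported devices to known devices by MAC alone are illustrative assumptions. The point is only that a record keyed on a MAC cannot be found once the bridge returns with a different one, while the Ethernet bridge, which keeps its MAC, still matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    mac: str


def match_by_mac(known: List[Device], reported: List[Device]) -> Dict[str, Optional[str]]:
    """Map each reported device name to the known device with the same MAC, or None."""
    by_mac = {d.mac: d.name for d in known}
    return {d.name: by_mac.get(d.mac) for d in reported}


# Hypothetical state recorded before the bridges were recreated.
known = [
    Device('br-eth0', '52:54:00:00:00:01'),      # bridge over a plain Ethernet NIC
    Device('br-eth0.100', '52:54:00:00:00:02'),  # bridge over a VLAN device
]
# Hypothetical machine report after recreation: the Ethernet bridge keeps its
# MAC, but the VLAN bridge comes back with a new one.
reported = [
    Device('br-eth0', '52:54:00:00:00:01'),
    Device('br-eth0.100', '52:54:00:00:00:99'),
]
print(match_by_mac(known, reported))
# {'br-eth0': 'br-eth0', 'br-eth0.100': None}
```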

Joseph Phillips (manadart) wrote :

I believe this to be the real solution for the issue:
https://github.com/juju/juju/pull/11943

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released