Juju adds any RFC1918 address it finds on any state server to the apiaddresses list in agent.conf

Bug #1554436 reported by Mick Gregg
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Won't Fix  Low         Unassigned
juju-core       Won't Fix  Low         Unassigned

Bug Description

When deciding which IP addresses to use as API addresses, Juju appears to use every RFC1918 address it finds bound to an interface on a state server. The exception may be addresses bound to lxcbr0 (or in 10.0.3.0/24). The resulting list can include addresses to which other Juju units have no route.
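
The behaviour described above can be illustrated with a small Python sketch. It is only a model of what we observe, not Juju's actual selection code: the function names are hypothetical, and the lxcbr0 exclusion is assumed to be the 10.0.3.0/24 range mentioned above.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Assumption: the lxcbr0 exception corresponds to this range.
LXC_BRIDGE_NET = ipaddress.ip_network("10.0.3.0/24")


def is_rfc1918(addr):
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


def observed_api_candidates(bound_addrs):
    """Model of the observed behaviour: keep every RFC 1918 address
    bound on the state server except those in the LXC bridge range."""
    return [
        a for a in bound_addrs
        if is_rfc1918(a) and ipaddress.ip_address(a) not in LXC_BRIDGE_NET
    ]


# Hypothetical addresses bound on one state server.
bound = ["10.20.0.5", "192.168.122.1", "10.0.3.1", "8.8.8.8"]
candidates = observed_api_candidates(bound)
```

Note that 192.168.122.1 (virbr0) survives this filter even though it is never a usable API address, which is the second problem described below.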

We've seen this behaviour recently with 1.24.7, at least.

This causes us two problems today in production:
1. Our OpenStack private clouds often use an RFC1918 range for their 'External' networks. With the exception of neutron-gateway units, service units deployed to LXC containers have no route to the External network.
2. Juju state servers may also be hosting KVM machines. When libvirt-bin is installed, the default virsh net is created with 192.168.122.1 bound to virbr0 on the host. Some Juju agents may find this address locally if they too have the default virsh net, but it won't be a Juju API address. Others won't have a route to the address at all.

We see a good deal of Juju error logging for failed network connections, with units attempting to connect to rsyslogd on state servers at these bad addresses, and a number of the failing connections sit in SYN_SENT state.
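
For anyone wanting to confirm the SYN_SENT connections on a unit, here is a rough Python sketch that lists the remote endpoints of IPv4 sockets in that state. It assumes a Linux host exposing /proc/net/tcp, where state code 02 means SYN_SENT, and a little-endian machine (addresses in that file are in host byte order). The function names are my own.

```python
import ipaddress

SYN_SENT = "02"  # TCP state code for SYN_SENT in /proc/net/tcp


def decode_endpoint(hex_endpoint):
    """Turn 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port).
    Assumes a little-endian host, so the address bytes are reversed."""
    ip_hex, port_hex = hex_endpoint.split(":")
    ip = ipaddress.IPv4Address(bytes.fromhex(ip_hex)[::-1])
    return str(ip), int(port_hex, 16)


def syn_sent_remotes(proc_net_tcp_text):
    """Return the remote (ip, port) of every socket in SYN_SENT state."""
    remotes = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == SYN_SENT:
            remotes.append(decode_endpoint(fields[2]))
    return remotes


# Made-up, truncated sample in the /proc/net/tcp layout.
sample = (
    "  sl  local_address rem_address   st\n"
    "   0: 0500140A:C350 017AA8C0:1972 02\n"
    "   1: 0500140A:0016 0200140A:D431 01\n"
)
stuck = syn_sent_remotes(sample)
```

On a real unit the input would come from something like `open("/proc/net/tcp").read()`.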

There is a workaround: remove the erroneous addresses from each agent.conf and restart the Juju agents. Of course, the erroneous addresses are re-added on agent restart, so the bug is evident again on subsequent restarts.
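
A rough Python sketch of the manual clean-up, for illustration only. It assumes the apiaddresses key in agent.conf holds a YAML-style list of host:port entries, one per "- " line; the sample text, the other key names and the function name are made up. It only rewrites text: it does not restart agents, and it does not stop the addresses being re-added.

```python
import ipaddress


def prune_api_addresses(conf_text, allowed_cidr):
    """Return conf_text with apiaddresses entries outside allowed_cidr removed."""
    allowed = ipaddress.ip_network(allowed_cidr)
    kept_lines = []
    in_list = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == "apiaddresses:":
            in_list = True
            kept_lines.append(line)
            continue
        if in_list and stripped.startswith("- "):
            host = stripped[2:].rsplit(":", 1)[0]
            try:
                keep = ipaddress.ip_address(host) in allowed
            except ValueError:
                keep = True  # hostnames or unparsable entries are left alone
            if keep:
                kept_lines.append(line)
            continue
        in_list = False  # any other line ends the list
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"


# Made-up agent.conf fragment.
conf = (
    "tag: machine-0\n"
    "apiaddresses:\n"
    "- 10.20.0.5:17070\n"
    "- 192.168.122.1:17070\n"
    "apiport: 17070\n"
)
pruned = prune_api_addresses(conf, "10.20.0.0/24")
```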

I suggest, as with api-port, Juju could use an api-cidr option.
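
To make the suggestion concrete, a minimal Python sketch of how such an option might be applied. The api-cidr option does not exist; the function name and the sample values are hypothetical.

```python
import ipaddress


def filter_by_api_cidr(candidates, api_cidr):
    """Keep only the candidate addresses that fall inside api_cidr."""
    net = ipaddress.ip_network(api_cidr)
    return [a for a in candidates if ipaddress.ip_address(a) in net]


# Hypothetical candidates found on a state server, and a hypothetical
# api-cidr value naming the network that units can actually reach.
found = ["10.20.0.5", "192.168.122.1", "172.16.4.9"]
api_addresses = filter_by_api_cidr(found, "10.20.0.0/16")
```

With a filter like this, addresses on virbr0 or on an unrouted 'External' network would never reach apiaddresses in the first place.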

Cheryl Jennings (cherylj) wrote :

It is known behavior that juju will list all IPs (except lxc bridge IPs) as possible addresses for state servers. There is work in 2.0 to improve things, but I don't believe there are any plans to address this in 1.25. Will double check.

Cheryl Jennings (cherylj) wrote :

Talked with frobware about this today, and it seems that the API host ports selection for 2.0 is still in development.

As far as 1.25 is concerned, I'm wondering if we could have a config option to specify a subnet to use (or exclude) for API host port addresses.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: network
Changed in juju-core:
milestone: none → 1.25.5
Cheryl Jennings (cherylj) wrote :

Mick - could you help me get a better idea of the impact this is having? Is it just extra errors filling up the log? How often are you hitting problem #2 in the original description? Understanding the impact will help us prioritize this appropriately.

Mick Gregg (macgreagoir) wrote :

Cheryl,

Thanks for looking into this.

Regarding problem #2, the Canonical Cloud Reference Architecture we now follow for the BootStacks we build has nova-compute (therefore libvirt-bin et cetera) on all metal deployed by Juju. Each will have the default virt net (virbr0 with 192.168.122.1).

As you suggest, the Juju logs fill up with error messages, and we see several failed connections in SYN_SENT state on all units, as noted in the bug description. From an operations view, both issues can cloud troubleshooting (if you'll excuse the pun). There is really no such thing as 'harmless' error logging: if it says 'bad', it is, and it's dangerous to learn to ignore error messages and bad network connections, as I'm sure you'll appreciate.

I assume the failed connections to rsyslogd on the state servers are an issue for Juju, but you may be better placed to understand the impact than I am. Does this limit the usefulness of debug-log, for example?

Perhaps if the impact, from the Juju view, is not critical, the logging itself could be corrected too. The message says:

    ERROR juju.worker runner.go:223 exited "rsyslog": dial tcp <state server addr>:6514: connection refused

You'll note it does not tell us why this is critical, just that a connection has been refused and that this is important enough to be regarded as an error, rather than a warning.

The accuracy and usefulness of the log message is really a separate bug, though. Even if we knew more about the impact and severity from the log message, we still have erroneous attempts to connect to state servers, which need to be stopped.
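
To see which addresses the erroneous attempts target, log lines shaped like the one quoted above could be tallied with a sketch such as this. The regular expression is based only on that single message, so treat the pattern as an assumption; the function name and the sample lines are made up.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this comment.
DIAL_RE = re.compile(
    r'exited "(?P<worker>[^"]+)": dial tcp '
    r"(?P<host>[^\s:]+):(?P<port>\d+): (?P<reason>.+)$"
)


def count_failed_dials(lines):
    """Count failed dial attempts per (host, port) target."""
    counts = Counter()
    for line in lines:
        match = DIAL_RE.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts


# Made-up log lines in the same shape as the quoted message.
log_lines = [
    'ERROR juju.worker runner.go:223 exited "rsyslog": dial tcp 192.168.122.1:6514: connection refused',
    'ERROR juju.worker runner.go:223 exited "rsyslog": dial tcp 192.168.122.1:6514: connection refused',
    'ERROR juju.worker runner.go:223 exited "rsyslog": dial tcp 172.16.4.9:6514: connection refused',
    "INFO juju.worker some unrelated line",
]
failed = count_failed_dials(log_lines)
```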

Cheers,

Mick

Changed in juju-core:
milestone: 1.25.5 → 1.25.6
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.6 → 1.25.7
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
Changed in juju-core:
status: Triaged → Won't Fix
milestone: 1.25.7 → none
Anastasia (anastasia-macmood) wrote :

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Changed in juju:
milestone: 2.1.0 → none
Joseph Phillips (manadart) wrote :

This has since been made configurable via juju-mgmt-space.

Changed in juju:
status: Triaged → Won't Fix
importance: High → Low
Changed in juju-core:
importance: High → Low