[2.2.2] private-address relation setting is not based on a default space binding

Bug #1708492 reported by Dmitrii Shcherbakov
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Witold Krecicki
Milestone: 2.2.3

Bug Description

Original title: "unable to establish a tcp connection to memcached running in a lxd container due to ufw blocking traffic"

See comment #5 for the result of this investigation.

ubuntu@maas:~$ juju controllers
Use --refresh flag with this command to see the latest information.

Controller Model User Access Cloud/Region Models Machines HA Version
samaas* default admin superuser samaas 2 1 none 2.2.2

Unable to establish a TCP connection to a container with ufw enabled. Packets reach the veth interface but do not hit memcached running in the container.

There is no 'connection refused', so this is not a problem with the memcached process not having a socket bound to the correct address:port. I also reconfigured it to bind a socket explicitly to its IPv4 address on port 11211 instead of INADDR_ANY (0.0.0.0), but that made no difference.
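
For reference, the two failure modes can be told apart quickly: a missing or mis-bound listener answers with an RST ('connection refused'), while a firewall that silently drops SYN packets, which is what the ufw DENY rule does here, produces a timeout. A rough Python sketch of that check (addresses taken from the outputs above):

#!/usr/bin/env python3
# Rough sketch: distinguish "nothing listening" (RST -> ConnectionRefusedError)
# from "packets silently dropped" (no reply -> timeout). Host/port as in the report.
import socket

def probe(host, port, timeout=3.0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "connected"        # listener reachable
    except ConnectionRefusedError:
        return "refused"          # RST received: nothing bound on that address:port
    except socket.timeout:
        return "timed out"        # no reply at all: likely dropped by a firewall
    finally:
        s.close()

print(probe("10.232.4.118", 11211))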

# container host

# before reconfiguring not to listen on 0.0.0.0:11211
# a telnet connection to port 11211 launched inside the container results in an established connection
ubuntu@kachina:/usr/share/bcc/tools$ sudo ./tcptracer -p 171184
Tracing TCP established connections. Ctrl-C to end.
T PID COMM IP SADDR DADDR SPORT DPORT
A 171184 memcached 4 127.0.0.1 127.0.0.1 11211 42392

ubuntu@kachina:~$ uname -r
4.10.0-28-generic

ubuntu@kachina:~$ apt policy lxd
lxd:
  Installed: 2.0.10-0ubuntu1~16.04.1
  Candidate: 2.0.10-0ubuntu1~16.04.1
  Version table:
     2.15-0ubuntu6~ubuntu16.04.1 100
        100 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages
 *** 2.0.10-0ubuntu1~16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.2-0ubuntu1~16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     2.0.0-0ubuntu4 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

capabilities:

ubuntu@juju-51cde3-6-lxd-4:~$ sudo capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,37
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=0(root)

ubuntu@juju-51cde3-6-lxd-4:~$ ip -4 -o a s
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
46: eth0 inet 10.232.4.118/21 brd 10.232.7.255 scope global eth0\ valid_lft forever preferred_lft forever

ubuntu@juju-51cde3-6-lxd-4:~$ ss -tulpn 'sport = 11211'
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
udp UNCONN 0 0 10.232.4.118:11211 *:*
tcp LISTEN 0 128 10.232.4.118:11211 *:*

ubuntu@juju-51cde3-6-lxd-4:~$ pgrep -af memcached
4821 bash /var/lib/juju/init/jujud-unit-memcached-2/exec-start.sh
4825 /var/lib/juju/tools/unit-memcached-2/jujud unit --data-dir /var/lib/juju --unit-name memcached/2 --debug
27146 /usr/bin/memcached -m 768 -p 11211 -u memcache -l 10.232.4.118 -c 1024 -f 1.25

# the correct rules are present
ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw status
Status: active

To Action From
-- ------ ----
11211/tcp ALLOW 10.232.24.21
11211/tcp ALLOW 10.232.24.19
11211/tcp ALLOW 10.232.24.13
11211/tcp ALLOW 10.232.4.94
11211/tcp ALLOW 10.232.4.127
22 ALLOW Anywhere
11211/tcp DENY Anywhere
22 (v6) ALLOW Anywhere (v6)
11211/tcp (v6) DENY Anywhere (v6)

ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw disable
Firewall stopped and disabled on system startup

# from a container on a different host, although it is also reproducible from the same host, 'outside' of the container

# able to connect; the ERROR replies after sending newlines are expected since memcached wants a valid command instead
ubuntu@juju-51cde3-5-lxd-3:~$ telnet 10.232.4.118 11211
Trying 10.232.4.118...
Connected to 10.232.4.118.
Escape character is '^]'.

ERROR

ERROR
^]
telnet> ^CConnection closed.

ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw enable
Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup

# cannot connect anymore
ubuntu@juju-51cde3-5-lxd-3:~$ telnet 10.232.4.118 11211
Trying 10.232.4.118...

ubuntu@kachina:~$ sudo lxc exec juju-51cde3-6-lxd-4 -- tcpdump -n -i eth0 src port 11211 or dst port 11211
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C17:23:07.218430 IP 10.232.4.91.50930 > 10.232.4.118.11211: Flags [S], seq 1066636968, win 29200, options [mss 1460,sackOK,TS val 1965982543 ecr 0,nop,wscale 7], length 0
17:23:08.224381 IP 10.232.4.91.50930 > 10.232.4.118.11211: Flags [S], seq 1066636968, win 29200, options [mss 1460,sackOK,TS val 1965982795 ecr 0,nop,wscale 7], length 0
17:23:10.240390 IP 10.232.4.91.50930 > 10.232.4.118.11211: Flags [S], seq 1066636968, win 29200, options [mss 1460,sackOK,TS val 1965983299 ecr 0,nop,wscale 7], length 0

ubuntu@juju-51cde3-6-lxd-4:~$ sudo iptables-save
http://paste.ubuntu.com/25234384/

# container config:

sudo lxc config show juju-51cde3-6-lxd-4
http://paste.ubuntu.com/25234378/

ubuntu@kachina:~$ sudo lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
name: default

---

ubuntu@kachina:~$ sudo lxc info juju-51cde3-6-lxd-4

http://paste.ubuntu.com/25234380/

Tags: cpec
Dmitrii Shcherbakov (dmitriis) wrote:

Looks like the ALLOW rules somehow get lower priority or are not applied, while the DENY rules are:

To Action From
-- ------ ----
11211/tcp ALLOW 10.232.24.21
11211/tcp ALLOW 10.232.24.19
11211/tcp ALLOW 10.232.24.13
11211/tcp ALLOW 10.232.4.94
11211/tcp ALLOW 10.232.4.127
11211/tcp DENY Anywhere # <---------------- that
11211/tcp (v6) DENY Anywhere (v6)

ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw allow 11211/tcp
Rule updated
Rule updated (v6)

# Able to connect now

ubuntu@juju-51cde3-5-lxd-3:~$ telnet 10.232.4.118 11211
Trying 10.232.4.118...
Connected to 10.232.4.118.
Escape character is '^]'.
^]

telnet> Connection closed.

# enable DENY again

ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw deny 11211/tcp
Rule updated
Rule updated (v6)
ubuntu@juju-51cde3-6-lxd-4:~$ sudo ufw status
Status: active

To Action From
-- ------ ----
11211/tcp ALLOW 10.232.24.21
11211/tcp ALLOW 10.232.24.19
11211/tcp ALLOW 10.232.24.13
11211/tcp ALLOW 10.232.4.94
11211/tcp ALLOW 10.232.4.127
22 ALLOW Anywhere
11211/tcp DENY Anywhere
22 (v6) ALLOW Anywhere (v6)
11211/tcp (v6) DENY Anywhere (v6)

# no luck again
ubuntu@juju-51cde3-5-lxd-3:~$ telnet 10.232.4.118 11211
Trying 10.232.4.118...
^C

Dmitrii Shcherbakov (dmitriis) wrote:

The required rules are present and are in the right order:

ubuntu@juju-51cde3-6-lxd-4:~$ sudo iptables-save | grep 11211
-A ufw-user-input -s 10.232.24.21/32 -p tcp -m tcp --dport 11211 -j ACCEPT
-A ufw-user-input -s 10.232.24.19/32 -p tcp -m tcp --dport 11211 -j ACCEPT
-A ufw-user-input -s 10.232.24.13/32 -p tcp -m tcp --dport 11211 -j ACCEPT
-A ufw-user-input -s 10.232.4.94/32 -p tcp -m tcp --dport 11211 -j ACCEPT
-A ufw-user-input -s 10.232.4.127/32 -p tcp -m tcp --dport 11211 -j ACCEPT
-A ufw-user-input -p tcp -m tcp --dport 11211 -j DROP

# the DROP rule is applied to all the packets coming to that port (pkts 1008, bytes 60480),
# while the needed ACCEPT rules have 0/0 counters:
ubuntu@juju-51cde3-6-lxd-4:~$ sudo iptables -L -v

...

Chain ufw-user-input (1 references)
 pkts bytes target prot opt in out source destination
    0 0 ACCEPT tcp -- any any eth1.juju-51cde3-6-lxd-3.maas anywhere tcp dpt:11211
    0 0 ACCEPT tcp -- any any eth1.juju-51cde3-5-lxd-3.maas anywhere tcp dpt:11211
    0 0 ACCEPT tcp -- any any eth1.juju-51cde3-3-lxd-3.maas anywhere tcp dpt:11211
    0 0 ACCEPT tcp -- any any juju-51cde3-5-lxd-4.maas anywhere tcp dpt:11211
    0 0 ACCEPT tcp -- any any juju-51cde3-3-lxd-4.maas anywhere tcp dpt:11211
    0 0 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh
    0 0 ACCEPT udp -- any any anywhere anywhere udp dpt:ssh
 1008 60480 DROP tcp -- any any anywhere anywhere tcp dpt:11211
...
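
As a quick way to see which rule is actually matching, the per-rule packet counters can also be pulled out programmatically; a rough sketch (assumes the ufw-user-input chain name from the output above and needs root):

#!/usr/bin/env python3
# Sketch: print rules from the ufw-user-input chain whose packet counter is non-zero,
# i.e. the rules that are actually matching traffic.
import subprocess

out = subprocess.check_output(
    ["iptables", "-L", "ufw-user-input", "-v", "-n", "-x"],
    universal_newlines=True,
).splitlines()

for line in out[2:]:                  # skip the "Chain ..." header and the column header
    fields = line.split()
    if fields and fields[0].isdigit() and int(fields[0]) > 0:
        print(line)                   # pkts counter > 0: this rule saw packets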

Dmitrii Shcherbakov (dmitriis) wrote:

It appears that telnet was using an outgoing address (.91) for which there is no ufw ALLOW rule.

ubuntu@juju-51cde3-5-lxd-3:~$ ip -4 -o a s
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
26: eth0 inet 10.232.42.7/21 brd 10.232.47.255 scope global eth0\ valid_lft forever preferred_lft forever
26: eth0 inet 10.232.40.212/21 brd 10.232.47.255 scope global secondary eth0\ valid_lft forever preferred_lft forever
28: eth1 inet 10.232.24.19/21 brd 10.232.31.255 scope global eth1\ valid_lft forever preferred_lft forever
30: eth2 inet 10.232.4.91/21 brd 10.232.7.255 scope global eth2\ valid_lft forever preferred_lft forever

Binding to 10.232.24.19 on the client side is rather pointless: the connection times out because there is no routing set up to reach the destination host from that address (not that it was ever intended):

ubuntu@juju-51cde3-5-lxd-3:~$ ipython3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
Type "copyright", "credits" or "license" for more information.

IPython 2.4.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: import socket

In [2]: s = socket.socket()

In [3]: s.bind(('10.232.24.19', 10000))

In [4]: s.connect(('10.232.4.118', 11211))
ubuntu@juju-51cde3-5-lxd-3:~$ bg
[1]+ ipython3 &

ubuntu@juju-51cde3-5-lxd-3:~$ lsof -n -i -a -p `pgrep -f ipython3`
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python3 1626602 ubuntu 6u IPv4 37861471 0t0 TCP 10.232.24.19:webmin->10.232.4.118:11211 (SYN_SENT)

ubuntu@juju-51cde3-5-lxd-3:~$ ---------------------------------------------------------------------------
TimeoutError Traceback (most recent call last)
<ipython-input-4-e15bfdfcd7ba> in <module>()
----> 1 s.connect(('10.232.4.118', 11211))

TimeoutError: [Errno 110] Connection timed out

[1]+ Stopped ipython3

----

So, the ufw rules are simply incorrect: addresses in 10.232.4.0/21 should be used when setting up filtering rules.

10.232.4.91/21 is the correct address to use for initiating a connection.
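
For reference, the source address the kernel will pick for a given destination can be checked without sending any traffic by 'connecting' a UDP socket and reading back the local address; this is the address the server-side ufw rules would need to allow. A small sketch (the expected result on juju-51cde3-5-lxd-3 is 10.232.4.91, as seen above):

#!/usr/bin/env python3
# Sketch: ask the kernel which local source address it would use to reach a destination.
# connect() on a UDP socket only selects a route/source address; no packet is sent.
import socket

def source_address_for(dest_ip, dest_port=11211):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
    finally:
        s.close()

print(source_address_for("10.232.4.118"))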

Dmitrii Shcherbakov (dmitriis) wrote:

Spaces seem to be correct in my bundle:

ubuntu@maas:~/bundles⟫ grep -P '( gnocchi)|( memcached)': -A11 foundation-converged.yaml
  gnocchi-vip: &gnocchi-vip 10.232.1.212 10.232.40.212
  panko-vip: &panko-vip 10.232.1.213 10.232.40.213
  aodh-vip: &aodh-vip 10.232.1.214 10.232.40.214

  # NTP configuration
  ntp-source: &ntp-source "ntp.ubuntu.com"

  # After bundle has been deployed, log in to Landscape server and create
  # an account. In the account settings, set the Registration key and then
  # configure landscape-client to use that registration-key:
  # juju config landscape-client registration-key=$your_registration_key

--
  gnocchi:
    charm: cs:~james-page/gnocchi
    num_units: 3
    bindings:
      "": *oam-space
      public: *public-space
      admin: *admin-space
      internal: *internal-space
      shared-db: *internal-space
      storage-ceph: *ceph-public-space
      coordinator: *oam-space
    options:
--
  memcached:
    charm: cs:xenial/memcached-16
    bindings:
      "": *oam-space
    num_units: 3
    options:
       allow-ufw-ip6-softfail: true
    to:
    - lxd:3
    - lxd:5
    - lxd:6
relations:

primary-address values are correct for bindings:
ubuntu@maas:~/bundles⟫ juju run --unit memcached/2 'network-get --primary-address cache'
10.232.4.118

ubuntu@maas:~/bundles⟫ juju run --unit gnocchi/2 'network-get --primary-address coordinator'
10.232.4.95

The private-address propagated to the memcached unit over a 'cache' relation is **not** correct:

ubuntu@maas:~/bundles⟫ juju spaces
Space Subnets
admin-space 10.232.16.0/21
ceph-access-space 10.232.24.0/21
ceph-replica-space 10.232.48.0/21
internal-space 172.16.10.0/24
oam-space 10.232.0.0/21
                    10.232.36.0/24
                    192.168.122.0/24
public-space 10.232.40.0/21
undefined 172.17.0.0/16

ubuntu@maas:~/bundles⟫ python3 -c "import ipaddress ; print(ipaddress.ip_address('10.232.24.13') in ipaddress.ip_network('10.232.0.0/21'))"
False

ubuntu@maas:~/bundles⟫ python3 -c "import ipaddress ; print(ipaddress.ip_address('10.232.24.13') in ipaddress.ip_network('10.232.24.0/21'))"
True

An address from ceph-access-space (the 10.232.24.0/21 subnet) is treated as the 'public-address' of the gnocchi/2 unit:

ubuntu@maas:~/bundles⟫ juju run --unit gnocchi/2 'unit-get private-address ; unit-get public-address'
10.232.4.95
10.232.24.21

And it is propagated as the private-address to memcached:

ubuntu@maas:~/bundles⟫ juju run --unit memcached/2 'relation-get -r `relation-ids cache` - gnocchi/2'
private-address: 10.232.24.21
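
The same subnet check generalises to mapping any address back to the space whose subnet contains it; a small sketch using the subnets from the 'juju spaces' output above:

#!/usr/bin/env python3
# Sketch: map an address to the juju space whose subnet contains it,
# using the subnets listed by 'juju spaces' above.
import ipaddress

SPACES = {
    "admin-space":        ["10.232.16.0/21"],
    "ceph-access-space":  ["10.232.24.0/21"],
    "ceph-replica-space": ["10.232.48.0/21"],
    "internal-space":     ["172.16.10.0/24"],
    "oam-space":          ["10.232.0.0/21", "10.232.36.0/24", "192.168.122.0/24"],
    "public-space":       ["10.232.40.0/21"],
    "undefined":          ["172.17.0.0/16"],
}

def space_of(address):
    ip = ipaddress.ip_address(address)
    for space, subnets in SPACES.items():
        if any(ip in ipaddress.ip_network(s) for s in subnets):
            return space
    return None

print(space_of("10.232.24.21"))   # -> ceph-access-space (the address that leaked over the relation)
print(space_of("10.232.4.118"))   # -> oam-space (where the default binding actually points)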

I do not see any overrides in the charm libraries, and especially not for this relation, as it is not that common:

https://github.com/openstack/charms.openstack/search?utf8=%E2%9C%93&q=private-address&type=
https://github.com/openstack-charmers/charm-gnocchi/search?utf8=%E2%9C%93&q=private-address&type=
https://github.com/juju-solutions/charms.reactive/search?utf8=%E2%9C%93&q=private-address&type=
https://github.com/juju/charm-helpers/search?p=3&q=private-address&type=&utf8=%E2%9C%93

summary changed: "unable to establish a tcp connection to memcached running in a lxd container due to ufw blocking traffic" → "[2.2.2] public-address is propagated over a relation instead of private-address but is presented as private-address to the other side"
affects: memcached (Juju Charms Collection) → juju
tags added: cpec

Dmitrii Shcherbakov (dmitriis) wrote:

In essence, something got fixed in 2.2.2:

https://github.com/juju/docs/blame/master/src/en/reference-release-notes.md#L144

"‘unit-get private-address’ now uses the default binding for an application."

But not to the extent we would like.

private-address retrieval in cache-relation-joined:
https://bazaar.launchpad.net/~memcached-team/charms/trusty/memcached/trunk/view/head:/hooks/memcached_hooks.py#L154

Provides side of the 'memcache' interface:

https://bazaar.launchpad.net/~memcached-team/charms/trusty/memcached/trunk/view/head:/metadata.yaml#L17

provides:
  cache:
    interface: memcache

Requires side:
https://github.com/openstack-charmers/charm-gnocchi/blob/8372a3ed526b0065d8be42f1230eef454c913c97/src/metadata.yaml#L23-L24

  coordinator-memcached:
    interface: memcache

Dmitrii Shcherbakov (dmitriis) wrote:

The private address is written to the relation settings here, at the controller (apiserver):
https://github.com/juju/juju/blob/2.2/apiserver/uniter/uniter.go#L1078

SettingsAddress, which supplies the private-address:

https://github.com/juju/juju/blob/2.2/state/relationunit.go#L471
...
if crossmodel, err := ru.relation.IsCrossModel(); err != nil {
    return network.Address{}, errors.Trace(err)
} else if !crossmodel {
    return unit.PrivateAddress()
}
...

As this is not a cross-model relation, we should hit PrivateAddress().

Dmitrii Shcherbakov (dmitriis) wrote:

Removed and re-added the relation between gnocchi and memcached (I had to destroy gnocchi/0 first, as it prevented the removal due to how reactive charms work: it was blocked on connecting to memcached). Same result after that.

Attached a mongo dump after a new relation was added.

Dmitrii Shcherbakov (dmitriis) wrote:

More debugging:

1.

unit.PrivateAddress() is used in a non-CMR 'if' branch:
https://github.com/juju/juju/blob/develop/state/relationunit.go#L486

which is not space-aware:
https://github.com/juju/juju/blob/develop/state/unit.go#L778-L786

2.

On a simple model with two ubuntu (https://jujucharms.com/ubuntu/10) charms, with the metadata.yaml files modified to match the original setup and the code changed to use GetNetworkInfoForSpaces with extra logging:
http://paste.ubuntu.com/25257187/

So the problem still persists: http://paste.ubuntu.com/25258545/ (see https://github.com/juju/juju/blob/develop/state/machine_linklayerdevices.go#L1162)

3. With more logging in GetNetworkInfoForSpaces the problem is more apparent:

http://paste.ubuntu.com/25258545/

At some point, this condition is triggered for an address in ceph-access-space:

if spaces.Contains("") && privateAddress.Value == addr.Value() {

And, therefore, results in:

results[""] = {[{00:16:3e:02:4f:31 eth1 [{10.232.24.3 10.232.24.0/21}]}] <nil>}

This is incorrect: the code should not depend on the private address at all, only on whatever we have chosen as the default space binding.
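
A toy model of that condition (a simplification for illustration, not the actual juju code) shows the failure mode: keying the default binding on whichever address equals the machine's private-address can hand out an address from a completely different space, while selecting by the bound space's subnet gives the expected one:

#!/usr/bin/env python3
# Toy model (not juju code): the buggy selection for the default binding ""
# matches addresses against the machine's private-address instead of the
# subnet of the space the default binding is bound to.
import ipaddress

addresses = ["10.232.24.3", "10.232.4.230"]    # one NIC in ceph-access-space, one in oam-space
private_address = "10.232.24.3"                # what unit.PrivateAddress() happened to return
default_space_subnet = "10.232.0.0/21"         # "" is bound to oam-space

def buggy_default_binding(addrs):
    # mirrors: if spaces.Contains("") && privateAddress.Value == addr.Value()
    return [a for a in addrs if a == private_address]

def fixed_default_binding(addrs):
    net = ipaddress.ip_network(default_space_subnet)
    return [a for a in addrs if ipaddress.ip_address(a) in net]

print(buggy_default_binding(addresses))   # ['10.232.24.3']  -> wrong space leaks over the relation
print(fixed_default_binding(addresses))   # ['10.232.4.230'] -> address from the bound oam-space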

--

So, two things need to change:

1. Change the unit.PrivateAddress() usage in relationunit.go:SettingsAddress to GetNetworkInfoForSpaces.

2. Fix GetNetworkInfoForSpaces to return a proper value for the default space binding that is not based on PrivateAddress.

summary changed: "[2.2.2] public-address is propagated over a relation instead of private-address but is presented as private-address to the other side" → "[2.2.2] private-address relation setting is not based on a default space binding"
Dmitrii Shcherbakov (dmitriis) wrote:

~wpk suggested:

http://paste.ubuntu.com/25261862/

This resulted in the correct output:

ubuntu@maas:~/juju⟫ juju run --unit memcached/0 'relation-get -r `relation-ids cache` - gnocchi/0'
private-address: 10.232.4.230

So, there are two cases:

1) no default space binding (the default binding "" is left empty) => the private-address is returned
2) the default binding is bound to some space => the primary address from that space is returned
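
Roughly sketched (not the actual patch), the resulting selection logic for the relation-settings address looks like this:

#!/usr/bin/env python3
# Sketch of the two cases described above (not the actual juju patch):
# 1) default binding "" unbound -> fall back to the unit's private-address
# 2) "" bound to a space        -> return the unit's primary address in that space
import ipaddress

def relation_settings_address(unit_addresses, private_address, default_space_subnets):
    if not default_space_subnets:                   # case 1: no default space binding
        return private_address
    for subnet in default_space_subnets:            # case 2: pick the address in the bound space
        net = ipaddress.ip_network(subnet)
        for addr in unit_addresses:
            if ipaddress.ip_address(addr) in net:
                return addr
    return private_address                          # conservative fallback if nothing matches

# illustrative addresses taken from the outputs above; "" bound to oam-space (10.232.0.0/21)
print(relation_settings_address(
    ["10.232.24.3", "10.232.4.230"], "10.232.24.3", ["10.232.0.0/21"]))   # -> 10.232.4.230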

Changed in juju:
assignee: nobody → Witold Krecicki (wpk)
status: New → In Progress
importance: Undecided → High
Dmitrii Shcherbakov (dmitriis) wrote:

Fix released in 2.2.3.

Changed in juju:
status: In Progress → Fix Released
milestone: none → 2.2.3