access-network is ignored

Bug #1497527 reported by Ante Karamatić
This bug affects 2 people
Affects: percona-cluster (Juju Charms Collection)
Status: Fix Released
Importance: High
Assigned to: James Page
Milestone: 15.10

Bug Description

When adding a relation between glance (also seen with nova-cloud-controller, and probably others) and percona-cluster with access-network set to a specific network (in this case 10.0.4.0/24), the proper grants are not set. In the attached log one can see that grants are created for IPs from both 10.0.1.0/24 and 10.0.4.0/24. 10.0.1.0/24 is the network used for the juju deployment (private-addresses come from this network).
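
For context, access-network is a config option on the percona-cluster charm. On the Juju 1.x CLI of the time it would be set with something like the following (an illustrative command, not a reproduction of the exact deployment):

    juju set percona-cluster access-network=10.0.4.0/24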

However, the IPs that are granted appear random. Percona grants access to these IPs in no specific order and usually misses the right one. For example, the glance/0 unit had the IPs 10.0.1.129 and 10.0.4.135, but the log shows that access for 10.0.4.135 was granted only at 09:02:24 (after multiple remove/add relation exercises), while access for 10.0.1.129 was granted at 08:37:58 (on the first attempt at creating the relation).
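
As a quick way to see which hosts have actually been granted access, the user/host pairs on the percona node can be listed directly. This is a hypothetical diagnostic snippet, not something from the report; the MySQLdb bindings and local root credentials are assumptions:

    # Hypothetical diagnostic: list user/host grant pairs on the pxc node.
    # Assumes the MySQL-python (MySQLdb) bindings and local root access.
    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='root', passwd='ROOT-PASSWORD')
    cursor = conn.cursor()
    cursor.execute("SELECT User, Host FROM mysql.user ORDER BY User, Host")
    for user, host in cursor.fetchall():
        # With access-network=10.0.4.0/24, every host granted to a related
        # service should fall inside 10.0.4.0/24; any 10.0.1.x entry here
        # reproduces the problem described above.
        print('%s@%s' % (user, host))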

Therefore, access-network is not working properly. Not only are IPs from that network not granted access, but IPs that should not have access are granted it. In this case percona should have granted access only to IPs from the 10.0.4.0/24 network, and not from the 10.0.1.0/24 network. Further in the log you'll see the same problem with nova-cloud-controller (10.0.4.139 was never granted access, but a grant was given to 10.0.1.133; both IPs belong to nova-cloud-controller/0).

This is obvious only because, in this case, glance/0 and nova-cloud-controller/0 were the leaders for those services. I can only assume the same problem exists for non-leaders, but since they don't connect to the DB during deployment, the problem is not visible there. It will (if it exists) become very visible once a service fails over to a different node.

I'm marking this bug High because of the scale of its impact.

Ante Karamatić (ivoks) wrote :
James Page (james-page) wrote :

This issue is specific to:

    percona-cluster configured using access-network
    a related service with multiple remote units

It's caused by the fact that the shared-db relation currently uses a multi-hook execution conversation to negotiate access over the correct 'access-network' configuration. The problem occurs when the related service has multiple units: it looks like the leader unit does not complete negotiation until after initial access (over 'private-address') has been granted, by which point the follower units have negotiated correct access and the pxc charm has switched its presented db_host value over to an IP on the access-network, resulting in the leader trying to complete operations without the appropriate grants in place.
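
A simplified sketch of that multi-hook conversation (hypothetical code, not the actual percona-cluster hooks; grant_access, get_db_host and get_password are illustrative helpers, while config/relation_get/relation_set follow the charmhelpers hook-environment style):

    # Simplified, hypothetical sketch of the negotiation described above;
    # not the actual percona-cluster hook code.
    def shared_db_changed(relation_id, remote_unit):
        access_network = config('access-network')
        hostname = relation_get('hostname', unit=remote_unit)
        if not hostname:
            # First pass: advertise the access-network and wait for the
            # remote unit to renegotiate its address.
            relation_set(relation_id, {'access-network': access_network})
            return
        # Problem: the grant is issued for whatever address the remote unit
        # reported first, typically its private-address (10.0.1.x), before
        # renegotiation over the access-network has completed.
        grant_access(hostname)  # illustrative helper issuing the MySQL GRANT
        relation_set(relation_id, {'db_host': get_db_host(),
                                   'password': get_password(remote_unit)})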

My proposed fix is to not present data until a valid IP is presented by each remote unit for access-network configurations; remote operations will continue to be gated by the presence of the unit name in allowed_hosts, but this won't be present until the initial negotiation has completed.
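
A minimal sketch of that gating, building on the sketch above (again hypothetical; the netaddr-based check and helper names are assumptions, not the actual patch):

    # Hypothetical sketch of the proposed gating; not the actual patch.
    import netaddr

    def valid_access_ip(hostname, access_network):
        # Accept only an address that falls inside the configured
        # access-network; anything else means negotiation is incomplete.
        try:
            return netaddr.IPAddress(hostname) in netaddr.IPNetwork(access_network)
        except netaddr.AddrFormatError:
            return False

    def shared_db_changed(relation_id, remote_unit):
        access_network = config('access-network')
        hostname = relation_get('hostname', unit=remote_unit)
        if access_network and not (hostname and
                                   valid_access_ip(hostname, access_network)):
            # Do not grant access or present db_host/password until the
            # remote unit reports an IP on the access-network.
            relation_set(relation_id, {'access-network': access_network})
            return
        grant_access(hostname)  # illustrative helper issuing the MySQL GRANT
        relation_set(relation_id, {'db_host': get_db_host(),
                                   'password': get_password(remote_unit)})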

Changed in percona-cluster (Juju Charms Collection):
status: New → Triaged
status: Triaged → In Progress
milestone: none → 15.10
assignee: nobody → James Page (james-page)
James Page (james-page)
tags: added: stable
tags: added: openstack
tags: added: backport-potential
tags: added: sts
James Page (james-page)
Changed in percona-cluster (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in percona-cluster (Juju Charms Collection):
status: Fix Committed → Fix Released