b-s-b charm easily confused by multiple volumes partially matching its expected volume name

Bug #1487636 reported by Daniel Manrique
This bug affects 3 people
Affects: block-storage-broker (Juju Charms Collection)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

When deploying a postgresql service using block-storage-broker, I got an error about "multiple volumes" and the stack failed to deploy correctly.

Looking into the code, the get_volume_id method in hooks/utils.py is vulnerable to matching more than one volume when it only expects one (and maybe even picking the wrong one).

How to reproduce:

0- Save the example .cfg file from block-storage-broker's README.md and configure it with your appropriate nova values (it simply deploys a b-s-b, storage and postgresql stack and adds the needed relations, creating or reusing a nova volume in the process).
1- nova create-volume --display-name="bogus-postgresql/0 prodstack-zone-1" 11
2- nova create-volume --display-name="another-bogus-postgresql/0 prodstack-zone-1" 11
3- juju-deployer -c blah.cfg doit-openstack

Expected result:
- Nice live stack with no errors

Actual result:

2015-08-21 22:06:39 Starting deployment of doit-openstack
2015-08-21 22:06:39 Deploying services...
2015-08-21 22:06:40 Deploying service block-storage-broker using local:precise/block-storage-broker
2015-08-21 22:06:44 Deploying service postgresql using local:precise/postgresql
2015-08-21 22:06:49 Deploying service storage using local:precise/storage
2015-08-21 22:06:58 Config specifies num units for subordinate: storage
2015-08-21 22:08:27 Adding relations...
2015-08-21 22:08:27 Adding relation postgresql <-> storage
2015-08-21 22:08:28 Adding relation storage <-> block-storage-broker
2015-08-21 22:09:28 The following units had errors:
   unit: block-storage-broker/0: machine: 7 agent-state: error details: hook failed: "block-storage-relation-changed"
2015-08-21 22:09:28 Deployment stopped. run time: 169.96

And looking at the logs in the b-s-b unit:

2015-08-21 22:08:33 INFO juju.worker.uniter relations.go:327 joined relation "storage:block-storage block-storage-broker:block-storage"
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:66 running operation run relation-joined (13; storage/0) hook
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:87 preparing operation "run relation-joined (13; storage/0) hook"
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:87 executing operation "run relation-joined (13; storage/0) hook"
2015-08-21 22:08:42 INFO juju.worker.uniter.context runner.go:149 skipped "block-storage-relation-joined" hook (not implemented)
2015-08-21 22:08:42 INFO juju.worker.uniter.context context.go:359 handling reboot
2015-08-21 22:08:42 INFO juju.worker.uniter.operation runhook.go:95 skipped "block-storage-relation-joined" hook (missing)
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:87 committing operation "run relation-joined (13; storage/0) hook"
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:66 running operation run relation-changed (13; storage/0) hook
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:87 preparing operation "run relation-changed (13; storage/0) hook"
2015-08-21 22:08:42 INFO juju.worker.uniter.operation executor.go:87 executing operation "run relation-changed (13; storage/0) hook"
2015-08-21 22:08:43 INFO juju-log block-storage:13: Running block-storage-relation-changed hook
2015-08-21 22:08:43 INFO juju-log block-storage:13: Relation block-storage:13 with storage/0
2015-08-21 22:08:43 INFO block-storage-relation-changed
<SNIP of a table with potentially sensitive information>
2015-08-21 22:08:43 INFO juju-log block-storage:13: Validated charm configuration credentials have access to block storage service
2015-08-21 22:08:43 ERROR juju-log block-storage:13: Error: Multiple volumes are associated with postgresql/0 prodstack-zone-1 volume. Cannot get_volume_id.
2015-08-21 22:08:43 INFO juju.worker.uniter.context context.go:359 handling reboot
2015-08-21 22:08:43 ERROR juju.worker.uniter.operation runhook.go:86 hook "block-storage-relation-changed" failed: exit status 1
2015-08-21 22:08:43 DEBUG juju.worker.uniter modes.go:417 ModeAbide exiting
2015-08-21 22:08:43 INFO juju.worker.uniter modes.go:415 ModeHookError starting
2015-08-21 22:08:44 DEBUG juju.worker.uniter.filter filter.go:504 want resolved event
2015-08-21 22:08:44 DEBUG juju.worker.uniter.filter filter.go:498 want forced upgrade true
2015-08-21 22:08:44 DEBUG juju.worker.uniter.filter filter.go:620 no new charm event

OK, so what happens is that the get_volume_id method does this:

            elif token in volume_name:
                matches.append(volume_id)

(where token is that "postgresql/0 prodstack-zone-1" name).

From this we can clearly see that it doesn't do a full-name match, but a partial one. So in my example, both bogus volumes match, prompting the "Multiple volumes" error.

Also, the charm expects to create (and reuse) a "postgresql/0 prodstack-zone-1" volume, but if a single volume matches the expression (e.g. delete one of the bogus volumes and the other will still match), the charm will happily use it, even though it is probably not the correct one.

Changing that search so it does an exact match would avoid multiple matches.
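The difference can be sketched with a minimal reproduction of the matching logic (the volume names and IDs here are illustrative, not taken from the charm's code):

```python
# Minimal sketch of the substring check in get_volume_id and the
# proposed exact-match alternative (illustrative, not the charm's
# actual implementation).
token = "postgresql/0 prodstack-zone-1"

# Volume names as created by the reproduction steps above.
volumes = {
    "vol-1": "bogus-postgresql/0 prodstack-zone-1",
    "vol-2": "another-bogus-postgresql/0 prodstack-zone-1",
}

# Current behavior: substring match -> both bogus volumes match,
# which triggers the "Multiple volumes" error.
partial = [vid for vid, name in volumes.items() if token in name]
print(len(partial))  # 2

# Proposed behavior: exact match -> neither bogus volume matches,
# so the charm would go on to create the volume it actually wants.
exact = [vid for vid, name in volumes.items() if token == name]
print(len(exact))  # 0
```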

A situation where the same tenant has more than one postgresql service (say staging-postgresql/0 and production-postgresql/0) may seem far-fetched, but this report was prompted by a real-life situation, so it's not theoretical; the above is just a minimal reproduction case.

Thanks!

Tags: canonical-is
Revision history for this message
Daniel Manrique (roadmr) wrote :

(1)
After looking into this a bit more I think a good solution would be to make b-s-b search for exact matches. This helps in scenarios where e.g.

staging-postgresql:
     charm: postgresql
production-postgresql:
     charm: postgresql

Here I have two services using the same charm; the existing code will get messed up once the second service is launched and there are two volumes whose names match "postgresql/0 my-zone".

This still presents problems in instances where e.g. a postgresql service was deployed, then destroyed (but the volume stays) and I want to deploy a new service: if I didn't remember to blast the old volume, I'll inherit it and it may have old data which will cause trouble down the line.

(2)
A possible addition would be making the "storage" charm configurable as far as the volume name it requests from b-s-b. That way I can be explicit and say stuff like

volume_label = "database for project blah"

Although a single environment can't have two services named postgresql (the services have to be named differently), and thus the first fix should avoid problems with two identically (or overlappingly) named services, the configurable volume_label could still help in some instances. I have a harder time imagining a real use case for that, so I'd go for the first option instead.
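As a rough sketch, assuming a hypothetical volume_label option on the "storage" charm (no such option exists today), the deployer config could look like:

```yaml
storage:
  charm: storage
  options:
    # hypothetical option, not currently in the storage charm's config.yaml
    volume_label: "database for project blah"
```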

Revision history for this message
Daniel Manrique (roadmr) wrote :

Hm, a test called test_get_volume_id_without_volume_name_multiple_matching_volumes suggests that the partial matching behavior is by design.

This leaves two choices:

1- Rename the postgresql service so the units are guaranteed to have non-clashing names. It feels forced to have to do this when my expectation is that my tenant will NOT have conflicting service (and hence, volume) names.

2- Update the "storage" charm so an arbitrary volume label can be passed, and add that as an option to the deployment; this shifts the burden of deduplication to the human doing the deployment.

And of course:

3- Change the storage-broker charm's behavior so partial matching is not done. This would require some discussion with the charm authors to see if we're not breaking an assumption/rationale that makes sense.

Changed in block-storage-broker (Juju Charms Collection):
status: New → Fix Released
Changed in block-storage-broker (Juju Charms Collection):
status: Fix Released → Confirmed
Paul Gear (paulgear)
tags: added: canonical-is