cannot bootstrap on openstack provider: Multiple security_group matches found for name 'XYZ', use an ID to be more specific.

Bug #1333162 reported by Vincent Ladeuil
This bug affects 6 people
Affects                     Status        Importance  Assigned to  Milestone
Canonical Juju              Fix Released  Critical    Unassigned
OpenStack Charm Test Infra  Invalid       High        Unassigned
juju-core                   Fix Released  Undecided   Unassigned
1.25                        Fix Released  Undecided   Unassigned

Bug Description

Now observed with Ubuntu OpenStack Kilo as the undercloud, with juju 1.25.0 and juju 1.25.1(proposed) and the juju openstack provider.

This blocks deploy/test automation where iterative bootstrap/deploy/destroy loops occur, such as bundletester, amulet, juju test, and perhaps other CI.

2015-11-24 14:26:47 ERROR juju.cmd supercommand.go:429 failed to bootstrap environment: cannot start bootstrap instance: cannot run instance: failed to run a server with nova.RunServerOpts{ ...

caused by: request (http://10.245.161.158:8774/v2/d2be765ac7a7490a899995bdad501cc6/servers) returned unexpected status: 409; error info: {"conflictingRequest": {"message": "Multiple security_group matches found for name 'juju-osci-sv07-0', use an ID to be more specific.", "code": 409}}

# juju bootstrap --debug output:
http://paste.ubuntu.com/13492802/

--- original description ---

On HP cloud, I encounter a case where I end up with the juju-hp-global security group defined twice.

I use:
    firewall-mode: global
    use-floating-ip: true

in my environments.yaml file.

This is not reproducible at will, but it is more likely to happen when a deployment is already present and I'm running more juju deployer commands to complete a deployment.

juju status also gives some hard to parse hints:

  "12":
    agent-state-info: '(error: cannot run instance: failed to run a server with nova.RunServerOpts{Name:"juju-hp-machine-12",
      FlavorId:"100", ImageId:"b35fe7b8-78bb-44a8-a6bd-7bc5ebdfc633", UserData:[]uint8{0x1f,

< ~500 lines of: 0x99, 0xb6, 0x7f, 0x14, 0x4d, 0x1b, 0xc7, 0x72, 0xa9, 0x84, 0x2e, 0x98, 0x72, stuff>

      SecurityGroupNames:[]nova.SecurityGroupName{nova.SecurityGroupName{Name:"juju-hp"},
      nova.SecurityGroupName{Name:"juju-hp-global"}}, Networks:[]nova.ServerNetworks{}}

      caused by: request (https://region-a.geo-1.compute.hpcloudsvc.com/v2/11206487910601/servers)
      returned unexpected status: 409; error info: {"conflictingRequest": {"message":
      "Multiple security_group matches found for name ''juju-hp-global'', use an ID
      to be more specific.", "code": 409}})'
    instance-id: pending
    series: precise

I would guess that checking for the existence of juju-hp-global fails for some unexpected reason (rate limiting, a transient nova error), so juju believes the group doesn't exist and creates a *new* one.

At that point there are two existing juju-hp-global secgroups and juju can't recover:

 $ nova secgroup-list
+--------------------------------------+----------------+-------------+
| Id                                   | Name           | Description |
+--------------------------------------+----------------+-------------+
| f9b0e939-1a37-4f61-92b4-28f88b125d74 | default        | default     |
| c37f45d9-0787-4272-8558-dc4db9310802 | juju-hp        | juju group  |
| 32fb3fef-e1c1-41df-9227-5eda15e1cfba | juju-hp-global | juju group  |
| d9cf6d0a-747a-474a-920d-90e7381eb1f5 | juju-hp-global | juju group  |
+--------------------------------------+----------------+-------------+
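
As a rough illustration of how this state could be detected before it blocks a deploy, the sketch below parses table text in the shape of the `nova secgroup-list` output above and reports any name that maps to more than one ID. The function names `parse_secgroup_table` and `duplicate_names` are made up for this example; it only reads text and makes no calls to nova.

```python
from collections import defaultdict

# Sample input copied from the listing in this report.
SAMPLE = """\
+--------------------------------------+----------------+-------------+
| Id                                   | Name           | Description |
+--------------------------------------+----------------+-------------+
| f9b0e939-1a37-4f61-92b4-28f88b125d74 | default        | default     |
| c37f45d9-0787-4272-8558-dc4db9310802 | juju-hp        | juju group  |
| 32fb3fef-e1c1-41df-9227-5eda15e1cfba | juju-hp-global | juju group  |
| d9cf6d0a-747a-474a-920d-90e7381eb1f5 | juju-hp-global | juju group  |
+--------------------------------------+----------------+-------------+
"""


def parse_secgroup_table(text):
    """Return (id, name) pairs from secgroup-list style table text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "Id":
            continue  # skip the header row
        rows.append((cells[0], cells[1]))
    return rows


def duplicate_names(rows):
    """Map each name that occurs more than once to the list of its IDs."""
    by_name = defaultdict(list)
    for group_id, name in rows:
        by_name[name].append(group_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


if __name__ == "__main__":
    for name, ids in duplicate_names(parse_secgroup_table(SAMPLE)).items():
        print(name, ids)
```

A check like this only reports the duplicates; deciding which of the two groups is safe to remove still needs the in-use information from the cloud.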

Revision history for this message
Curtis Hovey (sinzui) wrote :

Juju CI sees this too. We added a script to purge the env's secgroup after tests to ensure the next use of the env doesn't see the phantom secgroup.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → next-stable
tags: added: hp-cloud openstack-provider security
Revision history for this message
Vincent Ladeuil (vila) wrote :

$ nova secgroup-delete 32fb3fef-e1c1-41df-9227-5eda15e1cfba
ERROR: 409-{u'QuantumError': u'Security Group 32fb3fef-e1c1-41df-9227-5eda15e1cfba in use.'} (HTTP 400) (Request-ID: req-07a2a2ca-646d-4192-9ff7-248cfd7ba174)

:-/

$ nova secgroup-delete d9cf6d0a-747a-474a-920d-90e7381eb1f5
+--------------------------------------+----------------+-------------+
| Id                                   | Name           | Description |
+--------------------------------------+----------------+-------------+
| d9cf6d0a-747a-474a-920d-90e7381eb1f5 | juju-hp-global | juju group |
+--------------------------------------+----------------+-------------+

At least.

$ juju destroy-environment hp --force
$ nova secgroup-delete 32fb3fef-e1c1-41df-9227-5eda15e1cfba
+--------------------------------------+----------------+-------------+
| Id | Name | Description |
+--------------------------------------+----------------+-------------+
| 32fb3fef-e1c1-41df-9227-5eda15e1cfba | juju-hp-global | juju group |
+--------------------------------------+----------------+-------------+

Not ideal for us: node 0 needs to be torn down.

Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
milestone: next-stable → none
Revision history for this message
Ryan Beisner (1chb1n) wrote :

I am observing this with juju 1.25.0 and 1.25.1(proposed) on the openstack provider (Kilo).

The work-around for secgroup cleanup isn't viable for deployment automation where the operator does not control the moments between destroy and deploy.

When the situation arises, it causes all subsequent deployments to fail due to bootstrap errors. This requires human interaction to resolve, short of programmatically doing secgroup housekeeping, and patching that housekeeping into juju test, amulet and/or bundletester.

In OpenStack charm testing, we have 20+ jenkins slaves, each with unique juju environments, all operating against a private cloud via the juju openstack provider. Last night, all slave enviros entered this "Multiple security_group matches" state, and are unusable. I plan to inspect and remove duplicate secgroups to dig more, but I seek a more clever provider solution.
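
For reference, the kind of housekeeping described here could look roughly like the sketch below. It is not the script used by Juju CI or OSCI. It assumes a hypothetical `delete` callable that removes one group by ID and raises on failure (for example the "in use" error shown earlier), and it assumes an environment's groups are named either exactly `juju-<env>` or `juju-<env>-<suffix>`, as in the listings in this report. That prefix match would also catch another environment whose name merely starts with the same string, so treat it as a simplification.

```python
def purge_groups(groups, env_group_name, delete):
    """Delete every group belonging to one environment, addressing each by ID.

    groups: iterable of (id, name) pairs
    env_group_name: e.g. "juju-hp" (assumed naming convention, see above)
    delete: hypothetical callable taking a group ID; raises on failure
    Returns (deleted_ids, failed) where failed maps ID -> error text.
    """
    deleted, failed = [], {}
    for group_id, name in groups:
        belongs = name == env_group_name or name.startswith(env_group_name + "-")
        if not belongs:
            continue
        try:
            delete(group_id)  # by ID, so duplicate names are not a problem
        except Exception as exc:
            # Groups still attached to an instance cannot be deleted yet.
            failed[group_id] = str(exc)
        else:
            deleted.append(group_id)
    return deleted, failed
```

Returning the failures rather than stopping at the first one lets a caller retry the in-use groups after the instances are gone.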

summary: - juju-hp-global secgroup defined twice
+ cannot bootstrap on openstack provider: Multiple security_group matches
+ found for name 'XYZ', use an ID to be more specific.
Ryan Beisner (1chb1n)
description: updated
tags: added: uosci
Changed in juju-core:
importance: Medium → High
milestone: none → 1.26-beta1
Curtis Hovey (sinzui)
tags: added: repeatability
tags: added: charmers
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Perhaps the best way to address this would be to change how secgroups are identified and referenced by juju, i.e. use the guaranteed-unique object ID instead of the human-readable name string.

See https://bugs.launchpad.net/juju-core/+bug/1461957

tags: added: bug-squad
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

This is one of the "leaks" discussed during the cdo-qa sprint. It should be brought up to Mark Ramm's attention, per his request.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Here is another example of this impacting test automation:
http://paste.ubuntu.com/13580777/

Revision history for this message
Antonio Rosales (arosales) wrote :

I also saw this in HP Cloud and had to request to have my security group limit increased. This may have some impacts on the GOOSE provider implementation.

-thanks,
Antonio

Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha2
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.0.0
Curtis Hovey (sinzui)
tags: removed: hp-cloud
Changed in juju-core:
status: Triaged → Incomplete
Revision history for this message
Anastasia (anastasia-macmood) wrote :

We have done a lot of work in this area. Could we please confirm whether this is still failing?

Changed in juju-core:
milestone: 2.0.0 → none
Revision history for this message
Aaron Bentley (abentley) wrote :

CI hasn't shown this since we last tested HP.

Changed in juju-core:
status: Incomplete → Invalid
Revision history for this message
Ryan Beisner (1chb1n) wrote :

The message is real. Juju should not use secgroup names to track secgroups on OpenStack. It should use unique secgroup IDs. Even with vanilla nova client, this can be expected to fail like so:

ubuntu@osci-bastion:⟫ for i in $(nova secgroup-list | awk '{ print $4 }'); do nova secgroup-delete $i; done
ERROR (NoUniqueMatch): Multiple security group matches found for name 'juju-osci-sv06', use an ID to be more specific.
ERROR (NoUniqueMatch): Multiple security group matches found for name 'juju-osci-sv06', use an ID to be more specific.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Ryan

When you get a chance, please let us know if you are seeing the same behavior with Juju 2. Thank you very much for the update!

Changed in juju-core:
status: Invalid → Triaged
importance: High → Critical
Changed in juju:
status: New → Incomplete
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Just to reiterate: there is no guarantee from openstack nova that secgroup names will be unique. Juju appears to assume they will be unique. If juju were to use the unique ID of the secgroup object instead of the name, problem solved.
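
To make the name-versus-ID point concrete, here is a minimal sketch against an in-memory stand-in for the cloud. `FakeCloud`, `ensure_group` and `AmbiguousGroupError` are invented for this example and are not Juju, goose or nova APIs; the sketch only shows the idea of resolving a name to an ID once, refusing to guess when the name is ambiguous, and using the ID from then on.

```python
class AmbiguousGroupError(Exception):
    """Raised when more than one group carries the requested name."""


class FakeCloud:
    """Hypothetical in-memory cloud that, like nova, allows duplicate names."""

    def __init__(self):
        self._groups = {}  # id -> name
        self._next = 0

    def create_group(self, name):
        self._next += 1
        group_id = "sg-%d" % self._next
        self._groups[group_id] = name
        return group_id

    def list_groups(self):
        return list(self._groups.items())


def ensure_group(cloud, name):
    """Return the ID of the single group called name, creating it if absent."""
    ids = [gid for gid, gname in cloud.list_groups() if gname == name]
    if len(ids) > 1:
        # Do not pick one arbitrarily; surface the conflict with the IDs.
        raise AmbiguousGroupError("%d groups named %r: %s" % (len(ids), name, ids))
    if ids:
        return ids[0]
    return cloud.create_group(name)
```

Note that a list-then-create sequence like this is still racy if two clients run it at the same time, which is one way duplicates could appear in the first place; the ID only removes the ambiguity afterwards.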

tags: added: sts
Changed in juju:
status: Incomplete → Triaged
importance: Undecided → Critical
milestone: none → 2.2.0
assignee: nobody → Alexis Bruemmer (alexis-bruemmer)
Ryan Beisner (1chb1n)
Changed in charm-test-infra:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Ryan Beisner (1chb1n) wrote :

We've now upgraded to 1.25.9 in all of the OSCI Juju 1.x slaves. But before we'll know how well those fixes work toward this sec group cleanup issue, we'll have to turn off our own post-destroy vacuum cleaners. I'll plan to do that some day soon and provide feedback either way. Thanks, all!

Changed in juju-core:
milestone: none → 1.25.11
Changed in juju:
assignee: Alexis Bruemmer (alexis-bruemmer) → nobody
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Incomplete for 1.25 while awaiting Ryan's feedback.

Changed in juju-core:
status: Triaged → Incomplete
importance: Critical → Undecided
milestone: 1.25.11 → none
Revision history for this message
Anastasia (anastasia-macmood) wrote :

We have addressed the issue of deleting security groups as part of bug #1625624.

On the openstack deployment with Juju 2.1 bootstrapped, I can see security groups identified uniquely with appropriate juju- tag as Name: http://pastebin.ubuntu.com/24031915/

I am marking this as Fix Committed.

Changed in juju:
status: Triaged → Fix Committed
milestone: 2.2.0 → 2.1.0
Revision history for this message
Anastasia (anastasia-macmood) wrote :

As per above, the fix went into earlier releases of Juju. However, I can only assign currently active milestones. I have put this against 2.1.0.

Revision history for this message
Mario Splivalo (mariosplivalo) wrote :

Hi.

From what I can see, this is also fixed in 1.25.10.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Mario Splivalo (mariosplivalo),

Thank you for the confirmation! I'll mark as 'Fix Committed' for 1.25 as well :D

Changed in juju-core:
status: Incomplete → Fix Committed
status: Fix Committed → Fix Released
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Well, 'Fix Released' as I cannot target 1.25.10 milestone since it's already out.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.1.0 → 2.1-rc1
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Ryan, can we close this out now for OpenStack Charm Test Infra?

Ryan Beisner (1chb1n)
Changed in charm-test-infra:
status: Confirmed → Invalid
Revision history for this message
Tom Haddon (mthaddon) wrote :

I believe we're seeing this again on 1.25.12.

We have a Jenkins instance that runs about 80 jobs each weekend, a mix of Juju 1.25.12 and 2.2.2. This is running against the OpenStack provider, and each job spins up a Juju environment and then runs tests. For the last few weekends we've had a large number of failures in jobs, all with the same error as per the original description.

Looking at one example this morning I saw the following message:

"Multiple security_group matches found for name 'juju-mojo-ols-logging-trusty', use an ID to be more specific."

I confirmed manually that there were no security groups matching that name, and reran the job. It was able to bootstrap without problems. The first part of the job does:

juju destroy-environment --force -y $JUJU_ENV

It then runs juju status in a loop until the output matches:

'Please check your credentials or use .* to create a new environment.'

It then runs juju bootstrap.
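
The wait step in that sequence can be sketched as a bounded polling loop. This is an illustration rather than the actual job code: `get_status` is a hypothetical callable that returns the text printed by `juju status`, and the attempt count and delay are arbitrary defaults. The pattern is the one quoted above.

```python
import re
import time

DESTROYED_PATTERN = (
    r"Please check your credentials or use .* to create a new environment."
)


def wait_until_destroyed(get_status, pattern=DESTROYED_PATTERN,
                         attempts=30, delay=10.0, sleep=time.sleep):
    """Poll get_status() until its output matches pattern.

    Returns True on a match, False once all attempts are used up.
    """
    regex = re.compile(pattern)
    for attempt in range(attempts):
        if regex.search(get_status()):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final attempt
    return False
```

A loop like this only confirms that juju no longer sees the environment. It says nothing about whether the cloud has finished removing the environment's security groups, so a job may want a similar check on the secgroup listing before it bootstraps again.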
