destroy-environment reports WARNING cannot delete security group

Bug #1335885 reported by Matt Bruzek
This bug affects 6 people
Affects    Status        Importance  Assigned to  Milestone
juju-core  Fix Released  Critical    Tim Penhey
1.24       Fix Released  Critical    Tim Penhey
1.25       Fix Released  Critical    Tim Penhey

Bug Description

I found a WARNING about not being able to delete a security group in HP-cloud.

$ juju destroy-environment -y hp-mbruzek
WARNING cannot delete security group "juju-hp-mbruzek-0". Used by another environment?

When I check the HP console, the security group is deleted, so it appears Juju cleared it out after all. Is there something that can be done about the WARNING message?

Here are the steps I took to get this error:

$ juju bootstrap -v -e hp-mbruzek
$ juju deploy ubuntu
Added charm "cs:precise/ubuntu-4" to the environment.
$ juju destroy-environment -y hp-mbruzek
WARNING cannot delete security group "juju-hp-mbruzek-0". Used by another environment?

I had a similar problem with security groups last week. This problem was different because the groups were NOT deleted, even after using the --force flag. I opened a ticket on the HP website and the response from Chris Shin was:

"The reason why we can not delete the security groups is because they are in use by a particular port. We must delete the port to delete the security group. If you would like we will need your permission to delete the port so we can delete the security groups."

Does Juju need to delete the ports before deleting security groups?
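
To make the question concrete, here is a minimal sketch of the ordering the HP support reply describes: remove every port that references a group, then remove the group. `FakeCloud`, `GroupInUse`, and `delete_group_and_ports` are hypothetical names for an in-memory stand-in, not Juju code or a real cloud client, and a real provider might prefer to detach the group from a port rather than delete the port.

```python
class GroupInUse(Exception):
    """Raised when a security group is still referenced by a port."""


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API; not a real client."""

    def __init__(self):
        self.groups = set()
        self.ports = {}  # port id -> set of security group names

    def delete_port(self, port_id):
        del self.ports[port_id]

    def delete_security_group(self, name):
        # Mirror the behaviour from the support reply: a group that is
        # still in use by a port cannot be deleted.
        if any(name in groups for groups in self.ports.values()):
            raise GroupInUse(name)
        self.groups.discard(name)


def delete_group_and_ports(cloud, name):
    """Delete every port that references the group, then the group itself."""
    in_use_by = [p for p, groups in cloud.ports.items() if name in groups]
    for port_id in in_use_by:
        cloud.delete_port(port_id)
    cloud.delete_security_group(name)
```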

Please let me know if you need more information.

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → next-stable
Curtis Hovey (sinzui)
tags: added: cloud-installer destroy-environment landscape openstack-provider security
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.21 → none
importance: High → Medium
Abel Deuring (adeuring) wrote :

I see this warning in nearly all failing tests in this Jenkins job (checked failing builds in the range 7208..7469):

http://juju-ci.vapour.ws:8080/view/Cloud%20Health/job/test-cloud-hp/

The only failing builds _without_ the warning "cannot delete security group" are 7208 and 7305.

Curtis Hovey (sinzui) wrote :

Abel, you wrote the rules to delete OpenStack security groups! You explained that since the env is still up, it is not possible to delete the state-server's security groups. The command is deferred though, so the groups will be deleted when the env is finally removed.

I don't believe destroy-environment --force deletes security groups though. The option exits early to clean up the local disk; the command won't wait to clean up. Juju CI doesn't clean up security groups in HP anymore. The only ones we find are those left from test failures where juju didn't destroy-environment, or from users using 1.18.x.

Also note that the rules to delete security groups appear to be versioned with OpenStack. Juju doesn't clean up canonistack lcy02.

Matt Rae (mattrae) wrote :

I'm running into this with juju 1.24.5 using the openstack provider. juju destroy-environment reports that it can't delete security groups, but I can delete them just fine through horizon. There are no other juju environments or security groups created by a previous environment.

ubuntu@test2:~$ juju destroy-environment openstack
WARNING! this command will destroy the "openstack" environment (type: openstack)
This includes all machines, services, data and other resources.

Continue [y/N]? y
WARNING cannot delete security group "juju-openstack-0". Used by another environment?
WARNING cannot delete security group "juju-openstack". Used by another environment?
WARNING cannot delete security group "juju-openstack-0". Used by another environment?

The same happens when using --force:

ubuntu@test2:~$ juju destroy-environment openstack --force
WARNING cannot delete security group "juju-openstack-0". Used by another environment?
WARNING cannot delete security group "juju-openstack". Used by another environment?
WARNING cannot delete security group "juju-openstack-0". Used by another environment?

Matt Rae (mattrae) wrote :

Noting that I've seen both outcomes after the warning about not being able to delete the security groups: sometimes they are gone when I check horizon, and other times they are still there and I am able to delete them.

Matt Rae (mattrae) wrote :

Still seeing this with 1.25-alpha1.

Ryan Beisner (1chb1n) wrote :

This has become more prevalent in charm test automation, with the openstack provider. During amulet tests, multiple separate Juju environments are bootstrapped, deployed, and destroyed in series.

00:36:56.142 WARNING cannot delete security group "juju-osci-sv07". Used by another environment?
00:36:56.143 ERROR failed to bootstrap environment: cannot start bootstrap instance: cannot set up groups: failed to create a security group with name: juju-osci-sv07

Increasingly, Juju fails to bootstrap because it fails to create a security group with the same name as the security group it had just failed to delete.

The impact is false failures in public CI results, which force us to re-run jobs in the hope that the condition is not hit on the re-run.

00:35:11.507 juju-test.conductor.016-basic-trusty-juno RESULT : PASS
00:35:11.508 juju-test.conductor DEBUG : Tearing down osci-sv07 juju environment
00:35:11.508 juju-test.conductor DEBUG : Calling "juju destroy-environment -y osci-sv07"
00:35:46.388 WARNING cannot delete security group "juju-osci-sv07-0". Used by another environment?
00:35:48.432 juju-test.conductor DEBUG : Starting a bootstrap for osci-sv07, kill after 300
00:35:48.433 juju-test.conductor DEBUG : Running the following: juju bootstrap -e osci-sv07
00:35:49.290 Bootstrapping environment "osci-sv07"
00:35:51.591 Starting new instance for initial state server
00:35:51.690 Launching instance
00:36:23.485 Bootstrap failed, destroying environment
00:36:56.142 WARNING cannot delete security group "juju-osci-sv07". Used by another environment?
00:36:56.143 ERROR failed to bootstrap environment: cannot start bootstrap instance: cannot set up groups: failed to create a security group with name: juju-osci-sv07
00:36:56.143 caused by: failed executing the request http://10.245.161.158:8774/v2/d2be765ac7a7490a899995bdad501cc6/os-security-groups
00:36:56.143 caused by: Post http://10.245.161.158:8774/v2/d2be765ac7a7490a899995bdad501cc6/os-security-groups: EOF
00:36:56.151 juju-test.conductor WARNING : Could not bootstrap osci-sv07, got Bootstrap returned with an exit > 0. Skipping
00:36:56.151 juju-test.conductor DEBUG : Starting a bootstrap for osci-sv07, kill after 300
00:36:56.151 juju-test.conductor DEBUG : Running the following: juju bootstrap -e osci-sv07

Ryan Beisner (1chb1n) wrote :

Also, version info:

jenkins@juju-osci-machine-11:~$ apt-cache policy juju
juju:
  Installed: 1.24.6-0ubuntu1~14.04.1~juju1
  Candidate: 1.24.6-0ubuntu1~14.04.1~juju1
  Version table:
 *** 1.24.6-0ubuntu1~14.04.1~juju1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     1.22.6-0ubuntu1~14.04.1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     1.18.1-0ubuntu1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
jenkins@juju-osci-machine-11:~$

Changed in juju-core:
importance: Medium → Critical
milestone: none → 1.24.7
Ryan Beisner (1chb1n)
tags: added: amulet uosci
Curtis Hovey (sinzui) wrote :

Since you cannot delete a sec group while the env is up, but the sec group is deleted when env is torn down, the easiest solution is to just stop emitting the warning.

Ryan Beisner (1chb1n) wrote :

This used to be just an annoying warning, but we are now seeing bootstrap failures over it.

To boil it down:

1. Amulet (or user) bootstraps, deploys, confirms ok, destroys.

2. 00:36:56.142 WARNING cannot delete security group "juju-osci-sv07". Used by another environment?
("I couldn't delete this named thing, probably because it is in use.")

3. Amulet (or user) issues bootstrap command, plans to deploy something different and confirm it.

4. 00:36:56.143 ERROR failed to bootstrap environment: cannot start bootstrap instance: cannot set up groups: failed to create a security group with name: juju-osci-sv07
("I couldn't create this named thing, probably because it already exists.")
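
The sequence above can be spotted mechanically in a log. The following is a rough sketch that scans log lines for a "cannot delete security group" warning followed by a "failed to create a security group" error for the same name. The two message patterns are copied from the log excerpts in this report; the function name `find_delete_create_races` is made up for illustration.

```python
import re

DELETE_WARNING = re.compile(r'WARNING cannot delete security group "([^"]+)"')
CREATE_ERROR = re.compile(r"failed to create a security group with name: (\S+)")


def find_delete_create_races(lines):
    """Return group names that failed to delete and then failed to be created."""
    undeleted = set()
    races = []
    for line in lines:
        warning = DELETE_WARNING.search(line)
        if warning:
            # Remember every group that juju reported it could not delete.
            undeleted.add(warning.group(1))
            continue
        error = CREATE_ERROR.search(line)
        if error and error.group(1) in undeleted:
            races.append(error.group(1))
    return races
```

Fed the test log quoted earlier in this report, this would flag "juju-osci-sv07" but not "juju-osci-sv07-0", since only the former was later reported as failing to be created.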

Ryan Beisner (1chb1n) wrote :

Here is a reproducer with --debug output. I hit this race in the 19th iteration of a simple bootstrap/deploy loop:

http://paste.ubuntu.com/12621979/

(Note, the lock removal is to work around bug 1500613, which will otherwise also be reproduced by the same loop.)

Ryan Beisner (1chb1n) wrote :

^ bootstrap/destroy loop, that is.
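
For readers without access to the paste, a bootstrap/destroy loop of this kind might look roughly like the sketch below. It is not the script from the paste: the function names are invented, the lock-removal workaround is left out, and it only uses the two commands that appear in the logs in this report (`juju bootstrap -e ENV` and `juju destroy-environment -y ENV`).

```python
import subprocess


def run_juju(args):
    """Run a juju command and return its exit status."""
    return subprocess.call(["juju"] + list(args))


def bootstrap_destroy_loop(env, iterations, run=run_juju):
    """Bootstrap and destroy env repeatedly.

    Returns the 1-based iteration in which a command first exited
    non-zero, or None if every iteration succeeded.
    """
    for i in range(1, iterations + 1):
        if run(["bootstrap", "-e", env]) != 0:
            return i
        if run(["destroy-environment", "-y", env]) != 0:
            return i
    return None
```

The `run` parameter exists only so the loop can be exercised without a cloud; by default it shells out to the real `juju` binary.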

Tim Penhey (thumper)
Changed in juju-core:
milestone: 1.24.7 → 1.26-alpha1
Tim Penhey (thumper) wrote :

I have a branch that does some retries around the deletion of the security groups, with short delays. Can we test it with the bootstrap/destroy loop?

I've uploaded the juju and jujud binaries to:
  chinstrap.canonical.com:~tim/openstack-secgroup-delete-retry

This should work with a bootstrap with --upload-tools to test.

The warnings are still emitted if the security group could not be deleted within two seconds. I'd be very curious to see if this fix works.

http://reviews.vapour.ws/r/2789/
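
As an illustration of the approach (not the code in the review above, which is in Go), a retry around a delete call with a short delay and an overall time budget could be sketched like this. The two-second budget comes from the comment; the 0.2 second delay, the function name, and the broad `except Exception` are assumptions made for the sketch. A real provider would catch only its own "group in use" error.

```python
import time


def delete_with_retry(delete, timeout=2.0, delay=0.2,
                      sleep=time.sleep, clock=time.monotonic):
    """Call delete() until it succeeds or the time budget is spent.

    Returns True if a call succeeded, False if the caller should
    fall back to emitting the warning.
    """
    deadline = clock() + timeout
    while True:
        try:
            delete()
            return True
        except Exception:
            # Give up if another delay would overrun the budget.
            if clock() + delay > deadline:
                return False
            sleep(delay)
```

The `sleep` and `clock` parameters are there so the timing can be faked in tests; normal callers would leave them at their defaults.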

Ryan Beisner (1chb1n) wrote :

@thumper
I'll drop those binaries in and re-run the loop. Thanks!

Ryan Beisner (1chb1n) wrote :

@thumper

I do still hit the bootstrap fail (cannot set up groups error) with the provided binaries.

http://paste.ubuntu.com/12625371/

Ryan Beisner (1chb1n) wrote :

Just want to re-iterate that this is randomly blocking dev & testing, while we are in the final run-up to Liberty and 15.10 OpenStack Charm releases. We can't reliably gate on charm test failures, as many of the failures are this bug.

ex:
L185, L609 @ http://paste.ubuntu.com/12625976/

Many thanks for your work on this. Also open to suggestions for workarounds.

Ryan Beisner (1chb1n) wrote :

To give an idea of timing in our particular cloud, I queried nova secgroups and nova instances in a loop, as fast as the APIs would allow, while bootstrapping and destroying.

Timeline and data from that:

bootstrap: http://paste.ubuntu.com/12626772/
destroy: http://paste.ubuntu.com/12626773/
nova instance: http://paste.ubuntu.com/12626774/
nova secgroup: http://paste.ubuntu.com/12626775/

20:15:41 bootstrap started by user
20:15:44 secgroup created (#1)
20:15:47 secgroup created (#2)
20:15:48 nova instance build starts
20:15:56 nova instance running & wired
20:17:14 bootstrap complete
20:19:03 destroy started by user
20:19:04 destroy issues terminate instances
20:19:04 nova instance starts to delete
20:19:11 nova instance is gone
20:19:13 secgroups gone
20:19:13 destroy exits cleanly
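
A small worked calculation on the destroy half of this timeline, using only the timestamps listed above. The helper name `seconds_between` is made up for the sketch.

```python
from datetime import datetime


def seconds_between(start, end):
    """Return whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps copied from the timeline above.
terminate_issued = "20:19:04"
instance_gone = "20:19:11"
secgroups_gone = "20:19:13"

instance_delete_time = seconds_between(terminate_issued, instance_gone)
secgroup_lag = seconds_between(instance_gone, secgroups_gone)
total = seconds_between(terminate_issued, secgroups_gone)
```

In this run the instance took 7 seconds to disappear after the terminate call, and the security groups were gone 2 seconds after that, 9 seconds in total. Depending on where a retry window starts counting, a budget of a couple of seconds may or may not cover that gap.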

Tim Penhey (thumper)
Changed in juju-core:
status: Triaged → In Progress
Ryan Beisner (1chb1n) wrote :

With the 2nd test binary (30s @ 1s interval), the results were much improved. The first 44 iterations were successful (vs 19 on the 1st custom binary and 9 on 1.24.6 proper). However, on the 45th iteration, a new error was hit.

After successfully destroying, it issued a 2nd 'terminate instance,' resulting in a nil error.

2015-09-30 22:33:00 DEBUG juju.provider.openstack provider.go:1141 terminating instances []
2015-09-30 22:33:31 WARNING juju.provider.openstack provider.go:1628 cannot delete security group "juju-beis1". Used by another environment?
2015-09-30 22:33:31 ERROR juju.cmd supercommand.go:430 failed to bootstrap environment: cannot start bootstrap instance: cannot set up groups: failed to create a rule for the security group with id: <nil>
caused by: failed executing the request http://10.245.161.158:8774/v2/e5965159218d4836950b2e5f27d1c9b2/os-security-group-rules
caused by: Post http://10.245.161.158:8774/v2/e5965159218d4836950b2e5f27d1c9b2/os-security-group-rules: EOF

...

Full loop output:
http://paste.ubuntu.com/12629219/

Ryan Beisner (1chb1n) wrote :

Stats from that last set of iterations (time from 'terminating instances' to 'command finished'):

# Excluding the failed final loop:
0:00:09 avg
0:00:06 median
0:00:03 min
0:00:34 max

# Including the failed final loop:
0:00:11 avg
0:00:07 median
0:00:03 min
0:01:06 max
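
Summary figures of this kind can be derived from the per-iteration durations. A minimal sketch, assuming the durations have already been extracted from the log as whole seconds; the function name `summarize` is invented for illustration, and the actual per-iteration data is only in the linked paste.

```python
import statistics
from datetime import timedelta


def summarize(durations):
    """Return avg/median/min/max of durations (seconds) as h:mm:ss strings."""
    values = {
        "avg": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }
    # Round to whole seconds so the output matches the h:mm:ss style above.
    return {name: str(timedelta(seconds=round(v))) for name, v in values.items()}
```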

Ryan Beisner (1chb1n) wrote :

Is there an ETA for a release or backport?

Our CI, running 1.24.6, is seriously affected by this.

Tim Penhey (thumper)
Changed in juju-core:
status: In Progress → Fix Committed
Tim Penhey (thumper)
Changed in juju-core:
assignee: nobody → Tim Penhey (thumper)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released