[2.9-RC12] Bootstrap fails, security group 'juju-XXXXX' does not exist in default VPC

Bug #1926169 reported by Michael Skalka
This bug affects 2 people
Affects: Canonical Juju
Status: Expired
Importance: Low
Assigned to: Unassigned

Bug Description

During the Juju 2.9-rc12 release test, bootstrap failed because a Juju-created security group could not be found:

2021-04-23 18:17:12.253857 Launching controller instance(s) on aws/us-east-1...
2021-04-23 18:17:18.654122 - Verifying availability zone
 - Making user data
 - Setting up groups
 - Trying to start instance in availability zone "us-east-1a"
 - Start instance attempt 1
 - Verifying availability zone
 - Making user data
 - Setting up groups
ERROR failed to bootstrap model: cannot start bootstrap instance: cannot set up groups: fetching security group "juju-8dd56ca7-27b1-44b9-8c86-f3bbfa79d21a-0": The security group 'juju-8dd56ca7-27b1-44b9-8c86-f3bbfa79d21a-0' does not exist in default VPC 'vpc-c0451dba' (InvalidGroup.NotFound)

Full details of the run can be found here: https://solutions.qa.canonical.com/testruns/testRun/24ac9264-2b83-4c77-926f-e8f97d157895

Unfortunately we did not capture any logs from the local juju client snap.

Revision history for this message
Ian Booth (wallyworld) wrote :

The code basically does

name = "<controllerUUID>-0" #0 is machine 0, the bootstrap machine

createGroup(name)
if group already exists {
  readGroup(name) <--- failing here
  setGroupPermissions(name)
} else {
  setGroupTags(tags)
}

At bootstrap, the group should not previously exist, so it should just set the tags.
However, it's failing because the create API call appears to get back an "InvalidGroup.Duplicate" error, so Juju tries to read the group. An eventual consistency error would be understandable in general, but not in this circumstance: the group is named after the new controller UUID and should not already exist, so the code path in question should never be executed.

Does this issue occur frequently?
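
For readers following along, the flow described in this comment can be sketched as a small in-memory simulation. This is only an illustration, not Juju's code (Juju is written in Go): `FakeEC2`, `GroupError`, `ensure_group` and the `spurious_duplicate` flag are hypothetical names invented here, and the flag simply forces the "duplicate" answer for a group that was never stored, which is one assumed way to reach the failing read.

```python
class GroupError(Exception):
    """Hypothetical error type carrying an EC2-style error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


class FakeEC2:
    """In-memory stand-in for the EC2 API; not a real client."""

    def __init__(self, spurious_duplicate=False):
        self.groups = {}
        # Assumption: lets the sketch force a Duplicate reply for a
        # group that does not exist, to reach the failing branch.
        self.spurious_duplicate = spurious_duplicate

    def create_group(self, name):
        if self.spurious_duplicate or name in self.groups:
            raise GroupError("InvalidGroup.Duplicate")
        self.groups[name] = {"tags": {}, "permissions": []}

    def read_group(self, name):
        if name not in self.groups:
            raise GroupError("InvalidGroup.NotFound")
        return self.groups[name]


def ensure_group(ec2, name, tags, permissions):
    """Create the group, or fall back to reading it if it already exists."""
    try:
        ec2.create_group(name)
    except GroupError as err:
        if err.code != "InvalidGroup.Duplicate":
            raise
        group = ec2.read_group(name)  # the step marked "failing here"
        group["permissions"] = list(permissions)
        return "existing"
    # Fresh group: only the tags are set, as in the else branch above.
    ec2.groups[name]["tags"] = dict(tags)
    return "created"
```

With `FakeEC2(spurious_duplicate=True)`, `ensure_group` raises `GroupError` with code `InvalidGroup.NotFound`, which mirrors the shape of the bootstrap error in the description without claiming to explain its cause.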

Revision history for this message
Alexander Balderson (asbalderson) wrote :

Ian,

It doesn't look like it happens too frequently, but we're marking all the occurrences with this bug; you can view the occurrences here as they happen.

https://solutions.qa.canonical.com/bugs/bugs/bug/1926169

Revision history for this message
Joshua Genet (genet022) wrote :
Revision history for this message
Alexander Balderson (asbalderson) wrote :

We have also seen this on 2.9.5.

Revision history for this message
John A Meinel (jameinel) wrote :

Following the Solutions QA bug link: https://solutions.qa.canonical.com/bugs/bugs/bug/1926169

This hasn't been seen since October (and then only a single time). There may be a timing-related issue with AWS: we create the security group but, because of eventual consistency, it doesn't exist yet when we query for it later.
However, since it happens very infrequently, I don't think we'll prioritize this until we have an easy way to reproduce, so that we know that we have fixed it.
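
If the eventual-consistency theory in this comment is right, one common mitigation is to retry the read for a short time before giving up. The sketch below shows that general pattern only. It is not a fix that was applied to Juju, and `GroupNotFound`, `read_group_with_retry` and the default attempt count and delay are all hypothetical choices made for illustration.

```python
import time


class GroupNotFound(Exception):
    """Hypothetical stand-in for an InvalidGroup.NotFound response."""


def read_group_with_retry(read_group, name, attempts=5, delay=0.5,
                          sleep=time.sleep):
    """Call read_group(name), retrying on GroupNotFound with backoff.

    read_group is any callable that returns the group or raises
    GroupNotFound. The wait doubles after each failed attempt, and the
    last failure is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return read_group(name)
        except GroupNotFound:
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

The `sleep` parameter is injectable so the behaviour can be exercised in a test without real waiting. A bounded retry of this kind would hide a short propagation delay, but it would not help if the group was never created in the first place.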

Changed in juju:
importance: Undecided → Low
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired