openstack core count may be inaccurate
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Auto Package Testing | In Progress | High | Skia |
Bug Description
It has been reported (thanks ginggs) that failures can happen in autopkgtest-cloud with the error:
Quota exceeded for cores: Requested 2, but already used 514 of 515 cores (HTTP 403)
Full log at [1]. I checked the number of running VMs shortly afterwards and counted fewer than 400 cores in use (taking into account that autopkgtest-big instances have 4 cores).
This may be a side effect of dropping the flock [2]. It may be that the instance deletion is asynchronous, and cores are freed only after the delete operation is complete.
We should do something like:
1. Figure out a way to query openstack for the current quota usage, and check how it matches the number of running VMs.
2. Check if in the worker we can do something like instance.
3. Check whether this improves the comparison from point 1.
[1] https:/
[2] https:/
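If deletion really is asynchronous, one way the worker could avoid racing the quota is to poll until the instance is actually gone before releasing its slot. The following is only a minimal sketch of that idea, not the worker's code: `wait_until_deleted` and its `server_exists` callable are hypothetical names, and how the lookup is done against OpenStack is left to the caller.

```python
import time


def wait_until_deleted(server_exists, timeout=300.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the instance is gone or the timeout expires.

    server_exists is a hypothetical callable returning True while
    OpenStack still knows about the instance. Returns True once the
    instance has disappeared, False if the timeout was reached first.
    """
    deadline = clock() + timeout
    while True:
        if not server_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays. Whether cores are freed at the same moment the instance disappears is exactly what point 1 would have to confirm.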
Related branches
- Paride Legovini: Approve
- Diff: 25 lines (+7/-0), 1 file modified: charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker (+7/-0)
Changed in auto-package-testing:
importance: Undecided → High
tags: added: adt-564
Changed in auto-package-testing:
assignee: nobody → Skia (hyask)
I've built a tiny script to compare the usage reported by the quota with a manual count.
It prints the quota, then the count, then the quota again: the manual count can take a while to compute, so printing the quota twice quickly shows whether it changed in the meantime.
During the manual count, it also prints any instance that is neither `ACTIVE` nor `ERROR`, because those are two very common statuses, and it has been verified that they are correctly counted in the quota. `SHUTOFF` has also been verified as counting in the quota, but it is rare enough not to make too much noise, and printing it also helps in cleaning up those usually old VMs.
So far, all the printed VMs have been in the `BUILD` state, and since I've always observed inconsistencies between the reported quota and the manual count, my guess is that instances get added to the quota at some point during this state.
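For readers who want to reproduce the comparison, here is a rough sketch of the logic described above. It is not the actual script: `get_quota` and `list_servers` are hypothetical callables standing in for the OpenStack queries, and the per-instance line format (name, status, flavor, cores) is an assumption based on the outputs shown in this comment.

```python
# Statuses verified to be counted in the quota and too common to print.
QUIET_STATUSES = {"ACTIVE", "ERROR"}


def manual_count(servers, report=print):
    """Sum cores and instances, printing instances in unusual states.

    servers is an iterable of dicts with the keys "name", "status",
    "flavor" and "vcpus" (an assumed shape, for illustration only).
    """
    totals = {"core": 0, "instance": 0}
    for server in servers:
        totals["core"] += server["vcpus"]
        totals["instance"] += 1
        if server["status"] not in QUIET_STATUSES:
            report(f'{server["name"]} - {server["status"]} - '
                   f'{server["flavor"]} - {server["vcpus"]}')
    return totals


def compare(get_quota, list_servers, report=print):
    """Print quota, manual count, then quota again."""
    before = get_quota()
    report(f"quota: {before}")
    count = manual_count(list_servers(), report)
    report(f"count: {count}")
    # Read the quota a second time: if it differs from the first read,
    # usage changed while the (slow) manual count was running.
    after = get_quota()
    report(f"quota: {after}")
    return before, count, after
```

Passing the data sources in as callables keeps the comparison independent of whichever client library is used to talk to OpenStack.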
Here are example outputs from the script, with some comments:
# lcy02:
quota: {'core': 92, 'instance': 43}
amd64-liblocale-us-perl-20231129-134245-juju-7f2275-prod-proposed-migration-environment-3 - BUILD - autopkgtest - 2
amd64-nvidia-graphics-drivers-525-20231129-140832-juju-7f2275-prod-proposed-migration-environment-2 - BUILD - autopkgtest - 2
amd64-systemd-upstream-20240625-110038-juju-7f2275-prod-proposed-migration-environment-3-2792f97d-63b5-4c80-a259-8ef43a4d062e - BUILD - autopkgtest - 2
amd64-dgit-20240625-082704-juju-7f2275-prod-proposed-migration-environment-3-1cb642ce-cf37-4f8e-a233-f561d63f5557 - BUILD - autopkgtest - 2
amd64-systemd-upstream-20240625-101614-juju-7f2275-prod-proposed-migration-environment-3-93c90fad-c9eb-4c82-9f69-b8d4a267b8e2 - BUILD - autopkgtest - 2
adt-noble-
adt-focal-
adt-noble-
adt-oracular-
adt-noble-
count: {'core': 96, 'instance': 45}
quota: {'core': 92, 'instance': 43}
This is a very common example: a few VMs in `BUILD`, and the reported quota is a bit below the manual count.
# bos03-arm64:
quota: {'core': 186, 'instance': 42}
arm64-r-cran-ps-20240617-073206-juju-7f2275-prod-proposed-migration-environment-3-82b1810f-7af3-447f-b772-c474b3675c87 - BUILD - autopkgtest - 2
adt-oracular-
count: {'core': 188, 'instance': 43}
quota: {'core': 186, 'instance': 42}
This one is interesting, because the delta between the counted and reported values matches exactly the one VM that is displayed. In addition, running `openstack server show` on this VM reports `No server with a name or ID of '[...]' exists.`, confirming that OpenStack is clearly inconsistent about this one.
# bos02-arm64:
quota: {'core': 145, 'instance': 84}
count: {'core': 55, 'instance': 25}
quota: {'core': 145, 'instance': 84}
This one is really weird: lots of instances counting in the quota, but only a third displayed in `openstack server list`. This is probably a case where we should ask IS to run some magic.
# bos03-s390x and bos03-ppc64el:
quota: {'core': 0, 'instance': 0}
count: {'core': 0, 'instance': 0}
quota: {'core': 0, 'instance': 0}
Not very interesting, but at least it's consistent: if we don't use those OpenStack clouds, everything is zero.
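To make the discrepancies above easier to compare, the small sketch below computes count minus quota for each cloud. The numbers are copied from the outputs in this comment; the `delta` and `classify` helpers and their labels are only illustrative and not part of the script.

```python
# (quota, count) pairs copied from the outputs above.
REPORTS = {
    "lcy02": ({"core": 92, "instance": 43}, {"core": 96, "instance": 45}),
    "bos03-arm64": ({"core": 186, "instance": 42}, {"core": 188, "instance": 43}),
    "bos02-arm64": ({"core": 145, "instance": 84}, {"core": 55, "instance": 25}),
    "bos03-s390x": ({"core": 0, "instance": 0}, {"core": 0, "instance": 0}),
    "bos03-ppc64el": ({"core": 0, "instance": 0}, {"core": 0, "instance": 0}),
}


def delta(quota, count):
    """Return count minus quota for each key."""
    return {key: count[key] - quota[key] for key in quota}


def classify(quota, count):
    """Give a rough label to the direction of the discrepancy."""
    values = delta(quota, count).values()
    if all(v == 0 for v in values):
        return "consistent"
    if all(v >= 0 for v in values):
        return "more listed than counted in quota"
    if all(v <= 0 for v in values):
        return "more counted in quota than listed"
    return "mixed"


if __name__ == "__main__":
    for cloud, (quota, count) in REPORTS.items():
        print(cloud, delta(quota, count), classify(quota, count))
```

On these numbers, lcy02 and bos03-arm64 list slightly more than the quota reports (4 cores and 2 cores respectively), while bos02-arm64 goes the other way by 90 cores and 59 instances.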