MAAS has orphan ip addresses and dns records that are slowing down the entire service
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS |
Fix Released
|
High
|
Jacopo Rota | ||
3.4 |
Fix Released
|
High
|
Jacopo Rota | ||
3.5 |
Fix Released
|
High
|
Jacopo Rota |
Bug Description
On SOLQA env at X.X.164.2:5240 running 3.4.0 snap we see a lot of postgres activities
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
654234 root 20 0 1187948 908072 26908 S 84.1 1.4 20:51.34 python3
654236 root 20 0 1312776 952840 27144 S 55.5 1.4 22:23.74 python3
654232 root 20 0 1311844 964052 26996 S 43.2 1.5 25:05.65 python3
671936 postgres 20 0 239800 103008 90004 S 31.9 0.2 0:38.30 postgres
672556 postgres 20 0 250784 118092 98500 R 31.6 0.2 0:34.93 postgres
672900 postgres 20 0 235176 101228 92344 R 29.6 0.2 0:07.53 postgres
672552 postgres 20 0 236608 79128 69008 S 28.9 0.1 0:21.86 postgres
673032 postgres 20 0 239484 100504 87736 R 26.6 0.2 0:04.76 postgres
672220 postgres 20 0 258404 103504 75952 R 15.6 0.2 0:04.42 postgres
673275 postgres 20 0 235116 89260 81424 S 15.6 0.1 0:05.57 postgres
672577 postgres 20 0 235636 82832 73604 S 14.3 0.1 0:45.81 postgres
```
and after looking at the db most of the queries are dns/static ip related, for example:
```
SELECT "maasserver_
```
and looking at the record 2068
```
2068 | 2023-10-23 19:21:19.843186+00 | 2023-10-23 19:21:19.843186+00 | network-
```
and the related ip address
```
id | created | updated | ip | alloc_type | subnet_id | user_id | lease_time | temp_expires_on
-------
50843 | 2023-10-23 19:21:19+00 | 2023-10-23 19:31:19.913985+00 | | 6 | 2 | | 600 |
```
we spotted that this is coming from one of the MANY lxd containers that are spawned and deleted after few minutes.
The amount of such records is huge
```
maasdb=# select subnet_id, count(*) from maasserver_
subnet_id | count
-----------+-------
| 121
3 | 303
5 | 246
4 | 65
2 | 13686
```
as well as the ip addresses
```
maasdb=# select domain_id, count(*) from maasserver_
domain_id | count
-----------+-------
8 | 43
7 | 43
1 | 11862
5 | 43
4 | 43
6 | 43
3 | 43
```
most of these IP addresses are of `alloc_type=DHCP` and are really old
```
id | created | updated | ip | alloc_type | subnet_id | user_id | lease_time | temp_expires_on
-------
169688 | 2023-12-01 20:57:44+00 | 2023-12-01 21:07:44.427741+00 | | 6 | 2 | | 600 |
37082 | 2023-10-18 20:12:25+00 | 2023-10-18 20:22:25.756612+00 | | 6 | 2 | | 600 |
30036 | 2023-10-17 17:14:27+00 | 2023-10-17 17:24:27.870446+00 | | 6 | 2 | | 600 |
24933 | 2023-10-16 17:14:03+00 | 2023-10-16 17:24:03.284484+00 | | 6 | 2 | | 600 |
27055 | 2023-10-17 03:28:39+00 | 2023-10-17 03:38:39.891357+00 | | 6 | 2 | | 600 |
31218 | 2023-10-17 21:11:55+00 | 2023-10-17 21:21:55.570965+00 | | 6 | 2 | | 600 |
22972 | 2023-10-16 11:08:21+00 | 2023-10-16 11:18:21.326206+00 | | 6 | 2 | | 600 |
```
meaning that we are missing to clean them up (these are probably created when a new host gets an IP from the MAAS DHCP server).
Related branches
- MAAS Lander: Approve
- Jacopo Rota: Approve
-
Diff: 109 lines (+55/-1)2 files modifiedsrc/maasserver/start_up.py (+21/-1)
src/maasserver/tests/test_start_up.py (+34/-0)
- MAAS Lander: Approve
- Alexsander de Souza: Approve
-
Diff: 109 lines (+55/-1)2 files modifiedsrc/maasserver/start_up.py (+21/-1)
src/maasserver/tests/test_start_up.py (+34/-0)
description: | updated |
Changed in maas: | |
status: | Triaged → Fix Committed |
Changed in maas: | |
assignee: | nobody → Jacopo Rota (r00ta) |
Changed in maas: | |
milestone: | 3.5.0 → 3.5.0-beta1 |
status: | Fix Committed → Fix Released |
As discussed with Jacopo, I did a test on freshly deployed MAAS 3.4. I had a loop which created lxc container getting dhcp address from MAAS and immediately deleted the container.
After doing it around 100 times I could see numbers in abovce tables grow and they never go down even after leases expire.
I started another round with more containers.
maasdb=# select domain_id, count(*) from maasserver_ dnsresource group by domain_id;
domain_id | count
-----------+-------
1 | 2067
(1 row)
maasdb=# select subnet_id, count(*) from maasserver_ staticipaddress group by subnet_id;
subnet_id | count
-----------+-------
2 | 17
6 | 6
| 6
1 | 2233
3 | 8
5 | 6
(6 rows)
This is after all containers are gone and leases expired
MAAS UI says "5 hosts; 32 resource records" which is right