LXD vm compose fails with - This "instances" entry already exists

Bug #2028284 reported by Marian Gasparovic
This bug affects 1 person
Affects   Status         Importance   Assigned to           Milestone
MAAS      Fix Released   High         Alexsander de Souza
3.3       Fix Released   High         Alexsander de Souza
3.4       Fix Released   High         Alexsander de Souza

Bug Description

Using LXD VMs, we are hitting

2023-07-19-22:17:36 root DEBUG [localhost]: maas root vm-host compose 1 hostname=landscapeamqp-1 cores=2 memory=4096 storage=40.0 zone=2
2023-07-19-22:17:42 root ERROR [localhost] Command failed: maas root vm-host compose 1 hostname=landscapeamqp-1 cores=2 memory=4096 storage=40.0 zone=2
2023-07-19-22:17:42 root ERROR 1[localhost] STDOUT follows:
Unable to compose machine because: Failed talking to pod: Failed creating instance record: Add instance info to the database: This "instances" entry already exists

MAAS logs

https://oil-jenkins.canonical.com/artifacts/b1301c81-3e45-4743-b2e4-2466a84dbdf7/generated/generated/maas/logs-2023-07-19-22.18.38.tgz

Revision history for this message
Adam Collard (adam-collard) wrote :

That error comes from LXD when you try to create an instance with a name that already exists.

for i in 1 2; do lxc launch --empty foo; done

Revision history for this message
Bill Wear (billwear) wrote :

so how did these duplicate instance names come about -- is it something in the MAAS code that causes this, or is it something related to the test rigging?

Bill Wear (billwear)
Changed in maas:
status: New → Incomplete
Revision history for this message
Marian Gasparovic (marosg) wrote :

There is no test rigging; it all comes directly from the deployment.

Changed in maas:
status: Incomplete → New
Revision history for this message
Marian Gasparovic (marosg) wrote :

I did several experiments. First I made sure there were no machines defined:

$ for i in 30 31 32; do ssh 10.244.40.$i lxc list ; done
Warning: Permanently added '10.244.40.30' (ED25519) to the list of known hosts.
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
Warning: Permanently added '10.244.40.31' (ED25519) to the list of known hosts.
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
Warning: Permanently added '10.244.40.32' (ED25519) to the list of known hosts.
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

Then ran the deployment

$ fce_wrap build --layer maas --steps compose_vms
2023-08-09-13:10:11 root DEBUG fce --debug build --layer maas --steps compose_vms
2023-08-09-13:10:11 root DEBUG FCE version: 2.21+git.10.g0acb2a1d
2023-08-09-13:10:11 root DEBUG Running 'zone' project check
2023-08-09-13:10:11 fce.build INFO Started building layer: maas
Warning: Permanently added '10.244.40.30' (ED25519) to the list of known hosts.
2023-08-09-13:10:12 fce.maas INFO Starting step: maas:compose_vms
2023-08-09-13:10:12 root DEBUG [localhost]: maas root vm-hosts read
2023-08-09-13:10:16 root DEBUG [localhost]: maas root version read
2023-08-09-13:10:19 root DEBUG [localhost]: maas root rack-controllers read hostname=leafeon
2023-08-09-13:10:24 root DEBUG [localhost]: maas root vm-host update 1 memory_over_commit_ratio=10 cpu_over_commit_ratio=10
2023-08-09-13:10:29 root DEBUG [localhost]: maas root machines read
2023-08-09-13:10:34 foundationcloudengine.layers.configuremaas INFO Creating elastic-1 in leafeon
2023-08-09-13:10:34 root DEBUG [localhost]: maas root vm-host compose 1 hostname=elastic-1 cores=2 memory=24576 storage=500.0 zone=2
2023-08-09-13:10:41 root DEBUG [localhost]: maas root tags create name=elastic
2023-08-09-13:10:44 root ERROR [localhost] Command failed: maas root tags create name=elastic
2023-08-09-13:10:44 root ERROR 1[localhost] STDOUT follows:
{"name": ["Tag with this Name already exists."]}
2023-08-09-13:10:44 root ERROR 2[localhost] STDERR follows:
b''
2023-08-09-13:10:44 root DEBUG [localhost]: maas root tag update-nodes elastic add=np6tyw
2023-08-09-13:10:48 foundationcloudengine.layers.configuremaas INFO Creating grafana-1 in leafeon
2023-08-09-13:10:48 root DEBUG [localhost]: maas root vm-host compose 1 hostname=grafana-1 cores=2 memory=3072 storage=40.0 zone=2
2023-08-09-13:10:54 root DEBUG [localhost]: maas root tags create name=grafana
2023-08-09-13:10:58 root ERROR [localhost] Command failed: maas root tags create name=grafana
2023-08-09-13:10:58 root ERROR 1[localhost] STDOUT follows:
{"name": ["Tag with this Name already exists."]}
2023-08-09-13:10:58 root ERROR 2[localhost] STDERR follows...

Revision history for this message
Björn Tillenius (bjornt) wrote :

I suspect what happens is that in the API handler we first create the instance in LXD and then do the DB work. If the DB work fails due to a serialization error, which is expected now and then, the whole request is retried and the retry tries to create a new instance with the same name in LXD.

It's quite hard to handle this properly without significant changes, but I'll take a look to see what we can do. I'm also going to see whether we changed something recently so that you see more serialization errors than before.
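
A minimal sketch of that failure mode, assuming the retry wraps both the LXD call and the DB work (the names below are placeholders, not the actual MAAS code):

# Illustrative only: create_lxd_instance / create_machine_record stand in for the
# real MAAS and LXD calls; SerializationError stands in for PostgreSQL's
# "could not serialize access" error.
class SerializationError(Exception):
    pass

def compose_with_retry(hostname, create_lxd_instance, create_machine_record, retries=3):
    for _attempt in range(retries):
        try:
            create_lxd_instance(hostname)    # external side effect, not rolled back
            create_machine_record(hostname)  # DB work, may raise SerializationError
            return
        except SerializationError:
            # The DB transaction is rolled back and the request is retried, but the
            # LXD instance from the previous attempt still exists, so the next
            # create_lxd_instance(hostname) fails with "entry already exists".
            continue
    raise RuntimeError("compose failed after retries")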

Revision history for this message
Björn Tillenius (bjornt) wrote :

BTW, I did reproduce this by raising a serialization error at the end of BMC.create_machine().

Revision history for this message
Björn Tillenius (bjornt) wrote :

There are three serialization errors between the time the compose command was executed and the time the 'instance already exists' error happened. The most likely culprit is this:

2023-07-19 22:17:41.506 UTC [2508416] maas@maasdb ERROR: could not serialize access due to concurrent update
2023-07-19 22:17:41.506 UTC [2508416] maas@maasdb STATEMENT: UPDATE "maasserver_podhints" SET "pod_id" = 1, "cores" = 20, "memory" = 131072, "cpu_speed" = 2600, "local_storage" = 0, "cluster_id" = NULL WHERE "maasserver_podhints"."id" = 1

The way the compose form works is that it creates an instance in LXD and then updates the pod hints in the database. After that it adds a post-commit hook to refresh the commissioning data for the pod, which also syncs the hints to the database.

So what can happen when composing multiple VMs in a row is that the pod hints sync in the form and the one in the post-commit hook from the previous compose command will conflict.

Still trying to figure out how to solve this.
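
Roughly, the flow has two writers of the same maasserver_podhints row; a sketch with illustrative names (not the actual MAAS code):

# Illustrative only: writer A is the in-transaction hints update, writer B is the
# post-commit refresh, which syncs the hints again.
def compose(pod, params, create_lxd_instance, sync_pod_hints, post_commit_do,
            refresh_commissioning_data):
    create_lxd_instance(params["hostname"])
    sync_pod_hints(pod)                              # writer A: UPDATE maasserver_podhints ...
    post_commit_do(refresh_commissioning_data, pod)  # writer B: runs after commit
    # When two compose calls run back to back, writer B from the first compose can
    # overlap with writer A of the second one, and one transaction fails to serialize.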

Revision history for this message
Björn Tillenius (bjornt) wrote :

In the long term, we should make the compose form handle conflict errors. One idea is to add a user.* instance property containing the MAAS system id, so that we know that MAAS created the instance and can handle deleting it if necessary.

In the short term, we can make BMC.sync_hints() a bit smarter and not save anything if nothing changed. Currently only virsh actually returns any hints when composing a machine, and only LXD sends commissioning results in the post-commit hook.
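
A minimal sketch of that short-term idea, assuming the sync compares the discovered values against the stored row before writing (field names are illustrative, not the exact MAAS model):

# Illustrative only: skip the UPDATE when nothing changed, so back-to-back
# compose calls don't contend on the same maasserver_podhints row.
HINT_FIELDS = ("cores", "memory", "cpu_speed", "local_storage")

def sync_hints(stored_hints, discovered_hints):
    changed = False
    for field in HINT_FIELDS:
        new_value = getattr(discovered_hints, field)
        if getattr(stored_hints, field) != new_value:
            setattr(stored_hints, field, new_value)
            changed = True
    if changed:
        stored_hints.save()  # only write (and risk a serialization conflict) when needed
    return changed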

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 3.5.0
summary: - vm compose fails with - This "instances" entry already exists
+ LXD vm compose fails with - This "instances" entry already exists
Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Trent Lloyd (lathiat) wrote :

Wanted to note that I've just filed Bug #2055252, which may be sort of related, but I don't see a serialisation error in my case (and it's not obvious exactly why the VM creation fails and gets rolled back).

The LXD VM creation fails specifically with a storage constraint of ('storage', ['root:0,0:8']), which Juju generates when a second disk comes from Juju storage but no root-disk size is given via a root-disk constraint, so it passes in 0 (this is Juju Bug #1983084).

In my bug it seems the DB entries are rolled back but the LXD VM is left behind and requires manual cleanup. Since this also comes from Juju, it retries 10 times, so we get 10 VMs left behind when this happens.

Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
Changed in maas:
status: Fix Committed → Fix Released