MAAS fails to create node because a node with the hostname already exists

Bug #2059715 reported by Amjad Chami
This bug affects 1 person
Affects: MAAS
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

MAAS seems to restart while a vm-host compose command is running. The command then fails, apparently because it ends up being issued twice.

Console log:
2024-03-28-11:16:55 foundationcloudengine.layers.configuremaas INFO Creating graylog-3 in sunset
2024-03-28-11:16:55 root DEBUG [localhost]: maas root vm-host compose 2 hostname=graylog-3 cores=2 memory=4096 storage=40.0 zone=3
2024-03-28-11:17:06 root ERROR [localhost] Command failed: maas root vm-host compose 2 hostname=graylog-3 cores=2 memory=4096 storage=40.0 zone=3
2024-03-28-11:17:06 root ERROR 1[localhost] STDOUT follows:
{"hostname": ["Node with hostname \"graylog-3\" already exists"]}

In the logs:
2024-03-28T11:16:55+00:00 sunset maas.node: [info] juju-upgrade-2: Status transition from TESTING to READY
2024-03-28T11:16:55+00:00 sunset maas.service_monitor: [info] Service 'maas-http' has been restarted. Its current state is 'on' and 'running'.

testrun: https://solutions.qa.canonical.com/testruns/459947f6-df64-4af7-97ca-0337e42455fe
artifacts: https://oil-jenkins.canonical.com/artifacts/459947f6-df64-4af7-97ca-0337e42455fe/index.html

Jacopo Rota (r00ta) wrote :

Hi @Amjad, was your script actually retrying the request because the first attempt apparently failed? Or did you get
```
2024-03-28-11:17:06 root ERROR [localhost] Command failed: maas root vm-host compose 2 hostname=graylog-3 cores=2 memory=4096 storage=40.0 zone=3
2024-03-28-11:17:06 root ERROR 1[localhost] STDOUT follows:
{"hostname": ["Node with hostname \"graylog-3\" already exists"]}
```
after a single call to `maas root vm-host compose 2`?

Amjad Chami (amjad-chami) wrote :

I think it was a single call. If there had been multiple calls, they would have shown up in the fce debug output.

Stamatis Katsaounis (skatsaounis) wrote :

Hi Amjad, I can see logic inside foundation engine that retries a command 10 times, and by default the output_mode is set to quiet, so there is a high chance that we lost part of the output (search for `remotehelpers/run_cmd`). You can start from the add_vm_host_vm function and follow the chain of execution down to run_cmd. Could you update the script to use something like output_mode="live" and retry?
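To illustrate the concern about a quiet retry loop hiding the first attempt, here is a minimal Python sketch of a retry wrapper that records and logs every attempt. The names `run_with_retries`, `Attempt`, and the injected `runner` are hypothetical and chosen for illustration; this is not the actual foundation engine `run_cmd`, whose signature is not shown in this report.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical illustration only; this is NOT the FCE run_cmd implementation.


@dataclass
class Attempt:
    number: int
    returncode: int
    stdout: str


def default_runner(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run cmd once and return (returncode, stdout)."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_with_retries(
    cmd: Sequence[str],
    retries: int = 10,
    runner: Callable[[Sequence[str]], Tuple[int, str]] = default_runner,
    log: Callable[[str], None] = print,
) -> List[Attempt]:
    """Retry cmd until it succeeds, keeping a record of every attempt."""
    attempts: List[Attempt] = []
    for number in range(1, retries + 1):
        returncode, stdout = runner(cmd)
        attempts.append(Attempt(number, returncode, stdout))
        # Log each attempt so an earlier failure is never silently dropped.
        log(f"attempt {number}/{retries}: rc={returncode} stdout={stdout!r}")
        if returncode == 0:
            break
    return attempts
```

With every attempt logged, a debug log would show whether "already exists" came from the first call or from a retry that followed a call interrupted by the restart.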

Changed in maas:
status: New → Incomplete
summary: - MAAS fails to create node because a node with the hostname exists
+ MAAS fails to create node because a node with the hostname already exists
Amjad Chami (amjad-chami) wrote :

I managed to recreate this bug while trying to recreate LP#1965554. It's fairly hard to hit, because it depends on when the command is issued during a MAAS restart. Depending on when the compose command is run relative to the restart, one of three things can happen: the VM is created, the VM is not created, or this bug is hit.
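Since the third outcome is only visible through the error payload, a caller could at least tell it apart from other failures. The following is a minimal Python sketch that checks the STDOUT shown in the bug description for the duplicate-hostname message. The function name `is_duplicate_hostname_error` is hypothetical, and the sketch assumes the error is always returned as the JSON object seen in this report.

```python
import json


def is_duplicate_hostname_error(stdout: str, hostname: str) -> bool:
    """Return True if stdout is the 'already exists' error for hostname.

    Assumes the JSON shape from this report:
    {"hostname": ["Node with hostname \"<name>\" already exists"]}
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    messages = payload.get("hostname", [])
    if not isinstance(messages, list):
        return False
    expected = f'Node with hostname "{hostname}" already exists'
    return expected in messages
```

A caller that sees this error right after a restart might then check whether the VM was in fact composed before deciding to retry or fail. How that check should be done is left open here.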

Anton Troyanov (troyanov) wrote :

Hi Amjad,

So far I have had no luck reproducing the issue. If you have an environment where it can be reproduced, could you collect some LXD logs with `lxc monitor --pretty`? (I assume you are using LXD for the KVM host?)

That will help us understand where the problem happens. It would also be great to get the output of `lxc ls --all-projects`.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Amjad Chami (amjad-chami) wrote :

Hi Anton, I'll try to replicate it. For context, I ran into it once after creating and deleting VMs at least a hundred times; it seems MAAS needs to be in a specific state during a restart. We've seen 4 instances of it in the past month: https://solutions.qa.canonical.com/bugs/lp:maas:2059715

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired