MAAS LXD VM creation issue (Ensure this value is less than or equal to 0)

Bug #2048399 reported by Dan Martin
This bug affects 9 people
Affects    Status        Importance  Assigned to          Milestone
MAAS       Fix Released  High        Stamatis Katsaounis
MAAS 3.3   Fix Released  High        Stamatis Katsaounis
MAAS 3.4   Fix Released  High        Stamatis Katsaounis
lxd        Fix Released  Unknown

Bug Description

Hi All,

In summary - I have an issue creating VMs through MAAS and LXD.
The error:
Ensure this value is less than or equal to 0 for CPU and RAM.

I have the latest version of MAAS running: 3.4.0-14319-g.3ab76533f
I decided to set up an LXD host to see the integration between the two…

I installed LXD (5.19) with no issues.

Within LXD I enabled the UI and added a VM, again with no issues.
To help, I deleted this VM before the next steps.

I then went into MAAS and added an LXD host.
No issues.

Within MAAS I then try to add a VM but get errors for both CPU and RAM:
Error: Ensure this value is less than or equal to 0.

It doesn’t matter what I put in here, it won’t create.

regiond.log

2024-01-05 13:58:19 maasserver.websockets.protocol: [critical] Error on request (39) pod.compose: {"cores": ["Ensure this value is less than or equal to 0."], "memory": ["Ensure this value is less than or equal to 0."]}
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/maasserver/websockets/handlers/pod.py", line 317, in compose
    form = await deferToDatabase(get_form, pod, params)
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
    result = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
    return func_outside_txn(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/orm.py", line 574, in retrier
    return func(*args, **kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/websockets/handlers/pod.py", line 307, in get_form
    raise HandlerValidationError(form.errors)
maasserver.websockets.base.HandlerValidationError: {'cores': ['Ensure this value is less than or equal to 0.'], 'memory': ['Ensure this value is less than or equal to 0.']}

I then added a new VM via the LXD UI and it worked fine (still).
After refreshing the host, the VM I created shows as commissioning, but commissioning never completes.
I can, however, delete this VM from MAAS, and it can also see all the info correctly…

Any help would be great.

Thanks,

Revision history for this message
Anton Troyanov (troyanov) wrote :

Hi Dan,

Regarding your first issue (validation error): can you please check whether the MAAS UI displays any values for your LXD host, and try clicking the "Refresh host" button?

Regarding issue with a stuck commissioning: is there anything interesting in the VM console? `lxc console {your-vm-name} --project {maas-kvm-project}`

Revision history for this message
Anton Troyanov (troyanov) wrote :

Another guess regarding "stuck commissioning": is it possible that you have Docker installed on the same machine that is the LXD host?

https://documentation.ubuntu.com/lxd/en/latest/howto/network_bridge_firewalld/#network-lxd-docker
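
Roughly, the workaround from that page is to allow the LXD bridge traffic through Docker's DOCKER-USER chain, along these lines (a sketch, assuming the bridge is named lxdbr0; adjust to yours):

# Docker sets the iptables FORWARD policy to DROP, which also drops
# traffic for LXD's bridge; punch holes for it in the DOCKER-USER chain:
$ sudo iptables -I DOCKER-USER -i lxdbr0 -j ACCEPT
$ sudo iptables -I DOCKER-USER -o lxdbr0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT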

Changed in maas:
status: New → Incomplete
Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Anton,

Thanks for looking over this.
Refreshing the host does not fix the issue. The available resources are displayed and are correct.

The VM console doesn't have anything of note.
MAAS eventually detects that the commissioning has failed.

As for the host, it has nothing but LXD on it, i.e. a fresh install.

Revision history for this message
Anton Troyanov (troyanov) wrote :

Can you please share the output of `maas <profile> vm-hosts read`?

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Anton:

Success.
Machine-readable output follows:
[
    {
        "type": "lxd",
        "available": {
            "cores": 0,
            "memory": 0,
            "local_storage": 3862249340928
        },
        "cpu_over_commit_ratio": 1.0,
        "architectures": [
            "amd64/generic"
        ],
        "host": {
            "system_id": "kksmtf",
            "__incomplete__": true
        },
        "id": 42,
        "name": "XXXXXX",
        "pool": {
            "name": "ec",
            "description": "ec pool",
            "id": 0,
            "resource_uri": "/MAAS/api/2.0/resourcepool/0/"
        },
        "storage_pools": [
            {
                "id": "ecsp",
                "name": "ecsp",
                "type": "zfs",
                "path": "ecsp",
                "total": 3862249340928,
                "used": 0,
                "available": 3862249340928,
                "default": true
            }
        ],
        "default_macvlan_mode": null,
        "used": {
            "cores": 0,
            "memory": 0,
            "local_storage": 0
        },
        "memory_over_commit_ratio": 1.0,
        "zone": {
            "name": "kh",
            "description": "",
            "id": 1,
            "resource_uri": "/MAAS/api/2.0/zones/kh/"
        },
        "capabilities": [
            "composable",
            "dynamic_local_storage",
            "over_commit",
            "storage_pools"
        ],
        "version": "5.19",
        "total": {
            "cores": 0,
            "memory": 0,
            "local_storage": 3862249340928
        },
        "tags": [
            "pod-console-logging"
        ],
        "resource_uri": "/MAAS/api/2.0/vm-hosts/42/"
    }
]

Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

Hi Dan,

I would like to ask if you can provide us with the commissioning script results of the machine with system_id kksmtf.

$ maas <profile> node-results read system_id=kksmtf name=50-maas-01-commissioning | jq -r '.[0].data' | base64 -d
$ maas <profile> node-results read system_id=kksmtf name=20-maas-03-machine-resources | jq -r '.[0].data' | base64 -d

I am more interested in 20-maas-03-machine-resources, but let's ask for both for completeness. This script runs `lxc info --resources`, among other commands, and captures the output. I want to check from the content whether memory and CPU are properly discovered.
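
For reference, something along these lines should surface just the discovered totals (the jq key paths are an assumption about the LXD-style resources layout this script emits; adjust them if the content differs):

$ maas <profile> node-results read system_id=kksmtf name=20-maas-03-machine-resources \
    | jq -r '.[0].data' | base64 -d \
    | jq '{cpu_total: .resources.cpu.total, memory_total: .resources.memory.total}'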

Thank you :)

Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Stamatis,

Attached is a zip with the 2 outputs in.

Thanks,

Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

Hi Dan,

Thank you for providing the commissioning script output. I used it to trigger the internal parsing function, and it parsed successfully. That means that something is not being updated correctly afterwards. Some extra questions for clarification:

1) What does MAAS report for CPU/Memory of machine `kksmtf` (in accordance with VM host info before refresh)?

    `maas <profile> vm-host read <vm_host_id> | jq -r .total`
    `maas <profile> machine read kksmtf | jq -r .cpu_count`
    `maas <profile> machine read kksmtf | jq -r .memory`

2) After running a new refresh command can you repeat the same queries and let me know what you are getting?

   `maas admin vm-host refresh <vm_host_id> | jq -r .total`
   `maas <profile> machine read kksmtf | jq -r .cpu_count`
   `maas <profile> machine read kksmtf | jq -r .memory`

3) Is it an option for you to commission the machine with MAAS and then deploy it as a VM host? This will let MAAS install LXD on the machine and do the VM host registration for you.

   `maas admin machine deploy <machine_id_of_commissioned_machine> register_vmhost=true distro_series=<ubuntu/jammy_or_what_you_desire>`

It will not help troubleshoot your issue, but maybe it can unblock you if we hit a wall during the investigation.

Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Stamatis,

I've tried to get MAAS to commission the machine directly as a host, but this simply doesn't work.
It just ends up as a failed deployment. I've tried multiple times, extended the timeout, etc.

as requested:
    `maas <profile> vm-host read <vm_host_id> | jq -r .total`
{
  "cores": 0,
  "memory": 0,
  "local_storage": 3862249340928
}
    `maas <profile> machine read kksmtf | jq -r .cpu_count`
144
    `maas <profile> machine read kksmtf | jq -r .memory`
393216

   `maas admin vm-host refresh <vm_host_id> | jq -r .total`
{
  "cores": 0,
  "memory": 0,
  "local_storage": 3862249340928
}
   `maas <profile> machine read kksmtf | jq -r .cpu_count`
144
   `maas <profile> machine read kksmtf | jq -r .memory`
393216

I hope this helps. I just don't see why MAAS can see the machine info but not reflect it on the VM host...

Revision history for this message
Björn Tillenius (bjornt) wrote :

Could you please attach the logs for the region and rack? Especially covering the time when you refresh the VM host, so we can see if that produces any errors.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi,

there is nothing in rackd.log (at this time)

regiond.log:

2024-02-06 10:23:53 regiond: [info] 127.0.0.1 POST /MAAS/api/2.0/vm-hosts/42/?op=refresh HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.20.2 (gzip))
2024-02-06 10:24:03 metadataserver.builtin_scripts.hooks: [warn] Machine configuration extra for `kksmtf` is None
2024-02-06 10:24:03 maasserver: [error] ################################ Exception: PhysicalBlockDevice matching query does not exist. ################################
2024-02-06 10:24:04 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/db/models/fields/related_descriptors.py", line 173, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/usr/lib/python3/dist-packages/django/db/models/fields/mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'boot_disk'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/utils/views.py", line 298, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 62, in __call__
    response = super().__call__(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 20, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python3.10/dist-packages/piston3/resource.py", line 197, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python3.10/dist-packages/piston3/resource.py", line 195, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/api/support.py", line 371, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/metadataserver/api.py", line 858, in signal
    target_status = process(node, request, status)
  File "/usr/lib/python3/dist-packages/metadataserver/api.py", line 680, in _process_commissioning
    self._store_results(
  File "/usr/lib/python3/dist-packages/metadataserver/api.py", line 620, in _store_results
    node.save()
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2048, in save
    super().save(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/maasserver/models/cleansave.py", line 46, in save
    self.full_clean(exclude=exclude_clean_fields, validate_unique=False)
  File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 1236, in full_clean
    self.clean()
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 2025, in clean
    self.clean_boot_disk()
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 1911, in clean_boot_disk
    if self.boot_disk is not None:
  File "/usr/lib/python3/dist-packages/django/db/m...

Revision history for this message
Björn Tillenius (bjornt) wrote :

That's odd. Could you please run the following queries in your database?

SELECT maasserver_node.boot_disk_id,
       maasserver_physicalblockdevice.blockdevice_ptr_id
FROM maasserver_node
FULL OUTER JOIN maasserver_physicalblockdevice
  ON maasserver_physicalblockdevice.blockdevice_ptr_id = maasserver_node.boot_disk_id
WHERE maasserver_node.system_id = 'kksmtf';

SELECT maasserver_node.boot_disk_id,
       maasserver_blockdevice.id,
       maasserver_blockdevice.name
FROM maasserver_node
FULL OUTER JOIN maasserver_blockdevice
  ON maasserver_blockdevice.id = maasserver_node.boot_disk_id
WHERE maasserver_node.system_id = 'kksmtf';

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Bjorn,

SELECT maasserver_node.boot_disk_id,
       maasserver_physicalblockdevice.blockdevice_ptr_id
FROM maasserver_node
FULL OUTER JOIN maasserver_physicalblockdevice
  ON maasserver_physicalblockdevice.blockdevice_ptr_id = maasserver_node.boot_disk_id
WHERE maasserver_node.system_id = 'kksmtf';

 boot_disk_id | blockdevice_ptr_id
--------------+--------------------
           72 | 72

SELECT maasserver_node.boot_disk_id,
       maasserver_blockdevice.id,
       maasserver_blockdevice.name
FROM maasserver_node
FULL OUTER JOIN maasserver_blockdevice
  ON maasserver_blockdevice.id = maasserver_node.boot_disk_id
WHERE maasserver_node.system_id = 'kksmtf';
 boot_disk_id | id | name
--------------+----+------
           72 | 72 | sda

Changed in maas:
status: Incomplete → Triaged
Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

I hit the same issue after deploying a machine and registering it as a VM host. The block device exception does not appear to be related, since I didn't receive it.

Revision history for this message
Anton Troyanov (troyanov) wrote :

A friend of mine said that once he switched back to 5.18 everything worked well. Might be worth trying...

Revision history for this message
Alan Baghumian (alanbach) wrote :

I encountered this exact same issue with MAAS 3.3.5:

$ maas ${PROFILE} vm-host read 175
Success.
Machine-readable output follows:
{
    "version": "5.20",
    "cpu_over_commit_ratio": 1.0,
    "default_macvlan_mode": null,
    "type": "lxd",
    "architectures": [
        "amd64/generic"
    ],
    "capabilities": [
        "composable",
        "dynamic_local_storage",
        "over_commit",
        "storage_pools"
    ],
    "pool": {
        "name": "default",
        "description": "Default pool",
        "id": 0,
        "resource_uri": "/MAAS/api/2.0/resourcepool/0/"
    },
    "host": {
        "system_id": "mht63q",
        "__incomplete__": true
    },
    "id": 175,
    "total": {
        "cores": 0,
        "memory": 0,
        "local_storage": 234622992384
    },
    "name": "os-node-1",
    "available": {
        "cores": 0,
        "memory": 0,
        "local_storage": 234622992384
    },
    "zone": {
        "name": "default",
        "description": "",
        "id": 1,
        "resource_uri": "/MAAS/api/2.0/zones/default/"
    },
    "memory_over_commit_ratio": 1.0,
    "used": {
        "cores": 0,
        "memory": 0,
        "local_storage": 0
    },
    "tags": [
        "pod-console-logging"
    ],
    "storage_pools": [
        {
            "id": "default",
            "name": "default",
            "type": "dir",
            "path": "Unknown",
            "total": 234622992384,
            "used": 0,
            "available": 234622992384,
            "default": true
        }
    ],
    "resource_uri": "/MAAS/api/2.0/vm-hosts/175/"
}

Here is a quick workaround until this is fixed:

1. Find the affected LXD host ID either from the MAAS UI URL or CLI

$ maas ${PROFILE} vm-hosts read

2. SSH to the controller housing the PostgreSQL database and log in to the DB (for reference only; your DB name may be different):

$ sudo -u postgres psql maasdb

3. Update the cores and memory data for the LXD host based on what your hardware has (running the read command again afterwards will show the updated resources; companion commands are sketched after this list):

maasdb=# select * from maasserver_bmc where id=<LXD-host-ID>;

maasdb=# update maasserver_bmc set cores=64,memory=51200 where id=<LXD-host-ID>;
UPDATE 1

4. Compose a VM using the MAAS CLI, specifying only the VM name:

$ maas ${PROFILE} vm-hosts compose <LXD-host-ID> vm-name

5. Check the MAAS UI and wait for the initial commissioning to finish.

6. SSH to the LXD host to modify the resources (limits.cpu, limits.memory, disk, etc.):

$ sudo lxc config edit vm-name

7. Recommission the machine to reflect the hardware changes.
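
A couple of companion commands for steps 3 and 7, reusing CLI verbs that already appear earlier in this thread (substitute your own profile and IDs):

# Confirm the BMC update took effect (step 3):
$ maas ${PROFILE} vm-host read <LXD-host-ID> | jq .total

# Recommission after editing the LXD config (step 7):
$ maas ${PROFILE} machine commission <system-id-of-the-vm>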

Changed in maas:
importance: Undecided → High
assignee: nobody → Stamatis Katsaounis (skatsaounis)
status: Triaged → In Progress
milestone: none → 3.5.0
Changed in lxd:
status: Unknown → New
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Dan Martin (ec-danmartin) wrote :

What is the expected release date for 3.5.0?
Is it possible to apply said fix to my setup, so I can use LXD from within MAAS?
Or should I follow Anton's steps?

Revision history for this message
Alan Baghumian (alanbach) wrote :

Hi @Dan! The way MAAS patching works, fixes land in the latest branch (3.5 in this case) first and are then back-ported to the older stable branches where applicable (usually for high-importance bugs similar to this one).

If you're in a hurry, you can follow Anton's or my workarounds while the fix is being worked on and released.

Best,
Alan

Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Alan, thanks for explaining.
I will follow your steps and keep an eye out for an update :)

Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

Hi Dan and everyone affected by this LXD issue. I would like to share some updates.

First of all, from the new MAAS releases onward, MAAS will use the LXD LTS channel for provisioning VM hosts. This is 5.21/stable, and with the last series of merge proposals this is applied to the master, 3.4, and 3.3 branches.

While the above decision does not immediately fix the LXD VM host use case, it is a better choice than the previously selected latest/stable channel, which receives monthly feature releases. In addition, we are planning to test proactively against the latest LXD changes so as to catch issues before they are released.

Regarding the LXD bug which is linked to this LP bug: the fix is scheduled to be released to 5.21 soon. When this is done, and when new MAAS releases are out, we expect the issue on the MAAS side to be gone. In the meantime, we suggest users use the latest LXD channel that does not contain this bug, which is 5.19/stable.

As far as machines provisioned by MAAS are concerned, this channel selection has to be made in src/metadataserver/vendor_data.py. As for non-MAAS-managed LXD hosts, it is up to the user to make the channel selection, with commands similar to what MAAS uses inside the vendor_data file.
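
On a non-MAAS-managed host, that channel selection is plain snap usage, for example:

# See which channel the lxd snap currently tracks:
$ snap list lxd
# Move to the LTS channel once the fixed build lands there:
$ sudo snap refresh lxd --channel=5.21/stable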

Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi Stamatis,

Thanks for getting back to me - sorry, a few questions.

Having read your post, I'm not 100% sure what actions I should be taking, and when.

Am I able to update now to get the MAAS fix?
Are you also saying I need to run 5.19/stable instead of what I am running now:

lxd 5.20-f3dd836 27049 latest/stable canonical✓ -

Thanks,

Revision history for this message
Stamatis Katsaounis (skatsaounis) wrote :

If you need to make it work now, while the issue is still present:

- In case you deploy the LXD host through MAAS (deploy a machine and register it as a KVM host), you need to manually edit the vendor data file and use the 5.19/stable channel on the two relevant lines.
- In case you install LXD on the machine yourself and then manually add it as a VM host, make sure to install from the 5.19/stable channel (a minimal sketch follows this list).
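
A minimal sketch for the second case (note that moving an already-installed snap to an older channel is a downgrade, so take care on hosts with running VMs):

$ sudo snap install lxd --channel=5.19/stable   # fresh install
$ sudo snap refresh lxd --channel=5.19/stable   # switch an existing install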

Otherwise, you should wait for the LXD bug to be fixed and the fix added to 5.21/stable, and for MAAS to issue a new release that uses the 5.21/stable channel. This will happen with MAAS 3.4.1. At that point you will have to upgrade your MAAS, and as long as the LXD fix is inside 5.21/stable you will face no problem.

Revision history for this message
Dan Martin (ec-danmartin) wrote :

Hi,

That makes sense.
For me, the LXD host deploy never worked - it always just failed deployment, so I have gone with the manual install.

I will need to downgrade then.

Do you know if there is an advised downgrade process?
I'm keen not to break LXD as I have VMs running at present.

Hopefully 5.21 is out soon also, so others don't get this issue.

Thanks,

Changed in lxd:
status: New → Fix Released
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released