root disk flavor constraints not applied to nova-lxd instances when config-drive is used

Bug #1659604 reported by Damian Wojsław
This bug affects 6 people
Affects       Status        Importance  Assigned to  Milestone
nova-lxd      Triaged       Low         Unassigned
lxd (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

While deploying OpenStack with LXD as the hypervisor via nova-lxd, I ran into the issue below.

Once the container was created, I tried to set the size of the root filesystem and got an error in response:

$ lxc config device set instance-00000004 root size 6GB
error: The device doesn't exist

I have confirmed that the behaviour is the same if I create a standalone container with a profile:

trochej@ubuntu-lxd:~$ lxc profile show zfs-lxd
name: zfs-lxd
config:
  security.nesting: "True"
description: ""
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    size: 6GB
    type: disk

I've had a short discussion with Stéphane Graber about this, and it seems that after the profile is created and applied to the container being created, the root device is owned by the profile, not by the container. This scenario needs a second step, adding the root device to the container itself:

lxc config device add instance-00000004 root disk path=/ size=6GB
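
To confirm the workaround took effect, something like the following can be used (a sketch; the "lxd" ZFS pool/dataset path is an assumption from the default layout, adjust it for your storage setup):

$ sudo zfs get quota lxd/containers/instance-00000004   # should now show a quota instead of "none"
$ lxc exec instance-00000004 -- df -h /                 # / should now be reported as roughly 6GB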

The version of nova-lxd I deployed seems to set this parameter on this line:
https://github.com/openstack/nova-lxd/blob/13.2.0/nova_lxd/nova/virt/lxd/config.py#L189

It is, in turn, called in create_profile:

https://github.com/openstack/nova-lxd/blob/13.2.0/nova_lxd/nova/virt/lxd/config.py#L107

I have a repro available if you need any more data.

Steps to reproduce:

1. Deploy OpenStack with the nova-lxd backend. I only tried this with ZFS, so I don't know whether the btrfs or directory backends have this issue too.

2. Create a container.

3. Query the container's root device information or try to set its size:
lxc config device set instance-00000004 root size 6GB

Expected outcome:
The command above returns nothing. When checking the ZFS quota for the LXD container, it should be set to 6GB, and when running df -h / from within the LXD container, the root filesystem should show as 6GB.

Actual outcome:
The lxc command returns an error:
error: The device doesn't exist

The ZFS quota is not set, and df -h / within the LXD container reports the whole underlying pool size.

This is the profile applied to the instance. Notice the "root:" section.

$ lxc profile show instance-00000004
name: instance-00000004
config:
  boot.autostart: "True"
  limits.cpu: "1"
  limits.memory: 512MB
  raw.lxc: |
    lxc.console.logfile=/var/log/lxd/instance-00000004/console.log
  security.nesting: "True"
description: ""
devices:
  qbrc272e9e2-e3:
    hwaddr: fa:16:3e:71:b5:72
    nictype: bridged
    parent: qbrc272e9e2-e3
    type: nic
  root:
    path: /
    size: 20GB
    type: disk

Tags: sts
Revision history for this message
Paul Hummer (rockstar) wrote :

Looking closer at this, I'm not sure this is a bug. By design, if you provision a machine with nova-lxd, you should *never* manipulate the container using the lxc command. There are *horrible* issues that can result from doing this.

Revision history for this message
Damian Wojsław (damian-wojslaw) wrote :

Hello Paul

I don't want to manipulate the container manually. The only reason I did that was to:

1. Diagnose the issue
2. Apply the manual fix

What I would like to happen, however, is for nova-lxd to:
1. Set the root disk size the moment it creates the container
2. Expose an API to set those constraints.

Regards

Revision history for this message
James Page (james-page) wrote :

Did a quick test - container profile (specific to each container) has

  root:
    path: /
    size: 1GB
    type: disk

however

  lxd/containers/instance-00000001 39G 671M 38G 2% /

(which is the entire device)

so it appears that size constraints applied in a profile don't actually get applied to the container (which I think is odd).
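
A minimal way to check this with plain LXD, outside of nova-lxd, might look like the following (a sketch; the image alias and the "lxd" ZFS pool/dataset names are assumptions, not taken from this deployment):

$ lxc profile create sizetest
$ lxc profile device add sizetest root disk path=/ size=6GB
$ lxc launch ubuntu:16.04 sizetest-c1 -p default -p sizetest
$ lxc exec sizetest-c1 -- df -h /                # does / show ~6GB or the whole pool?
$ sudo zfs get quota lxd/containers/sizetest-c1  # is a quota set on the dataset?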

Changed in nova-lxd:
status: New → Confirmed
importance: Undecided → High
summary: - nova-lxd seems to forget one step when configuring root device
+ root disk flavor constraints not applied to nova-lxd instances
Revision history for this message
James Page (james-page) wrote : Re: root disk flavor constraints not applied to nova-lxd instances

I also tried adding a separate disk using lxc config device add and then restarting the container, but no quota was applied to the block device.

Revision history for this message
James Page (james-page) wrote :

We've always managed a profile per container so that everything is in one place (and I swear I remember validating this work myself when it was implemented); I also think a size in a profile should be honoured by LXD, as well as when adding a specific device.

Revision history for this message
James Page (james-page) wrote :

I might be mis-reading the LXD codebase, but it looks like the quota should be applied:

func containerConfigureInternal(c container) error {
    // Find the root device
    for _, m := range c.ExpandedDevices() {
        if m["type"] != "disk" || m["path"] != "/" || m["size"] == "" {
            continue
        }

        size, err := shared.ParseByteSizeString(m["size"])
        if err != nil {
            return err
        }

        err = c.Storage().ContainerSetQuota(c, size)
        if err != nil {
            return err
        }

        break
    }

    return nil
}

Revision history for this message
James Page (james-page) wrote :

OK so this is odd - I happened to be testing a xenial-mitaka deployment and for this one I see quota being applied:

$ df -h
Filesystem Size Used Avail Use% Mounted on
lxd/containers/instance-00000001 21G 658M 20G 4% /

that's a 20G root for an m1.small.

Revision history for this message
James Page (james-page) wrote :

(using 13.3.0-0ubuntu1 from xenial-proposed but I can't see any related fixes for this).

Revision history for this message
James Page (james-page) wrote :

Tried reproducing on a xenial-newton proposed deployment - however I still see the quota being correctly applied to the root disk volume:

ssh ubuntu@10.5.150.8 -i ~/testkey.pem df -h
Warning: Permanently added '10.5.150.8' (ECDSA) to the list of known hosts.
Filesystem Size Used Avail Use% Mounted on
lxd/containers/instance-00000003 21G 658M 20G 4% /

James Page (james-page)
Changed in nova-lxd:
status: Confirmed → Incomplete
Revision history for this message
Darren Wardlow (gdwardlow) wrote :

I can reproduce this when using "force_config_drive = True" in nova.conf.
It appears to be an issue only when the configdrive portion is added to the profile.
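
For anyone trying to reproduce that setup, one way to flip the option on a compute node is sketched below (using crudini; the config path and service name assume the Ubuntu packaging):

$ sudo apt install crudini
$ sudo crudini --set /etc/nova/nova.conf DEFAULT force_config_drive True
$ sudo systemctl restart nova-compute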

Revision history for this message
Bob Taylor (rltaylo) wrote :

To elaborate on Darren's comment - I enabled debugging on both nova-compute and lxd on a host that I'm seeing this issue on. It appears that any activity that adds a drive after the container is created is removing the zfs quota for the root volume. The lxd debug log has the best information around this..

With "force_config_drive = True" in nova, the rootfs restriction is being set upon container creation (in this case a 10GB disk):

ephemeral=false lvl=info msg="Created container" name=instance-00000851 t=2017-03-16T19:57:00+0000
driver=storage/zfs imageFingerprint=51c25c338eba3eef145209aa69b6ea09cbfd823700a3469ace4bb9e1c0f00e4d isPrivileged=false lvl=dbug msg=ContainerCreateFromImage name=instance-00000851 t=2017-03-16T19:57:00+0000
container=instance-00000851 lvl=dbug msg="Shifting root filesystem" rootfs=/var/lib/lxd/containers/instance-00000851/rootfs t=2017-03-16T19:57:00+0000
container=instance-00000851 driver=storage/zfs lvl=dbug msg=ContainerSetQuota size=10737418240 t=2017-03-16T19:57:05+0000
lvl=dbug msg="\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {\n\t\t\t\"id\": \"682c8dee-98f5-4595-9632-6391c5df0444\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"created_at\": \"2017-03-16T19:57:00.576190686Z\",\n\t\t\t\"updated_at\": \"2017-03-16T19:57:00.576190686Z\",\n\t\t\t\"status\": \"Success\",\n\t\t\t\"status_code\": 200,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/instance-00000851\"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\"\n\t\t}\n\t}" t=2017-03-16T19:57:06+0000

However, right after the config drive is added, the quota is removed:

ip=@ lvl=dbug method=PUT msg=handling t=2017-03-16T19:57:08+0000 url=/1.0/profiles/instance-00000851
lvl=dbug msg="\n\t{\n\t\t\"config\": {\n\t\t\t\"raw.lxc\": \"lxc.console.logfile=/var/log/lxd/instance-00000851/console.log\\n\",\n\t\t\t\"limits.memory\": \"4096MB\",\n\t\t\t\"boot.autostart\": \"True\",\n\t\t\t\"limits.cpu\": \"2\"\n\t\t},\n\t\t\"description\": \"\",\n\t\t\"devices\": {\n\t\t\t\"brq608bb3c9-8a\": {\n\t\t\t\t\"hwaddr\": \"fa:16:3e:15:ab:da\",\n\t\t\t\t\"type\": \"nic\",\n\t\t\t\t\"host_name\": \"tapa17775ae-c5\",\n\t\t\t\t\"parent\": \"brq608bb3c9-8a\",\n\t\t\t\t\"nictype\": \"bridged\"\n\t\t\t},\n\t\t\t\"root\": {\n\t\t\t\t\"path\": \"/\",\n\t\t\t\t\"type\": \"disk\",\n\t\t\t\t\"size\": \"10GB\"\n\t\t\t},\n\t\t\t\"ephemeral0\": {\n\t\t\t\t\"path\": \"/mnt\",\n\t\t\t\t\"type\": \"disk\",\n\t\t\t\t\"source\": \"/var/lib/nova/instances/instance-00000851/storage/ephemeral0\"\n\t\t\t},\n\t\t\t\"configdrive\": {\n\t\t\t\t\"path\": \"/var/lib/cloud/data\",\n\t\t\t\t\"type\": \"disk\",\n\t\t\t\t\"source\": \"/var/lib/nova/instances/instance-00000851/configdrive\"\n\t\t\t}\n\t\t}\n\t" t=2017-03-16T19:57:08+0000
container=instance-00000851 driver=storage/zfs lvl=dbug msg=ContainerSetQuota size=0 t=2017-03-16T19:57:08+0000
ip=@ lvl=dbug method=PUT msg=handling t=2017-03-16T19:57:08+0000 url=/1.0/containers/instance-00000851/state
lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\...


Revision history for this message
James Page (james-page) wrote :

OK so the mix of using config-drive might be related (some of my earlier test results were with config-drive enabled for the instances, so that might explain it).

config-drive has some other issues (see bug 1673411).

Changed in nova-lxd:
status: Incomplete → New
James Page (james-page)
summary: - root disk flavor constraints not applied to nova-lxd instances
+ root disk flavor constraints not applied to nova-lxd instances when
+ config-drive is used
Revision history for this message
Stéphane Graber (stgraber) wrote :

Closing the LXD side of this since AFAICT there's no problem with LXD enforcing the limit if it's configured properly.

Changed in lxd (Ubuntu):
status: New → Invalid
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I think the LXD part of the problem is that it removes constraints when another device is added.

Revision history for this message
Bob Taylor (rltaylo) wrote :

Agreed - I can reproduce the issue with the constraint being removed on LXD without having nova-lxd in the loop at all.
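
For reference, the kind of standalone sequence that exercises this (a sketch building on the sizetest profile shown earlier; whether this exact sequence triggers the quota reset on a given LXD version isn't confirmed in this thread, but the debug log above shows the quota being cleared when the profile is updated with a configdrive device):

$ sudo zfs get quota lxd/containers/sizetest-c1   # quota present before the change
$ mkdir -p /tmp/fakeconfigdrive
$ lxc profile device add sizetest configdrive disk source=/tmp/fakeconfigdrive path=/var/lib/cloud/data
$ sudo zfs get quota lxd/containers/sizetest-c1   # re-check: does the quota revert to "none"?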

Revision history for this message
Stéphane Graber (stgraber) wrote :

Ah, indeed, this is a bug we fixed a little while back; marking it Fix Released.

It will be included in the next round of SRUs too.

Changed in lxd (Ubuntu):
status: Invalid → Fix Released
James Page (james-page)
Changed in nova-lxd:
status: New → Confirmed
status: Confirmed → Triaged
Revision history for this message
Ivy Alexander (ivyalexander) wrote :

Stéphane, is this still planned for 2.0.10 and is there a good way to check up on the progress for the Xenial backport of the fix?

Revision history for this message
Leonardo Borda (lborda) wrote :

Looks like it will make it into the next xenial-proposed release soon.

https://github.com/lxc/lxd/issues/3043
commit: a23e28f11f8f9625cb5352faec9046833a4ad320
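
A quick way to check what a given host is running once the SRU lands (a sketch; the exact fixed version number isn't stated in this thread):

$ lxd --version
$ apt-cache policy lxd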

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

It sounds like this is probably fixed. Just need to schedule a test to check if it has been.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Moving to low as it's most likely been fixed; still need to test it.

Changed in nova-lxd:
importance: High → Low