check_can_live_migrate_source fails with mkfs.vfat error: Label can be no longer than 11 characters

Bug #2061701 reported by Oliver Walsh
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

When testing NBD TLS live-migration the initial attempt failed. Looks like a generated filesystem label is too long.

Subsequent attempts succeeded.

$ nova-compute --version
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
27.2.1

oslo_messaging.rpc.client.RemoteError: Remote error: ProcessExecutionError Unexpected error while running command.
Command: mkfs -t vfat -n ephemeral_1_0706d66 /var/lib/nova/instances/_base/ephemeral_1_0706d66
Exit code: 1
Stdout: 'mkfs.fat 4.2 (2021-01-31)\n'
Stderr: 'mkfs.vfat: Label can be no longer than 11 characters\n'

Oliver Walsh (owalsh) wrote :
summary: check_can_live_migrate_source fails with mkfs.vfat error: Label can be
- no longer thatn 11 characters
+ no longer than 11 characters
sean mooney (sean-k-mooney) wrote :

Looking at the log and the behavior you describe, it sounds like mkfs perhaps works but returns an error, and the second migration then uses the previously created disk?

I do not believe that anything has regressed or changed in nova in this regard, so this feels like either a latent bug or a regression outside of nova.

A possible workaround is to change the default format of the ephemeral disk.

The fact that this is using vfat implies that the glance image does not have os_type set to windows or linux:
https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/privsep/fs.py#L257-L260

Assuming default_ephemeral_format is not set, which it won't be by default,

https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/privsep/fs.py#L348-L375

is going to take this code path
https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/privsep/fs.py#L368C11-L370C79

so it's falling back to vfat as the only type that works for both.
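
The fallback described above can be sketched roughly as follows. This is a simplified model rather than nova's actual code: the function and constant names are hypothetical, and the linux/windows mappings to ext4/ntfs are an assumption about the per-OS defaults; only the vfat fallback and the `default_ephemeral_format` override come from this report.

```python
# Simplified, illustrative model of how the ephemeral filesystem is chosen.
# Names are hypothetical; the per-OS defaults below are an assumption.
DEFAULT_FS = "vfat"  # used when os_type is neither linux nor windows
FS_BY_OS_TYPE = {"linux": "ext4", "windows": "ntfs"}  # assumed mapping


def pick_ephemeral_fs(os_type=None, default_ephemeral_format=None):
    """Return the filesystem type to use for an ephemeral disk."""
    # An explicit default_ephemeral_format in the nova config wins.
    if default_ephemeral_format:
        return default_ephemeral_format
    # Otherwise use the per-OS default, falling back to vfat.
    return FS_BY_OS_TYPE.get(os_type, DEFAULT_FS)


print(pick_ephemeral_fs())                                  # no os_type on the image
print(pick_ephemeral_fs(os_type="linux"))                   # os_type set in glance
print(pick_ephemeral_fs(default_ephemeral_format="ext4"))   # config override
```

Under these assumptions, an image without os_type and a config without default_ephemeral_format end up on the vfat path, which is the one that enforces the 11-character label limit.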

That code is called from https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/virt/disk/api.py#L65-L68

which is called here

https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/virt/libvirt/driver.py#L4872-L4888

based on the error this is likely being invoked here

https://github.com/openstack/nova/blob/96268d4e7a0dd1872c77641a634a94f599d59fe0/nova/virt/libvirt/driver.py#L5146-L5158

The label should be fs_label='ephemeral%d' % idx,

i.e. ephemeral0 or ephemeral1,

which is 10 characters long,
and it would be at most 11 for the first 100 ephemeral disks.
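
As a quick check of that arithmetic, here is a minimal sketch comparing the intended labels with the one actually passed to mkfs in the error above. The helper name `label_fits` is hypothetical; the 11-character limit is the one reported by mkfs.vfat.

```python
VFAT_LABEL_MAX = 11  # limit reported by mkfs.vfat in the error message


def label_fits(label, limit=VFAT_LABEL_MAX):
    """Return True if the label is within the filesystem's length limit."""
    return len(label) <= limit


# Labels as they would be built from fs_label='ephemeral%d' % idx.
intended = ["ephemeral%d" % idx for idx in (0, 1, 99, 100)]
# Label actually seen in the failing mkfs command.
observed = "ephemeral_1_0706d66"

for label in intended + [observed]:
    print(label, len(label), label_fits(label))
```

The intended labels stay within the limit up to index 99, while the observed value, which looks like the backing file name, does not.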

sean mooney (sean-k-mooney) wrote :

looking at https://man7.org/linux/man-pages/man8/mkfs.vfat.8.html

and

mkfs -t vfat -n ephemeral_1_0706d66 /var/lib/nova/instances/_base/ephemeral_1_0706d66

 -n VOLUME-NAME
           Sets the volume name (label) of the filesystem. The volume
           name can be up to 11 characters long. Supplying an empty
           string, a string consisting only of white space or the string
           "NO NAME" as VOLUME-NAME has the same effect as not giving
           the -n option. The default is no label.

So it looks like we are not passing fs_label to -n; we are instead passing the name of the file.

sean mooney (sean-k-mooney) wrote :

this is likely related to this change https://github.com/openstack/nova/commit/4289b645970
which is very old

sean mooney (sean-k-mooney) wrote :

Given this code is 12+ years old without changes, my guess is that mkfs started treating this as an error recently.

If you have a log from the destination node, that would also be helpful.
The attached log is from the source, so I have to guess at exactly which code path in nova we are taking, since it's not included in the traceback.

There should be an error on edpm-compute-1 in pre live migrate on the destination.

If you still have that log, can you attach or link to it?

sean mooney (sean-k-mooney) wrote :

If this is what I believe it is,
the nova bug has been there for 12 years, and it's
either only a problem due to a new mkfs.vfat release, or no one noticed and reported this in that time.

In either case I think the workaround is simple:
set the os_type in the glance image or set a default format in the nova config.
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_ephemeral_format

Setting this to low as it does not appear to be a regression introduced recently.

Changed in nova:
importance: Undecided → Low
status: New → Triaged
tags: added: libvirt lo
tags: added: low-hanging-fruit
removed: lo
Michel Jouvin (mijouvin) wrote :

Hi,

I just experienced the same problem when trying to migrate a VM with a 300G ephemeral disk to an Antelope HV. I applied the suggested workaround (setting `default_ephemeral_format` to `xfs`). Unfortunately I hit the same problem because, as the `mkfs.xfs` man page says, XFS has a hard limit of 12 characters and returns an error if the label is longer. I managed to work around the problem using `ext4`, whose limit is 16 but which seems to tolerate (probably truncate) a longer label.

In the case of my VM, the label is 21 characters long (ephemeral_300_0706d66). It would be good if OpenStack could generate labels matching the usual limits (I'd say 12 so that xfs can be used), or at least <= 16, which seems the largest limit supported by any supported filesystem.

Michel
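
One way the suggestion above could look is sketched below. This is not an existing nova function: `safe_label` and `LABEL_LIMITS` are hypothetical names, and the limits are only the ones quoted in this report (vfat 11, xfs 12, ext4 16). Plain truncation is an assumption made for brevity; it can make two long names collide, so a real fix might prefer passing the short 'ephemeral%d' label instead.

```python
# Label limits quoted in this report; other filesystems are deliberately omitted.
LABEL_LIMITS = {"vfat": 11, "xfs": 12, "ext4": 16}


def safe_label(label, fs_type):
    """Truncate a label to the limit of the given filesystem type.

    Unknown filesystem types fall back to the strictest known limit.
    """
    limit = LABEL_LIMITS.get(fs_type, min(LABEL_LIMITS.values()))
    return label[:limit]


label = "ephemeral_300_0706d66"
for fs_type in ("vfat", "xfs", "ext4"):
    short = safe_label(label, fs_type)
    print(fs_type, short, len(short))
```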
