uid mismatch prevents live migration and nfs share

Bug #1080680 reported by Ilkka Tengvall
Affects: openstack-manuals
Status: Fix Released
Importance: Medium
Assigned to: Tom Fifield

Bug Description

The Ubuntu installation documentation for OpenStack should include a UID check between the hosts. It seems Ubuntu doesn't use fixed UID numbers for service accounts like nova. This causes nova-compute to fail when using shared NFS storage: the nova user ends up with a different UID on each host, so the hosts can't share files with each other over NFS.

NFS4 makes this an even bigger problem, since it knows enough to show the file owner as nova on all hosts even though the underlying UID number differs. Writing files still fails, because the numeric UIDs don't match. This took a while to notice.

libvirtd should also have a common UID so that the shared NFS storage (and thus live migration) works.
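A quick way to spot the mismatch is to print the numeric IDs on every host and compare them. The following is a minimal sketch, assuming local accounts in /etc/passwd; the helper name `account_ids` and its output format are made up for illustration. Since NFS4 can show the owner as nova everywhere, comparing numbers is more reliable than comparing names.

```shell
#!/bin/sh
# Print "name:uid:gid" for an account, read from a passwd-format file.
# account_ids is a hypothetical helper; the file defaults to /etc/passwd.
account_ids() {
    awk -F: -v name="$1" '$1 == name { print $1 ":" $3 ":" $4 }' "${2:-/etc/passwd}"
}

# Run this on each compute host and compare the output between hosts.
# Accounts that do not exist locally simply print nothing.
for account in nova libvirt-qemu; do
    account_ids "$account"
done
```

If the accounts come from a directory service rather than /etc/passwd, `getent passwd nova` is likely the better source for the same fields.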

Here is what I had to do to fix this:

1. Set the nova uid in /etc/passwd to the same number on all hosts (e.g. 112).
2. Set the libvirt-qemu uid in /etc/passwd to the same number on all hosts (e.g. 119).
3. Set the nova group in /etc/group to the same number on all hosts (e.g. 120).
4. Set the libvirtd group in /etc/group to the same number on all hosts (e.g. 119).
5. Stop the services.
6. Change all the files owned by user nova or group nova, e.g.
6.1 find / -uid 108 -exec chown nova {} \; # note the 108 here is the old nova uid from before the change
6.2 find / -gid 120 -exec chgrp nova {} \; # likewise, use the old nova gid here
7. Do the same for the files owned by libvirt-qemu, if its IDs had to change.
8. Restart the services.
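Step 6 can be wrapped in a small function so the same commands are run for each account. This is only a sketch under a few assumptions: `reown_tree` is a hypothetical name, it must run as root after the IDs have been changed and the services stopped, and it uses `-xdev` so that find stays on one filesystem (which skips /proc and also skips the NFS mount, so each filesystem has to be passed explicitly).

```shell
#!/bin/sh
# reown_tree ROOT OLD_UID OLD_GID OWNER GROUP
# Hand every file under ROOT that still carries the old numeric IDs
# to the given owner and group. Hypothetical helper, not part of nova.
reown_tree() {
    root=$1; old_uid=$2; old_gid=$3; owner=$4; group=$5
    # -xdev: do not descend into other mounted filesystems
    # -h:    change a symlink itself instead of the file it points to
    find "$root" -xdev -uid "$old_uid" -exec chown -h "$owner" {} +
    find "$root" -xdev -gid "$old_gid" -exec chgrp -h "$group" {} +
}

# Example call (as root), using the old IDs from the steps above:
#   reown_tree / 108 120 nova nova
```

Passing the share itself as ROOT, from one host only, would cover the instance files; whether that is needed depends on how the export maps IDs.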

If your old uid and gid were e.g. 108 and 120, check that no files with the old credentials remain:

find / -path /proc -prune -o \( -uid 108 -o -gid 120 \) -exec ls -ld {} \; | less

When changing the uid and gid, make sure you don't pick numbers that are already used by another user or group. Each number must be unique.
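Before settling on a number, it can be checked against the local account files. The sketch below assumes local accounts only; `id_in_use` is a hypothetical helper that relies on the numeric ID being the third colon-separated field in both /etc/passwd and /etc/group. The candidate numbers 112 and 120 are the examples from the steps above.

```shell
#!/bin/sh
# id_in_use ID FILE
# Succeeds if any entry in a passwd- or group-format FILE already uses
# the numeric ID (field 3 in both formats). Hypothetical helper.
id_in_use() {
    awk -F: -v id="$1" '$3 == id { found = 1 } END { exit !found }' "$2"
}

# Check the candidate numbers on this host before editing anything.
if id_in_use 112 /etc/passwd; then
    echo "uid 112 is already taken on this host"
fi
if id_in_use 120 /etc/group; then
    echo "gid 120 is already taken on this host"
fi
```

The check has to pass on every host that will share the storage, since the number must be free everywhere.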

This should either be covered in the prerequisites list for nodes, or added to the section where live migration is discussed.

BR,

 Ilkka Tengvall

Ilkka Tengvall (ilkka-tengvall) wrote :

Forgot to mention the crucial storage path:

By default it's /var/lib/nova/instances. That is where hosts look for instance image files by default, and that is the directory we shared between hosts using NFS.
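Because the owner names can look correct while writes still fail, a write probe on the shared path is a more honest test than listing the directory. A minimal sketch, meant to be run as the nova user on each host; `can_write` and the probe file name are made up for illustration.

```shell
#!/bin/sh
# can_write DIR
# Try to create and remove a small probe file in DIR.
# Hypothetical helper; the probe file name is arbitrary.
can_write() {
    probe="$1/.write-probe.$$"
    # The subshell keeps a failed redirection from aborting the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Default to the path named above; pass another directory as $1 if needed.
dir=${1:-/var/lib/nova/instances}
if can_write "$dir"; then
    echo "write to $dir succeeded"
else
    echo "write to $dir failed"
fi
```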

Tom Fifield (fifieldt) wrote :

Thanks for the report Ilkka!

tags: added: nova
Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → Medium
Tom Fifield (fifieldt) wrote :
Changed in openstack-manuals:
status: Triaged → In Progress
assignee: nobody → Tom Fifield (fifieldt)
Tom Fifield (fifieldt) wrote :

Ilkka, in addition to updating the live migration documentation, I have added a new section to the System Administration chapter titled "Recovering from a UID/GID mismatch" with your procedure, since it can be useful in other circumstances too. Thanks for writing it up!

Ilkka Tengvall (ilkka-tengvall) wrote :

I'm glad I was able to provide my 2 cents, thanks for doing the patch and review work!

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (master)

Reviewed: https://review.openstack.org/17281
Committed: http://github.com/openstack/openstack-manuals/commit/ae71be8ff108a730a17c6347ab83215299fc9b8a
Submitter: Jenkins
Branch: master

commit ae71be8ff108a730a17c6347ab83215299fc9b8a
Author: Tom Fifield <email address hidden>
Date: Sat Dec 1 14:30:07 2012 +1100

    Add a note about identical GID/UID for live migrat

    fixes bug 1080680

    As noted in the bug report, if the GID and UID are not identical
    between the servers, live migration fails badly and requires
    some cleanup. This change adds a step to ensure users are aware
    this needs to be done before creating the mount.

    patch 2: adds a new section to the System Administration chapter
    for the proceedure to recover from a UID/GID mismatch

    patch 3: clarifies that the proceedure from patch 2 is run on
    nova-compute hosts

    Change-Id: Ie91072554ffa1ed9c2954fd58eea49303edecef5

Changed in openstack-manuals:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-manuals (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/17707

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (stable/folsom)

Reviewed: https://review.openstack.org/17707
Committed: http://github.com/openstack/openstack-manuals/commit/9807fdf15cb1805472cb752cc280cf497ddf4489
Submitter: Jenkins
Branch: stable/folsom

commit 9807fdf15cb1805472cb752cc280cf497ddf4489
Author: Tom Fifield <email address hidden>
Date: Sat Dec 1 14:30:07 2012 +1100

    Add a note about identical GID/UID for live migrat

    fixes bug 1080680

    As noted in the bug report, if the GID and UID are not identical
    between the servers, live migration fails badly and requires
    some cleanup. This change adds a step to ensure users are aware
    this needs to be done before creating the mount.

    patch 2: adds a new section to the System Administration chapter
    for the proceedure to recover from a UID/GID mismatch

    patch 3: clarifies that the proceedure from patch 2 is run on
    nova-compute hosts

    Cherry picked from https://review.openstack.org/#/c/17281/

    Change-Id: Id656b2bb62a97430999fc2c7ffd93e51a804a464

tags: added: in-stable-folsom