libvirt-bin unaware of uids on LDAP

Bug #1382046 reported by Harald Hannelius
This bug affects 5 people
Affects: libvirt (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I was unable to connect to libvirtd from my regular user account after upgrading to Utopic Unicorn.

The error from 'virsh list' said that libvirtd couldn't resolve my uid: "failed to find user record for uid", I think.

Since all my non-system users live on a remote LDAP server, I suspect that systemd and/or libvirtd was unable to resolve my uid at startup and didn't bother to check again.

It might be a bug in libvirt-bin as well. I don't know.

Temporary fix: 'service libvirt-bin restart', after which one should be able to connect again.

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: systemd 208-8ubuntu8
ProcVersionSignature: Ubuntu 3.16.0-22.29-generic 3.16.4
Uname: Linux 3.16.0-22-generic x86_64
ApportVersion: 2.14.7-0ubuntu6
Architecture: amd64
CurrentDesktop: XFCE
Date: Thu Oct 16 15:52:30 2014
InstallationDate: Installed on 2010-11-04 (1442 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
SourcePackage: systemd
UpgradeStatus: Upgraded to utopic on 2014-10-14 (2 days ago)

Harald Hannelius (harald-arcada) wrote :
Martin Pitt (pitti) wrote :

That description is a bit vague. Maybe for someone familiar with libvirt it's enough, but can you please precisely describe the steps that you did and the output? Thanks!

To confirm: is that with init=/bin/systemd or systemd-sysv, and not booting with upstart (the only supported init system in Ubuntu right now)?

tags: added: systemd-boot
affects: systemd (Ubuntu) → libvirt (Ubuntu)
Changed in libvirt (Ubuntu):
status: New → Incomplete
Martin Pitt (pitti) wrote :

Right, at first look it sounds like libvirt's unit starts up before openldap, so there's a missing ordering dependency/wants somewhere?

Harald Hannelius (harald-arcada) wrote :

Oops, I seem to be running /sbin/init as PID #1, and this is a symlink to upstart. I thought I was running systemd, since my /home on NFS didn't get mounted on boot anymore, and this is a known issue with systemd.

After the upgrade from 14.04 LTS to 14.10 utopic I was about to start my virtual machines. As myself I was unable to do so:

$ virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find user record for uid 'xxx': No such file or directory

I then restarted libvirt-bin and 'virsh list' works again. The command worked for root all the time.

Martin Pitt (pitti) wrote :

So that might just be the same dependency/ordering problem with the upstart jobs/init.d scripts.

summary: - libvirt-bin unaware of uids on LDAP when started via systemd
+ libvirt-bin unaware of uids on LDAP
tags: removed: systemd-boot
Changed in libvirt (Ubuntu):
status: Incomplete → New
Robie Basak (racb) wrote :

What needs to have started before libvirtd in this case? LDAP with NSS shouldn't need a daemon running, right?

Importance -> Medium since LDAP NSS is a non-default use case.

Changed in libvirt (Ubuntu):
importance: Undecided → Medium
Harald Hannelius (harald-arcada) wrote :

nscd probably, but this is not a hard requirement depending on how stuff is configured.

slapd might have to be started, if the host is hosting its own directory.

Networking has to be up if it is not hosting a local LDAP directory.

Robie Basak (racb) wrote :

Upstart can support all of this. The difficulty is in shipping packages that understand and implement the dependencies correctly by default, when that depends on how the system is configured locally.

Set it up one way round and it'll break some other user who is doing it backwards (I don't know - maybe serving LDAP from within a VM or something). It's difficult to consider all use cases when focusing on just one like this.

So the workaround is: configure services in /etc/init/ to start the right way round to match your local configuration.

If there's an obvious startup ordering that we can implement, that will work for users in a default configuration, and the most common non-default configurations, then we can do that. But we need to be careful to not break any of these people.

Otherwise, it'll continue to be necessary for users to configure their own service startup dependencies as they make changes to their local configurations. I understand that this is undesirable for something like LDAP though (since configuring an LDAP client would ideally be a single thing without concern for every other service that might need it).

Serge Hallyn (serge-hallyn) wrote :

Can you confirm that starting nscd and slapd before libvirtd always fixes the issue?

(you can ensure that by adding "or starting libvirt-bin" to their start on statements, assuming they have upstart jobs)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
John Affleck (jraffleck) wrote :

I find that I need to restart libvirt-bin multiple times in order to be able to reliably use virsh. Once will usually get virt-manager up and running, but attempting send-key from the command line will start to fail again:
% virsh send-key Windows7 29 56 111
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find group record for gid '1001': No such file or directory
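
One way to tell whether NSS itself is failing at that moment, or only libvirtd, is to look the same ids up with getent right after virsh reports the error. The following is only a sketch: the helper name check_ids is made up for illustration, and the ids passed to it are whatever virsh printed (1001 here).

```shell
# Hypothetical helper: report whether a uid and a gid resolve through NSS.
# Prints one line per lookup and returns non-zero if either lookup fails.
check_ids() {
    uid=$1
    gid=$2
    status=0
    if getent passwd "$uid" >/dev/null 2>&1; then
        echo "uid $uid: resolved"
    else
        echo "uid $uid: NOT resolved"
        status=1
    fi
    if getent group "$gid" >/dev/null 2>&1; then
        echo "gid $gid: resolved"
    else
        echo "gid $gid: NOT resolved"
        status=1
    fi
    return $status
}
```

Running `check_ids 1001 1001` from the same shell immediately after a failed virsh call would show whether the lookup works outside libvirtd. If both ids resolve there while virsh still fails, that would point at libvirtd rather than at the LDAP client setup.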

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1382046] Re: libvirt-bin unaware of uids on LDAP

That seems odd, and suggests that timing of libvirtd startup may
not be related.

John Affleck (jraffleck) wrote :

Hmm. I take #11 back - I could have sworn I saw it, but I can't reproduce it.

Serge Hallyn (serge-hallyn) wrote :

Thanks for the info - sadly, being unable to reproduce it now doesn't necessarily mean you didn't see it before.

I'm really hoping to get some definitive responses to comment #9.

John Affleck (jraffleck) wrote :

I am a novice at upstart, but I don't see upstart jobs for slapd or, in my case, nscd. So I went ahead and did horrible things to the existing libvirt-bin upstart job:

John Affleck (jraffleck) wrote :

..err to continue #15. I added:
        log_msg "libvirt-bin: starting nscd at $(date)"
        /etc/init.d/nscd start
        /etc/init.d/nslcd start
        sleep 10
..and:
        log_msg "libvirt-bin: starting libvirtd at $(date)"
        exec /usr/sbin/libvirtd $libvirtd_opts

I see, in the new '/var/log/libvirt/startuplog.log':
libvirt: libvirt-bin: starting nscd at Wed Oct 29 20:36:19 EDT 2014
libvirt: libvirt-bin: starting libvirtd at Wed Oct 29 20:36:28 EDT 2014
..which is.. encouraging?

But I still get:
virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find user record for uid '1001': No such file or directory

Of course, now I've wedged things so badly I can't actually restart it at all.

John Affleck (jraffleck) wrote :

Ok. I give up. I do not understand upstart.

Serge Hallyn (serge-hallyn) wrote :

@John,

I think the better solution would be to create upstart jobs for nscd/slapd, and make sure that they 'start on starting libvirt-bin'.

But before spending time on that, I would really like a definitive answer to the question in comment #9:

When this problem occurs, does "sudo stop libvirt-bin; sudo start libvirt-bin" always completely solve the issue (until next reboot)?

If not, then the ordering is not the problem.

Harald Hannelius (harald-arcada) wrote :

Sorry, I am unable to confirm the scenario in post #9 right now.

John Affleck (jraffleck) wrote :

@serge,

In my case, no: "sudo stop libvirt-bin; sudo start libvirt-bin" does not always completely solve the problem.

'virsh list' worked fine after reboot and I was able to bring up virt-manager as well.
% virsh list
 Id Name State
----------------------------------------------------

# Just to make sure it's not ns*cd crashing:
% ps aux | grep 'cd\>'
root 3572 0.0 0.0 771720 2632 ? Ssl 07:53 0:00 /usr/sbin/nscd
nslcd 3611 0.0 0.0 458764 11640 ? Sl 07:53 0:00 /usr/sbin/nslcd
jaffleck 6978 0.0 0.0 6588 708 pts/16 S+ 08:10 0:00 grep --color=auto cd\>

% virt-manager
# (success)
% virsh list
 Id Name State
----------------------------------------------------

% virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find group record for gid '1001': No such file or directory

% ps aux | grep 'cd\>'
root 3572 0.0 0.0 771720 2632 ? Ssl 07:53 0:00 /usr/sbin/nscd
nslcd 3611 0.0 0.0 458764 11640 ? Sl 07:53 0:00 /usr/sbin/nslcd
libvirt+ 7088 22.3 25.2 4732808 4144772 ? Sl 08:11 0:54 qemu-system-x86_64 -....
jaffleck 7211 0.0 0.0 6588 724 pts/16 S+ 08:15 0:00 grep --color=auto cd\>

# Exit virt-manager
% sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 8002
% virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find group record for gid '1001': No such file or directory
% sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 8061
% virsh list
 Id Name State
----------------------------------------------------
 2 Windows7 running

% virt-manager
# (success)

..this replicated the behavior I described in #11. So that's one documented instance. I'll see if it happens again.
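
Since the failure is intermittent, a single 'virsh list' after a restart does not say much. A rough way to put a number on it is to repeat the command and count failures. This is only a sketch; the helper name count_failures and the run count of 20 are arbitrary choices for illustration.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}
```

For example, `count_failures 20 virsh list` run once after boot and once after each 'service libvirt-bin restart' would show whether a restart really clears the problem or only makes it less frequent.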

Serge Hallyn (serge-hallyn) wrote :

Thanks for the information, John.

Could you provide instructions for the very simplest setup with which I could
reproduce this? I think that'll be simpler than trying to ask you for logfiles
bit by bit.

John Affleck (jraffleck) wrote :

Ok. I've managed to re-create this via a bootable live usb image, so I'm at least confident it's not some cruft in my system.

To reproduce from a clean install:
sudo apt-get install libpam-ldap qemu-kvm libvirt-bin

(point ldap at ldap server,....)

sudo auth-client-config -a -p ldap_example

add an ldap user to the libvirtd group

sudo /etc/init.d/qemu-kvm start

login as ldap user, creating home directory and stuff

ubuntu@ubuntu:~$ sudo su - <ldap user>
ldap@ubuntu:~$ virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to find user record for uid '1001'

Restart libvirt-bin:
sudo /etc/init.d/libvirt-bin restart

ldap@ubuntu:~$ virsh list
 Id Name State
----------------------------------------------------

I think that's it. I'd be interested if someone else could confirm.

Jeronimo (jeronimon) wrote :

I've detected the same situation. In an environment that gets its users and groups from an LDAP server, I get the same error about the user name and group.

It seems that it's only aware of local users and groups. I work around it by creating duplicates of the user name and group in both /etc/passwd and /etc/group, but I don't feel comfortable doing this.

Any news about this issue?

Thanks.

Serge Hallyn (serge-hallyn) wrote :

Since comment #11 says that libvirtd sometimes needs to be restarted several times before this works, it seems slapd takes some time to start. So the question is how can /etc/init/libvirt-bin.conf's pre-start tell when ldap is ready?

Robie Basak (racb) wrote :

"getent" might be useful here.

But if I understand the problem correctly, I'd say that it's a hack and
that init should know when ldap is ready and not start dependents until
then.

Or, hold dependents until it is ready. I believe the current best answer
is "socket activation" but I don't know whether that would need changes
within slapd.

If started via a Sys V init script, does the script exit only when slapd
is ready to answer queries, or does it exit before this time?
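
A minimal sketch of the getent idea, assuming the libvirt-bin pre-start can afford to block for a few seconds: poll NSS until a known LDAP uid resolves, then let libvirtd start. The function name wait_for_uid, the example uid and the timeout are illustrative assumptions, not anything shipped in the package.

```shell
# Hypothetical helper: poll NSS until a uid resolves or a timeout expires.
# Usage: wait_for_uid <uid> [timeout-in-seconds]
wait_for_uid() {
    uid=$1
    timeout=${2:-30}
    elapsed=0
    # getent exits non-zero while the passwd lookup for this uid fails.
    while ! getent passwd "$uid" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In a pre-start script this could be called as `wait_for_uid 1001 30`, with 1001 replaced by a uid that only exists in LDAP. It only shows that NSS answers at that moment, so it would not help if the failure turns out to be a race inside libvirtd rather than a startup ordering problem.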

Mikael Frykholm (mikael) wrote :

I have the same problem. I think the upstart scripts are a red herring. My guess is that there is some race condition in the connection to nss.
I have all my users in ldap and sometimes libvirtd hangs and a couple of restarts of it are needed to be able to connect to it.

Stuff I have tried:
upgrade libvirtd to 1.2.12-1
running with and without nscd
