juju bootstrap hangs for local environment under precise on vmware

Bug #920454 reported by Brad Crittenden on 2012-01-23
This bug affects 5 people
Affects / Status / Importance / Assigned to / Milestone:
* libvirt: Fix Released / High
* pyjuju: Fix Released / High / Unassigned / 0.6.1
* libvirt (Ubuntu): Fix Released / Undecided / Unassigned
* libvirt (Ubuntu Precise): Triaged / Medium / Unassigned

Bug Description

With precise, 'juju bootstrap -e local' fails to start networking. The call to 'net.start()' hangs.

This problem only occurs when running Precise on VMware. Precise on bare metal works, and Oneiric on VMware works.

tags: added: local

Created attachment 565119
gdb backtrace of virsh

Description of problem:
Using FC16 as a VMware Workstation 8 guest with Intel VT-x virtualisation so that I can test KVM. After installing libvirt & qemu-kvm, I am unable to connect to the local hypervisor with virsh (or virt-manager, for that matter).

Running the fallback GNOME desktop environment with the latest updates.

Have tried disabling auth (set to none in libvirtd.conf) and disabling SELinux (setenforce 0). Also tried as both a standard user and as root.

Version-Release number of selected component (if applicable):
* FC16 stock with all updates (also tested with testing updates)
* Kernel 3.2.6-3.fc16.x86_64
* libvirt 0.9.6-4.fc16

How reproducible:
Have reproduced on another system, using a fresh FC16 install as a VMware Workstation 8 guest. Same results.

Steps to Reproduce:
1. Install FC16 as VMware guest with Intel VT-x virtualisation
2. Install qemu-kvm & libvirt
3. Run virsh --connect qemu:///system

Actual results:
Process hangs until ^C

Expected results:
Virsh prompt connected to local hypervisor

Additional info:
In the hope that it is useful, I have attached a gdb backtrace while it is hanging. I ran debuginfo-install libvirt then:

virsh --connect qemu:///system &
gdb
attach [processid]
backtrace
See attachment for backtrace
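The attach-and-dump steps above can also be scripted using gdb's batch mode instead of an interactive session. A minimal sketch (the capture_backtrace helper is hypothetical; it assumes gdb is installed and returns None otherwise):

```python
import shutil
import subprocess

def capture_backtrace(pid):
    """Attach gdb to a process non-interactively and dump all-thread
    backtraces. Returns the gdb output, or None if gdb is unavailable."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch",
         "-ex", "set pagination off",
         "-ex", "thread apply all bt"],
        capture_output=True, text=True)
    return result.stdout
```

Note that attaching may fail with "ptrace: Operation not permitted" on systems with a restrictive ptrace policy; in that case gdb exits with the error on stderr rather than hanging.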

Can you provide a backtrace of all the libvirtd threads (thread apply all bt in gdb) when this problem is occurring?

And when you reproduce the hang, is dmidecode running?

ps axwww | grep dmide
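For reference, the same check can be done without ps and grep by scanning /proc directly, which also avoids matching the grep process itself. A hedged sketch (pids_by_name is a hypothetical helper; assumes a Linux /proc filesystem):

```python
import os

def pids_by_name(name):
    """Return pids whose command name (/proc/<pid>/comm) contains `name`,
    roughly equivalent to `ps axwww | grep <name>`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if name in f.read().strip():
                    pids.append(int(entry))
        except OSError:
            pass  # process exited while we were scanning
    return pids

# A hung dmidecode child of libvirtd would show up here:
print(pids_by_name("dmidecode"))
```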

Hi,

1. Attachment created: backtrace of libvirtd attached
I did not fully understand your instructions; I hope this is the information you require. Let me know if there is anything more you want. The gdb commands I used are in the attachment.

2. Results of ps axwww | grep dmide:

1484 ? S 0:00 /usr/sbin/dmidecode -q -t 0,1,4,17

Matt

Created attachment 565125
libvirtd backtrace (all threads)

Yeah I've heard of this issue before, the dmidecode hang in vmware guests. I think there's a patch upstream for it

Eric, do you know more about this?

Matt, that's what I was looking for. I have the same thought Cole did which is that this is dmidecode related.

Are you willing to try building upstream libvirt to see if it makes the problem go away? I'm not convinced it's fixed upstream yet, but if you can repro this at will and test builds I'm sure we can figure it out.

Sure Dave. Can you provide me some high-level instructions, or point me to a site that might have something similar?

Thanks,

Matt

bug 783453 is another example of a dmidecode hang; F16 does not (yet) have the two patches mentioned in that bug:

commit 06b9c5b9231ef4dbd4b5ff69564305cd4f814879
Author: Michal Privoznik <email address hidden>
Date: Tue Jan 3 18:40:55 2012 +0100

    virCommand: Properly handle POLLHUP

    It is a good practise to set revents to zero before doing any poll().
    Moreover, we should check if event we waited for really occurred or
    if any of fds we were polling on didn't encountered hangup.

commit d19149dda888d36cea58b6cdf7446f98bd1bf734
Author: Laszlo Ersek <email address hidden>
Date: Tue Jan 24 15:55:19 2012 +0100

    virCommandProcessIO(): make poll() usage more robust

    POLLIN and POLLHUP are not mutually exclusive. Currently the following
    seems possible: the child writes 3K to its stdout or stderr pipe, and
    immediately closes it. We get POLLIN|POLLHUP (I'm not sure that's possible
    on Linux, but SUSv4 seems to allow it). We read 1K and throw away the
    rest.

But it is not certain whether those two patches are all that's needed, or whether we need yet a third patch backported to the F16 build.
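The scenario described in the second commit message is easy to reproduce. Here is a hedged illustration in Python (not libvirt's actual C code): the writer fills a pipe with 3K and closes it immediately, so the reader sees POLLIN and POLLHUP together. A robust reader keeps draining while POLLIN is set and treats POLLHUP as EOF only once no data remains:

```python
import os
import select

r, w = os.pipe()
os.write(w, b"x" * 3072)   # "child" writes 3K ...
os.close(w)                # ... and immediately closes its end

poller = select.poll()
poller.register(r, select.POLLIN)

received = b""
while True:
    fd, ev = poller.poll()[0]
    if ev & select.POLLIN:
        chunk = os.read(r, 1024)   # read in 1K chunks
        if chunk:
            received += chunk
            continue               # data may still be pending: poll again
    if ev & select.POLLHUP:
        break                      # hangup *and* nothing left to read: EOF
os.close(r)
print(len(received))
```

A reader that breaks on POLLHUP without first checking POLLIN would discard the still-buffered 2K, which is exactly the behavior the patch guards against.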

After a bit of investigation, I am currently building the fc17 version of libvirt from the source RPM.

During the build of libvirt-0.9.10-1 from the fc17 source repo, the virsh-all test hung. It seems that dmidecode was the issue again - the build continued once I had terminated the dmidecode process.

Once the new RPM was installed - and once I had disabled TLS auth :) - the problem was solved. Both virsh and virt-manager connect without issue.

P.S.
There was a sanlock >= 0.8 dependency that I ignored for now, as I don't have shared storage.

So now the question is, are the two patches Eric mentioned sufficient, or is there some other required commit? Osier, I'm about to go offline for the day, would you mind spinning an F16 test build with just the two patches and see if it still fixes the problem?

(In reply to comment #13)
> So now the question is, are the two patches Eric mentioned sufficient, or is
> there some other required commit? Osier, I'm about to go offline for the day,
> would you mind spinning an F16 test build with just the two patches and see if
> it still fixes the problem?

Let me do it.

(In reply to comment #14)
> Let me do it.

Tested by installing VMware Workstation 8 with an fc16 guest; the problem was resolved with exactly those two patches applied in the testing build.

libvirt-0.9.6-5.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/libvirt-0.9.6-5.fc16

Package libvirt-0.9.6-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-0.9.6-5.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-3067/libvirt-0.9.6-5.fc16
then log in and leave karma (feedback).

libvirt-0.9.6-5.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

I apologize for the noise, devs. I'm posting this to benefit those searching for RHEL solutions to this very problem. :)

This problem with libvirt exists in RHEL 6.2, and I stumbled upon it while preparing for RHCSA/RHCE recertification. My study environment consists of VMWare Workstation 8.0.2-591240 and RHEL 6.2.

This is fixed in RHEL 6.3 beta as of 2012/04/25.

Clint Byrum (clint-fewbar) wrote :

3 people affected, this one deserves some investigation. I suspect that the libvirt networking is fighting with vmware's guest additions.

Changed in juju:
status: New → Confirmed
importance: Undecided → High
Sidnei da Silva (sidnei) wrote :

See the upstream bug, especially this comment:

  https://bugzilla.redhat.com/show_bug.cgi?id=796451#c10

Clint Byrum (clint-fewbar) wrote :

Looks like we can cherry pick those two patches into precise and solve this bug for vmware users.

Changed in libvirt (Ubuntu):
status: New → Fix Released
Changed in libvirt (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Medium
Changed in juju:
status: Confirmed → Triaged
Brad Crittenden (bac) wrote :

This bug still exists in precise but is fixed in quantal.

Kapil Thangavelu (hazmat) wrote :

In addition to being fixed in quantal, we're also no longer using libvirt networking. Marking resolved.

Changed in juju:
status: Triaged → Fix Released
milestone: none → 0.6.1
Changed in libvirt:
importance: Unknown → High
status: Unknown → Fix Released