Nova bm operations fail when LIBVIRT_DEFAULT_URI is not set

Bug #1226310 reported by Steve Baker
This bug affects 5 people
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Giulio Fidente

Bug Description

In the following scenario:
- Fedora host
- Ubuntu seed-vm

Nova baremetal calls from the seed VM to the host fail because Fedora does not set uri_default = "qemu:///system" at the system level by default.

A workaround is to do the following:
- On the seed VM, edit /etc/init/nova-compute.conf and add at line 3:
    env LIBVIRT_DEFAULT_URI=qemu:///system
- Run:
  service nova-compute restart

The proper fix is to modify os-svc-install to accept an argument that sets environment variables in upstart job files, and to specify LIBVIRT_DEFAULT_URI=qemu:///system in nova/install.d/74-nova.

A similar fix may be required for systemd scripts.
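For illustration, the environment lines such an os-svc-install argument would need to emit can be sketched as below. This is a minimal Python sketch, not os-svc-install's actual interface: `render_env_lines` is a hypothetical helper, and it assumes values contain no spaces or quotes. Upstart reads `env KEY=VALUE` stanzas, while systemd reads `Environment=KEY=VALUE` lines in the `[Service]` section.

```python
from typing import Dict, List


def render_env_lines(env: Dict[str, str], init_system: str) -> List[str]:
    """Render environment settings for an init job (hypothetical helper).

    Assumes values contain no whitespace or quotes; systemd would need
    quoting for values with spaces, which this sketch does not handle.
    """
    items = sorted(env.items())  # stable output so generated files diff cleanly
    if init_system == "upstart":
        # Upstart job files: one 'env KEY=VALUE' stanza per variable.
        return [f"env {key}={value}" for key, value in items]
    if init_system == "systemd":
        # systemd units: 'Environment=KEY=VALUE' inside [Service].
        return [f"Environment={key}={value}" for key, value in items]
    raise ValueError(f"unsupported init system: {init_system!r}")
```

For example, `render_env_lines({"LIBVIRT_DEFAULT_URI": "qemu:///system"}, "upstart")` yields the same line as the manual workaround above.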

Mark McLoughlin (markmc) wrote:

This feels like it could be a config setting for the virtual power driver, e.g.:

  [baremetal]
  virtual_power_host_user=markmc
  virtual_power_host_key=/opt/stack/boot-stack/virtual-power-key
  virtual_power_ssh_host=192.168.122.1
  virtual_power_type=virsh
  virtual_power_libvirt_uri=qemu:///system
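A sketch of how a driver could honour a URI option like the proposed `virtual_power_libvirt_uri`, assuming it builds the virsh command line as a string for the remote shell. `build_virsh_command` is a hypothetical name. `--connect` is virsh's real flag for choosing a URI, and an explicitly given URI takes precedence over both `LIBVIRT_DEFAULT_URI` and `uri_default`.

```python
import shlex
from typing import List, Optional


def build_virsh_command(subcommand: List[str], libvirt_uri: Optional[str] = None) -> str:
    """Build a virsh command line, pinning the libvirt URI when one is configured.

    libvirt_uri stands in for the proposed virtual_power_libvirt_uri option;
    when it is None, virsh on the remote host falls back to its own default.
    """
    argv = ["virsh"]
    if libvirt_uri:
        # An explicit --connect wins over LIBVIRT_DEFAULT_URI and uri_default.
        argv += ["--connect", libvirt_uri]
    argv += subcommand
    # Quote each word because the string is parsed again by the remote shell.
    return " ".join(shlex.quote(arg) for arg in argv)
```

With the setting above, `build_virsh_command(["list", "--all"], "qemu:///system")` gives `virsh --connect qemu:///system list --all`.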

I'm also going to try working around it by adding this:

  uri_default=qemu:///system

to ~/.config/libvirt.conf in the $vp_host_user account on $vp_ssh_host

Mark McLoughlin (markmc) wrote:

It's actually:

  uri_default = qemu:///system

in ~/.config/libvirt/libvirt.conf

(i.e. I got the path wrong)

Steve Baker (steve-stevebaker) wrote:

Workarounds #1 and #2 both produce errors on Fedora 19. The correct contents of ~/.config/libvirt/libvirt.conf should be:

  uri_default = "qemu:///system"
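Since the unquoted form fails to parse, a setup script that writes this file should always emit the quotes. A minimal sketch, assuming the file lives at `~/.config/libvirt/libvirt.conf` (i.e. `XDG_CONFIG_HOME` is unset) and that overwriting any existing file is acceptable; both function names are hypothetical.

```python
from pathlib import Path


def render_libvirt_conf(uri: str = "qemu:///system") -> str:
    """Render the uri_default line; libvirt's config parser expects quoted strings."""
    return f'uri_default = "{uri}"\n'


def write_libvirt_client_conf(home: Path, uri: str = "qemu:///system") -> Path:
    """Write the per-user libvirt client config under the given home directory.

    Assumes XDG_CONFIG_HOME is unset. Overwrites any existing file, so other
    client settings in it would be lost.
    """
    conf_dir = home / ".config" / "libvirt"
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf_file = conf_dir / "libvirt.conf"
    conf_file.write_text(render_libvirt_conf(uri))
    return conf_file
```

Run as the SSH user on the virtual power host, `write_libvirt_client_conf(Path.home())` would produce exactly the line shown above.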

Robert Collins (lifeless) wrote:

I'm fairly sure we fixed this during the sprint.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
status: Triaged → Fix Released

Tim Serong (tserong) wrote:

According to the git logs, commit 96f3f7d (Change-Id: I1279bb1b303c81a7ae627ee0b579302e5163d398) is meant to fix this, but AFAICT it doesn't fix it on my openSUSE host. I'm not sure why yet, as I haven't had a chance to dig any deeper; I'm just mentioning this for the record. I can confirm that the workaround in comment #3 works fine for me, though.

Jan Provaznik (jan-provaznik) wrote:

I don't think the released patch (Change-Id: I1279bb1b303c81a7ae627ee0b579302e5163d398) fixes this issue: it only sets an environment variable for the nova-compute service inside the VM. But this variable is needed on the virtual host that nova's virtual power driver SSHes into. I don't see anywhere in the nova code where this variable would be taken into account when making the SSH connection (and local environment variables are not automatically propagated through SSH). I like Mark's solution: make it a config setting.
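The propagation point can be made concrete. A sketch, assuming the driver shells out to `ssh`: the URI has to be part of the remote command string itself, because ssh only forwards variables named in the client's `SendEnv` and accepted by the server's `AcceptEnv`. `remote_virsh_argv` and the `stack@192.168.122.1` target are hypothetical.

```python
import shlex
from typing import List


def remote_virsh_argv(ssh_target: str, virsh_args: List[str], libvirt_uri: str) -> List[str]:
    """Build an ssh argv whose remote command carries the libvirt URI itself.

    A LIBVIRT_DEFAULT_URI exported in the local (seed VM) environment is not
    seen by the remote shell unless SendEnv/AcceptEnv are configured, so the
    assignment is prefixed to the remote command with 'env VAR=value'.
    """
    remote_cmd = ["env", f"LIBVIRT_DEFAULT_URI={libvirt_uri}", "virsh", *virsh_args]
    # ssh hands the remote command to the remote user's shell as one string,
    # so quote each word to survive that second round of parsing.
    return ["ssh", ssh_target, " ".join(shlex.quote(word) for word in remote_cmd)]
```

Passing `virsh --connect <uri>` in the remote string would work equally well; either way the setting travels with the command rather than with the local environment.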

Setting back to Triaged.

Changed in tripleo:
status: Fix Released → Triaged

Imre Farkas (ifarkas) wrote:

I can confirm it's not fixed. I am still seeing this issue on Fedora unless I set LIBVIRT_DEFAULT_URI in /etc/profile.d/virsh.sh as a workaround.

Giulio Fidente (gfidente) wrote:

I'm attempting to fix this in nova-bm by introducing support for passing generic options to the virtual_power_command; see https://review.openstack.org/#/c/96184/

I'm not adding a virsh-specific option because the virtual_power driver class is command-agnostic and already transparently supports both vbox and virsh.
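A command-agnostic way to carry such options is to keep one base command per virt type and append operator-supplied options without interpreting them. This is a hypothetical sketch of the idea, not the code in the review linked above; the executable paths, the `BASE_COMMANDS` table and `power_base_command` are assumptions.

```python
from typing import Dict

# Hypothetical per-type base commands; the real driver's templates may differ.
BASE_COMMANDS: Dict[str, str] = {
    "virsh": "/usr/bin/virsh",
    "vbox": "/usr/bin/VBoxManage",
}


def power_base_command(virt_type: str, extra_options: str = "") -> str:
    """Return the base power command with operator-supplied options appended.

    The driver never interprets extra_options, so the same mechanism can carry
    '--connect qemu:///system' for virsh or flags for another tool.
    """
    base = BASE_COMMANDS[virt_type]
    # strip() drops the trailing space when no extra options are configured.
    return f"{base} {extra_options}".strip()
```

The trade-off is that the driver cannot validate the options; a typo only surfaces when the remote command fails.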

aeva black (tenbrae) wrote:

Thinking of how we'd solve this in Ironic, I don't believe that fixing this with a CONF option is appropriate, for a few reasons.

1) This is really a property of the host on which the (virtual) node being managed exists, not a global setting for all nodes. Imagine having the seed VM on an Ubuntu host, some nodes on a Fedora host, and some nodes on an Ubuntu host. With a CONF setting, you couldn't use VMs on both the Fedora and Ubuntu hosts.

2) Because this is really a property of the host on the other end of the SSH connection, it should be configured properly there.

3) This solution only applies to virsh/libvirt and would need separate CONF options for any additional virt types that the SSH driver supports (e.g., vbox and vmware), whereas, again, a per-node setting would be a general solution.
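A per-node setting along these lines could look like the sketch below, assuming node-specific settings are stored in a mapping such as a node's driver info. The key name `ssh_libvirt_uri` and the fallback default are hypothetical; the point is that each node (and so each host) can carry its own URI instead of sharing one global CONF value.

```python
from typing import Mapping

# Hypothetical driver-wide fallback used when a node does not override it.
DEFAULT_LIBVIRT_URI = "qemu:///system"


def libvirt_uri_for_node(driver_info: Mapping[str, str]) -> str:
    """Pick the libvirt URI for one node, falling back to the driver default."""
    return driver_info.get("ssh_libvirt_uri", DEFAULT_LIBVIRT_URI)
```

A node on a Fedora host and a node on an Ubuntu host could then each be registered with whatever URI suits its host.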

Giulio Fidente (gfidente) wrote:

I did some further verification, and this affects Debian Wheezy as well. With libvirt's defaults, non-root users connect to qemu:///session, not qemu:///system.

Here are the results from a fresh installation:

root@jeosd7:~# cat /etc/debian_version
7.5

root@jeosd7:~# virsh uri
qemu:///system

root@jeosd7:~# su -l gfidente
gfidente@jeosd7:~$ virsh uri
qemu:///session
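The session above follows libvirt's default URI choice when no URI is given: `LIBVIRT_DEFAULT_URI` first, then `uri_default` from the client config file, then probing the available drivers, which on a QEMU host yields `qemu:///system` for root and `qemu:///session` for other users. The function below is a deliberately simplified model of that order for a QEMU-only host, not libvirt's real code; `effective_default_uri` is a hypothetical name.

```python
from typing import Mapping, Optional


def effective_default_uri(env: Mapping[str, str], uid: int,
                          uri_default: Optional[str] = None) -> str:
    """Simplified model of how virsh picks a URI when --connect is absent.

    Assumes a QEMU-only host; real libvirt probes several drivers and
    handles more cases than this.
    """
    # 1. The environment variable wins if set and non-empty.
    if env.get("LIBVIRT_DEFAULT_URI"):
        return env["LIBVIRT_DEFAULT_URI"]
    # 2. Then uri_default from the client config file.
    if uri_default:
        return uri_default
    # 3. Otherwise the QEMU driver picks by privilege level.
    return "qemu:///system" if uid == 0 else "qemu:///session"
```

This also explains why running as root (or via sudo) hides the problem: step 3 already yields `qemu:///system`.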

Giulio Fidente (gfidente) wrote:

Yet I can see why devananda suggests fixing it with a config setting on the host: the virtual power driver does not connect to libvirtd directly; it SSHes into the host and runs virsh there.

I'm really not sure what the best way to fix this would be, but given that this is a _real_ issue, I'd like to receive some feedback/proposals here!

Thanks!

Changed in tripleo:
assignee: nobody → Giulio Fidente (gfidente)

James Slagle (james-slagle) wrote:

IMO, I am not opposed to adding LIBVIRT_DEFAULT_URI=qemu:///system to the user's .bashrc file or to the per-user libvirt configuration file (I think you can set this under ~/.config).

I just don't buy the argument that this would be surprising or unexpected to someone running devtest. Given all the other things we do (including requiring --trash-my-machine, adding interfaces, routes, etc.), this is certainly no more surprising than any of that.

And this trips up almost every new user on Fedora/Red Hat distros, and will trip up Debian users as well.

The other option is to update all tripleo scripts where virsh is called to never use sudo and never assume any path under /var/lib/libvirt, and instead use virsh to query for all such paths. That may be more technically correct, but it takes much more effort than the former option. And given that we are TRASHing your machine, I just don't understand the opposition to adding it to ~/.bashrc automatically so that this doesn't trip people up.
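The ~/.bashrc option is cheap to make idempotent, so rerunning devtest does not pile up duplicate lines. A minimal sketch with hypothetical helper names, assuming the user's shell reads `~/.bashrc`. Note that on some distributions the default ~/.bashrc returns early for non-interactive shells, so a line appended at the end may not reach commands run over ssh; the libvirt.conf route does not have that limitation.

```python
from pathlib import Path

EXPORT_LINE = "export LIBVIRT_DEFAULT_URI=qemu:///system"


def with_export_line(bashrc_text: str, line: str = EXPORT_LINE) -> str:
    """Return bashrc contents with the export appended once (idempotent)."""
    if line in bashrc_text.splitlines():
        return bashrc_text  # already present, leave the file untouched
    if bashrc_text and not bashrc_text.endswith("\n"):
        bashrc_text += "\n"  # do not glue our line onto an unterminated last line
    return bashrc_text + line + "\n"


def ensure_bashrc_export(bashrc: Path, line: str = EXPORT_LINE) -> bool:
    """Apply with_export_line to a file; returns True if the file changed."""
    current = bashrc.read_text() if bashrc.exists() else ""
    updated = with_export_line(current, line)
    if updated != current:
        bashrc.write_text(updated)
    return updated != current
```

Keeping the text transformation separate from the file I/O makes the idempotence easy to check without touching a real home directory.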

Giulio Fidente (gfidente) wrote:

I'm seeing the same behavior on a freshly installed Ubuntu 12.04; if it works in CI, we are either setting the URI in the user environment OR it is set during image setup.

tripleo@jeosu1204:/var/log/libvirt$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.4 LTS"

tripleo@jeosu1204:/var/log/libvirt$ virsh uri
qemu:///session

tripleo@jeosu1204:/var/log/libvirt$ sudo virsh uri
qemu:///system

tripleo@jeosu1204:/var/log/libvirt$

Dan Prince (dan-prince) wrote:

Couple more data points here:

The Ironic ssh power driver automatically sets --connect qemu:///system.

Furthermore, we specifically permit this as part of the allowed commands in the ci_commands script:

http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/elements/testenv-worker/bin/ci_commands#n64

So if you are using Ironic, this won't happen now, right?

---

We don't seem to set this in the old Nova driver at this point, however...

So if this is still a problem, why not set it there too?

Dan Prince (dan-prince) wrote:

Giulio: Regarding the CI setup, I think it works there because we SSH into the root account. The Jenkins slaves SSH into the root account on the testenv server and execute this command via the ci_commands wrapper script. Because the ssh command runs as root, there is no need to specify the URI.

Normal devtest users are likely using a regular user account, which defaults to qemu:///session, so they would need to set this manually...

Giulio Fidente (gfidente) wrote:

Thanks Dan! I submitted a small change which could fix this for the nova baremetal driver too: https://review.openstack.org/#/c/96184/

Giulio Fidente (gfidente) wrote:

https://review.openstack.org/#/c/96184/ has merged, so nova-bm will now always use qemu:///system, matching Ironic's default config setting [1]. I'm closing this bug now.

1. https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ssh.py#L41

Changed in tripleo:
status: Triaged → Fix Released