Unable to start VM controller

Bug #1728535 reported by Dakshina Ilangovan
This bug affects 1 person
Affects: openstack-training-labs
Status: Invalid
Importance: Undecided
Assigned to: Roger Luethi
Milestone: (none)

Bug Description

The script waits indefinitely while starting the VM and obtaining its IP address, MAC address, etc. Any suggestions for debugging would be helpful. Snapshot and logs attached.

Roger Luethi (rl-o) wrote :

Looks like the controller VM does not boot (properly). You would see this behavior if the basedisk was broken. If, for instance, the basedisk creation fails during the distro install and you try to build a cluster anyway, the first thing you'll notice is that the disk image won't boot. The basedisk installation should end with something like this on the console:

INFO Base disk created.
INFO stacktrain base disk build ends.
INFO Basedisk build took 539 seconds

To be on the safe side, you can rebuild the basedisk with "./st.py -b basedisk".

Changed in labs:
assignee: nobody → Roger Luethi (rl-o)
Dakshina Ilangovan (dakshinai) wrote :

Hi Roger,

I first hit this error:
error: failed to get pool 'default'
error: Storage pool not found: no storage pool with matching name 'default'

I got past it with the following steps:
sudo virsh pool-define-as default dir --target $TRAININGLABS/labs/img
sudo virsh pool-autostart default
sudo virsh pool-start default

I tried setting up the base disk first; however, the script loops in the following state. Could you suggest why? Please find the logs attached.
.INFO Waiting for ssh server in VM base to respond at 192.168.122.11:22.
WARNING Adjusting permissions for key file (0400):
        /home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key
.WARNING Adjusting permissions for key file (0400):
        /home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key
.WARNING Adjusting permissions for key file (0400):
        /home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key
.WARNING Adjusting permissions for key file (0400):
        /home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key

Roger Luethi (rl-o) wrote :

Regarding the default pool and network: on some distros, you have to start virt-manager once to create them. We could try to do it in our scripts, but making it work reliably on all distros would take extensive testing. I guess most users will have done that anyway (just when using libvirt/KVM for something else), but at least we could document that better.

What Linux distribution are you using?

You are getting all these warnings because the chmod call fails to change the permissions on the file, so the script (re)discovers the need to adjust them every time it checks whether the VM is already responding.

So the next question is: what filesystem is /home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key on? Is your home on an SMB mount or something like that? Can you change the permissions to 400 manually, and will they stick?
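
One way to answer the "will they stick?" question is to set the mode and then read it back. The sketch below is only an illustration and is not taken from the training-labs scripts; the function name `chmod_sticks` is made up for this example.

```python
import os
import stat


def chmod_sticks(path, mode=0o400):
    """Return True if `path` reports exactly `mode` after a chmod call."""
    try:
        os.chmod(path, mode)
    except OSError:
        # The filesystem (or missing ownership) refused the change outright.
        return False
    # Some filesystems accept the call but keep the old permission bits,
    # so read the mode back instead of trusting the call alone.
    return stat.S_IMODE(os.stat(path).st_mode) == mode
```

Calling `chmod_sticks("/home/daks/git/training-labs/labs/lib/osbash-ssh-keys/osbash_key")` and getting `False` would point at the filesystem (for instance a network mount) rather than at the scripts.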

It is possible that your basedisk was still building in the background, despite the warnings. If the chmod call fails, I would expect you to get a thousand or more of these warnings until the basedisk is built (depending on how fast the VM and its Internet connection are). Have you looked at the base VM console (using virt-manager) while you were getting the warnings?

Roger Luethi (rl-o) wrote :

I am genuinely interested in the answers to my questions. Any updates?

Dakshina Ilangovan (dakshinai) wrote :

Sorry for the delay Roger.

I'm using a Fedora 25 server machine with kernel version 4.13.5-200.fc26.x86_64. I'm running the script as user 'daks', who has sudo permissions. However, it failed with the following error:

./st.py -b basedisk
INFO Using provider kvm.
[sudo] password for daks:
INFO stacktrain start at Thu Nov 2 16:23:44 2017
Basedisk exists: base-ssh-pike-ubuntu-16.04-amd64
        Destroy and recreate? [y/N] n
Nothing to do.
Done, returning now.
[daks@aj09-26-wcp labs]$ ./st.py -b basedisk
INFO Using provider kvm.
INFO stacktrain start at Thu Nov 2 16:23:52 2017
Basedisk exists: base-ssh-pike-ubuntu-16.04-amd64
        Destroy and recreate? [y/N] y
INFO Deleting existing basedisk.
INFO Asked to delete VM base.
INFO not found
INFO Deleting existing basedisk.
INFO ISO image okay.
INFO Install ISO:
        /home/daks/git/training-labs/labs/img/ubuntu-16.04.3-server-amd64.iso
INFO base_fixups.sh -> 00_base_fixups.sh
INFO apt_init.sh -> 01_apt_init.sh
INFO apt_upgrade.sh -> 02_apt_upgrade.sh
INFO pre-download.sh -> 03_pre-download.sh
INFO apt_pre-download.sh -> 04_apt_pre-download.sh
INFO enable_osbash_ssh_keys.sh -> 05_enable_osbash_ssh_keys.sh
INFO zero_empty.sh -> 06_zero_empty.sh
INFO shutdown.sh -> 07_shutdown.sh
WARNING /home/daks/git/training-labs/labs/img/base-ssh-pike-ubuntu-16.04-amd64 may not be accessible by the hypervisor. You will need to grant the 'qemu' user search permissions for the following directories: ['/home/daks']
WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.
WARNING Graphics requested but DISPLAY is not set. Not running virt-viewer.

Starting install...
ERROR Cannot access storage file '/home/daks/git/training-labs/labs/img/base-ssh-pike-ubuntu-16.04-amd64' (as uid:107, gid:107): Permission denied
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect qemu:///system start base
otherwise, please restart your installation.

Previously, I added qemu and daks to a user group that can access the training-labs directory, but that may not be the right solution. I also did not set it up correctly, since user 'daks' then had access issues.
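
The virt-install warning above names directories that the 'qemu' user cannot search. A rough way to list such directories is sketched below. It assumes that qemu is neither the owner nor in the group of those directories, so only the "other" execute bit matters, and it ignores ACLs and SELinux. `dirs_without_world_search` is a hypothetical helper, not part of training-labs.

```python
import os
import stat


def dirs_without_world_search(path):
    """List ancestor directories of `path` that lack the o+x (search) bit."""
    blocked = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return blocked
```

For the image path in the log, a result containing `/home/daks` would match the warning printed by virt-install.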

I uncommented the user and group fields in /etc/libvirt/qemu.conf so that qemu runs as root, and restarted libvirtd.

Now I see a new error.

 ./st.py -b basedisk
INFO Using provider kvm.
[sudo] password for daks:
INFO stacktrain start at Thu Nov 2 17:11:04 2017
Basedisk exists: base-ssh-pike-ubuntu-16.04-amd64
        Destroy and recreate? [y/N] y
INFO Deleting existing basedisk.
INFO Asked to delete VM base.
INFO not found
INFO Deleting existing basedisk.
INFO ISO image okay.
INFO Install ISO:
        /home/daks/git/training-labs/labs/img/ubuntu-16.04.3-server-amd64.iso
INFO base_fixups.sh -> 00_base_fixups.sh
INFO apt_init.sh -> 01_apt_init.sh
INFO apt_upgrade.sh -> 02_apt_upgrade.sh
INFO pre-download.sh -> 03_pre-download.sh
INFO apt_pre-download.sh -> 04_apt_p...

Roger Luethi (rl-o) wrote :

KVM has always been harder than VirtualBox (less integrated solution, differences in setup between distros), but it should not be as difficult as it appears to be in your case.

We try to require as few changes as possible, which is one reason why we use the default storage pool and network. You should not have to touch the libvirt configuration (/etc/libvirt/qemu.conf).

The other reason for having virt-manager create the default pool is that subtle differences between Linux distributions may require different configurations that we don't know about. The default pool location that you set differs from the one virt-manager sets (on Fedora, at least, that is "/var/lib/libvirt/images"). This may matter even more on Fedora if SELinux is active.

You may want to remove the default pool and have it created by virt-manager. On a Fedora system, the resulting file (/etc/libvirt/storage/default.xml) should look something like this:

<pool type='dir'>
  <name>default</name>
  <uuid>e92b97a7-d775-44f2-9528-4e48d26c0c40</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
  </source>
  <target>
    <path>/var/lib/libvirt/images</path>
  </target>
</pool>
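
To compare an existing pool against the definition above, the target path can be pulled out of the XML. This is a minimal sketch using Python's standard library; `pool_target_path` is a made-up name for illustration, and the XML would come from the file mentioned above or from `virsh pool-dumpxml default`.

```python
import xml.etree.ElementTree as ET


def pool_target_path(pool_xml):
    """Return the <target><path> text of a libvirt storage pool definition.

    Returns None if the definition has no target path.
    """
    root = ET.fromstring(pool_xml)
    return root.findtext("./target/path")
```

If the returned path is something other than /var/lib/libvirt/images on Fedora, the pool was probably not created by virt-manager.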

It might be a good idea to try creating a VM manually (using virt-manager or virsh), just to verify that your KVM installation is working.

I am still puzzled by the chmod failures you had earlier. I can't see a reason why that would happen. But I guess we will get back to that once the basedisk installs.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to training-labs (master)

Fix proposed to branch: master
Review: https://review.openstack.org/518500

Changed in labs:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to training-labs (master)

Reviewed: https://review.openstack.org/518500
Committed: https://git.openstack.org/cgit/openstack/training-labs/commit/?id=d1c4dd1c090fe1663e23f8e8501c0dfb49144f5b
Submitter: Zuul
Branch: master

commit d1c4dd1c090fe1663e23f8e8501c0dfb49144f5b
Author: Roger Luethi <email address hidden>
Date: Wed Nov 8 10:16:11 2017 +0100

    Warn if KVM pool 'default' does not exist

    For KVM, the training-labs scripts use the storage pool, 'default'. Some
    users run the scripts on systems without default pool and see a
    confusing error leading to failure.

    The pool is created automatically when the virt-manager GUI is first
    started.

    Until we know how to do what virt-manager does correctly on all Linux
    distributions, this patch warns users and tells them to run
    virt-manager.

    Partial-Bug: #1728535
    Change-Id: Id4abc28d24790c93d02e4fe3044b9034fd76a56c
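
A check of the kind this commit describes could look roughly like the sketch below. It is not the code from the patch: `pool_exists` is a hypothetical helper that relies on `virsh pool-info` exiting with a non-zero status when the pool is missing, and the `run` parameter exists only so the function can be exercised without libvirt installed.

```python
import subprocess


def pool_exists(name="default", run=subprocess.run):
    """Return True if `virsh pool-info NAME` succeeds on the system libvirt."""
    cmd = ["virsh", "--connect", "qemu:///system", "pool-info", name]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # virsh itself is not installed, so no pool can be queried.
        return False
    return result.returncode == 0
```

A caller could print a warning that asks the user to start virt-manager once whenever `pool_exists()` returns `False`.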

Dakshina Ilangovan (dakshinai) wrote :

Hi,

I was able to create a basedisk after adding user 'daks' (who runs training-labs) to the 'kvm' user group. During installation of the cluster, I see an error at the step "Start autostart/04_apt_install_mysql.sh":

"Failed to create new system journal No space left on device"

It looks to me like there is enough space. Any quick suggestions? Also, could you suggest a way to clean up quickly, and point me to the system requirements?

Filesystem                             Size  Used Avail Use% Mounted on
devtmpfs                                32G     0   32G   0% /dev
tmpfs                                   32G     0   32G   0% /dev/shm
tmpfs                                   32G   11M   32G   1% /run
tmpfs                                   32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/fedora_aj09--26--wcp-root   15G   15G   52K 100% /
tmpfs                                   32G  4.0K   32G   1% /tmp
/dev/sda2                              976M  165M  745M  19% /boot
tmpfs                                  6.3G     0  6.3G   0% /run/user/0
tmpfs                                  6.3G     0  6.3G   0% /run/user/1000

Roger Luethi (rl-o) wrote :

libvirt (the library around KVM) stores the training-labs images on your root partition (you will probably find the images in /var/lib/libvirt/images/).

Your root partition is 100% full (only 52 KB remain). Your system should be adequate for building training-labs, but your root disk is too small. You need at least 15 GB free disk space in /var/lib/libvirt/images when you start building (20 GB are recommended).
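
The free space can be checked before starting a build. The following is a minimal sketch, not part of training-labs; `free_space_ok` is a made-up name, and treating "GB" as decimal gigabytes (10**9 bytes) is an assumption.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; use 2**30 for GiB instead


def free_space_ok(path, required_gb=15):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_gb * GB
```

With the numbers from the df output above, `free_space_ok("/var/lib/libvirt/images")` would return `False`, since only 52K remain on the root filesystem.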

Roger Luethi (rl-o) wrote :

I am closing this bug. The last problem was a lack of disk space.

Changed in labs:
status: In Progress → Invalid