undercloud vm fails to start properly

Bug #1763089 reported by Adam Huffman
This bug affects 1 person
Affects: tripleo
Status: Expired
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

Description
===========

I have run quickstart.sh several times today, and each run failed because the script could not obtain the IP address of the undercloud VM. The VM appears to have hung: its KVM process is stuck at 100% CPU and the virsh console is unresponsive (it connects, but shows nothing when I press Enter).

Steps to reproduce
==================

bash quickstart.sh --install-deps
export VIRTHOST=<virthost IP>
bash quickstart.sh -R queens $VIRTHOST

Expected result
===============

The undercloud VM starts normally and the rest of the script proceeds.

Actual result
=============

The script cannot capture the undercloud VM IP, so it stops:

TASK [setup/undercloud : Start undercloud vm] ********************************************************************************************************************************************************
task path: /home/stack/.quickstart/tripleo-quickstart/roles/libvirt/setup/undercloud/tasks/main.yml:328
Wednesday 11 April 2018 17:38:31 +0100 (0:00:00.042) 0:12:13.584 *******
changed: [10.188.2.120] => {"changed": true, "failed": false, "msg": 0}

TASK [setup/undercloud : Get undercloud vm ip address] ***********************************************************************************************************************************************
task path: /home/stack/.quickstart/tripleo-quickstart/roles/libvirt/setup/undercloud/tasks/main.yml:341
Wednesday 11 April 2018 17:38:33 +0100 (0:00:01.901) 0:12:15.485 *******
FAILED - RETRYING: Get undercloud vm ip address (20 retries left).
FAILED - RETRYING: Get undercloud vm ip address (19 retries left).
FAILED - RETRYING: Get undercloud vm ip address (18 retries left).
FAILED - RETRYING: Get undercloud vm ip address (17 retries left).
FAILED - RETRYING: Get undercloud vm ip address (16 retries left).
FAILED - RETRYING: Get undercloud vm ip address (15 retries left).
FAILED - RETRYING: Get undercloud vm ip address (14 retries left).
FAILED - RETRYING: Get undercloud vm ip address (13 retries left).
FAILED - RETRYING: Get undercloud vm ip address (12 retries left).
FAILED - RETRYING: Get undercloud vm ip address (11 retries left).
FAILED - RETRYING: Get undercloud vm ip address (10 retries left).
FAILED - RETRYING: Get undercloud vm ip address (9 retries left).
FAILED - RETRYING: Get undercloud vm ip address (8 retries left).
FAILED - RETRYING: Get undercloud vm ip address (7 retries left).
FAILED - RETRYING: Get undercloud vm ip address (6 retries left).
FAILED - RETRYING: Get undercloud vm ip address (5 retries left).
FAILED - RETRYING: Get undercloud vm ip address (4 retries left).
FAILED - RETRYING: Get undercloud vm ip address (3 retries left).
FAILED - RETRYING: Get undercloud vm ip address (2 retries left).
FAILED - RETRYING: Get undercloud vm ip address (1 retries left).
fatal: [10.188.2.120]: FAILED! => {"attempts": 20, "changed": true, "failed": true, "msg": "non-zero return code", "rc": 1, "stderr": "Connection to 10.188.2.120 closed.\r\n", "stdout": "undercloud ip is not available\r\n", "stdout_lines": ["undercloud ip is not available"]}

PLAY RECAP *******************************************************************************************************************************************************************************************
10.188.2.120 : ok=140 changed=58 unreachable=0 failed=1
localhost : ok=11 changed=4 unreachable=0 failed=0

I saw the same result with Pike.
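
The failing task polls for the undercloud's address and gives up after 20 attempts. The task's own script is not shown in this report, so the following is only a minimal sketch of that kind of check, for running by hand on the virthost. It assumes the domain is named `undercloud` and that `virsh domifaddr` can see a DHCP lease for it (both are assumptions, and quickstart may look the address up differently). `extract_ipv4` and `wait_for_ip` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Read `virsh domifaddr`-style output on stdin and print the first IPv4
# address, without the /prefix suffix.
extract_ipv4() {
    awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Poll until the domain reports an IPv4 address or the retries run out.
# Usage: wait_for_ip DOMAIN [RETRIES] [DELAY_SECONDS]
# (The defaults of 20 retries and 10 seconds are illustrative assumptions.)
wait_for_ip() {
    local domain="$1" retries="${2:-20}" delay="${3:-10}" ip
    while [ "$retries" -gt 0 ]; do
        ip="$(virsh domifaddr "$domain" 2>/dev/null | extract_ipv4)"
        if [ -n "$ip" ]; then
            echo "$ip"
            return 0
        fi
        retries=$((retries - 1))
        sleep "$delay"
    done
    echo "undercloud ip is not available" >&2
    return 1
}
```

For example, `wait_for_ip undercloud 20 10` would retry 20 times at 10-second intervals and exit non-zero if no address ever appears.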

Environment
===========

CentOS 7.4

Emilien Macchi (emilienm) wrote :

You're probably having issues with RDO cloud, aren't you?

Changed in tripleo:
status: New → Triaged
importance: Undecided → Low
milestone: none → rocky-1
Adam Huffman (adam-huffman) wrote :

Not sure what you mean?

Are you referring to the images generated by RDO?

Adam Huffman (adam-huffman) wrote :

Just tried again with the image that was built yesterday (from /queens/rdo_trunk/b18490299814d62f21b6594a3e95da68465b2e0d_85b157a9) with the same result.

Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Adam Huffman (adam-huffman) wrote :

Still happening with today's checkout. Would be grateful for any troubleshooting suggestions...

Adam Huffman (adam-huffman) wrote :

Finally managed to grab the state of the console via virt-manager:

SeaBIOS (version 1.11.0-2.el7)
Machine UUID <snip>

iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+BFF94540+BFEF4540 C980

iPXE (http://ipxe.org) 00:04.0 C980 PCI2.10 PnP PMM+BFF94540+BFEF4540 C980

Booting from ROM…
Probing EDD (edd=off to disable)… ok

Adam Huffman (adam-huffman) wrote :

Still seeing this with latest master: the undercloud VM is stuck after "Probing EDD (edd=off to disable)... ok".

Seen this on several completely separate nodes now.

Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Adam Huffman (adam-huffman) wrote :

I believe this is caused by problems with nested virtualization when running certain kernels on Skylake. After manually changing the CPU model in libvirt, the VM does boot.
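
For anyone checking whether they are in the same situation, a rough sketch of the host-side checks follows. It assumes an Intel host (hence the `kvm_intel` module path) and a domain named `undercloud`; `nested_status` is a hypothetical helper name. Which CPU model to switch to is not stated in this report, so the sketch stops at inspecting the current settings.

```shell
#!/usr/bin/env bash

# Report whether nested virtualization is enabled, based on a kvm module
# parameter file (defaults to the kvm_intel one; an assumption for Intel hosts).
# Older kernels expose Y/N in this file, newer ones 1/0.
nested_status() {
    local param_file="${1:-/sys/module/kvm_intel/parameters/nested}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$param_file")" in
        Y|y|1) echo "enabled" ;;
        *)     echo "disabled" ;;
    esac
}

# Typical use on the virthost (the domain name is an assumption):
#   nested_status
#   grep -m1 'model name' /proc/cpuinfo
#   virsh dumpxml undercloud | grep -A4 '<cpu '
# The guest CPU definition can then be changed with:
#   virsh edit undercloud
```

The function only reads a sysfs file, so it is safe to run on a live host. The commented commands show the host CPU and the CPU definition the guest currently gets.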

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired