since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is unable to boot up on openstack instance

Bug #1851552 reported by Vasili
24
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Community Project
New
Undecided
Unassigned
OpenStack Compute (nova)
Invalid
Undecided
Unassigned
cloud-init
Invalid
Undecided
Unassigned
networking-calico
Invalid
Undecided
Unassigned
qemu-kvm
New
Undecided
Unassigned
cloud-init (Ubuntu)
Invalid
Undecided
Unassigned
qemu (Ubuntu)
New
Undecided
Unassigned

Bug Description

Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute.
Tried to boot up the instance via horizon dashboard without success.
Nova flow works perfect.
When access to console I discovered that the boot process stuck in the middle.
[[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device.
[[0;1;33mDEPEND[0m] Dependency failed for /mnt.
[[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb.
It receives IP but looks like not get configured at time.
since ubuntu 18 there is netplan feature managing the network interfaces
please advise.

more details as follow:
https://bugs.launchpad.net/networking-calico/+bug/1851548

Tags: metadata
Revision history for this message
Paride Legovini (paride) wrote :

Hello,

Could you please elaborate a bit more on how you came to the conclusion that the problem is caused specifically by cloud-init? Without some more context information it's difficult for us to tell if this is actually a bug and to begin working on it.

If you think this is actually a problem with cloud-init, could you please run `cloud-init collect-logs` and attach the generated tarball to this bug report? The collected logs will help us understand what's going on.

I'm marking this report as Incomplete for the moment, please change its status back to New after providing additional information. Thanks!

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Revision history for this message
Vasili (vasili.namatov) wrote :

The instance creation was working until calico configured on controller and compute. Ubuntu 16 and Centos releases are booting up successfully. As known ubuntu began to work with netplan since 18 and all latest releases. Not sure the issue with cloud init or the order or timing of booting process which relates to getting IP in time and properly.

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Revision history for this message
Vasili (vasili.namatov) wrote :

the problem is in:

[[0;32m OK [0m] Started Wait for Network to be Configured.

Revision history for this message
Vasili (vasili.namatov) wrote :

I used the following official latest ubuntu bionic image:
http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

And the regular openstack command:
https://docs.openstack.org/mitaka/install-guide-ubuntu/launch-instance-provider.html

openstack server create --flavor ubuntu-flavor --image ubuntu-bionic-latest \
  --nic net-id=c9d82a5d-e075-4d66-8ecd-1092fa218ad7 --security-group allow_all \
  --key-name cloud-keypair.private ubuntu-bionic-instance

more details as follow:
https://bugs.launchpad.net/networking-calico/+bug/1851548

Revision history for this message
Vasili (vasili.namatov) wrote :

see full log https://etherpad.openstack.org/p/ubuntu-xenial-log for successful creation of instance with latest ubuntu-xenial cloud image http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

see full log https://etherpad.openstack.org/p/ubuntu-bionic-log for successful creation of instance with latest ubuntu-bionic cloud image http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img

Revision history for this message
Vasili (vasili.namatov) wrote :

correction:

see full log https://etherpad.openstack.org/p/ubuntu-xenial-log for successful creation of instance with latest ubuntu-xenial cloud image http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img

see full log https://etherpad.openstack.org/p/ubuntu-bionic-log for NOT SUCCESSFUL creation of instance with latest ubuntu-bionic cloud image http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

Revision history for this message
Scott Moser (smoser) wrote :

Hi.
Please attach the output of 'cloud-init collect-logs'. Ideally from the 18.04 instance, but the 16.04 instance would be fine if you're not able to get it from 18.04.

Then, set the status of this bug back to New.

thanks.

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Revision history for this message
Vasili (vasili.namatov) wrote :

see attached collected cloud init logs from ubuntu xenial..

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Revision history for this message
Dan Watkins (oddbloke) wrote :

Hi Vasili, unfortunately there isn't enough info in the 16.04 logs to help us work out what's going on with 18.04. Do you have any way of accessing an 18.04 instance (serial console, perhaps?) that would allow you to gather more data?

Moving this back to Incomplete for now, apologies for the round trips!

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Revision history for this message
Ryan Harper (raharper) wrote :

> [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb.

Looking at the bionic log you posted, it never gets a /dev/vdb device. Can you confirm that the VM configuration on the compute node correctly was configured with an ephemeral block device?

Here we can see not all of the block devices expected are present...

[[0;32m OK [0m] Started udev Coldplug all Devices.
[[0m[0;31m* [0m] (1 of 3) A start job is running for���label-UEFI.device (19s / 1min 30s)[K[[0;1;31m*[0m[0;31m*

Also, looking at the 16.04 boot, it looks like this is nested virtualization, I can see in the journal that the xenial kernel is hitting this one:

Nov 24 16:18:43.138019 ubuntu kernel: ------------[ cut here ]------------
Nov 24 16:18:43.138390 ubuntu kernel: WARNING: CPU: 0 PID: 0 at /build/linux-mU1Buo/linux-4.4.0/arch/x86/kernel/fpu/xstate.c:517 fpu__init_system_xstate+0x37e/0x764()
Nov 24 16:18:43.138624 ubuntu kernel: XSAVE consistency problem, dumping leaves
Nov 24 16:18:43.147521 ubuntu kernel: Modules linked in:
Nov 24 16:18:43.147832 ubuntu kernel:
Nov 24 16:18:43.148048 ubuntu kernel: CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-169-generic #198-Ubuntu
Nov 24 16:18:43.148268 ubuntu kernel: 0000000000000086 a2c4204db3cb6ecb ffffffff81e03d80 ffffffff8140c8e1
Nov 24 16:18:43.148582 ubuntu kernel: ffffffff81e03dc8 ffffffff81cb3c68 ffffffff81e03db8 ffffffff81086492
Nov 24 16:18:43.148788 ubuntu kernel: 0000000000000008 0000000000000440 0000000000000040 ffffffff81e03e4c
Nov 24 16:18:43.149274 ubuntu kernel: Call Trace:
Nov 24 16:18:43.149480 ubuntu kernel: [<ffffffff8140c8e1>] dump_stack+0x63/0x82
Nov 24 16:18:43.149687 ubuntu kernel: [<ffffffff81086492>] warn_slowpath_common+0x82/0xc0
Nov 24 16:18:43.149892 ubuntu kernel: [<ffffffff8108652c>] warn_slowpath_fmt+0x5c/0x80
Nov 24 16:18:43.150100 ubuntu kernel: [<ffffffff81081f86>] ? xfeature_size+0x59/0x77
Nov 24 16:18:43.150343 ubuntu kernel: [<ffffffff81f783f1>] fpu__init_system_xstate+0x37e/0x764
Nov 24 16:18:43.150549 ubuntu kernel: [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
Nov 24 16:18:43.150756 ubuntu kernel: [<ffffffff81f77dfa>] fpu__init_system+0x1e7/0x28e
Nov 24 16:18:43.159459 ubuntu kernel: [<ffffffff81f790a6>] early_cpu_init+0x2b6/0x2bb
Nov 24 16:18:43.159715 ubuntu kernel: [<ffffffff81f790a6>] ? early_cpu_init+0x2b6/0x2bb
Nov 24 16:18:43.160030 ubuntu kernel: [<ffffffff81f7439d>] setup_arch+0xc0/0xd1f
Nov 24 16:18:43.160240 ubuntu kernel: [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
Nov 24 16:18:43.160528 ubuntu kernel: [<ffffffff81f68c16>] start_kernel+0xe2/0x4a4
Nov 24 16:18:43.160737 ubuntu kernel: [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
Nov 24 16:18:43.160926 ubuntu kernel: [<ffffffff81f682da>] x86_64_start_reservations+0x2a/0x2c
Nov 24 16:18:43.161133 ubuntu kernel: [<ffffffff81f68426>] x86_64_start_kernel+0x14a/0x16d
Nov 24 16:18:43.161624 ubuntu kernel: ---[ end trace 4d5ff9f2f68c4233 ]---

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829555

Revision history for this message
Vasili (vasili.namatov) wrote :

Hi,
Thanks for shared investigation details.
According to the setup I use, could you assist to understand where and what I'm missing here that can lead to the issues you mentioned?

### Working setup details ###

[1] The Openstack service installed on Controller and Compute with Ubuntu 18.04.2
With Minimal deployment for Queens:

Keystone
Glance
Nova
Neutron
Horizon

Reference:
https://docs.openstack.org/install-guide/openstack-services.html#minimal-deployment-for-queens

All the configurations done according to the guide (reference I mentioned).

[2] The add-ons features included to Openstack are:

Calico driver plugin for Neutron
Calico Bird for BGP Peering
Calico Felix for Security
Calico DHCP Agent instead Neutron DHCP

Reference:
https://docs.projectcalico.org/v3.10/getting-started/openstack/

Thanks,
Vasili

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in qemu (Ubuntu):
status: New → Confirmed
Thomas Huth (th-huth)
affects: qemu → qemu (Ubuntu)
Revision history for this message
Ryan Harper (raharper) wrote :

For your Openstack deployment, are you running on baremetal?
Are you deploying something like devstack or triple-o which enable nested virtualization?

https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html
https://docs.openstack.org/tripleo-quickstart/latest/unprivileged.html
https://tripleo-docs.readthedocs.io/en/latest/environments/virtual.html

Changed in cloud-init (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Vasili (vasili.namatov) wrote :

nope, no devstack nor tripleo.. everything straight forward as I mentioned previously.. installed all those services manually on controller and compute and connected to l3 switch with bgp of the bird..

Revision history for this message
Vasili (vasili.namatov) wrote :

mariadb, rabbitmq, memcached, etcd, keystone, glance, neutron, nova, calico-driver, calico-felix, calico-dhcp, nova-api-metadata, bird

any clue on the issue?

Thanks,
Vasili

Revision history for this message
Ryan Harper (raharper) wrote :

Unfortunately no; the kernel messages are very much related to nested virtualization, but I don't know where in your software stack it gets configured/enabled.

Revision history for this message
Ryan Harper (raharper) wrote :

I'm marking the cloud-init task invalid as at this time the logs point to a nested virtualization/openstack issue with devices not being present; not related to cloud-init. If further investigation points to an issue with cloud-init you can move the cloud-init task back to New.

Changed in cloud-init (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Vasili (vasili.namatov) wrote : Re: [Bug 1851552] Re: since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is unable to boot up on openstack instance

There is no nested virtualization, all the openstack on bare metal with regular installation with regular services, the only thing is running is calico which is eliminate neutron ml2, metadata and dhcp and its running with calico plugin, calico-dhcp and calico felix. As well as on each compute nova-api-metadata is available.

How the devices can be presented, could you advise with further steps of investigation?

Best,
Vasili

Sent from iPhone

> On 9 Jan 2020, at 2:31, Ryan Harper <email address hidden> wrote:
>
> I'm marking the cloud-init task invalid as at this time the logs point
> to a nested virtualization/openstack issue with devices not being
> present; not related to cloud-init. If further investigation points to
> an issue with cloud-init you can move the cloud-init task back to New.
>
> ** Changed in: cloud-init (Ubuntu)
> Status: Incomplete => Invalid
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1851552
>
> Title:
> since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is
> unable to boot up on openstack instance
>
> Status in cloud-init:
> New
> Status in networking-calico:
> New
> Status in OpenStack Compute (nova):
> New
> Status in OpenStack Community Project:
> New
> Status in qemu-kvm:
> New
> Status in cloud-init package in Ubuntu:
> Invalid
> Status in qemu package in Ubuntu:
> New
>
> Bug description:
> Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute.
> Tried to boot up the instance via horizon dashboard without success.
> Nova flow works perfect.
> When access to console I discovered that the boot process stuck in the middle.
> [[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device.
> [[0;1;33mDEPEND[0m] Dependency failed for /mnt.
> [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb.
> It receives IP but looks like not get configured at time.
> since ubuntu 18 there is netplan feature managing the network interfaces
> please advise.
>
> more details as follow:
> https://bugs.launchpad.net/networking-calico/+bug/1851548
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1851552/+subscriptions

Revision history for this message
Dan Watkins (oddbloke) wrote :

Hi Vasili,

From a cloud-init perspective, there isn't anything we can do so I'm going to move the upstream task to Invalid too. I'm afraid I don't really have any advice on how to proceed, as this appears to be a hypervisor or cloud issue.

Dan

Changed in cloud-init:
status: New → Invalid
Revision history for this message
Vasili (vasili.namatov) wrote :

In Rocky release I’m not experiencing kind of issues. And make sure you use kvm and not qemu, cause qemu is limited on its performance and kvm just born to work with latest hardware :)

Best,
Vasili

Sent from iPhone

> On 14 Jan 2020, at 20:35, Dan Watkins <email address hidden> wrote:
>
> Hi Vasili,
>
>> From a cloud-init perspective, there isn't anything we can do so I'm
> going to move the upstream task to Invalid too. I'm afraid I don't
> really have any advice on how to proceed, as this appears to be a
> hypervisor or cloud issue.
>
>
> Dan
>
> ** Changed in: cloud-init
> Status: New => Invalid
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1851552
>
> Title:
> since ubuntu 18 bionic release and latest, the ubuntu18 cloud image is
> unable to boot up on openstack instance
>
> Status in cloud-init:
> Invalid
> Status in networking-calico:
> New
> Status in OpenStack Compute (nova):
> New
> Status in OpenStack Community Project:
> New
> Status in qemu-kvm:
> New
> Status in cloud-init package in Ubuntu:
> Invalid
> Status in qemu package in Ubuntu:
> New
>
> Bug description:
> Openstack Queens release which is running on ubuntu 18 LTS Controller and Compute.
> Tried to boot up the instance via horizon dashboard without success.
> Nova flow works perfect.
> When access to console I discovered that the boot process stuck in the middle.
> [[0;1;31m TIME [0m] Timed out waiting for device dev-vdb.device.
> [[0;1;33mDEPEND[0m] Dependency failed for /mnt.
> [[0;1;33mDEPEND[0m] Dependency failed for File System Check on /dev/vdb.
> It receives IP but looks like not get configured at time.
> since ubuntu 18 there is netplan feature managing the network interfaces
> please advise.
>
> more details as follow:
> https://bugs.launchpad.net/networking-calico/+bug/1851548
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1851552/+subscriptions

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

I honestly don't see any evidence of some broken behaviour in Nova if, particularly, other instances with other guest image using cloud-init can boot correctly.

Please provide us some logs or better trace of a potential Nova problem in order for us to classify the potential root cause and a possible solution, but in the meantime I'll have to close this bug from the Nova point of view. You can reopen this bug by changing its status to New.

Changed in nova:
status: New → Invalid
tags: added: metadata
Revision history for this message
Nell Jerram (neil-jerram) wrote :

I don't believe this is to do with networking-calico, so will mark as Invalid for networking-calico.

Changed in networking-calico:
status: New → Invalid
Revision history for this message
James Falcon (falcojr) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.