Failed to run a VM on Ocata after a successful deployment

Bug #1670055 reported by MarginHu
This bug affects 2 people
Affects: kolla
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: -

Bug Description

Hi guys,
I deployed OpenStack Ocata successfully with Kolla and ran init-runonce, but booting a VM failed.

[root@kode0 nova]# nova show demo1
/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | demo1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000004 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-h5i0ox4f |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2017-03-03T16:19:23Z |
| description | demo1 |
| fault | {"message": "'NoneType' object is not iterable", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1780, in _do_build_and_run_instance |
| | filter_properties) |
| | File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 2016, in _build_and_run_instance |
| | instance_uuid=instance.uuid, reason=six.text_type(e)) |
| | ", "created": "2017-03-03T16:19:33Z"} |
| flavor | m1.tiny (1) |
| hostId | |
| host_status | |
| id | 50837923-6cc8-4aa7-918e-6ce0c121a390 |
| image | cirros (c0945eb1-e7ea-46ba-95f0-01c2e4c30652) |
| key_name | mykey |
| locked | False |
| metadata | {} |
| name | demo1 |
| os-extended-volumes:volumes_attached | [] |
| status | ERROR |
| tags | [] |
| tenant_id | aa0bc3b5640a4c83bbb6d98526969834 |
| updated | 2017-03-03T16:19:33Z |
| user_id | 2b88490754924b8b9805866768fcc689 |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[root@kode0 nova]# vim /usr/lib/python2.7/site-packages/nova/compute/manager.py +1780
[root@kode0 nova]#
[root@kode0 nova]#
[root@kode0 nova]# nova hypervisor-list
/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
+----+---------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------+-------+---------+
| 4 | kode5.genomics.cn | up | enabled |
| 7 | kode4.genomics.cn | up | enabled |
+----+---------------------+-------+---------+
[root@kode0 nova]#
[root@kode0 nova]# nova service-list
/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
+----+------------------+-------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+-------------------+----------+---------+-------+----------------------------+-----------------+
| 22 | nova-consoleauth | kode0.genomics.cn | internal | enabled | up | 2017-03-04T22:00:41.000000 | - |
| 37 | nova-scheduler | kode0.genomics.cn | internal | enabled | up | 2017-03-04T22:00:43.000000 | - |
| 40 | nova-conductor | kode0.genomics.cn | internal | enabled | up | 2017-03-04T22:00:40.000000 | - |
| 46 | nova-consoleauth | kode2.genomics.cn | internal | enabled | up | 2017-03-04T22:00:41.000000 | - |
| 49 | nova-conductor | kode2.genomics.cn | internal | enabled | up | 2017-03-04T22:00:40.000000 | - |
| 52 | nova-consoleauth | kode1.genomics.cn | internal | enabled | up | 2017-03-04T22:00:42.000000 | - |
| 55 | nova-conductor | kode1.genomics.cn | internal | enabled | up | 2017-03-04T22:00:44.000000 | - |
| 58 | nova-scheduler | kode1.genomics.cn | internal | enabled | up | 2017-03-04T22:00:44.000000 | - |
| 61 | nova-scheduler | kode2.genomics.cn | internal | enabled | up | 2017-03-04T22:00:42.000000 | - |
| 64 | nova-compute | kode5.genomics.cn | nova | enabled | up | 2017-03-04T22:00:40.000000 | - |
| 67 | nova-compute | kode4.genomics.cn | nova | enabled | up | 2017-03-04T22:00:44.000000 | - |
+----+------------------+-------------------+----------+---------+-------+----------------------------+-----------------+
[root@kode0 nova]#

How can I troubleshoot this issue?
I ran "docker logs" on nova-compute, nova-scheduler, nova-conductor and nova-api and found no errors.

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Hi, logs can be found at /var/lib/docker/volumes/kolla_logs/_data/nova/

Check there for the error; it will probably be in the scheduler, compute or conductor logs.
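
For example, a quick way to narrow this down (a sketch, assuming the standard per-service log file names under that path):

[root@kode0 ~]# cd /var/lib/docker/volumes/kolla_logs/_data/nova
[root@kode0 nova]# grep -c ERROR nova-api.log nova-scheduler.log nova-conductor.log
[root@kode0 nova]# grep -B2 -A20 "ERROR nova" nova-scheduler.log nova-conductor.log | less

and the same on each compute node, where build failures are usually logged:

[root@kode5 ~]# grep -B2 -A20 "ERROR nova" /var/lib/docker/volumes/kolla_logs/_data/nova/nova-compute.log | less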

Revision history for this message
MarginHu (margin2017) wrote :

I found a clue on compute node "kode5".

Line 1786: 2017-03-04 00:19:07.075 6 DEBUG oslo_concurrency.lockutils [req-cc95fa42-5db9-43d8-856f-6854ac45c9fd - - - - -] Lock "/var/lib/nova/instances/50837923-6cc8-4aa7-918e-6ce0c121a390/disk.info" released by "nova.virt.libvirt.imagebackend.write_to_disk_info_file" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
 Line 1790: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [req-cc95fa42-5db9-43d8-856f-6854ac45c9fd - - - - -] [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] Instance failed to spawn
 Line 1791: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] Traceback (most recent call last):
 Line 1792: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2125, in _build_resources
 Line 1793: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] yield resources
 Line 1794: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1930, in _build_and_run_instance
 Line 1795: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] block_device_info=block_device_info)
 Line 1796: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2676, in spawn
 Line 1797: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] block_device_info=block_device_info)
 Line 1798: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3081, in _create_image
 Line 1799: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] fallback_from_host)
 Line 1800: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3197, in _create_and_inject_local_root
 Line 1801: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] instance, size, fallback_from_host)
 Line 1802: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6758, in _try_fetch_image_cache
 Line 1803: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] size=size)
 Line 1804: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 50837923-6cc8-4aa7-918e-6ce0c121a390] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 227, in cache
 Line 1805: 2017-03-04 00:19:07.321 6 ERROR nova.compute.manager [instance: 5083792...


Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Looks like an issue with glance, or with nova not being able to download the image from glance. Is the image active? What kind of storage is being used? Are you able to manually download the image? If using Ceph, check that all OSDs are up, that the PGs are active+clean (docker exec ceph_mon ceph -s), and that the image is properly stored in the backend being used.
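
A quick checklist along those lines (a sketch; the ceph commands only apply if Ceph is actually the glance backend, and the "images" pool name is the usual default rather than something confirmed here):

[root@kode0 ~]# openstack image show c0945eb1-e7ea-46ba-95f0-01c2e4c30652 -c status -c size
[root@kode0 ~]# glance image-download c0945eb1-e7ea-46ba-95f0-01c2e4c30652 --file /tmp/test.img
[root@kode0 ~]# docker exec ceph_mon ceph -s            # overall health, OSDs up/in, pgs active+clean
[root@kode0 ~]# docker exec ceph_mon rbd -p images ls   # the image UUID should appear here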

Revision history for this message
MarginHu (margin2017) wrote : Re: [Bug 1670055] Re: failed to run vm on Ocata version after deployed successfully

Yes, it looks like a glance issue.

I used LVM as the glance image backend when deploying OpenStack.

[root@kode0 opt]# glance image-list
+--------------------------------------+--------+
| ID | Name |
+--------------------------------------+--------+
| c0945eb1-e7ea-46ba-95f0-01c2e4c30652 | cirros |
+--------------------------------------+--------+
[root@kode0 opt]# glance image-download c0945eb1-e7ea-46ba-95f0-01c2e4c30652
No redirection or local file specified for downloaded image data. Please
specify a local file with --file to save downloaded image or redirect
output to another source.
[root@kode0 opt]# glance image-download
c0945eb1-e7ea-46ba-95f0-01c2e4c30652 --file cirros.image
'NoneType' object is not iterable


Revision history for this message
MarginHu (margin2017) wrote :

[root@kode0 tmp]# glance image-list
+--------------------------------------+--------+
| ID | Name |
+--------------------------------------+--------+
| 2c02d662-0717-4c9c-bf56-00a447040f0a | cirros |
+--------------------------------------+--------+

[root@kode0 tmp]# glance image-show 2c02d662-0717-4c9c-bf56-00a447040f0a
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | ee1eca47dc88f4879d8a229cc70a07c6 |
| container_format | bare |
| created_at | 2017-03-05T00:41:50Z |
| disk_format | qcow2 |
| id | 2c02d662-0717-4c9c-bf56-00a447040f0a |
| min_disk | 0 |
| min_ram | 0 |
| name | cirros |
| owner | aa0bc3b5640a4c83bbb6d98526969834 |
| protected | False |
| size | 13287936 |
| status | active |
| tags | [] |
| updated_at | 2017-03-05T00:41:50Z |
| virtual_size | None |
| visibility | shared |
+------------------+--------------------------------------+
[root@kode0 tmp]#

Revision history for this message
Steven Dake (sdake) wrote :

Are you running multiple controllers by chance? We have a solution for nova image pulls with multiple controllers; however, the glance team has mentioned they may deprecate this functionality in the future.

https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/templates/nova.conf.j2#L128

Revision history for this message
MarginHu (margin2017) wrote :

Yes, I'm running 3 controllers.

This is the glance config in my nova.conf:

[glance]
api_servers = http://192.168.102.20:9292,http://192.168.102.21:9292,http://192.168.102.22:9292
num_retries = 3
debug = True
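
One way to check whether every api_server listed there can actually serve the image data is to hit the Glance v2 download endpoint on each controller directly (a sketch; the endpoints and image UUID are the ones from this report, and error handling is omitted):

[root@kode0 ~]# TOKEN=$(openstack token issue -f value -c id)
[root@kode0 ~]# for ep in http://192.168.102.20:9292 http://192.168.102.21:9292 http://192.168.102.22:9292; do
>   printf '%s -> ' "$ep"
>   curl -s -o /dev/null -w '%{http_code}\n' -H "X-Auth-Token: $TOKEN" \
>        "$ep/v2/images/2c02d662-0717-4c9c-bf56-00a447040f0a/file"
> done

Only the controller that actually holds the image file should answer 200 with data; the others will fail or return an empty response.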

Revision history for this message
MarginHu (margin2017) wrote :

I found the following in the glance api.log:

2017-03-05 08:42:16.060 21 WARNING glance.location [req-aaabec67-bf99-48b0-964d-a13f243d9976 - - - - -] Get image 2c02d662-0717-4c9c-bf56-00a447040f0a data failed: Image /var/lib/glance/images/2c02d662-0717-4c9c-bf56-00a447040f0a not found.
2017-03-05 08:42:16.061 21 ERROR glance.location [req-aaabec67-bf99-48b0-964d-a13f243d9976 - - - - -] Glance tried all active locations to get data for image 2c02d662-0717-4c9c-bf56-00a447040f0a but all have failed.

Revision history for this message
MarginHu (margin2017) wrote :

I found the root cause: my glance storage backend is file, and the glance services are running on 3 controllers.

[glance_store]
default_store = file
filesystem_store_datadir = /var/lib/glance/images/

The image is uploaded to only one of the 3 controllers, so retrieval fails whenever the request is served by another controller. It's therefore necessary to use shared storage such as Ceph when running the glance service on multiple nodes.
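
For reference, with a shared Ceph/RBD backend every glance-api instance reads from the same pool. A minimal [glance_store] sketch (the values below are common defaults rather than anything taken from this deployment; kolla-ansible generates the equivalent when its Ceph options are enabled):

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8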

Workaround:
1. Copy the image to the other 2 controllers.

[root@kode2 ~]# ls /var/lib/docker/volumes/glance/_data/images
2c02d662-0717-4c9c-bf56-00a447040f0a
[root@kode2 ~]# cd /var/lib/docker/volumes/glance/_data/images;ls -lah
total 13M
drwxr-x---. 1 42415 42415 72 Mar 5 08:41 .
drwxr-xr-x. 1 42415 42415 12 Mar 4 00:00 ..
-rw-r-----. 1 42415 42415 13M Mar 5 08:41 2c02d662-0717-4c9c-bf56-00a447040f0a

[root@kode1 ~]# ls /var/lib/docker/volumes/glance/_data/images
[root@kode1 ~]#

[root@kode0 ~]# ls /var/lib/docker/volumes/glance/_data/images
[root@kode0 ~]#

[root@kode2 images]# scp 2c02d662-0717-4c9c-bf56-00a447040f0a kode0:/var/lib/docker/volumes/glance/_data/images/

[root@kode2 images]# scp 2c02d662-0717-4c9c-bf56-00a447040f0a kode1:/var/lib/docker/volumes/glance/_data/images/

2. Fix the ownership of the copied image so it matches the glance user (42415:42415, as on kode2).

[root@kode1 images]# chown 42415:42415 2c02d662-0717-4c9c-bf56-00a447040f0a
[root@kode0 images]# chown 42415:42415 2c02d662-0717-4c9c-bf56-00a447040f0a

3. Boot a VM.
[root@kode0 ~]# openstack server create --image cirros --flavor m1.tiny --key-name mykey --nic net-id=c53cbfe2-f210-4f0e-b62c-cdab995a3f63 demo1

[root@kode0 ~]# openstack server list
+--------------------------------------+-------+--------+-------------------+------------+
| ID | Name | Status | Networks | Image Name |
+--------------------------------------+-------+--------+-------------------+------------+
| d1544927-291c-4a34-9634-dcc23880d1b7 | demo1 | ACTIVE | demo-net=10.0.0.8 | cirros |
+--------------------------------------+-------+--------+-------------------+------------+

Revision history for this message
Ryan Wallner (wallnerryan) wrote :

I hit this exact issue. Thank you for your post on how to fix it; it worked fine in our testing after copying the file-backed image.

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Related fix and bug:
https://review.openstack.org/#/c/448654/
https://bugs.launchpad.net/kolla-ansible/+bug/1681301

I'm going to mark this bug as a duplicate of the other one since it already has a fix in progress.
Regards
