In a multinode setup, VMs fail to launch due to cinder not checking all glance api servers

Bug #1571211 reported by Serguei Bezverkhi on 2016-04-16
This bug affects 4 people
Affects    Importance    Assigned to
Cinder     Medium        Sean McGinnis
Glance     Undecided     Unassigned

Bug Description

In a multinode setup, every other instance fails to launch because cinder does not check all of the configured glance api servers for the requested image.

Steven Dake (sdake) on 2016-04-16
affects: kolla → cinder
Changed in cinder:
status: New → Confirmed
assignee: nobody → Steven Dake (sdake)
Changed in cinder:
status: Confirmed → In Progress
Steven Dake (sdake) wrote :

Kolla sets glance_api_servers to the list of all glance API services and also sets glance_num_retries. These values are not honored when glance returns None, which happens when the call() operation hits a missing image. Serguei to add logs.
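
A minimal sketch of the intended behavior (illustrative names only, not the actual cinder code): walk the whole configured server list instead of giving up after the first None result.

def find_image(glance_clients, image_id):
    # glance_clients: one client per entry in glance_api_servers (hypothetical setup).
    # A None result from one server means "not on this server", so fall
    # through to the next one; fail only after every server has been tried.
    for client in glance_clients:
        image = client.images.get(image_id)
        if image is not None:
            return image
    return None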

Steven Dake (sdake) wrote :

I have confirmed this bug on a multinode setup using LVM SCSI. I have also confirmed it via inspection of the code base.

Steven Dake (sdake) wrote :

The review tracking this bug is:
https://review.openstack.org/306756

Changed in cinder:
importance: Undecided → Medium
Steven Dake (sdake) wrote :

Sheel suggested using this exception:
[10:08:22] <sheel> sdake: nopes, exception.ImageNotFound :)
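
For illustration, the suggestion amounts to raising a typed exception instead of letting None propagate. A sketch (the helper name is hypothetical; cinder.exception.ImageNotFound is the real exception class):

from cinder import exception

def get_image_or_raise(client, image_id):
    # Translate a missing image into ImageNotFound rather than returning None,
    # so callers can catch one well-defined exception and retry elsewhere.
    image = client.images.get(image_id)
    if image is None:
        raise exception.ImageNotFound(image_id=image_id)
    return image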

Serguei Bezverkhi (sbezverk) wrote :

Here are the steps to reproduce it:

Due to the nature of this bug, at least two (preferably three) VMs need to be launched.

1. Add an image to glance; it can be ubuntu or centos (these two were tested).
2. Create three volumes and make them bootable; these volumes will be used as the VMs' hard disks.

cinder create --name centos7-1-disk 10
cinder create --name ubuntu-1-disk 10
cinder create --name ubuntu-2-disk 10
cinder set-bootable $(cinder list | grep centos7-1-disk | awk '{print $2}') true
cinder set-bootable $(cinder list | grep ubuntu-1-disk | awk '{print $2}') true
cinder set-bootable $(cinder list | grep ubuntu-2-disk | awk '{print $2}') true

3. Launch three instances one right after another. Since the issue is between cinder and glance, it is very important to use the exact command structure: the 1st disk is a cinder volume that will be created automatically from the glance image; the 2nd disk is the one prepared during step 2.

nova boot --flavor m1.small-10g \
--nic net-id=$(neutron net-list | grep net-1710 | awk '{print $2}') \
--block-device id=$(glance image-list | grep CentOS-7-x86_64 | awk '{print $2}'),source=image,dest=volume,bus=ide,device=/dev/hdc,size=5,type=cdrom,bootindex=1 \
--block-device source=volume,id=$(cinder list | grep centos7-1-disk | awk '{print $2}'),dest=volume,size=10,bootindex=0 centos-1

nova boot --flavor m1.small-10g \
--nic net-id=$(neutron net-list | grep net-1710 | awk '{print $2}') \
--block-device id=$(glance image-list | grep ubuntu | awk '{print $2}'),source=image,dest=volume,bus=ide,device=/dev/hdc,size=1,type=cdrom,bootindex=1 \
--block-device source=volume,id=$(cinder list | grep ubuntu-1-disk | awk '{print $2}'),dest=volume,size=10,bootindex=0 ubuntu-1

nova boot --flavor m1.small-10g \
--nic net-id=$(neutron net-list | grep net-1710 | awk '{print $2}') \
--block-device id=$(glance image-list | grep ubuntu | awk '{print $2}'),source=image,dest=volume,bus=ide,device=/dev/hdc,size=1,type=cdrom,bootindex=1 \
--block-device source=volume,id=$(cinder list | grep ubuntu-2-disk | awk '{print $2}'),dest=volume,size=10,bootindex=0 ubuntu-2

When the bug is triggered, the 1st and 3rd instances will be in the active state, but the 2nd instance will be in the error state.

[root@deployment-1 tools]# nova list
+--------------------------------------+----------+--------+------------+-------------+----------------------+
| ID                                   | Name     | Status | Task State | Power State | Networks             |
+--------------------------------------+----------+--------+------------+-------------+----------------------+
| 334f41d6-2a24-4641-8c75-dee091ed018c | centos-1 | ACTIVE | -          | Running     | net-1710=10.57.10.11 |
| e1c57483-7232-4aae-8175-6ea9cc6b661e | ubuntu-1 | ERROR  | -          | NOSTATE     |                      |
| 5dd944ad-578d-45f0-9404-0e7bb6e43933 | ubuntu-2 | ACTIVE | -          | Running     | net-1710=10.57.10.13 |
+--------------------------------------+----------+--------+------------+-------------+----------------------+
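
The alternating pattern above matches a round-robin pick of the glance API server: with two servers and the image present on only one of them, every other request lands on the server that does not have it. A toy illustration (purely hypothetical data, not cinder code):

import itertools

servers = ["glance-a", "glance-b"]                       # round-robin pool
images_on = {"glance-a": {"img-1"}, "glance-b": set()}   # image on one server only

picker = itertools.cycle(servers)
for boot in range(1, 4):
    server = next(picker)
    state = "ACTIVE" if "img-1" in images_on[server] else "ERROR"
    print(f"boot {boot}: asked {server} -> {state}")
# boot 1: asked glance-a -> ACTIVE
# boot 2: asked glance-b -> ERROR
# boot 3: asked glance-a -> ACTIVE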

Serguei Bezverkhi (sbezverk) wrote :

cinder-volume.log collected on the server before applying the proposed fix.

Changed in cinder:
assignee: Steven Dake (sdake) → John Griffith (john-griffith)
Sean McGinnis (sean-mcginnis) wrote :

The actual issue is that the glance backends are not synced, correct? So this isn't so much a bug in Cinder as a proposal to work around configuration issues with retries on the Cinder side.

Serguei Bezverkhi (sbezverk) wrote :

Hi Sean,

Glance allows a configuration in which an image exists on one api server but not on another. The client's logic (nova, cinder, etc.) should then be to try to get the image from each of the configured api servers and to fail only after all of them have been contacted.

Serguei

Michal Dulko (michal-dulko-f) wrote :

I disagree with Serguei here. Having non-replicated Glance instances in a single OpenStack deployment seems very non-HA, and I believe it is an abuse of the retries functionality. The correct way to configure Glance for HA is to fully replicate the stores of all instances.

Michal Dulko (michal-dulko-f) wrote :

And why exactly aren't you using a single backend for all Glance instances in Kolla? Always querying all the g-api's to find a certain image seems *very* inefficient. The consistent hash ring and its implementations (it's Swift here, I guess?) were invented exactly to avoid that.

Serguei Bezverkhi (sbezverk) wrote :

Michal, do not get me wrong, I agree with what you are saying. But using the file backend with an api server list is still a VALID and documented configuration, and despite all its inefficiencies it should work. If something does not work as described by the docs, it is a bug and must be fixed.

Michal Dulko (michal-dulko-f) wrote :

Serguei, can you point to Glance docs stating that this is a supported configuration? To me, using glance_api_servers for that purpose renders Keystone's service catalog useless, so I'm surprised to hear this is in the official docs.

Serguei Bezverkhi (sbezverk) wrote :

Here are the configuration lines from cinder.conf (liberty). I do not think they would be here if this were not official, right? And the plural "servers" indicates that there can be more than one.

# A list of the glance API servers available to cinder
# ([hostname|ip]:port) (list value)
#glance_api_servers=$glance_host:$glance_port
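
For illustration only, a multi-server value might look like this (hostnames hypothetical):

glance_api_servers = controller1:9292,controller2:9292,controller3:9292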

I posted a question on glance IRC asking them to confirm and will let you know as soon as I hear back.

Serguei Bezverkhi (sbezverk) wrote :

I got confirmation from the glance folks that in this admittedly weird scenario, each controller using file as the glance backend has no knowledge of the other controllers or of which images they hold. It is the responsibility of a client process to walk the provided api server list and check each one for the image's presence. Basically, if this fix is not done, we will have to publish a caveat somewhere that the glance file backend is not supported in a multinode setup.

Serguei Bezverkhi (sbezverk) wrote :

All, after some additional discussions with the glance folks, the agreement was to publish a doc patch stating that the multi-node scenario does not support file as a back-end for glance.

Michal Dulko (michal-dulko-f) wrote :

Serguei: Does this render this bug invalid?

Changed in cinder:
status: In Progress → Won't Fix
Steven Dake (sdake) wrote :

Michal,

The bug is still valid. The code as it stands is defective: if an image is not found for whatever reason, only the first api server is examined. See the review:
https://review.openstack.org/306756

Changed in cinder:
status: Won't Fix → In Progress
GrzegorzKoper (grzegorz-koper) wrote :

Hello,
This is exactly the situation I faced when I deployed a multinode topology. Whenever you try to boot an instance from an image, it uses round robin to connect to the glance API and fails to deploy if the image is not found on local storage.

GrzegorzKoper (grzegorz-koper) wrote :

Sorry for the double post, but is there a related bug open in glance?
Even if we move the glance backend to external swift (to eliminate this issue), the situation forces us to use glance-cache, and then we hit the same problems with the round-robin mechanism: glance-cache-manage and glance-cache-prefetcher not finding the images :/

Change abandoned by John Griffith (<email address hidden>) on branch: master
Review: https://review.openstack.org/306756

I doubt this is a problem in Cinder. Glance will fail on its own in a multinode setup if you don't specify an HA configuration with a proper shared storage backend like ceph or nfs. Just try to store/fetch images to get the same error.

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: John Griffith (john-griffith) → nobody
Changed in cinder:
status: In Progress → New
Changed in cinder:
assignee: nobody → Sean McGinnis (sean-mcginnis)
status: New → In Progress

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/306756

Sean McGinnis (sean-mcginnis) wrote :

Consensus appears to be that this is something Glance, or more accurately python-glanceclient, should handle, not every consuming project.

Changed in cinder:
status: In Progress → Invalid