"Invalid block device mapping" error on a creating >200 instances

Bug #1533197 reported by Sergey Arkhipov
Affects                 Status      Importance   Assigned to        Milestone
Mirantis OpenStack (status tracked in 10.0.x)
    10.0.x              Confirmed   High         Ivan Kolodyazhny
    8.0.x               Won't Fix   High         Ivan Kolodyazhny
    9.x                 Won't Fix   High         Ivan Kolodyazhny

Bug Description

Problem discovered on MOS 8.0 ISO #361

When I start a large number of instances (in my case, 250), a large portion of them fail with an error like "Build of instance 070995e4-66f4-4acd-87c7-92815a489edd aborted: Block Device Mapping is Invalid."

Message: Build of instance 070995e4-66f4-4acd-87c7-92815a489edd aborted: Block Device Mapping is Invalid.
Code: 500
Details: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1905, in _do_build_and_run_instance filter_properties) File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2025, in _build_and_run_instance 'create.error', fault=e) File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ six.reraise(self.type_, self.value, self.tb) File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1996, in _build_and_run_instance block_device_mapping) as resources: File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2143, in _build_resources reason=e.format_message())

Environment:
    * 50 baremetal nodes
    * Ceph *is not* used
        - Cinder LVM over iSCSI for volumes is enabled; the rest of the checkboxes are disabled
    * Ceilometer, Murano and Sahara *are* installed
    * Each node has 32142 MB RAM and 12 CPUs

Steps to reproduce (a python-novaclient equivalent is sketched after this list):
    * Upload an image with Ubuntu Trusty
    * Proceed to Horizon
    * Proceed to 'Instances', click on 'Launch instance'
    * Select m1.small flavour (1 vCPU, root disk - 20GB, RAM - 2048 MB)
    * Set 'Boot from image (creates a new volume)'
    * Choose uploaded image
    * Set device size to 20 GB
    * Ensure that 'Delete on Terminate' checkbox is active
    * Set 250 instances
    * Choose network
    * Click on 'Launch'
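
For reference, a rough python-novaclient equivalent of the Horizon steps above (a sketch only; the client constructor arguments, IMAGE_ID and NET_ID are placeholders, not values from this environment):

    # Hypothetical sketch of the same scenario via python-novaclient.
    from novaclient import client as nova_client

    nova = nova_client.Client('2', USERNAME, PASSWORD, TENANT_NAME, AUTH_URL)

    flavor = nova.flavors.find(name='m1.small')
    bdm_v2 = [{
        'uuid': IMAGE_ID,               # the uploaded Ubuntu Trusty image
        'source_type': 'image',
        'destination_type': 'volume',   # "Boot from image (creates a new volume)"
        'volume_size': 20,              # 20 GB root volume
        'delete_on_termination': True,  # "Delete on Terminate" checkbox
        'boot_index': 0,
    }]

    nova.servers.create(name='scale-test',
                        image=None,     # boot source comes from the BDM
                        flavor=flavor,
                        block_device_mapping_v2=bdm_v2,
                        nics=[{'net-id': NET_ID}],
                        min_count=250,
                        max_count=250)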

Expected results:
    * All instances are up and running

Real result:
    * Only 81 instances started; 169 failed with the error above
    * Cinder created 235 volumes that are in the available or in-use state

Due to bug https://bugs.launchpad.net/bugs/1529182 I cannot create a diagnostic snapshot, so I've gathered all logs after the error happened:
https://drive.google.com/a/mirantis.com/file/d/0B9tzODpFABxkWk05Ykl2V1NZWE0/view?usp=sharing

Please find cinder and nova output attached.

Related upstream bug: https://bugs.launchpad.net/cinder/+bug/1556902

Sergey Arkhipov (sarkhipov) wrote :
Roman Podoliaka (rpodolyaka) wrote :

We don't have such a large environment, but we are going to give it a try at a smaller scale, keeping the same number of VMs and volumes.

tags: added: area-nova
tags: removed: long-haul-testing
Alexander Gubanov (ogubanov) wrote :

In the attached logs I found that nova got a 504 from haproxy.
The haproxy logs for cinder contain a lot of 504 responses:

ogubanov@ogubanov-pc:~/Downloads/logs$ find ./docker-logs/ -name haproxy.log | xargs grep 'cinder' | grep -c ' 504 '
161

and it looks like haproxy terminates POST requests that take >= 60 seconds (note the 'sH--' termination state and the ~60000 ms total times in the entries below):

ogubanov@ogubanov-pc:~/Downloads/logs$ find ./docker-logs/ -name haproxy.log | xargs grep 'cinder' | grep ' 504 ' | tail -n 3
./docker-logs/node-20.domain.tld/haproxy.log:2016-01-12T09:50:37.451451+00:00 info: 192.168.0.35:49739 [12/Jan/2016:09:49:37.453] cinder-api cinder-api/node-20 0/0/0/-1/60001 504 194 - - sH-- 1848/24/24/24/0 0/0 "POST /v1/fcf23cb570d640ffb4c93d4dcca9c138/volumes/20d52a22-efdd-4848-b82e-c9a347510adb/action HTTP/1.1"
./docker-logs/node-20.domain.tld/haproxy.log:2016-01-12T09:51:06.319330+00:00 info: 192.168.0.23:47543 [12/Jan/2016:09:50:06.300] cinder-api cinder-api/node-20 0/0/0/-1/60011 504 194 - - sH-- 1862/23/23/23/0 0/0 "POST /v1/fcf23cb570d640ffb4c93d4dcca9c138/volumes/4f453d0f-6602-476c-8330-1ef1caa91aa1/action HTTP/1.1"
./docker-logs/node-20.domain.tld/haproxy.log:2016-01-12T09:51:06.330370+00:00 info: 192.168.0.36:41157 [12/Jan/2016:09:50:06.302] cinder-api cinder-api/node-20 0/0/0/-1/60021 504 194 - - sH-- 1861/22/22/22/0 0/0 "POST /v1/fcf23cb570d640ffb4c93d4dcca9c138/volumes/f19c61b7-ebc8-40cc-af57-1840a59ae428/action HTTP/1.1"

Details here http://pastebin.com/MjWu9sMr
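
A small helper along these lines (not part of the report; it just automates the grep above) can pull the volume UUIDs out of the 504'd "POST .../volumes/<uuid>/action" lines, so they can be matched against the block devices of the failed instances:

    # Extract volume UUIDs from cinder-api requests that haproxy answered with 504.
    import re
    import sys

    UUID = r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
    PATTERN = re.compile(r' 504 .* "POST /v1/[0-9a-f]+/volumes/(%s)/action' % UUID)

    volumes = set()
    for path in sys.argv[1:]:          # e.g. docker-logs/*/haproxy.log
        with open(path) as log:
            for line in log:
                match = PATTERN.search(line)
                if match:
                    volumes.add(match.group(1))

    print('\n'.join(sorted(volumes)))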

Roman Podoliaka (rpodolyaka) wrote :

We deploy cinder-api in active/backup mode, which means only one cinder-api instance processes API requests at any moment in time.

According to what Alexander found in the logs, it's either:

1) there was a problem with node-20 (e.g. we ran out of memory and started swapping, etc.), which made things really slow and increased cinder-api response times to more than 60 s

2) cinder-api simply can't process so many requests in time (i.e. we reached the limit of a single cinder-api)

Unfortunately, the atop reports are missing, so we can't know for sure which one it was.

Roman Podoliaka (rpodolyaka) wrote :

Cinder team, could you please take a look at the logs and provide your feedback on my previous comment?

tags: added: area-cinder
removed: area-nova
Ivan Kolodyazhny (e0ne) wrote :

@Sergey, please provide atop reports and access to the env. We can't reproduce it or find the root cause without them.

Roman Podoliaka (rpodolyaka) wrote :

Re-assigning the bug to Sergey, as the ball is in his court now.

Roman Podoliaka (rpodolyaka) wrote :

Evgeny, please do not re-assign bugs without leaving a comment first.

Ivan made a valid request.

Evgeny Ivanov (eivanov) wrote :

*The problem has been reproduced, and the env was provided for further analysis. @rpodolyaka created a diagnostic snapshot.*

Roman Podoliaka (rpodolyaka) wrote :
Roman Podoliaka (rpodolyaka) wrote :

So I took a look at the environment and uploaded a diagnostic snapshot.

Errors in the nova-compute logs are caused by the fact that haproxy terminates the HTTP requests to Cinder (initialization of the volume connection) after a 60 s timeout:

http://paste.openstack.org/show/485411/
http://paste.openstack.org/show/485417/

I checked atop logs to figure out if we are CPU bound or not (cinder-api runs in active-backup mode):

http://paste.openstack.org/show/485413/

The CPU usage is rather high; still, it should be OK for a 12-core server.

The cinder-volume / cinder-api interaction is interesting:

http://paste.openstack.org/show/485428/

The HTTP POST to cinder-api is processed *synchronously*, as the API waits for the return of an RPC call to cinder-volume. The latter is for some reason slow at processing the RPC call - it looks like it gets stuck somewhere in Ceph calls - and by the time it's ready to send an RPC reply, rabbitmq has closed the connection (missed heartbeats?).

Eventually, cinder-volume retries and delivers a message back to cinder-api, but it's too late: haproxy has already closed the connection on its side.
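
To illustrate the pattern (a minimal oslo.messaging sketch, not Cinder's actual code; the topic, method name and timeout value are assumptions): the API worker blocks on call() for the whole duration of the HTTP request, so anything that makes cinder-volume slower than haproxy's 60 s server timeout turns into a 504 for nova-compute.

    # Minimal sketch of a synchronous RPC round-trip, oslo.messaging style.
    # Not Cinder's real code; names and values are illustrative only.
    from oslo_config import cfg
    import oslo_messaging as messaging

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='cinder-volume', version='1.0')
    client = messaging.RPCClient(transport, target, timeout=60)

    def initialize_connection(ctxt, volume_id, connector):
        # call() blocks the API worker until cinder-volume replies.  If the
        # reply (or its redelivery after missed rabbitmq heartbeats) arrives
        # later than haproxy's 60 s server timeout, haproxy has already
        # returned 504 to the caller.
        return client.call(ctxt, 'initialize_connection',
                           volume_id=volume_id, connector=connector)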

Roman Podoliaka (rpodolyaka) wrote :

MOS Cinder, please take a look at my comment and the diagnostic snapshot and tell us what you think.

tags: added: move-to-mu
tags: added: release-notes
Roman Podoliaka (rpodolyaka) wrote :

We'll need more time to find the root cause of this. The user impact is limited. We believe it's safe to move this to an MU and document it as a current performance/scale limitation in the release notes.

Ivan Kolodyazhny (e0ne) wrote :

The issue is reproducible with both the Ceph and LVM backends.

Ivan Kolodyazhny (e0ne) wrote :

The reason for the failures is:
We get a lot of concurrent requests between cinder-api and cinder-volume. Some of them take more than one minute to execute, and HAProxy terminates the connection. That's why Nova raises the "Block Device Mapping is Invalid" exception. I'm still investigating why cinder-volume is slow in such cases.

Sergey Arkhipov (sarkhipov) wrote :

The problem was reproduced on build #562 on scale env 14 at an even smaller scale (by an order of magnitude!).

Environment:
    * 3 controllers
        - Sahara, Murano activated
        - Ceph is used for everything
    * 2 LMA nodes
    * 5 computes

I've reproduced the steps above with different numbers of booting VMs:
  * During the boot of 15 instances (3 per compute), the following instance failed:
    - e960617e-041f-4348-8d3c-1a38b39cec11
  * During the boot of 20 instances (4 per compute), the following instances failed:
    - 04e0746b-5ba6-4622-bd8a-acf68f6c77e1
    - 528a9bc1-5374-494b-b405-3ac35d33e3a4

Please find diagnostic snapshot here:
https://drive.google.com/a/mirantis.com/file/d/0B9tzODpFABxkdk9JeDNEZ0l1Y0k/view?usp=sharing

tags: added: 8.0 release-notes-done
removed: release-notes
Ivan Kolodyazhny (e0ne) wrote :
Ivan Kolodyazhny (e0ne)
description: updated
Ivan Kolodyazhny (e0ne) wrote :

Actually, there are two different issues: one with the Ceph backend and one with the LVM backend.

1. Ceph:
Using the 'raw' format for images increases the number of volumes that can be built concurrently. This is now in our guidelines, so I hope it should not be an issue any more.

2. LVM:
We can't fix it in 9.0 and earlier. The fix requires a lot of code changes in Cinder and should not be done in the scope of this bug. I'll propose a spec for it for Newton. There is some feedback on my proposal in both the openstack-dev and operators mailing lists [1], [2].
This solution could also be helpful for the Ceph backend.

Unfortunately, the only workaround we can provide is a recommendation to use the 'Generic image cache' feature, which was introduced in Liberty/MOS 8.0.

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-May/095668.html
[2] http://lists.openstack.org/pipermail/openstack-operators/2016-May/010527.html
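
For reference, enabling that workaround is a cinder.conf change along these lines (a sketch using the Liberty image-volume-cache options; the backend section name and the internal tenant IDs are placeholders, not values from this environment):

    [DEFAULT]
    cinder_internal_tenant_project_id = <internal project uuid>
    cinder_internal_tenant_user_id = <internal user uuid>

    [lvm]                                  # the configured backend section
    image_volume_cache_enabled = True
    image_volume_cache_max_size_gb = 200   # 0 means unlimited
    image_volume_cache_max_count = 50      # 0 means unlimited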

Ivan Kolodyazhny (e0ne) wrote :

The root cause of this issue is the same as for #1550192
