Rapid attach/detach of consecutive volumes eats up device names

Bug #1376942 reported by Sean Severson
This bug affects 2 people
Affects                    Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)   Invalid  Medium      Boden R

Bug Description

When running long tests involving the continuous attach and detach of hundreds of consecutive volumes, Nova will blindly assign them incrementing device paths and never reuse paths that were freed up. This eventually leads to strings of errors in n-api such as the following:

2014-10-02 14:04:12.298 DEBUG nova.api.openstack.wsgi [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] Action: 'create', calling method: <bound method VolumeAttachmentController.create of <nova.api.openstack.compute.contrib.volumes.VolumeAttachmentController object at 0x7fe62b01a150>>, body: {"volumeAttachment": {"device": "/dev/vd~w", "volumeId": "ccbc09e9-50a4-46b9-a413-66db51659abe"}} from (pid=48499) _process_stack /opt/stack/nova/nova/api/openstack/wsgi.py:908
2014-10-02 14:04:12.299 AUDIT nova.api.openstack.compute.contrib.volumes [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] Attach volume ccbc09e9-50a4-46b9-a413-66db51659abe to instance dfbfe4a2-19fb-4f01-be1a-9e437aec67df at /dev/vd~w
2014-10-02 14:04:12.358 DEBUG nova.api.openstack.wsgi [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] Returning 400 to user: The supplied device path (/dev/vd~w) is invalid. from (pid=48499) __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:1175
2014-10-02 14:04:12.359 INFO nova.osapi_compute.wsgi.server [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] 10.50.135.12 "POST /v2/4cb643dec0bd40c89b984dacfd288448/servers/dfbfe4a2-19fb-4f01-be1a-9e437aec67df/os-volume_attachments HTTP/1.1" status: 400 len: 277 time: 0.0690210

On the instance, the device paths mentioned in n-api aren't even being used. Typically the instance will reuse /dev/vdb, assuming only one volume at a time is being attached. Nova should be reporting the device path that's actually being used, and should definitely not be using special characters in the path.

Note that this is not a case of rapidly reattaching the same volume. To properly reproduce this situation, it is necessary to attach and detach new volumes.
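The report assumes that a path freed by a detach is handed out again on the next attach (for example, /dev/vdb being reused when one volume at a time is attached). A minimal sketch of that "lowest free name" behavior follows. `candidate_devices` and `next_free_device` are hypothetical helpers written for illustration, not Nova's allocator, and the sketch assumes /dev/vda is the root disk.

```python
from itertools import count, product
from string import ascii_lowercase
from typing import Iterable, Iterator


def candidate_devices(prefix: str = "/dev/vd") -> Iterator[str]:
    """HYPOTHETICAL: yield prefix+a..z, then prefix+aa..zz, then aaa.., forever.

    Suffixes are built only from ascii_lowercase, so no name ever contains
    characters outside a-z.
    """
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield prefix + "".join(letters)


def next_free_device(in_use: Iterable[str], prefix: str = "/dev/vd") -> str:
    """HYPOTHETICAL: return the first candidate path that is not attached.

    The search restarts from the beginning on every call, so a path freed
    by a detach is the first one offered on the next attach.
    """
    used = set(in_use)
    return next(d for d in candidate_devices(prefix) if d not in used)


# One volume at a time: attach, detach, attach again.
attached = {"/dev/vda"}              # assumed root disk
dev = next_free_device(attached)     # /dev/vdb
attached.add(dev)
attached.discard(dev)                # detach frees /dev/vdb
print(next_free_device(attached))    # /dev/vdb is reused
```

Restarting the scan each time costs a little per attach, but it keeps the name space compact no matter how many attach/detach cycles a long test runs.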

Sean Severson (sseverson) wrote :

Additional note: The attaches work fine from /dev/vda to /dev/vdzz, but after that point extended characters are used and that's when the errors begin.
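The thread does not show the test's name-generation code, so the following is only a guess at one plausible way to get from /dev/vdzz to something like /dev/vd~w: chr() arithmetic where the low letter wraps at 26 but the high letter is never bounded. `naive_device_name` is hypothetical and is not claimed to match the actual test.

```python
def naive_device_name(i: int, prefix: str = "/dev/vd") -> str:
    """HYPOTHETICAL buggy test helper: encode i as two 'letters' via chr().

    The low letter wraps at 26, but the high letter is unbounded, so once
    i // 26 exceeds 25 it walks past 'z' into '{', '|', '}' and '~'
    (ASCII 123-126), producing paths a validator will reject.
    """
    high, low = divmod(i, 26)
    return prefix + chr(ord("a") + high) + chr(ord("a") + low)


print(naive_device_name(0))    # /dev/vdaa
print(naive_device_name(675))  # /dev/vdzz  (last well-formed name)
print(naive_device_name(676))  # /dev/vd{a  (first name past 'z')
print(naive_device_name(776))  # /dev/vd~w  (the path seen in the n-api log)
```

A scheme that only ever draws from a-z and grows the suffix length (a..z, aa..zz, aaa..) avoids this overflow.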

Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Davanum Srinivas (DIMS) (dims-v) wrote :

@sseverson,
Is the validator wrong? https://github.com/openstack/nova/blob/master/nova/block_device.py#L534

or the code that is generating the device names? (can't yet figure out where that is happening, any pointers are welcome)

Boden R (boden) wrote :

@dims-v -- I'll take a look at this one... If you're already neck deep in it please let me know so we don't duplicate efforts. Thanks

Changed in nova:
assignee: nobody → Boden R (boden)
status: Confirmed → In Progress
Boden R (boden) wrote :

@sseverson,
I think we need a little more information on this issue in order to make progress.

Based on what you've provided in the description:

(a) As per your log snip, we can see your client (e.g. python novaclient or similar) is passing in the following request body for the attach call:

{"volumeAttachment": {"device": "/dev/vd~w", "volumeId": "ccbc09e9-50a4-46b9-a413-66db51659abe"}}

(b) This is confirmed in the AUDIT message:

2014-10-02 14:04:12.299 AUDIT nova.api.openstack.compute.contrib.volumes [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] Attach volume ccbc09e9-50a4-46b9-a413-66db51659abe to instance dfbfe4a2-19fb-4f01-be1a-9e437aec67df at /dev/vd~w

The AUDIT message is logged here: https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/volumes.py#L407

(c) Once the volume attach call gets down to nova.compute.api.API.attach_volume(), the first thing it does is validate the device path against a regex of valid device path names. If the device path is invalid, attach_volume() raises InvalidDevicePath immediately (fail fast). See here: https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2920

(d) Based on your error message in the log snip:

2014-10-02 14:04:12.358 DEBUG nova.api.openstack.wsgi [req-16e36730-3f4c-4f75-8107-10d4ecfba293 admin admin] Returning 400 to user: The supplied device path (/dev/vd~w) is invalid. from (pid=48499) __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:1175

We are in fact hitting the device path validation error from (c) above.

That said, what I can deduce from the info thus far is that your client is passing the /dev/vd~w path on the nova API attach call. The fact that the nova API fails with this path is proper behavior -- it's rejecting the invalid segment '~w'.
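Step (c) describes a fail-fast check of the device path against a regex before any attach work is done. A minimal sketch of that idea follows. The pattern is an illustrative assumption (a bus prefix such as hd, sd, vd or xvd followed by lowercase letters), not the exact pattern in nova/block_device.py, which may accept more forms. `InvalidDevicePath` here is a local stand-in that mirrors the message in the log, not an import from Nova.

```python
import re
from typing import Optional, Tuple

# ASSUMED pattern for illustration: /dev/ + bus prefix + lowercase letters.
DEVICE_PATH_RE = re.compile(r"/dev/(x?[hsv]d)([a-z]+)")


class InvalidDevicePath(ValueError):
    """Local stand-in for Nova's InvalidDevicePath exception."""


def validate_device_path(device: Optional[str]) -> Optional[Tuple[str, str]]:
    """Fail fast on a malformed path; None means 'let the server choose'.

    Returns (prefix, suffix), e.g. ('vd', 'zz') for '/dev/vdzz'.
    """
    if device is None:
        return None
    match = DEVICE_PATH_RE.fullmatch(device)  # whole string must match
    if match is None:
        raise InvalidDevicePath(
            "The supplied device path (%s) is invalid." % device)
    prefix, suffix = match.groups()
    return prefix, suffix


print(validate_device_path("/dev/vdzz"))   # ('vd', 'zz')
try:
    validate_device_path("/dev/vd~w")
except InvalidDevicePath as exc:
    print(exc)                             # '~' is outside [a-z]
```

Because '~' is not in `[a-z]`, /dev/vd~w can never match, so the request is rejected before any attach work starts, which is consistent with the 400 in the log.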

If you could:
(a) Please provide additional log snips which indicate invalid behavior.
(b) Verify your client is not sending invalid device paths.
(c) Further describe the problem here if not evident by the additional log snips you provide.

Moreover, I'm reducing the importance of this one, given it's fairly unlikely that many consumers will be attaching and detaching hundreds of different volumes rapidly.

Thank you

Changed in nova:
importance: High → Medium
status: In Progress → Incomplete
Sean Severson (sseverson) wrote :

Upon further review, it appears that the automated test doing the attaches was sending the invalid device names. I am already testing changes to the test in question. I have marked this bug invalid.

Changed in nova:
status: Incomplete → Invalid