Comment 3 for bug 1763712

Matt Riedemann (mriedem) wrote:

Looking at a recent failure:

http://logs.openstack.org/81/624181/1/check/nova-multiattach/c1589ae/logs/screen-c-api.txt#_Dec_10_22_03_25_888783

The cinder-api request starts here:

Dec 10 22:02:25.787298 ubuntu-xenial-inap-mtl01-0001107482 <email address hidden>[30020]: INFO cinder.api.openstack.wsgi [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 tempest-ServerActionsTestJSON-1162184781] PUT http://198.72.124.211/volume/v3/54189e280b844b928184fb49836be237/attachments/e8c408fd-969b-4dac-93b6-8031f6634fc8
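
For context, this PUT is the attachments update call (Cinder API microversion >= 3.27), which passes the host connector down and kicks off the export/activation work in cinder-volume. Roughly what tempest is sending here, sketched with placeholder token and connector values:

import requests

# All values below are placeholders except the URL from the log; the real
# connector dict comes from os-brick on the compute host.
url = ('http://198.72.124.211/volume/v3/54189e280b844b928184fb49836be237'
       '/attachments/e8c408fd-969b-4dac-93b6-8031f6634fc8')
body = {'attachment': {'connector': {
    'initiator': 'iqn.1993-08.org.debian:01:placeholder',
    'host': 'ubuntu-xenial-inap-mtl01-0001107482',
    'multipath': False,
}}}
resp = requests.put(url, json=body, headers={
    'X-Auth-Token': '<token>',               # placeholder
    'OpenStack-API-Version': 'volume 3.27',  # attachments API needs >= 3.27
})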

The same request then times out here:

Dec 10 22:03:25.888783 ubuntu-xenial-inap-mtl01-0001107482 <email address hidden>[30020]: ERROR cinder.api.v3.attachments [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 tempest-ServerActionsTestJSON-1162184781] Unable to update the attachment.: MessagingTimeout: Timed out waiting for a reply to message ID 60d2d0479c7749adb7f116a8d6de570c
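
The MessagingTimeout means cinder-api made a synchronous oslo.messaging call() to cinder-volume and gave up waiting for the reply. A minimal sketch of that pattern, not cinder's actual code (transport setup, topic and method arguments are all simplified):

import oslo_messaging as messaging
from oslo_config import cfg

# call() blocks until a reply arrives or the timeout expires, at which point
# it raises oslo_messaging.MessagingTimeout like the traceback above.
transport = messaging.get_transport(cfg.CONF)
target = messaging.Target(topic='cinder-volume', version='3.0')
client = messaging.RPCClient(transport, target, timeout=60)  # rpc_response_timeout
ctxt = {}  # a real RequestContext in cinder
reply = client.call(ctxt, 'attachment_update',
                    attachment_id='e8c408fd-969b-4dac-93b6-8031f6634fc8')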

The request ID is req-fec531fa-7200-4e15-a820-18ea7e5e6214. Tracing that down into the cinder-volume logs, this is the first thing logged for it:

Dec 10 22:02:25.906808 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 None] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07 {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:372}}
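
That execute() call blocks the RPC worker until lvchange exits; nothing about it is asynchronous. A standalone sketch of the same invocation (flags taken from the log line; rootwrap paths assume a devstack layout):

from oslo_concurrency import processutils

# Runs the command through rootwrap and blocks until it exits; returns
# (stdout, stderr), or raises ProcessExecutionError on a nonzero exit code.
out, err = processutils.execute(
    'lvchange', '-a', 'y', '--yes', '-K',
    'stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07',
    root_helper='sudo cinder-rootwrap /etc/cinder/rootwrap.conf',
    run_as_root=True)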

The next entry is:

Dec 10 22:03:24.340740 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07" returned: 0 in 58.435s {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

That return is about 1 second before the API-side timeout. So the "lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07" call alone took 58.435s, and rpc_response_timeout defaults to 60 seconds, so the attachment update RPC had almost no headroom left.
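
The arithmetic from the two cinder-api log lines above bears that out (timestamps copied verbatim from the logs):

from datetime import datetime

# Time from the PUT arriving to the MessagingTimeout being logged.
fmt = '%b %d %H:%M:%S.%f'
start = datetime.strptime('Dec 10 22:02:25.787298', fmt)
fail = datetime.strptime('Dec 10 22:03:25.888783', fmt)
print((fail - start).total_seconds())  # ~60.1s, i.e. right at the 60s rpc_response_timeout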

Looking at syslog, I noticed that a few seconds before that lvchange call started, another lvchange was already running against a different volume, and it took 61+ seconds:

Dec 10 22:03:24.317836 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-9e16b85d-5662-4c70-aa13-88cd9c6a03fc req-ba539360-fea6-4538-88dd-afa948120e1c tempest-ServerRescueNegativeTestJSON-1204517625 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-c48c90f2-bf42-473b-b9a4-48f7e6f97f24" returned: 0 in 61.526s {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

Note that both commands returned within about 25ms of each other (22:03:24.317836 vs 22:03:24.340740), which suggests they were blocked on the same thing. But looking at the activate_lv code in cinder, it doesn't appear to do any locking on the cinder side:

http://git.openstack.org/cgit/openstack/cinder/tree/cinder/brick/local_dev/lvm.py#n657
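
For reference, if cinder did serialize these activations itself, the usual pattern in the tree is an oslo.concurrency lock; purely as a hypothetical illustration (the lock name and wrapper function are made up, and this is not what activate_lv does today):

from oslo_concurrency import lockutils

# Hypothetical only: an external (file-backed) lock like this would
# serialize concurrent activations across cinder-volume workers.
@lockutils.synchronized('lvm-activate', external=True)
def activate_lv_serialized(lvm, name):
    lvm.activate_lv(name)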

So I'm not sure why that lvchange activate call would be taking nearly a minute.