Huawei OceanStor 5800 V3: 'volume_admin_metadata' lazy load error

Bug #1626944 reported by Oleksandr Liemieshko
This bug affects 3 people
Affects              Status     Importance  Assigned to       Milestone
Mirantis OpenStack   Confirmed  Medium      Ivan Kolodyazhny
9.x                  Confirmed  Medium      Ivan Kolodyazhny

Bug Description

It looks like the Huawei driver tries to access a volume's admin_metadata, but since the request context is not admin, Cinder has not pre-loaded that attribute; the lazy load then fails whenever the session is busy or has expired.

2016-09-07 14:18:59.668 22218 ERROR cinder.volume.manager [req-14d00815-2c96-48a9-a119-d5e2da8646b2 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec10> is not bound to a Session; lazy load operation of attribute 'volume_admin_metadata' cannot proceed
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher [req-14d00815-2c96-48a9-a119-d5e2da8646b2 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Exception during message handling: Bad or unexpected response from the storage volume backend API: Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec10> is not bound to a Session; lazy load operation of attribute 'volume_admin_metadata' cannot proceed
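
For reference, this is SQLAlchemy's DetachedInstanceError: the Volume object has outlived the session it was loaded from, and the 'volume_admin_metadata' relationship was never eagerly loaded, so the first access triggers a lazy load with no session to run against. A minimal standalone sketch that reproduces the same message (hypothetical toy models, not Cinder's actual ones):

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship, sessionmaker

    Base = declarative_base()

    class Volume(Base):
        __tablename__ = 'volumes'
        id = Column(Integer, primary_key=True)
        # Lazy by default: loaded on first access, which needs a live session.
        volume_admin_metadata = relationship('VolumeAdminMetadata')

    class VolumeAdminMetadata(Base):
        __tablename__ = 'volume_admin_metadata'
        id = Column(Integer, primary_key=True)
        volume_id = Column(Integer, ForeignKey('volumes.id'))
        key = Column(String(255))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    session = Session()
    session.add(Volume(id=1))
    session.commit()
    volume = session.query(Volume).get(1)
    session.close()  # 'volume' is now detached from any session

    # Raises sqlalchemy.orm.exc.DetachedInstanceError:
    #   Parent instance <Volume at 0x...> is not bound to a Session;
    #   lazy load operation of attribute 'volume_admin_metadata' cannot proceed
    print(volume.volume_admin_metadata)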

tags: added: support
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Incomplete - please provide MOS version and steps to reproduce

tags: added: area-cinder
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Alexander, please add the full traceback and steps to reproduce. I don't have an env, but I could try to verify this bug with upstream fixes.

Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :

Unfortunately we couldn't reproduce the issue in our lab. We saw this on the customer's env (Fuel 9.0).

It happens randomly when the array is handling multiple requests to create and attach volumes at the same time.

2016-09-07 14:18:59.661 22217 INFO cinder.volume.manager [req-5016e25a-4c33-4670-8844-65fc1a6f19dc 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Terminate volume connection completed successfully.
2016-09-07 14:18:59.668 22218 ERROR cinder.volume.manager [req-14d00815-2c96-48a9-a119-d5e2da8646b2 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec10> is not bound to a Session; lazy load operation of attribute 'volume_admin_metadata' cannot proceed
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher [req-14d00815-2c96-48a9-a119-d5e2da8646b2 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Exception during message handling: Bad or unexpected response from the storage volume backend API: Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec10> is not bound to a Session; lazy load operation of attribute 'volume_admin_metadata' cannot proceed
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher incoming.message))
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 1541, in terminate_connection
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher raise exception.VolumeBackendAPIException(data=err_msg)
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec10> is not bound to a Session; lazy load operation of attribute 'volume_admin_metadata' cannot proceed
2016-09-07 14:18:59.669 22218 ERROR oslo_messaging.rpc.dispatcher
2016-09-07 14:18:59.673 22218 ERROR oslo_messaging._drivers.common [req-14d00815-2c96-48a9-a119-d5e2da8646b2 1f0a08659d2a4ce1a3523c2637f572be 3b93a86a24e84cd4afae7d336d42fc31 - - -] Returning exception Bad or unexpected response from the storage volume backend API: Terminate volume connection failed: Parent instance <Volume at 0x7fad3b59ec1...


Revision history for this message
Justinas Balciunas (justinas-balciunas) wrote :

I would add that this issue is probably specific to the Huawei Cinder driver, namely the way the driver interacts with SQLAlchemy.

The Huawei driver uses the _volume_get_query method in cinder/db/sqlalchemy/api.py.

Since most use cases involve non-admin users, the problem appears immediately; it would not if only the admin user managed (created/attached/detached/deleted) the volumes. Concurrency is also key to the issue: if the operations run one at a time, no issues typically happen, but under concurrency they always do. This type of workload can be simulated with Rally. See the sketch below for the admin/non-admin loading difference.
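
To make that asymmetry concrete: when the relationship is populated up front with joinedload while the session is still open (which is, as far as I can tell, what the admin-context query path in _volume_get_query does for volume_admin_metadata), the attribute stays readable even on a detached instance. A sketch reusing the toy models from the earlier snippet; illustrative only, not Cinder's actual query code:

    from sqlalchemy.orm import joinedload

    session = Session()
    volume = (session.query(Volume)
              .options(joinedload(Volume.volume_admin_metadata))
              .get(1))
    session.close()

    # No lazy load is triggered: the collection was populated while the
    # session was still open, so this works on a detached instance.
    print(volume.volume_admin_metadata)  # -> []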

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Marking as confirmed. It's faced by a customer.

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Alexander, is this issue still valid? If yes, please provide an env for troubleshooting.

Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :

Yes, it's still valid. But unfortunately we don't have a lab for troubleshooting the issue. I have asked the customer about a WebEx session to perform a live analysis of the issue.

Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

I am decreasing importance to Medium as this is only applicable to customers with the Huawei Cinder driver.

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Alexander, will we have a WebEx session? We can't resolve the issue without it.

Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :

The ticket was closed by the customer, so we won't be able to create a WebEx session for this particular issue. I think we can close the bug.

Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Oleksandr Liemieshko (oliemieshko) wrote :

We have another ticket from the customer related to this issue. We need to fix it.

Changed in mos:
assignee: Alexander Lemeshko (oliemieshko) → MOS Cinder (mos-cinder)
Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Alexander, please schedule a WebEx with the customer.

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Short update after the WebEx:
- the issue is sometimes reproduced using Rally
- we don't have steps to reproduce it manually for debugging
- I'll provide the customer with a patch that adds detailed logging
- the customer will apply the patch and send us the logs after a failure so we can investigate the issue

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

- the same issue affects initialize_connection calls too

Revision history for this message
Ivan Kolodyazhny (e0ne) wrote :

Alexander, please ask the customer to apply the attached patches, re-run Rally until the issue is reproduced, and send us the logs.

The patches should be applied to all cinder-volume nodes, and the cinder-volume services should be restarted.
