Creating a backup for a volume with the snapshot id parameter fails

Bug #1593538 reported by huanan
This bug affects 4 people
Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Tushar Trambak Gite

Bug Description

What I did:
    1. Created a 1 GB volume, then created a snapshot of that volume and backed the volume up to Swift with the snapshot id parameter (rpc_response_timeout not set in cinder.conf). The backend creates a volume from the snapshot, and the c-vol log shows the error below.
    2. With rpc_response_timeout=120 set in cinder.conf, the same operation as step 1 succeeds.
    3. With a bigger volume (10 GB) and rpc_response_timeout=120 in cinder.conf, the same operation as step 1 fails again.
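The workaround used in steps 2 and 3 is a plain cinder.conf setting; 120 is just the value from this report, not a recommended default:

```ini
[DEFAULT]
# Raise the RPC reply timeout (seconds) above the time the backend
# needs to create a volume from the snapshot. 120 is an example value.
rpc_response_timeout = 120
```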

c-vol log:

2016-06-17 18:58:53.206 DEBUG oslo_messaging._drivers.amqpdriver [req-512f6b36-f123-49f1-a34c-d22276d07ab6 515b0e063767468f89ec2cbe94d3291e c0b0b1dc91814333aa17f2e4e0cc9d34] CALL msg_id: 11b0c387ee85469ea5dc89c42c1139de exchange 'openstack' topic 'cinder-volume' from (pid=11723) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
2016-06-17 18:58:59.101 DEBUG oslo_db.sqlalchemy.engines [req-512f6b36-f123-49f1-a34c-d22276d07ab6 515b0e063767468f89ec2cbe94d3291e c0b0b1dc91814333aa17f2e4e0cc9d34] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION from (pid=11723) _check_effective_sql_mode /usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256
2016-06-17 18:59:26.917 DEBUG oslo_service.periodic_task [req-512f6b36-f123-49f1-a34c-d22276d07ab6 515b0e063767468f89ec2cbe94d3291e c0b0b1dc91814333aa17f2e4e0cc9d34] Running periodic task BackupManager._publish_service_capabilities from (pid=11723) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215
2016-06-17 19:00:26.918 DEBUG oslo_service.periodic_task [req-512f6b36-f123-49f1-a34c-d22276d07ab6 515b0e063767468f89ec2cbe94d3291e c0b0b1dc91814333aa17f2e4e0cc9d34] Running periodic task BackupManager._publish_service_capabilities from (pid=11723) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215
2016-06-17 19:00:53.328 ERROR oslo_messaging.rpc.server [req-512f6b36-f123-49f1-a34c-d22276d07ab6 515b0e063767468f89ec2cbe94d3291e c0b0b1dc91814333aa17f2e4e0cc9d34] Exception during message handling
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server Traceback (most recent call last):
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/backup/manager.py", line 386, in create_backup
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server self._update_backup_error(backup, context, six.text_type(err))
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 221, in __exit__
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server self.force_reraise()
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 197, in force_reraise
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/backup/manager.py", line 380, in create_backup
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server self._run_backup(context, backup, volume)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/backup/manager.py", line 411, in _run_backup
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server backup, volume)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/rpcapi.py", line 294, in get_backup_device
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server return cctxt.call(ctxt, 'get_backup_device', backup=backup)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server retry=self.retry)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 96, in _send
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server timeout=timeout, retry=retry)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server retry=retry)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 453, in _send
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server result = self._waiter.wait(msg_id, timeout)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in wait
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server message = self.waiters.get(msg_id, timeout=timeout)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 238, in get
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server 'to message ID %s' % msg_id)
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server MessagingTimeout: Timed out waiting for a reply to message ID 11b0c387ee85469ea5dc89c42c1139de
2016-06-17 19:00:53.328 TRACE oslo_messaging.rpc.server

Revision history for this message
huanan (huanan) wrote :

When the backup fails, the temporary volume is left behind. I think we should delete the temporary volume when the exception is raised.
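The cleanup suggested above can be sketched as a wrapper that deletes the temporary volume on any failure before re-raising. This is a minimal illustration with hypothetical callables, not Cinder's actual manager code:

```python
# Minimal sketch (hypothetical helper names): ensure the temporary volume
# created for a backup is removed when the backup operation raises.
def run_backup_with_cleanup(create_temp_volume, do_backup, delete_volume):
    temp = create_temp_volume()
    try:
        return do_backup(temp)
    except Exception:
        # Delete the temporary volume before re-raising, so a failed
        # backup does not leak storage on the backend.
        delete_volume(temp)
        raise
```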

Revision history for this message
huanan (huanan) wrote :

The get_backup_device RPC is a synchronous call, and creating a volume from a snapshot can take a long time when the volume is large.
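The synchronous wait in the trace above (the caller blocks on a reply queue until the timeout expires) can be illustrated with a toy analogue; this is not oslo.messaging's implementation, just the blocking-wait pattern it uses:

```python
import queue

# Toy analogue of a synchronous RPC reply wait: block on a reply queue
# and raise if no reply arrives within the timeout, like the
# MessagingTimeout in the trace above.
def wait_for_reply(replies, timeout):
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("Timed out waiting for a reply")
```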

Revision history for this message
Xing Yang (xing-yang) wrote :

What OpenStack release are you testing with? Which driver are you using? I wonder why creating a 1 GB volume will time out.

Changed in cinder:
assignee: nobody → Xing Yang (xing-yang)
Revision history for this message
huanan (huanan) wrote :

 The environment was deployed yesterday (using devstack), OpenStack version 2.6.0, with the Huawei driver. The Huawei driver creates a volume from a snapshot via LUN copy, so creating even a 1 GB volume can take more than a minute. But if I set rpc_response_timeout=120 in cinder.conf, the backup with a snapshot id succeeds.

Revision history for this message
wangxiyuan (wangxiyuan) wrote :

@Xing Yang, hi, how is this bug going?
The error is raised when the RPC call between cinder-backup and cinder-volume times out, so Cinder's behavior here is expected. But as huanan said, when the backup fails the temporary volume is left behind. We should clean up the temporary volume or snapshot in this situation as well.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote : Bug Assignee Expired

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: Xing Yang (xing-yang) → nobody
Revision history for this message
Damian Cikowski (dcdamien) wrote :

I'm experiencing the same behavior as @huanan, but with the NetApp driver (E-Series). When a backup is requested, the NetApp driver creates a snapshot and then copies it to a new volume. This can take a long time for a big volume, which causes the RPC timeout.

Revision history for this message
Damian Cikowski (dcdamien) wrote :

Further tests confirm that the backup always fails with an RPC timeout whenever rpc_response_timeout is shorter than the time needed to create a new volume from the snapshot on the storage array.

Changed in cinder:
assignee: nobody → Tushar Trambak Gite (tushargite96)