Ephemeral storage removal fails with message rbd remove failed

Bug #1856845 reported by Sasha Andonov
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Sasha Andonov

Bug Description

Description
===========
After destroying instances, ephemeral storage removal intermittently fails with the following message:

2019-10-17 11:21:08.122 398018 INFO nova.virt.libvirt.driver [-] [instance: 87096add-348e-4c94-8f31-066346e32eef] Instance destroyed successfully.
2019-10-17 11:21:14.619 398018 WARNING nova.virt.libvirt.storage.rbd_utils [-] rbd remove 87096add-348e-4c94-8f31-066346e32eef_disk in pool rbd_pool failed

Ceph logs report a lossy connection error:
2019-10-17 11:21:06.181233 7fbbdf2f4700 0 -- 10.248.83.92:6808/20526 submit_message osd_op_reply(192922 rbd_data.77c63845d27cdd.0000000000004728 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1273856~262144] v1504399'62984460 uv62984460 ack = 0) v7 remote, 10.248.54.216:0/2391175308, failed lossy con, dropping message 0x56545f021e40

The dropped connection can leave a stale watcher registered on the image; Ceph only expires watchers after its 30-second watcher timeout, and "rbd remove" fails while a watcher is still present.

Steps to reproduce
==================
- Deploy Nova with Ceph RBD as the ephemeral storage backend (a minimal example config follows this list)
- Create an instance
- Destroy the instance
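
For reference, a minimal nova.conf [libvirt] section enabling the RBD ephemeral backend might look like the following sketch; the pool name rbd_pool is taken from the log in this report, while the user and secret UUID are illustrative placeholders:

    [libvirt]
    images_type = rbd
    images_rbd_pool = rbd_pool
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = nova
    rbd_secret_uuid = 00000000-0000-0000-0000-000000000000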

Expected result
===============
The instance is destroyed and its ephemeral storage is always removed from the Ceph pool.

Actual result
=============
The instance is destroyed, but its ephemeral storage sometimes remains in the Ceph pool.
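
Orphaned disks can be confirmed directly in the pool. The following is a minimal sketch using the python-rados and python-rbd bindings; the pool name rbd_pool is taken from the log above, and the _disk suffix matches Nova's naming for ephemeral disks:

    import rados
    import rbd

    # Connect with the cluster's own config; adjust conffile as needed.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd_pool')
        try:
            # Any <instance uuid>_disk image whose instance was already
            # destroyed is a leftover ephemeral volume.
            print([n for n in rbd.RBD().list(ioctx) if n.endswith('_disk')])
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()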

Tags: ceph
Revision history for this message
Matt Riedemann (mriedem) wrote :

Which releases of nova and ceph are being used?

Revision history for this message
Sasha Andonov (sandonov) wrote :

Reproduced on openstack-nova Newton and Ceph 10.2.11 (Jewel).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/705764

Changed in nova:
assignee: nobody → Sasha Andonov (sandonov)
status: New → In Progress

melanie witt (melwitt)
tags: added: ceph

melanie witt (melwitt)
Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: Sasha Andonov (sandonov) → melanie witt (melwitt)

melanie witt (melwitt)
Changed in nova:
assignee: melanie witt (melwitt) → Sasha Andonov (sandonov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/705764
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Submitter: Zuul
Branch: master

commit 6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Author: Sasha Andonov <email address hidden>
Date: Tue Feb 4 16:59:14 2020 +0100

    rbd_utils: increase _destroy_volume timeout

    If the RBD backend is used for Nova ephemeral storage, Nova tries to
    remove the ephemeral storage volume from Ceph in a retry loop: 10
    attempts at 1-second intervals, 10 seconds in total. Because Ceph only
    expires a stale watcher after its 30-second watcher timeout, this
    window is too short and can result in intermittent volume removal
    failures on the Ceph side.

    This patch adds the params rbd_destroy_volume_retries, defaulting to
    12, and rbd_destroy_volume_retry_interval, defaulting to 5, which
    multiplied give Ceph a reasonable amount of time (60 seconds) to
    complete the operation successfully.

    Closes-Bug: #1856845
    Change-Id: Icfd55617f0126f79d9610f8a2fc6b4c817d1a2bd
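
The merged change makes the retry window configurable. As a rough illustration of the resulting behavior, here is a minimal, self-contained sketch of such a retry loop; it only approximates nova's _destroy_volume logic (nova drives the loop through oslo looping-call machinery) and assumes the python-rbd bindings:

    import time

    import rbd  # python-rbd bindings

    def destroy_volume(ioctx, name, retries=12, interval=5):
        """Remove an RBD image, retrying while Ceph may still hold a watcher.

        retries * interval (12 * 5 = 60 s) comfortably exceeds the
        30-second Ceph watcher timeout, so a stale watcher left by a
        lossy connection expires before the attempts run out.
        """
        for attempt in range(retries + 1):
            try:
                rbd.RBD().remove(ioctx, name)
                return
            except rbd.ImageBusy:
                # ImageBusy usually means a watcher is still registered
                # on the image; wait for Ceph to expire it and retry.
                if attempt == retries:
                    raise
                time.sleep(interval)

In deployments with the fix, the equivalent knobs are the new [libvirt] options rbd_destroy_volume_retries and rbd_destroy_volume_retry_interval.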

Changed in nova:
status: In Progress → Fix Released