When deleting a lot of volumes at once, some of the volumes stay in status “Deleting”

Bug #1550192 reported by Olga Klochkova on 2016-02-26
This bug affects 7 people
Affects | Importance | Assigned to
Fuel for OpenStack | High | Oleksiy Molchanov
8.0.x | High | Unassigned
Mitaka | High | Oleksiy Molchanov
Newton | High | Oleksiy Molchanov
Ocata | High | Oleksiy Molchanov

Bug Description

Precondition:
The environment has 1000 volumes; Items Per Page is set to 500.
Steps:
1. Select all volumes on the page.
2. Click the Delete button.

Actual result:
After some time an error appears: “Gateway Timeout: The gateway did not receive a timely response from the upstream server or application.”

After some time, some of the volumes remain in status “Deleting”.
After the volume status is reset to “Available” on the Admin > System > Volumes page, the volumes can be deleted.

relevant cinder-volume logs:
2016-02-25 15:04:34.338 4925 ERROR cinder.service [req-506085d9-f233-46a9-854a-af25ef2f8407 - - - - -] Exception encountered:
2016-02-25 15:04:34.338 4925 ERROR cinder.service Traceback (most recent call last):
2016-02-25 15:04:34.338 4925 ERROR cinder.service File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 310, in report_state
2016-02-25 15:04:34.338 4925 ERROR cinder.service service_ref = objects.Service.get_by_id(ctxt, self.service_id)
2016-02-25 15:04:34.338 4925 ERROR cinder.service File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 172, in wrapper
2016-02-25 15:04:34.338 4925 ERROR cinder.service result = fn(cls, context, *args, **kwargs)
2016-02-25 15:04:34.338 4925 ERROR cinder.service File "/usr/lib/python2.7/dist-packages/cinder/objects/service.py", line 71, in get_by_id
2016-02-25 15:04:34.338 4925 ERROR cinder.service db_service = db.service_get(context, id)
2016-02-25 15:04:34.338 4925 ERROR cinder.service File "/usr/lib/python2.7/dist-packages/cinder/db/api.py", line 100, in service_get
2016-02-25 15:04:34.338 4925 ERROR cinder.service return IMPL.service_get(context, service_id)


2016-02-25 15:04:34.338 4925 ERROR cinder.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1053, in _do_get
2016-02-25 15:04:34.338 4925 ERROR cinder.service (self.size(), self.overflow(), self._timeout))
2016-02-25 15:04:34.338 4925 ERROR cinder.service TimeoutError: QueuePool limit of size 5 overflow 5 reached, connection timed out, timeout 30
2016-02-25 15:04:34.338 4925 ERROR cinder.service
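
The TimeoutError at the end of this traceback says the pool allows 5 persistent plus 5 overflow connections, and that a further checkout waited 30 seconds before giving up. A minimal sketch of that limit, using only the Python standard library: `TinyPool` is a hypothetical toy class written for illustration, not SQLAlchemy's QueuePool, and the short timeout is only there to keep the demo fast.

```python
import threading


class TinyPool:
    """Toy bounded pool (hypothetical); mimics only the size + overflow cap."""

    def __init__(self, size=5, overflow=5, timeout=30.0):
        self.size = size
        self.overflow = overflow
        self.timeout = timeout
        # At most size + overflow connections may be checked out at once.
        self._slots = threading.BoundedSemaphore(size + overflow)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "pool limit of size %d overflow %d reached, timeout %s"
                % (self.size, self.overflow, self.timeout)
            )
        return object()  # stand-in for a real DB connection

    def checkin(self):
        # Return a slot so that a waiting caller can proceed.
        self._slots.release()


if __name__ == "__main__":
    pool = TinyPool(size=5, overflow=5, timeout=0.05)
    held = [pool.checkout() for _ in range(10)]  # all 10 slots are now busy
    try:
        pool.checkout()  # the 11th caller times out
    except TimeoutError as exc:
        print(exc)
```

If connections are held for the whole duration of a slow operation, any other caller that needs the database, such as the periodic `report_state` seen in the traceback, ends up being the caller that times out.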

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "569"
  build_id: "569"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "33634ec27be77ecfb0b56b7e07497ad86d1fdcd3"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

Easy steps to reproduce:
1. Deploy env with Ceph and 1 controller
2. Create 10 volumes
3. Delete 10 volumes with one command: `cinder delete <vol1_id> <vol2_id>... <vol10_id>`
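
The single-command deletion in step 3 can be assembled programmatically. This is a sketch only: the helper name `build_delete_command` and the placeholder IDs are hypothetical, and it relies on nothing beyond what the step itself shows, namely that `cinder delete` takes several volume IDs in one call.

```python
import shlex


def build_delete_command(volume_ids):
    """Return the argv list for deleting all given volumes in one call."""
    if not volume_ids:
        raise ValueError("at least one volume ID is required")
    return ["cinder", "delete", *volume_ids]


if __name__ == "__main__":
    # Placeholder IDs; in a real environment these would be volume UUIDs.
    ids = ["vol%d_id" % n for n in range(1, 11)]
    print(shlex.join(build_delete_command(ids)))
```

The resulting list can be passed to `subprocess.run` on a node where the cinder client and credentials are available.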

Eugene Nikanorov (enikanorov) wrote :

Here we actually see two problems:
1. Horizon gets a 504 from haproxy on a long-running cinder operation. The user impact of that particular issue is minimal.
2. However, the logs from cinder-volume indicate that it is not properly configured to process such requests. Setting importance to High because of that.

Changed in mos:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → MOS Cinder (mos-cinder)
importance: Medium → High
Yuriy Nesenenko (ynesenenko) wrote :

This bug looks like a duplicate of https://bugs.launchpad.net/mos/+bug/1533197

Eugene Nikanorov (enikanorov) wrote :

Reading through 1533197, I'm not convinced that the exact same problem was hit there.
While slowness indeed led to a 504 in both cases, here we have another kind of issue: MySQL connection pool exhaustion.
This needs to be addressed separately.

Ivan Kolodyazhny (e0ne) wrote :

Eugene, the MySQL pool size is configurable too.

Eugene Nikanorov (enikanorov) wrote :

Sure, it is configurable. It's a matter of our defaults (how cinder is deployed): they should allow cinder to execute any kind of request.

It should either process the request correctly or return a meaningful error to the user.

tags: added: area-cinder
removed: cinder
Dina Belova (dbelova) on 2016-05-25
tags: removed: horizon
Ivan Kolodyazhny (e0ne) wrote :

More logs could be found here: https://bugs.launchpad.net/mos/+bug/1576573

Ivan Kolodyazhny (e0ne) wrote :

The root cause is blocking DB operations. I changed the DB driver to PyMySQL in my env and it resolved the issue. After the change was applied, I no longer see any errors or long-running tasks in the logs.
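
Why a blocking DB call can starve periodic work such as `report_state` can be shown with a small analogy. This sketch uses `asyncio` purely as a stand-in for a cooperative scheduler. It is not Cinder code, and the two "query" functions are hypothetical placeholders for a driver that blocks and a driver that yields while waiting.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.01):
    """Record a tick every `interval` seconds, like a periodic state report."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def blocking_query(duration=0.2):
    # Hypothetical driver call that never yields: the whole loop stalls.
    time.sleep(duration)


async def cooperative_query(duration=0.2):
    # Hypothetical driver call that yields while waiting on the network.
    await asyncio.sleep(duration)


async def _count_ticks(query):
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    await query()
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


def heartbeats_during(query):
    """Return how many heartbeats ran while `query` was in flight."""
    return asyncio.run(_count_ticks(query))


if __name__ == "__main__":
    print("blocking:", heartbeats_during(blocking_query))
    print("cooperative:", heartbeats_during(cooperative_query))
```

With the blocking variant no heartbeat runs until the query returns, while the cooperative variant lets heartbeats continue throughout.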

Ivan Kolodyazhny (e0ne) wrote :

I recommend changing the DB driver for Cinder to PyMySQL for the MOS 9.0 and 10.0 releases, and adding a release note for 8.0 in case customers run into this issue.

description: updated
Ivan Berezovskiy (iberezovskiy) wrote :

MOS packaging team, please confirm that the python-pymysql package is included in the ISO and that it is a dependency of the OpenStack components (cinder in particular). That will allow us to switch the DB backend driver.

Thanks Igor!

Ivan Kolodyazhny (e0ne) wrote :

We decided not to switch to the new DB driver before the MOS 10 release. The fix should be documented as a workaround in the release notes.

tags: added: release-notes
Ivan Kolodyazhny (e0ne) on 2016-06-01
tags: added: area-build
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 8.0 per comment #8. Targeted to 8.0-mu-2 to document the workaround in the release notes.

Ivan Kolodyazhny (e0ne) wrote :

To apply the workaround, change the database connection string in /etc/cinder.conf from

connection = mysql://user:password@db_host/cinder?charset=utf8&read_timeout=60

to

connection = mysql+pymysql://user:password@db_host/cinder?charset=utf8
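
The rewrite of the connection string can also be done mechanically. A minimal sketch, assuming the value has the `mysql://...` form shown above: the helper name `to_pymysql_url` is hypothetical, and dropping `read_timeout` simply mirrors the before/after strings in this comment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_pymysql_url(url, drop_params=("read_timeout",)):
    """Switch a mysql:// URL to mysql+pymysql:// and drop the given params."""
    parts = urlsplit(url)
    if parts.scheme != "mysql":
        return url  # already using an explicit driver, or not MySQL at all
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit(
        ("mysql+pymysql", parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    old = "mysql://user:password@db_host/cinder?charset=utf8&read_timeout=60"
    print("connection = " + to_pymysql_url(old))
```

A URL that already names a driver is returned unchanged, so applying the helper twice is harmless.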

Fix proposed to branch: master
Change author: Alexander Adamov <email address hidden>
Review: https://review.fuel-infra.org/21638

Reviewed: https://review.fuel-infra.org/21638
Submitter: Olga Gusarenko <email address hidden>
Branch: master

Commit: 3bd273257264b0dd2df5787e6abecde4b8431c29
Author: Alexander Adamov <email address hidden>
Date: Mon Jun 6 07:45:17 2016

[RN MOS9.0] Volumes stay in status 'Deleting'

Adds a workaround for the cinder known issue:
"When deleting a lot of volumes at once, some of
 the volumes stay in status `Deleting`"

Change-Id: Ibd9996b0a0c14bbebafa88f6d63a9b2c7276786b
Partial-Bug: #1550192

Alexey, please switch all components to use the PyMySQL driver.

tags: added: release-notes-done
removed: release-notes
Roman Vyalov (r0mikiam) on 2016-06-14
tags: removed: area-build
tags: added: 10.0-reviewed

It was decided to switch all components to the pymysql driver in the next Fuel version, so moving the bug to 9.2.

Ivan Kolodyazhny (e0ne) wrote :

Ivan, who is responsible for this task?

Dmitry Pyzhov (dpyzhov) on 2016-11-24
Changed in mos:
status: Confirmed → Won't Fix
tags: added: release-notes
Dmitry Pyzhov (dpyzhov) on 2016-12-22
Changed in mos:
status: Won't Fix → Confirmed
milestone: 9.2 → 10.0
no longer affects: mos/10.0.x
no longer affects: mos
no longer affects: mos/8.0.x
no longer affects: mos/9.x
Changed in fuel:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Oleksiy Molchanov (omolchanov)
Changed in fuel:
milestone: none → 11.0

Related fix proposed to branch: master
Change author: Olena Logvinova <email address hidden>
Review: https://review.fuel-infra.org/29777

Reviewed: https://review.fuel-infra.org/29777
Submitter: Mariia Zlatkova <email address hidden>
Branch: master

Commit: 47644ef56b51a1e030363ac9cf0f01c6e757b78c
Author: Olena Logvinova <email address hidden>
Date: Wed Jan 11 15:34:58 2017

[RN9.2] Known issues - Cinder volumes in Deleting status

Change-Id: I25bcc44242306c67d639392256d7804d646bcf3f
Related-Bug: #1550192

Reviewed: https://review.openstack.org/393301
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b691c1144ff78cc5f6d7674a42b0a8ac27077a19
Submitter: Jenkins
Branch: stable/mitaka

commit b691c1144ff78cc5f6d7674a42b0a8ac27077a19
Author: Oleksiy Molchanov <email address hidden>
Date: Mon Jan 16 20:57:41 2017 +0200

    Make Cinder use PyMySQL as default DB driver

    Change-Id: Id012cb764de3b109d95e7c4a29d1d9c94e337117
    Closes-Bug: #1550192

tags: added: on-verification
Ivan Kolodyazhny (e0ne) on 2017-01-20
tags: added: customer-found
Ekaterina Shutova (eshutova) wrote :

Verified on 9.2 snapshot #798.

tags: removed: on-verification
Maria Zlatkova (mzlatkova) wrote :

A release note has been added to 9.2 resolved issues: https://review.fuel-infra.org/#/c/30162.

tags: removed: release-notes

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.

tags: added: on-verification
Ekaterina Shutova (eshutova) wrote :

Verified on 10.1 #1578.
Volumes were deleted successfully.

tags: removed: on-verification