backup service crashes in ceph job with "pure virtual method called"

Bug #1551305 reported by Matt Riedemann on 2016-02-29
This bug affects 2 people
Affects          Importance  Assigned to
Cinder           Medium      Unassigned
ceph (Ubuntu)    Undecided   Unassigned
Trusty           Critical    Chris Holcombe

Bug Description

The service crashes here:

http://logs.openstack.org/49/281149/5/gate/gate-tempest-dsvm-full-ceph/82e9e00/logs/screen-c-bak.txt.gz#_2016-02-29_12_13_20_501

2016-02-29 12:13:20.501 DEBUG cinder.backup.drivers.ceph [req-01f7866f-34a2-4c5b-8b23-3357236e0013 tempest-VolumesBackupsV1Test-2076678965] Image 'volume-c2490215-a828-4af5-b510-681c0a52c76b.backup.e5db41b1-35a3-4557-bff9-7292054cd1ad' not found - trying diff format name _rbd_image_exists /opt/stack/new/cinder/cinder/backup/drivers/ceph.py:561
pure virtual method called
terminate called without an active exception
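
A minimal sketch of how one might scan a saved service log for this crash signature. The function name `find_cxx_abort` and the abbreviated sample lines are hypothetical; only the two marker strings are taken from the log excerpt above.

```python
# The two messages printed just before the process died.
PURE_VIRTUAL = "pure virtual method called"
TERMINATE = "terminate called without an active exception"


def find_cxx_abort(lines):
    """Return the 0-based index of the first 'pure virtual method called'
    line that is immediately followed by the terminate message, else None."""
    stripped = [line.strip() for line in lines]
    for i in range(len(stripped) - 1):
        if stripped[i] == PURE_VIRTUAL and stripped[i + 1] == TERMINATE:
            return i
    return None


# Hypothetical, shortened sample; a real run would read screen-c-bak.txt.
sample = [
    "2016-02-29 12:13:20.501 DEBUG cinder.backup.drivers.ceph (abbreviated)",
    "pure virtual method called",
    "terminate called without an active exception",
]
print(find_cxx_abort(sample))
```

Matching both lines together avoids false positives from a log message that merely quotes one of the strings.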

On the API side, anything requesting backups fails:

http://logs.openstack.org/49/281149/5/gate/gate-tempest-dsvm-full-ceph/82e9e00/logs/screen-c-api.txt.gz#_2016-02-29_12_16_36_116

2016-02-29 12:16:36.116 INFO cinder.api.openstack.wsgi [req-f817ad3e-6100-4eec-80d1-685edb1d1542 None] HTTP exception thrown: Service cinder-backup could not be found.

So far there is only one hit in the gate for this:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22HTTP%20exception%20thrown%3A%20Service%20cinder-backup%20could%20not%20be%20found.%5C%22%20AND%20tags%3A%5C%22screen-c-api.txt%5C%22&from=7d
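
For reference, the query link above can be rebuilt from the plain query text. This is only a sketch: the helper name `logstash_url` is hypothetical, and it assumes the dashboard accepts the same `query` and `from` parameters that appear in the URL above.

```python
from urllib.parse import quote

LOGSTASH_BASE = "http://logstash.openstack.org/#dashboard/file/logstash.json"


def logstash_url(query, window="7d"):
    """Build a dashboard link; safe='' percent-encodes ':', '\\', '"' and spaces."""
    return f"{LOGSTASH_BASE}?query={quote(query, safe='')}&from={window}"


# Raw strings keep the backslash-escaped quotes exactly as the dashboard expects.
query = (
    r'message:\"HTTP exception thrown: Service cinder-backup could not be found.\"'
    r' AND tags:\"screen-c-api.txt\"'
)
print(logstash_url(query))
```

Changing `window` (for example to `"24h"`, an assumed value) would narrow the search period without touching the query itself.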

Matt Riedemann (mriedem) wrote :

This is the version of ceph used in the failed test:

ii librados2 0.80.11-0ubuntu1.14.04.1 amd64 RADOS distributed object store client library

Here is the info on that release:

https://launchpad.net/ubuntu/+source/ceph/0.80.11-0ubuntu1.14.04.1

Released on 2016-01-18.
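
A small sketch for pulling the package name and upstream version out of a `dpkg -l` row like the one quoted above, which can help when comparing many CI logs. The helper names are hypothetical, and the version handling assumes the usual Debian `[epoch:]upstream[-revision]` layout.

```python
def parse_dpkg_line(line):
    """Split one `dpkg -l` row into its five whitespace-separated columns."""
    status, name, version, arch, description = line.split(None, 4)
    return {
        "status": status,
        "name": name,
        "version": version,
        "arch": arch,
        "description": description,
    }


def upstream_version(debian_version):
    """Drop an optional 'epoch:' prefix and the trailing '-revision' part."""
    without_epoch = debian_version.split(":", 1)[-1]
    upstream, sep, _revision = without_epoch.rpartition("-")
    return upstream if sep else without_epoch


row = ("ii librados2 0.80.11-0ubuntu1.14.04.1 amd64 "
       "RADOS distributed object store client library")
pkg = parse_dpkg_line(row)
print(pkg["name"], upstream_version(pkg["version"]))
```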

Duncan Thomas (duncan-thomas) wrote :

"pure virtual method called" at the end of the backup log seems to come from C++, not from Python, which suggests the issue is a change somewhere in the ceph libraries rather than in cinder.

Matt Riedemann (mriedem) wrote :

Looks like this was the fix in ceph:

https://github.com/ceph/ceph/pull/6525

But that fix wasn't included in the Ubuntu release:

https://bugs.launchpad.net/ubuntu/trusty/+source/ceph/+bug/1535278

summary: - backup service crashes in ceph job
+ backup service crashes in ceph job with "pure virtual method called"
James Page (james-page) wrote :

Upstream original bug reference:

http://tracker.ceph.com/issues/13636

firefly backport was rejected:

http://tracker.ceph.com/issues/13757

Michal Dulko (michal-dulko-f) wrote :

I was afraid that the whole scalable backups work we've merged recently might be causing this, but that seems very unlikely.

Matt Riedemann (mriedem) wrote :

@james-page, I see in http://tracker.ceph.com/issues/13757 that ceph won't backport the fix to firefly since that's EOL. So will Ubuntu be patching the package in the distro?

Matt Riedemann (mriedem) wrote :

For now we're going to make the ceph jobs non-voting so we stop the gate resets during feature freeze week:

https://review.openstack.org/#/c/286642/

Josh Durgin (jdurgin) wrote :

@james-page I'd recommend backporting that patch for the precise packages

James Page (james-page) on 2016-03-02
Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → James Page (james-page)
status: Triaged → Invalid
Changed in ceph (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu):
importance: Critical → Undecided
assignee: James Page (james-page) → nobody
James Page (james-page) on 2016-03-02
Changed in ceph (Ubuntu Trusty):
assignee: James Page (james-page) → Chris Holcombe (xfactor973)
Matt Riedemann (mriedem) wrote :

@Josh, per comment 9, I think you mean Trusty 14.04 LTS packages, since that's what the OpenStack CI system is using.

Josh Durgin (jdurgin) wrote :

@Matt yes, I meant trusty rather than precise

James Page (james-page) wrote :

Uploading proposed packages with cherry picked fixes c/o Chris to:

https://launchpad.net/~openstack-ubuntu-testing/+archive/ubuntu/ceph-sru

Changed in ceph (Ubuntu Trusty):
status: Triaged → In Progress
Matt Riedemann (mriedem) wrote :

I have a devstack change up to test the ppa:

https://review.openstack.org/#/c/288145/

That depends on a project-config change to add the ceph job to the experimental queue for devstack so we can actually test the ppa.

James Page (james-page) wrote :

I've done some initial regression testing of the proposed update - no regressions that I can see.

Is there a nice way to repro this so we can have a specific test case for the SRU? Something synthetic would be great. My regression testing does not include cinder-backup, which I think is the key to hitting this bug...

Matt Riedemann (mriedem) wrote :

I don't have a simple reproducer for this, and it's not just the cinder backup service that was hitting failures: according to the logstash query above, it also hits other services at random.

Matt Riedemann (mriedem) wrote :

What's the status on the trusty package part of this? Is there an ETA on when that will be released?

Matt Riedemann (mriedem) wrote :

We've dropped the ceph job that uses the packages from the distro; we now get them from the ceph mirrors using the devstack ceph plugin.

Changed in cinder:
status: Confirmed → Invalid