backup service crashes in ceph job with "pure virtual method called"

Bug #1551305 reported by Matt Riedemann
This bug affects 2 people
Affects               Status       Importance  Assigned to
Cinder                Invalid      Medium      Unassigned
ceph (Ubuntu)         Invalid      Undecided   Unassigned
ceph (Ubuntu Trusty)  In Progress  Critical    Chris Holcombe

Bug Description

The service crashes here:

http://logs.openstack.org/49/281149/5/gate/gate-tempest-dsvm-full-ceph/82e9e00/logs/screen-c-bak.txt.gz#_2016-02-29_12_13_20_501

2016-02-29 12:13:20.501 DEBUG cinder.backup.drivers.ceph [req-01f7866f-34a2-4c5b-8b23-3357236e0013 tempest-VolumesBackupsV1Test-2076678965] Image 'volume-c2490215-a828-4af5-b510-681c0a52c76b.backup.e5db41b1-35a3-4557-bff9-7292054cd1ad' not found - trying diff format name _rbd_image_exists /opt/stack/new/cinder/cinder/backup/drivers/ceph.py:561
pure virtual method called
terminate called without an active exception
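These two lines are what GCC's C++ runtime prints when a pure virtual function is called and std::terminate() follows. They carry no timestamp, so the most useful context is the last timestamped line before them (here, the _rbd_image_exists DEBUG message). A minimal triage sketch for a downloaded screen log, assuming the timestamp format shown above; find_pure_virtual_crash is a hypothetical helper, not part of cinder or devstack:

```python
import re

# The two lines the C++ runtime writes to stderr on a pure virtual call
# followed by std::terminate(), as seen in the c-bak log.
CRASH_SIGNATURE = (
    "pure virtual method called",
    "terminate called without an active exception",
)

# Assumed timestamp prefix, matching lines such as
# "2016-02-29 12:13:20.501 DEBUG cinder.backup.drivers.ceph ..."
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")


def find_pure_virtual_crash(log_text):
    """Return the last timestamped line before the crash signature, or None.

    Hypothetical helper: the runtime messages have no timestamp, so the
    preceding timestamped line shows what the service was doing when it died.
    """
    lines = log_text.splitlines()
    for i in range(len(lines) - 1):
        if (lines[i].strip() == CRASH_SIGNATURE[0]
                and lines[i + 1].strip() == CRASH_SIGNATURE[1]):
            # Walk backwards to the most recent timestamped line.
            for prev in reversed(lines[:i]):
                if TIMESTAMP_RE.match(prev):
                    return prev
            return None
    # Signature not found: no pure-virtual crash in this log.
    return None
```

Pass it the full text of screen-c-bak.txt; a None result means the signature is absent.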

On the API side, anything requesting backups fails:

http://logs.openstack.org/49/281149/5/gate/gate-tempest-dsvm-full-ceph/82e9e00/logs/screen-c-api.txt.gz#_2016-02-29_12_16_36_116

2016-02-29 12:16:36.116 INFO cinder.api.openstack.wsgi [req-f817ad3e-6100-4eec-80d1-685edb1d1542 None] HTTP exception thrown: Service cinder-backup could not be found.

So far there is only one hit in the gate for this:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22HTTP%20exception%20thrown%3A%20Service%20cinder-backup%20could%20not%20be%20found.%5C%22%20AND%20tags%3A%5C%22screen-c-api.txt%5C%22&from=7d
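The logstash link is hard to read because the query is percent-encoded inside the URL fragment (after the "#"). As a sketch using only the Python standard library (decode_logstash_query is a hypothetical helper), it decodes to a search for the "Service cinder-backup could not be found." message in screen-c-api.txt, with a from=7d time window:

```python
from urllib.parse import parse_qs, urlsplit

LOGSTASH_URL = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json?"
    "query=message%3A%5C%22HTTP%20exception%20thrown%3A%20Service%20cinder-backup"
    "%20could%20not%20be%20found.%5C%22%20AND%20tags%3A%5C%22screen-c-api.txt%5C%22"
    "&from=7d"
)


def decode_logstash_query(url):
    """Return the dashboard parameters embedded in the URL fragment."""
    # The parameters sit after the '#', so urlsplit puts them in .fragment,
    # not in .query; split off the "dashboard/...?" prefix first.
    fragment = urlsplit(url).fragment
    _, _, params = fragment.partition("?")
    # parse_qs percent-decodes values; keep the first value of each key.
    return {key: values[0] for key, values in parse_qs(params).items()}


print(decode_logstash_query(LOGSTASH_URL)["query"])
```

The printed query keeps the backslash-escaped quotes exactly as they are encoded in the URL.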

Tags: ceph
Matt Riedemann (mriedem) wrote :

This is the version of ceph used in the failed test:

ii librados2 0.80.11-0ubuntu1.14.04.1 amd64 RADOS distributed object store client library

Here is the info on that release:

https://launchpad.net/ubuntu/+source/ceph/0.80.11-0ubuntu1.14.04.1

Released on 1/18/2016.

Duncan Thomas (duncan-thomas) wrote :

The "pure virtual method called" message at the end of the backup log seems to come from C++, not from Python, which suggests the issue is a change somewhere in the ceph libraries rather than in cinder.

Matt Riedemann (mriedem) wrote :

Looks like this was the fix in ceph:

https://github.com/ceph/ceph/pull/6525

But that fix wasn't included in the Ubuntu release:

https://bugs.launchpad.net/ubuntu/trusty/+source/ceph/+bug/1535278

summary: - backup service crashes in ceph job
+ backup service crashes in ceph job with "pure virtual method called"
Matt Riedemann (mriedem) wrote :
James Page (james-page) wrote :

Upstream original bug reference:

http://tracker.ceph.com/issues/13636

The firefly backport was rejected:

http://tracker.ceph.com/issues/13757

Michal Dulko (michal-dulko-f) wrote :

I was afraid that the scalable backups work we merged recently might be causing this, but that seems very unlikely.

Matt Riedemann (mriedem) wrote :

@james-page, I see in http://tracker.ceph.com/issues/13757 that ceph won't backport the fix to firefly since that's EOL. So will Ubuntu be patching the package in the distro?

Matt Riedemann (mriedem) wrote :

For now we're going to make the ceph jobs non-voting so we stop the gate resets during feature freeze week:

https://review.openstack.org/#/c/286642/

Josh Durgin (jdurgin) wrote :

@james-page I'd recommend backporting that patch for the precise packages

James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → James Page (james-page)
status: Triaged → Invalid
Changed in ceph (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu):
importance: Critical → Undecided
assignee: James Page (james-page) → nobody
James Page (james-page)
Changed in ceph (Ubuntu Trusty):
assignee: James Page (james-page) → Chris Holcombe (xfactor973)
Matt Riedemann (mriedem) wrote :

@Josh, per comment 9, I think you mean Trusty 14.04 LTS packages, since that's what the OpenStack CI system is using.

Josh Durgin (jdurgin) wrote :

@Matt yes, I meant trusty rather than precise

James Page (james-page) wrote :

Uploading proposed packages with cherry-picked fixes c/o Chris to:

https://launchpad.net/~openstack-ubuntu-testing/+archive/ubuntu/ceph-sru

Changed in ceph (Ubuntu Trusty):
status: Triaged → In Progress
Matt Riedemann (mriedem) wrote :

I have a devstack change up to test the ppa:

https://review.openstack.org/#/c/288145/

That depends on a project-config change to add the ceph job to the experimental queue for devstack so we can actually test the ppa.

James Page (james-page) wrote :

I've done some initial regression testing of the proposed update - no regressions that I can see.

Is there a good way to reproduce this so we can have a specific test case for the SRU? Something synthetic would be great; my regression testing does not include cinder-backup, which I think is the key to hitting this bug.

Matt Riedemann (mriedem) wrote :

I don't have a simple reproducer for this. According to the logstash query above, it's not just the cinder-backup service that was hitting failures; it also hits randomly in other services.

Matt Riedemann (mriedem) wrote :

What's the status on the trusty package part of this? Is there an ETA on when that will be released?

Matt Riedemann (mriedem) wrote :

We've dropped the ceph job that uses the packages from the distro; we now get them from the ceph mirrors using the devstack ceph plugin.

Changed in cinder:
status: Confirmed → Invalid