glance-simplestreams-sync starts failing to sync images with failed connections to ceph-radosgw

Bug #1872548 reported by John George
This bug affects 1 person
Affects: OpenStack Glance-Simplestreams-Sync Charm
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Initial deployment of the attached bundle sync'd images without issues.
After about a week, the glance-simplestreams-sync/0 unit went into blocked state, while failing to sync images:
glance-simplestreams-sync/0* blocked idle 6/lxd/2 10.0.2.98 Image sync failed, retrying soon.
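
For anyone wanting to catch this state automatically, the following is a minimal sketch of scanning `juju status --format=json` output for blocked units. The sample data is hand-written and trimmed to the fields used here, and the function name `blocked_units` is hypothetical; none of this comes from the crashdump itself.

```python
import json


def blocked_units(status):
    """Return {unit_name: message} for units whose workload status is 'blocked'."""
    found = {}
    for app in status.get("applications", {}).values():
        # Subordinate or unit-less applications may have no "units" key.
        for unit_name, unit in (app.get("units") or {}).items():
            workload = unit.get("workload-status", {})
            if workload.get("current") == "blocked":
                found[unit_name] = workload.get("message", "")
    return found


# Hand-written sample shaped like `juju status --format=json` output
# (assumption: only the keys needed for this check are included).
SAMPLE = json.loads("""
{
  "applications": {
    "glance-simplestreams-sync": {
      "units": {
        "glance-simplestreams-sync/0": {
          "workload-status": {
            "current": "blocked",
            "message": "Image sync failed, retrying soon."
          }
        }
      }
    },
    "ceph-radosgw": {
      "units": {
        "ceph-radosgw/0": {
          "workload-status": {"current": "active", "message": "Unit is ready"}
        }
      }
    }
  }
}
""")

print(blocked_units(SAMPLE))
```

In a live environment the dictionary would come from parsing the real command output rather than from `SAMPLE`.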

The glance-simplestreams-sync log shows connection failures to the ceph-radosgw/0 unit.
The ceph-radosgw logs have ERROR: keystone revocation processing returned error r=-13
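
As a reading aid for the `r=-13` value: assuming it is a negated Linux errno (a common convention for return codes, but an assumption here, not something the logs confirm), it can be decoded with Python's standard library. The helper name `describe_return_code` is hypothetical.

```python
import errno
import os


def describe_return_code(r):
    """Map a negated errno value (e.g. -13) to (symbolic name, description)."""
    code = abs(r)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Assumption: r=-13 from the radosgw log is a negated errno.
print(describe_return_code(-13))
```

On Linux, errno 13 is `EACCES`, which would suggest that the request radosgw made to keystone for revocation data was refused rather than that the network connection itself failed.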

Please see the juju crashdump available at:
https://people.canonical.com/~jog/bugs/juju-crashdump-openstack-GSSS-ceph-radosgw-keystone-revocation-error.tar.gz

Tags: cdo-qa
John George (jog) wrote :
Ryan Beisner (1chb1n) wrote :

The glance-simplestreams-sync charm is configured with "use_swift: true", but swift is not in this deployment. I think this needs to be corrected; if the issue still exists after that, please re-raise the bug status.

Other observations:

This deployment uses OpenStack Stein and Ceph Mimic on Ubuntu Bionic, which should be a supported combination.

Ceph had some known issues with the Keystone OS-REVOKE extension (https://github.com/openstack-attic/identity-api/blob/master/v3/src/markdown/identity-api-v3-os-revoke-ext.md), tracked at https://tracker.ceph.com/issues/9493, but those should be fix-released as of Luminous. Still, the ksv3 / fernet surface areas are potentially related.

Keystone fernet token expiry is raised to 1 day in this deployment, referencing https://bugs.launchpad.net/charm-keystone/+bug/1856876.

The g-s-s-s charm has some late/in-flight py3 follow-up work underway @ https://bugs.launchpad.net/charm-glance-simplestreams-sync/+bug/1853456, which may be partially related.

Ryan Beisner (1chb1n) wrote :

Additional observations:

keystone-ldap is in play here. That needs to be factored into a reproducer scenario if the g-s-s-s use_swift config is fixed and the bug is re-raised.

There is a lot of good context surrounding this, which is associated with the ceph bug tracker referenced above:
https://bugs.launchpad.net/kolla/+bug/1683294
https://github.com/ceph/ceph/pull/16164/commits/e773b304eefa3d2ca7c1fe0817c89082bf574a38
https://tracker.ceph.com/issues/22312

Basically: I would expect this to never occur in a healthy ksv3/fernet environment, hence the request to fix the config and reproduce.

Changed in charm-glance-simplestreams-sync:
status: New → Incomplete
Chris MacNaughton (chris.macnaughton) wrote :

The use_swift key in the g-s-s-s charm config is a red herring. That key just enables the use of the swift endpoints in the cloud, rather than self-hosting the images on the g-s-s-s unit. I have confirmed that g-s-s-s + radosgw works just fine with use_swift enabled.
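
To illustrate the point, here is a rough sketch of a bundle sanity check that treats either a swift proxy or ceph-radosgw as a provider of swift endpoints. The function names, the charm-name matching heuristic, and the sample bundle are all hypothetical and only meant to show the idea; they are not taken from the attached bundle or from the charm's code.

```python
# Assumption: matching on charm-name substrings is a heuristic for
# "this application offers Swift-API endpoints", nothing more.
PROVIDER_HINTS = ("swift-proxy", "ceph-radosgw")


def swift_endpoint_providers(bundle):
    """List applications whose charm name suggests they provide swift endpoints."""
    providers = []
    for name, app in bundle.get("applications", {}).items():
        charm = app.get("charm", "")
        if any(hint in charm for hint in PROVIDER_HINTS):
            providers.append(name)
    return sorted(providers)


def use_swift_is_satisfiable(bundle, gsss_app="glance-simplestreams-sync"):
    """Return True unless use_swift is enabled with no apparent endpoint provider."""
    options = bundle["applications"][gsss_app].get("options", {})
    if not options.get("use_swift", False):
        return True
    return bool(swift_endpoint_providers(bundle))


# Hypothetical, heavily trimmed bundle in the parsed (dict) form.
SAMPLE_BUNDLE = {
    "applications": {
        "glance-simplestreams-sync": {
            "charm": "cs:glance-simplestreams-sync",
            "options": {"use_swift": True},
        },
        "ceph-radosgw": {"charm": "cs:ceph-radosgw"},
    }
}

print(swift_endpoint_providers(SAMPLE_BUNDLE))
print(use_swift_is_satisfiable(SAMPLE_BUNDLE))
```

With this sample, the check passes because ceph-radosgw is present, even though no swift charm is deployed.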
