Filter results when querying a container

Bug #1319097 reported by Robin Winslow
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Wishlist
Assigned to: Christian Hugo

Bug Description

Would it be possible to filter the results returned when querying a container by substring, so that instead of listing all results, you list only those whose names contain the substring?

https://ask.openstack.org/en/question/29566/swift-search-objects-in-container-by-name/

e.g.: https://swift.example.com/v1/AUTH_ID/my_collection?format=json&q=ubuntu

For me it would also be fantastic if it were possible to filter based on metadata as well, which is connected to https://bugs.launchpad.net/swift/+bug/1319096
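
Until something like the `q` parameter exists, the same effect can be approximated on the client side by paging through the full listing and keeping only the matching names. The following is a minimal sketch of that idea, not Swift code: `fetch_page` and `make_fake_container` are hypothetical helpers standing in for a container GET with `marker` and `limit`, and each listing entry is assumed to be a dict with a `name` key, as in the JSON listing format.

```python
from typing import Callable, Dict, Iterator, List

Listing = List[Dict[str, object]]


def iter_matching(fetch_page: Callable[[str, int], Listing],
                  substring: str, page_size: int = 1000) -> Iterator[str]:
    """Yield object names containing `substring`, paging through the listing."""
    marker = ""
    while True:
        page = fetch_page(marker, page_size)
        if not page:
            return
        for entry in page:
            name = str(entry["name"])
            if substring in name:
                yield name
        # Listings are ordered by name, so the last name is the next marker.
        marker = str(page[-1]["name"])


def make_fake_container(names: List[str]) -> Callable[[str, int], Listing]:
    """Hypothetical in-memory stand-in for a container GET with marker/limit."""
    ordered = sorted(names)

    def fetch_page(marker: str, limit: int) -> Listing:
        after = [n for n in ordered if n > marker]
        return [{"name": n} for n in after[:limit]]

    return fetch_page


fetch = make_fake_container(
    ["debian-12.iso", "kubuntu-22.04.iso", "ubuntu-20.04.iso", "ubuntu-22.04.iso"])
matches = list(iter_matching(fetch, "ubuntu", page_size=2))
print(matches)
```

The obvious cost is that every row of the container still crosses the wire, which is exactly what a server-side filter would avoid.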

tags: added: container
removed: containers
Changed in swift:
assignee: nobody → Christian Hugo (christianhugo)
Christian Hugo (christianhugo) wrote :

Hi, I want to add a parameter to filter the results with a regular expression.

It should cover this use case and many others.

Do you guys think this could be a reasonable change?

Changed in swift:
importance: Undecided → Wishlist
John Dickinson (notmyname) wrote :

It's definitely a change that a lot of people would like. Unfortunately, to get reasonable filtering of listing results, we'd need to store indexes for the given column in the DB. The indexes will expand storage requirements for the DBs, increase latency for write requests (due to increased index insertions), and then there's the big problem of what to do with existing data.

If you're talking about applying a regex to every row in a container DB, that sounds like a significant impact to a cluster (from the operator perspective) and likely a pretty substantial impact to the API user.

I think the better long-term way of solving the end-user problem of filtering listing results is the current in-progress work of indexing object metadata in an external system optimized for that (e.g. ELK). This includes integration with the OpenStack Searchlight project.

Janie Richling (jrichli) wrote :

I totally agree, and there are ongoing efforts centered around metadata search.

Not thinking about filtering based on metadata, and instead just focusing on the first request you mentioned - modify the listing only to contain results whose names contain a given substring:

At first, I was thinking you could write a middleware to filter the listing response. The decrypter middleware came to mind because it currently has to parse and process each item in the container listing. But then I asked Tim about this, and he reminded me that there is a limit on the number of results returned. Furthermore, there is support for requesting a subset of objects (commonly used for paging through the results). If you filter in middleware instead of in the original listing from the containers, then the listing would not have the expected item count in the results.
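
To make the item-count problem concrete, here is a small sketch in plain Python. It is not Swift middleware: `container_listing` and `middleware_filter` are hypothetical stand-ins for the container server's limited listing and a filter applied afterwards in the proxy pipeline.

```python
from typing import List

# Ten object names, sorted the way a container listing would return them.
NAMES = sorted(f"{distro}-{i:02d}.iso"
               for distro in ("debian", "ubuntu") for i in range(5))


def container_listing(marker: str = "", limit: int = 4) -> List[str]:
    """Stand-in for the backend: at most `limit` names after `marker`."""
    return [n for n in NAMES if n > marker][:limit]


def middleware_filter(page: List[str], substring: str) -> List[str]:
    """Stand-in for a middleware that filters an already limited page."""
    return [n for n in page if substring in n]


# The client asks for 4 results containing "ubuntu".
first_page = container_listing(limit=4)
first_filtered = middleware_filter(first_page, "ubuntu")

# The next page starts after the last unfiltered name the backend returned.
second_page = container_listing(marker=first_page[-1], limit=4)
second_filtered = middleware_filter(second_page, "ubuntu")

print(len(first_filtered), len(second_filtered))
```

Five names match, yet the first page comes back empty and the second holds three, so a client cannot tell a short page from the end of the listing.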

Matthew Oliver (matt-0) wrote :

The first part of this (substring matching) has come up before, a few times really. This is something that the Horizon people have been asking for too, which made me go write: https://review.openstack.org/#/c/287592/

However, as John mentions (and Sam in the review), it does introduce a bit of a performance hit when using 'contains', even though it's hitting the name index.
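
The difference between a prefix-style lookup and a 'contains' match can be seen in SQLite's query planner. The sketch below uses a simplified, hypothetical `object` table and `ix_object_name` index, not the real container DB schema, and is not the code from the review above. It only compares the plans SQLite reports for a range query and for a `LIKE '%...%'` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; the real container DB has more columns.
conn.execute("CREATE TABLE object (name TEXT, size INTEGER)")
conn.execute("CREATE INDEX ix_object_name ON object (name)")
conn.executemany("INSERT INTO object VALUES (?, ?)",
                 ((f"obj-{i:06d}", i) for i in range(1000)))


def plan(sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Prefix-style range: the planner can seek directly into the index.
prefix_plan = plan(
    "SELECT name FROM object WHERE name >= ? AND name < ?",
    ("obj-0001", "obj-0002"))

# Substring match: the leading wildcard forces a walk over every entry.
contains_plan = plan(
    "SELECT name FROM object WHERE name LIKE ?", ("%0001%",))

print("prefix:  ", prefix_plan)
print("contains:", contains_plan)
```

The range query is reported as a SEARCH and the substring query as a SCAN. The scan may still read from the index rather than the table, but it visits every entry either way.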

Matthew Oliver (matt-0) wrote :

Like Janie said, notifications should help with the Horizon use case too, because they can take advantage of Searchlight. So once we have notification support and integration with Searchlight, boom, easy searching.

Timur Alperovich (timur-alperovich) wrote :

In particular for the metadata search efforts, here is a wiki page tracking some of the thoughts on it: https://wiki.openstack.org/wiki/Swift/ideas/metadata-sync. SwiftStack also made the code we use for Elasticsearch integration available:
https://github.com/swiftstack/container-crawler and https://github.com/swiftstack/swift-metadata-sync

The first repository is the code required to look at container databases for any new objects. The second repository uses the ContainerCrawler to index object metadata in Elasticsearch. This then allows for quick searches by substrings, metadata values, or in a number of other ways.

Christian Hugo (christianhugo) wrote :

Thanks for the answers. I didn't know about the Elasticsearch integration, and I totally agree that this is the way to go. I did not think about the DB index either.
I won't try to implement this. Thanks for all the feedback.

Tim Burke (1-tim-z)
tags: added: container-listings