Improvements to the snap search API functionality

Bug #1734122 reported by Robin Winslow
This bug affects 1 person
Affects: Snap Store Server
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Matthew made a document (Canonical private) about how queries and results should be handled:

https://docs.google.com/document/d/191BWQmF58lUssyo_x7azYm1vLuv91Z451wWXceXjvnM/edit?ts=59d77f22

We discussed this at the sprint and at the time highlighted a few differences between his vision and the existing behaviour.

We just had a meeting to compare the specifics in this document with the existing API, and came up with a list of improvements. Some are higher priority than others, so I'll put them in priority order and try to comment on priority:

(I'm happy to split this out into separate bugs, but right now it was quicker to write them all down here.)

# Suggested improvements

## Reliable page sizes

There's already an issue for this - https://bugs.launchpad.net/snapstore/+bug/1731925
Pages are often a different size from what was asked for. This is pretty serious, as it is both confusing (amateur-looking) for the user and limits our ability to infer things from the API.

## Return total search results

We almost certainly want to expose the total number of search results to the user. If the above bug were fixed, we could figure out the total number by making multiple calls to the API - this would be okay in the short term.

In the long term, however, it would be preferable to see the total number of results for that query returned with every API call.
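
As a rough illustration of the short-term workaround, here is a minimal Python sketch that counts results by walking pages until a short page appears. The `fetch_page` callable and the fake data are hypothetical stand-ins for the real search call, and the approach only works if every page except the last is exactly the requested size.

```python
from typing import Callable, List


def count_results(fetch_page: Callable[[int, int], List[str]],
                  page_size: int = 10) -> int:
    """Count all results by requesting pages until one comes back short.

    Assumes every page except the last holds exactly `page_size` items,
    which is the guarantee the page-size bug currently breaks.
    """
    total = 0
    page = 1
    while True:
        items = fetch_page(page, page_size)
        total += len(items)
        if len(items) < page_size:
            return total
        page += 1


# Hypothetical stand-in for the search API: 23 fake snap names.
FAKE_RESULTS = [f"snap-{i}" for i in range(23)]


def fake_fetch(page: int, size: int) -> List[str]:
    start = (page - 1) * size
    return FAKE_RESULTS[start:start + size]


print(count_results(fake_fetch, page_size=10))  # 23
```

Note the cost: this needs one request per page, which is why a total returned by the API itself would be better in the long run.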

## Basic query should query publishers and sections

The basic query parameter should also search publishers and sections - e.g. searching for "canonical" should find all snaps by Canonical, and "games" should return all snaps in the "Games" section.

This is less high priority than getting reliable page sizes, but it's still fairly important for a good search experience.

## Search terms should be treated with "or" not "and"

"documentation builder" should return both "documentation-builder" and other "builder" and "documentation" results, but rank "documentation-builder" at the top. At the moment it only returns "documentation-builder".

## International versions of common characters

Currently, searching for "découvertes" finds the Bayam snap, but "decouvertes" gives you nothing.

Could we please make it so that "decouvertes" would find the Bayam snap? This shouldn't work the other way round, however - "é" should only be treated as is ("découvertes" should not find "decouvertes").
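
To make the one-way rule concrete, here is a small Python sketch using the standard `unicodedata` module. The function names `fold` and `term_matches` are made up for illustration, and this is not how the store currently works; it only shows that the asymmetric behaviour requested above is easy to express.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower-case the text and strip combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(query_term: str, indexed_word: str) -> bool:
    """Match exactly, or match a query against the folded indexed word.

    The query itself is never folded, so an accented query only matches
    an identically accented word.
    """
    q = unicodedata.normalize("NFC", query_term.lower())
    w = unicodedata.normalize("NFC", indexed_word.lower())
    return q == w or q == fold(w)


print(term_matches("decouvertes", "découvertes"))  # True
print(term_matches("découvertes", "decouvertes"))  # False
print(term_matches("découvertes", "découvertes"))  # True
```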

## Only search on complete words (what is substring search doing? It's weird.)

I can't work out quite how the existing search works. In some cases it seems to do substring search and in some cases it doesn't. "Can" and "onic" both return results for "Canonical". They should not; they should only return results for those actual words.

However, this is not all that high priority.

## Common words should be ignored if any other words exist - e.g. "a", "an", "the"

But results containing these words should be ranked higher than those not containing those words. I can't quite tell if this is happening already, because of the current "and" functionality.

Again, this is not as high priority as the other points.
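
A minimal Python sketch of the behaviour described in this section, assuming a simple whitespace tokeniser and a three-word stopword list; the function names and sample titles are hypothetical. Stopwords are dropped from the required terms when other words exist, but still earn a ranking bonus when present.

```python
STOPWORDS = {"a", "an", "the"}


def split_query(query):
    """Split a query into required terms and optional (stopword) terms."""
    terms = query.lower().split()
    required = [t for t in terms if t not in STOPWORDS]
    if not required:
        # The query is made only of stopwords: treat them as real terms.
        return terms, []
    optional = [t for t in terms if t in STOPWORDS]
    return required, optional


def search(query, titles):
    """Return titles containing all required terms, stopword hits first."""
    required, optional = split_query(query)
    ranked = []
    for title in titles:
        words = title.lower().split()
        if all(t in words for t in required):
            bonus = sum(1 for t in optional if t in words)
            ranked.append((-bonus, title))
    return [title for _, title in sorted(ranked)]


print(search("the national", ["National Parks", "The National"]))
# ['The National', 'National Parks']
print(search("the the", ["The The", "National Parks"]))
# ['The The']
```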

## Adjustable ranking

It would be nice to consider explicitly which matches cause higher ranking in the search results. E.g. if someone searches for "games", do snaps from a "publisher" called "awesome games" rank higher than snaps in the section "games"?

Can we currently specify this?
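
To illustrate what "adjustable" could mean, here is a Python sketch with per-field weights. The field names, weights and sample snaps are all invented for the example; it says nothing about what the store can do today.

```python
# Hypothetical weights: changing them decides which kind of match wins.
FIELD_WEIGHTS = {"name": 3.0, "section": 2.0, "publisher": 1.0}


def score(term, snap, weights):
    """Sum the weights of every field that contains the term as a word."""
    term = term.lower()
    return sum(
        weight
        for field, weight in weights.items()
        if term in snap.get(field, "").lower().split()
    )


def rank(term, snaps, weights=FIELD_WEIGHTS):
    """Return names of matching snaps, highest score first."""
    scored = [(score(term, snap, weights), snap["name"]) for snap in snaps]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for points, name in scored if points > 0]


SNAPS = [
    {"name": "solitaire", "publisher": "awesome games", "section": "productivity"},
    {"name": "chess", "publisher": "acme", "section": "games"},
    {"name": "notes", "publisher": "acme", "section": "productivity"},
]

print(rank("games", SNAPS))
# ['chess', 'solitaire']
print(rank("games", SNAPS, {"name": 3.0, "section": 1.0, "publisher": 5.0}))
# ['solitaire', 'chess']
```

With the first set of weights the section match outranks the publisher match; swapping the weights reverses the order, which is the kind of explicit control the question is about.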

Tags: search
description: updated
Matthew Paul Thomas (mpt) wrote :

“"games" should return all snaps in the "Games" section” may be a misunderstanding. What I had in mind instead was that the “Games” category would be returned as *a single result* in the set of results.

If it did return every snap in that category, the flood of results could be confusing if you didn’t know that the category existed.

Celso Providelo (cprov)
Changed in snapstore:
status: New → Confirmed
tags: added: search
tags: added: snapfind
removed: search
Celso Providelo (cprov)
tags: added: search
William Grant (wgrant) wrote :

> ## Return total search results
>
> We almost certainly want to expose total number of search results to
> the user. If the above bug was fixed then we could figure out the
> total number by making multiple calls to the API - this would be okay
> in the short term.

Is the exact number actually interesting to present? It's expensive to
calculate and can never be totally accurate -- snaps can drop in or out
of search results. We could provide a rough estimate if it's really
worth showing to the user.

> ## Basic query should query publishers and sections
>
> The basic query parameter should also search publishers and sections
> e.g. searching for "canonical" should find all snaps by Canonical,
> and "games" should return all snaps in the "Games" section.

There seems to be some disagreement as to what the desired behaviour here
is. It also doesn't sound important until we have views for publishers
and sections, neither of which sounds terribly useful today given the
current number of snaps, and neither of which is scheduled for the first
round of UI AFAIK.

> ## Search terms should be treated with "or" not "and"
>
> "documentation builder" should return both "documentation-builder"
> and other "builder" and "documentation" results, but rank
> "documentation-builder" at the top. At the moment it only returns
> "documentation-builder".

It can't be quite that simple; if we miss even a single stopword, heaps
of searches will erroneously return hundreds of snaps. We could allow
some subset of terms to be missing, but a straight OR is going to give
unacceptable and surprising results.
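
As a sketch of the "allow some subset of terms to be missing" idea, the Python below lets at most `max_missing` query terms be absent and ranks fuller matches first. The tokeniser, the `relaxed_search` name and the sample snaps are assumptions for illustration only, not the store's implementation.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relaxed_search(query: str, snaps: Dict[str, str],
                   max_missing: int = 1) -> List[str]:
    """Return snaps missing at most `max_missing` terms, best match first."""
    terms = set(tokenize(query))
    # Always require at least one matching term, even for short queries.
    allowed_missing = min(max_missing, len(terms) - 1)
    scored = []
    for name, summary in snaps.items():
        words = set(tokenize(name)) | set(tokenize(summary))
        missing = len(terms - words)
        if missing <= allowed_missing:
            scored.append((missing, name))
    return [name for _, name in sorted(scored)]


SNAPS = {
    "documentation-builder": "Build documentation from Markdown",
    "site-builder": "Static site builder",
    "api-documentation": "Browse API documentation offline",
    "chess": "A chess game",
}

print(relaxed_search("documentation builder", SNAPS))
# ['documentation-builder', 'api-documentation', 'site-builder']
print(relaxed_search("the documentation builder", SNAPS))
# ['documentation-builder']
```

In the second query the one allowed miss is used up by the unmatched "the", so the partial matches drop out instead of flooding the results. Setting `max_missing=0` gives a strict "and".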

> ## International versions of common characters
>
> Currently, searching for "découvertes" finds the Bayam snap, but
> "decouvertes" gives you nothing.
>
> Could we please make it so that "decouvertes" would find the Bayam
> snap. This shouldn't work the other way round however - "é" should
> only be treated as is ("découvertes" should not find "decouvertes")

Non-English metadata is not presently supported by the snappy ecosystem.
Once snap metadata is translatable, we'll enhance the store search APIs
to take an explicit language and perform language-appropriate
normalisation and stemming. But even once that's done, the behaviour
requested here doesn't seem at all intuitive. This needs more discussion.

> ## Only search on complete words (what is substring search doing?
> It's weird.)
>
> I can't work out quite how the existing search works. In some cases
> it seems to do substring search and in some cases it doesn't. "Can"
> and "onic" both return results for "Canonical". They should not, they
> should only return results for those actual words.

The final word in a search is treated as a prefix, to handle incremental
search in applications like gnome-software. Some clients may not want
that behaviour, so we might want to make that optional.

Do you have an example of a request that has "onic" matching
"Canonical"? "Can" as the final term in a query should, but "onic"
should not and does not AFAICT.
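
For readers unfamiliar with the pattern, here is a minimal Python sketch of "final word is a prefix" matching as described above. It is an illustration under simple assumptions (whitespace tokenising, a hypothetical `matches_incremental` helper), not the store's actual code.

```python
def matches_incremental(query, text):
    """Match complete words, except the final term, which may be a prefix."""
    terms = query.lower().split()
    words = text.lower().split()
    if not terms:
        return False
    *complete, last = terms
    return (all(term in words for term in complete)
            and any(word.startswith(last) for word in words))


print(matches_incremental("can", "Canonical snap store"))       # True
print(matches_incremental("onic", "Canonical snap store"))      # False
print(matches_incremental("snap can", "Canonical snap store"))  # True
print(matches_incremental("can snap", "Canonical snap store"))  # False
```

The last call fails because "can" is no longer the final term, so it has to match a whole word.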

> ## Common words should be ignored if any other words exist - e.g.
> "a", "an", "the"
>
> But results containing these words should be ranked high...


John Lenton (chipaca) wrote :

The document says

> A snap should be returned if all the search words are in any applicable fields, even in different fields. For example, if a publisher named Yoyodyne has one snap titled JazzWriter and one titled MegaChess, searching for “yoyodyne jazzwriter” should return JazzWriter but not MegaChess.

which is exactly the opposite of

> Search terms should be treated with "or" not "and"

so, which one is it?

(I really hope you mean the "and", because the other one doesn't make too much sense to me...)
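
For reference, the "and" behaviour quoted from the document can be sketched in a few lines of Python. The field names and the `and_search` helper are assumptions for illustration; the publisher and snap titles come from the document's own example.

```python
def and_search(query, snaps):
    """Return titles of snaps whose fields, taken together, contain every term."""
    terms = query.lower().split()
    results = []
    for snap in snaps:
        words = set()
        for field in ("title", "publisher"):
            words.update(snap[field].lower().split())
        if all(term in words for term in terms):
            results.append(snap["title"])
    return results


SNAPS = [
    {"title": "JazzWriter", "publisher": "Yoyodyne"},
    {"title": "MegaChess", "publisher": "Yoyodyne"},
]

print(and_search("yoyodyne jazzwriter", SNAPS))  # ['JazzWriter']
print(and_search("yoyodyne", SNAPS))             # ['JazzWriter', 'MegaChess']
```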

John Lenton (chipaca)
Changed in snapstore:
status: Confirmed → Incomplete
Celso Providelo (cprov)
tags: removed: snapfind
Matthew Paul Thomas (mpt) wrote :

> We could provide a rough estimate if it's really worth showing to the user.

The use case for knowing the number of results is getting to the bottom of the first SERP, and deciding whether it’s a better use of your time to page through the rest of the results or to refine your search. That doesn’t require the number to be exact: “1-10 of about 50” would be fine.
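
A tiny Python sketch of how a client might present such an estimate, assuming the API supplied an approximate total. The `results_label` helper and the rounding to the nearest ten are illustrative choices, not a proposal for the exact wording.

```python
def results_label(page, page_size, shown, estimated_total):
    """Build a label like '1-10 of about 50' from an approximate total."""
    first = (page - 1) * page_size + 1
    last = first + shown - 1
    # Round to the nearest ten, but never claim fewer than already shown.
    approx = max(last, round(estimated_total, -1))
    return f"{first}-{last} of about {approx}"


print(results_label(page=1, page_size=10, shown=10, estimated_total=47))
# 1-10 of about 50
print(results_label(page=2, page_size=10, shown=10, estimated_total=47))
# 11-20 of about 50
```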

> Once snap metadata is translatable, we'll enhance the store search APIs
> to take an explicit language and perform language-appropriate
> normalisation and stemming.

That might help the Bayam example, but it seems like a needless dependency for solving this problem, and it wouldn’t help with snap titles. If someone titles their snap “Discothèque”, they won’t change that across languages, and I should be able to find it by searching for “discotheque” regardless.

> We don't currently consider stopwords in ranking, and it is very unusual for that to be done.

It’s lower priority than the other items, but not unusual at all. For example, Google, Bing, DuckDuckGo, and Wikipedia all correctly give differently-ranked results for “the national” vs. “national” (and all do the right thing when searching for “the the”).

Launchpad Janitor (janitor) wrote :

[Expired for Snap Store because there has been no activity for 60 days.]

Changed in snapstore:
status: Incomplete → Expired