snapstore and review-tools use the wrong regexp for snap names

Bug #1751447 reported by John Lenton
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Click Reviewers tools (obsolete)
Fix Released
Undecided
John Lenton
Snap Store Server
Fix Released
Undecided
Unassigned
Snapcraft
Fix Released
Undecided
John Lenton
review-tools
Fix Released
Undecided
Jamie Strandboge
snapd
Fix Released
Undecided
John Lenton

Bug Description

The agreed-upon regexp for snap names is

^(?:[a-z0-9]+-?)*[a-z](?:-?[a-z0-9])*$

with a maximum length of 40.

(which translates to: snap names must be made of ascii letters and numbers, and the hyphen, but hyphens must be surrounded by letters or numbers).

Snapcraft and snapd both use this regexp for validating snap names; a snap with a name that doesn't match this will fail to install.

However, today, you can happily register a snap called "1024", or "10-10". As far as I can tell neither review-tools nor the store will stop you from publishing this.

Related branches

Revision history for this message
John Lenton (chipaca) wrote :

I understand that the regexp we use in Go has nested quantifiers which
you should never use in Python (I'm adding a snapcraft task because of
this -- a snap name of "u-94903713687486543234157734673284536758" is
used in review-tools' testsuite to showcase the issue).

I see three alternatives:

1. what review-tools is doing, tweaked: check it matches
   r"^[a-z0-9](?:-?[a-z0-9])*$" and then any('a' <= c <= 'z' for c in n)
   (currently it's checking 'not n.isnumeric()' which doesn't catch
   a name with only numbers and dashes).

2. a slightly more efficient variation on that: check it matches
   r'^[a-z0-9-]*[a-z][a-z0-9-]*$' and then that it doesn't start with -, end
   with -, or have two -s in a row.

3. rewrite the regexp to use look-aheads and look-behinds instead of nested
   quantifiers: r'^(?:[a-z0-9]|(?<!-)-)*[a-z](?:[a-z0-9]|-(?!-))*$' this won't
   work in Go (no support for any of the lookarounds), but it works in Python.

4. do it by walking the string by hand.

I implemented these in https://pastebin.ubuntu.com/p/K7wnV2QDhv/ and made sure
they all passed the review-tools tests (plus a couple I've added), and timed
them: with that code as q.py, I did

$ for i in $( grep -o -P '(?<=def ).*(?=\(n\):)' q.py ); do echo "$i"; python -m timeit -s "import q" "q.$i('u-9490371368748654323415773467')"; done
hand_walk
1000000 loops, best of 3: 0.285 usec per loop
simpler_regex_and_check_a
1000000 loops, best of 3: 1.66 usec per loop
simpler_regex_and_check_b
1000000 loops, best of 3: 0.891 usec per loop
lookaround_regexp
100000 loops, best of 3: 3.06 usec per loop
nested_quantifiers
10 loops, best of 3: 10.9 sec per loop

and, in a less pathological case,

$ for i in $( grep -o -P '(?<=def ).*(?=\(n\):)' q.py ); do echo "$i"; python -m timeit -s "import q" "q.$i('meh')"; donehand_walk
1000000 loops, best of 3: 0.291 usec per loop
simpler_regex_and_check_a
1000000 loops, best of 3: 0.765 usec per loop
simpler_regex_and_check_b
1000000 loops, best of 3: 0.435 usec per loop
lookaround_regexp
1000000 loops, best of 3: 0.457 usec per loop
nested_quantifiers
1000000 loops, best of 3: 0.457 usec per loop

Surprisingly, walking it by hand seems to be the fastest (I expected the
re-based ones to win due to a lot of the processing being in C).

I'd be happiest if we could sync all the checks for this, in other words, that
if we found a bug in one of them we could trivially re-sync the checks. The
lookaround regexp fails this test, but so does the current one we're using in
snapd. I _suspect_, but am not certain, that the manual walk won't work with
snapcraft's schema validation.

HTH.

John Lenton (chipaca)
description: updated
Revision history for this message
John Lenton (chipaca) wrote :

Gah. Beyond launchpad eating my formatting, I also messed up a tab when I copied the manual walk into q.py. Without that, the re-based ones win:

. hand_walk: 100000 loops, best of 3: 2.62 usec per loop
.simpler_regex_and_check_a: 1000000 loops, best of 3: 1.66 usec per loop
.simpler_regex_and_check_b: 1000000 loops, best of 3: 0.889 usec per loop
. lookaround_regexp: 100000 loops, best of 3: 2.87 usec per loop
. nested_quantifiers: ^C

Revision history for this message
John Lenton (chipaca) wrote :

one last try to get launchpad to play nice with indentation

                hand_walk: 100000 loops, best of 3: 2.62 usec per loop
simpler_regex_and_check_a: 1000000 loops, best of 3: 1.66 usec per loop
simpler_regex_and_check_b: 1000000 loops, best of 3: 0.889 usec per loop
        lookaround_regexp: 100000 loops, best of 3: 2.87 usec per loop
       nested_quantifiers: ^C

Revision history for this message
John Lenton (chipaca) wrote :

OK, this is embarrasing, I had another copy error (I'd fixed the lookaround regexp, and forgotten about the fix). The right one is

r'^(?:[a-z0-9]|(?<=[a-z0-9])-)*[a-z](?:[a-z0-9]|-(?=[a-z0-9]))*$'

(it's not enough to check - is not preceded/followed by -; you need to check it is preceded/followed by [a-z0-9] otherwise ^ or $ will match)

and, i wasn't doing the right check in one of them (wouldn't affect timings).

Anyway, I updated q.py to actually run the tests, to avoid more paper bags. Updated at
https://pastebin.ubuntu.com/p/nTgX78fhCy/

                hand_walk:  100000 loops, best of 3: 2.61 usec per loop
simpler_regex_and_check_a: 1000000 loops, best of 3: 1.68 usec per loop
simpler_regex_and_check_b: 1000000 loops, best of 3: 0.908 usec per loop
        lookaround_regexp:  100000 loops, best of 3: 3.09 usec per loop
       nested_quantifiers: ^C

Revision history for this message
Sergio Schvezov (sergiusens) wrote :
Changed in snapcraft:
assignee: nobody → John Lenton (chipaca)
status: New → Fix Committed
Changed in click-reviewers-tools:
assignee: nobody → John Lenton (chipaca)
status: New → Fix Committed
Changed in review-tools:
assignee: nobody → Jamie Strandboge (jdstrand)
status: New → Triaged
Changed in review-tools:
status: Triaged → Fix Released
John Lenton (chipaca)
Changed in snapd:
status: New → In Progress
assignee: nobody → John Lenton (chipaca)
Changed in snapstore:
status: New → Fix Committed
Changed in snapstore:
status: Fix Committed → Fix Released
Changed in click-reviewers-tools:
status: Fix Committed → Fix Released
Changed in snapcraft:
status: Fix Committed → Fix Released
Changed in snapd:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.