This came up in another context again, so I am going to try to leave some comments which hopefully make the situation more clear (because it is confusing and there are too many things which are named the same):

* snapctl comes from the snapd project
* snapd the project gets shipped onto a user's system through multiple possible places/packages
* snapd can come from the `core` snap, the `snapd` snap or from traditional Linux packages like the Debian package for Debian 9
* for most utilities, we have implemented re-exec logic so that when you run, for example, /usr/bin/snap, it re-execs into the most recent version of the code, wherever that lives; e.g. if the deb is 2.21 and the snapd snap is 2.51, then /usr/bin/snap will exec (with the same args) /snap/snapd/current/usr/bin/snap
* We did not have re-exec logic implemented for snapctl as of 2.21
* Rather than implement re-exec logic for snapctl, we instead opted to have snapd, when it sets up the mount namespace, pick the most recent/correct version of snapctl from wherever that might be and mount it in the snap's mount namespace at /usr/bin/snapctl. That way strict snaps do not need to care where snapctl comes from; they can always rely on /usr/bin/snapctl being available and being the right version.
* None of that snapctl mounting applies to classic snaps, which we did not consider at the time. In retrospect, implementing re-exec logic directly inside snapctl would probably have been a better idea, because it would have solved the problem for classic snaps as well
* This means that old versions of snapctl do not know to re-exec themselves into a newer version, and also that we do not get a chance to "fix" the version of snapctl for classic snaps, since classic snaps do not enter a mount namespace for us to mount into[1]
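
To make the re-exec idea above a bit more concrete, here is a minimal sketch in Python. It is not snapd's actual code (snapd is written in Go and its real logic has more checks); the function names, the `candidates` mapping and the assumption that versions are plain dotted numbers like "2.21" are all made up for illustration.

```python
import os


def parse_version(version):
    """Turn a dotted version such as "2.51" into a tuple of ints: (2, 51).

    Assumption: versions are purely numeric and dot-separated. Real package
    versions can carry suffixes that this sketch does not handle.
    """
    return tuple(int(part) for part in version.split("."))


def pick_reexec_target(own_path, own_version, candidates):
    """Return the path of the newest known copy of the tool.

    `candidates` is a hypothetical mapping of installed path -> version
    string. The caller's own path is returned when nothing newer exists.
    """
    best_path, best_version = own_path, parse_version(own_version)
    for path, version in candidates.items():
        parsed = parse_version(version)
        if parsed > best_version:
            best_path, best_version = path, parsed
    return best_path


def maybe_reexec(own_path, own_version, candidates, argv):
    """Replace the current process with a newer copy, keeping the same args."""
    target = pick_reexec_target(own_path, own_version, candidates)
    if target != own_path:
        # execv does not return on success: the newer binary takes over.
        os.execv(target, [target] + list(argv[1:]))
```

With the deb at 2.21 and the snapd snap at 2.51, `pick_reexec_target("/usr/bin/snap", "2.21", {"/snap/snapd/current/usr/bin/snap": "2.51"})` picks the copy from the snapd snap. An old snapctl that lacks any logic of this kind simply keeps running as itself.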


This problem with snapctl may appear on the surface to be a "new" regression in snapd, but that is only because of a different set of facts:
* It used to be that there was only one base snap for snaps to use, the core snap. The core snap is/was unique in that it contained _both_ snapd the project and the rootfs that application snaps use/depend on. That means that snapctl is included in the core snap
* For bionic, we realized that it would make more sense to split snapd out into its own snap, the snapd snap, and then to ship the rootfs for snaps to execute from into separate "base snaps" such as core18 and core20, etc.
* It is my understanding that wrapper scripts were set up in snapcraft to make the default $PATH include the base snap even for classic snaps. This effectively meant that /snap/core/current/usr/bin ended up higher in priority than /usr/bin, which was meant to help classic snaps operate more correctly by using things from their base rather than from the host.
* Now, if a snap used to use `base: core` (or did not specify a base at all in the snapcraft.yaml) and transitioned to `base: core18`, it will appear as if snapctl changed, because core18 does not include snapd the project and so the $PATH lookup has changed: /usr/bin/snapctl now shows up before something else like /snap/snapd/current/usr/bin/snapctl. Certain classic snaps like certbot will then see snapctl as regressing to an older version if the snapctl from their host system is old (like the one in Debian stretch is). This is why I believe this looks like a new regression rather than an existing problem. It may also be why the maintainer remembers testing on Debian stretch and finding that it worked: they may have been using the core snap as their base at that point and thus picked up the new version of snapctl from the core snap.
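
The $PATH effect described above can be illustrated with a small simulation. This is only a sketch: `resolve` is a made-up helper that mimics a first-match $PATH lookup over a set of file paths, and the /snap/core18/current/usr/bin entry is my assumption, by analogy with the core path, of what the wrapper puts first for a core18-based snap.

```python
def resolve(command, path_dirs, existing_files):
    """Mimic a $PATH lookup: return the first match for `command`, or None."""
    for directory in path_dirs:
        candidate = directory.rstrip("/") + "/" + command
        if candidate in existing_files:
            return candidate
    return None


# Files present on a hypothetical Debian stretch host with the core snap
# installed. core18 ships no snapctl, so nothing is listed under it.
files = {
    "/usr/bin/snapctl",                    # old snapctl from the 2.21 deb
    "/snap/core/current/usr/bin/snapctl",  # newer snapctl from the core snap
}

# Classic snap with `base: core`: the base's bin dir comes before /usr/bin.
path_with_core = ["/snap/core/current/usr/bin", "/usr/bin"]

# Same snap after moving to `base: core18` (assumed path, see above).
path_with_core18 = ["/snap/core18/current/usr/bin", "/usr/bin"]

found_with_core = resolve("snapctl", path_with_core, files)
found_with_core18 = resolve("snapctl", path_with_core18, files)
```

Here `found_with_core` is the snapctl from the core snap, while `found_with_core18` falls through to the host's /usr/bin/snapctl. Nothing in snapd changed between the two lookups; only the base did.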


Hopefully all of that information aids in understanding the situation more fully. With all of that, let me respond to your points specifically:


> You're framing this as a bug in snapd. This may be true, but the report points to the regression being triggered by a change in the core snap, and this suggests that the obvious resolution is to identify and immediately revert the change that triggered the issue

This is a bug in snapd, and it only seems like a bug in the core snap because of the above facts. Arguably, the fact that snapctl worked the way it was being used from a classic snap like this was probably entirely by accident, and we should design/decide how classic snaps should use snapctl so that we can maintain a proper feature which is more thoroughly tested. Features used by classic snaps have always been very difficult to maintain and support, since by definition classic snaps can do anything and so often end up defeating our well-intentioned support strategy, as can be seen here. IMHO, the solution I describe in [1] is probably the best long-term strategy, as it will ensure that using /usr/bin/snapctl (or whatever the libexec dir is on other distros) from a classic snap just always does the right thing no matter what.

> I'm surprised that an inadvertent regression that breaks users doesn't result in you immediately investigating and reverting the change that you made that triggered the bug. 

We do indeed sometimes introduce regressions, and we do sometimes immediately investigate and sometimes even revert changes. However, we have limited time and many features/bugs to work on, so it is a matter of priority. This particular regression was not raised to us as a significant issue for a significant number of users (Debian users of specifically the certbot snap are not as numerous as, say, users of strict snaps on Ubuntu), so we did not investigate thoroughly or decide to revert. I apologize if this bug was particularly damaging for any users. We take bugs seriously, but we have to triage things, and performing reverts is very energy- and resource-consuming for us: pretty much every release we push out includes other high-priority bug fixes, which would then go back to being very high priority for us to push out again. It is almost always a better use of our time to quickly push out a new update which fixes a bug than it is to revert and try again.

> The problem with fixing a bug in backports is that users following your own instructions at https://snapcraft.io/docs/installing-snap-on-debian wouldn't receive the fix since backports is opt-in. That would leave things broken for Debian users by default.

Well, we did not realize until recently, on this bug report, how the stars aligned to create this problem specifically for classic snaps that use a non-core base and also use snapctl on Debian stretch. Since the version of snapd on Debian stretch is so old, there are countless other bugs that we have fixed since snapd 2.21, and the best support scenario is to try to get that version of snapd updated to what we currently support. Now that we understand the unique set of circumstances which caused this bug, we can try to explore other ways to fix it. Indeed, as you suggest, maybe one of those ways is to provide some sort of minimal patch to snapctl against snapd 2.21 in Debian stretch, but I am pretty doubtful that is the best solution unless that patch implements re-exec in snapctl, so that snapctl switches itself over to the most up-to-date version of snapctl on the system. Even just switching snapctl over to talk to the right socket is probably insufficient, and there would inevitably be other bugs with snapctl being used this way.

> I wonder if you could provide a minimal patch for snapd 2.21 please? Maybe it'd be possible to get such a fix into stretch-updates, although I'm not sure if that is still open.

As I said, I'm not sure what this patch would look like. As I describe in [1], I think that with parallel installs enabled we have an opportunity to set up the mount namespace to fix this bug automatically. That fix would be effective for this specific bug because it could be delivered by the snapd or core snap, and Debian does by default re-exec for e.g. `snap run` even in 2.21. So even if snapctl is broken on Debian stretch, as long as snapd continues to re-exec and the user has a new core/snapd snap installed, they would get the fix for this.

[1] By default classic snaps do not enter a separate mount namespace the way that strict snaps always do. If you enable parallel installs (currently experimental), however, we do enter a mount namespace that is very, very close to the host's. This would give us an opportunity to fix this bug once parallel installs are enabled, because we could mount over the host's /usr/bin/snapctl from the appropriate location (privately, so just for the snap's mount namespace and not for the host).