Comment 11 for bug 1957320

Christian Ehrhardt  (paelzer) wrote :

We have run some tests:
- 1 - nginx -> nginx+fix
- 2 - nginx -> cause issue -> nginx+fix
- 3 - nginx -> cause issue -> nginx (re-install same)
- 4 - install nginx+fix -> cause issue

#1
- works just fine
- there is no upgrade issue with the proposed fix

#2
- fails to upgrade: the issue we caused beforehand makes the service non-startable, and the upgrade triggers a service restart in postinst. Since that restart fails, the upgrade cannot complete.

2b) If we remove the offending socket file, then service restarts and package upgrades work again.

#3
- behaves just like #2, so it really isn't a "new" issue introduced by the fix.
- It is instead a consequence of the original issue

#4 correctly shows that with the fix applied, start, stop and restart work fine -> verified

In theory one could start to think about pre/postinst magic.
That magic would rm the socket that is breaking the (re)start of the service.
But the problem is that the path in question is user-defined: we would need to grow the ability to correctly parse an nginx configuration, only to then remove a file whose path we got out of that config.

This is dangerous at best and should not be a path forward.

Outcomes from here:
- not providing that fix to users -> people will stay affected and without a fix :-/
- providing that fix to users:
  a) not affected -> the package will upgrade and the issue will never occur
  b) already affected and aware (like on this bug) -> the upgrade will trigger the issue, it will be resolved manually once, and then things are good
  c) already affected (and potentially unaware, as it only occurs on restart) -> the restart during the upgrade will trigger the issue and the upgrade will take the webserver down

So this is really tricky: we need to provide a fix to prevent this from affecting people.
But providing the fix might cause people who are affected without knowing it to lose their webserver.

BUT - we are NOT making it worse.
Any update to nginx will trigger this problem IF the user has configured it to use sockets.
So while this update might trigger it once, it would at least prevent it from happening again and again, e.g. on security updates, which are even installed unattended in the background (and would still take the server down).

I sadly cannot think of any great magic that is also safe, as deleting files is never nice.
A potential way could be to do this:
1. simple check through config to get candidates
$ grep -v '^\s*#' /etc/nginx/sites-enabled/* | grep -o 'unix:.*;'
unix:/run/serve-files.socket;
2. to ensure this isn't someone playing tricks, check if this is open by nginx for real
$ lsof -a -p $(systemctl show --property MainPID --value nginx) -U /run/serve-files.socket
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 3709 root 8u unix 0x0000000000000000 0t0 1845694 /run/serve-files.socket type=STREA
3. Then - only for that versioned upgrade - remove that file after stop and before start
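
For discussion, steps 1 and 2 could be sketched as two small shell helpers. This is only an illustration and not proposed maintainer-script code: the function names are made up, the candidate search is narrowed to `listen unix:` directives (an assumption, to skip upstream sockets that nginx does not own), and the lsof call is the one shown above. Nothing is deleted here; step 3 would still need to run the check while nginx is up and do the removal only after the stop.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any nginx package.

# Step 1: print unix socket paths referenced by "listen unix:...;" directives
# in the given config files. Lines that start with a comment are skipped.
# Assumption: include files, inline comments and quoting are NOT handled.
list_socket_candidates() {
    grep -hv '^[[:space:]]*#' "$@" 2>/dev/null \
        | grep -o 'listen[[:space:]]\{1,\}unix:[^;[:space:]]*' \
        | sed 's/^listen[[:space:]]*unix://' \
        | sort -u
}

# Step 2: succeed only if the running nginx main process really holds the
# given socket path open. Must be called while nginx is still running.
socket_held_by_nginx() {
    pid=$(systemctl show --property MainPID --value nginx) || return 1
    # MainPID is 0 when the service is not running.
    [ -n "$pid" ] && [ "$pid" != "0" ] || return 1
    lsof -a -p "$pid" -U "$1" >/dev/null 2>&1
}
```

A caller would loop over the output of `list_socket_candidates /etc/nginx/sites-enabled/*`, keep only the paths for which `socket_held_by_nginx` succeeds, and remember that list for the removal between stop and start.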

@SRU team, with that explained, how do you feel about providing the upgrade:
a) as-is (any update would cause the issue on systems configured for it, so we bite the bullet and get it over with)
b) as-is + magic code (not biting the bullet, but with potential for a new grenade)
c) any other alternative you'd prefer?