Trying subiquity from edge ended in UI crash

Bug #1978139 reported by Frank Heimes
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Undecided
Skipper Bug Screeners
subiquity
Fix Released
Undecided
Unassigned

Bug Description

While going to try the subiquity version from edge on jammy (in preparation for testing LP#1972834) the installer crashed (repeatedly) right after the installer restarted (after the snap refresh).

It drops to a shell and prints:

Installer shell session activated.

This shell session is running inside the installer environment. You
will be returned to the installer when this shell is exited, for
example by typing Control-D or 'exit'.

Be aware that this is an ephemeral environment. Changes to this
environment will not survive a reboot. If the install has started, the
installed system will be mounted at /target.
root@ubuntu-server:/# sudo snap refresh --edge subiquity
subiquity (edge) 22.02.2+git353.a9ac5a5f from Canonical* refreshed
root@ubuntu-server:/# connecting...
generating crash report
report saved to /var/crash/1654790818.540452719.ui.crash
Traceback (most recent call last):
  File "/snap/subiquity/3477/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/snap/subiquity/3477/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/__main__.py", line 5, in <module>
    sys.exit(main())
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/cmd/tui.py", line 149, in main
    subiquity_interface.run()
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/client/client.py", line 405, in run
    super().run()
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 374, in run
    super().run()
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
    raise exc
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/client/client.py", line 481, in move_screen
    await super().move_screen(increment, coro)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 237, in move_screen
    view = await self.wait_with_progress(
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 206, in wait_with_progress
    return await self._wait_with_indication(awaitable, self.show_progress)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 163, in _wait_with_indication
    result = await awaitable
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 226, in _move_screen
    return await self.make_view_for_controller(new)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/client/client.py", line 502, in make_view_for_controller
    view = await super().make_view_for_controller(new)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquitycore/tui.py", line 132, in make_view_for_controller
    view = await maybe_view
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/client/controllers/source.py", line 29, in make_ui
    sources = await self.endpoint.GET()
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/common/api/client.py", line 48, in impl
    return serializer.deserialize(r_ann, await resp.json())
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/common/serialize.py", line 215, in deserialize
    return self._deserialize_attr(annotation, value, metadata, path)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/common/serialize.py", line 205, in _deserialize_attr
    fields[key], value[key], path))
KeyError: 'search_drivers'

And even drops from the ssh session.

I can reconnect to the installer (via ssh) and immediately get:

▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  Willkommen! Bienvenue! Welcome! Добро пожаловать! Welkom! [ Help ]
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
  Use UP, DOWN and ENTER keys to select your language.

   ┌────────────────────────────────────────────────────────────────────────┐
   │ │
   │ Sorry, the installer has restarted because of an error. │
   │ │
   │ [ View full report ] │
   │ │
   │ If you want to help improve the installer, you can send an error │
   │ report. │
   │ │
   │ [ Send to Canonical ] │
   │ │
   │ Select continue to try the installation again. │
   │ │
   │ [ Continue ] │
   │ │
   └────────────────────────────────────────────────────────────────────────┘

At that stage I changed to the shell and saved the content of /var/log and /var/crash
that is attached here...

(and at the subiquity shell I also faced some refresh issues, not sure if that is related)

This happened on an s390x z/VM installation, but I'm unsure if that's is an arch. specific issue.

Revision history for this message
Frank Heimes (fheimes) wrote :
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
description: updated
Olivier Gayot (ogayot)
Changed in subiquity:
assignee: nobody → Olivier Gayot (ogayot)
Revision history for this message
Olivier Gayot (ogayot) wrote (last edit ):

Hi Frank,

I'm very interested in the refresh issues you had. Would you please provide some details?
Looking at the traceback, there seems to be something mixing up the old and refreshed snap:

Traceback (most recent call last):
  File "/snap/subiquity/3477/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/snap/subiquity/3477/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/snap/subiquity/3360/lib/python3.8/site-packages/subiquity/__main__.py", line 5, in <module>
    sys.exit(main())

/snap/subiquity/3360 is the 22.04.2 snap (the one built in jammy isos today) ; while
/snap/subiquity/3477 is the refreshed / edge snap.

The search_drivers key is not present in 3360 but should be in 3477 so it's likely related.

Thanks,
Olivier

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Olivier,
sure - but what else can I share?!
I just kicked-off a regular 22.04 install (using the GA image) on a z/VM guest
choose 'English' as language
and right at the next screen an update of the installer was indicated "Installer update available"
but I did NOT accepted it, since I want to update to edge (to test LP#1972834),
so I opened a shell and did:
sudo snap refresh --edge subiquity
The refresh/update went fine and the installer restarted,
but shortly after that the crash happened.

I guess that's all what I can say about it so far ...

(If it's helpful, I could re-create the situation and give you access to the system.)

description: updated
Olivier Gayot (ogayot)
tags: added: fr-2458
Revision history for this message
Olivier Gayot (ogayot) wrote :

Hi Frank,

After trying a few things, I've been able to reproduce the crash. Whether the snap refresh occurred through a manual refresh or through subiquity screen does not seem to make a difference.

I believe that you can work around the issue and move forward with your testing by connecting again to the installer through SSH _after_ the refresh is finished (or use the client on /dev/tty1 is that is an option for you).
Please let me know otherwise, I can probably help find another way to do the testing.

Details and investigation:
-------------------------

It seems that after the snap refresh, the clients that were connected through SSH (or maybe other means?) don't reload properly.

In my example:
3447 is the version of the snap after refresh
3314 is the version of the snap before refresh

$ pgrep -a python3
2205 /snap/subiquity/3447/usr/bin/python3.8 /snap/subiquity/3447/usr/bin/subiquity
2221 /snap/subiquity/3447/usr/bin/python3.8 -m subiquity.cmd.server
2242 /snap/subiquity/3447/usr/bin/python3.8 -m subiquity
2431 /snap/subiquity/3447/usr/bin/python3.8 -m subiquity

PID 2205 is the client running on /dev/tty/1
PID 2221 is the server (forked from the client running on /dev/tty/1 - I think)
PID 2242 is a client that has been connected through SSH since _before_ the refresh
PID 2431 is a client that connected through SSH _after_ the refresh

Looking at the environment for each client we have:

$ strings /proc/2205/environ | grep -e 3447 -e 3314
PYTHON=/snap/subiquity/3447/usr/bin/python3.8
SNAP_USER_DATA=/root/snap/subiquity/3447
SUBIQUITY_ROOT=/snap/subiquity/3447
SNAP_REVISION=3447
SNAP_DATA=/var/snap/subiquity/3447
PWD=/var/snap/subiquity/3447
SNAP=/snap/subiquity/3447
PYTHONPATH=/snap/subiquity/3447/lib/python3.8/site-packages

$ strings /proc/2242/environ | grep -e 3447 -e 3314
PYTHON=/snap/subiquity/3447/usr/bin/python3.8
SNAP_USER_DATA=/root/snap/subiquity/3447
SUBIQUITY_ROOT=/snap/subiquity/3447
SNAP_REVISION=3447
PATH=/snap/subiquity/3314/bin:/snap/subiquity/3314/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
APPORT_DATA_DIR=/snap/subiquity/3314/share/apport
SNAP_DATA=/var/snap/subiquity/3447
PYTHONPATH=:/snap/subiquity/3314/lib/python3.8/site-packages:/snap/subiquity/3447/lib/python3.8/site-packages
SNAP=/snap/subiquity/3447

$ strings /proc/2431/environ | grep -e 3447 -e 3314
PYTHON=/snap/subiquity/3447/usr/bin/python3.8
SNAP_USER_DATA=/root/snap/subiquity/3447
SUBIQUITY_ROOT=/snap/subiquity/3447
SNAP_REVISION=3447
SNAP_DATA=/var/snap/subiquity/3447
SNAP=/snap/subiquity/3447
PYTHONPATH=:/snap/subiquity/3447/lib/python3.8/site-packages

As shown above, 2242 ends up with a broken environment, making it mix up things from the old and new snap.

Thanks,
Olivier

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Olivier - many thanks for your investigations.
Well, I cannot do a proper testing of LP#1972834 by myself (since we don't have NVMe devices in your system yet), hence I also wanted to write down some precise steps to follow for the initial bug reporter of LP#1972834 to follow and test.

So I'm not sure if it's a good way to ask him/her to test LP#1972834, but while doing the refresh he needs to follow a certain workaround for an issue with the refresh itself.

I personally think that it's better to have this fixed first, before proceeding with LP#1972834 (and asking our partner for testing), no?!

Revision history for this message
Olivier Gayot (ogayot) wrote :

Hi Frank.

I see your point, and I agree with you.
Maybe for the sake of testing, we could build an ISO that already includes the snap from edge. That would eliminate the need to do a refresh, and therefore should be a robust way to avoid refresh issues :)

I believe the issue might have been caused by https://bugs.launchpad.net/subiquity/+bug/1874034 ; and so has potentially been present for a while. I don't know yet how or how long it will take to fix it ; but I will continue looking.

Thanks,
Olivier

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Olivier,
okay, that could be an option (but would cause extra effort on your / the-installer team side).
IF you guys think that the fix in https://github.com/canonical/probert/pull/117 is okay (and doesn't need some upfront testing), it could just be incl. in the next subiquity update, that will then land in one of the next ISOs and it can be tested from there.
This would allow us to overcome the workaround needed for the refresh, but we would pay for that with some more time (which I think is fine).
(But at the end would leave the issue with the refresh open.)

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Hmm I think this is similar to the issue fixed by https://github.com/canonical/subiquity/pull/1300 (and so similar to bug 1975629). The client restarts itself when it detects that a snap refresh has happened but it restarts into an environment that is appropriate for the old snap. I think SubiquityClient.restart should pass orig_environ() to os.execvpe.

But of course this bug is in the old version, so this doesn't help upgrades from the version in jammy. I think we could be less, uh, accommodating, in our environment variable setting during subiquity startup: we really ought to know what PYTHONPATH should be for our processes (when in a snap), so we perhaps we should just set PYTHONPATH to what we want, rather than simply appending things to the pre-existing values. This probably requires some delicate thinking but might make things simpler overall.

Revision history for this message
Dan Bungert (dbungert) wrote :
Changed in subiquity:
status: New → Fix Committed
assignee: Olivier Gayot (ogayot) → nobody
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

I think that can be closed meanwhile!
Dan/Olivier, please shout if you object ...

Changed in subiquity:
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments