subiquity crash while detecting disks on arm64

Bug #1961628 reported by Alexandre Erwin Ittner
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
subiquity
Fix Committed
Undecided
Unassigned

Bug Description

Got following backtracke while installing in an arm64.

Machine is a 64-core TaiShan 2280, 128 GiB RAM, 8 TB HDD, installing 20.04.4 HWE from an USB stick.
Disk already had a 20.0.4 with GA kernel installed.
Installer booted from USB, started on console then migrated to SSH install.
subiquity crashed before I had a chance to chose the target disk.
Network and internet access available without proxies during the entire test.

SourcePackage: subiquity
Title: Installer UI crashed with AttributeError
Traceback:
 Traceback (most recent call last):
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/client/client.py", line 404, in run
     super().run()
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/tui.py", line 374, in run
     super().run()
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
     raise exc
   File "/snap/subiquity/3095/usr/lib/python3.8/asyncio/events.py", line 81, in _run
     self._context.run(self._callback, *self._args)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 26, in _done
     fut.result()
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 243, in upload
     oops_id = await run_in_thread(_bg_upload)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 46, in run_in_thread
     return await loop.run_in_executor(None, func, *args)
   File "/snap/subiquity/3095/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
     result = self.fn(*self.args, **self.kwargs)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 230, in _bg_upload
     data = bson.BSON().encode(for_upload)
 AttributeError: module 'bson' has no attribute 'BSON'

There was an additional related backtrace while using option "Submit crash log to Canonical", with the same failure on call star (no bson.BSON).

Result of the session where I called the installer from SSH:

ittner@olwe:~$ ssh -opubKeyAuthentication=no installer@10.228.68.107
The authenticity of host '10.228.68.107 (10.228.68.107)' can't be established.
ECDSA key fingerprint is SHA256:OqsC6W875vEAMgIsPY3F/oLdB912N3hNvC/+muckwGw.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.228.68.107' (ECDSA) to the list of known hosts.
installer@10.228.68.107's password:
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-30-generic aarch64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

  System information as of Mon Feb 21 17:47:34 UTC 2022

  System load: 0.1 Processes: 782
  Usage of /home: unknown Users logged in: 0
  Memory usage: 1% IPv4 address for enP10p17s0f1np1: 10.228.68.59
  Swap usage: 0% IPv4 address for enahisic2i0: 10.228.68.107

0 updates can be applied immediately.

Last login: Mon Feb 21 14:29:02 2022 from 10.172.68.66
connecting...
generating crash report
report saved to /var/crash/1645465682.378801107.ui.crash
Traceback (most recent call last):
  File "/snap/subiquity/3095/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/snap/subiquity/3095/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/__main__.py", line 5, in <module>
    sys.exit(main())
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/cmd/tui.py", line 149, in main
    subiquity_interface.run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/client/client.py", line 404, in run
    super().run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/tui.py", line 374, in run
    super().run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
    raise exc
  File "/snap/subiquity/3095/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 26, in _done
    fut.result()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 243, in upload
    oops_id = await run_in_thread(_bg_upload)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 46, in run_in_thread
    return await loop.run_in_executor(None, func, *args)
  File "/snap/subiquity/3095/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 230, in _bg_upload
    data = bson.BSON().encode(for_upload)
AttributeError: module 'bson' has no attribute 'BSON'
Connection to 10.228.68.107 closed.

Tags: iso-testing
Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :
description: updated
Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :
description: updated
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Hm that looks like the snap is busted somehow :(

Revision history for this message
Ike Panhc (ikepanhc) wrote :

Tried arm64 ISO for 20.04.4 ISO on Hisilicon d05 and failed too. Here is the python traceback.

report saved to /var/crash/1645517951.896736860.ui.crash
Traceback (most recent call last):
  File "/snap/subiquity/3095/usr/bin/subiquity", line 8, in <module>
    sys.exit(main())
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/cmd/tui.py", line 149, in main
    subiquity_interface.run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/client/client.py", line 404, in run
    super().run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/tui.py", line 374, in run
    super().run()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
    raise exc
  File "/snap/subiquity/3095/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 26, in _done
    fut.result()
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 243, in upload
    oops_id = await run_in_thread(_bg_upload)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/async_helpers.py", line 46, in run_in_thread
    return await loop.run_in_executor(None, func, *args)
  File "/snap/subiquity/3095/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/common/errorreport.py", line 230, in _bg_upload
    data = bson.BSON().encode(for_upload)
AttributeError: module 'bson' has no attribute 'BSON'

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So this crash is trying to upload an error report. I don't know why it's failing like this but I'd also like to see the error report it's trying to upload -- can you dig it out of /var/crash? It should be saved to the usb stick too.

Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1961628

tags: added: iso-testing
Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

Just adding the full logs for this bug. As by the time I had already started testing another isos, this come from a new deployment of the same ISO (20220218.1, SHA1 3b8bf2b51ea933c600f57bdd6e37abcf54fcd2bf) in the same machine.

The disk detection error seems the one from the following stack trace, while the one in the original message is the reporting script failing. Should we split the bug?

 2022-02-22 17:58:16,505 ERROR root:39 finish: subiquity/Filesystem/_probe/probe_once: FAIL: cancelled
 2022-02-22 17:58:16,505 ERROR root:39 finish: subiquity/Filesystem/_probe: FAIL: cancelled
 2022-02-22 17:58:18,509 INFO curtin:1322 Validating extracted storage config components
 2022-02-22 17:58:18,585 ERROR root:39 finish: subiquity/Filesystem/_probe/probe_once: FAIL: Invalid dep_id (disk-dm-0) not in storage config
 2022-02-22 17:58:18,588 ERROR block-discover:433 block probing failed restricted=False
 Traceback (most recent call last):
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/server/controllers/filesystem.py", line 427, in _probe
     await asyncio.wait_for(self._probe_once_task.task, 15.0)
   File "/snap/subiquity/3095/usr/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
     return fut.result()
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquitycore/context.py", line 148, in decorated_async
     return await meth(self, **kw)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/server/controllers/filesystem.py", line 412, in _probe_once
     self.model.load_probe_data(storage)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 1287, in load_probe_data
     self.reset()
   File "/snap/subiquity/3095/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 965, in reset
     self._orig_config = storage_config.extract_storage_config(
   File "/snap/subiquity/3095/lib/python3.8/site-packages/curtin/storage_config.py", line 1346, in extract_storage_config
     tree = get_config_tree(cfg.get('id'), final_config)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/curtin/storage_config.py", line 306, in get_config_tree
     for dep in find_item_dependencies(item, sconfig):
   File "/snap/subiquity/3095/lib/python3.8/site-packages/curtin/storage_config.py", line 276, in find_item_dependencies
     _validate_dep_type(item_id, dep_key, dep, config)
   File "/snap/subiquity/3095/lib/python3.8/site-packages/curtin/storage_config.py", line 223, in _validate_dep_type
     raise ValueError(
 ValueError: Invalid dep_id (disk-dm-0) not in storage config
 2022-02-22 17:58:18,600 INFO subiquitycore.common.errorreport:406 saving crash report 'block probing crashed with ValueError' to /var/crash/1645552698.598547697.block_probe_fail.crash
 2022-02-22 17:58:18,602 INFO root:39 start: subiquity/ErrorReporter/1645552698.598547697.block_probe_fail/add_info:
InterpreterPath: /snap/subiquity/3095/usr/bin/python3.8

Full logs attached.

Dan Bungert (dbungert)
Changed in subiquity:
status: New → In Progress
Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

Another data-point: it seems to only happen if prober detects another install on the disk. I deleted all partitions, tested again and it passed.
This also explains why my first attempt with GA worked; it's not related with GA/HWE differences at all.

It seems LP #1961829 and LP #1961640 are all related, independent of any differences in disk types.

Revision history for this message
Dan Bungert (dbungert) wrote :

@Alexandre - your analysis matches ours. We have os-prober running and I think it's causing some issues. We've backed out os-prober usage for now as it is not required.

Are you able to retest with a kernel option on the command line? If you pass subiquity-channel=edge that should pick up our proposed fix. Or if you can get to a command line in the install environment you can try `sudo snap refresh subiquity --edge`.

Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

@dbungert: tried with subiquity 22.02.1+git5.bd4da8b0 (3108); installer crashed several times, allowing me to advance the install until a random point every time. Then it crashed for real and prevented any progress.

I'm attaching everything I got from /var/log after a few crashes.

Now I noticed that 22.02.1+git7.b67c0b7d is now available. I'm currently testing again.

(accidentally posted this in LP #1961640 too...)

Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

And now the logs from subiquity 22.02.1+git7.b67c0b7d (3116). It has the same behavior.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think the issue with testing this way is that the initial version of the snap still runs os-prober and that messes you up. We need to get you an iso with the right version of the snap on it to start with.

Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

Just finished testing 20220223 (sha1sum d0c92501647921086a1f81c6b4594a4e65baf1ec). The fix worked!
Could not test the code to submit crash reports, however.

Dan Bungert (dbungert)
Changed in subiquity:
status: In Progress → Fix Committed
Revision history for this message
Dan Bungert (dbungert) wrote :

Thanks Alexandre.

The fix to allow the point release to move forward was to not have Subiquity use os-prober temporarily. We have addressed that and are now using a newer version of os-prober which should not have the device mapper problems seen in this LP. When you get a moment, can you re-verify?

Revision history for this message
Taihsiang Ho (tai271828) wrote :

This issue seems related, but I did not reproduce it anymore with the released 20.04.4 https://bugs.launchpad.net/subiquity/+bug/1961829

Revision history for this message
Patricia Domingues (patriciasd) wrote :

Alexandre, Tai, did you test it with subiquity from edge `snap refresh subiquity --edge`?

Revision history for this message
Alexandre Erwin Ittner (aittner) wrote :

@Patricia, @Tai: Tested with a version from a few days ago and seems Ok, but I will redo the test with the newer versions.

@dbungert: There will be a "testing iso" that enables os-prober from the start, the same way it did before being disabled? By now it seems to only load from the refreshed snap. I'm not sure if this can have some undesired effect when the snap is replaced mid-install...

Revision history for this message
Taihsiang Ho (tai271828) wrote :

@Patricia, thanks for the heads up. No, I simply re-tested with the released iso in my comment#15.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.