[ppc64el] curtin block-discover fails to generate storage-config for partitions on raids

Bug #1853018 reported by Paride Legovini on 2019-11-18
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
curtin
Medium
Unassigned
subiquity
Undecided
Unassigned

Bug Description

Premise: I don't think this is really specific to ppc64el, but as I'm not sure of what the problem is I'll start with describing what I observe, then let's try to draw the right conclusions.

When trying to perform an automated raid1 installation with the answers.yaml [1] by using virt-install like this:

virt-install --name isotest-raid1 --os-variant ubuntu18.04 --noautoconsole --wait -1 --memory 2048 --disk bus=virtio,cache=unsafe,size=6 --disk bus=virtio,cache=unsafe,size=6 --disk disk3.qcow2,bus=virtio,cache=unsafe,size=6 --disk vol=answers.img,bus=virtio,readonly=on --cdrom /path/to/focal-live-server-ppc64el.iso

the install fails printing the following errors in the console where subiquity is running:

ERROR: hook /usr/share/apport/general-hooks/powerpc.py crashed:
Traceback (most recent call last):
File "/snap/subiquity/1335/usr/lib/python3/dist-packages/apport/report.py", line 198, in _run_hook
symb['add_info'](report, ui)
File "/usr/share/apport/general-hooks/powerpc.py", line 42, in add_info
is_kernel = report['ProblemType'].startswith('Kernel') or 'linux' in report.get('Package')
TypeError: argument of type 'NoneType' is not iterable

which seems to be an error with the call to apport. This is probably by itself a bug. Looking in /var/log/install, there are 3 subiquity log files:

$ du -h subiquity-debug.log.*
608K subiquity-debug.log.2167
56K subiquity-debug.log.2180
20K subiquity-debug.log.4033

The bigger one is likely the one that actually did the installation. Looking at what's in it, the first thing I notice is:

2019-11-17 11:24:36,725 ERROR subiquity:139 Failed to lock auto answers file, proceding without it.
Traceback (most recent call last):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/cmd/tui.py", line 136, in main
    fcntl.flock(opts.answers, fcntl.LOCK_EX | fcntl.LOCK_NB)
BlockingIOError: [Errno 11] Resource temporarily unavailable

so this subiquity seems to have lost the "race" lock the answers.yaml, but then did a lot of things. Strange. At the very end of the log there are two tracebacks, probably the true reason of the failure. The subiquity-debug.log.2167 log file shows very similar tracebacks.

Interesting data points:

 - The very same installation succeeds on amd64;
 - An automated, non-raid installation done on ppc64el
   using virt-install in the same way suceeds;
 - A non-automated install with the same setup on
   ppc64el also succeeds (!).

Attached: the full /var/log of the installer system.

Paride

[1] http://paste.ubuntu.com/p/xvKtK5bY5P/

Related branches

Paride Legovini (legovini) wrote :
Download full text (3.7 KiB)

   Reading package lists...
        E: Failed to fetch
http://us.archive.ubuntu.com/ubuntu/dists/focal/main/binary-ppc64el/Packages
404 Not Found [IP: 91.189.91.23 80]
        E: Some index files failed to download. They have been ignored, or
old ones used instead.

Failure during apt-get update command,

+ /snap/subiquity/1335/usr/bin/python3 -m curtin in-target -- apt-get update

On Mon, Nov 18, 2019 at 10:10 AM Paride Legovini <
<email address hidden>> wrote:

> Public bug reported:
>
> Premise: I don't think this is really specific to ppc64el, but as I'm
> not sure of what the problem is I'll start with describing what I
> observe, then let's try to draw the right conclusions.
>
>
> When trying to perform an automated raid1 installation with the
> answers.yaml [1] by using virt-install like this:
>
> virt-install --name isotest-raid1 --os-variant ubuntu18.04
> --noautoconsole --wait -1 --memory 2048 --disk
> bus=virtio,cache=unsafe,size=6 --disk bus=virtio,cache=unsafe,size=6
> --disk disk3.qcow2,bus=virtio,cache=unsafe,size=6 --disk
> vol=answers.img,bus=virtio,readonly=on --cdrom /path/to/focal-live-
> server-ppc64el.iso
>
> the install fails printing the following errors in the console where
> subiquity is running:
>
>
> ERROR: hook /usr/share/apport/general-hooks/powerpc.py crashed:
> Traceback (most recent call last):
> File
> "/snap/subiquity/1335/usr/lib/python3/dist-packages/apport/report.py", line
> 198, in _run_hook
> symb['add_info'](report, ui)
> File "/usr/share/apport/general-hooks/powerpc.py", line 42, in add_info
> is_kernel = report['ProblemType'].startswith('Kernel') or 'linux' in
> report.get('Package')
> TypeError: argument of type 'NoneType' is not iterable
>
>
> which seems to be an error with the call to apport. This is probably by
> itself a bug. Looking in /var/log/install, there are 3 subiquity log files:
>
> $ du -h subiquity-debug.log.*
> 608K subiquity-debug.log.2167
> 56K subiquity-debug.log.2180
> 20K subiquity-debug.log.4033
>
> The bigger one is likely the one that actually did the installation.
> Looking at what's in it, the first thing I notice is:
>
>
> 2019-11-17 11:24:36,725 ERROR subiquity:139 Failed to lock auto answers
> file, proceding without it.
> Traceback (most recent call last):
> File
> "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/cmd/tui.py",
> line 136, in main
> fcntl.flock(opts.answers, fcntl.LOCK_EX | fcntl.LOCK_NB)
> BlockingIOError: [Errno 11] Resource temporarily unavailable
>
>
> so this subiquity seems to have lost the "race" lock the answers.yaml, but
> then did a lot of things. Strange. At the very end of the log there are two
> tracebacks, probably the true reason of the failure. The
> subiquity-debug.log.2167 log file shows very similar tracebacks.
>
> Interesting data points:
>
> - The very same installation succeeds on amd64;
> - An automated, non-raid installation done on ppc64el
> using virt-install in the same way suceeds;
> - A non-automated install with the same setup on
> ppc64el also succeeds (!).
>
> Attached: the full /var/log of the installer system.
>
> Paride
>
>
> [1] http://paste.ubuntu.com/p/xvKtK5bY5P/
>
> ** A...

Read more...

Right! That was pretty obvious from the curtin logs, less so from the subiquity logs. The problem was that this was on ppc64el and I was using archive.ubuntu.com instead of ports.ubuntu.com.

Switching to the correct mirror allows the installation to complete, *however* the tracebacks are still there, e.g.:

2019-11-18 23:03:54,528 ERROR block-discover:210 load_probe_data failed restricted=False
Traceback (most recent call last):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 207, in _probe_done
    self.model.load_probe_data(probe.result)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1378, in load_probe_data
    self.reset()
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1202, in reset
    config = storage_config.extract_storage_config(self._probe_data)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 1249, in extract_storage_config
    tree = get_config_tree(cfg.get('id'), final_config)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 274, in get_config_tree
    for dep in find_item_dependencies(item, sconfig):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 244, in find_item_dependencies
    _validate_dep_type(item_id, dep_key, dep, config)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 192, in _validate_dep_type
    'Invalid dep_id (%s) not in storage config' % dep_id)
ValueError: Invalid dep_id (raid-md1p1) not in storage config

I can attach the logs of an installation with the correct mirror set up.

It would be to add a "no tracebacks in the installer logs" step in the ISO testing jobs, what do you think?

Paride Legovini (legovini) wrote :

Attached: the installer log of a successful installation, but with several Tracebacks in them.

summary: - [autoinstall] raid1 install fails on ppc64el
+ raid1 install on ppc64el hits Tracebacks

On Tue, Nov 19, 2019 at 7:40 AM Paride Legovini <
<email address hidden>> wrote:

> Attached: the installer log of a successful installation, but with
> several Tracebacks in them.
>
> ** Attachment added: "installer_logs.tar.gz"
>
> https://bugs.launchpad.net/subiquity/+bug/1853018/+attachment/5306471/+files/installer_logs.tar.gz
>
> ** Summary changed:
>
> - [autoinstall] raid1 install fails on ppc64el
> + raid1 install on ppc64el hits Tracebacks
>
>
The second traceback, on 'probe_data missing bcache data' is known, and
there's a branch
for it:

 https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/375463

The first trace is related to partitions on a raid. Let's rename this bug
for this
curtin block-discover fails to generate storage-config for partitions on
raids.

--
> You received this bug notification because you are subscribed to
> subiquity.
> Matching subscriptions: subiquity-bugs
> https://bugs.launchpad.net/bugs/1853018
>
> Title:
> raid1 install on ppc64el hits Tracebacks
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/subiquity/+bug/1853018/+subscriptions
>

Paride Legovini (legovini) wrote :

Thanks Ryan. I renamed the bug, adding a [ppc64el] tag because I'm not getting those tracebacks on amd64.

summary: - raid1 install on ppc64el hits Tracebacks
+ [ppc64el] curtin block-discover fails to generate storage-config for
+ partitions on raids
Paride Legovini (legovini) wrote :

And neither on arm64, now that I tested it.

Ryan Harper (raharper) on 2019-11-20
Changed in curtin:
importance: Undecided → Medium
status: New → In Progress

This bug is fixed with commit 8c4b2252 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=8c4b2252

Changed in curtin:
status: In Progress → Fix Committed
Changed in subiquity:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers