autoinstall crash with python3[1872]: segfault

Bug #1905296 reported by eoli3n
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
subiquity
New
Undecided
Unassigned

Bug Description

I want to autoinstall ubuntu server 20.04 with new cloud-config tool.
Iso is extracted and served as nfsroot.

When i run autoinstall with qemu, autoinstall crash with:

2020-11-23 15:10:07,223 DEBUG subiquitycore.utils:83 arun_command ['systemd-cat', '--level-prefix=false', '--identifier=curtin_log.5282', '/snap/subiquity/1966/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install'] exited with code 3
 2020-11-23 15:10:07,224 DEBUG subiquity.lockfile:44 unlocking exclusive /run/subiquity/installing
 2020-11-23 15:10:07,224 ERROR root:39 finish: subiquity/InstallProgress/install/curtin_install: FAIL: Command '['systemd-cat', '--level-prefix=false', '--identifier=curtin_log.5282', '/snap/subiquity/1966/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install']' returned non-zero exit status 3.
 2020-11-23 15:10:07,224 ERROR subiquitycore.controller.installprogress:145 curtin_error
 Traceback (most recent call last):
   File "/snap/subiquity/1966/lib/python3.6/site-packages/subiquity/controllers/installprogress.py", line 304, in install
     await self.curtin_install(context=context)
   File "/snap/subiquity/1966/lib/python3.6/site-packages/subiquitycore/context.py", line 142, in decorated_async
     return await meth(self, **kw)
   File "/snap/subiquity/1966/lib/python3.6/site-packages/subiquity/controllers/installprogress.py", line 281, in curtin_install
     self.logged_command(curtin_cmd), check=True)
   File "/snap/subiquity/1966/lib/python3.6/site-packages/subiquitycore/utils.py", line 85, in arun_command
     raise subprocess.CalledProcessError(proc.returncode, cmd)
 subprocess.CalledProcessError: Command '['systemd-cat', '--level-prefix=false', '--identifier=curtin_log.5282', '/snap/subiquity/1966/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install']' returned non-zero exit status 3.
 2020-11-23 15:10:07,227 DEBUG subiquity.core:470 generating crash report
 2020-11-23 15:10:07,228 INFO subiquity.core:487 saving crash report 'install failed crashed with CalledProcessError' to /var/crash/1606144207.227556705.install_fail.crash
 2020-11-23 15:10:07,231 DEBUG subiquity.core:511 show_error_report '1606144207.227556705.install_fail'
 2020-11-23 15:10:07,231 ERROR root:39 finish: subiquity/InstallProgress/install: FAIL: Command '['systemd-cat', '--level-prefix=false', '--identifier=curtin_log.5282', '/snap/subiquity/1966/usr/bin/python3', '-m', 'curtin', '--showtrace', '-c', '/var/log/installer/subiquity-curtin-install.conf', 'install']' returned non-zero exit status 3.
 2020-11-23 15:10:07,232 DEBUG root:39 start: subiquity/Error/1606144207.227556705.install_fail/add_info:

Config is fully automated and reproductible: https://github.com/eoli3n/vagrant-pxe/tree/dev
cmdline, and autoinstall configs are here: https://github.com/eoli3n/vagrant-pxe/tree/dev/roles/ubuntu/files
Full crash file attached

Revision history for this message
eoli3n (eoli3neoli3n) wrote :
Revision history for this message
eoli3n (eoli3neoli3n) wrote :
Revision history for this message
eoli3n (eoli3neoli3n) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

Deep in the crash file, curtin install went fine, the post-install apt update failed with:

Command: ['/snap/bin/subiquity.subiquity-configure-apt', '/snap/subiquity/1966/usr/bin/python3', 'true']
...
Fetched 17.6 MB in 5s (3820 kB/s)
 Reading package lists...
 E: Failed to fetch file:/cdrom/dists/focal/main/binary-amd64/Packages File not found - /cdrom/dists/focal/main/binary-amd64/Packages (2: No such file or directory)
 E: Some index files failed to download. They have been ignored, or old ones used instead.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

Yes, saw that too. How do I tell the installer to fetch from nfsroot ?
At this point, /boot/efi/ is empty, so install is not completely done.
Sometimes, installer loops endlessly.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

/target/boot/efi*

Revision history for this message
eoli3n (eoli3neoli3n) wrote :
Download full text (6.5 KiB)

In the crash file, some files downloads well from cdrom. nfseroot is correctly mounted at /cdrom
and then bind mounted to /target/cdrom

When installer is booted, mount | grep cdrom gives

192.168.0.254:/var/www/focal on /cdrom type nfs (ro,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,nolock,proto=tcp,port=2049,timeo=600,retrans=3,sec=sys,local_lock=all,addr=192.168.0.254)

Here the crash file part

+ mount -t overlay overlay -o lowerdir=/target/etc/apt,workdir=/tmp/tmp.qNfhyoiBak/work,upperdir=/tmp/tmp.qNfhyoiBak/upper /target/etc/apt
 + mkdir -p /target/cdrom
 + mount --bind /cdrom /target/cdrom
 + '[' true = true ']'
 + mv /target/etc/apt/sources.list /target/etc/apt/sources.list.d/original.list
 + cat
 ++ lsb_release -sc
 + /snap/subiquity/1966/usr/bin/python3 -m curtin in-target -- apt-get update
 start: cmd-install/stage-curthooks/001-configure-apt/cmd-in-target: curtin command in-target
 Running command ['mount', '--bind', '/dev', '/target/dev'] with allowed return codes [0] (capture=False)
 Running command ['mount', '--bind', '/proc', '/target/proc'] with allowed return codes [0] (capture=False)
 Running command ['mount', '--bind', '/sys', '/target/sys'] with allowed return codes [0] (capture=False)
 Running command ['mount', '--bind', '/sys/firmware/efi/efivars', '/target/sys/firmware/efi/efivars'] with allowed return codes [0] (capture=False)
 Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
 Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/target', 'apt-get', 'update'] with allowed return codes [0] (capture=False)
 Get:1 file:/cdrom focal InRelease
 Ign:1 file:/cdrom focal InRelease
 Get:2 file:/cdrom focal Release [1486 B]
 Get:2 file:/cdrom focal Release [1486 B]
 Get:3 file:/cdrom focal Release.gpg [833 B]
 Get:3 file:/cdrom focal Release.gpg [833 B]
 Hit:4 http://fr.archive.ubuntu.com/ubuntu focal InRelease
 Get:5 http://fr.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
 Get:6 http://fr.archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
 Get:7 http://fr.archive.ubuntu.com/ubuntu focal-security InRelease [109 kB]
 Get:8 file:/cdrom focal/main amd64 Packages [18.7 kB]
 Ign:8 file:/cdrom focal/main amd64 Packages
 Get:8 file:/cdrom focal/main amd64 Packages [18.7 kB]
 Ign:8 file:/cdrom focal/main amd64 Packages
 Get:8 file:/cdrom focal/main amd64 Packages [65.1 kB]
 Ign:8 file:/cdrom focal/main amd64 Packages
 Get:8 file:/cdrom focal/main amd64 Packages [65.1 kB]
 Err:8 file:/cdrom focal/main amd64 Packages
   File not found - /cdrom/dists/focal/main/binary-amd64/Packages (2: No such file or directory)
 Get:9 http://fr.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [22.0 kB]
 Get:10 http://fr.archive.ubuntu.com/ubuntu focal/restricted Translation-en [6212 B]
 Get:11 http://fr.archive.ubuntu.com/ubuntu focal/restricted amd64 c-n-f Metadata [392 B]
 Get:12 http://fr.archive.ubuntu.com/ubuntu focal/universe amd64 Packages [8628 kB]
 Get:13 http://fr.archive.ubuntu.com/ubuntu focal/universe Translation-en [5124 kB]
 Get:14 http://fr.archive.ubuntu.com/ubuntu focal/universe amd64 c-n-f Metadata [265 kB]
 Get:15 http://fr.archive.ubuntu.com/u...

Read more...

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

I just reproduced, installer crashed then i check:
the file /cdrom/dists/focal/main/binary-amd64/Packages doesn't exist
the file /cdrom/dists/focal/main/binary-amd64/Packages.gz does.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

The Realease file contains
 44fa689d816504f1c9ba27b08d46f25a4897a4f21ced7921f5a8530e32ce0b60 95 main/binary-amd64/Release
 16845fa97086e82e6a77c54c0cc5cbfbcc0dd6498639c53e3fda35238bb6e5ad 18748 main/binary-amd64/Packages.gz
 b933d017be7d0dd1866f1c5fd50eb4f8bc530d5ffc3be29ba1bdb39448d8b32f 65067 main/binary-amd64/Packages

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

Some boots, installer enter in a endless loop, can't find what triggers this.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

When i run after the crash
chroot /target apt -o Debug::Acquire::cdrom=true update

See attached screenshot
When i paste and redirect all output with &> , the "_apt Permission denied" one doesn't appears

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

the _apt thing seems to be a simple warning over a file which even doesn't exist.
On the server if i notify .gz file, client does access it before crashing

root@pxe-server:/tftpboot# inotifywait /exports/focal/dists/focal/main/binary-amd64/Packages.gz
Setting up watches.
Watches established.
/exports/focal/dists/focal/main/binary-amd64/Packages.gz ACCESS

Revision history for this message
TJ (tj) wrote :

Is apt-update in the PID namespace running as user _apt or user root? Your read tests (that we've been doing on IRC #ubuntu-server) were as root user and were successful.

As this is over an NFS share user IDs, mappings, and file-system permissions may enter into this, although as we see inotify report an ACCESS that argues against a simple NFS issue here.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So it does seem that the Release files in the pool reference both Packages and Packages.gz but only the Packages.gz is actually there. Most of the time apt downloads the Packages.gz and ignores the Packagesm which is why this hasn't been noticed until now. For some reason your apt is only downloading the Packages files from all sources -- even the network sources:

 Get:9 http://fr.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [22.0 kB]

I have no idea why it would be doing that. So, we should fix the pool, but there is something else going on here.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Oh I think I misread the logs there, it's actually downloading the Packages.xz file (judging by file sizes) but not reporting this in the logs. But still confused otherwise.

Revision history for this message
Julian Andres Klode (juliank) wrote :

To identify what is going on pass

-o debug::acquire::cdrom=1 (maybe s/cdrom/http/ too)
-o debug::PkgAcquire::worker=1

Looking at the CD log, apt tried to find the compressed files, but they were not available or accessible so it tried the uncompressed one and gave up when it couldn't find them.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

I already gave -o debug::acquire::cdrom=1 (maybe s/cdrom/http/ too)
Please see upper.

Revision history for this message
TJ (tj) wrote :

Cause of this is user error.

The NFS export on the server did not allow non-root (UID 0) to traverse or read the exported file-system.

Fix on server is:

sudo find /exports/focal/dists /exports/focal/pool -type d -execdir chmod go+x {} \;
sudo find /exports/focal/dists /exports/focal/pool -type f -execdir chmod go+r {} \;

Revision history for this message
TJ (tj) wrote :

I forgot to add; the hypothesis is apt dropped to using the _apt user on the client and therefore could no longer read the /exports/focal/ sub-directories or files.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

Good catch ! But is there any fix nfs-server side ?
Because maybe some other users will drop into this

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

I mean in export config ? To be able to just extract the iso without tweaking anything else.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

The fix is not working :

vagrant ssh server -c "sudo find /exports/focal/dists /exports/focal/pool -type d -execdir chmod go+x {} \; && sudo find /exports/focal/dists /exports/focal/pool -type f -execdir chmod go+r {} \;"

I get the same behaviour when booting client.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

chmod og+x /exports/focal fixed the fix :)

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So I think there are a few things going on here.

One is that to make nfsroot work you need to allow non-root to connect. We could probably arrange things for this not to be true (by preventing apt from setuiding to _apt?) or maybe it should just be documented.

Another is that we have Release files for the pool that reference uncompressed indices but not the uncompressed indices themselves.

Yet another is that the failure mode was so very obscure. I want to do better, upfront, validation of the supplied apt config anyway, which would help some failures that are a bit like this but TBH I would probably have assumed that the local config was OK and not validated that... Even when you've read enough logs to know that it's "apt update" that is failing, the way apt fails is very confusing, although if we fixed the pool stuff it might have been a bit less confusing.

Finally, why on earth does creating the error report after the install has failed make python segfault??

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

"Allow non-root to connect"
As root mounts, there is actually no non-root connection but only non-root access on the nfs mount.
So unless perms are fixed with chmod a+rw * on isos, i would need to apply it myself.

You can continue to work on this with vagrant-pxe, I merged dev branch into master, lets use master.

Revision history for this message
eoli3n (eoli3neoli3n) wrote :

(by non-root connection I mean "mount")

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.