autopkgtest-buildvm-ubuntu-cloud fails on ppc64el in cloud-init (AttributeError: 'int' object has no attribute 'index')

Bug #1973628 reported by Heinrich Schuchardt
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
autopkgtest (Ubuntu)
Fix Released
Undecided
Paride Legovini
cloud-init (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

autopkgtest-buildvm-ubuntu-cloud -v --release jammy
fails on ppc64el (Power9).

Ubuntu release Jammy

[ 116.046504] cloud-init[639]: Cloud-init v. 22.1-14-g2e17a0d6-0ubuntu1~22.04.5 running 'init' at Mon, 16 May 2022 16:29:49 +0000. Up 114.73 seconds.
[ 116.586918] cloud-init[639]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++
[ 116.589398] cloud-init[639]: ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
[ 116.593260] cloud-init[639]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 116.597580] cloud-init[639]: ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
[ 116.601717] cloud-init[639]: ci-info: | enp0s0 | True | 10.0.2.15 | 255.255.255.0 | global | 52:54:00:12:34:56 |
[ 116.609763] cloud-init[639]: ci-info: | enp0s0 | True | fec0::5054:ff:fe12:3456/64 | . | site | 52:54:00:12:34:56 |
[ 116.613491] cloud-init[639]: ci-info: | enp0s0 | True | fe80::5054:ff:fe12:3456/64 | . | link | 52:54:00:12:34:56 |
[ 116.621174] cloud-init[639]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ 116.623000] cloud-init[639]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 116.633089] cloud-init[639]: ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
[ 116.634467] cloud-init[639]: ci-info: ++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
[ 116.635502] cloud-init[639]: ci-info: +-------+-------------+----------+-----------------+-----------+-------+
[ 116.645176] cloud-init[639]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
[ 116.646539] cloud-init[639]: ci-info: +-------+-------------+----------+-----------------+-----------+-------+
[ 116.647573] cloud-init[639]: ci-info: | 0 | 0.0.0.0 | 10.0.2.2 | 0.0.0.0 | enp0s0 | UG |
[ 116.653043] cloud-init[639]: ci-info: | 1 | 10.0.2.0 | 0.0.0.0 | 255.255.255.0 | enp0s0 | U |
[ 116.657567] cloud-init[639]: ci-info: | 2 | 10.0.2.2 | 0.0.0.0 | 255.255.255.255 | enp0s0 | UH |
[ 116.658725] cloud-init[639]: ci-info: | 3 | 10.0.2.3 | 0.0.0.0 | 255.255.255.255 | enp0s0 | UH |
[ 116.665697] cloud-init[639]: ci-info: +-------+-------------+----------+-----------------+-----------+-------+
[ 116.669530] cloud-init[639]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[ 116.673547] cloud-init[639]: ci-info: +-------+-------------+---------+-----------+-------+
[ 116.681356] cloud-init[639]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[ 116.685558] cloud-init[639]: ci-info: +-------+-------------+---------+-----------+-------+
[ 116.689486] cloud-init[639]: ci-info: | 1 | fe80::/64 | :: | enp0s0 | U |
[ 116.697139] cloud-init[639]: ci-info: | 2 | fec0::/64 | :: | enp0s0 | Ue |
[ 116.701584] cloud-init[639]: ci-info: | 3 | ::/0 | fe80::2 | enp0s0 | UGe |
[ 116.705025] cloud-init[639]: ci-info: | 5 | local | :: | enp0s0 | U |
[ 116.713140] cloud-init[639]: ci-info: | 6 | local | :: | enp0s0 | U |
[ 116.717551] cloud-init[639]: ci-info: | 7 | multicast | :: | enp0s0 | U |
[ 116.718723] cloud-init[639]: ci-info: +-------+-------------+---------+-----------+-------+
[ 118.289675] cloud-init[639]: 2022-05-16 16:29:52,574 - util.py[WARNING]: failed stage init
[ 118.384942] cloud-init[639]: failed run of stage init
[ 118.388479] cloud-init[639]: ------------------------------------------------------------
[ 118.394363] cloud-init[639]: Traceback (most recent call last):
[ 118.400954] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 761, in status_wrapper
[ 118.402439] cloud-init[639]: ret = functor(name, args)
[ 118.408865] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 433, in main_init
[ 118.410320] cloud-init[639]: init.apply_network_config(bring_up=bring_up_interfaces)
[ 118.413886] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 868, in apply_network_config
[ 118.417500] cloud-init[639]: netcfg, src = self._find_networking_config()
[ 118.422912] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 799, in _find_networking_config
[ 118.429061] cloud-init[639]: self.paths.get_cpath("data"), "upgraded-network"
[ 118.433340] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 171, in paths
[ 118.441310] cloud-init[639]: path_info = self._extract_cfg("paths")
[ 118.444885] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 155, in _extract_cfg
[ 118.449485] cloud-init[639]: self.read_cfg()
[ 118.453739] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 235, in read_cfg
[ 118.461548] cloud-init[639]: self._cfg = self._read_cfg(extra_fns)
[ 118.465517] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 244, in _read_cfg
[ 118.466695] cloud-init[639]: base_cfg=fetch_base_config(),
[ 118.475751] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 1164, in fetch_base_config
[ 118.481629] cloud-init[639]: util.read_conf_with_confd(CLOUD_CONFIG),
[ 118.482734] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1024, in read_conf_with_confd
[ 118.493164] cloud-init[639]: confd_cfg = read_conf_d(confd)
[ 118.494505] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 998, in read_conf_d
[ 118.495545] cloud-init[639]: cfgs.append(read_conf(os.path.join(confd, fn)))
[ 118.498247] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 293, in read_conf
[ 118.509059] cloud-init[639]: return load_yaml(load_file(fname), default={})
[ 118.510386] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 915, in load_yaml
[ 118.511423] cloud-init[639]: converted = safeyaml.load(blob)
[ 118.521131] cloud-init[639]: File "/usr/lib/python3/dist-packages/cloudinit/safeyaml.py", line 31, in load
[ 118.522424] cloud-init[639]: return yaml.load(blob, Loader=_CustomSafeLoader)
[ 118.523685] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 114, in load
[ 118.529002] cloud-init[639]: return loader.get_single_data()
[ 118.533615] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
[ 118.541072] cloud-init[639]: node = self.get_single_node()
[ 118.545062] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/composer.py", line 35, in get_single_node
[ 118.546291] cloud-init[639]: if not self.check_event(StreamEndEvent):
[ 118.553038] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
[ 118.557481] cloud-init[639]: self.current_event = self.state()
[ 118.558578] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/parser.py", line 142, in parse_implicit_document_start
[ 118.565059] cloud-init[639]: if not self.check_token(DirectiveToken, DocumentStartToken,
[ 118.569671] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 116, in check_token
[ 118.570796] cloud-init[639]: self.fetch_more_tokens()
[ 118.577404] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 159, in fetch_more_tokens
[ 118.581493] cloud-init[639]: self.scan_to_next_token()
[ 118.582576] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 780, in scan_to_next_token
[ 118.589024] cloud-init[639]: self.forward()
[ 118.593432] cloud-init[639]: File "/usr/lib/python3/dist-packages/yaml/reader.py", line 105, in forward
[ 118.594554] cloud-init[639]: self.index += 1
[ 118.601418] cloud-init[639]: AttributeError: 'int' object has no attribute 'index'
[ 118.605433] cloud-init[639]: ------------------------------------------------------------
...
[FAILED] Failed to start Execute cloud user/final scripts.
See 'systemctl status cloud-final.service' for details.
[ OK ] Reached target Cloud-init target.
timed out on cloud-init
qemu-system-ppc64le: terminating on signal 15 from pid 10924 (/usr/bin/python3)

Revision history for this message
Chad Smith (chad.smith) wrote :

This trace makes me think something in ppc64el image contains YAML config that our YamlDumper can't understand. I'm curious if we have access to dump of the files in /etc/cloud/cloud.cfg and /etc/cloud/cloud.cfg.d/*.cfg

Revision history for this message
Chad Smith (chad.smith) wrote :

This does not seem to affect x86_64.

In reviewing the #cloud-config user-data provided by autopkgtest-buildvm-ubuntu-cloud it is valid YAML and valid #cloud-config user-data. So, invalid YAML user-data format isn't the problem here. it is something uring reading and processing and that is leading to something trampling on the Reader instance. I have seen both stacktraces failing at line reader.py:105 int' object has no attribute 'index' and line 111 'int' object has no attribute 'column'.

I'm able to reproduce on openstack PPC64LE Jammy VMs and qemu kvm locally emulating ppc64le with the following:

sudo apt install autopkgtest qemu-utils genisoimage qemu-system-ppc
autopkgtest-buildvm-ubuntu-cloud -v --release jammy

It seems to be only an intermittent issue on first boot only for me in qemu kvm launches via virt-manager. Some boots experience it, some don't. I ran cloudinit's safeyaml.load to ensure it's not doing something terrible w.r.t. memory management as it reads YAML and things seem ok there.

I also see this failure occurring at different cloud-init boot stages. It doesn't seem confined to just the 'init' boot stage. Sometimes it happens in init-local, init or modules:config stages as al cloud-config yaml files are loaded.

Invoking the same logical path after initial boot succeeds without error 100% of the time. I can occasionally see OOMs on test ppc64le VMs in question during early boot which could lead to some symptoms due to resource contention.

An example of the OOM on ppc64le is below in cloud-init's modules:config boot stage related to locale generation. Just an indicator that the system under load with general memory contention. Not certain this is the cause of the problem.

This seems intermittent on my test envs.[ 364.196460] cloud-init[1305]: Cloud-init v. 22.2-0ubuntu1~22.04.2 running 'modules:config' at Tue, 28 Jun 2022 16:14:20 +0000. Up 362.41 seconds.
[ 365.296377] cloud-init[1305]: Generating locales (this might take a while)...
[ 380.015993] Out of memory: Killed process 1324 (localedef) total-vm:133120kB, anon-rss:118784kB, file-rss:576kB, shmem-rss:0kB, UID:0 pgtables:45kB oom_score_adj:0
[ 380.436996] cloud-init[1305]: en_US.UTF-8...
Cloud-init 22.2-0ubuntu1~22.04.2 received SIGTERM, exiting...
  Filename: /usr/lib/python3.10/subprocess.py
  Function: _try_wait
  Line number: 1896
    Filename: /usr/lib/python3.10/subprocess.py
    Function: _wait
    Line number: 1938
      Filename: /usr/lib/python3.10/subprocess.py
      Function: wait
      Line number: 1204

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Chad Smith (chad.smith) wrote :

I can confirm I see this problem fairly frequently, but would like to spend a bit more time triaging.

Revision history for this message
Chad Smith (chad.smith) wrote :

Ok OOM issues look to be the cuplrit here and YAML reads failures seem to be just a symptom of memory and resource contention.

The defaults autopkgtest-buildvm-ubuntu-cloud sets are
-s DISK SIZE 20G
--ram-size 512 (Mib)

When launching with 2 GiB of memory I no longer see any failures.

sudo autopkgtest-buildvm-ubuntu-cloud -v --release jammy --ram-size 2048 -s 30G

Maybe it's worth updating RAM defaults from 512Mib to 1024Mib as there are a lot more services running nowadays in default cloud-images and this could be resulting in intermittent memory-related errors and OOM killer interrupts.

Note I also unearthed the fairly recent commit https://salsa.debian.org/ci-team/autopkgtest/commit/e9c24258962ff1e6a528a67cc4b929e877fe401e which added and referenced a ppc64el issue as to the need for this new --ram-size param.

Doesn't appear to be a cloud-init specific bug, but cloud-init does perform YAML tokenizing and processing in early boot is a bit impactful on memory consumption in tight memory conditions eventually losing the OOM killer race.

Changed in cloud-init (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Chad Smith (chad.smith) wrote :

Marking Invalid for cloud-init proper as this seems generally a problem on ppc64le that has a workaround given the parameter defaults.

Revision history for this message
Chad Smith (chad.smith) wrote :

Also given that the comparable --ram-size param from /usr/bin/autopkgtest-virt-qemu also defaults to 1024 instead of 512. I'd suggest we minimally align this default value with the other autopkgtest script tooling.

Paride Legovini (paride)
summary: autopkgtest-buildvm-ubuntu-cloud fails on ppc64el in cloud-init
+ (AttributeError: 'int' object has no attribute 'index')
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in autopkgtest (Ubuntu):
status: New → Confirmed
Revision history for this message
Frank Heimes (fheimes) wrote :

So I'm getting further than before, using for example:
$ sudo autopkgtest-buildvm-ubuntu-cloud -r jammy -v --cloud-image-url http://cloud-images.ubuntu.com/daily/server -s 15G -a ppc64el --ram-size=8192

But the image preparation still fails (I think this is more a follow-on error):

...
Tue, 23 Aug 2022 16:29:10 +0000. Up 524.83 seconds.
[ 527.034556] cloud-init[1270]: 2022-08-23 16:29:12,713 - util.py[WARNING]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_ssh_authkey_fingerprints.py'>) failed

[ 527.260692] cloud-init[1270]: Cloud-init v. 22.2-0ubuntu1~22.04.3 finished at Tue, 23 Aug 2022 16:29:12 +0000. Datasource DataSourceNoCloud [seed=/dev/vdb][dsmode=net]. Up 527.18 seconds
[FAILED] Failed to start Execute cloud user/final scripts.
See 'systemctl status cloud-final.service' for details.
[ OK ] Reached target Cloud-init target.
timed out on cloud-init
qemu-system-ppc64le: terminating on signal 15 from pid 3614 (/usr/bin/python3)

or in a different attempt (where I used '--ram-size=4096'):
...
[ 3532.036566] cloud-init[1393]: Failed to get unit file state for apt-daily-upgrade.timer: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 3557.272658] systemd-journald[332]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 3557.277579] cloud-init[1393]: Failed to retrieve unit state: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 3557.286250] cloud-init[1393]: apt-daily-upgrade.timer is a disabled or a static unit not running, not starting it.
timed out on cloud-init
qemu-system-ppc64le: terminating on signal 15 from pid 3452 (/usr/bin/python3)

Revision history for this message
Paride Legovini (paride) wrote :

Hi Frank, I think these last issues you're seeing are due to LP: #1988527.

Paride Legovini (paride)
Changed in autopkgtest (Ubuntu):
assignee: nobody → Paride Legovini (paride)
Revision history for this message
Paride Legovini (paride) wrote :

I am going to open an upstream MR making autopkgtest-buildvm-ubuntu-cloud default to 2GB of RAM, which is the amount autopkgtest-virt-qemu uses by default.

Revision history for this message
Paride Legovini (paride) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package autopkgtest - 5.34ubuntu2

---------------
autopkgtest (5.34ubuntu2) noble; urgency=medium

  [ Paride Legovini ]
  * d/t/unshare: skip test (exit 77) if simple unshare command fails
  * gitlab-ci: do not include the salsa pipeline

  [ Florent 'Skia' Jacquet ]
  * tools/a-b-lxd: make sure the reboot happens when needed

 -- Paride Legovini <email address hidden> Wed, 17 Apr 2024 23:39:55 +0200

Changed in autopkgtest (Ubuntu):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.