Victoria -> Wallaby openstack-upgrade fails with "Command '['apt-get'] ' returned non-zero exit status 100." and apt gets into "Try 'apt --fix-broken install'"

Bug #2068109 reported by Aliaksandr Vasiuk
This bug affects 1 person
Affects: OpenStack Nova Compute Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

Environment is:
* Focal-Victoria cloud during upgrade to Focal-Wallaby
* Juju 2.9.45
* All charms were refreshed to their victoria/latest on the 29th of May.
* The juju status --relations output: https://pastebin.ubuntu.com/p/p4SNTnnNpk/
* Nova-compute revision after refresh to wallaby/stable is 726
* The sanitized bundle is here: https://pastebin.canonical.com/p/4fJ3JXrkTR/

The cloud is being upgraded from Victoria to Wallaby. I'm upgrading the units one at a time:
1. Control plane is all on Wallaby, including nova-cloud-controller
2. Set `juju config nova-compute action-managed-upgrade=true`
3. Refreshed `nova-compute` to 'wallaby/stable'
4. Ran `dist-upgrade` on the node
5. Set `juju config nova-compute openstack-origin="cloud:focal-wallaby"`
6. Ran `juju run-action --wait nova-compute/XX pause`
7. Ran `juju run-action --wait nova-compute/XX openstack-upgrade`
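For reference, steps 6 and 7 could be scripted per unit roughly as follows. This is only a sketch: the function names are hypothetical (they are not part of Juju or the charm), the `juju run-action` invocations are copied from the steps above, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def per_unit_upgrade_commands(unit):
    """Return the argv lists for steps 6 and 7 for one unit, e.g. 'nova-compute/2'."""
    return [
        ["juju", "run-action", "--wait", unit, "pause"],
        ["juju", "run-action", "--wait", unit, "openstack-upgrade"],
    ]


def upgrade_unit(unit, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Stops at the first failing command because check=True raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    commands = per_unit_upgrade_commands(unit)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands
```

Calling `upgrade_unit("nova-compute/2")` only prints the two commands; passing `dry_run=False` would actually run them against the current Juju model.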

On two out of three nodes, `openstack-upgrade` failed:
```
$ juju show-action-status 7259
actions:
- action: openstack-upgrade
  completed at: "2024-06-05 06:32:57"
  id: "7259"
  status: failed
  unit: nova-compute/2

$ juju show-action-output 7259
...
  outcome: upgrade failed, see traceback.
  traceback: |
    Traceback (most recent call last):
      File "/var/lib/juju/agents/unit-nova-compute-2/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1445, in do_action_openstack_upgrade
        upgrade_callback(configs=configs)
      File "/var/lib/juju/agents/unit-nova-compute-2/charm/hooks/nova_compute_utils.py", line 778, in do_openstack_upgrade
        apt_upgrade(options=dpkg_opts, fatal=True, dist=True)
      File "/var/lib/juju/agents/unit-nova-compute-2/charm/hooks/charmhelpers/fetch/ubuntu.py", line 399, in apt_upgrade
        _run_apt_command(cmd, fatal)
      File "/var/lib/juju/agents/unit-nova-compute-2/charm/hooks/charmhelpers/fetch/ubuntu.py", line 963, in _run_apt_command
        _run_with_retries(
      File "/var/lib/juju/agents/unit-nova-compute-2/charm/hooks/charmhelpers/fetch/ubuntu.py", line 940, in _run_with_retries
        result = subprocess.check_call(cmd, env=env, **kwargs)
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option', 'Dpkg::Options::=--force-confnew', '--option', 'Dpkg::Options::=--force-confdef', 'dist-upgrade']' returned non-zero exit status 100.
```
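The traceback only shows the Python exception, not what apt itself printed. `apt-get` returns exit status 100 on any error, so the status alone does not say which package failed. A minimal sketch for rerunning the same command on the affected node and keeping its output is below. The argv is copied from the `CalledProcessError` message above; `run_and_capture` is a hypothetical helper, not a charmhelpers function.

```python
import subprocess

# Copied from the CalledProcessError message in the traceback.
DIST_UPGRADE_CMD = [
    "apt-get", "--assume-yes",
    "--option", "Dpkg::Options::=--force-confnew",
    "--option", "Dpkg::Options::=--force-confdef",
    "dist-upgrade",
]


def run_and_capture(cmd):
    """Run cmd and return (exit status, combined stdout and stderr).

    Unlike subprocess.check_call, this does not raise on a non-zero
    exit status, so the error text from apt can be inspected.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout
```

Running `run_and_capture(DIST_UPGRADE_CMD)` as root on a failing node should return the exit status together with the apt messages that explain it. Note that this really performs the `dist-upgrade`.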

The full output of the failed action is: https://pastebin.ubuntu.com/p/z852YxzwVC/

Unit log during the upgrade is: https://pastebin.ubuntu.com/p/YgX7xG6x8g/

The workaround was to complete the upgrade manually:
```
sudo apt install --fix-broken --option Dpkg::Options::=--force-confnew --option Dpkg::Options::=--force-confdef

sudo apt upgrade --option Dpkg::Options::=--force-confnew --option Dpkg::Options::=--force-confdef
```
The hosts work well after that; I cannot detect any obvious issue.

I suspect there are some broken dependencies in the apt repos for the upgrade. To make troubleshooting easier, I collected a nova-wallaby-upgrade-snapshots.zip file. It contains some logs and apt package states for the two nodes where I hit the issue:
* packages-*.txt - contains `dpkg -l`
* upgradable-*.txt - contains `apt list --upgradable`
* sources-*.txt - contains `grep ^ /etc/apt/sources.list /etc/apt/sources.list.d/*`
* apt-history.log - a copy of /var/log/apt/history.log
* *-before-the-upgrade.txt - means I took it before the upgrade, after step 3 of the upgrade scenario at the top of this bug report
* *-during-fix-broken.txt - means I took it right after the `openstack-upgrade` action failed
* *-after-fix-broken.txt - means I took it after `apt install --fix-broken`
* *-after-upgade.txt - means I took it after the manual `apt upgrade`
* fix-broken-output.txt - contains the output of the manual `apt install --fix-broken` run

Aliaksandr Vasiuk (valexby) wrote :

Sorry for my lack of expertise in apt dependency management. I'm also attaching /var/log/apt/eipp.log for both nodes, in case it is useful.
