local-provider precise failed to upgrade

Bug #1424777 reported by Curtis Hovey
This bug affects 2 people
Affects           Status        Importance  Assigned to       Milestone
juju-core         Fix Released  Critical    Dimiter Naydenov
  1.22            Fix Released  Critical    Dimiter Naydenov

Bug Description

As of commit 2ba3166, the local-provider upgrade tests fail on precise. We can see that the new agents are downloaded and started, but they never call the state-server. The environment still reports them as 1.21.2, even though the log shows that 1.23 has started.

The last passing commit was 570cda4. The suspect commits are:
commit c555742 Merge pull request #1634 from waigani/ha-bootstrap-env-only …
commit 9141c96 Merge pull request #1598 from waigani/sortout-env-destroy-cmd …
commit f223dd1 Merge pull request #1625 from waigani/clean-envusers-on-destroy …
commit cb8a990 Merge pull request #1648 from axw/tag-s-disktag-volumetag …
commit 9d172de Merge pull request #1640 from axw/state-storage-lifecycle …
commit 2ba3166 Merge pull request #1626 from wallyworld/rootfs-storage-provider …

The logs from several runs show that the upgrades were still in progress when the 10-minute timeout expired. As a hack, the upgrade timeout was extended to 20 minutes, and the test then passed in 18 minutes. Recent changes have increased the time needed for local precise upgrades.

The question is what to do. Does juju need to be faster? Should the timeout be extended? MAAS has a timeout of 30 minutes because it is very slow.

Curtis Hovey (sinzui)
Changed in juju-ci-tools:
status: New → Triaged
importance: Undecided → Critical
Dimiter Naydenov (dimitern) wrote :

I can confirm that even a local deploy from a trusty machine to a precise LXC container is broken in 1.22:

E: Command line option --target-release precise-updates/cloud-tools cloud-utils cloud-image-utils is not understood
2015-02-24 09:50:32,914 - cc_apt_update_upgrade.py[WARNING]: Failed to install packages: ['--target-release precise-updates/cloud-tools cloud-utils cloud-image-utils', 'curl cpu-checker bridge-utils rsyslog-gnutls']
2015-02-24 09:50:32,916 - __init__.py[WARNING]: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/__init__.py", line 117, in run_cc_modules
    cc.handle(name, run_args, freq=freq)
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/__init__.py", line 78, in handle
    [name, self.cfg, self.cloud, cloudinit.log, args])
  File "/usr/lib/python2.7/dist-packages/cloudinit/__init__.py", line 327, in sem_and_run
    func(*args)
  File "/usr/lib/python2.7/dist-packages/cloudinit/CloudConfig/cc_apt_update_upgrade.py", line 126, in handle
    raise errors[0]
CalledProcessError: Command '['apt-get', '--option', 'Dpkg::Options::=--force-confold', '--assume-yes', 'install', '--target-release precise-updates/cloud-tools cloud-utils cloud-image-utils', 'curl cpu-checker bridge-utils rsyslog-gnutls']' returned non-zero exit status 100

2015-02-24 09:50:32,916 - __init__.py[ERROR]: config handling of apt-update-upgrade, None, [] failed

2015-02-24 09:50:32,923 - cloud-init-cfg[ERROR]: errors running cloud_config [config]: ['apt-update-upgrade']
errors running cloud_config [config]: ['apt-update-upgrade']

I'm working on a fix for 1.22 and 1.23. All deploys to precise (LXC or not) are affected, although bootstrapping to precise works fine.

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Dimiter Naydenov (dimitern)
Dimiter Naydenov (dimitern) wrote :

Found a working solution. The root of the problem is that cloud-init on precise (0.6.3) misinterprets apt-get install commands for packages that need --target-release precise-updates/cloud-tools. The changes I made are:
1. Always set apt-get-update to true in the cloud-init userdata when the series is precise (otherwise the cloud-tools archive is not added and no packages are installed).
2. Add "--target-release", "precise-updates/cloud-tools", "<package-name>" as separate entries in the packages section of the generated cloud-init userdata for those packages that need it (e.g. cloud-utils, cloud-image-utils).
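The two changes can be sketched in Go roughly as follows. This is a hypothetical illustration, not the code from the pull requests: `userData`, `addPackages` and `cloudToolsPackages` are made-up names, and the cloud-tools package list holds only the two examples mentioned above. The idea, per the log above, is that each packages entry reaches apt-get as a single argument, so the option and its value must be separate entries.

```go
package main

import (
	"fmt"
	"strings"
)

// cloudToolsPackages: hypothetical set of packages that, on precise, must
// come from the cloud-tools pocket (only the examples named in this report).
var cloudToolsPackages = map[string]bool{
	"cloud-utils":       true,
	"cloud-image-utils": true,
}

// userData is a hypothetical, minimal stand-in for generated cloud-init
// userdata; it is not juju's actual cloudinit type.
type userData struct {
	AptUpdate bool
	Packages  []string
}

// addPackages appends pkgs to ud. For precise it forces apt update on and
// emits "--target-release", "precise-updates/cloud-tools" and the package
// name as three separate entries for cloud-tools packages.
func addPackages(ud *userData, series string, pkgs ...string) {
	if series == "precise" {
		ud.AptUpdate = true
	}
	for _, pkg := range pkgs {
		if series == "precise" && cloudToolsPackages[pkg] {
			ud.Packages = append(ud.Packages,
				"--target-release", "precise-updates/cloud-tools", pkg)
			continue
		}
		ud.Packages = append(ud.Packages, pkg)
	}
}

func main() {
	var ud userData
	addPackages(&ud, "precise", "cloud-utils", "curl", "bridge-utils")
	fmt.Println(ud.AptUpdate)
	fmt.Println(strings.Join(ud.Packages, "\n"))
}
```

For precise this yields the same five entries as the packages list shown later in this thread.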

After doing the above, I managed to bootstrap a trusty local environment and then manually add machines with series: precise, quantal, raring, saucy, trusty, utopic, and vivid. All of them succeeded, although quantal and raring gave warnings about missing archive index files (presumably because they are now past EOL).

I'm continuing to test the same fix with a local environment bootstrapped on precise.

Curtis Hovey (sinzui)
no longer affects: juju-ci-tools
Dimiter Naydenov (dimitern) wrote :

I've successfully tested a precise local environment and found a few more things needed for the fix. Even though precise's cloud-init needs "--target-release" and "precise-updates/cloud-tools" listed as separate "packages" alongside the package that needs cloud-tools, cloudinit/sshinit also had to be modified to convert the package list properly, i.e.

packages:
- --target-release
- precise-updates/cloud-tools
- cloud-utils
- curl
- bridge-utils

becomes:

Installing package: --target-release precise-updates/cloud-tools cloud-utils
Installing package: curl
Installing package: bridge-utils

when passed via sshinit. I also confirmed that the above works with newer cloud-init versions.
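The sshinit conversion amounts to regrouping the flat list so that each "--target-release", release and package triple becomes one install entry. A hypothetical Go sketch (`installCommands` is an illustrative name, not juju's sshinit code):

```go
package main

import "fmt"

// installCommands regroups a flat cloud-init packages list: a
// "--target-release", "<release>", "<package>" triple becomes one entry;
// every other entry stands alone. Hypothetical sketch only.
func installCommands(packages []string) ([]string, error) {
	var out []string
	for i := 0; i < len(packages); i++ {
		if packages[i] == "--target-release" {
			if i+2 >= len(packages) {
				return nil, fmt.Errorf("incomplete --target-release entry at index %d", i)
			}
			out = append(out, fmt.Sprintf("%s %s %s", packages[i], packages[i+1], packages[i+2]))
			i += 2 // skip the release and package we just consumed
			continue
		}
		out = append(out, packages[i])
	}
	return out, nil
}

func main() {
	pkgs := []string{"--target-release", "precise-updates/cloud-tools", "cloud-utils", "curl", "bridge-utils"}
	cmds, err := installCommands(pkgs)
	if err != nil {
		panic(err)
	}
	for _, c := range cmds {
		fmt.Println("Installing package:", c)
	}
}
```

This prints the three "Installing package:" lines above. Joining the triple into one string presumably works over SSH because the install command runs through a shell, which splits it back into separate apt-get arguments. That is an inference, not something stated in this report.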

I'm about to propose the fix for 1.22.

Dimiter Naydenov (dimitern) wrote :

Fix proposed: https://github.com/juju/juju/pull/1670

Just to be on the safe side, I did a final test with MAAS - bootstrapping a trusty node, then adding a precise one. It worked without issues.

Dimiter Naydenov (dimitern) wrote :

The fix for 1.22 has landed, working on testing the port of the same fix for 1.23.

Dimiter Naydenov (dimitern) wrote :

Filed new bug 1425245 to improve the tests for the fix, as suggested in the review.

Dimiter Naydenov (dimitern) wrote :

Fix for 1.23 proposed: https://github.com/juju/juju/pull/1673

I ran exactly the same set of tests as before; all were successful.

Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23 → 1.23-beta1