Upgrade path calculation incorrect

Bug #1412698 reported by Michael Vogt
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Snappy
Fix Released
Critical
Barry Warsaw
Ubuntu system image
Fix Released
High
Barry Warsaw

Bug Description

The upgrade path calculation is incorrect (as reported by asac). I can reproduce this too:
- create image
- force upgrade to ensure both sys-a,b are populated
- set version of current image to version -=1 (e.g. 244 instead of 245 if 245 is latest)
- set version of current image to version -= 2
- run "system-image-dbus -v"
- run "sudo snappy update"
- watch the log message of system-image-dbus show only a single version to upgrade instead of two (also downlods only a single delta instead of two)

Asac debugged this last night and we suspect the bug consists of two parts:
- partition.py does not point to the correct other partition system-image file due to a bug
- system-image caches the update calculcation and therefore does not recalculcate on reloadconfiguration

I attach two proof of concept patches that outline the fix. Both need work.

Tags: client

Related branches

Revision history for this message
Michael Vogt (mvo) wrote :

This changes the code to use the actual mount_target of the other parition. The call parition.get_mount_target() gives a path that is not actually used in the partition.mount_other_rootfs() earlier. So the following code that checks if other_config exists will never see it as this mount_point is empty.

Changed in snappy-ubuntu:
importance: Undecided → Critical
Revision history for this message
Michael Vogt (mvo) wrote :

This patch changes ReloadConfiguration to also flush the "self._update state.

For some unknown reason I had to self._checking.release() first or it would hang. This needs to be understood first before the patch can be applied.

Revision history for this message
Barry Warsaw (barry) wrote : Re: [Bug 1412698] Re: Upgrade path calculation incorrect

On Jan 20, 2015, at 09:29 AM, Michael Vogt wrote:

>This patch changes ReloadConfiguration to also flush the "self._update
>state.
>
>For some unknown reason I had to self._checking.release() first or it would
>hang. This needs to be understood first before the patch can be applied.

Both make sense. I'll work on this fix.

One thing you could do in the meantime is call .Exit() and then
re-D-Bus-activate. That defeats the purpose of .ReloadConfiguration() but it
would work.

Barry Warsaw (barry)
Changed in snappy-ubuntu:
status: New → In Progress
assignee: nobody → Barry Warsaw (barry)
Revision history for this message
Barry Warsaw (barry) wrote :

CheckForUpdate() acquires the checking lock. This gets released after the update is downloaded or when CancelUpdate gets called.

I just realized that if manual downloads are enabled, the checking lock might not get released after the check completes. That could be the bug you're encountering here, since I believe snappy always disables automatic downloads.

Barry Warsaw (barry)
tags: added: client
Changed in ubuntu-system-image:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Barry Warsaw (barry)
milestone: none → 3.0
status: In Progress → Invalid
Barry Warsaw (barry)
Changed in ubuntu-system-image:
status: Invalid → Triaged
Revision history for this message
Barry Warsaw (barry) wrote :

Why #4 is a problem here but not in the standard system-image.

In standard si, CheckForUpdate tries to acquire the checking lock, but it does so non-blocking. If the lock cannot be acquired, but there is cached update information already available, it will just send an UpdateAvailableStatus with the cached information. If no cached information is available, it is assumed the check is still in progress, and when *that* check completes, an UAS signal is emitted and the update information is cached. So for all intents and purposes, it doesn't matter if the checking lock is acquired.

However, when snappy's version with mvo's patch is used, it changes the internal assumptions. Let's say a CFU is called, and it completes, sending a UAS signal. However, as with the si trunk, the checking lock is not released, *and* snappy always uses manual downloads, so the release after the download completes never happens. When ReloadConfiguration invalidates the cache (by setting self._update to None), we now have a situation where the checking lock is acquired, no download or CancelUpdate is called to release the lock, *and* there is no cached update information. So from here on, no UAS signal will be emitted.

mvo should confirm, but I believe when he says in comment #2 that "I had to self._checking.release() first or it would hang", the "it" means the snappy client would hang because it would be waiting on a UAS signal that would never arrive. It's *not* the case that si-dbus would hang, at least I predict as such. :)

This then explains why mvo had to also release the checking lock after invalidating the cache. Once that's done, then next CFU will see the lock get re-acquired and will re-initiate the check, eventually leading to a UAS.

Ideally, the checking lock should probably get released after the UAS signal is emitted, but that's too risky of a change here. I'll work on that for si 3.0 though, thus the additional bug task. For the snappy version, I think mvo's patch is exactly right.

Barry Warsaw (barry)
Changed in ubuntu-system-image:
status: Triaged → Fix Committed
Michael Vogt (mvo)
Changed in snappy-ubuntu:
status: In Progress → Fix Released
Barry Warsaw (barry)
Changed in ubuntu-system-image:
status: Fix Committed → Fix Released
Michael Terry (mterry)
affects: snappy-ubuntu → snappy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.