Better protection against concurrent access

Bug #1277589 reported by Alan Pope 🍺🐧🐱 🦄 on 2014-02-07
60
This bug affects 10 people
Affects Status Importance Assigned to Milestone
Ubuntu system image
Critical
Barry Warsaw
ubuntu-download-manager
Fix Released
Critical
Manuel de la Peña
ubuntu-system-settings (Ubuntu)
Undecided
Unassigned

Bug Description

Trying to update to 170 from a fresh 169 on mako I get the above error message after downloading the update and trying to install it.

http://popey.com/~alan/phablet/device-2014-02-07-112816.png

TEST CASE (verified on mako):
1. Flash the device with an older build than latest in -proposed
$ phablet-flash ubuntu-system --channel devel-proposed -b --revision 176
2. Wait
3. On the device: Skip intro
4. Configure networking and make sure the device connects to the Wifi network
5. Go to system-settings
6. Select 'Updates'
7. The device will detect that an update is available (e.g 178), proceed with the update

EXPECTED RESULT:
Upgrade succeeds and user can reboot the device to apply them

ACTUAL RESULT:
Upgrade fails with
FileNotFoundError: /var/lib/system-image/blacklist.tar.xz

Related branches

Manuel de la Peña (mandel) wrote :

Looking at the syslogs the only files that system image told the download manager to download are:

Feb 7 16:26:39 ubuntu-phablet ubuntu-download-manager[5980]: 2014-02-07 16:26:39,527 - DEBUG - Finished downloads ("/android/cache/recovery/keyring.tar.xz.asc", "/android/cache/recovery/keyring.tar.xz")

And therefore if that blacklist.tar.xz file is coming from the outside world it was not processed by udm AFAIK.

Barry Warsaw (barry) wrote :

Hi Alan, can you attach /var/log/system-image/client.log

tags: added: client

May have been a dodgy network. It's just worked.

Be nice if we didn't barf with a technical error message, but offered to retry.

Barry Warsaw (barry) wrote :
Download full text (3.6 KiB)

The client.log you posted is very confusing. Here's an excerpt:

[systemimage] Feb 07 16:25:55 2014 (5955) Running group download reactor
[systemimage] Feb 07 16:25:55 2014 (5955) Group download reactor done
[systemimage] Feb 07 16:25:56 2014 (5955) Upgrade path is 170
[systemimage] Feb 07 16:25:56 2014 (5955) Requesting group download:
 http://system-image.ubuntu.com/pool/ubuntu-0e1d92a78814995f84c2647fd3a25bd6c996cb9ad0bdf7bd2bea0a056ebeff9a.delta-ubuntu-f614715d833681978602b6e9d86b282acd008ef98c42be6f510560e91d3be6b9.tar.xz -> /android/cache/recovery/ubuntu-0e1d92a78814995f84c2647fd3a25bd6c996cb9ad0bdf7bd2bea0a056ebeff9a.delta-ubuntu-f614715d833681978602b6e9d86b282acd008ef98c42be6f510560e91d3be6b9.tar.xz
 http://system-image.ubuntu.com/pool/ubuntu-0e1d92a78814995f84c2647fd3a25bd6c996cb9ad0bdf7bd2bea0a056ebeff9a.delta-ubuntu-f614715d833681978602b6e9d86b282acd008ef98c42be6f510560e91d3be6b9.tar.xz.asc -> /android/cache/recovery/ubuntu-0e1d92a78814995f84c2647fd3a25bd6c996cb9ad0bdf7bd2bea0a056ebeff9a.delta-ubuntu-f614715d833681978602b6e9d86b282acd008ef98c42be6f510560e91d3be6b9.tar.xz.asc
 http://system-image.ubuntu.com/trusty-proposed/mako/version-170.tar.xz -> /android/cache/recovery/version-170.tar.xz
 http://system-image.ubuntu.com/trusty-proposed/mako/version-170.tar.xz.asc -> /android/cache/recovery/version-170.tar.xz.asc

[systemimage] Feb 07 16:25:56 2014 (5955) Running group download reactor
[systemimage] Feb 07 16:25:56 2014 (5955) Looking for blacklist: https://system-image.ubuntu.com/gpg/blacklist.tar.xz

What this is telling me is that s-i got to the step where it begins downloading the data files, but then suddenly jumps back to trying to find the blacklist. I don't see how that is possible.

That may or may not be a clue to the problem, because I also see this:

[systemimage] Feb 07 16:26:37 2014 (5955) Running group download reactor
[systemimage] Feb 07 16:26:39 2014 (5955) Group download reactor done
[systemimage] Feb 07 16:26:39 2014 (5955) No valid image master key found
[systemimage] Feb 07 16:26:39 2014 (5955) uncaught exception in state machine
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 175, in run_until
    step()
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 351, in _get_master_key
    'archive-master', self.blacklist)
  File "/usr/lib/python3/dist-packages/systemimage/keyring.py", line 114, in get_keyring
    raise SignatureError
systemimage.gpg.SignatureError
[systemimage] Feb 07 16:26:40 2014 (5955) Group download reactor done
[systemimage] Feb 07 16:26:40 2014 (5955) uncaught exception in state machine
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 175, in run_until
    step()
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 513, in _download_files
    with Context(*keyrings, blacklist=self.blacklist) as ctx:
  File "/usr/lib/python3/dist-packages/systemimage/gpg.py", line 74, in __init__
    raise FileNotFoundError(blacklist)
FileNotFoundError: /var/lib/system-image/blacklist.tar.xz
[systemimage] Feb 07 16:36:40 2014 (5955) SystemImage dbus main loop exited

This...

Read more...

Changed in ubuntu-system-image:
status: New → Incomplete

Tricky to reproduce, assuming it was in fact a network error.

I'm at a sprint where there's 3 networks I have been on. One (canonical) which has a WPA key, two others (iban & iban_conference) have captive portals that I may or may not have been signed into (they timeout). It's possible I dropped from one and arrived on another which hijacked DNS to show the captive portal.

Hard to reproduce.

Pat McGowan (pat-mcgowan) wrote :

I had the same symptom.

I was connected to an AP that required a login through a web api.
System settings were up
I entered the web code, the network came on, then settings indicated update available
Tried to get the update, then it failed as described.

On Feb 07, 2014, at 07:20 PM, Pat McGowan wrote:

>I was connected to an AP that required a login through a web api. System
>settings were up I entered the web code, the network came on, then settings
>indicated update available Tried to get the update, then it failed as
>described.

I wonder if the login is confusing u-d-m? Let's say it doesn't recognize that
the login is necessary, so it gets garbage for the file downloads (but not
errors). Then, when the signatures are checked, they're bogus because the
downloaded files are bogus.

I suppose one way to combat this is to pass the hashes to u-d-m when we know
them, as in LP: #1262256. But we don't know all the hashes so that won't be a
complete solution.

Stéphane Graber (stgraber) wrote :

On Fri, Feb 07, 2014 at 07:49:27PM -0000, Barry Warsaw wrote:
> On Feb 07, 2014, at 07:20 PM, Pat McGowan wrote:
>
> >I was connected to an AP that required a login through a web api. System
> >settings were up I entered the web code, the network came on, then settings
> >indicated update available Tried to get the update, then it failed as
> >described.
>
> I wonder if the login is confusing u-d-m? Let's say it doesn't recognize that
> the login is necessary, so it gets garbage for the file downloads (but not
> errors). Then, when the signatures are checked, they're bogus because the
> downloaded files are bogus.
>
> I suppose one way to combat this is to pass the hashes to u-d-m when we know
> them, as in LP: #1262256. But we don't know all the hashes so that won't be a
> complete solution.

Yeah, especially in this particular case as we don't have the hashes of
the GPG tarballs since I haven't found any obvious location for those
short of introducing a new gpg.json at the root (which would then suffer
from the same lack of checksum anyway...).

--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com

Manuel de la Peña (mandel) wrote :

On 7 Feb 2014 21:20, "Stéphane Graber" <email address hidden> wrote:
>
> On Fri, Feb 07, 2014 at 07:49:27PM -0000, Barry Warsaw wrote:
> > On Feb 07, 2014, at 07:20 PM, Pat McGowan wrote:
> >
> > >I was connected to an AP that required a login through a web api.
 System
> > >settings were up I entered the web code, the network came on, then
settings
> > >indicated update available Tried to get the update, then it failed as
> > >described.
> >
> > I wonder if the login is confusing u-d-m? Let's say it doesn't
recognize that
> > the login is necessary, so it gets garbage for the file downloads (but
not
> > errors). Then, when the signatures are checked, they're bogus because
the
> > downloaded files are bogus.
> >

Good point, I'm going to do some extra debugging for this particular cases.
I know how to check if a log in is needed and that could be fwd as a signal
to system image updates.

> > I suppose one way to combat this is to pass the hashes to u-d-m when we
know
> > them, as in LP: #1262256. But we don't know all the hashes so that
won't be a
> > complete solution.
>
> Yeah, especially in this particular case as we don't have the hashes of
> the GPG tarballs since I haven't found any obvious location for those
> short of introducing a new gpg.json at the root (which would then suffer
> from the same lack of checksum anyway...).
>
> --
> Stéphane Graber
> Ubuntu developer
> http://www.ubuntu.com
>
> --
> You received this bug notification because you are a member of Ubuntu
> System Image team, which is subscribed to Ubuntu system image.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> Status in Ubuntu system image (server/client/updater):
> Incomplete
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu-system-image/+bug/1277589/+subscriptions

I just had this again on my mako image while at home on a normal domestic WPA internet connection.

I connected to wifi, went to system settings and tapped the "updates available" icon then after a short while I get the same error:-

Barry Warsaw (barry) wrote :

We've now seen multiple reports of this problem, and the common thread seems to be trying to upgrade on networks that require a login. So until we know for sure otherwise, I've opened a new bug-task on u-d-m, since Manuel knows how to detect login required as per comment #10.

To be clear, it happened on my home wifi which does not require login.

Changed in ubuntu-download-manager:
importance: Undecided → High
assignee: nobody → Manuel de la Peña (mandel)
status: New → Confirmed
importance: High → Critical

On Feb 10, 2014, at 09:29 PM, Alan Pope ㋛ wrote:

>To be clear, it happened on my home wifi which does not require login.

Can you verify the checksums as described in mailing list?

IIRC, you first saw the error when you tried to do an update on a network that
required login, right? If so, then there may be a situation where once you've
gotten corrupt keyrings, system-image doesn't clean up after itself
correctly. E.g. you get a SignatureError but the keyrings are still cached or
subsequent checks don't treat the keyrings as garbage and re-download them.

I'm fairly certain we're doing some checks like that but it's entirely
possible there's a missing step there somewhere. I'll reopen the bug and
double check.

Manuel de la Peña (mandel) wrote :

On Mon, Feb 10, 2014 at 10:14 PM, Barry Warsaw
<email address hidden>wrote:

> We've now seen multiple reports of this problem, and the common thread
> seems to be trying to upgrade on networks that require a login. So
> until we know for sure otherwise, I've opened a new bug-task on u-d-m,
> since Manuel knows how to detect login required as per comment #10.
>
>
Setting this as Critical and "Confirmed" even if we are no 100% sure that
we are dealing with issues with login networks. I can grab the auth issues
(proxy and other) and fwd to system image updates. That way we can get more
information from my side.

** Also affects: ubuntu-download-manager
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are a member of Ubuntu
> System Image team, which is subscribed to Ubuntu system image.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> Status in Ubuntu Download Manager:
> New
> Status in Ubuntu system image (server/client/updater):
> Incomplete
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

Barry Warsaw (barry) on 2014-02-10
Changed in ubuntu-system-image:
status: Incomplete → New
Manuel de la Peña (mandel) wrote :

On Mon, Feb 10, 2014 at 10:29 PM, Alan Pope ㋛ <email address hidden> wrote:

> To be clear, it happened on my home wifi which does not require login.
>
> Nevertheless I'll do my best to get more info from udm.

> --
> You received this bug notification because you are a member of Ubuntu
> System Image team, which is subscribed to Ubuntu system image.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> Status in Ubuntu Download Manager:
> New
> Status in Ubuntu system image (server/client/updater):
> Incomplete
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

Unfortunately I can no longer get that data as I had to wipe my phone completely for other testing :(

Your suggestion that the problem on my home network was a continuation of the problem I had on the network which required authentication is plausible of course.

I confirm that this happens on networks that do not require login.

I verified the checksums of the files in /var/lib/system-image/keyrings/* and they match with files downloaded on another system
root@ubuntu-phablet:/# md5sum /var/lib/system-image/keyrings/*
1a4a5d8b79d01a77523c2d3dfba0304b /var/lib/system-image/keyrings/image-master.tar.xz
d895b83c18976542e2bb658729903b7c /var/lib/system-image/keyrings/image-master.tar.xz.asc
b89d60561ca2a866c5abc4e472663109 /var/lib/system-image/keyrings/image-signing.tar.xz
9a4214fb6d6d9b27dc5df03b0f2569d1 /var/lib/system-image/keyrings/image-signing.tar.xz.asc

Note that there are only 4 files. archive-master.* and blacklist.* are missing from this directory.

Dave Morley (davmor2) wrote :

I have experienced this on R173. Client log attached as confirmation.

Dave Morley (davmor2) wrote :

Oddly this issue seems to of arrived since the setting app has been popping up the message that there is an update available. I don't know if the 2 are entwined somehow.

On Tue, Feb 11, 2014 at 1:17 PM, Dave Morley <email address hidden> wrote:

> Oddly this issue seems to of arrived since the setting app has been
> popping up the message that there is an update available. I don't know
> if the 2 are entwined somehow.
>
>
Interesting AFAIK udm and system-image-updates have not changed much and if
Popey states that it happened at his home network with no auth... we might
need to look somewhere else, I'll ping the developer that did the system
settings changes (I'm assuming that setting apps is the system settings
app, correct?).

> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> Status in Ubuntu Download Manager:
> Confirmed
> Status in Ubuntu system image (server/client/updater):
> New
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

Manuel de la Peña (mandel) wrote :

On Tue, Feb 11, 2014 at 1:47 PM, Manuel de la Pena <
<email address hidden>> wrote:

>
>
>
> On Tue, Feb 11, 2014 at 1:17 PM, Dave Morley <email address hidden>wrote:
>
>> Oddly this issue seems to of arrived since the setting app has been
>> popping up the message that there is an update available. I don't know
>> if the 2 are entwined somehow.
>>
>>
> Interesting AFAIK udm and system-image-updates have not changed much and
> if Popey states that it happened at his home network with no auth... we
> might need to look somewhere else, I'll ping the developer that did the
> system settings changes (I'm assuming that setting apps is the system
> settings app, correct?).
>

Have we started using push notifications for updates?? could that be the
issue?

>
>
>> --
>> You received this bug notification because you are a bug assignee.
>> https://bugs.launchpad.net/bugs/1277589
>>
>> Title:
>> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>>
>> Status in Ubuntu Download Manager:
>> Confirmed
>> Status in Ubuntu system image (server/client/updater):
>> New
>>
>> Bug description:
>> Trying to update to 170 from a fresh 169 on mako I get the above error
>> message after downloading the update and trying to install it.
>>
>> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>>
>> To manage notifications about this bug go to:
>>
>> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>>
>
>

This might seem to appear now, because several error messages were not being shown before.
System Settings is just forwarding the signals with the errors from system-image in order to show something in that area.
Let me know if i can do anything to help with this, but from the System Settings point of view, it's only calling and forwarding stuff from the system-image.

Barry Warsaw (barry) wrote :

Manuel's right, neither system-image nor u-d-m have changed since December. From comment #23 it sounds like what's changed is that the errors are now getting propagated to the ui, so they are visible to the user. @Diego: what would have happened previously, if there was an error in system-image-dbus? Would the ui have just shown no update available?

@jibel in comment #18, the checksums of the .xz files do seem to match mine. Are you saying that you've never tried to do a download on a login-protected network?

I am going to work up a branch to add some additional debug logging. I just have never seen this error on my local network. I'm still suspecting some kind of login-required issue (though I'll keep jibel's comment in mind), so I'll see if I can figure out how to mimic this behavior with a proxy, or head up to the local internet cafe to see what happens.

I also still think it's worthwhile to instrument u-d-m to watch out for login-protected networks.

Barry Warsaw (barry) wrote :

Also, for those of you seeing this error, can you paste the output of the following:

$ adb shell
# system-image-cli --info

taiebot65 (taiebot65) wrote :

I have this erro on mako and can not update to 174 the installation stop at 100% downloading and the error pop up

system-image-cli --info
current build number: 173
device name: mako
channel: trusty
last update: 2014-02-10 18:53:39
version version: 173
version ubuntu: 20140210
version device: 20140115.1

On Tue, Feb 11, 2014 at 9:52 PM, taiebot65 <email address hidden> wrote:

> I have this erro on mako and can not update to 174 the installation stop
> at 100% downloading and the error pop up
>
>
For those that are experiencing this problem, retrying the upgrade with a
reliable
Wifi connections seems to work. Something in the network stack has changed
making the downloader miss behave during mobile data downloads.

If I'm correct all of the people that have reported the bug have tried to
do the
update over 3g, am I correct?

> system-image-cli --info
> current build number: 173
> device name: mako
> channel: trusty
> last update: 2014-02-10 18:53:39
> version version: 173
> version ubuntu: 20140210
> version device: 20140115.1
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> Status in Ubuntu Download Manager:
> Confirmed
> Status in Ubuntu system image (server/client/updater):
> New
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

Same error first on a wifi network with password then on my home open network.

current build number: 173
device name: mako
channel: devel
alias: trusty
last update: 2014-02-10 14:16:53
version version: 173
version ubuntu: 20140210
version device: 20140115.1

Elias K Gardner (zorkerz) wrote :

Successful updae to 174 on the open network after shutting the device off a few times and waiting 7 hours. I do not have a 3g card in this phone and cellular data is turned off.

Barry Warsaw (barry) wrote :

Thanks Elias, this is very helpful information. I think this aligns with my thinking that we don't properly handle login-protected networks. But there's likely more: once we have a problem due to the login network, we don't recover as easily when switched to an open network (or the login is provided, thus giving us unencumbered downloads). This is a case I need to cover in the test suite.

Here's a question though: did you test it after a single reboot? Or after, say a 30m wait? I ask because the system-image-dbus process has a lifetime of 10 minutes of inactivity, meaning if you attempt no updates and s-i-dbus sees no other activity for 10m, it will exit. Of course a reboot will also exit the process. So the question is whether subsequent errors are due to some state being retained in the running process, or some extra-process state hanging around on the file system.

Anyway, both are interesting situations to explore.

To the best of my memory I tried on the passworded notwork and got the
error. Rebooted and tried again with the same error. Then about an hour
later on the open network without rebooting, same error. Then rebooted and
got the same error again. Then I waited a good five hours but can't
remember if I rebooted again or not.

On Wed, Feb 12, 2014 at 10:11 AM, Barry Warsaw
<email address hidden>wrote:

> Thanks Elias, this is very helpful information. I think this aligns
> with my thinking that we don't properly handle login-protected networks.
> But there's likely more: once we have a problem due to the login
> network, we don't recover as easily when switched to an open network (or
> the login is provided, thus giving us unencumbered downloads). This is
> a case I need to cover in the test suite.
>
> Here's a question though: did you test it after a single reboot? Or
> after, say a 30m wait? I ask because the system-image-dbus process has
> a lifetime of 10 minutes of inactivity, meaning if you attempt no
> updates and s-i-dbus sees no other activity for 10m, it will exit. Of
> course a reboot will also exit the process. So the question is whether
> subsequent errors are due to some state being retained in the running
> process, or some extra-process state hanging around on the file system.
>
> Anyway, both are interesting situations to explore.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

I have a nexus10 running build 170 trying to update to 176
The update continues to fail, and I noticed this crash file at the time it reported the file not found error
Once it crashes the update just spins and I never see u-d-m running.

Just got the same error today on my passworded work network updating from
174 to 176. Will transition to the unprotected home network in a few hours.
If there is anything I can test let me know.

On Wed, Feb 12, 2014 at 2:02 PM, Pat McGowan <email address hidden>wrote:

> I have a nexus10 running build 170 trying to update to 176
> The update continues to fail, and I noticed this crash file at the time it
> reported the file not found error
> Once it crashes the update just spins and I never see u-d-m running.
>
> ** Attachment added: "_usr_sbin_system-image-dbus.0.crash"
>
> https://bugs.launchpad.net/ubuntu-system-image/+bug/1277589/+attachment/3978749/+files/_usr_sbin_system-image-dbus.0.crash
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu-download-manager/+bug/1277589/+subscriptions
>

While I guess you have sufficient information already. On my mako I got the blacklist error again going from previous stable to latest stable (176). I killed system settings and restarted it and it worked. I have auto-download set on, and I'm on home wifi.

Barry Warsaw (barry) wrote :

I actually don't think this is related to either login-protected networks or 3g. I've seen the same bug exactly once on my n10 sitting on my home (open) wifi network. That's the good news. The bad news is that I've only seen it once and can't yet reproduce it.

Alan O'Dannel (aoddev) wrote :

I'm seeing the same issues as reported above today. I have been using my home network for all attempts. I have restarted my device (mako) 5 times, I have left the device idle for a couple of hours, no luck.

running system-image-cli --info returns:
current build number: 174
device name: mako
channel: trusty
last update: 2014-02-11 11:50:56
version version: 174
version ubuntu: 20140211
version device: 20140115.1

Latest attempt to donwlaod the latest build:
[systemimage] Feb 12 20:49:54 2014 (26847) Running group download reactor
[systemimage] Feb 12 20:50:16 2014 (26847) Group download reactor done
[systemimage] Feb 12 20:50:16 2014 (26847) No signed blacklist found
[systemimage] Feb 12 20:50:16 2014 (26847) Getting the image master key
[systemimage] Feb 12 20:50:16 2014 (26847) Requesting group download:
 https://system-image.ubuntu.com/gpg/image-master.tar.xz -> /android/cache/recovery/keyring.tar.xz
 https://system-image.ubuntu.com/gpg/image-master.tar.xz.asc -> /android/cache/recovery/keyring.tar.xz.asc

[systemimage] Feb 12 20:50:16 2014 (26847) Running group download reactor
[systemimage] Feb 12 20:50:17 2014 (26847) Group download reactor done
[systemimage] Feb 12 20:50:17 2014 (26847) No valid image master key found
[systemimage] Feb 12 20:50:17 2014 (26847) uncaught exception in state machine
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 175, in run_until
    step()
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 351, in _get_master_key
    'archive-master', self.blacklist)
  File "/usr/lib/python3/dist-packages/systemimage/keyring.py", line 114, in get_keyring
    raise SignatureError
systemimage.gpg.SignatureError
[systemimage] Feb 12 20:50:18 2014 (26847) Group download reactor done
[systemimage] Feb 12 20:50:18 2014 (26847) uncaught exception in state machine
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 175, in run_until
    step()
  File "/usr/lib/python3/dist-packages/systemimage/state.py", line 513, in _download_files
    with Context(*keyrings, blacklist=self.blacklist) as ctx:
  File "/usr/lib/python3/dist-packages/systemimage/gpg.py", line 74, in __init__
    raise FileNotFoundError(blacklist)

I can reproduce it reliably on a login-less Wifi network WPA-PSK/CCMP security. I'll add a testcase to the description.

description: updated

I reproduce this issue during an upgrade from 176 to 180 with u-d-m 0.3+14.04.20140213.3-0ubuntu1 from the PPA https://launchpad.net/~ci-train-ppa-service/+archive/landing-011/+packages

Content of /var/log/ubuntu-download-manager/ attached

In this directory there is also a symlink UNKNOWN.INFO -> ubuntu-download-manager.log20140213-151948.2314

Barry Warsaw (barry) on 2014-02-13
Changed in ubuntu-system-image:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Barry Warsaw (barry)
milestone: none → 2.1
Pat McGowan (pat-mcgowan) wrote :

There is a strong suspicion as to the root cause being related to the update check causing two simultaneous processes doing the update. Two changes are planed: 1) to limit this to a single instance of system-update and 2) to download files first to /tmp then atomically rename

In the meantime users should change the Auto Download option in the Updates page in System Settings.
Change from Download future updates automatically from "When on wifi" or "On any data connection" to "Never"

Barry Warsaw (barry) wrote :
Download full text (3.3 KiB)

We now have a plausible scenario for the cause. Much thanks to ogra, pmcgowan, mandel, and laney for a fantastic team effort. One caveat: we won't declare success until we have a fix, but the cause we're working on makes the most sense, and we have a plan for fixing it.

* A recent (as in ~the time of popey's original bug report) change in the ui caused an CheckForUpdate to start when System Settings was opened, even before Update icon is clicked. Now remember that when auto-download is set (the default), not only do we start the check, but we then start the download if an update is available.

* Now click Update icon, a second update is attempted. We think we have a race condition here where two updates are started at the same time, either by a flaw in the in-process s-i-dbus flag (I can imagine a small window for racing against this flag), or two s-i-dbus processes being started.

* We've verified that the SignatureError seen in all the logs is because the keyring.tar.xz and keyring.tar.xz.asc files are *empty* after u-d-m reports a successful download. Imagine two download happening at the same time, either because of the above races, or a dbus-inherent race (i.e. messages getting queued up and delivered at an unfortunate time). s-i-dbus requests u-d-m to download the files to a specific location on the file system, so if a second request comes in with the same destination, it will overwrite the first requests. We think this is why we're seeing the zero-sized files.

* When we see what is essentially corrupt blacklist files, we assume that it must be the image-master key that's out of date, so we try to download that. Same thing happens here (which is suspicious, but let's say it's possible). We see zero sized image-master tar.xz and .asc files. Those of course fail signature checks, so s-i then left with assuming that the pre-loaded archive master key is possibly corrupt. We cannot download that key for security purposes, so we're left at the end of the chain and s-i gives up. The exception that's reported to the ui is a little misleading, or at worst unhelpful, but it's essentially saying "we tried to download a proper chain of trust, and failed at the end of the chain, so we have to give up". There's nothing more s-i can do.

We have two action items to prevent this:

1) s-i-dbus will add guards to ensure that only a single s-i-dbus process is ever running, by ensuring that the process owns its D-Bus name. If it can't own the name, another process is running and we'll exit. Hopefully this is a race-resilient check. For even better guarantee, I could use flufl.lock which is a long battle tested library for file system locks, with timeouts against stale locks. I will also double check the in-process flag guard and add lots of tests!

2) mandel will fix u-d-m to download files to a .tmp file, then atomically rename into place once the files are successfully downloaded. That way, even in the face of a race condition, it will be impossible to "see" a partially downloaded file.

For those of you who are experiencing this, the work around is to wait 10m for s-i-dbus to exit and try again. Note that you can also adb shell i...

Read more...

Oliver Grawert (ogra) wrote :

see also bug 1280169

Download full text (3.3 KiB)

I was just talking with Manuel, about this problem that seems to appears when concurrent requests are made to the System Image updates service for checkForUpdates.
Let me explain how System Settings works:

- When System Settings gets started, a process is called to determine if the Entry for System Updates needs to be shown in the main page of System Settings (The large button with the number of updates):
https://wiki.ubuntu.com/SoftwareUpdates?action=AttachFile&do=get&target=phone-settings-quick-access-updates.png

- If that process says that System Updates needs to be shown, the entry component is loaded in the UI, and the checkForUpdates call is made again to check how many updates needs to be shown in that component (the number at the right in the UI). For this situation, there is no conflict between the previous step and this one, because this second check is only triggered if the first one already finished and it says there was an update available.

- Then, if the System Updates button, or the System Updates entry in the main page of the System Settings UI is pressed, the Page for System Updates is loaded (which is another different binary and process from the previous two ones, that's because System Settings works using plugins for everything), so checkForUpdates needs to be triggered again for this situation to be able to populate the UI and show the different components in the UI according to the updates available.
https://wiki.ubuntu.com/SoftwareUpdates?action=AttachFile&do=get&target=phone-setting-updates-checking-existing.png

- The last situation is when automatic updates is activated, this seems to be another process (I've not worked with that code), that triggers another checkForUpdates when System Settings is opened in order to decide if it needs to start upgrading the system.

So, if concurrent calls to checkForUpdates is the problem:
- The fourth situation is crashing with any of the two first situations.
- The Third situation could crash with any of the two first situations as well if the System Updates button is pressed fast, before the previous checks ends.

The Entry and PageComponent for System Updates plugins are in fact filtering the results to make sure that we show the results only once in case concurrent calls are made to checkForUpdates, but the problem here seems to be a raise condition between processes when the files that the System Image service needs to check if there is in fact an update available are being downloaded lots of times and overwritting between each other.
When this was tested using the system-image-dbus mock server to reproduce all the different scenarios here:
https://wiki.ubuntu.com/ImageBasedUpgrades/Client
the only problem about concurrent calls was that a component could receive multiple results of a system update, and that was the fix that was made to filter that, but this specific problem didn't show up because probably the mock server is not downloading stuff to determine if there is an update.

Could it be possible for the System Image service to know if a current check for update is already in progress and avoid triggering that again??
In that way, any concurrent service trying to ...

Read more...

Barry Warsaw (barry) on 2014-02-14
summary: - "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz"
+ Better protection against concurrent access
Download full text (4.4 KiB)

Thanks for the background Diego. Some thoughts inline.

On Feb 14, 2014, at 02:23 PM, Diego Sarmentero wrote:

>- The last situation is when automatic updates is activated, this seems
>to be another process (I've not worked with that code), that triggers
>another checkForUpdates when System Settings is opened in order to
>decide if it needs to start upgrading the system.

Remember that the default setting for system-image is to automatically
download the updates after a check-for-updates finds one available. So even
the first check call will trigger the downloads. You may or may not want to
do that based on whether you'd like the data files to be immediately
available. That's a ux decision, but do be aware of that. Also note that the
downloaded files are cached so a subsequent check will verify the validity of
the cached files, but can avoid costly downloads if they're all still valid.

You can of course disable auto-downloads globally by setting system image to
manually download. You might want to make the initial check with manual
downloads set, then switch back to automatic downloads for subsequent checks.
It would be a D-Bus API change to specify the download behavior in the
CheckForUpdate call.

>The Entry and PageComponent for System Updates plugins are in fact filtering
>the results to make sure that we show the results only once in case
>concurrent calls are made to checkForUpdates, but the problem here seems to
>be a raise condition between processes when the files that the System Image
>service needs to check if there is in fact an update available are being
>downloaded lots of times and overwritting between each other. When this was
>tested using the system-image-dbus mock server to reproduce all the different
>scenarios here: https://wiki.ubuntu.com/ImageBasedUpgrades/Client the only
>problem about concurrent calls was that a component could receive multiple
>results of a system update, and that was the fix that was made to filter
>that, but this specific problem didn't show up because probably the mock
>server is not downloading stuff to determine if there is an update.

Be careful about trusting the mock too much. Mocks are by definition just a
simulation of the real thing, and they do not always accurately mimic the
behavior of the real service. (I tend to think they cannot possibly do so
without actually *being* the real service.)

For example, currently the service attempts to avoid multiple CheckForUpdate
background actions. If a check is already in progress, subsequent checks will
never receive an UpdateAvailable signal. It may be difficult to impossible to
actually do that, because if a check is in progress but not yet completed, we
won't have the information to send yet, and if the check is completed, a
subsequent CheckForUpdate has different semantics so a full second check will
be performed first (but again, because it's highly likely a second check will
find the same update available, it'll use the cached data files and will send
a low cost UpdateAvailable signal).

Think of it like this: you'll get one UpdateAvailable signal for every new or
in-progress update check when CheckForUpdate is called. If you ca...

Read more...

Barry Warsaw (barry) on 2014-02-14
Changed in ubuntu-system-image:
status: In Progress → Fix Committed
Barry Warsaw (barry) on 2014-02-14
Changed in ubuntu-system-image:
status: Fix Committed → In Progress
Barry Warsaw (barry) wrote :

I'm moving the status of this back to in-progress, but there does seem to be more going on. If you want to try an experimental package, look here:

https://launchpad.net/~barry/+archive/systemimage/+build/5590226

Turn your phone writable; download the .debs; dpkg -i them. Now try to do an update and watch /var/log/system-image/client.log.

For me, with this package installed, the first boot after flashing to 174 still gives me the errors, and the checksums in the error message still indicate zero-sized destination files written by udm.

There are two sets of concurrency guards in si now: we ensure unique ownership of the bus name and exit if we don't get that, and we use a threading lock to close the small race condition in the code while acquiring the "checking" lock.

Now, these tests were done without the udm atomic rename branch, so that may help, but in the meantime I'm can try two additional changes: 1) I think I can add an atomic rename to my side of the udm interface; 2) I'm going to put the same kind of threading lock around the 'downloading' flag, even though this error occurs before downloading is started.

If you do try the PPA packages, please follow up here with your experience. I'll let you know when I have new packages to try.

Barry Warsaw (barry) wrote :

So I've tried #1, and uploaded that to my PPA, but it still doesn't completely eliminate the errors. However, I did perform 5 complete reflashes to r174, then installed my experimental package and click on System Settings (which invokes the update and download). None of those 5 got the SignatureError. The last update though did give me the FileNotFoundError on blacklist.tar.xz. So I can't claim success yet.

I do think there's something to the udm bug of leaving a zero-sized file for the requested download. Requesting a local file path of a temp file and doing the atomic rename on the client side seems to help, although I have no idea why.

More on Monday. But please, if you do feel adventurous, give the PPA packages a try and let us know here what happens. I do think it will be progress if I've managed to eliminate the SignatureErrors.

Changed in ubuntu-download-manager:
status: Confirmed → In Progress
Barry Warsaw (barry) on 2014-02-18
Changed in ubuntu-system-image:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-system-settings - 0.1+14.04.20140218.1-0ubuntu1

---------------
ubuntu-system-settings (0.1+14.04.20140218.1-0ubuntu1) trusty; urgency=low

  [ Iain Lane ]
  * Don't check for updates in the EntryComponent of the update plugin.
    Temporarily disables the feature. Will be re-enabled when bug
    #1277589 is fixed. (LP: #1277589)
  * Show default wallpaper in the "Ubuntu art" section of the background
    selector
  * Add some initial AP tests for manual selection of cellular network.
 -- Ubuntu daily release <email address hidden> Tue, 18 Feb 2014 21:36:59 +0000

Changed in ubuntu-system-settings (Ubuntu):
status: New → Fix Released
Matthew Paul Thomas (mpt) wrote :

I reported bug 1282499 on the way the error message is displayed.

Changed in ubuntu-download-manager:
status: In Progress → Fix Released
Barry Warsaw (barry) on 2014-02-28
Changed in ubuntu-system-image:
status: Fix Committed → Fix Released

I seem to be able to still trigger this error message "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz" on image #212 with ubuntu-system-settings 0.1+14.04.20140218.1-0ubuntu1

root@ubuntu-phablet:/# apt-cache policy ubuntu-system-settings
ubuntu-system-settings:
  Installed: 0.1+14.04.20140218.1-0ubuntu1
  Candidate: 0.1+14.04.20140224.1-0ubuntu1
  Version table:
     0.1+14.04.20140224.1-0ubuntu1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty/universe armhf Packages
 *** 0.1+14.04.20140218.1-0ubuntu1 0
        100 /var/lib/dpkg/status

Easily reproduced. Go to system settings -> updates. Choose to install the update, but while it's downloading, pull up the toolbar and press "back". Go back into "Updates" and after a while the error appears. Is this a separate bug or the same?

On Sat, Mar 01, 2014 at 07:34:04PM -0000, Alan Pope ㋛ wrote:
> I seem to be able to still trigger this error message
> "FileNotFoundError: /var/lib/system-image/blacklist.tar.xz" on image
> #212 with ubuntu-system-settings 0.1+14.04.20140218.1-0ubuntu1
>
> root@ubuntu-phablet:/# apt-cache policy ubuntu-system-settings
> ubuntu-system-settings:
> Installed: 0.1+14.04.20140218.1-0ubuntu1
> Candidate: 0.1+14.04.20140224.1-0ubuntu1
> Version table:
> 0.1+14.04.20140224.1-0ubuntu1 0
> 500 http://ports.ubuntu.com/ubuntu-ports/ trusty/universe armhf Packages
> *** 0.1+14.04.20140218.1-0ubuntu1 0
> 100 /var/lib/dpkg/status
>
> Easily reproduced. Go to system settings -> updates. Choose to install
> the update, but while it's downloading, pull up the toolbar and press
> "back". Go back into "Updates" and after a while the error appears. Is
> this a separate bug or the same?

That apt-cache policy line shows your ubuntu-system-settings is out of
date. The main fixes for this bug are actually in system-image-dbus and
ubuntu-download-manager, FWIW. But they all got upgraded in the same
image so they will be out of date too.

I think you want image #213 which has the newer packages. (AFAICT, I'm
not very good at mapping dates to image numbers. This one:
http://people.canonical.com/~ogra/touch-image-stats/20140228.2.changes)

Also, I didn't try this exact reproduction recipe but I believe it's the
same thing. See if you can make it happen with a new image, please.

--
Iain Lane [ <email address hidden> ]
Debian Developer [ <email address hidden> ]
Ubuntu Developer [ <email address hidden> ]

#213 also has the issue. May be related to bug 1286553 in which case I'll try again with #216 tomorrow.

Barry Warsaw (barry) wrote :

On Mar 01, 2014, at 07:34 PM, Alan Pope ㋛ wrote:

>Easily reproduced. Go to system settings -> updates. Choose to install
>the update, but while it's downloading, pull up the toolbar and press
>"back". Go back into "Updates" and after a while the error appears. Is
>this a separate bug or the same?

On my device, I've done several flashes to 213, upgrading to 215 using the
recipe above. Works for me each time.

I have triggered it again on 216 upgrading to 217.

To be clear, I'm *not* upgrading, but starting the upgrade then pressing back, then going back in later, and then the error appears. Here's a video:-

https://www.youtube.com/watch?v=YmD6cGYvIAI

Attached client.log after repeating the issue.

Barry Warsaw (barry) wrote :

On Mar 03, 2014, at 08:44 AM, Alan Pope ㋛ wrote:

>I have triggered it again on 216 upgrading to 217.
>
>To be clear, I'm *not* upgrading, but starting the upgrade then pressing
>back, then going back in later, and then the error appears. Here's a
>video:-
>
>https://www.youtube.com/watch?v=YmD6cGYvIAI

One difference from the previous incarnation is that you're doing manual
downloads. Your video actually helped quite a bit as I can reproduce this
fairly consistently now by flashing to 215 and upgrading to whatever is the
latest image.

Log file analysis leads me to think there's a race condition between u-d-m
doing its atomic renames and it sending the 'finished' signal for the group
download. If I put some logging right after the finished signal is received,
and I list the directories containing the destination files, I see that they
have .tmp.tmp suffixes. The first .tmp is put there by s-i (which still has
its own atomic-rename workaround), but the second .tmp is put there by u-d-m
for *its* atomic rename operation.

It should not be possible for s-i to see the .tmp.tmp files. This can only
happen if it's seeing the finished signal before u-d-m does its atomic
rename.

I tried the following experiment: at the point where the finished signal is
received, I log the data as explained above, and then I sleep(2) before
continuing on. With the sleep in there, it doesn't crash for me.

This isn't definitive proof of what's going on, but it seems plausible.
Manuel is investigating u-d-m, and then he'll provide a package for me to test
on my device.

Manuel de la Peña (mandel) wrote :
Download full text (3.2 KiB)

On 3 Mar 2014 19:20, "Barry Warsaw" <email address hidden> wrote:
>
> On Mar 03, 2014, at 08:44 AM, Alan Pope ㋛ wrote:
>
> >I have triggered it again on 216 upgrading to 217.
> >
> >To be clear, I'm *not* upgrading, but starting the upgrade then pressing
> >back, then going back in later, and then the error appears. Here's a
> >video:-
> >
> >https://www.youtube.com/watch?v=YmD6cGYvIAI
>
> One difference from the previous incarnation is that you're doing manual
> downloads. Your video actually helped quite a bit as I can reproduce this
> fairly consistently now by flashing to 215 and upgrading to whatever is
the
> latest image.
>
> Log file analysis leads me to think there's a race condition between u-d-m
> doing its atomic renames and it sending the 'finished' signal for the
group
> download. If I put some logging right after the finished signal is
received,
> and I list the directories containing the destination files, I see that
they
> have .tmp.tmp suffixes. The first .tmp is put there by s-i (which still
has
> its own atomic-rename workaround), but the second .tmp is put there by
u-d-m
> for *its* atomic rename operation.
>
> It should not be possible for s-i to see the .tmp.tmp files. This can
only
> happen if it's seeing the finished signal before u-d-m does its atomic
> rename.
>
> I tried the following experiment: at the point where the finished signal
is
> received, I log the data as explained above, and then I sleep(2) before
> continuing on. With the sleep in there, it doesn't crash for me.
>
> This isn't definitive proof of what's going on, but it seems plausible.
> Manuel is investigating u-d-m, and then he'll provide a package for me to
test
> on my device.

After adding extra test and logs I can say that udm sees all the downloaded
files before raising the signal that s-I receives.

udm does not delete any files are all unless an error occurs and this is
not the case.

>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1277589
>
> Title:
> Better protection against concurrent access
>
> Status in Ubuntu Download Manager:
> Fix Released
> Status in Ubuntu system image (server/client/updater):
> Fix Released
> Status in “ubuntu-system-settings” package in Ubuntu:
> Fix Released
>
> Bug description:
> Trying to update to 170 from a fresh 169 on mako I get the above error
> message after downloading the update and trying to install it.
>
> http://popey.com/~alan/phablet/device-2014-02-07-112816.png
>
> TEST CASE (verified on mako):
> 1. Flash the device with an older build than latest in -proposed
> $ phablet-flash ubuntu-system --channel devel-proposed -b --revision 176
> 2. Wait
> 3. On the device: Skip intro
> 4. Configure networking and make sure the device connects to the Wifi
network
> 5. Go to system-settings
> 6. Select 'Updates'
> 7. The device will detect that an update is available (e.g 178),
proceed with the update
>
> EXPECTED RESULT:
> Upgrade succeeds and user can reboot the device to apply them
>
> ACTUAL RESULT:
> Upgrade fails with
> FileNotFoundError: /var/lib/system-image/blacklist.tar.xz
>
> To manag...

Read more...

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers