Downloads should not lock Apt

Bug #830492 reported by James Haigh on 2011-08-21
Affects            Status         Importance  Assigned to  Milestone
APT                Won't Fix      Undecided   Unassigned
Aptdaemon          Confirmed      Undecided   Unassigned
aptboost           Fix Committed  Critical    James Haigh
synaptic           Won't Fix      Undecided   Unassigned
apt (Ubuntu)       Won't Fix      Undecided   Unassigned
synaptic (Ubuntu)  Won't Fix      Undecided   Unassigned

Bug Description

It should be possible to download while installing, or download 2 things at once.

Currently an apt-get command with '--download-only' will lock apt (/var/lib/dpkg/lock). Synaptic even locks apt just by being open!

Examples:
* I might be updating my system. The update is still in the download stage, and I want to download and install a small package for something. Currently I would have to stop the download, install the package, and resume the download.
* I'm installing lots of applications from the Software Centre. While 1 app is installing the next should be downloading.
* Run apt-get from command line while browsing software in Synaptic.

To avoid multiple instances of apt downloading the same file, each partial file should have its own lock (for example, /var/cache/apt/archives/partial/inkscape_0.48.1-2ubuntu2_i386.deb would have inkscape_0.48.1-2ubuntu2_i386.lock). It may also be necessary to prevent updating package lists (sudo apt-get update) when any download lock exists as well as apt's main lock.
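
A minimal sketch of the per-file lock idea, assuming an advisory flock() on a sibling ".lock" file is acceptable. The function names are hypothetical and the ".lock" naming is only the convention proposed above, not something APT does today:

```python
import fcntl
import os


def try_lock_partial(partial_path):
    """Try to take an exclusive, non-blocking lock for one partial download.

    Returns an open file object that holds the lock, or None if the lock is
    already held elsewhere. The lock file name follows the proposal above
    (same name as the partial file, with ".lock" instead of ".deb").
    """
    lock_path = os.path.splitext(partial_path)[0] + ".lock"
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else is already downloading this file.
        lock_file.close()
        return None
    return lock_file


def unlock_partial(lock_file):
    """Release a lock taken by try_lock_partial()."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()
```

A second downloader that gets None back would skip the file and wait for the first one to finish, rather than fetch it again.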

description: updated
Julian Andres Klode (juliank) wrote :

 (a) synaptic needs an exclusive lock in order to work. If you install a package while running synaptic, synaptic might produce random garbage, crash, or whatever else, because the cache it uses does not reflect the reality.

 (b) Locking single download files would be too complicated; we can either lock the complete download stage or not. Otherwise, things start getting crazy when two applications want to download the same file.

 (c) If you are updating your system, you obviously can't make other changes in the meantime, as then the system state changes, and the update can get broken. Imagine you install B while upgrading your system, which in turn installs A, and A conflicts with B. Now, if you install B in the background, APT would try to install A as part of the upgrade and fail.

Not locking downloads while installing might be possible to add at the moment, but then there's also work to download & install in parallel which renders this impossible again (because we need to lock downloads and installs, as we are doing both at the same time).

James Haigh (james.r.haigh) wrote :

a: This could be solved by checking the cache just after locking for install. If any changes have occurred that conflict with the install then Synaptic should remove the install lock and ask the user what to do about the conflict. This would be failsafe because it is checking /after/ locking for install, but Synaptic could additionally check periodically to give users an early warning.

b: I already considered b in the bug description:
"To avoid multiple instances of apt downloading the same file, each partial file should have its own lock (for example, /var/cache/apt/archives/partial/inkscape_0.48.1-2ubuntu2_i386.deb would have inkscape_0.48.1-2ubuntu2_i386.lock)."

This would prevent 2 applications from downloading the same file.

c: So while an update is still in the download phase, I install a package that conflicts with the update. This is very rare because if the package /is/ going to conflict, it will most likely conflict /before/ the update. But in this case, when the update finishes downloading, it should lock for install, check the cache and notice the conflict, remove the install lock, print a message about the conflict, and quit.

Julian Andres Klode (juliank) wrote :

> This could be solved by checking the cache just after locking for install.
Not at all. Synaptic requires more information than is found in the cache, and it needs the database on the system, that is, /var/lib/dpkg/status and other lists, to not change in order to be able to display the information correctly. Otherwise, it tries reading at the wrong offset in the file and who knows what happens then.

> This would prevent 2 applications from downloading the same file.
Yes, but you still have the file in the download lists in both runs. The problem here is that you can't display correct progress information anymore and would have to wait until the other process finishes the download and you probably can't download something else in between, without major changes.

As for the other points, it's just too complicated to lock for installation, then regenerate the cache and transfer the modifications to the new cache (you'd need to correctly map from the old cache to the new one). Then you might even notice that you need to download some more packages, because someone removed packages while you downloaded, etc. In summary: it's just too complicated to do right.

James Haigh (james.r.haigh) wrote :

Yes, it's not easy, but I would like to see this in the long-term.

Also, work on this bug may overlap with work on:
https://bugs.launchpad.net/ubuntu/+source/synaptic/+bug/80753

James Haigh (james.r.haigh) wrote :

I have thought of a solution involving a Python script that is simple enough for me to work on, without having to touch apt-get code.

The reason this bug is important to me is that I have a slow Internet connection. A more general solution is to implement a network shared package cache like Pacserve[1] or pkgdistcache[2].

It doesn't matter that apt is locked during a download if the packages are already in the local cache (/var/cache/apt/archives/) because the download stage is instantaneous. My script will run a virtual package server on localhost which should be set as the only repository in Software Sources.

When my script receives a request it starts downloading the package from the quickest location, and sends an error to the apt client. My script will then get a complete list of packages to download, and the package manager will exit. I will also make a wrapper for apt-get so that when the download is finished, installation can proceed automatically.
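
A rough sketch of the "record the request, then answer with an error" part of this idea, using only the Python standard library. The class and function names are hypothetical, the 503 status is an arbitrary choice, and the real script would also have to serve package lists and hand the recorded paths to a downloader:

```python
import http.server
import threading


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Record every requested path, then refuse the request."""

    requested = []  # shared list of paths the client asked for

    def do_GET(self):
        # Remember the path so a separate downloader can fetch it later,
        # and answer with an error so the client stops waiting.
        self.requested.append(self.path)
        self.send_error(503, "Queued for background download")

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_recording_server(port=0):
    """Serve on localhost in a daemon thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After the package manager gives up, RecordingHandler.requested holds the list of files it wanted.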

Synaptic will require bug #80753 to be fixed so that the lock is only made when changes are applied. It then also needs to integrate with the download script to display progress and proceed with installation automatically.

Ubuntu Software Centre would just need to integrate with the script because it is not affected by bug #80753 anyway.

So I'm assigning this bug to me, but for Synaptic this solution is blocked by bug #80753, so can someone please try to fix that.

[1] http://xyne.archlinux.ca/projects/pacserve/
[2] http://venator.ath.cx/dw/doku.php?id=linux:pkgdistcache

Changed in apt:
assignee: nobody → James Haigh (james.r.haigh)
status: New → In Progress
Julian Andres Klode (juliank) wrote :

Your local hacks are not part of APT, so do not assign bugs for your personal hacking projects.

Changed in apt:
assignee: James Haigh (james.r.haigh) → Julian Andres Klode (juliank)
status: In Progress → New
assignee: Julian Andres Klode (juliank) → nobody
Julian Andres Klode (juliank) wrote :

I'd prefer to close this as "Won't fix" or "Opinion", as it contradicts plans for APT to download and install in parallel; but I will discuss this in the APT team first.

James Haigh (james.r.haigh) wrote :

"Your local hacks are not part of APT, so do not assign bugs for your personal hacking projects."

Sorry Julian, I just wanted to help.

However, they won't be "local" and they won't be "hacks" any more than APT is a 'hack' for dpkg.

Changed in aptboost:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → James Haigh (james.r.haigh)
Changed in synaptic:
status: New → Confirmed
James Haigh (james.r.haigh) wrote :

I don't mind you closing this as 'Won't fix', as I'm now working on a fix for apt-get.

What are the plans for APT to download and install in parallel? That's one of my goals.

Michael Vogt (mvo) wrote :

Hey James! Great to hear that you work on a more general fix for apt-get. You may want to join the irc channel #debian-apt on irc.oftc.net at some point to talk about your plans so that we can coordinate.

Download and install in parallel is currently progressing only slowly. During the current Summer of Code we got a new ordering algorithm that can now order the packages in a way that puts them together in groups that can be installed independently.

You linked to "pacserve", which is quite interesting. Are you aware of the similar (but not quite the same) projects "squid-deb-proxy" and "squid-deb-proxy-client"?

Michael Vogt (mvo) wrote :

I'm closing this as won't fix for apt and synaptic for now. I personally think that it's a good idea, but it's at the wrong layer. I.e. there should be something like aptdaemon or packagekit that you can tell to queue another download and it should just do that. This may require some targeted changes in libapt as well, so I'm inclined towards a "wishlist" tag actually.

Changed in synaptic (Ubuntu):
status: New → Won't Fix
Changed in apt (Ubuntu):
status: New → Won't Fix
Changed in apt:
status: New → Won't Fix
Changed in synaptic:
status: Confirmed → Won't Fix
James Haigh (james.r.haigh) wrote :

Hey Michael, sorry I've not been participating recently.

I've had a good look at Aptdaemon and touched on the Python APT bindings.

This is what I envision:

A transaction's download and install parts are separated.

== Install ==

The install part should be like apt-get's '--no-download' option, so '_apply_changes' should never download files. No lock should be waiting for a download; this is wasting time. Only installs/removes/upgrades/triggers/etc should be holding a lock; the things that actually need it.

When taking the next transaction in the queue in the function '_on_transaction_done', instead of:
            next_trans = self._queue.popleft()
next_trans should be the next transaction /that has nothing to download/.
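
Something like the following is what I have in mind for that selection step. This is only a sketch: pop_next_ready and the has_pending_downloads predicate are hypothetical names, not existing Aptdaemon functions.

```python
from collections import deque


def pop_next_ready(queue, has_pending_downloads):
    """Remove and return the first transaction with nothing left to download.

    `queue` is a collections.deque of transactions and
    `has_pending_downloads` is a caller-supplied predicate. Returns None
    (and leaves the queue untouched) if every transaction is still
    waiting for files.
    """
    for index, trans in enumerate(queue):
        if not has_pending_downloads(trans):
            del queue[index]  # safe: we return before iterating further
            return trans
    return None
```

Transactions that are skipped keep their position, so they are still applied in queue order once their files arrive.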

== Download ==

apt-get's '--print-uris' option outputs lines like this:

'http://gb.archive.ubuntu.com/ubuntu/pool/universe/g/gnash/gnash_0.8.9-1ubuntu1_i386.deb' gnash_0.8.9-1ubuntu1_i386.deb 197240 MD5Sum:32996ac2d709e1c6e0b65a90272202bb

Of course all of this information can be represented in the form of a magnet URI, something like this:

magnet:?as=http%3A//gb.archive.ubuntu.com/ubuntu/pool/universe/g/gnash/gnash_0.8.9-1ubuntu1_i386.deb&dn=gnash_0.8.9-1ubuntu1_i386.deb&xl=197240&xt=urn:md5:32996ac2d709e1c6e0b65a90272202bb
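
As a sketch, the conversion from one '--print-uris' line to that magnet form can be done with the Python standard library. The function name is hypothetical, and it only handles the four-field "MD5Sum:" layout shown above:

```python
import shlex
from urllib.parse import quote


def print_uris_line_to_magnet(line):
    """Convert one line of `apt-get --print-uris` output to a magnet URI.

    Expects four fields: quoted URL, file name, size in bytes and
    "MD5Sum:<hex digest>". Other checksum labels are rejected.
    """
    url, name, size, checksum = shlex.split(line)
    label, _, digest = checksum.partition(":")
    if label != "MD5Sum":
        raise ValueError("unsupported checksum field: %r" % checksum)
    # quote() leaves "/" alone but escapes ":" as %3A, as in the example.
    return "magnet:?as=%s&dn=%s&xl=%d&xt=urn:md5:%s" % (
        quote(url), quote(name), int(size), digest)
```

Feeding it the gnash line above gives the magnet URI shown above.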

If the apt repo could also supply the BTIH, then the whole thing becomes very easy. A 'xt=urn:btih:' parameter can be added to the magnet which will be sent via RPC to Aria2 in daemon mode or deluge's server part.

This approach could also allow load balancing of mirrors by specifying multiple Acceptable Source ('as') params to the magnet, and would make it trivial to obtain packages from other machines on the LAN, making downloads up to 2 orders of magnitude faster out-of-the-box.

I prefer the idea of having only one download utility on a machine (or even a LAN), this way downloads can be reordered or paused by any RPC client.

When each file completes it is placed in /var/cache/apt/archives/ as usual, and of course when all files for a transaction complete, that transaction can be applied.

So how is this sounding? Personally, I can't wait to get this working.

James Haigh (james.r.haigh) wrote :

> Great to hear that you work on a more general fix for apt-get.

As described in comment #5, I was originally planning to make a virtual package server such that a list of files can be easily obtained from any package manager just by directing it to the server. It would then download them and place them in /var/cache/apt/archives/, then you would try again in the package manager, which would have nothing to download. I was going to wrap apt-get such that the retry is automatic. Then I noticed that instead of wrapping apt-get, there are some Python APT bindings available. And I was about to start reinventing parts of Aptdaemon.

So instead of using apt-get, I will use aptdcon and implement solutions in Aptdaemon as described in comment #7.

After I have done this, I may still apply the virtual package server idea as an Aptdaemon client. This may allow 'legacy' package managers like Synaptic to /partially/ use Aptdaemon and allow concurrent downloads/installs. Synaptic, however, will still require bug #80753 to be fixed in this case.

> Download and install in parallel is currently progressing only slowly. During the current Summer of Code we got a new ordering algorithm that can now order the packages in a way that puts them together in groups that can be installed independently.

That's interesting. Are those groups equivalent to transactions?

> You linked to "pacserve", which is quite interesting. Are you aware of the similar (but not quite the same) projects "squid-deb-proxy" and "squid-deb-proxy-client"?

No, I wasn't. My long-term aim is that more efficient downloads (from LAN if available) can be achieved without configuring or messing around with proxies/shared caches/local mirrors/etc.

So I'm preferring to contribute to Aptdaemon. It would be awesome if Ubuntu has these Aptdaemon solutions out-of-the-box. I'm also thinking that this is going to take a great deal of load off Ubuntu's servers if/when this happens.

James Haigh (james.r.haigh) wrote :

> So instead of using apt-get, I will use aptdcon and implement solutions in Aptdaemon as described in comment #7.

Huh?! I mean comment #12.

Changed in aptboost:
status: In Progress → Incomplete
James Haigh (james.r.haigh) wrote :

Michael, I've been working on implementing this in Aptdaemon. It's going quite well. I have downloads working concurrently without locking Apt.

I'm testing with Update Manager, as it's a little more verbose than Software Centre.

Download progress isn't working; I'm trying to figure out how/when/where it updates over DBus. I haven't used GObject before. I have a function to update the value but I don't know where to call it from.

I'm currently using Aria2 for downloads. It would be very easy to extend to use BitTorrent if the repos could supply a BTIH for every file.

Changed in aptdaemon:
assignee: nobody → James Haigh (james.r.haigh)
status: New → In Progress
James Haigh (james.r.haigh) wrote :

I have a refresh() function in TransactionQueue to update the progress information from Aria2, and to start install when download is finished for a Transaction. This function should be called at least every second.

Would this be done by GObject signals?
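
To show what I mean, here is a stripped-down sketch of such a refresh step, not the actual code in my branch. It assumes the status structs come from aria2's 'aria2.tellStatus' RPC method (which reports 'status', 'totalLength' and 'completedLength', the lengths as strings); the callback names are hypothetical hooks:

```python
def download_progress(status):
    """Return (percent, finished) from an aria2.tellStatus result struct."""
    total = int(status["totalLength"])
    done = int(status["completedLength"])
    percent = 100.0 * done / total if total else 0.0
    return percent, status["status"] == "complete"


def refresh(gids, tell_status, on_progress, on_finished):
    """Poll each active download once; meant to run about once per second.

    `tell_status` is any callable mapping a download GID to a status
    struct, e.g. the aria2.tellStatus method of an XML-RPC proxy.
    Finished GIDs are removed from `gids` and passed to `on_finished`.
    Returns True so that it can serve as a repeating timer callback.
    """
    for gid in list(gids):  # copy, because gids is modified in the loop
        percent, finished = download_progress(tell_status(gid))
        on_progress(gid, percent)
        if finished:
            gids.remove(gid)
            on_finished(gid)
    return True
```

My guess is that a main-loop timer (something like timeout_add_seconds in GLib) would call this every second, rather than a signal, since a timer callback that returns True is called again.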

Sebastian Heinlein (glatzor) wrote :

Hello James, I am the main author of aptdaemon.

I don't understand why you want to add a custom download tool? Apt already supports a variety of protocols. If you want to support new protocols you should target apt itself. This way all tools which are based on apt could make use of it transparently.

There is already a branch of mine in which I started separating the download bits from the cache modifying and the install ones:

lp:~aptdaemon-developers/aptdaemon/separation

Aptdaemon in this branch already allows downloading while installing other packages.

Cheers,

Sebastian

James Haigh (james.r.haigh) wrote :

Hi Sebastian.

I got progress to refresh on Thursday night. Software Centre works flawlessly for install and remove, and Update Manager works and shows progress but is missing details for files. Updating package information still works the old way, so those downloads still lock but are relatively small.

"I don't understand why you want to add a custom download tool? Apt already supports a variety of protocols."

So does Aria2. By using a dedicated download tool we save reinventing/reimplementing. Package Management in Ubuntu is moving towards Aptdaemon anyway. By all means add support in Apt, but I can't help out with that yet as my C is poor.

I'm hoping to add BitTorrent support but in any case the BTIH would be required. It would be ideal if this was supplied along with the MD5, SHA1 and SHA256 hashes.

Using Aria2 or Deluge, we already have great features like Local Peer Discovery and DHT. LPD would allow a LAN to download updates only once rather than once per Ubuntu machine on the LAN. Out-of-the-box! It would make many updates and installs a lot quicker, and save data throughput for both the LAN and Canonical's servers.

I'll look at your branch and I'm also in the process of cleaning up and preparing my branch. My code is still a bit messy because I'm inexperienced and new to Aptdaemon/Python-apt.

I'm hoping that these features will make it into Precise. I'm hoping to save some bandwidth for people, and make updates, installs and removes a lot quicker and more fluent.

Anyway, it will be a /lot/ harder to use BitTorrent if there are no BTIHs available. Could they be computed on the repo servers along with the other hashes?

Thank you.

Sebastian Heinlein (glatzor) wrote :

Hello James,

There is already squid-deb-proxy-client, which was written by Michael and allows apt proxies in the local network to be detected automatically.

I don't currently plan to merge the download separation branch for Precise since it contains some deep changes to aptdaemon and still requires some new features of apt which haven't been written yet, e.g. accessing the apt_pkg.Acquire instance of a ListUpdate.

You are very excited about your idea and have put a lot of work into getting it done, but I am not sure if this is the right direction that aptdaemon should go. So please share your code as early as possible. It is better to talk, plan and decide about features before actually writing some code. I don't want you to waste your time.

Cheers,

Sebastian

James Haigh (james.r.haigh) wrote :

While cleaning up the code, I realised how to show the details. So now details for each file work as well.

I've tried it with Software Centre and Update Manager and they both work well.

I'm using this myself now, not just testing. I'm quite happy with it. As far as I'm concerned this bug is fixed and no longer affects me.

Just need BitTorrent and LPD now. But I'm not sure where to look up the BTIHs from, as they won't be on Ubuntu's servers. (At least not for a while.) Maybe I could use DHT, but I don't know the scope of DHT or whether that's possible. Or Avahi for a LAN-only solution.

I'll file a new ticket.

The branch is branched off r661 as that is the latest version for good ol' Natty.

Changed in aptdaemon:
assignee: James Haigh (james.r.haigh) → nobody
James Haigh (james.r.haigh) wrote :

BTW, I actually started on r646 because that was the version I had installed when I started attempting to fix it.

I merged to r661 which was an update for Natty before making a branch.

I tried a later revision but it didn't work. It probably depends on a lot of new things in Apt, python-apt, etc.

I hope you don't mind that I'm developing on Natty. I'm not a fan of the changes to Unity in Oneiric, and will probably switch to Gnome 3 before upgrading.

Changed in aptboost:
status: Incomplete → Fix Committed
summary: - Downloads should not lock apt
+ Use an alternative download manager (Aria2)

Sorry, James. But I cannot merge your changes. The alternative downloader has to be optional and provide a clear advantage over the current situation. You haven't solved the situation in which other package managers try to download packages. Furthermore you have to make the changes for the latest revision of aptdaemon - that is where the development takes place. Your statement to no longer work on this doesn't make the situation better. You have to collaborate with the other developers and not just drop some piece of code and hope someone will pick it up automatically.

By the way you can even use a full gnome3 with Ubuntu oneiric. Just install the gnome meta package.

Changed in aptdaemon:
status: In Progress → Opinion
James Haigh (james.r.haigh) wrote :

"Sorry, James. But I cannot merge your changes."

That's ok. I wasn't expecting you to after you said you weren't planning to merge your own branch before Precise. I think I was a bit ambitious to hope this got into Precise.

"The alternative downloader has to be optional and provide a clear advantage over the current situation."

Yes, it will be. It will support Apt, Aria2 and Deluge, if I can find some decent documentation for Deluge's RPC.

"Furthermore you have to make the changes for the latest revision of aptdaemon - that is where the development takes place."

I'll merge it.

"Your statement to no longer work on this doesn't make the situation better."

Did I say that? Hmm. Maybe I came across like that. Sorry. I'm still working on the Aria2/Deluge support. There are a few things I need to fix.

"You have to collaborate with the other developers and not just drop some piece of code and hope someone will pick it up automatically."

Well I did ask some questions a while back and nobody answered so I started anyway. I'm not dropping the code.

BTW, I unassigned myself from Aptdaemon because you were already working on your separation branch.

'Use an alternative download manager' isn't really a problem; it's a potential solution to a couple of problems:
* 'Downloads should not lock apt'
* Bug #912402

summary: - Use an alternative download manager (Aria2)
+ Downloads should not lock Apt
Changed in aptdaemon:
status: Opinion → Confirmed