Distribution Upgrade fails to complete -- Cannot allocate memory

Bug #154493 reported by Loye Young
Affects: adept (Ubuntu)
Status: In Progress
Importance: Undecided
Assigned to: BigPick

Bug Description

This is similar to Bug 153975, though the specific facts are different.

I fire up adept-manager. Fetch updates. Version Upgrade button lights up. Click.

<snip "Fixin' to do stuff" lingo>
<snip annoying, time-consuming, and bandwidth-wasting behavior of rewriting sources.list file to require use of the official mirrors instead of local network mirror>

Information pop-up window: "Support for some applications ended" with list

clicked Close

Package Changes
"Do you want to start? 1 package is going to be removed. Close applications and documents. "
Remove libgl1-mesa

Clicked Start Upgrade. Package changes window closes, then Distribution Upgrade window closes. Nothing happens after that.

Several retries result in exactly the same behavior, every time.

The following excerpt is repeated several times in /var/log/dist-upgrade/apt.log (BTW - consideration should be given to time-stamping the log):
=================
Original exception was:
Traceback (most recent call last):
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/dist-upgrade.py", line 59, in <module>
    app.run()
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/DistUpgradeControler.py", line 1346, in run
    self.fullUpgrade()
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/DistUpgradeControler.py", line 1328, in fullUpgrade
    if not self.doDistUpgrade():
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/DistUpgradeControler.py", line 798, in doDistUpgrade
    res = self.cache.commit(fprogress,iprogress)
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/DistUpgradeCache.py", line 69, in commit
    apt.Cache.commit(self, fprogress, iprogress)
  File "/usr/lib/python2.5/site-packages/apt/cache.py", line 215, in commit
    res = self.installArchives(pm, installProgress)
  File "/usr/lib/python2.5/site-packages/apt/cache.py", line 190, in installArchives
    res = installProgress.run(pm)
  File "/usr/lib/python2.5/site-packages/apt/progress.py", line 213, in run
    pid = self.fork()
  File "/tmp/kde-loyeyoung/adept_managerSnBE6b.tmp-extract/DistUpgradeViewKDE.py", line 244, in fork
    self.child_pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
adept_manager: no process killed
adept_updater: no process killed
Starting
Starting 2
Done
MarkUpgrade() called on a non-upgrable pkg: 'kubuntu-desktop'
kdecore (KProcess): WARNING: setPty()
kdecore (KProcess): WARNING: _attachPty() 46
updateStatus: Reading cache
updateStatus: Checking package manager
updateStatus: Updating repository information
updateStatus: Checking package manager
updateStatus: Asking for confirmation
updateStatus: Fetching
updateStatus: Upgrading
Error in sys.excepthook:
Traceback (most recent call last):
  File "/tmp/kde-loyeyoung/adept_managerlC4zOb.tmp-extract/DistUpgradeViewKDE.py", line 460, in _handleException
    if not run_apport():
  File "/tmp/kde-loyeyoung/adept_managerlC4zOb.tmp-extract/DistUpgradeApport.py", line 44, in run_apport
    ret = subprocess.call(p)
  File "/usr/lib/python2.5/subprocess.py", line 443, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.5/subprocess.py", line 593, in __init__
    errread, errwrite)
  File "/usr/lib/python2.5/subprocess.py", line 1061, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
=================

I have plenty of memory. free -mt shows the following:
=================
             total       used       free     shared    buffers     cached
Mem:          1011        369        641          0          2        139
-/+ buffers/cache:        227        784
Swap:            0          0          0
Total:        1011        369        641
=================

I have plenty of disk space. df -h shows the following:
=================
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1    36G  9.7G    24G   30%  /
varrun      506M  196K   506M    1%  /var/run
varlock     506M     0   506M    0%  /var/lock
udev        506M   84K   506M    1%  /dev
devshm      506M     0   506M    0%  /dev/shm
lrm         506M   34M   472M    7%  /lib/modules/2.6.22-14-generic/volatile
==================

Both before and after the upgrade tool rewrote the sources.list <snip grumbling under breath>, aptitude full-upgrade shows:
==================
<yada, yada, yada>
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
<yada, yada, yada>
==================

do-release-upgrade shows:
==================
Checking for a new ubuntu release
No new release found
==================

cat /etc/lsb-release shows:
==================
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=7.10
DISTRIB_CODENAME=gutsy
DISTRIB_DESCRIPTION="Ubuntu 7.10"
==================

Happy Trails,

Loye Young
Isaac & Young Computer Company
Laredo, Texas
http://www.iycc.biz

Revision history for this message
BigPick (wpickard) wrote :

I can confirm this error. Identical circumstances.

Upgrade Feisty -> Gutsy using adept "Distribution Upgrade Tool". Tool goes through all steps until "Installing the upgrades", at which point the tool closes unexpectedly.

The upgrade was completed using aptitude. However, some important scripts have obviously not been run: dbus fails to initialize properly at boot and requires a manual restart to get Network Manager, guidance-power-manager and HAL-related operations to run properly.

Additionally, adept continues to show the "Version Upgrade" button even though aptitude full-upgrade and /etc/lsb-release report the upgrade is complete as mentioned above.

Revision history for this message
BigPick (wpickard) wrote :

UPDATE:

The dbus problem was resolved. At some point during the upgrade the /etc/rc2.d/S12dbus link was changed to /etc/rc2.d/S50dbus. The /etc/rc2.d/S10acpid link was also changed to /etc/rc2.d/S50acpid. These changes resulted in the dbus and acpid init scripts being called too late in the boot process.
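To check whether the links have been renumbered, the start order can be read straight from the link names. Below is a minimal sketch, assuming the usual SysV naming (S or K, a two-digit sequence number, then the service name). The function names and the `S20example` link are hypothetical and not part of any Ubuntu tool.

```python
import re

# "S12dbus" -> action "S", sequence 12, service "dbus"
_LINK = re.compile(r"^([SK])(\d{2})(.+)$")


def parse_rc_link(name):
    """Split an rc?.d link name into (action, sequence, service), or None."""
    match = _LINK.match(name)
    if match is None:
        return None
    action, seq, service = match.groups()
    return action, int(seq), service


def start_order(names):
    """Return the services with S links, in the order they would be started."""
    parsed = [p for p in map(parse_rc_link, names) if p and p[0] == "S"]
    return [service for _, _, service in sorted(parsed, key=lambda p: (p[1], p[2]))]


# Usage on a real system (the path is taken from the comment above):
#   import os
#   print(start_order(os.listdir("/etc/rc2.d")))
```

With the renumbered links, `start_order(["S50dbus", "S20example", "S10acpid"])` puts dbus after the hypothetical `example` service, which is the "too late" ordering described above.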

At first I thought this was a freak accident. However, I was surprised to find that these changes were repeated upon running the "Distribution Upgrade Tool" again. I am at a loss to explain why this is occurring.

Revision history for this message
Loye Young (loyeyoung) wrote : Re: [Bug 154493] Re: Distribution Upgrade fails to complete -- Cannot allocate memory

You say that the dbus problem was "resolved". Were you able to successfully
complete after changing the links, or did the problem resurrect when trying
again?

--
Loye Young
Isaac & Young Computer Company
Laredo, Texas
(956) 857-1172
<email address hidden>

Revision history for this message
BigPick (wpickard) wrote :

My apologies, I was not clear.

I was not able to complete the "Distribution Upgrade Tool" after changing the links; however, I was able to boot the system successfully. At this point the system is running perfectly, including compiz fusion, which is a pleasant surprise considering the amount of pain and agony usually required to get this laptop fully operational. The only remaining side effect is that adept still shows the "Version Upgrade" button.

Sadly, clicking on that "Version Upgrade" button still results in the tool telling me that there are no upgrades available, asking me to file a bug report, then closing unexpectedly, leaving behind the broken links. It's a vicious cycle. I'm just going to save myself some pain for now and refrain from pressing that little magic button of agony.

The broken links may be an oddity of my system only. I have never heard of this occurring before. But the failure of the "Distribution Upgrade Tool" remains almost identical to what you described.

Revision history for this message
Loye Young (loyeyoung) wrote : [Bug 154493] Distribution Upgrade fails to complete -- Cannot allocate memory

On Monday, October 22, 2007 8:04:09 am BigPick wrote:
> I have never heard
> of this occurring before. But the failure of the "Distribution Upgrade
> Tool" remains almost identical to what you described.

I haven't had great luck with either the Distribution Upgrade Tool or the
do-release-upgrade script. My sense is that it works just peachy if your
system is configured simply and connected to the internet in the usual way,
but various situations cause it to fail.

Revision history for this message
BigPick (wpickard) wrote :

I have identified the error and written a patch to the dist-upgrade scripts.

The error is very serious in nature and is likely causing many other bugs, crashes and corruptions. As indicated in the error logs, the "Distribution Upgrade Tool" exits due to an out-of-memory error when trying to fork a child process. This low-memory condition is caused by an infinite loop in the "doUpdate()" method in the DistUpgradeControler class. The infinite loop, in turn, is a result of an incomplete "markUpgrade()" method in "/usr/lib/python2.5/site-packages/apt/package.py". The loop counter is incremented only when an IOError exception is caught, but that exception is not raised as expected. The loop is designed to terminate if no exception is raised, but an exception of a type other than IOError is sometimes raised instead, resulting in an endless loop.

To fix these issues, the patch replaces the reference to the stub "markUpgrade()" method with "markInstall()". While not ideal, the result is the same.

Additionally, I added some robustness to the maxRetries loops in DistUpgradeControler.py by converting them to 'for' loops. A fall-back 'except' was also added to the loops purely for the purposes of debugging.
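For anyone curious what a bounded retry loop with a fall-back 'except' looks like, here is a minimal sketch. It is not the code from the patch: `commit_with_retries`, `commit` and `max_retries` are hypothetical names, and it is written for current Python 3 rather than the Python 2.5 the scripts target.

```python
def commit_with_retries(commit, max_retries=3):
    """Call commit() at most max_retries times; return True on success."""
    # A 'for' loop bounds the number of attempts no matter what is raised.
    for attempt in range(max_retries):
        try:
            commit()
        except IOError as err:
            # The error type the retry logic was written to expect.
            # Note: on Python 3, IOError is an alias of OSError, so an
            # "[Errno 12]" OSError lands here; on Python 2.5 they were
            # separate types.
            print("IOError on attempt %d: %s" % (attempt, err))
        except Exception as err:
            # Fall-back: any other exception still uses up one attempt,
            # so the loop cannot spin forever.
            print("Unexpected %s on attempt %d: %s"
                  % (type(err).__name__, attempt, err))
        else:
            return True
    return False
```

Because the attempt counter belongs to the 'for' statement and not to one particular 'except' branch, an unexpected exception type can no longer keep the loop alive indefinitely.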

To use this patch, download the Gutsy dist-upgrade script package from http://archive.ubuntu.com/ubuntu/dists/gutsy/main/dist-upgrader-all/0.81/gutsy.tar.gz, extract it and apply the patch. Then just navigate to the extracted directory and run "./dist-upgrade.py".

Revision history for this message
Loye Young (loyeyoung) wrote : Re: [Bug 154493] Re: Distribution Upgrade fails to complete -- Cannot allocate memory

On Monday, October 22, 2007 4:01:03 pm BigPick wrote:
> To use this patch, download the Gutsy dist-upgrade script package from
> http://archive.ubuntu.com/ubuntu/dists/gutsy/main/dist-upgrader-all/0.81/gutsy.tar.gz,
> extract and apply the patch. Then just navigate to the extracted directory
> and run "./dist-upgrade.py".
>
When you say "extract and apply the patch," what do you mean? Do you mean
simply extract, or is there an extra step to "apply"?

After testing, will the patched file be automagically downloaded as part of
the normal updating and upgrading process?

Revision history for this message
BigPick (wpickard) wrote :

An example will help. What we want to do is manually download the "dist-upgrade" scripts from the archives without adept. We are then going to patch this script, and run it locally. Adept does the exact same thing, except it stores the "dist-upgrade" scripts in a temporary folder that is removed once the scripts exit. We need the scripts to stick around.

(I also just realized I misspelled the patch name. Oh well.)

Example:
~$ wget http://archive.ubuntu.com/ubuntu/dists/gutsy/main/dist-upgrader-all/0.81/gutsy.tar.gz
~$ mkdir gutsy
~$ cd gutsy
~/gutsy$ tar xzvf ../gutsy.tar.gz
~/gutsy$ wget http://launchpadlibrarian.net/10123821/gusty-dist-upgrade.patch
~/gutsy$ patch -p1 < gusty-dist-upgrade.patch
~/gutsy$ sudo ./dist-upgrade.py

Revision history for this message
BigPick (wpickard) wrote :

Bug described is a dangerous memory leak in dist-upgrade.
Bug confirmed and reproducible.

Possible patch uploaded.

Changed in adept:
assignee: nobody → wpickard
status: New → In Progress

Revision history for this message
Christian Assig (chrassig) wrote :

After applying BigPick's patch and running dist-upgrade.py, I still get an "[Errno 12] Cannot allocate memory" message.
The only differences are that the message appears in a message box now (I had to take a look at the log files before), and after the error message the script continues by trying to restore the previous state of the system.

Revision history for this message
BigPick (wpickard) wrote :

Well, it's a step in the right direction. The program is now failing more gracefully and is at least able to attempt recovery.

This indicates that the infinite loop has been resolved, but the memory leak persists. I'm going to take a closer look at how the scripts are handling child processes.

Revision history for this message
BigPick (wpickard) wrote :

Hey Christian, could you do me the favor of posting the logs of your latest attempt? I'm having trouble finding the source of the error you described.

Many thanks for helping fix this bug!

Revision history for this message
Christian Assig (chrassig) wrote :

No worries at all, here you go

Revision history for this message
BigPick (wpickard) wrote :

Well, those logs look very promising. As expected, the patch stops the script from going into an infinite loop; however, it did not solve the memory leak.

2007-10-23 18:34:55,490 INFO cache.commit()
2007-10-23 18:34:59,375 ERROR Unhandled exception in cache.commit(): '[Errno 12] Cannot allocate memory'. Retrying (currentRetry: 0)
2007-10-23 18:34:59,375 INFO cache.commit()
2007-10-23 18:35:01,607 ERROR Unhandled exception in cache.commit(): '[Errno 12] Cannot allocate memory'. Retrying (currentRetry: 1)
2007-10-23 18:35:01,607 INFO cache.commit()
2007-10-23 18:35:02,279 ERROR Unhandled exception in cache.commit(): '[Errno 12] Cannot allocate memory'. Retrying (currentRetry: 2)

This clearly shows that somewhere, something is raising exceptions of a type other than "SystemError", which most of the try/except checks are looking for. So I have gone ahead and updated the patch to switch some of the try/except blocks to catch the more general "Exception" type instead of the specific "SystemError" type. Not all have been changed, only those that I suspect are behaving badly. This serves the dual purpose of helping the script recover from these exceptions earlier and logging them properly.
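The difference between the two handlers can be sketched as follows. This is an illustration only, not an excerpt from the patch: `commit_once` and `out_of_memory` are hypothetical names, and the log format is simply chosen to resemble the lines quoted above.

```python
import logging

# Timestamped records, similar in shape to the log lines quoted above.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s",
                    level=logging.INFO)


def commit_once(commit, current_retry):
    """Run commit() once and report which handler caught a failure."""
    logging.info("cache.commit()")
    try:
        commit()
    except SystemError as err:
        # The narrow handler: OSError is not a SystemError, so the
        # "[Errno 12]" failure never reaches this branch.
        logging.error("SystemError in cache.commit(): '%s'", err)
        return "SystemError"
    except Exception as err:
        # The general handler added as a safety net: it catches and logs
        # whatever the narrow handler missed.
        logging.error("Unhandled exception in cache.commit(): '%s'. "
                      "Retrying (currentRetry: %d)", err, current_retry)
        return "Exception"
    return None


def out_of_memory():
    # Stand-in for the failing os.fork() seen in the tracebacks.
    raise OSError(12, "Cannot allocate memory")
```

Calling `commit_once(out_of_memory, 0)` goes through the general handler, which is why the error only shows up in the log once "Exception" is caught.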

Unfortunately, I am unable to replicate this error on my machines, so I would greatly appreciate it if you would download this new patch and, going through the same procedure as in the above example, test the upgrade utility again. This particular patch probably won't fix the issue, but it will help identify what we need to do.

When applying the patch, I recommend using a fresh extraction of gutsy.tar.gz; when the upgrade tool runs, lots of changes are made to the directory that might freak out the patch.

Revision history for this message
Christian Assig (chrassig) wrote :

Logs based on BigPick's patch rev 2

Revision history for this message
John (jglabbee-gmail) wrote :

Patch solved my stalled upgrade. Logs attached as requested.
