Automatic translations export fails intermittently

Bug #648075 reported by Stefan Schweizer on 2010-09-26
This bug affects 2 people
Affects: Launchpad itself
Importance: Critical
Assigned to: Aaron Bentley

Bug Description

Sometimes translations export to branch will stop working for a particular project. At the moment we have ~20 projects in that state.

Our export would fail with one of two errors:

 * ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
 * ERROR Failure: ValueError("We are missing inventories for revisions: [StaticTuple(...)]",)

Example branches where this still happens: lp:pino, lp:~bjorn-lindeijer/mana/translations-export

However, it seems this all started a few months ago (judging from a very small sample I looked at closely), so perhaps it's a codehosting problem that has since been fixed.

Workaround for branch owners: commit to your branch to get it to be scanned again. (commit --unchanged should be fine)

Related OOPSes: OOPS-1875TEB1, OOPS-1875TEB15, OOPS-1922BZR14871, OOPS-1922BZR19481


We seem to be consistently getting problems committing to your branch since August 4th:

2010-08-04 03:52:44 INFO Exporting Trackservice trunk series.
2010-08-04 03:52:45 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
...
2010-09-22 05:44:02 INFO Exporting Trackservice trunk series.
2010-09-22 05:44:03 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-09-24 05:24:50 INFO Exporting Trackservice trunk series.
2010-09-24 05:24:51 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-09-25 05:26:19 INFO Exporting Trackservice trunk series.
2010-09-25 05:26:20 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-09-26 05:27:51 INFO Exporting Trackservice trunk series.
2010-09-26 05:27:52 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-09-27 05:25:56 INFO Exporting Trackservice trunk series.
2010-09-27 05:25:57 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)

Provided you don't hold any locks on the branch, we'll need to have our LP Code guys investigate this problem. Can you try committing and pushing the branch yourself, or using "bzr break-lock" to see if there are any active locks?

Once you provide information about how this goes, please update the bug report back to 'New' so we act on it asap.

Changed in rosetta:
status: New → Incomplete

I do not hold a lock on the branch and I just pushed it with no changes (bzr commit --unchanged). Maybe that helps; I'll keep you posted. Thanks.

It seems to have worked: sorry for the trouble it caused. I guess the branch was just in a weird state recorded in Launchpad (there were some changes in code-hosting over the last month or two). We'll keep an eye on how many other projects it might affect.

Yes, thanks. The translations are now up-to-date.


We are seeing failures on other projects as well:

2010-12-22 04:41:49 INFO Exporting Ubuntu support points map trunk series.
2010-12-22 04:41:59 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 04:41:59 INFO Exporting onBoard trunk series.
2010-12-22 04:42:34 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:47:18 INFO Unable to obtain lock held by codehost
2010-12-22 05:47:47 ERROR Failure: LockContention(Could not acquire lock "(local)": )
--
2010-12-22 05:48:16 INFO Exporting Déjà Dup 10 series.
2010-12-22 05:48:37 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:48:37 INFO Exporting PulseAudio Mixer Applet trunk series.
2010-12-22 05:48:39 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:48:56 INFO Exporting Lucruri trunk series.
2010-12-22 05:49:16 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:51:02 INFO Exporting AUSTRUMI trunk series.
2010-12-22 05:51:19 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:51:20 INFO Exporting chive trunk series.
2010-12-22 05:51:48 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:51:49 INFO Exporting onBoard 0.92 series.
2010-12-22 05:51:56 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:53:04 INFO Exporting Mana 1.0 series.
2010-12-22 05:55:37 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:55:40 INFO Exporting TestDrive trunk series.
2010-12-22 05:56:05 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:56:06 INFO Exporting YouAmp maemo4 series.
2010-12-22 05:56:10 ERROR Failure: ValueError("We are missing inventories for revisions: [StaticTuple('codehost@crowberry-20091030083223-p0lrrb3o9l5450m0',), StaticTuple('<email address hidden>',), StaticTuple('<email address hidden>',), StaticTuple('<email address hidden>',), StaticTuple('<email address hidden>',)]",)
--
2010-12-22 05:56:13 INFO Exporting Boots trunk series.
2010-12-22 05:56:15 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:57:37 INFO Unstacking branch to work around bug 375013.
2010-12-22 05:57:38 ERROR Failure: ValueError("We are missing inventories for revisions: [StaticTuple('<email address hidden>',), StaticTuple('<email address hidden>',)]",)
--
2010-12-22 05:57:40 INFO Exporting Pino trunk series.
2010-12-22 05:57:44 ERROR Failure: ConcurrentUpdateError('Branch has been changed. Not committing.',)
--
2010-12-22 05:57:45 INFO Exporting Cover Thumbnailer v0.7 series.
2010-12-22 05:57:50 ERROR Failure: ConcurrentUpdateError('Branch has been chang...


Changed in launchpad:
status: Incomplete → Triaged
importance: Undecided → Medium
summary: - Automatic export stopped working
+ Automatic translations export stopped working
description: updated
tags: added: branch-scanner

I tried the suggested workaround of "bzr commit --unchanged" but got the following error. Is that normal?

"""
Unable to obtain lock held by codehost
at crowberry [process #3483], acquired 2097 hours, 7 minutes ago.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)": bzr+ssh://bazaar.launchpad.net/~mterry/deja-dup/trunk-translations/
"""

No, it means that something is holding a lock, and that you hit a different problem. Unfortunately, I am not sure I can find someone who can help debug this at this time.

Changed in launchpad:
importance: Medium → High
Ursula Junque (ursinha) on 2011-02-21
description: updated
tags: added: oops
Changed in launchpad:
importance: High → Critical
Robert Collins (lifeless) wrote :

danilos - that's a stale lock - open for 2000 hours - probably a failed backend process that didn't clean up properly. break-lock on the branch should fix it.

Processes that get killed as part of the rollout usually do not "clean up properly". I am pretty sure one of the culprits is the translations-export-to-branch.py script. What's more, it sometimes (though very rarely) gets killed by OOM, which is definitely not something we can anticipate from inside the script, so when that happens we'd need a notification from operations about it (if they can arrange that).

Other branch-handling processes can put it into an indeterminate state, and we still rely on LP bazaar-experts to fix those cases when they occur.

On Wed, Feb 23, 2011 at 1:27 AM, Данило Шеган <email address hidden> wrote:
> Processes that get killed as part of the rollout usually do not "clean
> up properly".

Any reason why not? We kill things in a staggered process, starting by
simply disabling cron, and only actually killing at the last possible
moment. We can make sure we start with SIGINT that will trigger stack
unwinding.

> I am pretty sure one of the culprits is translations-
> export-to-branch.py script. And what's else, it sometimes (though very
> rarely) gets killed by OOM, which is definitely not something we can
> anticipate from inside the script, so when that happens, we'd need
> notification from operations about it (if they can arrange that).

We can put a ulimit on it - that will cause a MemoryError exception to
unwind the stack, rather than the process being hard-killed.
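
To illustrate the two suggestions above (SIGINT before any hard kill, and a ulimit so that memory exhaustion surfaces as MemoryError), here is a minimal Python sketch. It assumes a Unix host. `FakeBranchLock`, `held` and `cap_address_space` are hypothetical names made up for this example, not Launchpad or Bazaar APIs. The point is only that KeyboardInterrupt and MemoryError both unwind through `finally`, whereas SIGKILL or the kernel OOM killer would not.

```python
import contextlib
import resource


class FakeBranchLock:
    """Hypothetical stand-in for a branch lock; not a Bazaar API."""

    def __init__(self):
        self.locked = False

    def acquire(self):
        self.locked = True

    def release(self):
        self.locked = False


@contextlib.contextmanager
def held(lock):
    """Hold the lock and release it however the body exits."""
    lock.acquire()
    try:
        yield lock
    finally:
        # Runs for KeyboardInterrupt (SIGINT) and MemoryError alike,
        # but never for SIGKILL or the kernel OOM killer.
        lock.release()


def cap_address_space(max_bytes):
    """Lower the soft RLIMIT_AS so oversized allocations fail in-process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the hard limit.
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
```

A script would call `cap_address_space(...)` once at startup and wrap its branch work in `with held(lock):`. The appropriate limit depends on the host and is not something this thread specifies.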

description: updated
Aaron Bentley (abentley) on 2011-05-18
Changed in launchpad:
assignee: nobody → Aaron Bentley (abentley)
Aaron Bentley (abentley) on 2011-05-18
summary: - Automatic translations export stopped working
+ Automatic translations export fails intermittently
Aaron Bentley (abentley) wrote :

All the evidence suggests that this is a transient problem, so "stopped working" isn't accurate. I've grovelled through the logs, and every branch that had a failure later succeeded, except for failures that happened today (and presumably haven't had a chance to succeed yet).

There's also no reason to believe that the ConcurrentUpdateError is inaccurate. The problem is just how we handle it. I believe we should retry the job instead of oopsing, and only oops if we exceed max_retries.
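
The retry idea in this comment can be sketched as follows. This is an illustration only: the `ConcurrentUpdateError` class here is a local stand-in for the error seen in the logs, and `run_with_retries`, `export` and `report_oops` are hypothetical names, not Launchpad's job API.

```python
class ConcurrentUpdateError(Exception):
    """Local stand-in for the error seen in the logs above."""


def run_with_retries(export, max_retries=3, report_oops=print):
    """Call export(); retry on ConcurrentUpdateError up to max_retries times.

    Returns True on success. If every attempt fails, the last error is
    reported once through report_oops and False is returned.
    """
    last_error = None
    for _attempt in range(1 + max_retries):
        try:
            export()
            return True
        except ConcurrentUpdateError as error:
            # Transient by assumption: remember it and try again.
            last_error = error
    report_oops(last_error)
    return False
```

Only the final failure produces a report, so a branch that succeeds on the second or third attempt generates no OOPS at all.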

Aaron Bentley (abentley) wrote :

Further investigation:
1. This isn't handled by jobs, it's handled by polling, so transient failures are corrected on the next script run.
2. There was a bug in my analysis. With that corrected, some branches do indeed appear to have long-term failures:
2011-04-15 04:50:49 INFO Exporting onBoard trunk series.
2011-04-15 05:08:02 INFO Exporting Déjà Dup 10 series.
2011-04-15 05:08:27 INFO Exporting PulseAudio Mixer Applet trunk series.
2011-04-15 05:08:47 INFO Exporting Lucruri trunk series.
2011-04-15 05:26:27 INFO Exporting AUSTRUMI trunk series.
2011-04-15 05:26:51 INFO Exporting chive trunk series.
2011-04-15 05:27:29 INFO Exporting onBoard 0.92 series.
2011-04-15 05:36:57 INFO Exporting Mana 1.0 series.
2011-04-15 05:39:43 INFO Exporting TestDrive trunk series.
2011-04-15 05:40:33 INFO Exporting Boots trunk series.
2011-04-15 05:40:57 INFO Exporting Pino trunk series.
2011-04-15 05:41:03 INFO Exporting Cover Thumbnailer v0.7 series.
2011-04-15 05:41:56 INFO Exporting perroquet 1.0 series.
2011-04-15 05:51:28 INFO Exporting Humanity Project trunk series.
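
For readers who want to repeat this kind of log analysis, here is a rough Python sketch. It assumes log excerpts in the format quoted earlier in this thread ("INFO Exporting <name> series." optionally followed by "ERROR Failure: ...", with "--" separating grep-style excerpts). `summarise` and `never_succeeded` are hypothetical helper names, and this is not the script that was actually used.

```python
import re

EXPORTING = re.compile(r"INFO Exporting (?P<series>.+) series\.$")


def summarise(log_lines):
    """Map each series name to a list of per-run outcomes (True = success)."""
    outcomes = {}
    current = None
    for line in log_lines:
        line = line.strip()
        match = EXPORTING.search(line)
        if match:
            current = match.group("series")
            # Assume success until an ERROR line says otherwise.
            outcomes.setdefault(current, []).append(True)
        elif "ERROR Failure:" in line and current is not None:
            # The failure belongs to the most recent "Exporting" line.
            outcomes[current][-1] = False
            current = None
        elif line == "--":
            # Excerpt separator: whatever follows is a new excerpt.
            current = None
    return outcomes


def never_succeeded(outcomes):
    """Series that failed in every run seen in the log."""
    return sorted(name for name, runs in outcomes.items() if not any(runs))
```

Errors that do not follow an "Exporting" line, such as the LockContention entry quoted earlier, are ignored here rather than attributed to a series.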

Aaron Bentley (abentley) wrote :

Many of these (Cover Thumbnailer, Pino, Boots) have branches whose database last_scanned_id is out-of-sync with the actual last_revision in the hosted branch. The last_mirrored_id appears to match the last_scanned_id, because lp:~troorl/pino/trunk does not suggest it is syncing. Unfortunately, last_mirrored_id is not currently exposed in the web service, so it is difficult to be certain.

Aaron Bentley (abentley) wrote :

Testing shows that, at least on the happy path, the script is updating last_mirrored_id. So the cause of the suspected stale last_mirrored_id is unknown. But we can at least fix the behaviour when a stale last_mirrored_id is encountered.
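
The thread does not say what the committed fix actually does. Purely as an illustration of the condition being discussed, the following Python sketch compares the recorded ids with the tip of the hosted branch and corrects a stale last_mirrored_id instead of failing. `BranchRecord`, `find_stale_ids` and `refresh_mirrored_id` are hypothetical; only the field names come from the comments above.

```python
from dataclasses import dataclass


@dataclass
class BranchRecord:
    """Hypothetical view of the database row; field names follow the thread."""
    last_scanned_id: str
    last_mirrored_id: str


def find_stale_ids(record, hosted_tip):
    """Return the names of recorded ids that differ from the hosted tip."""
    stale = []
    if record.last_scanned_id != hosted_tip:
        stale.append("last_scanned_id")
    if record.last_mirrored_id != hosted_tip:
        stale.append("last_mirrored_id")
    return stale


def refresh_mirrored_id(record, hosted_tip):
    """Trust the hosted branch over a stale last_mirrored_id.

    Returns True if the record had to be corrected.
    """
    if record.last_mirrored_id == hosted_tip:
        return False
    record.last_mirrored_id = hosted_tip
    return True
```

Whether correcting the record in place is safe depends on what else reads last_mirrored_id, which this sketch does not attempt to model.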

Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Aaron Bentley (abentley) on 2011-05-30
tags: added: qa-untestable
removed: qa-needstesting
William Grant (wgrant) on 2011-06-09
Changed in launchpad:
status: Fix Committed → Fix Released