TranslationBranchApprover uses 'walkdirs', will be very slow on large bzr trees

Bug #352944 reported by Henning Eggers
2
Affects Status Importance Assigned to Milestone
Launchpad itself
Triaged
High
Unassigned

Bug Description

On 2009-03-27 on #launchpad-code
[13:38] <abentley> henninge: Also, I think you should not be calling to_tree.walkdirs in _init_translation_file_lists. That can be slow for big trees, and you only care about changed files.
[13:38] <henninge> abentley: No, I need a full list.
[13:39] <abentley> henninge: Oh, why?
[13:39] <henninge> abentley: for the approver to work.
...
[13:44] <abentley> henninge: There are other ways, I'm not sure if they're easier or harder. Tree.iter_entries_by_dir comes to mind.
[13:45] <abentley> henninge: But why does the approver want to know about unchanged files?
[13:47] <henninge> abentley: to get the full picture and to know which of the existing database entries have corresponding files in the tree.
[13:47] <henninge> abentley: to catch template files being moved or renamed.
[13:48] <abentley> henninge: Moved or renamed counts as "changed"
[13:49] <henninge> abentley: If done properly using "bzr move"
[13:49] <henninge> yeah, maybe we should expect people to do it properly...
[13:49] <abentley> henninge: If done improperly, it's a create and a delete, and those definitely count as changed.
[13:50] <henninge> abentley: will I see deleted files?
[13:50] <abentley> iter_changes will tell you about deleted files.
[13:51] <abentley> It will say they went from versioned to unversioned, and from kind='file' to kind=None
[13:51] <henninge> but I am only using the to_tree information ...
[13:51] <henninge> so those will not be treated by my code.
[13:51] <abentley> henninge: You are doing to_tree.iter_changes( from_tree, specific_files=file_names)
[13:52] <henninge> abentley: yes, those file_names I gathered during walktree
[13:52] <abentley> henninge: So stop doing that.
[13:52] <henninge> ;-)
[13:52] <abentley> Just do normal iter_changes.
[13:53] <abentley> henninge: Sorry to be pushy about this, but whole-tree operations are a scalability problem.
[13:55] <henninge> abentley: that changes the whole concept of the approver so I will have to think about it.
[13:55] <flacoste> salgado: but yeah, a box with jaunty is a good idea anyway
[13:55] <abentley> henninge: It's certainly something you can fix in a future release.
[13:56] <henninge> abentley: so the scalabilty issue is not imminent?
[13:57] <abentley> henninge: I have no idea. We'll see when it hits production.
[13:57] * henninge shudders.
[13:57] <abentley> You have a much better idea than I how often these jobs will run against how many branches.
[13:59] <henninge> abentley: well, the feature is new so I have the hope that it will not be used widely at first.
[13:59] <abentley> henninge: It must be explicitly enabled? We're fine, then.
[13:59] <henninge> abentley: but it is also a cool feature so people may jump to it...
[13:59] <henninge> abentley: oh yeah, it must!

Revision history for this message
Henning Eggers (henninge) wrote :

If the approver cannot gather information about all template files in the source tree from bazaar it must get this information from the database. Two solution come to mind:

1. It assumes that each potemplate entry in the database has been uploaded from the tree at one point and therefore corresponds to a template file in the source tree. So this list plus added files that iter_changes reports make up the full list of template files. This implies that manual uploads must be prohibited when import from branches is active to avoid that the owner breaks the above assumption by uploading templates. This may be viewed by the user as a limitation.

2. We add a flag to potemplate to indicate its origin. The approver will only consider those templates that are flagged as having been imported from the branch. For new templates arriving in the branch it checks for an existing template of the same name without the flag and uses that (while setting the flag) instead of creating a new template. This way existing templates can be re-used if the project switches to import from branches. If the owner chooses to deactivate import from branches, all flags are removed.

Revision history for this message
Данило Шеган (danilo) wrote :

Unless we disallow changes to the "known-good tree listing" (i.e. the one in the DB) through other means than bzr tree changes (eg. by uploads, admin interface), you must have a full tree to be able to compare.

For auto-approver to be improved even further than what it is now, we might end up needing to get a full list anyway (i.e. incremental changes don't describe the tree well, since we don't have a trusted copy anywhere).

Changed in rosetta:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Robert Collins (lifeless) wrote :

I'm unclear why unmodified files matter to the approver? This seems to make a significant conceptual difference to the work required. Can someone cluey fill me in?

Changed in launchpad:
importance: Medium → High
summary: - TranslationBranchApprover should not use 'walkdirs'
+ TranslationBranchApprover uses 'walkdirs', will be very slow on large
+ bzr trees
Curtis Hovey (sinzui)
tags: added: translations-branch
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.