Activity log for bug #622765

Date Who What changed Old value New value Message
2010-08-23 14:40:33 Michael Nelson bug added bug
2010-08-23 14:43:31 Michael Nelson description
Old value: With 10.07 we fixed bug 588288 which allowed us to set the maximum number of lines of (each) log file that will be parsed. Initially we'd thought this would help us solve the memory issue when running the parser against the backlog of PPA access logs, but after trialling with logparser_max_parsed_lines set to 100, the PPA log parser still had to be killed as it consumed too much memory. Going back to the code, there are a lot of other improvements that could be made. One that stands out is that currently *all* log files with new lines to parse are opened during get_files_to_parse(), being returned in a dict. This means that when we run the parser against the backlog of PPA access files, there are over 2600 files being opened at once. It would be great to instead use a generator, and limit the number of lines processed across all files, rather than for each file. Note: there is also a comment related to the librarian logfile parser in the docstring at cronscripts/parse-librarian-apache-access-logs.py which, applied to the PPA log file parser, implies that we could additionally update the script to clear the Storm cache (store._cache.clear()) at some regular interval (such as after each file is processed). This will reduce the benefit of the cache of course, but will limit the amount of RAM Storm consumes during the process.
New value: With 10.07 we fixed bug 588288 which allowed us to set the maximum number of lines of (each) log file that will be parsed. Initially we'd thought this would help us solve the memory issue when running the parser against the backlog of PPA access logs, but after trialling with logparser_max_parsed_lines set to 100, the PPA log parser still had to be killed as it consumed too much memory. Going back to the code, there are a lot of other improvements that could be made.
One that stands out is that currently *all* log files with new lines to parse are opened during get_files_to_parse(), being returned in a dict. This means that when we run the parser against the backlog of PPA access files, there are over 2600 files being opened at once. It would be great to instead use a generator, and limit the number of lines processed across all files, rather than for each file. Note: there is also a comment related to the librarian logfile parser in the docstring at cronscripts/parse-librarian-apache-access-logs.py which, applied to the PPA log file parser, implies that we could additionally update the script to clear the Storm cache (store._cache.clear()) at some regular interval (such as after each file is processed). This will reduce the benefit of the cache of course, but will limit the amount of RAM Storm consumes during the process. Also, with this knowledge, we can QA such a change on dogfood or locally by simply making many copies of the log file that we have before running the script.
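A minimal sketch of the generator approach proposed in the description above, not the code that landed for this bug. It assumes plain-text log files; the names iter_log_lines, parse_with_global_limit and after_each_file are hypothetical and do not come from the Launchpad tree. Only one file is open at a time, the line limit applies across all files rather than per file, and the after_each_file hook marks where a cache-clearing call such as the store._cache.clear() mentioned in the description could go.

```python
import itertools


def iter_log_lines(paths):
    """Yield (path, line) pairs, keeping only one log file open at a time."""
    for path in paths:
        with open(path) as log_file:
            for line in log_file:
                yield path, line.rstrip("\n")


def parse_with_global_limit(paths, max_parsed_lines, after_each_file=None):
    """Collect at most max_parsed_lines lines in total across all files.

    after_each_file is a hypothetical hook, called with the path of each
    file once the parser has moved past it or stopped inside it.
    """
    parsed = []
    current = None
    lines = iter_log_lines(paths)
    try:
        for path, line in itertools.islice(lines, max_parsed_lines):
            if current is not None and path != current and after_each_file:
                after_each_file(current)
            current = path
            parsed.append((path, line))
    finally:
        # Closing the generator exits its "with" block, so the file that
        # was open when the limit was reached is closed promptly.
        lines.close()
    if current is not None and after_each_file:
        after_each_file(current)
    return parsed
```

Called as parse_with_global_limit(paths, 100), this would stop after 100 lines no matter how many files are pending, so a backlog of thousands of files never has more than one open file handle.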
2010-08-23 16:45:00 Tom Haddon tags canonical-losa-lp
2010-08-23 16:45:10 Tom Haddon bug added subscriber Canonical LOSAs
2010-08-27 18:39:02 Benji York launchpad-foundations: assignee Benji York (benji)
2010-08-27 18:40:44 Benji York launchpad-foundations: status New In Progress
2010-08-27 18:44:23 Benji York branch linked lp:~benji/launchpad/bug-622765
2010-08-29 07:47:32 Launchpad QA Bot launchpad-foundations: milestone 10.09
2010-08-29 07:47:34 Launchpad QA Bot tags canonical-losa-lp canonical-losa-lp qa-needstesting
2010-08-29 07:47:36 Launchpad QA Bot launchpad-foundations: status In Progress Fix Committed
2010-08-31 18:52:44 Gary Poster launchpad-foundations: assignee Benji York (benji) Michael Nelson (michael.nelson)
2010-09-01 21:58:39 Benji York bug added subscriber Benji York
2010-09-07 14:16:51 Julian Edwards tags canonical-losa-lp qa-needstesting canonical-losa-lp qa-ok
2010-09-09 12:44:57 Curtis Hovey launchpad-foundations: status Fix Committed Fix Released