Complete base performance test on GHO

Bug #484779 reported by Philippe Boucher
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Global Health Observatory
Fix Released
Critical
Jiri Dvorak

Bug Description

Basic performance testing needs to be run on GHO to ascertain performance with respect to table renders and system load. The current required performance level is based on a rough analysis of the WHOSIS performance logs done back in early summer 2009: we had decided that 50 000 table renders, evenly distributed over a 24-hour period, was acceptable, as this is roughly what we tend to see on WHOSIS.

Changed in gho:
assignee: nobody → Knut Staring (knutst)
milestone: none → 1.0
importance: Undecided → High
Revision history for this message
Philippe Boucher (boucherp) wrote :

Performance testing approach:

Analysis of a connection to the GHO through Firebug on Firefox shows that on the client side, the download and rendering of a data table for an indicator on the left navigation menu is done through a POST to http://host/ghodata/OlapPrint, which then returns HTML describing the requested table. I've collected all the POST texts required for the tables in World Health Statistics and written a script to download them sequentially. It's very basic for the moment: the script loops 50 times and calls each entry in the WHS in succession (it does not do simultaneous downloads yet). An archive with the script and the POST files is attached; Cygwin or Linux is required, with sh/bash and wget installed. The performance script is perf.sh
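For reference, the loop in perf.sh is roughly the following. This is a hypothetical reconstruction, not the attached script; the function name, host URL, and file naming are assumptions:

```shell
#!/bin/sh
# Sketch of what perf.sh does (a reconstruction, not the attached script).
# Each post-*.txt file is assumed to hold one "mdx=..." body captured via Firebug.
replay_tables() {
  host=$1; passes=$2; shift 2
  pass=1
  while [ "$pass" -le "$passes" ]; do
    for post in "$@"; do
      # POST the saved body to OlapPrint and discard the returned HTML
      wget -q -O /dev/null --post-file="$post" "$host/ghodata/OlapPrint"
    done
    pass=$((pass + 1))
  done
}

# e.g.: replay_tables http://localhost:8080 50 post-*.txt
```

Each pass replays every saved table render in sequence, so one pass simulates one user walking the whole WHS menu.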

Philippe Boucher (boucherp) wrote :
Changed in gho:
importance: High → Critical
Philippe Boucher (boucherp) wrote :

Running the script against the single-war-file version of the GHO, on a virtual machine running Red Hat Linux with 1 GB RAM and 1 GB swap:
Things start off fine; once the script gets going, it downloads each table in 0.3 to 0.9 seconds. After about 250 downloads, it starts to slow down and eventually everything stops.
The Tomcat server was originally configured to have a larger memory footprint than RAM + swap; in that case, it stopped responding, although it looks like it was issuing log messages in an infinite loop.
I corrected this so that the memory footprint was small enough to sit within RAM + swap (in this case, Java VM max memory set to 768M, which translated to an overall process image size of about 920M). Now we get an out-of-memory error (heap space) after about 350 downloads.
(Log files are attached)
We also get the following error on every table render:

[gho] 23 11 15:56:56 ERROR intl.who.oh2.olapPrinter.helper.OlapPrintModel <init> - toRow paremeter is not nummeric
[gho] 23 11 15:56:57 DEBUG mondrian.mdx execute - 360: select NON EMPTY Crossjoin({[Indicator].[All Indicators].[WHS].[WHS8_110
]}, {[Time Period].[Year].Members}) ON COLUMNS,
  NON EMPTY {Order([Location].[Countries].Members, [Location].CurrentMember.Properties("CountryName"), BASC)} ON ROWS
from [Core Health Sum Indicators]
where [Measures].[CHSI Indicator Value (String)]

This error is generated when running the attached performance script as well as when we simply click on an indicator entry in the left menu.

Philippe Boucher (boucherp) wrote :
Changed in gho:
assignee: Knut Staring (knutst) → Jiri Dvorak (jiri-dvorak)
status: New → In Progress
Philippe Boucher (boucherp) wrote :

Reassigning this to Jiri. We need to have this resolved quickly: we're still targeting a soft launch next week (handover to the IT dept on Dec 3), and we're targeting 50K hits per 24-hour period. We can't restart the application every X requests.

Jiri Dvorak (jiri-dvorak) wrote :

The error message from OlapPrintModel has been fixed - the new code is now in StarTeam, please refresh.

Jiri Dvorak (jiri-dvorak) wrote :

Based on the messages earlier in this thread, the system (especially the JVM) is running out of memory. The bad news is that there is not much we can do about it on the application side; barring any unexpected internal bugs or memory leaks, the OH/GHO code memory usage is quite modest, and very little can be gained by additional tuning.

By far the biggest memory consumer is the Mondrian software component from Pentaho, which is used by GHO as a "black box" (by including mondrian.jar in the build).

Other than adding physical memory to the machine, configuring more "generous" virtual memory in Linux, and maximizing memory availability to the JVM executable (via "java ... -ms ... -mx ..." etc.), the only method of tuning Mondrian resource usage is via the file /who_gho/resources/mondrian.properties. That tuning involves a bit of "black magic" and some testing, and the only official documentation we were able to find is at http://mondrian.pentaho.org/documentation/configuration.php .

Specifically, the following parameters may have immediate influence on Mondrian memory usage (with "sample" values):

- mondrian.query.limit = 4 ... maximum number of simultaneous queries the system will allow

- mondrian.result.limit = 50000 ... maximum size of a result set

- mondrian.rolap.evaluate.MaxEvalDepth = 10 ... maximum number of passes allowable while evaluating an MDX expression

- mondrian.rolap.queryTimeout = 10 ... limits the number of seconds a query executes before it is aborted

- mondrian.rolap.IterationLimit = 10 ... maximum number of iterations allowed when evaluating an aggregate

- mondrian.olap.fun.crossjoin.optimizer.size = ??? ... see the URL, this is pure black magic

- mondrian.rolap.star.disableCaching = true ... setting this to true will clear aggregate cache after each query; that may save a lot of memory

- mondrian.expCache.enable = false ... see the URL, may save some memory, but some MDX queries will run more slowly

- mondrian.rolap.EnableRolapCubeMemberCache ... see the URL, this one probably should stay as "true", but may be worth 1-2 tests ...

- mondrian.util.memoryMonitor.enable = true and mondrian.util.memoryMonitor.percentage.threshold = 60(or 70,80,90) can be used to see memory usage messages earlier in the log file during testing

ALSO PLEASE READ CAREFULLY THE WHOLE SECTION "MEMORY MANAGEMENT" at the URL http://mondrian.pentaho.org/documentation/configuration.php !!!!

Jiri Dvorak (jiri-dvorak) wrote :

Tuning Mondrian - well, I wish I could give you a simple, straightforward answer and a list of values, but it is not that easy. Mondrian is a stateless multi-threaded server, and if the application throws a bunch of complex MDX queries at it simultaneously, or if it expects a lot of caching to take place, Mondrian will keep grabbing memory.

I have now reviewed my notes from the time when we tuned the WHOSIS project for production, and here is the best I could come up with. It could also make sense to force a Java garbage collection periodically, via the System.gc() call. The catch is that each time we "de-tune" something at the caching level, we get lower memory usage but worse query performance.

Here are some suggestions:

# WHOSIS settings - assuming that GHO doesn't use pre-aggregated tables
mondrian.rolap.aggregates.Use=false
mondrian.rolap.aggregates.Read=false
mondrian.native.topcount.enable=true
mondrian.native.filter.enable=true
#mondrian.expCache.enable=true #see below, may be reset later
# Reduce "resource hogs" and too much concurrency, to allow base tests to complete
# If this doesn't work, try even lower values
mondrian.query.limit=4
mondrian.result.limit=50000
mondrian.rolap.evaluate.MaxEvalDepth=10
mondrian.rolap.queryTimeout=10
mondrian.rolap.IterationLimit=10
mondrian.rolap.star.disableCaching=true
mondrian.expCache.enable=false
# Try it initially with false, later change to true
mondrian.rolap.EnableRolapCubeMemberCache=false
# Hopefully get better memory diagnostics in the log
mondrian.util.memoryMonitor.enable=true
mondrian.util.memoryMonitor.percentage.threshold=75

The only other technique I can think of is tuning the MDX queries: reducing cross-joins, removing unnecessary sort operators, possibly even reducing unnecessary hierarchy levels in the model. On that side, I do not really consider myself to be an MDX expert (I know only enough to be dangerous).

Knut Staring (knutst) wrote :

The problem is that we currently don't know whether it is Mondrian or OH (olap_printer) that causes the problems. I think it would be good if we could run tests and profilers on both sides of the Atlantic. Probably best to test with Mondrian, OH, and MySQL on three different servers.

Philippe Boucher (boucherp) wrote :

Adding stack traces for all threads when the system runs out of memory, as well as sequential histograms of the heap over the run time of the performance test. (Histogram sequencing was done manually while I was doing other things, so the intervals are not constant, but they do reveal certain trends.)
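For what it's worth, the manual histogram capture could be scripted along these lines. This is a sketch assuming a Sun JDK with jmap on the PATH; the function name, count, and interval are my own choices:

```shell
#!/bin/sh
# Sketch: capture COUNT heap histograms of a running JVM as hist0.txt, hist1.txt, ...
# mirroring the naming used in the attachment. Assumes jmap is available.
snapshot_heap() {
  pid=$1; count=$2; interval=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    jmap -histo "$pid" > "hist$i.txt"   # per-class instance counts and bytes
    i=$((i + 1))
    sleep "$interval"
  done
}

# e.g.: snapshot_heap <tomcat-pid> 20 60   # 20 snapshots, one per minute
```

With constant intervals, the growth of mondrian.rolap.RolapResult$CellInfo between snapshots would be directly comparable.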

You can see that the instance count and allocated memory for mondrian.rolap.RolapResult$CellInfo increase "linearly" over time, along with a few other bits that are probably related. My guess is that this is leaking somehow; however, we're not set up here to debug this correctly, not to mention we don't know this code. Can someone at T4Bi take a look? I'm getting a little tired of doing diagnostics on a black box with the cybernetic equivalent of a Ouija board as my tool set.

thanks,

Philippe Boucher (boucherp) wrote :

One additional set of comments:
We did a similar test on the WHOSIS production system and that worked fine. We then used the same Mondrian parameters as WHOSIS (provided by Jiri) on the GHO and reran the test; performance was pretty bad. Since caching was turned off, the database was being clobbered, and the system still ran out of memory after about 350 queries.

Jiri Dvorak (jiri-dvorak) wrote :

Philippe, thanks for collecting the stack trace and histogram files; after reviewing them, I am even more convinced that this is indeed a Mondrian / JPivot capacity and tuning problem.

All stack trace breakpoints are either in Apache Tomcat code (typically a TCP/IP socket wait or socket read) or, in one thread, a situation 30 levels deep inside Mondrian and JPivot code, where Mondrian is trying to allocate more memory for itself.

The histograms all follow the same pattern (I went through half of them); e.g. in hist0.txt, the first time any "intl.who.*" class shows up is in position 468 (less than 3 kbyte), while there are literally hundreds of Mondrian and JPivot classes using megabytes of data above that.

I very much sympathize with the frustration with "doing diagnostics on a black box with the cybernetic equivalent of a Ouija board", and to be totally honest, I have experienced a similar level of frustration during most of my previous work with Mondrian - it is a great tool in many respects, but it is not transparent, manageable or easily tuneable. On the WHOSIS project, we tried to open and dissect it a little bit, to avoid the black-box style of tuning ... and that created as many problems as it solved, especially for configuration, deployment and upgrades.

Back in 2008, we had a high-level visit from Pentaho, and a teleconference with one of their chief gurus (Philippe Veltsos has all the details), and we have discussed the opportunities for a closer cooperation with Pentaho, where they would give us more access to the technology and perhaps provide a fixed-budget technical support, while we could concentrate on solving the business problems and application functionality. Perhaps this offer is still on the table, and maybe it would be in everybody's interests to follow up on that.

A couple of other comments:

(1) The data in the histograms point to some *REALLY* large data sets. I know that OLAP systems deal with lots of data, but even by that standard ... e.g. hist0.txt shows 12.5 MB of ROLAP cell data (entry #3). Do we really need to be producing data sets of these sizes in the 1st production release of GHO?

(2) Given that the problem always occurs around the 350th statement: what EXACTLY is happening in that part of the script? Can we tune or restrict the MDX to produce more "reasonable" data sets, at least temporarily as part of the testing and tuning process?

(3) Let me reiterate that in previous OLAP testing, we had these kinds of problems when there were problems with cube definitions, MDX expressions or underlying data: orphaned values, algorithms spinning on recursive values in dimensions, etc. Can we perhaps try to isolate the MDX statement(s) that cause the problem, or are "very close" to it?

Jiri Dvorak (jiri-dvorak) wrote :

One thing that I should also mention: during testing, we should make sure that in the file .../resources/log4j.properties, we maintain the setting:

log4j.logger.mondrian.mdx=debug

Then all MDX expressions should be visible in the Tomcat log; and we should be looking for commonalities among the last MDX statements executed before the system runs out of memory.
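With that logger enabled, the tail of the log can be inspected with something like the following. This is a sketch; the function name is mine, and the log file name passed in is an assumption (on a default Tomcat it may be catalina.out or a dated log):

```shell
# Pull the last MDX statements logged before the crash, to look for
# commonalities among them.
last_mdx() {
  logfile=$1; n=${2:-20}
  grep 'mondrian.mdx' "$logfile" | tail -n "$n"
}

# e.g.: last_mdx catalina.out 20
```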

Philippe Boucher (boucherp) wrote :

The queries I'm using for testing are based on the MDG data set, which is not very large. All I'm doing with the test script is looping through a set of 22 queries over and over again; the queries were extracted from the menu of MDG indicators. Essentially the script simulates a user who clicks on every indicator entry in the MDG menu, waits for the page to render, then moves on to the next indicator; when it's gone through the menu, it restarts from the beginning. Nothing complicated, and essentially after the first pass, everything should pop right out of the cache.
In the script, the 350th statement is nothing special; it's just an MDX query from the MDG menu that has by that time been made about 110 times. If I change the memory parameters, the error will consistently occur earlier (if I lower the available memory) or later (if I raise it).
Take a look at perf.sh in the attached ZIP file containing the performance test. It's a really simple five-line script that calls wget against http://dkjhdkjhkdjhf/gho/OlapPrint, POSTing the MDX query (each query is stored in an individual file).
I've also tried to run this test using only one simple MDX query for the loop: exactly the same results.

Philippe Boucher (boucherp) wrote :

More information: if we run the test against the multi-war-file version of the GHO, where Mondrian is a separate war file, we get exactly the same results (it does a few hundred requests, gets bigger and bigger, then dies). The test scripts still go to gho/OlapPrint in this case. Now, if I bypass OlapPrint and go straight to Mondrian, POSTing EXACTLY the same query but now to mondrian3/mdxquery, we observe the following:
a) dramatically improved response time (queries, once cached, return a response in 0.1 s on average); we ran over 8000 queries in about 15-20 minutes
b) the JVM memory footprint is very stable (it climbed to 178 MB within the first 2000 queries, then stayed there for the next 8000 queries in our test)
The cube definition and the Mondrian properties used were copied from the current single-war-file version of the GHO that we're working from, so that we can rule out configuration differences.
Methinks this rules out Mondrian itself and points to either the OH code or JPivot.
If you want to replicate the test, just use the contents of the attached zip file and change the URL to mondrian3/mdxquery instead of ghodata/OlapPrint, and in the post-*.txt files, change mdx= to queryString=
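If it helps, that post-file conversion can be done mechanically. This is a sketch; the input file naming follows the attachment, but the function name and the "mondrian-" output prefix are my own:

```shell
# Rewrite the captured POST bodies so they can target mondrian3/mdxquery,
# which takes queryString= rather than mdx=.
convert_posts() {
  for f in "$@"; do
    sed 's/^mdx=/queryString=/' "$f" > "mondrian-$f"
  done
}

# e.g.: convert_posts post-*.txt
```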

Jiri Dvorak (jiri-dvorak) wrote :

Just a few paranoia checks:

(1) Are we 100% sure that all code and all cube definitions (in GlobalHealthObservatory.xml) currently in StarTeam are exactly in sync with the code and definitions used for the WHO performance stress test?

(2) We most likely won't have up-to-date data in our MySQL database ... can a complete database backup be placed in StarTeam (or Central Desktop, or Launchpad)?

Thx

Jiri Dvorak (jiri-dvorak) wrote :

BTW, regarding the MySQL backup (dump) ... all we really care about are the MDG data (fact table, dimensions, and whatever is needed to resolve their foreign keys). So leaving the other fact tables out (or empty) to reduce the file size is perfectly OK ... whatever is easiest.

Knut Staring (knutst) wrote :

Please find a 7zip of the SQL attached.

Knut Staring (knutst) wrote :

This is the cube definition we are testing with.

Dusan Strnal (dusan-strnal) wrote :

I was able to reproduce this issue on our t4bi server and I've made some adjustments in the code.

Please refresh from Starteam and let me know if the issue is still occurring.

Philippe Boucher (boucherp) wrote :

OK, I'll check it out as soon as I'm back in the office (currently sick at home with something flu-like).
thanks, P

Philippe Boucher (boucherp) wrote :

Still sick at home, so I gathered all the pieces to run the GHO here. I've checked out who_gho, assembled the war file locally and installed it. When I connect to the GHO and try to access any table, I get this error displayed instead of the table:

HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
javax.servlet.ServletException: Servlet.init() for servlet Jersey REST threw exception
<<<SNIP>>>
root cause
com.sun.jersey.api.container.ContainerException: The ResourceConfig instance does not contain any root resource classes.
<<<SNIP>>>

The thing that's causing this is in the logs:

Dec 2, 2009 10:44:18 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring root WebApplicationContext
Dec 2, 2009 10:44:18 PM org.apache.catalina.core.ApplicationContext log
SEVERE: StandardWrapper.Throwable
com.sun.jersey.api.container.ContainerException: The ResourceConfig instance does not contain any root resource classes.
        at com.sun.jersey.server.impl.application.WebApplicationImpl.processRootResources(WebApplicationImpl.java:613)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:492)
        at com.sun.jersey.spi.spring.container.servlet.SpringServlet.initiate(SpringServlet.java:80)
        at com.sun.jersey.spi.container.servlet.ServletContainer.load(ServletContainer.java:540)
        at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:207)
        at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1172)
        at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:992)
        at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4058)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4371)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
        at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
<<<SNIP>>>

I'm presuming that there's a config parameter missing in WEB-INF/web.xml? I don't know how this stuff works, so I don't know what should or shouldn't be there.

My setup here is Windows XP SP3, Tomcat 6.0.18, MySQL 5.0.45 (also running on the same Windows box) and JDK 6.0.13. I checked out a fresh tree from StarTeam at about 20:00 UTC today, but only the who_gho part, and I'm using the database restore script attached to this bug.

Dusan Strnal (dusan-strnal) wrote :

Reading through the previous comment, I do not see whether the who_oh source was checked out before building the war file. The error indicates that there is no servlet definition. That is defined in Java classes that are scanned during startup; the classes are located in the who_oh source.

Could you let me know if that's the case here?

Philippe Boucher (boucherp) wrote :

ah... I didn't check out the who_oh source; I thought anything that was used for the deployment was already copied into who_gho. I'll check out who_oh and let you know if that fixes it (I have a slow link at home, hence the laziness :-)

Philippe Boucher (boucherp) wrote :

Checked out the who_oh source and rebuilt the war file: same effect. Do I need to do something else with the who_oh source other than run the build_gho_xxxxx.bat / build_gho.sh files? I'm assuming that the gho.jar file that's in who_gho is built against the latest in who_oh?

Dusan Strnal (dusan-strnal) wrote :

Sorry, I did not realize that you do not recompile who_oh. Yes, gho.jar has been compiled against who_oh; however, I am able to build "gho" from the script build_gho_xxx.sh and run it under Tomcat without any issues.

I have recompiled who_gho/deployements/lib/gho.jar, just to make sure that gho.jar is OK. Please refresh from StarTeam.

Could you check a couple of things in your environment before you refresh who_gho:

-- what is the timestamp on your who_gho/deployements/lib/gho.jar?
-- can you post the output of the build_gho..xxxx.bat script?

Dusan Strnal (dusan-strnal) wrote :

Also, could you post the web.xml? Or send me the file at <email address hidden>? Thanks.

Philippe Boucher (boucherp) wrote :

Attaching the output of the build_gho_XXX.bat script (echo was on for the batch file). The timestamp on my copy of gho.jar is the local system time when I checked it out, Dec 2, 2009, 23:58, but I think I recall seeing a date of November 29, 2009 in StarTeam (I don't remember the time).

Philippe Boucher (boucherp) wrote :

I see there's a missing applicationContext.xml in the build log. I checked this further and saw that there seems to be some sort of error with my checkout of who_gho under /war/WEB-INF: no classes subdirectory. StarTeam shows an attention icon on the directory but then doesn't seem to think that anything is missing. I've deleted that part of my local tree and redone the checkout, but that gives me the same result; if I manually create the directory, StarTeam still doesn't tell me that anything should be in there.

Dusan Strnal (dusan-strnal) wrote :

Your web.xml looks OK to me, and the timestamp on gho.jar is correct. The compilation from Nov 29 should have included all the changes.

However, when I look at the build_gho_XXX.bat script output:

_CONF_WAR_DIR is never set; it should be something like: set _CONF_WAR_DIR=$DEV_HOME/who_gho/ghodata

Can you confirm that this is set in build_gho_XXX.bat? If it's not, it might be the reason why applicationContext.xml and GlobalHealthObservatory.xml are missing.

Regarding the missing "classes" directory: can you try to right-click on war/WEB-INF/classes and then choose "Create working folders" in the StarTeam client? If there is an attention icon, usually the working directory is not created. The war/WEB-INF/classes directory should be empty when you check out from StarTeam; all compiled Java classes are in the gho.jar file. Only files from who_gho/resources get copied to war/WEB-INF/classes.

Philippe Boucher (boucherp) wrote :

The build_gho_sample.bat script doesn't set _CONF_WAR_DIR; the build_gho.sh script sets it explicitly (it would override anything set in the preceding batch file).
I manually created the directory who_gho/war/WEB-INF/classes and reran the build, and now it populated the WEB-INF/classes directory. The war file seems to be starting correctly (I get data when I select a table, and there are no errors in the log), so I guess that missing directory was the root problem. I'll make a note of this in our docs in case we get this problem again. Is there some setting in StarTeam that would make it not create an empty directory? I did a few fresh checkouts on my machine at WHO and didn't run into this. Anyway, thanks for helping me out on this.

Dusan Strnal (dusan-strnal) wrote :

I am not aware of any particular setting that would not create an empty directory if it's in the repository.

The StarTeam client loses ("gets out of sync") the statuses of its files/dirs whenever files or folders get moved or changed outside the StarTeam client. The client has its own "cache" that can be purged and recreated via "Update Status" (File -> Update Status) in the client. To purge the cache/local repository, go to Tools -> Personal Options and look for the "Repository" section, which should have a "Purge" button.

Philippe Boucher (boucherp) wrote :

I've rerun the attached performance test script several times against my instance. The GHO is now responding consistently and its memory footprint appears to be very stable (131/132 MB for the last 2000 queries). I'll do a few more thorough tests on Monday, but I can confirm that the fix works for our current test scenario. Thanks, all.

Philippe Boucher (boucherp) wrote :

The MDG performance test returns an average response time of 0.447 seconds per request with a deviation of 0.072. This implies 193288 requests over a 24-hour period, which exceeds the baseline performance number of 50000 requests per 24-hour period that we're aiming for. This test was done with the GHO webapp running on a 1 GB virtual machine, with a separate machine running the test script. The VM is running both the Tomcat server and the MySQL database. So far so good.
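As a sanity check on that arithmetic (the script fires requests sequentially, one at a time), 86400 seconds divided by the 0.447 s mean response time gives the figure quoted:

```shell
# 24 h of back-to-back sequential requests at the measured mean response time.
awk 'BEGIN { printf "%d\n", 24 * 3600 / 0.447 }'   # prints 193288
```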

Philippe Boucher (boucherp) wrote :

Not going to do further performance testing for the moment; setting to Fix Released.

Changed in gho:
status: In Progress → Fix Released