Build statistics for datasets

Bug #254446 reported by andrew
Affects: Hamlet
Status: Fix Committed
Importance: Low
Assigned to: andrew

Bug Description

Add a component to the pre-processor that creates a file containing statistics for a dataset.

Things to consider:
Mean and variance of word counts in documents.
Number of documents.
Number of unique terms.
Mean and variance of all term frequencies.
Mean and variance of selected term frequencies.
Time taken to cluster, select features, and build term matrix.
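
A minimal sketch of how the document and term statistics listed above could be computed, written in Python for illustration only. It assumes each document is already tokenized into a list of terms and that "term frequency" means a term's total count across the dataset. The function name `dataset_statistics`, its parameters, and the output keys are hypothetical and are not taken from Hamlet's code. Population variance is used here; sample variance would be an equally reasonable choice.

```python
from collections import Counter
from statistics import mean, pvariance


def _mean_and_variance(values):
    """Return (mean, population variance), or (0.0, 0.0) for no values."""
    values = list(values)
    if not values:
        return 0.0, 0.0
    return float(mean(values)), float(pvariance(values))


def dataset_statistics(documents, selected_terms=None):
    """Compute summary statistics for a tokenized dataset.

    documents: list of documents, each a list of term strings (assumed).
    selected_terms: optional iterable of terms kept by feature selection.
    """
    # Word count of each document.
    doc_lengths = [len(doc) for doc in documents]
    # Total occurrences of each term across all documents.
    term_freqs = Counter(term for doc in documents for term in doc)

    length_mean, length_var = _mean_and_variance(doc_lengths)
    freq_mean, freq_var = _mean_and_variance(term_freqs.values())

    stats = {
        "num_documents": len(documents),
        "num_unique_terms": len(term_freqs),
        "doc_length_mean": length_mean,
        "doc_length_variance": length_var,
        "term_freq_mean": freq_mean,
        "term_freq_variance": freq_var,
    }

    if selected_terms is not None:
        # Counter returns 0 for selected terms that never occur.
        selected = [term_freqs[term] for term in selected_terms]
        sel_mean, sel_var = _mean_and_variance(selected)
        stats["selected_term_freq_mean"] = sel_mean
        stats["selected_term_freq_variance"] = sel_var

    return stats
```

For example, `dataset_statistics([["a", "b", "a"], ["b", "c"]], selected_terms=["a"])` returns a dictionary with one entry per statistic, which could then be written to the statistics file.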

Tim, any additional statistics for the datasets that you can think of?

Changed in hamlet:
assignee: nobody → andrew-j-matheny
Gregory Gay (gregoryg) wrote :

Also needs scalability statistics

Changed in hamlet:
importance: Undecided → Low
andrew (andrew-j-matheny) wrote :

Added a logger class that facilitates generating the statistics file.
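
For readers unfamiliar with the idea, a logger of this kind might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the committed class: the name `StatsLogger`, its methods, the `_seconds` key suffix, and the JSON output format are all hypothetical. It collects named values, times pipeline stages such as clustering or feature selection, and writes everything to one file.

```python
import json
import time
from contextlib import contextmanager


class StatsLogger:
    """Collects named statistics and stage timings (hypothetical design)."""

    def __init__(self):
        self.values = {}

    def record(self, name, value):
        """Store a single named statistic."""
        self.values[name] = value

    @contextmanager
    def timed(self, stage):
        """Record the wall-clock seconds spent inside the with-block."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Stored even if the stage raises, so partial runs are visible.
            self.values[stage + "_seconds"] = time.perf_counter() - start

    def write(self, path):
        """Write all collected values to a JSON file (format is assumed)."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self.values, handle, indent=2, sort_keys=True)
```

A stage would be wrapped as `with logger.timed("clustering"): ...`, and `logger.write("stats.json")` would be called once pre-processing finishes.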

Changed in hamlet:
status: New → Fix Committed