Build statistics for datasets
Bug #254446 reported by andrew
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Hamlet | Fix Committed | Low | andrew |
Bug Description
Add a component to the pre-processor that writes a file containing statistics for a dataset.
Things to consider:
Mean and variance of word counts in documents.
Number of documents.
Number of unique terms.
Mean and variance of all term frequencies.
Mean and variance of selected term frequencies.
Time taken to cluster, select features, and build the term matrix.
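As a rough illustration of the statistics listed above, here is a minimal Python sketch. It is not Hamlet's pre-processor code: the function names (`dataset_statistics`, `mean_and_variance`, `timed`), the dictionary keys, and the JSON output are all hypothetical. It assumes documents are already tokenized into lists of terms, that the selected terms come from a separate feature-selection step, and that population variance is the variance wanted.

```python
import json
import time
from collections import Counter
from contextlib import contextmanager
from statistics import mean, pvariance


def mean_and_variance(values):
    """Return (mean, population variance); (0.0, 0.0) for empty input."""
    values = list(values)
    if not values:
        return 0.0, 0.0
    return mean(values), pvariance(values)


def dataset_statistics(documents, selected_terms=()):
    """Summarise a corpus given as a list of token lists.

    `selected_terms` is a stand-in for the output of feature selection
    (an assumption; the real pre-processor may expose this differently).
    """
    doc_lengths = [len(doc) for doc in documents]
    # Corpus-wide frequency of every term.
    term_freqs = Counter(term for doc in documents for term in doc)
    # Frequencies of the selected terms only (0 for terms never seen).
    selected_freqs = [term_freqs[term] for term in selected_terms]

    length_mean, length_var = mean_and_variance(doc_lengths)
    freq_mean, freq_var = mean_and_variance(term_freqs.values())
    sel_mean, sel_var = mean_and_variance(selected_freqs)

    return {
        "num_documents": len(documents),
        "num_unique_terms": len(term_freqs),
        "doc_length_mean": length_mean,
        "doc_length_variance": length_var,
        "term_freq_mean": freq_mean,
        "term_freq_variance": freq_var,
        "selected_term_freq_mean": sel_mean,
        "selected_term_freq_variance": sel_var,
    }


@contextmanager
def timed(timings, label):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


if __name__ == "__main__":
    docs = [["a", "b", "a"], ["b", "c"], ["a"]]
    timings = {}
    with timed(timings, "statistics"):
        stats = dataset_statistics(docs, selected_terms=["a", "c"])
    stats["timings_seconds"] = timings
    # Print as JSON; writing the same text to a file would give the stats file.
    print(json.dumps(stats, indent=2))
```

The `timed` helper could wrap the clustering, feature-selection, and term-matrix steps in the same way, with one label per step.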
Tim, any additional statistics for the datasets that you can think of?
Changed in hamlet:
assignee: nobody → andrew-j-matheny
Also needs scalability statistics