Facet doc count reports more docs than actual
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Searchlight | Fix Released | Medium | Steve McLellan | |
Bug Description
Some facet doc counts report inaccurate numbers. For example, the OS::Nova::Server facets include the following:
{
"type": "string",
"name": "networks.name",
"options": [
{
"key": "private",
}
]
},
{
"type": "string",
"name": "networks.
"options": [
{
"key": "fixed",
}
]
}
The "private" option reports a doc_count of 6; however, only 3 servers are actually connected to the "private" network. This happens because a single server can have multiple listings for the private network. It makes the doc_count seem inaccurate when displayed to users without a lot of explanation. See the snippet of data below. Note that networks are one of the things we double index to allow for proper searching.
{
"_score": 1,
"_type": "OS::Nova::Server",
{
},
{
}
]
},
{
},
{
}
],
{
}
],
},
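To make the mismatch concrete, here is a minimal Python sketch with made-up data (the server ids and the two-listings-per-server layout are assumptions for illustration, not the records from this deployment). It contrasts counting nested network entries, which is roughly what a nested terms aggregation reports, with counting distinct parent documents:

```python
from collections import Counter

# Hypothetical servers: each one lists the "private" network twice.
servers = [
    {"id": "server-1", "networks": [{"name": "private"}, {"name": "private"}]},
    {"id": "server-2", "networks": [{"name": "private"}, {"name": "private"}]},
    {"id": "server-3", "networks": [{"name": "private"}, {"name": "private"}]},
]


def nested_entry_counts(docs):
    """Count every nested network entry (one hit per listing)."""
    return Counter(net["name"] for doc in docs for net in doc["networks"])


def unique_doc_counts(docs):
    """Count each parent document at most once per network name."""
    counts = Counter()
    for doc in docs:
        counts.update({net["name"] for net in doc["networks"]})
    return counts


nested = nested_entry_counts(servers)
unique = unique_doc_counts(servers)
print(nested["private"], unique["private"])
```

With this sample data the first number counts listings and the second counts servers, which is the same kind of gap as the 6 versus 3 described above.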
Changed in searchlight: | |
milestone: | mitaka-1 → mitaka-3 |
The solution to this is a 'reverse nested' aggregation: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-aggregation.html. For instance, in the example above (I have two servers), with the current code I get two networks.name buckets even though it's just one document:
{
"type": "string",
"name": "networks.OS-EXT-IPS:type",
"options": [
{
"key": "fixed",
"doc_count": 4
}
]
}
Adding a reverse_nested aggregation (notice the extra _unique_docs):
{
"name": "networks.OS-EXT-IPS:type",
"options": [
{
"doc_count": 4,
"key": "fixed",
"networks__OS-EXT-IPS:type_unique_docs": {
"doc_count": 2
}
}
],
"type": "string"
},
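For reference, a request body of roughly the following shape is one way to ask Elasticsearch for a reverse_nested count under each terms bucket. This is a sketch, not the Searchlight code: the helper name build_nested_facet_agg and the rule used to derive the "_unique_docs" aggregation name are assumptions for illustration.

```python
def build_nested_facet_agg(nested_path, field):
    """Build an aggregation body: nested -> terms -> reverse_nested.

    The sub-aggregation name (field with "." replaced by "__", plus
    "_unique_docs") is an assumed naming convention, not a confirmed one.
    """
    unique_name = field.replace(".", "__") + "_unique_docs"
    return {
        nested_path: {
            # Step into the nested documents under the given path.
            "nested": {"path": nested_path},
            "aggs": {
                field: {
                    # One bucket per distinct value; counts nested docs.
                    "terms": {"field": field},
                    "aggs": {
                        # Step back out to the parent documents so each
                        # parent is counted once per bucket.
                        unique_name: {"reverse_nested": {}},
                    },
                }
            },
        }
    }


aggs = build_nested_facet_agg("networks", "networks.OS-EXT-IPS:type")
```

The resulting dict would be sent as the "aggs" section of a search request; each terms bucket then carries both the nested doc_count and the parent-level count.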
We'd then need to transform the results slightly to delete the unique_docs entry and replace the doc_count with its value. I've not yet found a way to make this the default (i.e. aggregate on the nested field but return the 'reverse' counts by default), which would be better since it avoids meddling with the Elasticsearch format overmuch.
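The transform step could look something like the sketch below. The function name collapse_unique_docs and the "_unique_docs" suffix match are assumptions; the actual fix may structure this differently.

```python
def collapse_unique_docs(facet, suffix="_unique_docs"):
    """Return a copy of a facet whose options use the parent-level count.

    For each option, any "<something>_unique_docs" entry is removed and its
    doc_count replaces the option's nested doc_count. The input is left
    unmodified.
    """
    options = []
    for option in facet.get("options", []):
        cleaned = dict(option)
        for key in list(cleaned):
            if key.endswith(suffix) and isinstance(cleaned[key], dict):
                cleaned["doc_count"] = cleaned.pop(key)["doc_count"]
        options.append(cleaned)
    result = dict(facet)
    result["options"] = options
    return result


facet = {
    "name": "networks.OS-EXT-IPS:type",
    "type": "string",
    "options": [
        {
            "doc_count": 4,
            "key": "fixed",
            "networks__OS-EXT-IPS:type_unique_docs": {"doc_count": 2},
        }
    ],
}
collapsed = collapse_unique_docs(facet)
```

Here the "fixed" option ends up with only "key" and the parent-level "doc_count", so API consumers see the same shape as before.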