Fluentd doesn't generate _id for documents sent to elasticsearch

Bug #1896610 reported by Krzysztof Klimonda
This bug affects 2 people
Affects        Status     Importance  Assigned to  Milestone
kolla-ansible  Confirmed  Medium      Unassigned
Victoria       Confirmed  Medium      Unassigned

Bug Description

When fluentd pushes documents into Elasticsearch, it does not generate an _id for each one, leaving ID assignment to Elasticsearch instead.
When a request to ES times out (either on the fluentd side or in an intervening proxy), fluentd retries the request, but ES cannot tell that it is processing the same documents (each retry gets fresh _ids), so it saves them all again, potentially leading to another timeout.
When that happens, it is very easy for kolla's fluentd to effectively DDoS the entire cluster by filling up all of the available disk space with duplicates.
It is possible to generate the _id before sending documents to Elasticsearch, which lets the cluster recognize that some documents were sent twice and update them instead of creating duplicates - see https://github.com/uken/fluent-plugin-elasticsearch#generate-hash-id for the configuration.
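For reference, a minimal sketch of the approach described in the linked fluent-plugin-elasticsearch README: an elasticsearch_genid filter computes a hash per record, and the output plugin uses that hash as the document _id, making retried writes idempotent. The match/filter patterns, host, and port below are placeholders and would need to be adapted to kolla-ansible's actual fluentd configuration.

```
# Generate a deterministic hash for each record before output
# (assumed patterns/hosts; adapt to the deployment's fluentd config)
<filter **>
  @type elasticsearch_genid
  hash_id_key _hash          # field where the generated hash is stored
</filter>

<match **>
  @type elasticsearch
  host localhost
  port 9200
  id_key _hash               # use the generated hash as the document _id
  remove_keys _hash          # drop the helper field from the stored document
</match>
```

With id_key set, a retried bulk request carries the same _ids as the original, so Elasticsearch overwrites the existing documents instead of indexing duplicates.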

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753291

Changed in kolla-ansible:
assignee: nobody → Krzysztof Klimonda (kklimonda)
status: New → In Progress
Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
Mark Goddard (mgoddard)
Changed in kolla-ansible:
milestone: 11.0.0 → none
Revision history for this message
Abdulmalik Banaser (abanaser) wrote :

Hello,
We came across the same issue. Is there any status on the implementation of the solution? Thanks!

Tom Fifield (fifieldt)
Changed in kolla-ansible:
assignee: Krzysztof Klimonda (kklimonda) → nobody
status: In Progress → Confirmed