logstash processes logs very slowly in 9.0.6/10.1.2

Bug #1423755 reported by Matt Dorn
This bug affects 2 people
Affects            Status        Importance  Assigned to      Milestone
OpenStack-Ansible  Invalid       High        Jesse Pretorius
Icehouse           Fix Released  High        Jesse Pretorius
Juno               Fix Released  High        Jesse Pretorius
Trunk              Invalid       High        Jesse Pretorius

Bug Description

The additional logstash filters added in 9.0.6 appear to be causing issues with log processing immediately after deploying and using an environment.

I removed the new 9.0.6 filters (09-apache.conf, 10-mysql.conf, 11-neutron.conf, 98-mutate.conf) and restarted logstash, and processing resumed.

I was not able to replicate these issues when testing 9.0.3.

Matt Dorn (madorn)
summary: - logstash stops processing logs
+ logstash stops processing logs in 9.0.6
Kevin Carter (kevin-carter) wrote : Re: logstash stops processing logs in 9.0.6

@Matt Dorn Do you have any more details on what failed, and on whether restarting logstash with the filters in place resumed log processing? Also, can you provide a bit of detail on the environment that you're using (e.g. multi-node, dedicated logging node)? I'd like to see if I can reproduce this on my lab cluster, so any additional details you can provide will be much appreciated.

Jesse Pretorius (jesse-pretorius) wrote :

@Matt Dorn Can you remove only the filters which use the multiline codec and see if that improves things? If the behavior is no different, can you also try increasing the number of workers (i.e. implement both changes)?

I suspect that the issue really comes down to logstash not being able to process log inputs fast enough, as it only has one worker. At this stage we can't use more than one worker because we're using the multiline codec (it doesn't work with more than one worker). If this is isolated as the cause, then we'll have to consider:

1) Implementing multiple logstash containers (or multiple logstash instances in one container) and using a sticky (ie each host always gets sent to the same back-end) load balancer to spread the load.
2) Implementing logstash log shipping in the rsyslog containers so that the multi-line handling can be distributed to those containers. This does mean that the rsyslog containers would no longer actually be rsyslog containers, but log shipping containers.
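
The sticky behavior described in option 1 can be sketched as a deterministic hash from source host to back end. This is only an illustration under assumptions: the function name `pick_backend` and the back-end names in the usage note are hypothetical and not part of the deployment, and a real setup would rely on the load balancer's own source-affinity feature rather than custom code.

```python
import hashlib
from typing import Sequence


def pick_backend(source_host: str, backends: Sequence[str]) -> str:
    """Map a source host to one back end (hypothetical helper).

    For a fixed list of back ends, the same host always maps to the
    same entry, which is the "sticky" property described above.
    """
    if not backends:
        raise ValueError("at least one back end is required")
    # Hash the host name so the choice does not depend on arrival order.
    digest = hashlib.sha256(source_host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

For example, `pick_backend("infra1", ["logstash-1", "logstash-2"])` returns the same back end on every call. Note that adding or removing a back end changes the mapping for many hosts, which is one reason a load balancer's built-in affinity is usually preferable.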

Changed in openstack-ansible:
status: New → Incomplete
Matt Dorn (madorn) wrote :

@Kevin Carter: I am testing on the reference architecture: 3 infra, 1 logger, 2 compute, 1 storage. Ample storage on all nodes.

After provisioning a new RPC environment, I perform some actions to produce a good amount of logs (register image, create network, boot instance, etc.). At some point new logs suddenly stop being inserted into Elasticsearch.

I typically verify this by using es2unix inside the utility container: while true; do es count | head -1; sleep 1; done;
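
The check above (watching whether the document count keeps growing) can be sketched as a small helper that flags a stall once several consecutive samples are identical. This is an assumption-laden illustration: `is_stalled` and its `window` parameter are hypothetical names, and the samples are expected to be integers already extracted from the counter output by whatever means is convenient.

```python
from typing import Sequence


def is_stalled(counts: Sequence[int], window: int = 5) -> bool:
    """Return True if the last `window` count samples are all equal.

    `counts` holds successive document counts, oldest first. Fewer
    than `window` samples is treated as "not enough data", not a stall.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(counts) < window:
        return False
    recent = counts[-window:]
    return all(sample == recent[0] for sample in recent)
```

A flat count only indicates a stall while the environment is actively producing logs, so the window should be chosen with the expected log rate in mind.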

srvrfwd files begin to build up inside /var/spool/rsyslog in the rsyslog containers. I confirmed that connectivity from the rsyslog container to the logstash container is fine.

Restarting logstash doesn't help. Removing the new filters and then restarting logstash seems to fix the issue.

Per @Jesse Pretorius, I'm going to replicate the issue and remove only the filters which contain multiline codecs to determine whether they are the culprit.

no longer affects: openstack-ansible
summary: - logstash stops processing logs in 9.0.6
+ logstash processes logs very slowly in 9.0.6/10.1.2
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (icehouse)

Reviewed: https://review.openstack.org/166758
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=aaa0ba317166e50ef38b66757e922604a58ed7bd
Submitter: Jenkins
Branch: icehouse

commit aaa0ba317166e50ef38b66757e922604a58ed7bd
Author: Jesse Pretorius <email address hidden>
Date: Wed Mar 18 18:20:54 2015 +0000

    Remove logstash multiline filter and change default number of workers

    This patch removes the use of the multiline filter in the logstash
    configuration and changes the default number of workers used to the
    number of CPU's reported by ansible. The number of workers can still be
    set manually if so desired.

    This is specifically to improve logstash event processing speed which is
    far too slow for a multi-server environment to be useful in any way.

    Change-Id: Id9fbab6b0f6e513c7dcae4e2713555840f9d996a
    Closes-Bug: #1423755
    (cherry picked from 9cb3b4184fcd4edd6b895914425ed403fd76eeac)
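
The worker default described in the commit message (use the CPU count unless a value is set manually) can be sketched as follows. This is not the patch itself: `resolve_workers` is a hypothetical name, and `os.cpu_count()` stands in for the CPU count that the patch takes from ansible.

```python
import os
from typing import Optional


def resolve_workers(override: Optional[int] = None) -> int:
    """Pick the number of workers (hypothetical helper).

    A manually set value wins; otherwise fall back to the number of
    CPUs visible to this process, and to 1 if that is unknown.
    """
    if override is not None:
        if override < 1:
            raise ValueError("worker count must be at least 1")
        return override
    # os.cpu_count() may return None on some platforms.
    return os.cpu_count() or 1
```

Calling `resolve_workers(4)` returns the manual value, while `resolve_workers()` returns the detected CPU count.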

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/165546
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=9cb3b4184fcd4edd6b895914425ed403fd76eeac
Submitter: Jenkins
Branch: juno

commit 9cb3b4184fcd4edd6b895914425ed403fd76eeac
Author: Jesse Pretorius <email address hidden>
Date: Wed Mar 18 18:20:54 2015 +0000

    Remove logstash multiline filter and change default number of workers

    This patch removes the use of the multiline filter in the logstash
    configuration and changes the default number of workers used to the
    number of CPU's reported by ansible. The number of workers can still be
    set manually if so desired.

    This is specifically to improve logstash event processing speed which is
    far too slow for a multi-server environment to be useful in any way.

    Change-Id: Id9fbab6b0f6e513c7dcae4e2713555840f9d996a
    Closes-Bug: #1423755
