OpenStack-Ansible

logstash processes logs very slowly in 9.0.6/10.1.2

Series icehouse
Bug #1423755 reported by Matt Dorn on 2015-02-20

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack-Ansible	Invalid	High	Jesse Pretorius
Icehouse	Fix Released	High	Jesse Pretorius	OpenStack-Ansible 9.0.7
Juno	Fix Released	High	Jesse Pretorius	OpenStack-Ansible 10.1.3
Trunk	Invalid	High	Jesse Pretorius

Bug Description

The additional logstash filters added in 9.0.6 appear to be causing issues with log processing immediately after deploying and using an environment.

I removed the new 9.0.6 filters (09-apache.conf, 10-mysql.conf, 11-neutron.conf, 98-mutate.conf) and restarted logstash, and processing resumed.

I was not able to replicate these issues when testing 9.0.3.

Matt Dorn (madorn) on 2015-02-20

summary:

- logstash stops processing logs
+ logstash stops processing logs in 9.0.6

Kevin Carter (kevin-carter) wrote on 2015-02-21: Re: logstash stops processing logs in 9.0.6

@Matt Dorn do you have any more details on what failed, and on whether restarting logstash with the filters in place resumed log processing? Also, can you provide a bit of detail on the environment that you're using (e.g. multi-node, dedicated logging node, etc.)? I'd like to see if I can reproduce this on my lab cluster, so any additional details you can provide will be much appreciated.

Jesse Pretorius (jesse-pretorius) wrote on 2015-02-23:

@Matt Dorn Can you remove only the filters which use the multiline codec and see if that improves things? If the behavior is no different, can you then try increasing the number of workers as well (i.e. implement both changes)?

I suspect that the issue really comes down to logstash not being able to process log inputs fast enough, as it only has one worker. At this stage we can't use more than one worker because we're using the multiline codec (it doesn't work with more than one worker). If this is isolated as the cause, then we'll have to consider:

1) Implementing multiple logstash containers (or multiple logstash instances in one container) and using a sticky (ie each host always gets sent to the same back-end) load balancer to spread the load.
2) Implementing logstash log shipping in the rsyslog containers so that the multi-line handling can be distributed to those containers. This does mean that the rsyslog containers would no longer actually be rsyslog containers, but log shipping containers.
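
For illustration only, option 1's "sticky" behaviour (each host always gets sent to the same back-end) can be sketched as a deterministic hash of the source host name. This is a minimal sketch, not how any particular load balancer implements it; the function name pick_backend and the back-end names in the usage note are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_backend(host: str, backends: Sequence[str]) -> str:
    """Deterministically map a source host to one logstash back-end.

    Hypothetical helper: the same host always maps to the same back-end
    as long as the list of back-ends does not change.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    # Use a stable hash (not Python's per-process randomized hash()).
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

With a fixed list such as ["logstash-1", "logstash-2"], repeated calls for the same host return the same entry, so all lines from that host reach a single logstash instance. Note that adding or removing a back-end remaps some hosts in this simple scheme.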

Kevin Carter (kevin-carter) on 2015-02-24

Changed in openstack-ansible:
status:	New → Incomplete

Matt Dorn (madorn) wrote on 2015-02-24:

@Kevin Carter: I am testing on reference architecture - 3 infra, 1 logger, 2 compute, 1 storage. Ample storage on all nodes.

After provisioning a new RPC environment, I perform some actions to produce a good amount of logs (register image, create network, boot instance, etc.). One will notice that new logs suddenly stop being inserted into Elasticsearch.

I typically verify this by using es2unix inside the utility container: while true; do es count | head -1; sleep 1; done;
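
As a rough sketch of what "stopped" means when watching that loop: if the document count is sampled once per second, a stall shows up as a run of identical values. The helper below is hypothetical (is_stalled and the window size are assumptions, not part of es2unix) and works on counts that have already been collected.

```python
def is_stalled(counts, window=5):
    """Return True if the last `window` samples are all the same value.

    `counts` is a list of document counts sampled at a fixed interval.
    `window` (an assumed default of 5 samples) is how many unchanged
    samples in a row are treated as a stall.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(counts) < window:
        return False  # not enough samples to decide
    recent = counts[-window:]
    return all(value == recent[0] for value in recent)
```

A flat count can also simply mean that nothing is being logged, so this only makes sense while the environment is actively producing logs.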

srvrfwd files begin to build up inside /var/spool/rsyslog in the rsyslog containers. I confirmed that connectivity from the rsyslog container to the logstash container is fine.
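
One way to watch that build-up is to count the queue files in the spool directory over time. A minimal sketch, assuming the files' names start with "srvrfwd" as described above; count_spool_files is a hypothetical helper and would be run inside an rsyslog container.

```python
from pathlib import Path


def count_spool_files(spool_dir, prefix="srvrfwd"):
    """Count regular files in `spool_dir` whose names start with `prefix`.

    Returns 0 if the directory does not exist. The "srvrfwd" prefix is
    taken from the report above; adjust it if the queue files are named
    differently.
    """
    directory = Path(spool_dir)
    if not directory.is_dir():
        return 0
    return sum(
        1
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )
```

Calling count_spool_files("/var/spool/rsyslog") repeatedly and seeing the number grow would match the symptom described here.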

Restarting logstash doesn't help. Removing the new filters and then restarting logstash seems to fix the issue.

Per @Jesse Pretorius, I'm going to replicate the issue and try specifically removing the filters which contain multiline codecs to determine whether they are the culprit.

Darren Birkett (darren-birkett) on 2015-02-26

no longer affects:

openstack-ansible

Jesse Pretorius (jesse-pretorius) on 2015-03-23

summary:

- logstash stops processing logs in 9.0.6
+ logstash processes logs very slowly in 9.0.6/10.1.2

OpenStack Infra (hudson-openstack) wrote on 2015-03-26: Fix merged to os-ansible-deployment (icehouse)

Reviewed: https://review.openstack.org/166758
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=aaa0ba317166e50ef38b66757e922604a58ed7bd
Submitter: Jenkins
Branch: icehouse

commit aaa0ba317166e50ef38b66757e922604a58ed7bd
Author: Jesse Pretorius <email address hidden>
Date: Wed Mar 18 18:20:54 2015 +0000

Remove logstash multiline filter and change default number of workers

This patch removes the use of the multiline filter in the logstash
configuration and changes the default number of workers used to the
number of CPU's reported by ansible. The number of workers can still be
set manually if so desired.

This is specifically to improve logstash event processing speed which is
far too slow for a multi-server environment to be useful in any way.

Change-Id: Id9fbab6b0f6e513c7dcae4e2713555840f9d996a
Closes-Bug: #1423755
(cherry picked from 9cb3b4184fcd4edd6b895914425ed403fd76eeac)
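
The defaulting rule described in this commit message (use the CPU count unless a value is set manually) can be sketched as follows. This is not the code from the patch: resolve_worker_count is a hypothetical name, and os.cpu_count() stands in for the CPU count that ansible reports.

```python
import os


def resolve_worker_count(override=None):
    """Pick the number of logstash workers.

    If `override` is given it wins (the "set manually" case); otherwise
    fall back to the local CPU count, which here is only a stand-in for
    the value ansible would report for the target host.
    """
    if override is not None:
        if override < 1:
            raise ValueError("worker count must be at least 1")
        return override
    # os.cpu_count() may return None on some platforms; use 1 then.
    return os.cpu_count() or 1
```

For example, resolve_worker_count(4) returns 4, while resolve_worker_count() returns whatever CPU count the machine running it reports.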

OpenStack Infra (hudson-openstack) wrote on 2015-03-30: Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/165546
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=9cb3b4184fcd4edd6b895914425ed403fd76eeac
Submitter: Jenkins
Branch: juno

commit 9cb3b4184fcd4edd6b895914425ed403fd76eeac
Author: Jesse Pretorius <email address hidden>
Date: Wed Mar 18 18:20:54 2015 +0000

Remove logstash multiline filter and change default number of workers

This is specifically to improve logstash event processing speed which is
far too slow for a multi-server environment to be useful in any way.

Change-Id: Id9fbab6b0f6e513c7dcae4e2713555840f9d996a
Closes-Bug: #1423755

Duplicates of this bug

Bug #1433721