Rabbitmq crashed with no more index entries in atom_tab

Bug #1534519 reported by Fabrizio Soppelsa
This bug affects 1 person
Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  Medium      Alexey Lebedeff
6.1.x               Won't Fix     Medium      MOS Maintenance
7.0.x               Won't Fix     Medium      MOS Maintenance
8.0.x               Won't Fix     Medium      MOS Maintenance
Mitaka              Fix Released  Medium      Alexey Lebedeff

Bug Description

On one of the controller nodes, RabbitMQ crashed with the following error:
no more index entries in atom_tab (max=1048576).

Environment details:
MOS 6.1 with MU4
RabbitMQ 3.3.5, running standalone (not under Pacemaker)

=erl_crash_dump:0.3
Fri Dec 25 23:22:20 2015
Slogan: no more index entries in atom_tab (max=1048576)
System version: Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:30] [kernel-poll:true]
Compiled: Tue Aug 19 16:45:11 2014

Further details (as sysctl values) to be added when available.

Workaround was to restart RabbitMQ.

Alexey Lebedeff (alebedev-a) wrote :

This crash can be hit under either of the following conditions:
1) kernel.pid_max has been raised to a value greater than ~1 million
2) The cluster has a large number of nodes; with the default Linux pid_max value this requires at least 32 nodes

The proper fix will be included in RabbitMQ 3.6.1 ( https://github.com/rabbitmq/rabbitmq-server/pull/552 ). It can be backported to any RabbitMQ version starting from 3.5.0.

For earlier versions (as for 3.3.5 in the initial report) we'll need to add some shell magic instead - if it is deemed to be important enough.
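
The "at least 32 nodes" figure in the comment above can be reproduced with simple arithmetic. The sketch below assumes that every distinct (pid, host) pair used by rabbitmqctl can leave one atom behind on the broker (node names are atoms in Erlang, and atoms are not garbage-collected), and that the default pid_max is the traditional Linux value of 32768. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Atom table limit, taken from the crash dump slogan (max=1048576).
ATOM_LIMIT=1048576
# Assumption: traditional Linux default for kernel.pid_max.
DEFAULT_PID_MAX=32768

# Number of hosts whose combined pid ranges can fill the atom table,
# assuming one atom per distinct (pid, host) pair.
NODES_NEEDED=$(( ATOM_LIMIT / DEFAULT_PID_MAX ))

echo "nodes needed at pid_max=$DEFAULT_PID_MAX: $NODES_NEEDED"
# prints: nodes needed at pid_max=32768: 32
```

With pid_max raised above the atom limit, a single host has enough distinct pids to exhaust the table on its own, which corresponds to condition 1.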

Ilya Kutukov (ikutukov)
Changed in fuel:
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 6.1-updates
importance: Undecided → High
status: New → Confirmed
Roman Rufanov (rrufanov) wrote :

Customer found on 6.1

Maciej Relewicz (rlu)
tags: added: area-mos
no longer affects: fuel/future
Dmitry Mescheryakov (dmitrymex) wrote :

We believe that the issue cannot appear on a 'stock' version of the product and was caused by a customised deployment, so we do not consider it high priority and I am lowering it to medium. Also, the fix is already merged into RabbitMQ (see Alexey's commit, which he referenced above) and we will probably pick it up in 9.0.

Fabrizio Soppelsa (fsoppelsa) wrote :

So, analysis is correct here. In sysctl we have:

node-70.sysctl:kernel.pid_max = 4194303
node-71.sysctl:kernel.pid_max = 4194303
node-72.sysctl:kernel.pid_max = 4194303

> For earlier versions (as for 3.3.5 in the initial report)
> we'll need to add some shell magic instead - if it is
> deemed to be important enough.

Alexey, can you elaborate here please? Will this "shell trick" resolve or mitigate the issue in versions <3.5.0, and how exactly?
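
To check whether another node has the same kernel.pid_max situation as node-70 through node-72, a check along the following lines may help. It is only a sketch: the function name `pid_max_exceeds_atom_limit` is hypothetical, and the threshold is the max= value from the crash dump.

```shell
#!/usr/bin/env bash
# Atom table limit from the crash dump slogan.
ATOM_LIMIT=1048576

# Hypothetical helper: exit status 0 when the given pid_max value
# is greater than the atom table limit.
pid_max_exceeds_atom_limit() {
    [ "$1" -gt "$ATOM_LIMIT" ]
}

# On Linux, read the live value and report how it compares.
if [ -r /proc/sys/kernel/pid_max ]; then
    current=$(cat /proc/sys/kernel/pid_max)
    if pid_max_exceeds_atom_limit "$current"; then
        echo "kernel.pid_max=$current exceeds the atom limit ($ATOM_LIMIT)"
    else
        echo "kernel.pid_max=$current does not exceed the atom limit ($ATOM_LIMIT)"
    fi
fi
```

A value below the limit does not rule the problem out on its own, since a large enough cluster can reach the limit as well.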

Alexey Lebedeff (alebedev-a) wrote :

In RabbitMQ versions <3.5.0 the rabbitmqctl node name is generated in shell as "rabbitmqctl-$$". This code should be replaced with a shell algorithm along these lines:
- Choose a small random number (from 1 to 100)
- Using some locking mechanism (e.g. flock(1)), try to lock this number
- If the lock fails, generate another random number and start again
- If the lock succeeds, hold it for the rest of the rabbitmqctl invocation
- Use the random number as the rabbitmqctl node name: "rabbitmqctl-$OUR_RANDOM_NUMBER"
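
The steps above could be sketched in bash roughly as follows. This is not the patch that was shipped; it is an illustration of the described algorithm. `LOCK_DIR`, `MAX_SUFFIX`, `CTL_SUFFIX` and `pick_ctl_suffix` are hypothetical names, and the use of file descriptor 9 is an arbitrary choice. It relies on flock(1) from util-linux.

```shell
#!/usr/bin/env bash
# Hypothetical lock directory and upper bound for the suffix.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/rabbitmqctl-locks}"
MAX_SUFFIX=100

# Picks a random number in 1..MAX_SUFFIX and takes an exclusive,
# non-blocking lock on a per-number lock file. On success the number
# is stored in CTL_SUFFIX and fd 9 stays open, so the lock is held
# until this shell exits. Must be called in the current shell,
# not in a $(...) subshell, or the lock is released immediately.
pick_ctl_suffix() {
    local n attempt
    mkdir -p "$LOCK_DIR" || return 1
    for (( attempt = 0; attempt < 1000; attempt++ )); do
        n=$(( RANDOM % MAX_SUFFIX + 1 ))
        exec 9>"$LOCK_DIR/$n.lock" || return 1
        if flock -n 9; then
            CTL_SUFFIX=$n
            return 0
        fi
        # Number already in use by another invocation: close and retry.
        exec 9>&-
    done
    return 1
}

if pick_ctl_suffix; then
    echo "rabbitmqctl-$CTL_SUFFIX"
fi
```

Because at most `MAX_SUFFIX` distinct names can ever be produced per host, the number of atoms these names can create on the broker stays bounded regardless of kernel.pid_max.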

Alexey Lebedeff (alebedev-a) wrote :
tags: added: wontfix-low
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Dmitry Mescheryakov (dmitrymex) wrote :

We have upgraded RabbitMQ to 3.6.1 in 9.0, hence according to Alexey Lebedeff's post above we have the fix for that issue there.

Alexey Galkin (agalkin) wrote :

Verification on:
fuel_build_id - 97
fuel_release - 9.0
RabbitMQ:
{rabbitmq_management,"RabbitMQ Management Console","3.6.1"},
{amqp_client,"RabbitMQ AMQP Client","3.6.1"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.1"},
{rabbit,"RabbitMQ","3.6.1"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.1"}

Alexey Galkin (agalkin)
Changed in fuel:
status: Fix Committed → Fix Released