All OSDs on ceph-osd nodes go down

Bug #1578827 reported by Andrii Petrenko
Affects                   Status      Importance  Assigned to      Milestone
Mirantis OpenStack        Incomplete  High        Andrii Petrenko  -
Mirantis OpenStack 7.0.x  Incomplete  High        Andrii Petrenko  -

Bug Description

On the customer's ceph-osd server, all OSDs went down.

Errors in the logs:
grep 'void Thread::create(size_t)' *|grep -v '>'|grep " 20:"
ceph-osd.32.log:2016-05-04 20:42:35.135375 7fbd54ea7700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fbd54ea7700 time 2016-05-04 20:42:35.133589
ceph-osd.33.log:2016-05-04 20:42:41.183548 7f8a7d63f700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f8a7d63f700 time 2016-05-04 20:42:41.181656
ceph-osd.34.log:2016-05-04 20:42:38.969135 7fd4d9794700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fd4d9794700 time 2016-05-04 20:42:38.967367
ceph-osd.35.log:2016-05-04 20:42:39.191213 7fa632e59700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa632e59700 time 2016-05-04 20:42:39.189330
ceph-osd.36.log:2016-05-04 20:41:49.592106 7fe44d0d6700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fe44d0d6700 time 2016-05-04 20:41:49.590262
ceph-osd.37.log:2016-05-04 20:41:49.976383 7fa717482700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa717482700 time 2016-05-04 20:41:49.974213
ceph-osd.37.log:2016-05-04 20:42:08.494309 7fa77aff7700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa77aff7700 time 2016-05-04 20:42:08.492618
ceph-osd.38.log:2016-05-04 20:41:50.552626 7faed7a0c700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7faed7a0c700 time 2016-05-04 20:41:50.550463
ceph-osd.38.log:2016-05-04 20:41:53.079255 7faf05c11700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7faf05c11700 time 2016-05-04 20:41:53.077452
ceph-osd.39.log:2016-05-04 20:41:49.817492 7f66df053700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f66df053700 time 2016-05-04 20:41:49.815616
ceph-osd.40.log:2016-05-04 20:42:09.614569 7f1217f73700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f1217f73700 time 2016-05-04 20:42:09.612584
ceph-osd.41.log:2016-05-04 20:42:43.608912 7f2b0dfe1700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f2b0dfe1700 time 2016-05-04 20:42:43.607110
ceph-osd.42.log:2016-05-04 20:42:37.984996 7f0e57a3d700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f0e57a3d700 time 2016-05-04 20:42:37.983147
ceph-osd.42.log:2016-05-04 20:42:45.305307 7f0e56a3b700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f0e56a3b700 time 2016-05-04 20:42:45.303521
ceph-osd.43.log:2016-05-04 20:41:49.610410 7fd6c3fac700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fd6c3fac700 time 2016-05-04 20:41:49.608545
ceph-osd.44.log:2016-05-04 20:42:43.303795 7f8888c48700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f8888c48700 time 2016-05-04 20:42:43.301679
ceph-osd.45.log:2016-05-04 20:42:37.684938 7f4eef652700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f4eef652700 time 2016-05-04 20:42:37.683107
ceph-osd.46.log:2016-05-04 20:42:35.544142 7fc6ba59a700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fc6ba59a700 time 2016-05-04 20:42:35.525877
ceph-osd.47.log:2016-05-04 20:42:35.479270 7f865cd1f700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f865cd1f700 time 2016-05-04 20:42:35.477467
ceph-osd.48.log:2016-05-04 20:42:38.507663 7ff7e4160700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7ff7e4160700 time 2016-05-04 20:42:38.505868
ceph-osd.49.log:2016-05-04 20:42:41.636908 7f71c767d700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f71c767d700 time 2016-05-04 20:42:41.635148
ceph-osd.50.log:2016-05-04 20:41:49.774139 7ff7eebbb700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7ff7eebbb700 time 2016-05-04 20:41:49.772331
ceph-osd.51.log:2016-05-04 20:42:34.925049 7f4f51293700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f4f51293700 time 2016-05-04 20:42:34.923227
ceph-osd.52.log:2016-05-04 20:41:53.055201 7f337fdc1700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f337fdc1700 time 2016-05-04 20:41:53.053058
ceph-osd.52.log:2016-05-04 20:42:35.038611 7f33ea9c8700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f33ea9c8700 time 2016-05-04 20:42:35.036934
ceph-osd.53.log:2016-05-04 20:42:39.954108 7f1b0cc71700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f1b0cc71700 time 2016-05-04 20:42:39.952291
ceph-osd.54.log:2016-05-04 20:42:40.744019 7f2890034700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f2890034700 time 2016-05-04 20:42:40.742188

For osd.36:
2016-05-04 20:41:49.592106 7fe44d0d6700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fe44d0d6700 time 2016-05-04 20:41:49.590262
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (Thread::create(unsigned long)+0x8a) [0x7fe45f4f82aa]
 2: (SimpleMessenger::add_accept_pipe(int)+0x59) [0x7fe45f4ef6e9]
 3: (Accepter::entry()+0x363) [0x7fe45f59d653]
 4: (()+0x8182) [0x7fe45e37f182]
 5: (clone()+0x6d) [0x7fe45caf247d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2016-05-04 20:41:39.451877 7fe4470ca700 1 -- 192.168.1.2:6808/32128 --> osd.80 192.168.1.25:6876/31636 -- osd_sub_op(client.5865660.0:518880 12.e9 917234e9/rbd_data.4da8a446de3ea6.00000000000006ab/head//12
 -9999> 2016-05-04 20:41:39.451982 7fe4470ca700 5 -- op tracker -- , seq: 5368144, time: 2016-05-04 20:41:39.451982, event: commit_queued_for_journal_write, request: osd_op(client.5865660.0:518880 rbd_data.4da8a446
....
    -1> 2016-05-04 20:41:49.573983 7fe44b0d2700 1 -- 192.168.0.12:6853/32128 --> 192.168.0.29:0/1043 -- osd_ping(ping_reply e51632 stamp 2016-05-04 20:41:49.566069) v2 -- ?+0 0x7fe47316b2c0 con 0x7fe46cddbfa0
     0> 2016-05-04 20:41:49.592106 7fe44d0d6700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fe44d0d6700 time 2016-05-04 20:41:49.590262
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (Thread::create(unsigned long)+0x8a) [0x7fe45f4f82aa]
 2: (SimpleMessenger::add_accept_pipe(int)+0x59) [0x7fe45f4ef6e9]
 3: (Accepter::entry()+0x363) [0x7fe45f59d653]
 4: (()+0x8182) [0x7fe45e37f182]
 5: (clone()+0x6d) [0x7fe45caf247d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  99/99 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.36.log
--- end dump of recent events ---
2016-05-04 20:42:09.919710 7fe45abfc700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fe4470ca700' had timed out after 15
2016-05-04 20:51:41.270916 7fe45abfc700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fe4468c9700' had timed out after 15
2016-05-04 20:51:41.271454 7fe43d381700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.27:6811/1000475 pipe(0x7fe47289f980 sd=227 :6808 s=2 pgs=2571 cs=1 l=0 c=0x7fe464040c60).fault with nothing to send, going to standby
2016-05-04 20:51:41.271503 7fe432dfd700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.25:6860/31162 pipe(0x7fe46ec97680 sd=271 :6808 s=2 pgs=20 cs=1 l=0 c=0x7fe473f16f20).fault with nothing to send, going to standby
2016-05-04 20:51:41.271524 7fe4403b1700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.16:6811/45083371 pipe(0x7fe46a247180 sd=51 :48193 s=2 pgs=153273 cs=1 l=0 c=0x7fe46a1bf600).fault with nothing to send, going to standby
2016-05-04 20:51:41.271552 7fe42c494700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.25:6890/31971 pipe(0x7fe46cb59900 sd=289 :6808 s=2 pgs=32 cs=1 l=0 c=0x7fe473f17340).fault with nothing to send, going to standby
2016-05-04 20:51:41.271648 7fe42b383700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.26:6876/3004902 pipe(0x7fe46e469480 sd=375 :6808 s=2 pgs=482 cs=1 l=0 c=0x7fe469e87de0).fault with nothing to send, going to standby
2016-05-04 20:51:41.271667 7fe4305d5700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.25:6872/31489 pipe(0x7fe46e41ea80 sd=319 :6808 s=2 pgs=32 cs=1 l=0 c=0x7fe463fcd700).fault with nothing to send, going to standby
2016-05-04 20:51:41.271789 7fe42c292700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.25:6876/31636 pipe(0x7fe471759180 sd=290 :6808 s=2 pgs=6 cs=1 l=0 c=0x7fe463fcd2e0).fault with nothing to send, going to standby
2016-05-04 20:51:41.271813 7fe42e9b9700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.27:6829/2012410 pipe(0x7fe4730c1680 sd=407 :6808 s=2 pgs=1380 cs=1 l=0 c=0x7fe4685f82c0).fault with nothing to send, going to standby
2016-05-04 20:51:41.272412 7fe427343700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.24:6801/1023577 pipe(0x7fe46fb7c300 sd=70 :6808 s=2 pgs=445 cs=1 l=0 c=0x7fe464047c80).fault with nothing to send, going to standby
2016-05-04 20:51:41.272415 7fe433c0b700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.16:6800/94084658 pipe(0x7fe472c82300 sd=158 :6808 s=2 pgs=110229 cs=1 l=0 c=0x7fe464040420).fault with nothing to send, going to standby
2016-05-04 20:51:41.273139 7fe433b0a700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.12:6822/64075135 pipe(0x7fe472c81180 sd=263 :6808 s=2 pgs=82737 cs=1 l=0 c=0x7fe4640402c0).fault with nothing to send, going to standby
2016-05-04 20:51:41.273179 7fe43f7a5700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.24:6800/2014843 pipe(0x7fe46a248580 sd=121 :52404 s=2 pgs=3348 cs=1 l=0 c=0x7fe46a1c1c80).fault with nothing to send, going to standby
2016-05-04 20:51:41.273184 7fe43b55d700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.27:6819/1001043 pipe(0x7fe46d668f80 sd=589 :6808 s=2 pgs=3065 cs=1 l=0 c=0x7fe4640411e0).fault with nothing to send, going to standby
2016-05-04 20:51:41.273312 7fe3f67cd700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.27:6844/9736 pipe(0x7fe46fb8c280 sd=663 :6808 s=2 pgs=25 cs=1 l=0 c=0x7fe46cf1dde0).fault with nothing to send, going to standby
2016-05-04 20:51:41.272914 7fe434514700 0 -- 192.168.1.2:6808/32128 >> 192.168.1.24:6849/22096 pipe(0x7fe46caab680 sd=257 :53349 s=2 pgs=54 cs=1 l=0 c=0x7fe46ca92c60).fault with nothing to send, going to standby
2016-05-04 20:51:41.279274 7fe4448c5700 -1 osd.36 51632 skipping heartbeat_check, hbqueue max age: 580.288
2016-05-04 20:51:41.279379 7fe44a0d0700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fe4470ca700' had timed out after 15
2016-05-04 20:51:41.279389 7fe44a0d0700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fe4470ca700' had suicide timed out after 150
2016-05-04 20:51:41.282171 7fe44a0d0700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fe44a0d0700 time 2016-05-04 20:51:41.279483
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0x7fe45f4587e9]
 2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0x7fe45f459056]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x7bb) [0x7fe45f0d87eb]
 4: (OSD::heartbeat_dispatch(Message*)+0x343) [0x7fe45f0d9c53]
 5: (DispatchQueue::entry()+0x57a) [0x7fe45f5aff6a]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fe45f4f407d]
 7: (()+0x8182) [0x7fe45e37f182]
 8: (clone()+0x6d) [0x7fe45caf247d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2016-05-04 20:51:41.284821 7fe44d0d6700 -1 *** Caught signal (Aborted) **
 in thread 7fe44d0d6700

This case is similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1306242
http://tracker.ceph.com/issues/10988

I will provide logs on demand.

Andrii Petrenko (aplsms)
tags: added: customer-found support
Mykola Golub (mgolub) wrote:

This looks like resource exhaustion. Most probably the pid_max limit has been reached (it can be increased via sysctl). atop and dmesg output might be useful for investigation. Still, it is not clear why it is reported here.
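
A minimal way to check this theory on an affected node (a sketch; atop and dmesg give more detail):

sysctl kernel.pid_max    # the default is 32768 on most kernels
ps -eLf | wc -l          # total number of threads on the node (one line per thread)
dmesg | tail -n 50       # look for clone()/fork or OOM related kernel messages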

Alexei Sheplyakov (asheplyakov) wrote:

> Most probably, pid_max limit has been reached (can be increased via sysctl)

I somewhat doubt it. A typical OSD spawns ~100 threads, so the default value of /proc/sys/kernel/pid_max (32768) is enough to run ~300 OSDs (provided the node has enough RAM).

So the cause might be simpler: the node needs more RAM to run that many OSDs. It's hard to pin down the exact reason without even basic information about the cluster and nodes.
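
A quick sketch to tell memory exhaustion apart from a hit pid/thread limit:

dmesg | grep -iE 'out of memory|oom-killer'   # evidence of the OOM killer firing
free -m                                       # current RAM and swap usage
cat /proc/sys/kernel/threads-max              # system-wide thread limit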

Alexei Sheplyakov (asheplyakov) wrote:

Please provide at least basic information about the cluster and the problematic node, such as the number of OSD nodes, the number of OSDs per node, and basic hardware info.

Changed in mos:
status: New → Incomplete
Mykola Golub (mgolub) wrote:

> I somewhat doubt it. A typical OSD spawns ~ 100 threads

It depends a lot on the cluster size and the current cluster status. See:

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#osd-not-running

We had a customer who reported that increasing kernel.pid_max helped them recover from this error.

We also have a blueprint for this:

https://blueprints.launchpad.net/fuel/+spec/ceph-osd-nodes-kernel-pid-max

"Not enough ram" is also possible reason, though in production usually segfaults observed in this case (with corresponding messages in dmesg), not create thread assertion failure.

Andrii Petrenko (aplsms) wrote:

The customer has an environment with 8 OSD nodes:

1) Older hardware: Intel OEM chassis, 2x Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz. Two chassis have 8 SSD OSDs each, one chassis has 16 SSD OSDs.
Newer hardware: Supermicro chassis, 2x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz. All 5 chassis have 23 SSD OSDs each.
2) The switch is configured identically for all ceph nodes.
3) No heavy cloud activity.

The ceph osd tree output is attached.

Threads per ceph-osd node (ps -elfT lists one line per thread):

[root@fuel tmp]# for i in `~/bin/nodelist ceph`; do echo -ne "node-$i:\t "; date; ssh node-$i ps -elfT 2> /dev/null | wc -l; done
node-98: Thu May 5 21:59:35 UTC 2016
28930
node-99: Thu May 5 21:59:37 UTC 2016
15356
node-100: Thu May 5 21:59:38 UTC 2016
16472
node-187: Thu May 5 21:59:39 UTC 2016
33097
node-223: Thu May 5 21:59:42 UTC 2016
29964
node-224: Thu May 5 21:59:43 UTC 2016
31856
node-225: Thu May 5 21:59:45 UTC 2016
30645
node-226: Thu May 5 21:59:46 UTC 2016
30637
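
Notably, the counts above sit right around the default kernel.pid_max of 32768 (node-187 even exceeds it), which fits the thread-exhaustion theory. A sketch to compare each node's thread count against its configured limit, assuming the same ~/bin/nodelist helper as above:

for i in `~/bin/nodelist ceph`; do
  echo -ne "node-$i:\t"
  ssh node-$i 'echo "$(ps -elfT | wc -l) threads, pid_max=$(sysctl -n kernel.pid_max)"' 2>/dev/null
done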

Alexei Sheplyakov (asheplyakov) wrote:

@Andrii:

> 1) older hardware: intel oem chassis, qty 2 Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz. 2 chassis have 8 ssd OSD, 1 chassis has 16 ssd OSD.
> newer hardware: supermicro chassis, qty 2 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz. all chassis (5) have 23 ssd OSD.

What about RAM? Do these nodes run anything but ceph?

Andrii Petrenko (aplsms) wrote:

1. Only Ceph is running on the ceph nodes.
2. All new hosts have 256 GB of RAM.
3. There are no issues with the old ceph nodes.
An atop screenshot is attached.

Alexei Sheplyakov (asheplyakov) wrote:

> https://bugs.launchpad.net/mos/+bug/1578827/+attachment/4657164/+files/Screen%20Shot%202016-05-06%20at%204.24.24%20AM.png

Wow, 1100 to 1800 threads per OSD process! That's embarrassing (there are better ways to implement that state machine). Until ceph is fixed to not abuse threads (which unfortunately is not going to happen anytime soon), the only workaround is to bump the maximum number of processes and threads:

sysctl -w kernel.pid_max=1048576
sysctl -w kernel.threads-max=16777216
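
If these values are adopted, they also need to survive a reboot; a minimal sketch (the file name under /etc/sysctl.d is arbitrary):

cat > /etc/sysctl.d/60-ceph-thread-limits.conf <<EOF
kernel.pid_max = 1048576
kernel.threads-max = 16777216
EOF
sysctl -p /etc/sysctl.d/60-ceph-thread-limits.conf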

Roman Rufanov (rrufanov) wrote:

OK, since this is a customer-found bug: have these new sysctl settings been incorporated into Fuel?

Alexei Sheplyakov (asheplyakov) wrote:

Roman,

Setting such extreme limits is a workaround to let the cluster recover. A production environment should use more conservative limits; in fact, the kernel defaults are OK. Having more than 1000 threads per OSD is abnormal, and we should find out what is (or was) causing such misbehavior.
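
To spot the misbehaving processes, per-OSD thread counts are more telling than the node-wide total; a sketch (nlwp is the per-process thread count ps reports):

ps -C ceph-osd -o pid=,nlwp=,args= | sort -k2 -rn | head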

Roman Rufanov (rrufanov) wrote:

@asheplyakov - please file a bug against Starlight to raise an alert for this situation. Define the levels for Warning and Critical alerts.
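
One possible shape for such a check, as a sketch (the 70%/90% thresholds are illustrative, not agreed values):

#!/bin/bash
# Hypothetical Nagios-style check: how much of kernel.pid_max is consumed by threads.
used=$(ps -eLf | wc -l)
max=$(sysctl -n kernel.pid_max)
pct=$(( used * 100 / max ))
if [ "$pct" -ge 90 ]; then echo "CRITICAL: ${pct}% of pid_max used ($used/$max)"; exit 2
elif [ "$pct" -ge 70 ]; then echo "WARNING: ${pct}% of pid_max used ($used/$max)"; exit 1
else echo "OK: ${pct}% of pid_max used ($used/$max)"; exit 0
fi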

Dina Belova (dbelova) wrote:

Moving to Invalid after more than a month in the Incomplete state.

Changed in mos:
status: Incomplete → Invalid
Alexei Sheplyakov (asheplyakov) wrote:

Not critical, since there is a workaround (see https://bugs.launchpad.net/mos/+bug/1578827/comments/8).

Dina Belova (dbelova) wrote:

Marking as Incomplete for now; let's wait for the support team here.

Changed in mos:
status: Invalid → Incomplete
importance: Undecided → High
assignee: nobody → MOS Ceph (mos-ceph)
milestone: none → 7.0-updates
Sergey Kolekonov (skolekonov) wrote:

There have been no updates for a month; please move back to Confirmed if there are any.

Changed in mos:
status: Incomplete → Invalid
Andrii Petrenko (aplsms) wrote:

Colleagues,

The parameters mentioned in comment #8 (see https://bugs.launchpad.net/mos/+bug/1578827/comments/8) are a 'last resort' workaround. If those parameters are acceptable for ceph nodes by default, please provide patches for sysctl.conf in the default installation/Puppet manifests; otherwise, please continue troubleshooting.

Changed in mos:
status: Invalid → Confirmed
Alexei Sheplyakov (asheplyakov) wrote:

There are no steps to reproduce the problem, no meaningful logs, no hardware data, nothing at all.
Thus there's no way to find out what happened to that poor cluster and why on Earth the OSDs were trying to spawn 1500+ threads. Marking as Incomplete.

Changed in mos:
status: Confirmed → Incomplete
Changed in mos:
assignee: MOS Ceph (mos-ceph) → Andrii Petrenko (aplsms)
status: Incomplete → New
status: New → Incomplete