Activity log for bug #1678142

Date Who What changed Old value New value Message
2017-03-31 13:23:35 Dmitry Goloshubov bug added bug
2017-03-31 13:23:48 Dmitry Goloshubov nominated for series mos/9.x
2017-03-31 13:23:48 Dmitry Goloshubov bug task added mos/9.x
2017-03-31 13:23:56 Dmitry Goloshubov mos/9.x: importance Undecided High
2017-03-31 13:24:07 Dmitry Goloshubov mos: importance Undecided High
2017-03-31 13:24:48 Dmitry Goloshubov description

Old value:

MOS 9.2

During the tests, the communication with the ceilometer-api on one of the controllers suddenly fails.

From the ceilometer-api.log:
2017-03-16T15:18:10.265429+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.265 2362 INFO ceilometer.api.app [-] serving on http://192.168.6.60:8777
2017-03-16T15:18:10.265857+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.265 2362 INFO werkzeug [-] * Running on http://192.168.64.60:8777/
2017-03-16T15:18:10.322773+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.266 2362 CRITICAL ceilometer [-] error: [Errno 98] Address already in use

Workaround: restart the service.
Haven't found a way to reproduce that.

----------------------
Additional information:
To probe the status of the Ceilometer services we manually created a Curl request to address locally to every ceilometer instance on the CICs.

[From any of the CIC]
curl -g -i -X 'GET' 'http://<CIC_IP_on_mmt>:8777/' -H 'User-Agent: ceilometerclient.openstack.common.apiclient' -H 'X-Auth-Token: ...'

Result: request goes through all the CICs (Result: 200 OK) except CIC-1 where it gets a time out.

>>>>> ceilomter-api service status <<<<<<
We checked the status of the service with the following results

>>>>> strace
root@cic-1:~# strace -tt -T -p 5596
Process 5596 attached
19:48:39.985419 wait4(0, <- the ceilomter-api master got stuck here

>>>>> GDB
>> CIC-1 !Faulty!
(gdb) bt
#0 0x00007f2feb0dced9 in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0 <- Stuck in wait
#1 0x000000000041d95a in ?? ()
#2 0x000000000049968d in PyEval_EvalFrameEx ()
#3 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#4 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#5 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#6 0x00000000004a1c9a in ?? ()
#7 0x00000000004dfe94 in ?? ()
#8 0x0000000000499be5 in PyEval_EvalFrameEx ()
#9 0x00000000004a090c in PyEval_EvalCodeEx ()
#10 0x000000000049ab45 in PyEval_EvalFrameEx ()
#11 0x00000000004a090c in PyEval_EvalCodeEx ()
#12 0x000000000049ab45 in PyEval_EvalFrameEx ()
#13 0x00000000004a090c in PyEval_EvalCodeEx ()
#14 0x0000000000499a52 in PyEval_EvalFrameEx ()
#15 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#16 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#17 0x00000000004a1634 in ?? ()
#18 0x000000000044e4a5 in PyRun_FileExFlags ()
#19 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
#20 0x000000000044f904 in Py_Main ()
#21 0x00007f2fead29f45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000578c4e in _start ()
>> CIC-2 !working fine!
(gdb) bt
#0 0x00007f7d3376dc53 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000000000047fbbd in ?? ()
#2 0x000000000049c4d9 in PyEval_EvalFrameEx ()
#3 0x00000000004a090c in PyEval_EvalCodeEx ()
#4 0x000000000049ab45 in PyEval_EvalFrameEx ()
#5 0x00000000004a1c9a in ?? ()
#6 0x00000000004dfe94 in ?? ()
#7 0x0000000000499be5 in PyEval_EvalFrameEx ()
#8 0x00000000004a090c in PyEval_EvalCodeEx ()
#9 0x000000000049ab45 in PyEval_EvalFrameEx ()
#10 0x00000000004a090c in PyEval_EvalCodeEx ()
#11 0x000000000049ab45 in PyEval_EvalFrameEx ()
#12 0x00000000004a090c in PyEval_EvalCodeEx ()
#13 0x0000000000499a52 in PyEval_EvalFrameEx ()
#14 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#15 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#16 0x00000000004a1634 in ?? ()
#17 0x000000000044e4a5 in PyRun_FileExFlags ()
#18 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
#19 0x000000000044f904 in Py_Main ()
#20 0x00007f7d3369df45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x0000000000578c4e in _start ()

>>>>> /proc/<ceilometer-api>/stack
>> CIC-1 !Faulty!
[<ffffffff8107fd11>] do_wait+0x1c1/0x230 <- Stuck in wait
[<ffffffff81080da4>] SyS_wait4+0x64/0xc0
[<ffffffff817fa4f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff
>> CIC-2 !Working fine!
[<ffffffff81211799>] poll_schedule_timeout+0x49/0x70
[<ffffffff8121212c>] do_select+0x58c/0x750
[<ffffffff812124bc>] core_sys_select+0x1cc/0x2d0
[<ffffffff8121266b>] SyS_select+0xab/0xf0
[<ffffffff817fa4f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff

The root cause seems to be related to a deadlock in the ceilometer-api

----------------------
Also, there is a bug that could be potentially related: https://bugs.launchpad.net/mos/8.0.x/+bug/1566202

New value:

MOS 9.2

During the tests, the communication with the ceilometer-api on one of the controllers suddenly fails.

From the ceilometer-api.log:
2017-03-16T15:18:10.265429+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.265 2362 INFO ceilometer.api.app [-] serving on http://192.168.6.60:8777
2017-03-16T15:18:10.265857+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.265 2362 INFO werkzeug [-] * Running on http://192.168.6.60:8777/
2017-03-16T15:18:10.322773+09:00 cic-1 ceilometer-api[2362]: 2017-03-16 15:18:10.266 2362 CRITICAL ceilometer [-] error: [Errno 98] Address already in use

Workaround: restart the service.
Haven't found a way to reproduce that.

----------------------
Additional information:
To probe the status of the Ceilometer services we manually created a Curl request to address locally to every ceilometer instance on the CICs.

[From any of the CIC]
curl -g -i -X 'GET' 'http://<CIC_IP_on_mmt>:8777/' -H 'User-Agent: ceilometerclient.openstack.common.apiclient' -H 'X-Auth-Token: ...'

Result: request goes through all the CICs (Result: 200 OK) except CIC-1 where it gets a time out.

>>>>> ceilomter-api service status <<<<<<
We checked the status of the service with the following results

>>>>> strace
root@cic-1:~# strace -tt -T -p 5596
Process 5596 attached
19:48:39.985419 wait4(0, <- the ceilomter-api master got stuck here

>>>>> GDB
>> CIC-1 !Faulty!
(gdb) bt
#0 0x00007f2feb0dced9 in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0 <- Stuck in wait
#1 0x000000000041d95a in ?? ()
#2 0x000000000049968d in PyEval_EvalFrameEx ()
#3 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#4 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#5 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#6 0x00000000004a1c9a in ?? ()
#7 0x00000000004dfe94 in ?? ()
#8 0x0000000000499be5 in PyEval_EvalFrameEx ()
#9 0x00000000004a090c in PyEval_EvalCodeEx ()
#10 0x000000000049ab45 in PyEval_EvalFrameEx ()
#11 0x00000000004a090c in PyEval_EvalCodeEx ()
#12 0x000000000049ab45 in PyEval_EvalFrameEx ()
#13 0x00000000004a090c in PyEval_EvalCodeEx ()
#14 0x0000000000499a52 in PyEval_EvalFrameEx ()
#15 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#16 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#17 0x00000000004a1634 in ?? ()
#18 0x000000000044e4a5 in PyRun_FileExFlags ()
#19 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
#20 0x000000000044f904 in Py_Main ()
#21 0x00007f2fead29f45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000578c4e in _start ()
>> CIC-2 !working fine!
(gdb) bt
#0 0x00007f7d3376dc53 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000000000047fbbd in ?? ()
#2 0x000000000049c4d9 in PyEval_EvalFrameEx ()
#3 0x00000000004a090c in PyEval_EvalCodeEx ()
#4 0x000000000049ab45 in PyEval_EvalFrameEx ()
#5 0x00000000004a1c9a in ?? ()
#6 0x00000000004dfe94 in ?? ()
#7 0x0000000000499be5 in PyEval_EvalFrameEx ()
#8 0x00000000004a090c in PyEval_EvalCodeEx ()
#9 0x000000000049ab45 in PyEval_EvalFrameEx ()
#10 0x00000000004a090c in PyEval_EvalCodeEx ()
#11 0x000000000049ab45 in PyEval_EvalFrameEx ()
#12 0x00000000004a090c in PyEval_EvalCodeEx ()
#13 0x0000000000499a52 in PyEval_EvalFrameEx ()
#14 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#15 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#16 0x00000000004a1634 in ?? ()
#17 0x000000000044e4a5 in PyRun_FileExFlags ()
#18 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
#19 0x000000000044f904 in Py_Main ()
#20 0x00007f7d3369df45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x0000000000578c4e in _start ()

>>>>> /proc/<ceilometer-api>/stack
>> CIC-1 !Faulty!
[<ffffffff8107fd11>] do_wait+0x1c1/0x230 <- Stuck in wait
[<ffffffff81080da4>] SyS_wait4+0x64/0xc0
[<ffffffff817fa4f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff
>> CIC-2 !Working fine!
[<ffffffff81211799>] poll_schedule_timeout+0x49/0x70
[<ffffffff8121212c>] do_select+0x58c/0x750
[<ffffffff812124bc>] core_sys_select+0x1cc/0x2d0
[<ffffffff8121266b>] SyS_select+0xab/0xf0
[<ffffffff817fa4f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[<ffffffffffffffff>] 0xffffffffffffffff

The root cause seems to be related to a deadlock in the ceilometer-api

----------------------
Also, there is a bug that could be potentially related: https://bugs.launchpad.net/mos/8.0.x/+bug/1566202
2017-03-31 13:25:42 Dmitry Goloshubov summary Ceilometer API service doesn't start - Address already in use Ceilometer API service can't start - Address already in use
2017-03-31 13:28:13 Dmitry Goloshubov mos/9.x: milestone 9.x-updates
2017-03-31 13:35:30 Alexander Rubtsov bug added subscriber Alexander Rubtsov
2017-03-31 13:38:00 Dmitry Goloshubov summary Ceilometer API service can't start - Address already in use Ceilometer API service stuck - Address already in use
2017-04-03 13:05:24 Denis Meltsaykin mos: assignee MOS Ceilometer (mos-ceilometer)
2017-04-03 13:05:29 Denis Meltsaykin mos/9.x: assignee Denis Meltsaykin (dmeltsaykin)
2017-04-03 13:05:38 Denis Meltsaykin mos/9.x: assignee Denis Meltsaykin (dmeltsaykin) MOS Ceilometer (mos-ceilometer)
2017-04-03 13:05:40 Denis Meltsaykin mos: status New Confirmed
2017-04-03 13:05:43 Denis Meltsaykin mos/9.x: status New Confirmed
2017-04-03 13:05:51 Denis Meltsaykin mos: milestone 10.0
2017-04-03 13:06:06 Denis Meltsaykin tags customer-found sla1 area-ceilometer customer-found sla1
2017-04-06 13:42:16 Nadya Privalova mos: assignee MOS Ceilometer (mos-ceilometer) Ilya Tyaptin (ityaptin)
2017-04-06 13:42:36 Nadya Privalova mos/9.x: assignee MOS Ceilometer (mos-ceilometer) Ilya Tyaptin (ityaptin)
2017-04-12 12:55:50 Ilya Tyaptin mos/9.x: status Confirmed In Progress
2017-04-25 16:12:36 Denis Meltsaykin mos/9.x: milestone 9.x-updates 9.2-mu-2
2017-04-26 08:54:33 Aleksey Zvyagintsev mos/9.x: status In Progress Fix Committed
2017-05-23 12:25:06 TatyanaGladysheva mos/9.x: status Fix Committed Fix Released
2017-05-29 08:52:29 Alexey Stupnikov mos: assignee Ilya Tyaptin (ityaptin) MOS Maintenance (mos-maintenance)
2017-10-16 14:30:14 Alexey Stupnikov mos: importance High Low
2017-10-16 14:30:57 Alexey Stupnikov mos: status Confirmed Won't Fix
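
Note on the "[Errno 98] Address already in use" message quoted in the description: the sketch below is not taken from ceilometer or from the fix for this bug. It only illustrates, under the assumption of a plain TCP socket on the loopback interface, how a second bind to an address and port that another socket already holds fails with EADDRINUSE (errno 98 on Linux). The helper name try_bind is hypothetical.

```python
import errno
import socket


def try_bind(host, port):
    """Hypothetical helper: bind and listen on (host, port).

    Returns (socket, None) on success or (None, errno_code) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# Port 0 asks the kernel for any free port, so the sketch does not
# depend on 8777 (the port in the log above) being available.
first, first_err = try_bind("127.0.0.1", 0)
port = first.getsockname()[1]

# A second listener on the same address and port is refused while the
# first socket is still open.
second, second_err = try_bind("127.0.0.1", port)
print("second bind failed with EADDRINUSE:", second_err == errno.EADDRINUSE)

first.close()
```

This matches the situation the log suggests only in outline: something still holds the listening port when a new listener tries to bind it. Which process held port 8777 on cic-1 is not stated in this activity log.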