No warning for HA with OSD and MON roles on same nodes

Bug #1267937 reported by mauro
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Medium
Assigned to: Dmitry Borodaenko

Bug Description

Mirantis Fuel 4.0 deployment

3 compute nodes + 3 controller nodes

Node-13 is the leader in the Ceph cluster.

[root@node-15 ceph]# ceph quorum_status
{"election_epoch":24,"quorum":[0,1,2],"quorum_names":["node-13","node-14","node-15"],"quorum_leader_name":"node-13","monmap":{"epoch":3,"fsid":"4572353c-8777-4639-a16d-c581a3ebdf7e","modified":"2014-01-10 09:02:03.288830","created":"0.000000","mons":[{"rank":0,"name":"node-13","addr":"192.168.121.3:6789\/0"},{"rank":1,"name":"node-14","addr":"192.168.121.4:6789\/0"},{"rank":2,"name":"node-15","addr":"192.168.121.5:6789\/0"}]}}

After a power crash of node-13, node-14 is elected as the new leader in a cluster of 2 (node-14, node-15), but Ceph remains unavailable:

ceph -s --> output: "pending"

OpenStack objects (instances, volumes) are unavailable from both the CLI and the Horizon dashboard.

The situation recovers only after node-13 comes back into operation.

Revision history for this message
mauro (maurof) wrote :

The ceph.log is attached.

A snapshot is available on request.

Revision history for this message
mauro (maurof) wrote :

The described problem does not seem to occur if the affected node is not the leader.

Andrew Woodward (xarses)
tags: added: ceph
Revision history for this message
mauro (maurof) wrote :

Thanks Andrew, I see you have routed the problem to the Ceph area.
Has someone already started the investigation?

In my opinion ceph-mon is active on the remaining 2 nodes, but Ceph is still not accessible.

[root@node-15 ceph]# ceph -s
2014-01-13 09:02:26.467554 7f5574237700 0 -- :/1002881 >> 192.168.121.3:6789/0 pipe(0x7f556800a480 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f556800a6e0).fault
2014-01-13 09:02:29.468596 7f5574136700 0 -- :/1002881 >> 192.168.121.3:6789/0 pipe(0x7f556800ad80 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5568002f50).fault

[root@node-15 ceph]# ps aux | grep ceph
root 2223 0.2 0.8 787920 143404 ? Ssl Jan10 11:45 /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf
root 3780 0.2 0.4 713348 71992 ? Ssl Jan10 11:28 /usr/bin/ceph-osd -i 16 --pid-file /var/run/ceph/osd.16.pid -c /etc/ceph/ceph.conf
root 6541 0.2 1.3 867784 216584 ? Ssl Jan10 12:38 /usr/bin/ceph-osd -i 17 --pid-file /var/run/ceph/osd.17.pid -c /etc/ceph/ceph.conf
root 41060 0.0 0.0 103240 884 pts/0 S+ 08:56 0:00 grep ceph
root 42717 0.0 0.2 253716 49184 ? Sl Jan10 0:58 /usr/bin/ceph-mon -i node-15 --pid-file /var/run/ceph/mon.node-15.pid -c /etc/ceph/ceph.conf
root 42870 0.3 0.8 791888 147336 ? Ssl Jan10 12:57 /usr/bin/ceph-osd -i 11 --pid-file /var/run/ceph/osd.11.pid -c /etc/ceph/ceph.conf
root 44785 0.3 0.9 802136 150416 ? Ssl Jan10 14:32 /usr/bin/ceph-osd -i 8 --pid-file /var/run/ceph/osd.8.pid -c /etc/ceph/ceph.conf
root 47618 0.2 1.0 806588 166588 ? Ssl Jan10 12:25 /usr/bin/ceph-osd -i 15 --pid-file /var/run/ceph/osd.15.pid -c /etc/ceph/ceph.conf

According to the following output the system is unusable: the DHCP and L3 agents do not work:

[root@node-15 ceph]# crm status
Last updated: Mon Jan 13 08:41:13 2014
Last change: Fri Jan 10 16:38:49 2014 via crmd on node-14.prisma
Stack: openais
Current DC: node-15.prisma - partition with quorum
Version: 1.1.8-1.el6-1f8858c
3 Nodes configured, 3 expected votes
17 Resources configured.

Online: [ node-14.prisma node-15.prisma ]
OFFLINE: [ node-13.prisma ]

 vip__management_old (ocf::heartbeat:IPaddr2): Started node-14.prisma
 vip__public_old (ocf::heartbeat:IPaddr2): Started node-15.prisma

Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-14.prisma node-15.prisma ]
     Stopped: [ p_haproxy:2 ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-14.prisma node-15.prisma ]
     Stopped: [ p_mysql:2 ]
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-14.prisma node-15.prisma ]
     Stopped: [ p_neutron-openvswitch-agent:2 ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-14.prisma node-15.prisma ]
     Stopped: [ p_neutron-metadata-agent:2 ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-15.prisma (unmanaged) FAILED
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-14.prisma (unmanaged) FAILED
 openstack-heat-engine (ocf::mirantis:openstack-heat-engine): Started node-14.prisma

...


Changed in fuel:
assignee: nobody → Andrey Korolyov (xdeller)
milestone: none → 4.1
Revision history for this message
Andrey Korolyov (xdeller) wrote :

It is bad practice to mix the OSD and MON roles on the same nodes and expect them to withstand power failures. Also, it looks like your client simply lost connectivity to the quorum between

2014-01-10 14:16:11.699720 mon.0 192.168.121.3:6789/0 391 : [INF] pgmap v272: 492 pgs: 492 active+clean; 644 MB data, 39089 MB used, 44631 GB / 44670 GB avail
2014-01-10 14:39:54.201945 mon.2 192.168.121.5:6789/0 2 : [INF] mon.node-15 calling new monitor election

So please consider separating those roles before failure testing.

Changed in fuel:
importance: Undecided → Wishlist
status: New → Won't Fix
Revision history for this message
Andrey Korolyov (xdeller) wrote :

Dima Borodaenko, please discuss with the UI team the possibility of adding a warning for such a configuration, since it is not true HA.

Revision history for this message
mauro (maurof) wrote :

The deployment was done according to the Fuel options.

I noticed the following behaviour: in a cluster of 3 Ceph nodes, when 1 Ceph node goes down the following occurs:

a) if the crashed Ceph node is not the "original leader":
      --> it is possible to create/attach/detach/delete new volumes
      --> it is not possible to detach/attach/delete existing volumes (the command hangs)

b) if the crashed Ceph node is the "original leader":
   Ceph and RBD commands no longer work at all:
      --> it is not possible to create/attach/detach/delete new volumes (the command hangs)
      --> it is not possible to detach/attach/delete existing volumes (the command hangs)

Concerning point b): in my opinion the ceph.conf file addresses only the ceph-mon of the original leader node, and once it is down there is no longer any way to reach the volume service, RBD and so on.

Here below is the current ceph.conf. (We have now deployed 4.0 with the same topology and OpenStack is up and running:
(node-8, node-10, node-12: controller + ceph)
(node-7, node-9, node-11: compute))

[root@node-8 ~]# more /etc/ceph/ceph.conf
[global]
filestore_xattr_use_omap = true
mon_host = 192.168.121.4
fsid = 1a305a5d-0dd7-488b-9c71-b008158329a6
mon_initial_members = node-8
auth_supported = cephx
osd_journal_size = 2048
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 100
public_network = 192.168.121.0/24
osd_pool_default_pgp_num = 100
osd_mkfs_type = xfs
cluster_network = 192.168.122.0/24

Wouldn't it be useful to set "mon_host = 192.168.121.2" (the virtual IP address)?

[root@node-8 ~]# ip a | grep 192.168.121
    inet 192.168.121.4/24 brd 192.168.121.255 scope global br-mgmt
[root@node-8 ~]# ssh node-10
Warning: Permanently added 'node-10,192.168.121.6' (RSA) to the list of known hosts.
Last login: Mon Jan 27 17:20:13 2014 from 192.168.120.100
[root@node-10 ~]# ip a | grep 192.168.121
    inet 192.168.121.6/24 brd 192.168.121.255 scope global br-mgmt
    inet 192.168.121.2/24 brd 192.168.121.255 scope global secondary br-mgmt:ka
[root@node-10 ~]# ssh node-12
Warning: Permanently added 'node-12,192.168.121.8' (RSA) to the list of known hosts.
Last login: Mon Jan 27 16:48:04 2014 from 192.168.121.6
[root@node-12 ~]# ip a | grep 192.168.121
    inet 192.168.121.8/24 brd 192.168.121.255 scope global br-mgmt

Revision history for this message
Andrey Korolyov (xdeller) wrote :

Ehm. Speaking of setting it to the VIP - no, that is bad practice. I wonder why we're not pushing an array of monitors everywhere. Raising the bug level then, since it is far more important than just daemon placement. Thank you for pointing it out!
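
For illustration, a minimal sketch of what the client-facing part of ceph.conf could look like with the full monitor array listed instead of a single address. The node-8/node-10/node-12 addresses are taken from the ip output in the previous comment; the exact file Fuel generates is not shown in this bug, so treat this as an assumption:

[global]
mon_initial_members = node-8,node-10,node-12
mon_host = 192.168.121.4,192.168.121.6,192.168.121.8

With every monitor listed, clients can fall back to a surviving monitor when one node goes down, instead of depending on a single host address or a VIP.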

Changed in fuel:
assignee: Andrey Korolyov (xdeller) → Dmitry Borodaenko (dborodaenko)
status: Won't Fix → Confirmed
importance: Wishlist → High
Revision history for this message
mauro (maurof) wrote :

OK, you're welcome.

I gather that the described behaviour is the expected one, given the current state of the art and my specific deployment.
Could you confirm it?

thanks
 mauro

Changed in fuel:
importance: High → Critical
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Andrey, we already have a bug about not pushing an array of monitors everywhere:
https://bugs.launchpad.net/fuel/+bug/1268579

As far as I understand, the other remaining issue highlighted in this bug was the lack of a warning in the UI for combining OSD and MON on the same node. This is a more general problem than just Ceph: there should be a general notice in the role assignment screen warning that combining more than one node per role will reduce reliability, and that dedicated nodes should be allocated for all roles if sufficient hardware is available.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Typo, the above comment should read: "combining more than one role per node".

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Please confirm my conclusions and, if you agree, downgrade this bug and assign it back to me for resolution of the UI warning aspect.

Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Andrey Korolyov (xdeller)
Revision history for this message
Andrey Korolyov (xdeller) wrote :

mauro - yes, you are correct. First, your setup is not expected to always survive a node being killed, according to Inktank; and second, we are not describing the monitor array as we should in the client section of ceph.conf.

summary: - ceph unavailable after leader node power crash
+ No warning for HA with OSD and MON roles on same nodes
Changed in fuel:
importance: Critical → Medium
assignee: Andrey Korolyov (xdeller) → Dmitry Borodaenko (dborodaenko)
tags: added: docs
tags: added: customer-found
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

So what is the resolution? Is there nothing we can do except fix https://bugs.launchpad.net/fuel/+bug/1268579 ?

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

There are two more things we can do:

1) add a note in the documentation that although Fuel allows combining multiple roles on the same node, it is recommended to use dedicated OSD nodes in production for better reliability -- that's why there's a docs tag on this bug.

2) add ceph-mon as a standalone role (trivial to do once granular deployment is implemented) to separate the failure domains of the Ceph monitor and OpenStack controller components -- this should be done separately under the blueprint I've just created:

https://blueprints.launchpad.net/fuel/+spec/ceph-mon-role

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 4.1 → 5.0
Changed in fuel:
milestone: 5.0 → 5.1
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Vladimir, the only action that still needs to be addressed in this bug is the documentation; the code change requirements are all documented elsewhere. The documentation change should be doable in 5.0.

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 5.1 → 5.0
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

And what about the separate ceph-mon role?

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

As I mentioned above in comment #14, the separate ceph-mon role should be addressed by a blueprint. The BP linked above has been superseded by a more generic BP that covers ceph-rgw in addition to ceph-mon:
https://blueprints.launchpad.net/fuel/+spec/fuel-ceph-roles

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

In fact, there's already a warning against placing the Ceph OSD role on controller nodes; it was added here:
https://review.openstack.org/#/c/78631/2/pages/reference-architecture/0030-cluster-sizing.rst

Changed in fuel:
status: Confirmed → Fix Committed
Changed in fuel:
status: Fix Committed → Fix Released