Wrong key datatype in cmap: corosync-qdevice doesn't start

Bug #1733889 reported by Erik Ilavsky
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
corosync-qdevice (Ubuntu) | Fix Released | Undecided | Unassigned |
corosync-qdevice (Ubuntu Focal) | Fix Released | Undecided | Unassigned |

Bug Description

Hi,
root@ores2:~# cat /etc/debian_version
9.1

root@ores2:~# dpkg -la | grep coro
ii corosync 2.4.2-3 amd64 cluster engine daemon and utilities
ii corosync-qdevice 2.4.2-3 amd64 cluster engine quorum device daemon

root@ores2:~# pcs quorum device add model net host=oresq algorithm=lms
Setting up qdevice certificates on nodes...
ores2: Succeeded
ores1: Succeeded
Enabling corosync-qdevice...
Error: 192.168.9.58: Enabling corosync-qdevice failed

root@ores2:~# corosync-qdevice -df
Nov 22 16:10:33 debug Initializing votequorum
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug Initializing local socket
Nov 22 16:10:33 debug Registering qdevice models
Nov 22 16:10:33 debug Configuring qdevice
Nov 22 16:10:33 debug Configuring master_wins
Nov 22 16:10:33 debug Getting configuration node list
Nov 22 16:10:33 debug Initializing qdevice model
Nov 22 16:10:33 debug Initializing qdevice_net_instance
Nov 22 16:10:33 debug Registering algorithms
Nov 22 16:10:33 debug Initializing NSS
Nov 22 16:10:33 debug Cast vote timer remains stopped.
Nov 22 16:10:33 crit 50:50 split algorithm works only if quorum.device.votes configuration key is set to 1!
Nov 22 16:10:33 error Algorithm init failed

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (str) = 1

root@ores2:~# corosync-cmapctl -s quorum.device.votes u32 1

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (u32) = 1

root@ores2:~# corosync-qdevice -df
Nov 22 16:11:31 debug Initializing votequorum
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug Initializing local socket
Nov 22 16:11:31 debug Registering qdevice models
Nov 22 16:11:31 debug Configuring qdevice
Nov 22 16:11:31 debug Configuring master_wins
Nov 22 16:11:31 debug Getting configuration node list
Nov 22 16:11:31 debug Initializing qdevice model
Nov 22 16:11:31 debug Initializing qdevice_net_instance
Nov 22 16:11:31 debug Registering algorithms
Nov 22 16:11:31 debug Initializing NSS
Nov 22 16:11:31 debug Cast vote timer remains stopped.
Nov 22 16:11:31 debug Initializing cmap tracking
Nov 22 16:11:31 debug Waiting for ring id
Nov 22 16:11:31 debug Votequorum nodelist notify callback:
Nov 22 16:11:31 debug Ring_id = (1.220)
Nov 22 16:11:31 debug Node list (size = 2):
Nov 22 16:11:31 debug 0 nodeid = 1
Nov 22 16:11:31 debug 1 nodeid = 2
Nov 22 16:11:31 debug Algorithm decided to not send list and result vote is No change
...

Now it works.

Best regards,
Erik Ilavsky
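The workaround above suggests that the daemon needs quorum.device.votes stored as an unsigned 32-bit value, not merely a value that prints as 1. The following Python sketch is a simplified, hypothetical model of that behaviour, loosely inspired by libcmap's typed getters such as `cmap_get_uint32`: every key carries a type, and a u32 read rejects a key stored as str. The names `TypedMap`, `get_u32` and `ffsplit_votes_ok` are illustrative assumptions, not corosync code.

```python
from __future__ import annotations

from dataclasses import dataclass


class CmapTypeError(Exception):
    """Raised when a key exists but was stored with a different type."""


@dataclass
class Entry:
    vtype: str     # type name as printed by corosync-cmapctl, e.g. "u32" or "str"
    value: object


class TypedMap:
    """Hypothetical typed key/value store; NOT the real libcmap implementation."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def set(self, key: str, vtype: str, value: object) -> None:
        self._entries[key] = Entry(vtype, value)

    def get_u32(self, key: str) -> int:
        entry = self._entries[key]  # raises KeyError if the key is missing
        if entry.vtype != "u32":
            raise CmapTypeError(f"{key} is {entry.vtype}, expected u32")
        return int(entry.value)


def ffsplit_votes_ok(cmap: TypedMap) -> bool:
    """Assumed check behind the log message: votes must read as u32 and equal 1."""
    try:
        return cmap.get_u32("quorum.device.votes") == 1
    except (KeyError, CmapTypeError):
        return False


cmap = TypedMap()
cmap.set("quorum.device.votes", "str", "1")  # state shown in the bug description
print(ffsplit_votes_ok(cmap))                # False: the key is typed str
cmap.set("quorum.device.votes", "u32", 1)    # after `corosync-cmapctl -s ... u32 1`
print(ffsplit_votes_ok(cmap))                # True
```

The two printed results mirror the before and after states in the transcript: the same textual value 1 fails while typed as str and passes once re-set as u32.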

Tags: ubuntu-ha
Christian Ehrhardt (paelzer) wrote:

Note: A root cause analysis (RCA) of this needs more time or more corosync experience.

@Erik, if you find out more about this in the meantime, please let us know.

Changed in corosync (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
status: New → Triaged
importance: Undecided → Medium
tags: added: ubuntu-ha
Rafael David Tinoco (rafaeldtinoco) wrote:

I'm tagging this as ubuntu-ha so I can investigate it together with the ongoing Ubuntu HA work. I have assigned the case to myself.

no longer affects: corosync (Ubuntu)
Changed in corosync-qdevice (Ubuntu):
status: New → Triaged
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync-qdevice (Ubuntu Bionic):
status: New → Triaged
Changed in corosync-qdevice (Ubuntu Groovy):
status: Triaged → Fix Released
Changed in corosync-qdevice (Ubuntu Focal):
status: New → Fix Released
Christoph Roeder (brightdroid) wrote:

The problem still exists for me with corosync-qdevice 3.0.0-4ubuntu1 on Ubuntu 20.04:

# systemctl start corosync-qdevice.service
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.

# corosync-qdevice -df
Aug 07 11:26:59 debug Initializing votequorum
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug Initializing local socket
Aug 07 11:26:59 debug Registering qdevice models
Aug 07 11:26:59 debug Configuring qdevice
Aug 07 11:26:59 error Can't read quorum.device.model cmap key.
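This failure is a different symptom from the original report: the quorum.device.model key is missing rather than mistyped. A small diagnostic over a `corosync-cmapctl` dump can flag both cases. This is a sketch that assumes the `key (type) = value` line format seen throughout this report; the keys it treats as required (`quorum.device.model`, `quorum.device.net.host`) are an illustrative assumption taken from the working output posted later in this thread, not from corosync documentation, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches lines printed by corosync-cmapctl, e.g. "quorum.device.votes (u32) = 1".
LINE_RE = re.compile(r"^(?P<key>\S+) \((?P<type>[^)]+)\) = (?P<value>.*)$")


def parse_cmapctl(text: str) -> dict[str, tuple[str, str]]:
    """Map each key to (type, raw value); lines that don't match are skipped."""
    keys: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            keys[m["key"]] = (m["type"], m["value"])
    return keys


def qdevice_problems(text: str) -> list[str]:
    """Return human-readable problems; the required-key list is an assumption."""
    keys = parse_cmapctl(text)
    problems = []
    for required in ("quorum.device.model", "quorum.device.net.host"):
        if required not in keys:
            problems.append(f"missing {required}")
    votes = keys.get("quorum.device.votes")
    if votes is not None and votes[0] != "u32":
        problems.append(f"quorum.device.votes has type {votes[0]}, expected u32")
    return problems


broken = "quorum.device.votes (str) = 1\n"
working = """quorum.device.model (str) = net
quorum.device.net.host (str) = 127.0.0.1
quorum.device.votes (u32) = 1
"""
print(qdevice_problems(broken))
print(qdevice_problems(working))  # []
```

In practice the input text would be the output of running `corosync-cmapctl` on a cluster node.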

Rafael David Tinoco (rafaeldtinoco) wrote:

Thanks for the feedback, Christoph.

I'll take a look and report back here soon. Marking the bug as Confirmed again.

Changed in corosync-qdevice (Ubuntu Groovy):
status: Fix Released → Confirmed
Changed in corosync-qdevice (Ubuntu Focal):
status: Fix Released → Confirmed
no longer affects: corosync-qdevice (Ubuntu Groovy)
Changed in corosync-qdevice (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu Focal):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Christoph Roeder (brightdroid) wrote:

Sorry to ask so early, but I really need this. Did you find anything?

Thanks in advance.

Rafael David Tinoco (rafaeldtinoco) wrote:

Hello Christoph,

I haven't gotten back to this yet, but there is a workaround, as described in the bug description:

"""

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (str) = 1

root@ores2:~# corosync-cmapctl -s quorum.device.votes u32 1

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (u32) = 1

"""

I hope that helps for now. I will try to prioritize this on my side.

Rafael David Tinoco (rafaeldtinoco) wrote:

I cannot reproduce this in Focal, and the bug description shows:

"""
root@ores2:~# cat /etc/debian_version
9.1
"""

(k)rafaeldtinoco@corosyncqdev:~/.../corosync-qdevice/debian/tests$ sudo corosync-cmapctl | grep quorum.device
quorum.device.model (str) = net
quorum.device.net.host (str) = 127.0.0.1
quorum.device.votes (u32) = 1 <- HERE: a 32-bit unsigned integer by default

@Christoph,

Do you have a way to reproduce this?

Christoph Roeder (brightdroid) wrote:

I reinstalled all test VMs and now it works.

Maybe the problem came from an earlier test (a 2-node cluster) with these properties:

$ pcs property set no-quorum-policy=ignore
$ pcs property set stonith-enabled=false

Thanks, it seems to work now.

Rafael David Tinoco (rafaeldtinoco) wrote:

Alright,

Thanks for the feedback then.

Changed in corosync-qdevice (Ubuntu):
status: Confirmed → Fix Released
Changed in corosync-qdevice (Ubuntu Focal):
status: Confirmed → Fix Released
no longer affects: corosync-qdevice (Ubuntu Bionic)
Changed in corosync-qdevice (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync-qdevice (Ubuntu Focal):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody