wrong datatype key in cmap - corosync-qdevice doesn't start

Bug #1733889 reported by Erik Ilavsky
This bug affects 2 people
Affects                      Status         Importance   Assigned to   Milestone
corosync-qdevice (Ubuntu)    Fix Released   Undecided    Unassigned
  Focal                      Fix Released   Undecided    Unassigned

Bug Description

Hi,
root@ores2:~# cat /etc/debian_version
9.1

root@ores2:~# dpkg -la | grep coro
ii corosync 2.4.2-3 amd64 cluster engine daemon and utilities
ii corosync-qdevice 2.4.2-3 amd64 cluster engine quorum device daemon

root@ores2:~# pcs quorum device add model net host=oresq algorithm=lms
Setting up qdevice certificates on nodes...
ores2: Succeeded
ores1: Succeeded
Enabling corosync-qdevice...
Error: 192.168.9.58: Enabling corosync-qdevice failed

root@ores2:~# corosync-qdevice -df
Nov 22 16:10:33 debug Initializing votequorum
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:10:33 debug Initializing local socket
Nov 22 16:10:33 debug Registering qdevice models
Nov 22 16:10:33 debug Configuring qdevice
Nov 22 16:10:33 debug Configuring master_wins
Nov 22 16:10:33 debug Getting configuration node list
Nov 22 16:10:33 debug Initializing qdevice model
Nov 22 16:10:33 debug Initializing qdevice_net_instance
Nov 22 16:10:33 debug Registering algorithms
Nov 22 16:10:33 debug Initializing NSS
Nov 22 16:10:33 debug Cast vote timer remains stopped.
Nov 22 16:10:33 crit 50:50 split algorithm works only if quorum.device.votes configuration key is set to 1!
Nov 22 16:10:33 error Algorithm init failed

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (str) = 1

root@ores2:~# corosync-cmapctl -s quorum.device.votes u32 1

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (u32) = 1

root@ores2:~# corosync-qdevice -df
Nov 22 16:11:31 debug Initializing votequorum
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Nov 22 16:11:31 debug Initializing local socket
Nov 22 16:11:31 debug Registering qdevice models
Nov 22 16:11:31 debug Configuring qdevice
Nov 22 16:11:31 debug Configuring master_wins
Nov 22 16:11:31 debug Getting configuration node list
Nov 22 16:11:31 debug Initializing qdevice model
Nov 22 16:11:31 debug Initializing qdevice_net_instance
Nov 22 16:11:31 debug Registering algorithms
Nov 22 16:11:31 debug Initializing NSS
Nov 22 16:11:31 debug Cast vote timer remains stopped.
Nov 22 16:11:31 debug Initializing cmap tracking
Nov 22 16:11:31 debug Waiting for ring id
Nov 22 16:11:31 debug Votequorum nodelist notify callback:
Nov 22 16:11:31 debug Ring_id = (1.220)
Nov 22 16:11:31 debug Node list (size = 2):
Nov 22 16:11:31 debug 0 nodeid = 1
Nov 22 16:11:31 debug 1 nodeid = 2
Nov 22 16:11:31 debug Algorithm decided to not send list and result vote is No change
...

now works...

Best regards,
Erik Ilavsky
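
The workaround in the description can be sketched as a small shell helper. This is only a sketch: it assumes the `key (type) = value` output format shown in the transcripts above, and the function names (`cmap_line_type`, `cmap_line_value`, `fix_votes_type`) are hypothetical. Only the `corosync-cmapctl -g` and `corosync-cmapctl -s` invocations are real commands.

```shell
#!/bin/sh
# Extract the datatype from one line of corosync-cmapctl output,
# e.g. "quorum.device.votes (str) = 1" yields "str".
cmap_line_type() {
    printf '%s\n' "$1" | sed -n 's/^[^ ]* (\([a-z0-9]*\)) = .*$/\1/p'
}

# Extract the value from the same kind of line.
cmap_line_value() {
    printf '%s\n' "$1" | sed -n 's/^[^ ]* ([a-z0-9]*) = \(.*\)$/\1/p'
}

# If quorum.device.votes is stored as a string, set it again as u32
# with the same value (the manual workaround from the description).
fix_votes_type() {
    key="quorum.device.votes"
    line=$(corosync-cmapctl -g "$key") || return 1
    if [ "$(cmap_line_type "$line")" = "str" ]; then
        corosync-cmapctl -s "$key" u32 "$(cmap_line_value "$line")"
    fi
}
```

Running `fix_votes_type` as root on the affected node before starting corosync-qdevice would apply the same change as the manual commands. Like them, it presumably only changes the running cmap and would have to be repeated after corosync restarts.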

Tags: ubuntu-ha
Christian Ehrhardt (paelzer) wrote :

Note: A root-cause analysis (RCA) of this needs more time or more corosync experience.

@Erik - if you have found out more about this in the meantime, let us know.

Changed in corosync (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
status: New → Triaged
importance: Undecided → Medium
tags: added: ubuntu-ha
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm flagging this as ubuntu-ha so I can investigate this together with the Ubuntu HA work being done. Assigned case to myself.

no longer affects: corosync (Ubuntu)
Changed in corosync-qdevice (Ubuntu):
status: New → Triaged
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync-qdevice (Ubuntu Bionic):
status: New → Triaged
Changed in corosync-qdevice (Ubuntu Groovy):
status: Triaged → Fix Released
Changed in corosync-qdevice (Ubuntu Focal):
status: New → Fix Released
Christoph Roeder (brightdroid) wrote :

Problem still exists for me with corosync-qdevice 3.0.0-4ubuntu1 on ubuntu 20.04:

# systemctl start corosync-qdevice.service
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.

# corosync-qdevice -df
Aug 07 11:26:59 debug Initializing votequorum
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug shm size:1048589; real_size:1052672; rb->word_size:263168
Aug 07 11:26:59 debug Initializing local socket
Aug 07 11:26:59 debug Registering qdevice models
Aug 07 11:26:59 debug Configuring qdevice
Aug 07 11:26:59 error Can't read quorum.device.model cmap key.
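
The last log line suggests that the `quorum.device.model` key is missing from cmap altogether, which is a different failure from the wrong-type case in the bug description. A minimal sketch for checking which expected keys are absent from a `corosync-cmapctl` dump follows. The helper name `missing_cmap_keys` is hypothetical, and the `key (type) = value` line format is assumed from the transcripts in this report.

```shell
#!/bin/sh
# Read a corosync-cmapctl dump on stdin and print every key given as an
# argument that does not appear as the first field of any line.
missing_cmap_keys() {
    dump=$(cat)
    for key in "$@"; do
        printf '%s\n' "$dump" |
            awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' ||
            printf '%s\n' "$key"
    done
}
```

It could be used as `corosync-cmapctl | missing_cmap_keys quorum.device.model quorum.device.votes`. Empty output would mean both keys are present in the running configuration.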

Rafael David Tinoco (rafaeldtinoco) wrote :

Thanks for the feedback Christoph,

I'll give it a look and provide feedback here soon. Marking bug as confirmed again.

Changed in corosync-qdevice (Ubuntu Groovy):
status: Fix Released → Confirmed
Changed in corosync-qdevice (Ubuntu Focal):
status: Fix Released → Confirmed
no longer affects: corosync-qdevice (Ubuntu Groovy)
Changed in corosync-qdevice (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu Focal):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync-qdevice (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Christoph Roeder (brightdroid) wrote :

Sorry to ask so early, but I really need this. Have you found anything?

Thanks in advance

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Christoph,

I haven't gotten back to this yet, but there is a workaround, as described in the bug description:

"""

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (str) = 1

root@ores2:~# corosync-cmapctl -s quorum.device.votes u32 1

root@ores2:~# corosync-cmapctl | grep quorum.device.votes
quorum.device.votes (u32) = 1

"""

Hope that helps for now. I will try to prioritize this on my side.

Rafael David Tinoco (rafaeldtinoco) wrote :

I cannot reproduce this in Focal, and the bug description shows:

"""
root@ores2:~# cat /etc/debian_version
9.1
"""

(k)rafaeldtinoco@corosyncqdev:~/.../corosync-qdevice/debian/tests$ sudo corosync-cmapctl | grep quorum.device
quorum.device.model (str) = net
quorum.device.net.host (str) = 127.0.0.1
quorum.device.votes (u32) = 1 <- HERE, unsigned integer 32 bits by default

@Christoph,

Do you have a way to reproduce this?

Christoph Roeder (brightdroid) wrote :

I reinstalled all test-vms and now it works...

Maybe it was a problem with an earlier test (2-node cluster) with these properties:

$ pcs property set no-quorum-policy=ignore
$ pcs property set stonith-enabled=false

Thanks, it seems to work now.

Rafael David Tinoco (rafaeldtinoco) wrote :

Alright,

Thanks for the feedback then.

Changed in corosync-qdevice (Ubuntu):
status: Confirmed → Fix Released
Changed in corosync-qdevice (Ubuntu Focal):
status: Confirmed → Fix Released
no longer affects: corosync-qdevice (Ubuntu Bionic)
Changed in corosync-qdevice (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync-qdevice (Ubuntu Focal):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody