Ubuntu
corosync package

Bug #1739033
Comment #0

Comment 0 for bug 1739033

Victor Tapia (vtapia) wrote on 2017-12-19:

[Description]

Corosync aborts with SIGABRT if it starts before the interface it has to bind to is ready.

On boot, if no interface in the bindnetaddr range is up and configured, corosync binds to lo (127.0.0.1). Once a matching interface comes up, corosync crashes with the following error message:

corosync: votequorum.c:2019: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed.
Aborted (core dumped)

The last log entries show the node trying to join the cluster:

Dec 19 11:36:05 [22167] xenial-pacemaker corosync debug [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state.
Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. Members joined: 704573706

During the quorum calculation, corosync uses the generated nodeid (704573706) for the node instead of the nodeid specified in the configuration file (1). The assert fails because the generated nodeid is not present in the member list. Corosync should use the configured nodeid and keep running after the interface comes up, as shown in the log of a fixed corosync boot:

Dec 19 11:50:56 [4824] xenial-corosync corosync notice [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:80) was formed. Members joined: 1
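
As a side note on where 704573706 comes from: the value matches the ring0 address 169.254.241.10 read as a 32-bit big-endian integer with the top bit cleared. The sketch below only reproduces that arithmetic. The function name ipv4_as_nodeid and its clear_high_bit parameter are made up for illustration, and masking the high bit is assumed here purely because it matches the logged value; it is not a statement about corosync's defaults or internals.

```python
import ipaddress


def ipv4_as_nodeid(addr: str, clear_high_bit: bool = True) -> int:
    """Read an IPv4 address as a 32-bit big-endian integer.

    clear_high_bit=True masks bit 31. That choice is an assumption made
    only because it reproduces the nodeid seen in the log.
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        value &= 0x7FFFFFFF
    return value


if __name__ == "__main__":
    # ring0_addr of the affected node, taken from the nodelist section
    print(ipv4_as_nodeid("169.254.241.10"))  # prints 704573706
```

This makes it easy to recognise an address-derived nodeid in a log: if "Members joined" shows a large number like this rather than the nodeid from the nodelist, the configured value was not applied.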

[Environment]

Xenial 16.04.3

Packages:

ii corosync 2.3.5-3ubuntu1 amd64 cluster engine daemon and utilities
ii libcorosync-common4:amd64 2.3.5-3ubuntu1 amd64 cluster engine common library

[Reproducer]

Config:

totem {
        version: 2

        transport: udpu

        crypto_cipher: none
        crypto_hash: none

        interface {
                ringnumber: 0
                member {
                        memberaddr: 169.254.241.10
                }
                member {
                        memberaddr: 169.254.241.20
                }
                bindnetaddr: 169.254.241.0
                mcastport: 5405
                ttl: 1
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

nodelist {
        node {
                ring0_addr: 169.254.241.10
                nodeid: 1
        }
        node {
                ring0_addr: 169.254.241.20
                nodeid: 2
        }
}

1. ifdown interface (169.254.241.10)
2. start corosync (/usr/sbin/corosync -f)
3. ifup interface
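
The precondition for the crash (no local address inside the bindnetaddr range when corosync starts) can be checked with a small helper before step 2. This is a minimal sketch, not part of corosync: has_bindable_address is a hypothetical name, the /24 prefix is an assumption (the report only gives bindnetaddr: 169.254.241.0), and the caller has to collect the local addresses, for example from the output of "ip -o -4 addr show".

```python
import ipaddress
from typing import Iterable


def has_bindable_address(local_addrs: Iterable[str], bindnet: str) -> bool:
    """Return True if any local address lies inside the bind network.

    bindnet is given in CIDR form; the prefix length is an assumption,
    since corosync.conf only carries the network address.
    """
    net = ipaddress.ip_network(bindnet)
    return any(ipaddress.ip_address(addr) in net for addr in local_addrs)


if __name__ == "__main__":
    bindnet = "169.254.241.0/24"  # assumed /24 around bindnetaddr

    # State after step 1: only loopback is configured
    print(has_bindable_address(["127.0.0.1"], bindnet))

    # State after step 3: the cluster interface is up
    print(has_bindable_address(["127.0.0.1", "169.254.241.10"], bindnet))
```

When the helper returns False, starting corosync reproduces the situation described above, where it falls back to lo.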

[Fix]

Commit https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2 changes the way the CMAP is populated, and seems to fix this bug.
