cman does not start ... corosync died

Bug #538139 reported by ITec
This bug affects 2 people
Affects                   Status    Importance   Assigned to   Milestone
Ubuntu Server papercuts   Invalid   Undecided    Unassigned
redhat-cluster (Ubuntu)   Invalid   Undecided    Unassigned

Bug Description

Versions:
Ubuntu 10.04 Alpha-3
redhat-cluster-suite 3.0.2-2ubuntu2
cman 3.0.2-2ubuntu2
corosync 1.2.0-0ubuntu1

/etc/init.d/cman start
Starting cluster:
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Setting network parameters... [ OK ]
   Starting cman... corosync died: Could not read cluster configuration

I generated /etc/cluster/cluster.conf with system-config-cluster:
<?xml version="1.0" ?>
<cluster alias="ubu" config_version="4" name="ubu">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="ubu1-24" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="ubu2-24" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

/etc/corosync/corosync.conf:
totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 4800
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.24.221
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

/var/log/syslog:
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Unloading all Corosync service engines.
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync configuration service
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync profile loading service
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Mar 15 11:52:14 ubu1 corosync[3904]: [MAIN ] Corosync Cluster Engine exiting with status -1 at main.c:158.
Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Corosync built-in features: nss
Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] Initializing transport (UDP/IP).
Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] The network interface [192.168.24.221] is now up.
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service failed to load 'pacemaker'.
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync configuration service
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync profile loading service
Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync built-in features: nss
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully parsed cman config
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully configured openais services to load
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] parse error in config: The consensus timeout parameter (4800 ms) must be atleast 1.2 * token (12000 ms).
Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync Cluster Engine exiting with status -9 at main.c:1359.

I see:
consensus (4800 ms) must be atleast 1.2 * token (12000 ms)
But token is 3000!

So what's wrong?
Thanks and best regards!
Christian
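
The check that fails in the log is a plain inequality: consensus must be at least 1.2 times token. Note that the failing instance (PID 4077) read /etc/cluster/cluster.conf rather than /etc/corosync/corosync.conf, so the token of 3000 from corosync.conf was not the value in effect. Below is a minimal sketch of the arithmetic, assuming the rule is exactly what the log message states. `consensus_ok` is a hypothetical helper name, not part of corosync or cman. The 10000 ms token in the second call is also an assumption: it is what the token would be if the logged 12000 ms is the product 1.2 * token rather than the token itself, which this report does not confirm.

```python
def consensus_ok(token_ms, consensus_ms):
    """Return True when consensus_ms >= 1.2 * token_ms.

    Integer arithmetic (x10 and x12) avoids floating-point rounding.
    """
    return consensus_ms * 10 >= token_ms * 12


# Values from /etc/corosync/corosync.conf in this report: 4800 >= 3600.
print(consensus_ok(3000, 4800))   # True

# Assumed token of 10000 ms (not confirmed in the report): 4800 < 12000.
print(consensus_ok(10000, 4800))  # False
```

With the values from corosync.conf the rule holds, which suggests that the daemon started by cman was using a different token value.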

ITec (itec)
description: updated
ITec (itec) wrote :

Hi!

I got a little help. Just for completeness:
The solution is to set consensus and token in /etc/cluster/cluster.conf:
 <cluster>
    <totem consensus="6000" token="3000"/>
 ...
 </cluster>

... and not to start corosync on its own, as it is started automatically by cman.
Is this explained anywhere?

I think the redhat-cluster-suite package urgently needs some descriptive examples for /etc/cluster/cluster.conf.
Could you include some before Lucid is released?

Best regards
Christian
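
A cluster.conf can be checked against the consensus rule before cman is started. The sketch below uses Python's standard `xml.etree.ElementTree`. The embedded XML is a trimmed copy of the cluster.conf from this report with the `<totem>` element from the comment above added. `totem_timeouts` is a hypothetical helper, not a tool shipped with redhat-cluster-suite, and the 1.2 factor is taken from the log message in the bug description.

```python
import xml.etree.ElementTree as ET

# Trimmed cluster.conf from this report, with the <totem> element added.
CLUSTER_CONF = """<cluster alias="ubu" config_version="4" name="ubu">
        <totem consensus="6000" token="3000"/>
        <cman expected_votes="1" two_node="1"/>
</cluster>
"""


def totem_timeouts(xml_text):
    """Return (token_ms, consensus_ms) read from the <totem> element."""
    root = ET.fromstring(xml_text)
    totem = root.find("totem")
    if totem is None:
        raise ValueError("cluster.conf has no <totem> element")
    token = totem.get("token")
    consensus = totem.get("consensus")
    if token is None or consensus is None:
        raise ValueError("<totem> must set both token and consensus")
    return int(token), int(consensus)


token_ms, consensus_ms = totem_timeouts(CLUSTER_CONF)
# Rule from the log: consensus must be at least 1.2 * token.
rule_holds = consensus_ms * 10 >= token_ms * 12
print(token_ms, consensus_ms, rule_holds)
```

For a real file, the text could be read from /etc/cluster/cluster.conf instead of the embedded string.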

Tais P. Hansen (taisph) wrote :

The totem hint worked for me on Lucid. It was not necessary on Maverick.

Ante Karamatić (ivoks) wrote :

I know it's a bit late, but here's an explanation. Corosync in Lucid is adapted for Pacemaker, and this is true for all later releases. This means that corosync is started on its own and spawns the Pacemaker services. /etc/corosync/corosync.conf is adjusted for Pacemaker too. If you wish to run cman with corosync, you should stop corosync from starting on boot and adapt corosync.conf; there are two examples in /etc/corosync.

I'll mark this as invalid, since the service needs to be configured before it is started. If you feel this isn't correct, please reopen the bug and suggest what should be done. Thanks.

Changed in redhat-cluster (Ubuntu):
status: New → Invalid
Joshua Powers (powersj)
Changed in server-papercuts:
status: New → Invalid