mariadb Connection refused

Bug #1732314 reported by lxdong
This bug affects 5 people
Affects    Status     Importance    Assigned to    Milestone
kolla      Expired    Undecided     Unassigned

Bug Description

171115 9:57:39 [Note] WSREP: save pc into disk
171115 9:57:39 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.4.151:4567
171115 9:57:39 [Note] WSREP: discarding pending addr proto entry 0x7f68650a1500
171115 9:57:39 [Note] WSREP: gcomm: connected
171115 9:57:39 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
171115 9:57:39 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
171115 9:57:39 [Note] WSREP: Opened channel 'openstack'
171115 9:57:39 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
171115 9:57:39 [Note] WSREP: Waiting for SST to complete.
171115 9:57:39 [Note] WSREP: Starting new group from scratch: 5c2b0a40-c9a8-11e7-8068-8a9646dd46c7
171115 9:57:39 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5c2b2411-c9a8-11e7-99b7-76153643690e
171115 9:57:39 [Note] WSREP: STATE EXCHANGE: sent state msg: 5c2b2411-c9a8-11e7-99b7-76153643690e
171115 9:57:39 [Note] WSREP: STATE EXCHANGE: got state msg: 5c2b2411-c9a8-11e7-99b7-76153643690e from 0 (compute01)
171115 9:57:39 [Note] WSREP: Quorum results:
        version = 4,
        component = PRIMARY,
        conf_id = 0,
        members = 1/1 (joined/total),
        act_id = 0,
        last_appl. = -1,
        protocols = 0/7/3 (gcs/repl/appl),
        group UUID = 5c2b0a40-c9a8-11e7-8068-8a9646dd46c7
171115 9:57:39 [Note] WSREP: Flow-control interval: [16, 16]
171115 9:57:39 [Note] WSREP: Restored state OPEN -> JOINED (0)
171115 9:57:39 [Note] WSREP: Member 0.0 (compute01) synced with group.
171115 9:57:39 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
171115 9:57:39 [Note] WSREP: New cluster view: global state: 5c2b0a40-c9a8-11e7-8068-8a9646dd46c7:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
171115 9:57:39 [Note] WSREP: SST complete, seqno: 0
171115 9:57:39 [Note] InnoDB: Using mutexes to ref count buffer pool pages
171115 9:57:39 [Note] InnoDB: The InnoDB memory heap is disabled
171115 9:57:39 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
171115 9:57:39 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
171115 9:57:39 [Note] InnoDB: Compressed tables use zlib 1.2.7
171115 9:57:39 [Note] InnoDB: Using Linux native AIO
171115 9:57:39 [Note] InnoDB: Using CPU crc32 instructions
171115 9:57:39 [Note] InnoDB: Initializing buffer pool, size = 8.0G
171115 9:57:39 [Note] InnoDB: Completed initialization of buffer pool
171115 9:57:40 [Note] InnoDB: Highest supported file format is Barracuda.
171115 9:57:40 [Note] InnoDB: 128 rollback segment(s) are active.
171115 9:57:40 [Note] InnoDB: Waiting for purge to start
171115 9:57:40 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.1 started; log sequence number 1616885
171115 9:57:40 [Note] Plugin 'FEEDBACK' is disabled.
171115 9:57:40 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.0.32-MariaDB-wsrep' socket: '/var/lib/mysql/mysql.sock' port: 0 MariaDB Server, wsrep_25.20.rc3fc46e
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.4.150' (111 "Connection refused")
171115 9:57:40 [ERROR] WSREP: Process completed with error: /usr/local/bin/wsrep-notify.sh --status Joined --uuid 5c2b0a40-c9a8-11e7-8068-8a9646dd46c7 --primary yes --index 0 --members 5c2a55c6-c9a8-11e7-be2e-07b00d412d97/compute01/192.168.4.150:0: 1 (Operation not permitted)
171115 9:57:40 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/local/bin/wsrep-notify.sh --status Joined --uuid 5c2b0a40-c9a8-11e7-8068-8a9646dd46c7 --primary yes --index 0 --members 5c2a55c6-c9a8-11e7-be2e-07b00d412d97/compute01/192.168.4.150:0"
171115 9:57:40 [Note] WSREP: REPL Protocols: 7 (3, 2)
171115 9:57:40 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
171115 9:57:40 [Note] WSREP: Service thread queue flushed.
171115 9:57:40 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(5c2b0a40-c9a8-11e7-8068-8a9646dd46c7:0)

The other MariaDB container has kept restarting ever since.
Kolla version: 5.0.1 dev21

David Aikema (david-aikema) wrote :

Were you ever able to resolve this problem?

We seem to be encountering something similar, using slightly newer releases: kolla 5.0.1 and the current Ubuntu 16 release of the relevant packages (e.g. MariaDB is now 10.0.33 with wsrep_25.21.rc3fc46e).

David Aikema (david-aikema) wrote :

Still haven't been able to sort this out.

Will note that my output when doing a kolla-ansible deploy looks like the following:

RUNNING HANDLER [mariadb : Starting first MariaDB container] ********************************************************************************************************************************************************************************************************************************************************************************************
changed: [idia-ctrl-01-03.maas]

RUNNING HANDLER [mariadb : wait first mariadb container] ************************************************************************************************************************************************************************************************************************************************************************************************
FAILED - RETRYING: wait first mariadb container (10 retries left).
ok: [idia-ctrl-01-03.maas]

RUNNING HANDLER [mariadb : restart slave mariadb] *******************************************************************************************************************************************************************************************************************************************************************************************************
skipping: [idia-ctrl-01-03.maas]
changed: [idia-comp-01-05.maas]
changed: [idia-ctrl-01-04.maas]

RUNNING HANDLER [mariadb : wait for slave mariadb] ******************************************************************************************************************************************************************************************************************************************************************************************************
FAILED - RETRYING: wait for slave mariadb (10 retries left).
FAILED - RETRYING: wait for slave mariadb (10 retries left).
FAILED - RETRYING: wait for slave mariadb (9 retries left).
FAILED - RETRYING: wait for slave mariadb (9 retries left).
FAILED - RETRYING: wait for slave mariadb (8 retries left).
FAILED - RETRYING: wait for slave mariadb (8 retries left).
FAILED - RETRYING: wait for slave mariadb (7 retries left).
FAILED - RETRYING: wait for slave mariadb (7 retries left).
...

I think the point where the logs above show "Connection refused" corresponds to the "FAILED - RETRYING: wait first mariadb container (10 retries left)." message, after which kolla seems to think it has recovered.
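To check by hand whether a node is refusing connections (as in the ERROR 2003 line) or simply not answering, a retry loop similar in spirit to the "wait for mariadb" handlers can be sketched as follows. This is only an illustration, not kolla's actual implementation: the function names `probe_tcp` and `wait_for_port` are hypothetical, and port 3306 in the usage note is the MySQL default, assumed here rather than taken from the logs.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    "refused" means the host answered but nothing listens on the port
    (errno 111, as in the log above); "timeout" means no answer at all.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def wait_for_port(host, port, retries=10, delay=2.0):
    """Probe repeatedly; return True as soon as the port accepts a connection."""
    for attempt in range(1, retries + 1):
        state = probe_tcp(host, port)
        print(f"attempt {attempt}/{retries}: {host}:{port} -> {state}")
        if state == "open":
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_port("192.168.4.150", 3306)` could be run from another controller while the deploy is in progress. A steady run of "refused" would suggest mysqld is not listening on that address yet, while "timeout" would point more towards a network problem.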

Michal Nasiadka (mnasiadka) wrote :

Is it still a problem? Can you try to reproduce in a fresh install of queens/rocky and provide logs?

Changed in kolla:
status: New → Incomplete
David Aikema (david-aikema) wrote :

I'm not sure about the original poster, but as someone who had noted experiencing similar symptoms, the underlying cause in our case seemed related to the MTUs we were using at the time.
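For anyone wanting to rule out the same cause, a quick way to compare MTUs is to list them per interface on each node and look for differences. The snippet below is a minimal sketch that assumes a Linux host exposing `/sys/class/net/<iface>/mtu`; the helper names `read_mtus` and `distinct_mtus` are made up for illustration.

```python
from pathlib import Path


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface_name: mtu} for every interface under sysfs_root."""
    mtus = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mtus
    for iface in sorted(root.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            # Not an interface directory, or the mtu file is unreadable.
            continue
    return mtus


def distinct_mtus(mtus):
    """Return the sorted set of MTU values seen, to spot mismatches quickly."""
    return sorted(set(mtus.values()))


if __name__ == "__main__":
    found = read_mtus()
    for name, mtu in found.items():
        print(f"{name}: {mtu}")
    print("distinct MTU values:", distinct_mtus(found))
```

Running this on every node and comparing the output for the interfaces that carry the database and Galera traffic should show whether the nodes agree. More than one distinct value on a single node is not necessarily a problem (loopback usually differs), so only the relevant interfaces need to match.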

Launchpad Janitor (janitor) wrote :

[Expired for kolla because there has been no activity for 60 days.]

Changed in kolla:
status: Incomplete → Expired