Comment 10 for bug 1624013

Jianghua Wang (wjh-fresh) wrote :

More observations from my side:

The problem is the following command:
mysql -uclustercheck -pOObsCqCTtkLkRHK52n0H0N8O -Nbe "show status like 'wsrep_local_state_comment'" | grep -q -e Synced && sleep 10

It results in this error:
(HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)

If we check the environment just after the deployment failure, we can reproduce the same error by running the mysql command manually. The error does go away eventually, but only after a long wait (I'm not sure of the exact time, but more than 1 hour in one of my tests).

When a new controller node joins, it seems mysql tries to receive a State Snapshot Transfer (SST) from one of the existing controllers. While the transfer is in progress, mysql doesn't provide regular service. That may be what caused the error.

But so far I have no idea why the SST takes longer than expected. Any ideas on what we can do for further troubleshooting?

root@node-25:~# pstree -pas 17159
init,1
  └─mysqld_safe,17159 /usr/bin/mysqld_safe --pid-file=/var/run/resource-agents/mysql-wss/mysql-wss.pid --socket=/var/run/mysqld/mysqld.sock ...
      ├─logger,18113 -t mysqld -p daemon.error
      └─mysqld,18112 --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --open-files-limit=102400...
          ├─sh,18257 -c...
          │   └─wsrep_sst_xtrab,18258 -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role joiner --address 192.168.0.7 --auth root:iM9VWfY1anaR5ZXWconKfZq3 ...
          │       ├─socat,18530 -u TCP-LISTEN:4444,reuseaddr stdio
          │       └─xbstream,18531 -x
          ├─{mysqld},18114
          ├─{mysqld},18252
          ├─{mysqld},18253
          ├─{mysqld},18254
          ├─{mysqld},18255
          └─{mysqld},18256
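
To measure how long the joiner really needs before it reports Synced, the one-shot check above could be wrapped in a polling loop with a timeout. This is only a rough sketch, not something taken from the deployment scripts: the function names wait_for_synced and probe_mysql and the CLUSTERCHECK_PASSWORD environment variable are my own assumptions, and the timeout and interval values are arbitrary.

```shell
#!/bin/sh
# wait_for_synced PROBE TIMEOUT INTERVAL
# Runs PROBE every INTERVAL seconds until its output contains "Synced"
# or roughly TIMEOUT seconds have passed. PROBE is the name of any
# command or shell function that prints the wsrep state.
wait_for_synced() {
    probe=$1
    timeout=$2
    interval=$3
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # Connection errors from the probe are ignored; we just retry.
        if $probe 2>/dev/null | grep -q -e Synced; then
            echo "synced after ${elapsed}s"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "not synced after ${timeout}s" >&2
    return 1
}

# Hypothetical probe: the same query as in the failing command, with the
# password read from an assumed CLUSTERCHECK_PASSWORD variable.
probe_mysql() {
    mysql -uclustercheck -p"$CLUSTERCHECK_PASSWORD" -Nbe \
        "show status like 'wsrep_local_state_comment'"
}
```

For example, "wait_for_synced probe_mysql 7200 10" would poll every 10 seconds for up to two hours and print the elapsed time once the node is Synced. The elapsed time only counts the sleep intervals, so it is an approximation.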