tripleo-ci-centos-8-scenario001-standalone jobs are failing overcloud deploy on master - "ERROR gnocchi.utils: Unable to initialize coordination driver"
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | Critical | Unassigned | | |
Bug Description
tripleo-
2021-10-04 19:55:53,642 [35] ERROR gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
File "/usr/lib/
return func(*args, **kwargs)
File "/usr/lib/
self.
File "/usr/lib/
return self.execute_
File "/usr/lib/
conn = self.connection or pool.get_
File "/usr/lib/
connection.
File "/usr/lib/
self.
File "/usr/lib/
auth_response = self.read_
File "/usr/lib/
response = self._parser.
File "/usr/lib/
raw = self._buffer.
File "/usr/lib/
self.
File "/usr/lib/
raise ConnectionError
redis.exception
Full logs are below:
The failure started on 10/04:
https:/
Changed in tripleo: | |
milestone: | none → xena-3 |
importance: | Undecided → Critical |
status: | New → Triaged |
tags: | added: ci promotion-blocker |
Seems to be a problem with haproxy believing redis is down.
Redis itself is up:
187:M 04 Oct 2021 21:03:21.235 * MASTER MODE enabled (user request from 'id=13 addr=/var/run/redis/redis.sock:0 fd=8 name= age=0 idle=0 flags=U db=0 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof')
tcp 0 0 192.168.24.1:6379 0.0.0.0:* LISTEN 90388/redis-server
tcp 0 0 192.168.24.3:6379 0.0.0.0:* LISTEN 79003/haproxy
But haproxy believes it is down:
Oct 4 21:11:31 standalone haproxy[12]: 192.168.24.3:57834 [04/Oct/2021:21:11:31.576] redis redis/<NOSRV> -1/-1/0 0 SC 7/1/0/0/0 0/0
Oct 4 21:11:31 standalone haproxy[12]: 192.168.24.3:57844 [04/Oct/2021:21:11:31.882] redis redis/<NOSRV> -1/-1/0 0 SC 7/1/0/0/0 0/0
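For anyone reading these entries, here is a minimal Python sketch that splits a line in haproxy's default `option tcplog` layout into named fields. The group names (`tw`, `tc`, `term_state`, and so on) are labels chosen for this example, not haproxy identifiers, and the sample line is pieced together from the haproxy log entries above.

```python
import re

# Field layout of haproxy's default "option tcplog" format:
# client [accept_date] frontend backend/server Tw/Tc/Tt bytes_read
# termination_state actconn/feconn/beconn/srv_conn/retries srv_queue/backend_queue
TCPLOG_RE = re.compile(
    r"(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?-?\d+) "
    r"(?P<bytes_read>\+?\d+) (?P<term_state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)


def parse_tcplog(line):
    """Return the tcplog fields of a syslog line as a dict, or None if no match."""
    match = TCPLOG_RE.search(line)
    return match.groupdict() if match else None


# Sample reassembled from the log entries in this report.
sample = (
    "Oct 4 21:11:31 standalone haproxy[12]: 192.168.24.3:57834 "
    "[04/Oct/2021:21:11:31.576] redis redis/<NOSRV> -1/-1/0 0 SC 7/1/0/0/0 0/0"
)
fields = parse_tcplog(sample)
print(fields["server"], fields["tc"], fields["term_state"])
```

In this sample the server field is `<NOSRV>` and the connect time is `-1`, which is how haproxy reports that no backend server was selected and no connection to one was established.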
The haproxy config is:
frontend redis
  bind 192.168.24.3:6379 transparent
  option tcplog
backend redis_be
  balance first
  tcp-check connect port 6379
  tcp-check send AUTH\ nHyR7JmXmTWzrA0UFH3QQRuVR\r\n
  tcp-check expect string +OK
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  option tcp-check
  server standalone.ctlplane.localdomain 192.168.24.1:6379 check fall 5 inter 2000 on-marked-down shutdown-sessions rise 2
This looks fairly sane. I'll debug it with Damien today.
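One way to rule out the health check itself is to replay the same send/expect sequence by hand against the Redis address. The following Python sketch does that under some assumptions: `redis_tcp_check` is a hypothetical helper written for this report, it approximates `tcp-check expect string` as a substring match on the bytes received so far, and it relies on Redis accepting inline commands terminated by `\r\n`. It is not how haproxy implements the check internally.

```python
import socket


def redis_tcp_check(host, port, password, timeout=2.0):
    """Replay the AUTH / PING / info replication / QUIT steps.

    Returns (True, None) if every expected string was seen,
    otherwise (False, reason) describing the step that failed.
    """
    steps = [
        (f"AUTH {password}\r\n", "+OK"),
        ("PING\r\n", "+PONG"),
        ("info replication\r\n", "role:master"),
        ("QUIT\r\n", "+OK"),
    ]
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            for payload, expected in steps:
                sock.sendall(payload.encode())
                buf = b""
                # Keep reading until the expected string shows up in the reply.
                while expected.encode() not in buf:
                    chunk = sock.recv(4096)
                    if not chunk:
                        return False, (
                            f"expect {expected!r}: connection closed, got {buf!r}"
                        )
                    buf += chunk
    except OSError as exc:
        # Covers refused connections and timeouts while waiting for a reply.
        return False, f"socket error: {exc}"
    return True, None
```

Calling `redis_tcp_check("192.168.24.1", 6379, password)` from the standalone node, with the password taken from the haproxy config, should return `(True, None)` if Redis answers each step as the check expects. A failure reason would indicate which step to look at first.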