Deployment fails with :/usr/bin/docker-current: Error response from daemon: grpc: the connection is unavailable

Bug #1799717 reported by Bogdan Dobrelya
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
tripleo   Invalid   High         Unassigned

Bug Description

The error message is most likely a red herring that points to some other kind of issue, perhaps a system under resource pressure.

Examples:

Single error: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_48_00

Multiple errors: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_15_16

dstat: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/dstat.html.gz

dstat shows that the errors correlate with high CPU wait (>70%) and increased memory use (see 19h 48m 44s onward).
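To check for the same correlation in other runs, something like the following could flag the samples where CPU wait crosses the 70% mark. This is only a rough sketch: it assumes the dstat data has been exported to a simple CSV with hypothetical column names `time`, `wai` (CPU wait, percent) and `used` (memory in use, MB), which is not the layout of the dstat.html.gz linked above. The sample rows are made up for illustration and are not taken from the job logs.

```python
import csv
import io

WAIT_THRESHOLD = 70.0  # percent; the ">70%" CPU wait figure from the report


def high_wait_samples(csv_text, threshold=WAIT_THRESHOLD):
    """Return (time, wait, used_mem) for rows whose CPU wait exceeds threshold.

    Column names "time", "wai" and "used" are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        wait = float(row["wai"])
        if wait > threshold:
            hits.append((row["time"], wait, float(row["used"])))
    return hits


# Made-up sample data, NOT taken from the real job logs.
SAMPLE = """time,wai,used
00:00:01,12.0,5200
00:00:02,74.5,6900
00:00:03,81.0,7100
"""

if __name__ == "__main__":
    for time, wait, used in high_wait_samples(SAMPLE):
        print(f"{time} wait={wait:.1f}% used={used:.0f}MB")
```

Lining up the flagged timestamps with the timestamps of the grpc errors in the journal would show whether the two actually coincide.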

See also the elastic-recheck stats for that error pattern:

total hits: 7
build_branch
  85% master
  14% stable/rocky
build_change
  14% 591540 610728 582301
  14% 611447 608354
  14% 582735
  14% 610087
  14% 610491
build_name
  14% tripleo-ci-centos-7-containers-multinode tripleo-ci-centos-7-scenario001-multinode-oooq-container
  14% tripleo-ci-centos-7-undercloud-containers tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates tripleo-ci-centos-7-scenario003-multinode-oooq-container
  14% tripleo-ci-centos-7-containers-multinode
  14% tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades
  14% tripleo-ci-centos-7-scenario004-multinode-oooq-container
build_node
  100% centos-7
build_queue
  85% check
  14% gate
build_status
  71% FAILURE
  14% FAILURE SUCCESS
  14% SUCCESS FAILURE
build_zuul_url
  100% N/A
filename
  71% logs/undercloud/var/log/extra/logstash.txt
  28% logs/undercloud/var/log/extra/errors.txt
log_url
  14% http://logs.openstack.org/40/591540/69/check/tripleo-ci-centos-7-undercloud-containers/7c3676b/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/28/610728/5/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a299956/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/47/611447/2/check/tripleo-ci-centos-7-containers-multinode/3ede27e/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/54/608354/3/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/9690552/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/15/610515/1/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/02cd3e4/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/35/582735/10/check/tripleo-ci-centos-7-containers-multinode/72c2e19/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/87/610087/3/check/tripleo-ci-centos-7-scenario007-multinode-oooq-container/6002e49/logs/undercloud/var/log/extra/logstash.txt
node_provider
  57% inap-mtl01
  14% inap-mtl01 rax-dfw
  14% ovh-gra1 rax-iad inap-mtl01
  14% rax-iad
port
  14% 35486
  14% 38788
  14% 42428
  14% 42552
  14% 45124
project
  28% openstack/tripleo-heat-templates
  28% openstack/tripleo-quickstart-extras
  14% openstack/tripleo-heat-templates openstack/congress
  14% openstack/tripleo-quickstart openstack/tripleo-quickstart-extras openstack/tripleo-heat-templates
  14% openstack/tripleo-common
severity
  71% INFO
  28% ERROR
tags
  71% logstash.txt console postci multiline _grokparsefailure
  28% errors.txt console errors multiline _grokparsefailure
voting
  57% 1
  28% 0
  14% 1 0
zuul_executor
  28% ze09.openstack.org
  14% ze07.openstack.org ze02.openstack.org ze01.openstack.org
  14% ze10.openstack.org ze05.openstack.org
  14% ze03.openstack.org
  14% ze07.openstack.org

So jobs do not always fail when this error appears. The failures are more likely related to CPU wait (I/O) and memory pressure instead.
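A note on reading the stats above: the percentages look like shares of the 7 total hits truncated to whole numbers (1/7 shows as 14%, 5/7 as 71%, 6/7 as 85%), which would explain why the facets do not add up to 100%. A minimal sketch of that arithmetic, assuming truncation rather than rounding; the function name is hypothetical and the per-value counts are inferred from the percentages, not read from elastic-recheck itself:

```python
from collections import Counter


def facet_percentages(values):
    """Map each distinct value to its share of all hits, as a truncated integer percent."""
    counts = Counter(values)
    total = len(values)
    return {value: (count * 100) // total for value, count in counts.items()}


if __name__ == "__main__":
    # build_branch: 6 hits on master and 1 on stable/rocky (counts inferred
    # from the 85% / 14% figures over 7 total hits).
    branches = ["master"] * 6 + ["stable/rocky"]
    print(facet_percentages(branches))
```

Under that reading, the build_status facet means 5 of the 7 hits came from jobs that ended in FAILURE, while the remaining 2 hits match groups where the status was mixed.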

Tags: ci
Changed in tripleo:
importance: Undecided → High
milestone: none → stein-1
status: New → Triaged
tags: added: ci
description: updated
description: updated
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin) wrote:

Follow up w/ Bogdan

Changed in tripleo:
status: Triaged → Invalid