Deployment fails with: /usr/bin/docker-current: Error response from daemon: grpc: the connection is unavailable

Bug #1799717 reported by Bogdan Dobrelya on 2018-10-24
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

The error message is most likely a red herring that points to some other kind of problem, such as the system being under resource pressure.

Examples:

Single error: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_48_00

Multiple errors: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/journal.txt.gz#_Oct_23_19_15_16

dstat: http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/dstat.html.gz

dstat shows that the errors correlate with high CPU wait (>70%) and increased memory use (see 19h 48m 44s onward).
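To check this kind of correlation on other runs, the dstat samples can be filtered for high CPU wait. The following is only a rough sketch: it assumes the samples have already been extracted into (timestamp, CPU wait percent) pairs, and the function name, the data shape, and the demo values are all hypothetical, not taken from the linked dstat output. The 70% threshold mirrors the figure mentioned above.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample shape: (timestamp string, CPU wait in percent).
Sample = Tuple[str, float]


def high_wait_samples(samples: Iterable[Sample], threshold: float = 70.0) -> List[Sample]:
    """Return the samples whose CPU wait is strictly above the threshold."""
    return [(ts, wait) for ts, wait in samples if wait > threshold]


if __name__ == "__main__":
    # Made-up values for illustration only.
    demo = [("19:48:43", 35.0), ("19:48:44", 72.0), ("19:48:45", 88.0)]
    for ts, wait in high_wait_samples(demo):
        print(f"{ts} cpu wait {wait:.0f}%")
```

Timestamps returned by this filter could then be compared by hand against the times of the grpc errors in the journal.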

See also the elastic-recheck stats for that error pattern:

total hits: 7
build_branch
  85% master
  14% stable/rocky
build_change
  14% 591540 610728 582301
  14% 611447 608354
  14% 582735
  14% 610087
  14% 610491
build_name
  14% tripleo-ci-centos-7-containers-multinode tripleo-ci-centos-7-scenario001-multinode-oooq-container
  14% tripleo-ci-centos-7-undercloud-containers tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates tripleo-ci-centos-7-scenario003-multinode-oooq-container
  14% tripleo-ci-centos-7-containers-multinode
  14% tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades
  14% tripleo-ci-centos-7-scenario004-multinode-oooq-container
build_node
  100% centos-7
build_queue
  85% check
  14% gate
build_status
  71% FAILURE
  14% FAILURE SUCCESS
  14% SUCCESS FAILURE
build_zuul_url
  100% N/A
filename
  71% logs/undercloud/var/log/extra/logstash.txt
  28% logs/undercloud/var/log/extra/errors.txt
log_url
  14% http://logs.openstack.org/40/591540/69/check/tripleo-ci-centos-7-undercloud-containers/7c3676b/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/28/610728/5/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a299956/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/01/582301/28/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/7009de4/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/47/611447/2/check/tripleo-ci-centos-7-containers-multinode/3ede27e/logs/undercloud/var/log/extra/logstash.txt http://logs.openstack.org/54/608354/3/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/9690552/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/15/610515/1/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/02cd3e4/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/35/582735/10/check/tripleo-ci-centos-7-containers-multinode/72c2e19/logs/undercloud/var/log/extra/logstash.txt
  14% http://logs.openstack.org/87/610087/3/check/tripleo-ci-centos-7-scenario007-multinode-oooq-container/6002e49/logs/undercloud/var/log/extra/logstash.txt
node_provider
  57% inap-mtl01
  14% inap-mtl01 rax-dfw
  14% ovh-gra1 rax-iad inap-mtl01
  14% rax-iad
port
  14% 35486
  14% 38788
  14% 42428
  14% 42552
  14% 45124
project
  28% openstack/tripleo-heat-templates
  28% openstack/tripleo-quickstart-extras
  14% openstack/tripleo-heat-templates openstack/congress
  14% openstack/tripleo-quickstart openstack/tripleo-quickstart-extras openstack/tripleo-heat-templates
  14% openstack/tripleo-common
severity
  71% INFO
  28% ERROR
tags
  71% logstash.txt console postci multiline _grokparsefailure
  28% errors.txt console errors multiline _grokparsefailure
voting
  57% 1
  28% 0
  14% 1 0
zuul_executor
  28% ze09.openstack.org
  14% ze07.openstack.org ze02.openstack.org ze01.openstack.org
  14% ze10.openstack.org ze05.openstack.org
  14% ze03.openstack.org
  14% ze07.openstack.org

So jobs do not always fail when this error appears. The failures are more likely related to CPU wait (I/O) and memory pressure instead.
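For anyone wanting to see when the error shows up in a particular run, a locally downloaded journal.txt.gz can be scanned for the message from the bug title. This is a minimal sketch, not the elastic-recheck query itself; the function names and the local file path are assumptions made for illustration.

```python
import gzip
from typing import Iterable, List, Tuple

# Message text taken from the bug title.
ERROR_TEXT = "grpc: the connection is unavailable"


def find_error_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for lines containing the error text."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_TEXT in line
    ]


def scan_journal(path: str) -> List[Tuple[int, str]]:
    """Scan a gzip-compressed journal file (hypothetical local copy)."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return find_error_lines(handle)
```

A single hit would match the "single error" example above, several hits close together the "multiple errors" one; either way the timestamps are what matter for comparing against dstat.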

Tags: ci
Changed in tripleo:
importance: Undecided → High
milestone: none → stein-1
status: New → Triaged
tags: added: ci
description: updated
description: updated
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin) wrote :

Follow up w/ Bogdan

Changed in tripleo:
status: Triaged → Invalid