iscsi container failed with: 'text file busy'

Bug #1760787 reported by Michele Baldessari
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Medium
Michele Baldessari

Bug Description

Seen on https://logs.rdoproject.org/38/558038/2/openstack-check/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/Z8c4937957a5f4bc5987573b6ee8def15/overcloud-controller-1/var/log/extra/docker/docker_allinfo.log.txt.gz

We can then see that the iscsid container failed with:
Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "Digest: sha256:1570019a54948c7062ffaef3a27abc5daef19d425f6443772fee36ef46a383dd",
Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "Status: Downloaded newer image for 192.168.24.1:8787/master/centos-binary-iscsid:bc4a4d35820cfa20be20587efaf07d6431716fa5_1f757656",
Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "2018-03-31 09:17:22,715 DEBUG: 20118 -- NET_HOST enabled",
Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "2018-03-31 09:17:22,715 DEBUG: 20118 -- Running docker command: /usr/bin/docker run --user root --name docker-puppet-iscsid --env PUPPET_TAGS=file,file_line,concat,augeas,cron,iscsid_config --env NAME=iscsid --env HOSTNAME=overcloud-controller-1 --env NO_ARCHIVE= --env STEP=6 --volume /etc/localtime:/etc/localtime:ro --volume /tmp/tmp_C1ykQ:/etc/config.pp:ro,z --volume /etc/puppet/:/tmp/puppet-etc/:ro,z --volume /usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro,z --volume /var/lib/config-data:/var/lib/config-data/:z --volume tripleo_logs:/var/log/tripleo/ --volume /dev/log:/dev/log --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z --volume /etc/iscsi:/etc/iscsi --entrypoint /var/lib/docker-puppet/docker-puppet.sh --net host --volume /etc/hosts:/etc/hosts:ro 192.168.24.1:8787/master/centos-binary-iscsid:bc4a4d35820cfa20be20587efaf07d6431716fa5_1f757656",

Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "2018-03-31 09:17:23,019 ERROR: 20118 -- Failed running docker-puppet.py for iscsid", Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "2018-03-31 09:17:23,019 ERROR: 20118 -- standard_init_linux.go:178: exec user process caused \"text file busy\"",
Mar 31 09:18:47 overcloud-controller-1 os-collect-config[2982]: "2018-03-31 09:17:23,019 INFO: 20118 -- Finished processing puppet configs for iscsid",

Revision history for this message
Michele Baldessari (michele) wrote :

I have seen this only once so far, hence just the medium severity.

Changed in tripleo:
milestone: rocky-1 → rocky-2
Revision history for this message
Brent Eagles (beagles) wrote :

Hit this today with a pretty fresh setup, so it is still around.

Revision history for this message
Michele Baldessari (michele) wrote :

Hit this on a queens setup today on a different container though:
[root@overcloud-controller-2 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a401108cd827 192.168.24.1:8787/tripleoqueens/centos-binary-glance-api:current-tripleo-rdo "/var/lib/docker-p..." 19 minutes ago Exited (1) 19 minutes ago docker-puppet-glance_api
[root@overcloud-controller-2 ~]# docker logs docker-puppet-glance_api
standard_init_linux.go:178: exec user process caused "text file busy"
[root@overcloud-controller-2 ~]#

Revision history for this message
Michele Baldessari (michele) wrote :

So the above error is caused by the following code in docker docker-94f42401bf8b413eac9dbb0e4065fa5bcb640696/runc-e9c345b3f906d5dc5e8100b05ce37073a811c74a/libcontainer/standard_init_linux.go
:
if err := syscall.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
   return newSystemErrorWithCause(err, "exec user process")
}

Now text file busy is (via vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go) error 26 which
corresponds to: asm-generic/errno-base.h:#define ETXTBSY 26 /* Text file busy */
Which translates to:
The file referred to by pathname is open for writing by another process.

Now in our case the pathname being executed was:
[root@overcloud-controller-2 ~]# docker inspect --format='{{.Path}}' docker-puppet-glance_api
/var/lib/docker-puppet/docker-puppet.sh

So that would imply that when the docker-puppet-glance_api failed something else had opened /var/lib/docker-puppet/docker-puppet.sh for writing.
It does seem that we bind mount that file inside a container without the 'ro' attribute:
{
    "Type": "bind",
    "Source": "/var/lib/docker-puppet/docker-puppet.sh",
    "Destination": "/var/lib/docker-puppet/docker-puppet.sh",
    "Mode": "z",
    "RW": true,
    "Propagation": "rprivate"
},

So two questions immediately spring to mind:
A) Why would we bind mount this file in RW ?
B) Who is opening for write docker-puppet.sh on the host? (assuming that it was busy because it was opened for writing from the host and not from the container. The latter seems rather unlikely)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/571438

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: Triaged → In Progress
Revision history for this message
Michele Baldessari (michele) wrote :

The answer to A) is that we create this file concurrently in docker-puppet.py and that actually explains this race.

tags: added: pike-backport-potential queens-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/571438
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1037786b6c1ee9bb135d94c42f45194b3d31835e
Submitter: Zuul
Branch: master

commit 1037786b6c1ee9bb135d94c42f45194b3d31835e
Author: Michele Baldessari <email address hidden>
Date: Thu May 31 12:30:20 2018 +0200

    Create docker-puppet.sh only once to avoid ETXTBSY errors

    We currently create /var/lib/docker-puppet/docker-puppet.sh
    inside the mp_puppet_config() function which then gets
    invoked in parallel via the following:

      p = multiprocessing.Pool(process_count)
      returncodes = list(p.map(mp_puppet_config, process_map))

    This is problematic because we have the following potential race:
    1) Process A opens /var/lib/docker-puppet/docker-puppet.sh for writing
    2) Process B runs docker run and has the following bind mount:
       /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z
    3) Process B will fail because an exec of a file being written to
       will return ETXTBSY

    The deployment can fail due to the above with the following error:
    [root@overcloud-controller-2 ~]# docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    a401108cd827 192.168.24.1:8787/tripleoqueens/centos-binary-glance-api:current-tripleo-rdo "/var/lib/docker-p..." 19 minutes ago Exited (1) 19 minutes ago docker-puppet-glance_api
    [root@overcloud-controller-2 ~]# docker logs docker-puppet-glance_api
    standard_init_linux.go:178: exec user process caused "text file busy"

    Since /var/lib/docker-puppet/docker-puppet.sh never changes
    there is really no need to create it multiple times. Let's just
    create it once before spawning the multiple docker run commands
    so we avoid any ETXTBSY errors.

    Ran 20 successful deployments in sequence with this change applied.

    Change-Id: I16b19488ce9f1411273459576db76d16b318dacb
    Closes-Bug: #1760787

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/572125

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/572125
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=96ae89f07c4bc42672bc444f5707a5ba7f903e43
Submitter: Zuul
Branch: stable/queens

commit 96ae89f07c4bc42672bc444f5707a5ba7f903e43
Author: Michele Baldessari <email address hidden>
Date: Thu May 31 12:30:20 2018 +0200

    Create docker-puppet.sh only once to avoid ETXTBSY errors

    We currently create /var/lib/docker-puppet/docker-puppet.sh
    inside the mp_puppet_config() function which then gets
    invoked in parallel via the following:

      p = multiprocessing.Pool(process_count)
      returncodes = list(p.map(mp_puppet_config, process_map))

    This is problematic because we have the following potential race:
    1) Process A opens /var/lib/docker-puppet/docker-puppet.sh for writing
    2) Process B runs docker run and has the following bind mount:
       /var/lib/docker-puppet/docker-puppet.sh:/var/lib/docker-puppet/docker-puppet.sh:z
    3) Process B will fail because an exec of a file being written to
       will return ETXTBSY

    The deployment can fail due to the above with the following error:
    [root@overcloud-controller-2 ~]# docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    a401108cd827 192.168.24.1:8787/tripleoqueens/centos-binary-glance-api:current-tripleo-rdo "/var/lib/docker-p..." 19 minutes ago Exited (1) 19 minutes ago docker-puppet-glance_api
    [root@overcloud-controller-2 ~]# docker logs docker-puppet-glance_api
    standard_init_linux.go:178: exec user process caused "text file busy"

    Since /var/lib/docker-puppet/docker-puppet.sh never changes
    there is really no need to create it multiple times. Let's just
    create it once before spawning the multiple docker run commands
    so we avoid any ETXTBSY errors.

    Ran 20 successful deployments in sequence with this change applied.

    Change-Id: I16b19488ce9f1411273459576db76d16b318dacb
    Closes-Bug: #1760787
    (cherry picked from commit 1037786b6c1ee9bb135d94c42f45194b3d31835e)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.4

This issue was fixed in the openstack/tripleo-heat-templates 8.0.4 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.