spurious 'no space left on device' errors during venv build

Bug #1753790 reported by Markos Chandras
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: High
Assigned to: Kevin Carter

Bug Description

We are seeing the following occasional failure in the integrated repo builds:

Exception:
Traceback (most recent call last):
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/pip/wheel.py", line 323, in clobber
    shutil.copyfile(srcfile, destfile)
  File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 28] No space left on device: '/tmp/openstack-venv-builder/venvs/nova/lib/python2.7/site-packages/networkx/algorithms/centrality/betweenness_subset.py'
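
Errno 28 (ENOSPC) in the traceback above can be raised even when a plain byte count suggests there is room, for example when inodes or file system metadata are exhausted. A minimal Python 3 sketch for checking both figures on the file system that holds a given path follows. The helper names `space_report` and `is_enospc` are hypothetical and are not part of pip or OpenStack-Ansible.

```python
import errno
import os


def space_report(path):
    """Return free-space figures for the file system holding `path`."""
    st = os.statvfs(path)
    return {
        "path": path,
        # Bytes available to unprivileged processes.
        "bytes_free": st.f_bavail * st.f_frsize,
        "bytes_total": st.f_blocks * st.f_frsize,
        # Inode counts; some file systems report 0 for both values.
        "inodes_free": st.f_favail,
        "inodes_total": st.f_files,
    }


def is_enospc(exc):
    """True when an exception carries the ENOSPC errno (28 on Linux)."""
    return getattr(exc, "errno", None) == errno.ENOSPC


# Example: inspect the current directory's file system.
print(space_report("."))
```

Calling `space_report` on the venv build directory at the moment of failure would show whether bytes or inodes ran out, or neither, which would point at the file system layer rather than real exhaustion.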

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Critical
Jesse Pretorius (jesse-pretorius) wrote :

Looking through the last week's test results in logstash, we discovered a few things:

1. This is only happening in Rackspace regions where test nodes have two disks and the data for all containers is on the second disk. The second disk is used because the primary is too small.
2. It does not happen consistently. We saw 3 failures but 17 successes on RAX test nodes with the same disk layout.
3. Both Kevin Carter and I have tried to replicate this using the same hardware profile in RAX Public Cloud, without success.
4. This is only happening on master and queens, where the machinectl container back-end is used.
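
The counts in point 2 give a rough sense of why manual replication is hard. The Python 3 sketch below computes the observed failure rate and the chance of seeing no failure at all in a given number of attempts. It assumes each run fails independently at the observed rate, which is a simplification and not something the logs establish.

```python
def failure_rate(failures, successes):
    """Observed fraction of runs that failed."""
    return failures / (failures + successes)


def chance_of_no_failure(rate, attempts):
    """Probability that `attempts` independent runs all succeed."""
    return (1.0 - rate) ** attempts


rate = failure_rate(3, 17)  # 3 failures out of 20 runs
for attempts in (1, 5, 10, 20):
    print(attempts, round(chance_of_no_failure(rate, attempts), 3))
```

Under that assumption, a handful of clean manual runs is unsurprising and does not rule the problem out.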

Some extra diagnostic log collection has been added to https://review.openstack.org/552047 in the hope of gaining more insight.

Changed in openstack-ansible:
importance: Critical → High
Changed in openstack-ansible:
assignee: nobody → Kevin Carter (kevin-carter)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_container_create (master)

Fix proposed to branch: master
Review: https://review.openstack.org/568430

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (master)

Reviewed: https://review.openstack.org/568427
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_hosts/commit/?id=2971b787ac9791f55666cac17cf5e260dda05185
Submitter: Zuul
Branch: master

commit 2971b787ac9791f55666cac17cf5e260dda05185
Author: Kevin Carter <email address hidden>
Date: Mon May 14 21:53:21 2018 -0500

    Enable quota system and set qgroups

    This change implements the machinectl quota system and qgroups when
    they're enabled and available. This change is being implemented to
    resolve an issue where machinectl based containers using a loopback file
    system spam DMESG with the following:

    * BTRFS error (device loop0): could not find root $INT

    While various upstream sources say this error is benign[0], it raises
    an inconsistency flag within the host system and is speculatively the
    cause of our inconsistent read-only/Full-FS issues we've seen in the
    integrated gate. Once the qgroups are properly set up, the system will
    remove the inconsistency flag and the message spam will stop.

    * BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)

    To resolve this issue the quota system is being enabled by default
    and unlimited qgroups are being set up to ensure we're not running
    into file system limitations. This change essentially acknowledges
    the built-in quota system and provides for the ability to set /
    define specific quota (qgroup) options as necessary. While many
    deployers may never use these options or this tooling, the role will
    now properly set everything up should it ever be needed.

    [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1651435
    Closes-Bug: #1753790
    Change-Id: I34a41ac8a9fe4419254284c83f4600efee274c04
    Signed-off-by: Kevin Carter <email address hidden>
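
For readers unfamiliar with the btrfs quota tooling the commit message refers to, here is a minimal Python 3 sketch of the equivalent manual steps using btrfs-progs commands. It is not the role's actual task list. The pool path `/var/lib/machines` is an assumption (the usual machinectl location), and the helper names are hypothetical. The sketch only prints the commands unless `dry_run=False` is passed, and running them requires root.

```python
import subprocess

# Assumption: the machinectl btrfs pool is mounted at the usual systemd path.
MACHINES_POOL = "/var/lib/machines"


def quota_commands(pool=MACHINES_POOL):
    """Build the btrfs-progs commands that enable quotas without a size limit."""
    return [
        ["btrfs", "quota", "enable", pool],
        # -w waits for the quota rescan to finish before returning.
        ["btrfs", "quota", "rescan", "-w", pool],
        # "none" removes any size limit from the qgroup of the subvolume at `pool`.
        ["btrfs", "qgroup", "limit", "none", pool],
    ]


def apply_quota(pool=MACHINES_POOL, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in quota_commands(pool):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


apply_quota()  # dry run: prints the three commands, changes nothing
```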

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_container_create (master)

Reviewed: https://review.openstack.org/568430
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/commit/?id=25478e9b4e8eaf7db05bb87356e6e317638fdd66
Submitter: Zuul
Branch: master

commit 25478e9b4e8eaf7db05bb87356e6e317638fdd66
Author: Kevin Carter <email address hidden>
Date: Mon May 14 22:29:23 2018 -0500

    Enable quota system and set qgroups

    This change implements the machinectl quota system and qgroups when
    they're enabled and available. This change is being implemented to
    resolve an issue where machinectl based containers using a loopback file
    system spam DMESG with the following:

    * BTRFS error (device loop0): could not find root $INT

    While various upstream sources say this error is benign[0], it raises
    an inconsistency flag within the host system and is speculatively the
    cause of our inconsistent read-only/Full-FS issues we've seen in the
    integrated gate. Once the qgroups are properly set up, the system will
    remove the inconsistency flag and the message spam will stop.

    * BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)

    To resolve this issue the quota system is being enabled by default
    within the "lxc_host" role. This change essentially acknowledges
    the built-in quota system and when enabled provides for the ability
    to set / define specific quota (qgroup) options as necessary. While
    many deployers may never use these options or this tooling, the role
    will now properly set everything up should it ever be needed.

    [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1651435
    Closes-Bug: #1753790
    Depends-On: I34a41ac8a9fe4419254284c83f4600efee274c04
    Change-Id: Ica79472568799098ebf83c6cefc585f117975f37
    Signed-off-by: Kevin Carter <email address hidden>
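
The two kernel messages quoted in the commit message can be used to check whether a host is still in the inconsistent state. The Python 3 sketch below scans dmesg-style lines and reports, per device, whether the last relevant message was the error or the "flag cleared" notice. The function name and the sample lines (timestamps, root number) are hypothetical; only the message texts come from the commit message.

```python
import re

# "$INT" in the commit message stands for an integer root id.
ERROR_RE = re.compile(r"BTRFS error \(device ([^)\s]+)\): could not find root \d+")
CLEARED_RE = re.compile(
    r"BTRFS info \(device ([^)\s]+)\): "
    r"qgroup scan completed \(inconsistency flag cleared\)"
)


def qgroup_state(lines):
    """Map each device to 'inconsistent' or 'cleared' from its last matching line."""
    state = {}
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            state[match.group(1)] = "inconsistent"
            continue
        match = CLEARED_RE.search(line)
        if match:
            state[match.group(1)] = "cleared"
    return state


# Hypothetical sample input for illustration only.
sample = [
    "[  10.0] BTRFS error (device loop0): could not find root 8",
    "[  12.0] BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)",
]
print(qgroup_state(sample))
```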

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/568867

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_container_create (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/568868

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/569226

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/569226
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=986e35ab4e618509f75bc2f8d4a48941c07a9192
Submitter: Zuul
Branch: master

commit 986e35ab4e618509f75bc2f8d4a48941c07a9192
Author: Kevin Carter <email address hidden>
Date: Thu May 17 13:47:53 2018 -0500

    Add mount options whenever formatting disks

    This change provides mount options for all of the file system types
    we use. These mount options will ensure that the underlying file systems
    react with the best possible performance when building a test cloud. The
    main motivation behind this patch is to help resolve bug 1753790 which
    is causing sporadic failures when gating on RAX test instances that have
    a secondary data disk. We were providing no options when mounting a
    secondary disk, which results in using the system defaults, which may not
    be optimal under general workloads. The options have been added to
    the bootstrap role and will be consistently applied whenever a given FS
    is used.

    > Part of this change is to modify the FS type for /var/lib/nova. We
      were using EXT4 and the recommended FS for VM workloads is XFS.

    > The mount options have largely been derived from the following posts,
      https://www.phoronix.com/scan.php?page=article&item=linux414-fs-compare&num=1
      https://www.phoronix.com/scan.php?page=article&item=linux-415-fs&num=1

    Change-Id: I75e00af687d46551e19178604b60665342cf2cae
    Closes-Bug: #1753790
    Signed-off-by: Kevin Carter <email address hidden>
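
The idea of applying one consistent set of mount options per file system type can be sketched as a simple lookup, shown here in Python 3. The option strings are placeholders chosen for illustration and are NOT the values from the commit. The device name and the helper `fstab_entry` are likewise hypothetical.

```python
# Hypothetical option sets: placeholders, not the options chosen in the commit.
MOUNT_OPTIONS = {
    "ext4": "noatime",
    "xfs": "noatime",
    "btrfs": "noatime",
}


def fstab_entry(device, mount_point, fs_type, options=MOUNT_OPTIONS):
    """Format one /etc/fstab line, using 'defaults' for unknown file system types."""
    opts = options.get(fs_type, "defaults")
    return "{} {} {} {} 0 0".format(device, mount_point, fs_type, opts)


# Hypothetical secondary data disk mounted for nova, formatted as XFS.
print(fstab_entry("/dev/xvde1", "/var/lib/nova", "xfs"))
```

Keeping the options in one table means every code path that formats a disk picks up the same settings, which matches the stated goal of applying them consistently whenever a given FS is used.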

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (stable/queens)

Reviewed: https://review.openstack.org/568867
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_hosts/commit/?id=578cbd1fe7fc71354d2d88a1f1e9cd1294a112b0
Submitter: Zuul
Branch: stable/queens

commit 578cbd1fe7fc71354d2d88a1f1e9cd1294a112b0
Author: Kevin Carter <email address hidden>
Date: Mon May 14 21:53:21 2018 -0500

    Enable quota system and set qgroups

    This change implements the machinectl quota system and qgroups when
    they're enabled and available. This change is being implemented to
    resolve an issue where machinectl based containers using a loopback file
    system spam DMESG with the following:

    * BTRFS error (device loop0): could not find root $INT

    While various upstream sources say this error is benign[0], it raises
    an inconsistency flag within the host system and is speculatively the
    cause of our inconsistent read-only/Full-FS issues we've seen in the
    integrated gate. Once the qgroups are properly set up, the system will
    remove the inconsistency flag and the message spam will stop.

    * BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)

    To resolve this issue the quota system is being enabled by default
    and unlimited qgroups are being set up to ensure we're not running
    into file system limitations. This change essentially acknowledges
    the built-in quota system and provides for the ability to set /
    define specific quota (qgroup) options as necessary. While many
    deployers may never use these options or this tooling, the role will
    now properly set everything up should it ever be needed.

    [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1651435
    Closes-Bug: #1753790
    Change-Id: I34a41ac8a9fe4419254284c83f4600efee274c04
    Signed-off-by: Kevin Carter <email address hidden>
    (cherry picked from commit 2971b787ac9791f55666cac17cf5e260dda05185)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_container_create (stable/queens)

Reviewed: https://review.openstack.org/568868
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/commit/?id=3ff8b4c604be5794827941381a399ea3d0110bf2
Submitter: Zuul
Branch: stable/queens

commit 3ff8b4c604be5794827941381a399ea3d0110bf2
Author: Kevin Carter <email address hidden>
Date: Mon May 14 22:29:23 2018 -0500

    Enable quota system and set qgroups

    This change implements the machinectl quota system and qgroups when
    they're enabled and available. This change is being implemented to
    resolve an issue where machinectl based containers using a loopback file
    system spam DMESG with the following:

    * BTRFS error (device loop0): could not find root $INT

    While various upstream sources say this error is benign[0], it raises
    an inconsistency flag within the host system and is speculatively the
    cause of our inconsistent read-only/Full-FS issues we've seen in the
    integrated gate. Once the qgroups are properly set up, the system will
    remove the inconsistency flag and the message spam will stop.

    * BTRFS info (device loop0): qgroup scan completed (inconsistency flag cleared)

    To resolve this issue the quota system is being enabled by default
    within the "lxc_host" role. This change essentially acknowledges
    the built-in quota system and when enabled provides for the ability
    to set / define specific quota (qgroup) options as necessary. While
    many deployers may never use these options or this tooling, the role
    will now properly set everything up should it ever be needed.

    [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1651435

    Backport note:
    This review includes https://review.openstack.org/569409 as it
    was crucial to the proper functioning of this patch.

    Closes-Bug: #1753790
    Depends-On: I34a41ac8a9fe4419254284c83f4600efee274c04
    Change-Id: Ica79472568799098ebf83c6cefc585f117975f37
    Signed-off-by: Kevin Carter <email address hidden>
    (cherry picked from commit 25478e9b4e8eaf7db05bb87356e6e317638fdd66)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 18.0.0.0b2

This issue was fixed in the openstack/openstack-ansible 18.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create queens-eol

This issue was fixed in the openstack/openstack-ansible-lxc_container_create queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts queens-eol

This issue was fixed in the openstack/openstack-ansible-lxc_hosts queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create rocky-eol

This issue was fixed in the openstack/openstack-ansible-lxc_container_create rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts rocky-eol

This issue was fixed in the openstack/openstack-ansible-lxc_hosts rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create stein-eol

This issue was fixed in the openstack/openstack-ansible-lxc_container_create stein-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts stein-eol

This issue was fixed in the openstack/openstack-ansible-lxc_hosts stein-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create train-eol

This issue was fixed in the openstack/openstack-ansible-lxc_container_create train-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts train-eol

This issue was fixed in the openstack/openstack-ansible-lxc_hosts train-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create ussuri-eol

This issue was fixed in the openstack/openstack-ansible-lxc_container_create ussuri-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts ussuri-eol

This issue was fixed in the openstack/openstack-ansible-lxc_hosts ussuri-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create yoga-eom

This issue was fixed in the openstack/openstack-ansible-lxc_container_create yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts yoga-eom

This issue was fixed in the openstack/openstack-ansible-lxc_hosts yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create victoria-eom

This issue was fixed in the openstack/openstack-ansible-lxc_container_create victoria-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts victoria-eom

This issue was fixed in the openstack/openstack-ansible-lxc_hosts victoria-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create wallaby-eom

This issue was fixed in the openstack/openstack-ansible-lxc_container_create wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts wallaby-eom

This issue was fixed in the openstack/openstack-ansible-lxc_hosts wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create xena-eom

This issue was fixed in the openstack/openstack-ansible-lxc_container_create xena-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts xena-eom

This issue was fixed in the openstack/openstack-ansible-lxc_hosts xena-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_container_create zed-eom

This issue was fixed in the openstack/openstack-ansible-lxc_container_create zed-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-lxc_hosts zed-eom

This issue was fixed in the openstack/openstack-ansible-lxc_hosts zed-eom release.
