[CI] CephFS NFS job ends up causing several retries

Bug #2009083 reported by Goutham Pacha Ravi
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Fix Released
Importance: High
Assigned to: Goutham Pacha Ravi

Bug Description

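Lately, the CephFS NFS job has been retried several times on the CI system. Zuul retries jobs that it believes failed because of connection issues. In this case, the job appears to be running out of disk space. When Ansible cannot create its remote temporary directory, it reports the controller as unreachable ("unreachable": true below), so the disk-space failure is treated as a connection problem.

Excerpts from the console log (attached to this report):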

2023-03-02 19:23:13.551510 | TASK [fetch-subunit-output : Find stestr or testr executable]
2023-03-02 19:23:13.663236 | controller | ERROR
2023-03-02 19:23:13.663719 | controller | {
2023-03-02 19:23:13.663841 | controller | "msg": "mkdir: cannot create directory \u2018/home/zuul/.ansible/tmp/ansible-tmp-1677784993.6142595-6-181517982736498\u2019: No space left on device\n",
2023-03-02 19:23:13.663933 | controller | "unreachable": true
2023-03-02 19:23:13.664103 | controller | }

2023-03-02 19:23:37.133634 | TASK [remove-build-sshkey : Remove the build SSH key from all nodes]
2023-03-02 19:23:37.546978 | controller | MODULE FAILURE:
2023-03-02 19:23:37.547200 | controller | Traceback (most recent call last):
2023-03-02 19:23:37.547265 | controller | File "<stdin>", line 107, in <module>
2023-03-02 19:23:37.547323 | controller | File "<stdin>", line 92, in _ansiballz_main
2023-03-02 19:23:37.547377 | controller | File "/usr/lib/python3.8/tempfile.py", line 486, in mkdtemp
2023-03-02 19:23:37.547432 | controller | prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
2023-03-02 19:23:37.547490 | controller | File "/usr/lib/python3.8/tempfile.py", line 256, in _sanitize_params
2023-03-02 19:23:37.547544 | controller | dir = gettempdir()
2023-03-02 19:23:37.547598 | controller | File "/usr/lib/python3.8/tempfile.py", line 425, in gettempdir
2023-03-02 19:23:37.547652 | controller | tempdir = _get_default_tempdir()
2023-03-02 19:23:37.547704 | controller | File "/usr/lib/python3.8/tempfile.py", line 357, in _get_default_tempdir
2023-03-02 19:23:37.547758 | controller | raise FileNotFoundError(_errno.ENOENT,
2023-03-02 19:23:37.547811 | controller | FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/zuul']
2023-03-02 19:23:37.547874 | controller | ERROR: Ignoring Errors
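The second excerpt is raised by Python's tempfile module, not by Ansible itself. To choose a default temporary directory, tempfile tries to create, write and delete a small file in each candidate directory. It raises FileNotFoundError only when every candidate fails, so on a full disk the underlying "No space left on device" surfaces as "No usable temporary directory found". The following is a simplified sketch of that probe, not CPython's actual code. The function name first_writable_dir is hypothetical, and it uses tempfile.mkstemp instead of CPython's internal file naming:

```python
import errno
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first directory in which a small file can be created and written.

    Simplified, hypothetical re-creation of the probe CPython's tempfile module
    runs when choosing a default temporary directory.
    """
    for directory in candidates:
        try:
            # Fails if the directory is missing, not writable, or the disk is full.
            fd, path = tempfile.mkstemp(dir=directory)
        except OSError:
            continue
        try:
            # Writing (or flushing on close) fails with ENOSPC on a full filesystem.
            with os.fdopen(fd, "wb") as handle:
                handle.write(b"blat")
            return directory
        except OSError:
            continue
        finally:
            # Always remove the probe file, whether or not the write succeeded.
            os.unlink(path)
    # Same error type and message format as the traceback in the console log.
    raise FileNotFoundError(
        errno.ENOENT, f"No usable temporary directory found in {candidates}"
    )
```

For example, first_writable_dir(["/tmp", "/var/tmp", "/usr/tmp", "/home/zuul"]) would typically return "/tmp" on a healthy node. It raises the FileNotFoundError seen in the log when none of those directories can accept even a four-byte file.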

IRC discussion: https://meetings.opendev.org/irclogs/%23openstack-manila/%23openstack-manila.2023-03-02.log.html#t2023-03-02T21:39:47

Goutham Pacha Ravi (gouthamr) wrote :
Changed in manila:
importance: Undecided → High
assignee: nobody → Goutham Pacha Ravi (gouthamr)
milestone: none → antelope-rc1
Changed in manila:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila-tempest-plugin (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to manila-tempest-plugin (master)

Reviewed: https://review.opendev.org/c/openstack/manila-tempest-plugin/+/856540
Committed: https://opendev.org/openstack/manila-tempest-plugin/commit/7669e94f4d0006b3661b59e04a9cff2065cee370
Submitter: "Zuul (22348)"
Branch: master

commit 7669e94f4d0006b3661b59e04a9cff2065cee370
Author: Goutham Pacha Ravi <email address hidden>
Date: Thu Mar 2 15:33:38 2023 -0800

    [CI] Ceph/NFS: skip data-intensive tests with ipv6

    We're running test_create_shrink_and_write
    and test_create_extend_and_write tests twice,
    changing only the client VM's IP address family.
    Not much value in running all of these since we're
    exhausting the disk space with these tests. We can
    choose to re-enable these tests when the situation
    changes with either the footprint of the tests
    or the form factor of the devstack node.

    Partial-Bug: #2009083
    Change-Id: I868b2e40934e5d10eb0c5cc6fa867e6adc2cc9fc
    Signed-off-by: Goutham Pacha Ravi <email address hidden>
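The commit keeps the two data-intensive tests in the job but skips their IPv6 runs. The following minimal sketch shows a regex-based exclusion over test IDs, in the spirit of the regex filtering that stestr and tempest support. It assumes, hypothetically, that the IPv6 variants can be recognized by "IPv6" in their class names. The helper names and test IDs are illustrative and are not taken from manila-tempest-plugin or from the merged change:

```python
import re

# The two data-intensive scenario tests named in the commit message.
HEAVY_TESTS = ("test_create_shrink_and_write", "test_create_extend_and_write")


def build_exclude_regex(tests, marker="IPv6"):
    """Match the given test methods only in classes whose name contains marker.

    The class-naming convention is a hypothetical assumption for illustration.
    """
    methods = "|".join(re.escape(name) for name in tests)
    # \b after the method name tolerates a bracketed suffix such as "[id-...]".
    return re.compile(rf"\w*{re.escape(marker)}\w*\.(?:{methods})\b")


def select_tests(test_ids, exclude):
    """Return the test IDs that the exclusion regex does not match."""
    return [test_id for test_id in test_ids if not exclude.search(test_id)]
```

Scoping the pattern to both the class marker and the method names leaves the non-IPv6 runs of the same tests, and other IPv6 tests, untouched. This report does not show how the merged change actually applies the skip.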

Vida Haririan (vhariria) wrote :

We have a workaround for this bug ...

OpenStack Infra (hudson-openstack) wrote : Change abandoned on manila-tempest-plugin (master)

Change abandoned by "Goutham Pacha Ravi <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/manila-tempest-plugin/+/876239
Reason: See reason on past comment

Vida Haririan (vhariria) wrote :
Changed in manila:
status: In Progress → Fix Released