Standalone deployment immediately fails when removing the install directory fails because of EBUSY

Bug #1986742 reported by Takashi Kajinami
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Low | Takashi Kajinami |

Bug Description

Description
===========

Standalone jobs sometimes fail with EBUSY while cleaning up the install directory, even after the deployment itself has succeeded.

https://zuul.opendev.org/t/openstack/build/efd7be84688a426e8d080efd4a7d41da

https://c9839ffb15782ba81e35-f968b23fb880b8519913848b80d48440.ssl.cf5.rackcdn.com/853303/4/gate/puppet-glance-tripleo-standalone/efd7be8/logs/undercloud/home/zuul/standalone_deploy.log
~~~
2022-08-16 21:15:41Z [standalone.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2022-08-16 21:15:41Z [standalone]: CREATE_COMPLETE Stack CREATE completed successfully

 Stack standalone/b3dd6eda-9538-4b08-85ca-188ee9545358 CREATE_COMPLETE

Generating default ansible config file /home/zuul/tripleo-deploy/ansible.cfg
Exception: [Errno 17] File exists: '/home/zuul/tripleo-deploy/heat_launcher/tripleo_deploy-1660684284.141998'
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1281, in _standalone_deploy
    self._kill_heat(parsed_args)
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/tripleo_deploy.py", line 447, in _kill_heat
    self.heat_launch.kill_heat(self.heat_pid)
  File "/usr/lib/python3.9/site-packages/tripleoclient/heat_launcher.py", line 452, in kill_heat
    shutil.rmtree(self.install_dir)
  File "/usr/lib64/python3.9/shutil.py", line 740, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python3.9/shutil.py", line 738, in rmtree
    os.rmdir(path)
OSError: [Errno 16] Device or resource busy: '/home/zuul/tripleo-deploy/heat_launcher/tripleo_deploy-wm4fcrcf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1372, in take_action
    self._standalone_deploy(parsed_args)
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1322, in _standalone_deploy
    self._kill_heat(parsed_args)
  File "/usr/lib/python3.9/site-packages/tripleoclient/v1/tripleo_deploy.py", line 447, in _kill_heat
    self.heat_launch.kill_heat(self.heat_pid)
  File "/usr/lib/python3.9/site-packages/tripleoclient/heat_launcher.py", line 447, in kill_heat
    shutil.copytree(
  File "/usr/lib64/python3.9/shutil.py", line 568, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/usr/lib64/python3.9/shutil.py", line 467, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "/usr/lib64/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/home/zuul/tripleo-deploy/heat_launcher/tripleo_deploy-1660684284.141998'
None
[Errno 17] File exists: '/home/zuul/tripleo-deploy/heat_launcher/tripleo_deploy-1660684284.141998'
~~~

The command output does not indicate any failure in umount. I suspect EBUSY was returned because rmdir was executed right after umount, while a reference left by the mount still remained.
We should introduce a retry mechanism to avoid immediate failure, because the deployment itself passed and the error affects only the cleanup process.
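
The retry idea could look roughly like the sketch below. This is only an illustration of the approach, not the change that was merged into python-tripleoclient: the helper name rmtree_with_retry and its retries, delay and sleep parameters are hypothetical, and the retry count and delay are arbitrary assumptions.

```python
import errno
import shutil
import time


def rmtree_with_retry(path, retries=5, delay=1.0, sleep=time.sleep):
    """Remove a directory tree, retrying while the removal fails with EBUSY.

    Hypothetical helper: returns the number of attempts that were needed.
    """
    for attempt in range(1, retries + 1):
        try:
            shutil.rmtree(path)
            return attempt
        except OSError as exc:
            # Only EBUSY is treated as transient. Any other error, or EBUSY
            # on the last attempt, is re-raised to the caller unchanged.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            # Give the kernel some time to release the old mount point.
            sleep(delay)
```

With a helper like this, the shutil.rmtree(self.install_dir) call seen in the traceback would be replaced by rmtree_with_retry(self.install_dir), so a directory that is still busy right after umount no longer aborts the command on the first attempt.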

Changed in tripleo:
importance: Undecided → Low
assignee: nobody → Takashi Kajinami (kajinamit)
milestone: none → zed-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)
Changed in tripleo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/c/openstack/python-tripleoclient/+/853381
Committed: https://opendev.org/openstack/python-tripleoclient/commit/32888b9c897495ee2c4815217d17668736b010a2
Submitter: "Zuul (22348)"
Branch: master

commit 32888b9c897495ee2c4815217d17668736b010a2
Author: Takashi Kajinami <email address hidden>
Date: Wed Aug 17 09:58:44 2022 +0900

    standalone heat: Retry removing the install directory

    ... in case rmdir failed because of EBUSY, to avoid immediate command
    failure in case the directory has not yet been released after umount.

    Closes-Bug: #1986742
    Change-Id: I8d8e658a66f0fdd4992d34354bcfd583f9b366e5

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 19.0.0

This issue was fixed in the openstack/python-tripleoclient 19.0.0 release.
