Deploy failed: '[Errno 17] File exists' for symlinks in /var/log/remote

Bug #1576312 reported by Timur Sufiev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Confirmed
High
Dmitry Guryanov
Mitaka
Confirmed
High
Dmitry Guryanov

Bug Description

ISO Version: 10.0 #162

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/tests/test_ceph.py", line 502, in ceph_rados_gw
    self.fuel_web.deploy_cluster_wait(cluster_id)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 455, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 440, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 491, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 498, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 382, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 891, in deploy_cluster_wait
    self.assert_task_success(task, interval=interval, timeout=timeout)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0.main.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 322, in assert_task_success
    task["name"], task['status'], 'ready', _message(task)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: Task 'deploy' has incorrect status. error != ready, '[Errno 17] File exists'

See https://product-ci.infra.mirantis.net/view/10.0/job/10.0.main.ubuntu.bvt_2/132/console

Tags: area-python
Timur Sufiev (tsufiev-x)
Changed in fuel:
importance: Undecided → High
milestone: none → 10.0
assignee: nobody → Fuel QA Team (fuel-qa)
Revision history for this message
Timur Sufiev (tsufiev-x) wrote :
Revision history for this message
Timur Sufiev (tsufiev-x) wrote :
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → nobody
tags: added: area-library
Changed in fuel:
importance: High → Critical
tags: added: swarm-blocker
Revision history for this message
Sergey Kulanov (skulanov) wrote :

nailgun app.log

2016-04-28 15:41:11.888 ERROR [7fa28204d880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 57, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 509, in message
    logs_utils.prepare_syslog_dir(node)
  File "/usr/lib/python2.7/site-packages/nailgun/utils/logs.py", line 86, in prepare_syslog_dir
    os.symlink(objects.Node.get_node_fqdn(node), l)
OSError: [Errno 17] File exists

Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
status: New → Confirmed
tags: added: area-python
removed: area-library
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Looks like duplicate for bug #1506112. However it is really rare case and shouldn't affect BVT at all.

Revision history for this message
Dmitry Guryanov (dguryanov) wrote :

FRom nailgun's log:

2016-04-28 15:41:11.878 DEBUG [7fa28204d880] (logs) Preparing syslog directories for node: node-5.test.domain.local
2016-04-28 15:41:11.879 DEBUG [7fa28204d880] (logs) prepare_syslog_dir prefix=/var/log/remote/
2016-04-28 15:41:11.883 DEBUG [7fa28204d880] (logs) prepare_syslog_dir old=/var/log/remote/10.109.15.6
2016-04-28 15:41:11.883 DEBUG [7fa28204d880] (logs) prepare_syslog_dir new=/var/log/remote/node-5.test.domain.local
2016-04-28 15:41:11.884 DEBUG [7fa28204d880] (logs) prepare_syslog_dir bak=/var/log/remote/node-5.test.domain.local.bak
2016-04-28 15:41:11.884 DEBUG [7fa28204d880] (logs) prepare_syslog_dir links=['/var/log/remote/10.109.15.6']
2016-04-28 15:41:11.884 DEBUG [7fa28204d880] (logs) Old /var/log/remote/10.109.15.6 exists and it is directory. Trying to rename into /var/log/remote/node-5.test.domain.local
2016-04-28 15:41:11.884 DEBUG [7fa28204d880] (logs) Creating symlink /var/log/remote/10.109.15.6 -> /var/log/remote/node-5.test.domain.local
2016-04-28 15:41:11.888 ERROR [7fa28204d880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 57, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 509, in message
    logs_utils.prepare_syslog_dir(node)
  File "/usr/lib/python2.7/site-packages/nailgun/utils/logs.py", line 86, in prepare_syslog_dir
    os.symlink(objects.Node.get_node_fqdn(node), l)
OSError: [Errno 17] File exists

:) dguryanov@dguryanov ~/tmp/4 $ ls -l fuel-snapshot-2016-04-28_15-41-12/nailgun.test.domain.local/var/log/remote/10.109.15.6/bootstrap/
total 20
-rw-r--r-- 1 dguryanov dguryanov 317 Apr 28 18:41 agent.log
-rw-r--r-- 1 dguryanov dguryanov 105 Apr 28 18:41 auth.log
-rw-r--r-- 1 dguryanov dguryanov 447 Apr 28 18:41 mcollective.log
-rw-r--r-- 1 dguryanov dguryanov 401 Apr 28 18:41 messages.log
-rw-r--r-- 1 dguryanov dguryanov 1012 Apr 28 18:41 syslog.log

:) dguryanov@dguryanov ~/tmp/4 $ ls -l fuel-snapshot-2016-04-28_15-41-12/nailgun.test.domain.local/var/log/remote/node-5.test.domain.local/bootstrap/
total 1800
-rw-r--r-- 1 dguryanov dguryanov 29890 Apr 28 18:41 agent.log
-rw-r--r-- 1 dguryanov dguryanov 2176 Apr 28 18:41 auth.log
-rw-r--r-- 1 dguryanov dguryanov 47778 Apr 28 18:41 dmesg.log
-rw-r--r-- 1 dguryanov dguryanov 51334 Apr 28 18:41 kern.log.log
-rw-r--r-- 1 dguryanov dguryanov 85030 Apr 28 18:41 mcollective.log
-rw-r--r-- 1 dguryanov dguryanov 415225 Apr 28 18:41 messages.log
-rw-r--r-- 1 dguryanov dguryanov 272614 Apr 28 18:41 netprobe_listener.log
-rw-r--r-- 1 dguryanov dguryanov 74280 Apr 28 18:41 netprobe_sender.log
-rw-r--r-- 1 dguryanov dguryanov 473625 Apr 28 18:41 syslog.log
-rw-r--r-- 1 dguryanov dguryanov 369415 Apr 28 18:41 udev.log

Code:
https://github.com/openstack/fuel-web/blob/master/nailgun/nailgun/utils/logs.py#L44,L88

Revision history for this message
Dmitry Guryanov (dguryanov) wrote :

I've also seen this bug with slightly another log message:

2016-04-29 00:00:00.470 DEBUG [7f8581fff880] (logs) Preparing syslog directories for node: node-1.test.domain.local
2016-04-29 00:00:00.471 DEBUG [7f8581fff880] (logs) prepare_syslog_dir prefix=/var/log/remote/
2016-04-29 00:00:00.473 DEBUG [7f8581fff880] (logs) prepare_syslog_dir old=/var/log/remote/10.109.55.3
2016-04-29 00:00:00.473 DEBUG [7f8581fff880] (logs) prepare_syslog_dir new=/var/log/remote/node-1.test.domain.local
2016-04-29 00:00:00.474 DEBUG [7f8581fff880] (logs) prepare_syslog_dir bak=/var/log/remote/node-1.test.domain.local.bak
2016-04-29 00:00:00.474 DEBUG [7f8581fff880] (logs) prepare_syslog_dir links=['/var/log/remote/10.109.55.3']
2016-04-29 00:00:00.474 DEBUG [7f8581fff880] (logs) New /var/log/remote/node-1.test.domain.local already exists. Trying to backup
2016-04-29 00:00:00.474 DEBUG [7f8581fff880] (logs) Old /var/log/remote/10.109.55.3 exists and it is link. Trying to unlink
2016-04-29 00:00:00.474 DEBUG [7f8581fff880] (logs) Creating /var/log/remote/node-1.test.domain.local
2016-04-29 00:00:00.475 DEBUG [7f8581fff880] (logs) Creating symlink /var/log/remote/10.109.55.3 -> /var/log/remote/node-1.test.domain.local
2016-04-29 00:00:00.479 ERROR [7f8581fff880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 57, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 509, in message
    logs_utils.prepare_syslog_dir(node)
  File "/usr/lib/python2.7/site-packages/nailgun/utils/logs.py", line 86, in prepare_syslog_dir
    os.symlink(objects.Node.get_node_fqdn(node), l)
OSError: [Errno 17] File exists

I my case there was a directory /var/log/remote/10.109.55.3/ too.

Dmitry Pyzhov (dpyzhov)
summary: - Deploy ceph HA with RadosGW for objects failed: Task 'deploy' has
- incorrect status. error != ready, '[Errno 17] File exists'
+ Deploy failed: '[Errno 17] File exists' for symlinks in /var/log/remote
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

This bug failed BVT only one time. Moving priority from Critical to high

Changed in fuel:
importance: Critical → High
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Dmitry Guryanov (dguryanov)
Revision history for this message
Dmitry Guryanov (dguryanov) wrote :

The error can occur if master node aperates slowly.

In this case it can execute code from

nailgun/nailgun/utils/logs.py

    # rename bootstrap directory into fqdn
    if os.path.islink(old):
        logger.debug("Old %s exists and it is link. "
                     "Trying to unlink", old)
        os.unlink(old)
    if os.path.isdir(old):
        logger.debug("Old %s exists and it is directory. "
                     "Trying to rename into %s", old, new)
        os.rename(old, new)
    else:
        logger.debug("Creating %s", new)
        os.makedirs(new)

    # creating symlinks
    for l in links:
        if os.path.islink(l) or os.path.isfile(l):
            logger.debug("%s already exists. "
                         "Trying to unlink", l)
            os.unlink(l)
        if os.path.isdir(l):
            logger.debug("%s already exists and it directory. "
                         "Trying to remove", l)
            shutil.rmtree(l)
---------------------
Then syslog server continues receiving messaqges from node, it recreates missing <ip>/bootstrap directory and writes some files there. Then nailgun process continues
------------------------
        logger.debug("Creating symlink %s -> %s", l, new)
        os.symlink(objects.Node.get_node_fqdn(node), l) <<<<<<<<<<<<<<< error!

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

This bug failed BVT only one time and doesn't affect swarm.

tags: removed: swarm-blocker
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.