Upgrade issues with 'Create log aggregation links'

Bug #1540531 reported by Bjoern
This bug affects 1 person
Affects            Status        Importance  Assigned to     Milestone
OpenStack-Ansible  Fix Released  Medium      Nolan Brubaker
Trunk              Fix Released  Medium      Nolan Brubaker

Bug Description

I just stumbled across this issue while installing Liberty on an older AIO system; it might also cause an upgrade issue from Kilo to Liberty:
The task 'Create log aggregation links' cannot handle existing directories while processes are running.
At this time we do not have many options other than stopping the service, moving the data to /openstack/log, and creating the link.

TASK: [Create log aggregation links] ******************************************
skipping: [aio1_swift_proxy_container-24fd4cf0] => (item={'dest': '/var/log/swift', 'src': u'/openstack/log/aio1_swift_proxy_container-24fd4cf0-swift', 'state': 'link', 'group': 'syslog', 'owner': 'syslog'})
failed: [aio1] => (item={'dest': '/var/log/swift', 'src': u'/openstack/log/aio1-swift', 'state': 'link', 'group': 'syslog', 'owner': 'syslog'}) => {"failed": true, "gid": 104, "group": "syslog", "item": {"dest": "/var/log/swift", "group": "syslog", "owner": "syslog", "src": "/openstack/log/aio1-swift", "state": "link"}, "mode": "0755", "owner": "syslog", "path": "/var/log/swift", "size": 4096, "state": "directory", "uid": 101}
msg: the directory /var/log/swift is not empty, refusing to convert it

FATAL: all hosts have already failed -- aborting
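
The manual workaround described above (stop the service, move the data, create the link) could be sketched roughly as follows. This is only an illustration and not part of the playbooks: the function name `convert_dir_to_link` is made up, it assumes the service writing to the directory has already been stopped, and it does not set the syslog owner/group that the task applies.

```python
import os
import shutil


def convert_dir_to_link(dest, src):
    """Move the contents of directory `dest` into `src`, then replace
    `dest` with a symlink pointing at `src`.

    Hypothetical helper; assumes nothing is writing to `dest` any more.
    Returns False if `dest` is already a symlink, True otherwise.
    """
    if os.path.islink(dest):
        return False  # already converted, nothing to do

    os.makedirs(src, exist_ok=True)
    names = os.listdir(dest) if os.path.isdir(dest) else []

    # Refuse to overwrite anything that already exists in the target,
    # and check before moving so a clash does not leave a half-moved state.
    clashes = [n for n in names if os.path.lexists(os.path.join(src, n))]
    if clashes:
        raise FileExistsError("already present in %s: %s" % (src, ", ".join(clashes)))

    for name in names:
        shutil.move(os.path.join(dest, name), os.path.join(src, name))

    if os.path.isdir(dest):
        os.rmdir(dest)  # empty now, so a plain rmdir is enough
    os.symlink(src, dest)
    return True
```

For the failing item in the output above, the call would be `convert_dir_to_link("/var/log/swift", "/openstack/log/aio1-swift")`, run as root with swift stopped.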

Changed in openstack-ansible:
assignee: nobody → Nolan Brubaker (nolan-brubaker)
milestone: none → mitaka-3
Nolan Brubaker (nolan-brubaker) wrote :

How old was the AIO? Was it upgraded from Juno to Kilo? Asking mostly so I know what to test.

Bjoern (bjoern-t) wrote :

It is likely it was that old, but I'm not 100% sure anymore

Nolan Brubaker (nolan-brubaker) wrote :

Alright - sounds like the AIO's torn down then?

Having versions or background on the codebase would be extremely helpful in reproducing in the future. I'll try a Juno to Kilo to Liberty one and see what happens.

Dixon Dick (dixon1e) wrote :

I just had this happen replacing Liberty with Liberty, after using the teardown scripts. Will run teardown again and see if the log directories are correctly removed, for example.

Nolan Brubaker (nolan-brubaker) wrote :

Thanks for the info - I'm getting around to this today, hopefully I'll have more to add to it.

Nolan Brubaker (nolan-brubaker) wrote :

After doing a liberty install on an AIO and then running teardown.sh, I do notice that /var/log/{cinder,ceilometer,haproxy} still exist as links to their /openstack/log/ directories.

Looking in the `remove_files` variable of the teardown play, I do not see those present, but swift is.
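
A quick way to repeat this check after a teardown is to list which entries under /var/log are still symlinks into /openstack/log. The snippet below is a rough sketch for that purpose only; `leftover_log_links` is a made-up name and the default paths are simply the ones mentioned in this report.

```python
import os


def leftover_log_links(log_root="/var/log", target_root="/openstack/log"):
    """Return {entry name: link target} for every entry in `log_root`
    that is a symlink pointing at or below `target_root`.

    Hypothetical helper; links are inspected, not followed, so dangling
    links left behind by a teardown are reported as well.
    """
    target_root = os.path.abspath(target_root)
    found = {}
    for name in sorted(os.listdir(log_root)):
        path = os.path.join(log_root, name)
        if not os.path.islink(path):
            continue
        target = os.readlink(path)
        if not os.path.isabs(target):
            # Relative links are resolved against the directory holding them.
            target = os.path.join(log_root, target)
        target = os.path.normpath(target)
        if target == target_root or target.startswith(target_root + os.sep):
            found[name] = target
    return found
```

Any name returned here that is missing from `remove_files` is a candidate for the leftover state described above.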

I'm rebuilding the AIO and will report back with any errors.

Nolan Brubaker (nolan-brubaker) wrote :

I've also experienced the failure with swift now; I'll get a patch in for teardown.sh.

Nolan Brubaker (nolan-brubaker) wrote :

Further enumerating the openstack entries in the teardown script didn't help; I still get failures.

I'm wondering if there isn't some kind of race condition where the directory is being made by some other play/role, but I haven't tracked it down quite yet.

Changed in openstack-ansible:
status: New → Confirmed
Nolan Brubaker (nolan-brubaker) wrote :

By process of elimination, this appears to be happening during infrastructure setup within run-playbooks.sh. After running `DEPLOY_OPENSTACK=no DEPLOY_CEILOMETER=no DEPLOY_SWIFT=no ./scripts/run-playbooks.sh`, the /var/log/swift entry was present.

Since I did not disable logging in that run, I'll try that next and see if the swift log files are gone then.

Nolan Brubaker (nolan-brubaker) wrote :

Watching more closely, the file /var/log/swift/proxy-error.log is being filled with HAProxy logs while the repo-containers are building/populating, but only after an AIO is torn down, not on first run.

Changed in openstack-ansible:
importance: Undecided → Medium
milestone: mitaka-3 → 13.0.0
Dixon Dick (dixon1e) wrote :

Installing and tearing down Kilo after a Liberty install (followed by a checkout to Kilo) seems to provoke the same behavior. The /var/log/swift/proxy-error.log file is created and filled, and the Kilo AIO install fails.

With logging disabled, the logs are still being filled.

Example log traffic:

Mar 23 15:13:00 localhost haproxy[17620]: backend repo_all-back has no server available!
Mar 23 15:13:01 localhost haproxy[17620]: Server swift_proxy-back/aio1_swift_proxy_container-8871439c is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Mar 23 15:13:01 localhost haproxy[17620]: backend swift_proxy-back has no server available!
Mar 23 15:22:00 localhost haproxy[17620]: Server repo_all-back/aio1_repo_container-2cb95cdd is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
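
To confirm which program is actually writing into a log file such as /var/log/swift/proxy-error.log, the lines can be tallied by their syslog program tag. This is only a sketch: it assumes the traditional "Mon DD HH:MM:SS host program[pid]:" layout seen in the excerpt above, and the names `SYSLOG_LINE` and `count_writers` are made up for illustration.

```python
import re
from collections import Counter

# Matches the prefix of a traditional syslog line, for example
# "Mar 23 15:13:00 localhost haproxy[17620]: ..."
SYSLOG_LINE = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s"  # timestamp
    r"(?P<host>\S+)\s"                                 # hostname
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:"    # program and optional pid
)


def count_writers(lines):
    """Count log lines per program tag; lines that do not match the
    assumed format are counted under "<unparsed>"."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        counts[match.group("program") if match else "<unparsed>"] += 1
    return counts
```

Running `count_writers(open("/var/log/swift/proxy-error.log"))` on the affected host would show whether haproxy is the only writer or whether swift itself also logs there.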

Nolan Brubaker (nolan-brubaker) wrote :

Dixon - interesting. I still have not been able to isolate why this is happening, but my suspicion is that an Ansible variable is being re-used somewhere for the log file location. Unfortunately, I haven't tracked it down yet.

Jesse Pretorius (jesse-pretorius) wrote :

Removing all series except trunk and re-targeting this at Newton-1. Once a fix merges, this can be targeted at other branches if necessary.

no longer affects: openstack-ansible/liberty
Changed in openstack-ansible:
status: Confirmed → In Progress
Nolan Brubaker (nolan-brubaker) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/310573
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=2a2ad3a29653ea946ab4cf09524e2183e28baf76
Submitter: Jenkins
Branch: master

commit 2a2ad3a29653ea946ab4cf09524e2183e28baf76
Author: Nolan Brubaker <email address hidden>
Date: Wed Apr 27 15:28:32 2016 -0400

    Remove teardown.sh and update related docs

    The teardown.sh script attempts to provide a convenience, but is often
    left unmaintained. Because it is not maintained, it cannot fully tear
    down an environment. In addition, maintaining it has been made much
    harder since the introduction of independent role repositories - changes
    within those individual repositories would need to either be torn down
    in this script, or a tear down process in each role. Such a process
    would again face issues with maintenance.

    Also the teardown script has created bugs with installs started on a
    'dirty' environment. These behaviors have been hard to diagnose and fix,
    resulting in a lot of wasted time.

    Because of these issues, this patch removes the teardown script
    entirely, and alters the docs to recommend that AIOs are deployed to
    virtual machines for easier tear down and redeploy, as well as advising
    against any kind of production use.

    Fixes-Bug: #1540531

    Change-Id: Ida59bec0ff961424180940628b006d71e88e199b

Changed in openstack-ansible:
status: In Progress → Fix Released