rsyslogd writes to wrong files after environment is reset

Bug #1315382 reported by Vladimir Kuklin
This bug affects 2 people
Affects              Status          Importance  Assigned to         Milestone
Fuel for OpenStack   Fix Committed   Medium      Vladimir Sharshov
5.0.x                Won't Fix       Medium      MOS Maintenance
6.0.x                Won't Fix       Medium      MOS Maintenance
6.1.x                Won't Fix       Medium      MOS Maintenance
7.0.x                Fix Released    Medium      Vladimir Sharshov

Bug Description

Steps to reproduce:

1. Deploy an environment.
2. Reset it.
3. Observe that the puppet-apply log is written to the old directory /var/log/remote/<node-name>.bak/puppet-apply.log instead of /var/log/remote/<node-name>/puppet-apply.log.
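
A quick way to confirm where rsyslogd is currently writing (a sketch based on the dockerctl/lsof check shown later in this report) is to list its open file descriptors in the rsyslog container:

  dockerctl shell rsyslog lsof | grep puppet-apply

If the open paths contain the <node-name>.bak directories, rsyslogd is still writing to the pre-reset locations.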

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: nobody → Dmitry Pyzhov (lux-place)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Looks like rsyslog should be restarted in its docker container as well (in order to renew its file descriptors).
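
A full restart may not even be needed; a minimal sketch of the same idea, assuming pkill is available inside the container and that dockerctl accepts a command as shown elsewhere in this report, is to send SIGHUP so rsyslogd reopens its output files:

  dockerctl shell rsyslog pkill -HUP rsyslogd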

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Dmitry Pyzhov (lux-place) → Aleksey Kasatkin (alekseyk-ru)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Here is the solution:

For example, let's check the puppet logs in the docker container for node-6, which resides in the new env (node-1, node-2, etc. belonged to the deleted envs).

Look at the puppet log FD mappings in the rsyslog container:
  dockerctl shell rsyslog lsof | grep puppet
  rsyslogd 391 root 15w REG 253,2 1149077 138872644 /var/log/remote/node-3.test.domain.local/puppet-apply.log
  rsyslogd 391 root 24w REG 253,2 370285 11935 /var/log/remote/node-1.test.domain.local/puppet-apply.log
  rsyslogd 391 root 26w REG 253,2 874922 138887274 /var/log/remote/node-2.test.domain.local/puppet-apply.log
These are wrong mappings left over from the old (reset/recreated) envs!

The fix: on the master node that runs the docker container for rsyslog, send SIGHUP to rsyslogd on every env reset/recreate and from the logrotate postscript as well (on SIGHUP, rsyslogd reopens its output files):
 for pid in $(pidof rsyslogd); do kill -HUP $pid; done

Recheck now:
dockerctl shell rsyslog lsof | grep puppet
 rsyslogd 391 root 22w REG 253,2 1281 75649827 /var/log/remote/node-6.test.domain.local/puppet-apply.log
Looks ok now, logs present.

Changed in fuel:
status: Confirmed → Triaged
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The fix (for pid in $(pidof rsyslogd); do kill -HUP $pid; done) should be implemented:
1) in the nailgun logic for env reset/create
2) in the logrotate templates for fuel-library (this should be masked by the docker presence); a sketch of the logrotate part follows below
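
A minimal sketch of the logrotate part (the glob pattern, options, and placement are assumptions for illustration, not the actual fuel-library template):

  /var/log/remote/*/*.log {
      daily
      rotate 7
      compress
      missingok
      notifempty
      sharedscripts
      postrotate
          # SIGHUP makes rsyslogd reopen its output files after rotation
          pkill -HUP rsyslogd 2>/dev/null || true
      endscript
  }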

Changed in fuel:
status: Triaged → In Progress
Changed in fuel:
assignee: Aleksey Kasatkin (alekseyk-ru) → Dima Shulyak (dshulyak)
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/92571

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/92574

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Yes, the deletion stage fits better than creation. The stop operation should be considered as well; thanks for this fix.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/92571
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=c5425fad2a6c99ab9412f578b6a33bc981ba7f94
Submitter: Jenkins
Branch: master

commit c5425fad2a6c99ab9412f578b6a33bc981ba7f94
Author: Dima Shulyak <email address hidden>
Date: Wed May 7 15:13:40 2014 +0300

    Add master_ip to engine parameter section

    master_ip should be passed for every task where rebooting is involved
    therefore it is convenient to place them in engine section

    Sending kill -HUP to rsyslogd required after removing nodes
    to refresh its configuration

    Change-Id: Ibd3882621380da1d5fcdb15397438f554b3619c7
    Related-Bug: #1315382

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/92574
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=be48c1a5d77241871d5a0a085e8f3d58977568e4
Submitter: Jenkins
Branch: master

commit be48c1a5d77241871d5a0a085e8f3d58977568e4
Author: Dima Shulyak <email address hidden>
Date: Wed May 7 15:35:35 2014 +0300

    Send sighup to rsyslogd

    After remove_nodes action, create mcollecte client
    with execute_shell_command plugin for master node, and perform:

    ssh root@#{master_ip} 'pkill -HUP rsyslogd'

    Change-Id: I8ab764aaa3bceb9c08c99d5bfe08582c83e267c7
    Closes-Bug: #1315382

tags: added: customer-found
Revision history for this message
Dima Shulyak (dshulyak) wrote :

Guys, is there a snapshot of any kind or another source of info?

I tried to reproduce it on a recent environment, and everything works as expected: SIGHUP is sent, and rsyslogd is reconfigured upon this signal.

I need at least the logs to understand whether this is the same bug or a different one.

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

It was a one-time issue and we cannot reproduce it. Waiting for more data.

Revision history for this message
Dima Shulyak (dshulyak) wrote :

I found a problem. It is related to directories such as /var/log/remote/node-45.test.domain.local.bak/ being created, and to the fact that we are not sending pkill -HUP before provisioning, so it is possible that rsyslogd will write to the wrong files.

Revision history for this message
Dima Shulyak (dshulyak) wrote :

According to Astute, SIGHUP won't be sent only if some node failed during provisioning or if provisioning was stopped. So if such a thing happens, it should be enough to invoke:

  pkill -HUP rsyslogd

on the Fuel master host.
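
One way to verify the recovery afterwards (a sketch reusing the lsof check from earlier in this report; node names are illustrative):

  dockerctl shell rsyslog lsof | grep puppet-apply
  # the open paths should now be /var/log/remote/<node-name>/puppet-apply.log, without any .bak component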

Dmitry Pyzhov (dpyzhov)
tags: added: feature-reset-env
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.1.x
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Looking at comment #12, why can't we just use the solution proposed by Dmitry S.? Is it complicated, or is it a matter of a one-line change in Astute?

Changed in fuel:
status: Confirmed → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/207904

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/207904
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=6d09f3fc7f69ac558095299211ebfd081fa54b8f
Submitter: Jenkins
Branch: master

commit 6d09f3fc7f69ac558095299211ebfd081fa54b8f
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Fri Jul 31 16:35:55 2015 +0300

    Send sighup for Rsyslog for stop provision and provision failed

    Astute already send sighup for Rsyslog for all scenarios excluding
    stop provision and provision failed. Now it sends for all.

    Change-Id: I3beaecc12e2702497f370189d910f02e9cf6c5cc
    Closes-Bug: #1315382
    Co-Authored-By: Dima Shulyak <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Maksym Strukov (unbelll) wrote :

1. Create cluster
2. Run deploy

[root@nailgun remote]# ls -hal
total 40K
drwxr-xr-x 10 root root 4,0K жов 13 10:42 .
drwxr-xr-x 19 root root 4,0K жов 13 10:42 ..
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.10
lrwxrwxrwx 1 root root 24 жов 13 10:42 10.109.10.3 -> node-1.test.domain.local
lrwxrwxrwx 1 root root 24 жов 13 10:42 10.109.10.4 -> node-2.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.5
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.6
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.7
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.8
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.9
drwxr-xr-x 3 root root 4,0K жов 13 10:36 node-1.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:36 node-2.test.domain.local

3. Stop deploy.
4. Run deploy:

[root@nailgun remote]# ls -hal
total 48K
drwxr-xr-x 12 root root 4,0K жов 13 11:07 .
drwxr-xr-x 19 root root 4,0K жов 13 10:42 ..
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.10
lrwxrwxrwx 1 root root 24 жов 13 11:07 10.109.10.3 -> node-1.test.domain.local
lrwxrwxrwx 1 root root 24 жов 13 11:07 10.109.10.4 -> node-2.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.5
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.6
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.7
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.8
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.9
drwxr-xr-x 2 root root 4,0K жов 13 11:07 node-1.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:36 node-1.test.domain.local.bak
drwxr-xr-x 2 root root 4,0K жов 13 11:07 node-2.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:36 node-2.test.domain.local.bak

5. Wait for deploy completed. Reset env:

[root@nailgun remote]# ls -hal
total 40K
drwxr-xr-x 10 root root 4,0K жов 13 12:04 .
drwxr-xr-x 19 root root 4,0K жов 13 11:10 ..
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.10
drwxr-xr-x 3 root root 4,0K жов 13 12:04 10.109.10.3
drwxr-xr-x 3 root root 4,0K жов 13 12:04 10.109.10.4
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.5
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.6
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.7
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.8
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.9

6. Run deploy again:

[root@nailgun remote]# ls -hal
total 40K
drwxr-xr-x 10 root root 4,0K жов 13 13:24 .
drwxr-xr-x 19 root root 4,0K жов 13 11:10 ..
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.10
lrwxrwxrwx 1 root root 24 жов 13 13:24 10.109.10.3 -> node-1.test.domain.local
lrwxrwxrwx 1 root root 24 жов 13 13:24 10.109.10.4 -> node-2.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.5
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.6
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.7
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.8
drwxr-xr-x 3 root root 4,0K жов 13 10:37 10.109.10.9
drwxr-xr-x 3 root root 4,0K жов 13 12:04 node-1.test.domain.local
drwxr-xr-x 3 root root 4,0K жов 13 12:04 node-2.test.domain.local

What is the expected result here?

tags: added: on-verification
tags: removed: on-verification
Revision history for this message
Sergey Novikov (snovikov) wrote :

Verified on RC4

STR:

1. Deploy an env.
2. Reset the env and run the deployment of the env again.
3. Check the /var/log/remote/ directory: there must be no *.bak entries, and the puppet-apply log must be in /var/log/remote/<nodename>/puppet-apply.log (see the check below).
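
A quick hands-on check for step 3 (a sketch; the node name placeholder is illustrative):

  ls -d /var/log/remote/*.bak 2>/dev/null           # should print nothing
  ls -l /var/log/remote/<nodename>/puppet-apply.log  # should exist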

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for -updates, as this is a deployment-time issue and we don't expect new deployments for versions older than 7.0.

tags: added: wontfix-munotapplic