swift_rsync_container: failed to create pid file

Bug #1724559 reported by Dan Prince
This bug affects 4 people
Affects   Status        Importance  Assigned to  Milestone
tripleo   Fix Released  High        Dan Prince

Bug Description

When restarting or updating the swift rsync container I'm seeing the following error in the logs:

+ CMD='/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'\'''
+ exec /usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf
Running command: '/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'
failed to create pid file /var/run/rsyncd.pid: File exists
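The last line is the actual failure: rsync refuses to start because an old /var/run/rsyncd.pid is still present. The error text suggests an exclusive create (O_EXCL semantics). A minimal sketch of that behavior follows, emulating the exclusive create with the shell's noclobber option (`set -C`). The `create_pid_file` helper and the temporary directory are illustrative assumptions, not part of rsync or kolla:

```shell
#!/bin/sh
# Sketch: emulate a daemon that refuses to start while its pid file exists.
set -u

# Create the pid file only if it does not exist yet (noclobber makes ">"
# fail on an existing regular file). For simplicity, any failure is
# reported as "File exists".
create_pid_file() {
    if ( set -C; echo "$$" > "$1" ) 2>/dev/null; then
        return 0
    fi
    echo "failed to create pid file $1: File exists" >&2
    return 1
}

rundir=$(mktemp -d)
pidfile="$rundir/rsyncd.pid"

create_pid_file "$pidfile" && echo "first start: pid file created"
# Simulate an unclean stop: the process is gone, but nothing removed the file.
if ! create_pid_file "$pidfile"; then
    echo "second start: refused because the old pid file is still there"
fi
rm -rf "$rundir"
```

Any location that outlives the container process, whether a bind-mounted host directory or the container's own filesystem, reproduces the second failure.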

Dan Prince (dan-prince)
Changed in tripleo:
assignee: nobody → Dan Prince (dan-prince)
status: New → In Progress
importance: Undecided → High
milestone: none → queens-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/513020

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
tags: added: containers
tags: added: pike-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/513020
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f5754bfe53e90f5559a2bce487607a3a2491db6e
Submitter: Zuul
Branch: master

commit f5754bfe53e90f5559a2bce487607a3a2491db6e
Author: Dan Prince <email address hidden>
Date: Wed Oct 18 08:39:17 2017 -0400

    swift_rsync: don't bind mount /run

    This resolves an issue where the pid file exists from a previous
    run of the container.

    Change-Id: Id051172407f0e879d3edf18c8b2ec13734794ed2
    Closes-bug: #1724559

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/545864

Michele Baldessari (michele) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/545864
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6587b6e8a00bac9f0861d0cd311eb8b4866b814c
Submitter: Zuul
Branch: stable/pike

commit 6587b6e8a00bac9f0861d0cd311eb8b4866b814c
Author: Dan Prince <email address hidden>
Date: Wed Oct 18 08:39:17 2017 -0400

    swift_rsync: don't bind mount /run

    This resolves an issue where the pid file exists from a previous
    run of the container.

    Change-Id: Id051172407f0e879d3edf18c8b2ec13734794ed2
    Closes-bug: #1724559
    (cherry picked from commit f5754bfe53e90f5559a2bce487607a3a2491db6e)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.10

This issue was fixed in the openstack/tripleo-heat-templates 7.0.10 release.

Bogdan Dobrelya (bogdando) wrote :

I reproduced that with

Name : openstack-tripleo-heat-templates
Version : 8.0.0
Release : 0.20180305111919.910653c.el7.centos

Even though /var/run is no longer mounted from the host, the issue persists, apparently because of the state of /var/run inside the container itself.
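A file in the container's own /var/run survives `docker restart`, because a restart stops and starts the same container and keeps its writable layer. One generic mitigation, shown here only as a hypothetical sketch and not what TripleO eventually did, is to delete the pid file before starting when the PID it records is no longer alive. `clear_stale_pid_file` is an illustrative name:

```shell
#!/bin/sh
# Hypothetical pre-start helper (not the TripleO fix): if the pid file names a
# process that no longer exists, delete the file so the daemon can start.
set -u

# Return 0 if the pid file is absent or was stale and has been removed,
# 1 if the recorded PID still belongs to a running process.
clear_stale_pid_file() {
    local pidfile=$1 pid
    [ -e "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # "kill -0" sends no signal; it only tests whether the PID exists
    # (and whether we are allowed to signal it).
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    rm -f "$pidfile"
    return 0
}
```

Such a helper could run just before the `exec /usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf` line from the trace. The check is weak inside containers, though. PIDs in a fresh PID namespace start low and are reused on every restart, so a stale file can easily name a process that happens to be alive again. That makes dropping the pid file altogether the more robust option.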

Changed in tripleo:
status: Fix Released → Triaged
tags: added: pike-backport-potentia queens-backport-potential
removed: in-stable-pike pike-backport-potential
Changed in tripleo:
milestone: queens-3 → rocky-1
Bogdan Dobrelya (bogdando) wrote :

Here is a config snippet for the container (a fragment of /var/lib/tripleo-config/hashed-docker-container-startup-config-step_4.json):

  "swift_rsync": {
    "image": "docker.io/tripleomaster/centos-binary-swift-object:current-tripleo",
    "environment": [
      "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS",
      "TRIPLEO_CONFIG_HASH=39bb96726bef0af93b7d29bff71bfdc7"
    ],
    "user": "root",
    "volumes": [
      "/etc/hosts:/etc/hosts:ro",
      "/etc/localtime:/etc/localtime:ro",
      "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
      "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
      "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
      "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
      "/dev/log:/dev/log",
      "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro",
      "/etc/puppet:/etc/puppet:ro",
      "/var/lib/kolla/config_files/swift_rsync.json:/var/lib/kolla/config_files/config.json:ro",
      "/var/lib/config-data/puppet-generated/swift/:/var/lib/kolla/config_files/src:ro",
      "/srv/node:/srv/node",
      "/dev:/dev"
    ],
    "net": "host",
    "privileged": true,
    "restart": "always"
  }
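The claim that /var/run is no longer bind mounted can also be checked mechanically against the volume list. Here is a small sketch, assuming Docker's `host_path:container_path[:options]` volume syntax. `find_run_mounts` is a hypothetical helper, and the volumes are copied from the snippet above:

```shell
#!/bin/sh
# Sketch: list docker-style volume specs ("host:container[:options]") whose
# container-side path is /run, /var/run, or something below them.
set -u

# Prints each matching spec; returns 0 if any matched, 1 if none did.
find_run_mounts() {
    local spec dest found=1
    for spec in "$@"; do
        dest=${spec#*:}     # strip the host path
        dest=${dest%%:*}    # strip options such as ":ro"
        case $dest in
            /run|/run/*|/var/run|/var/run/*)
                printf '%s\n' "$spec"
                found=0
                ;;
        esac
    done
    return "$found"
}

# Volumes copied from the swift_rsync snippet.
if find_run_mounts \
    "/etc/hosts:/etc/hosts:ro" \
    "/etc/localtime:/etc/localtime:ro" \
    "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro" \
    "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro" \
    "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro" \
    "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro" \
    "/dev/log:/dev/log" \
    "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro" \
    "/etc/puppet:/etc/puppet:ro" \
    "/var/lib/kolla/config_files/swift_rsync.json:/var/lib/kolla/config_files/config.json:ro" \
    "/var/lib/config-data/puppet-generated/swift/:/var/lib/kolla/config_files/src:ro" \
    "/srv/node:/srv/node" \
    "/dev:/dev"
then
    echo "a run directory is bind mounted"
else
    echo "no /run or /var/run bind mount"
fi
```

The check looks at the container-side path, so it catches a host directory mounted onto /var/run under any host name.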

Changed in tripleo:
milestone: rocky-1 → rocky-2
Bogdan Dobrelya (bogdando) wrote :

Related bug: https://bugs.launchpad.net/tripleo/+bug/1734674

The fix for docker/services is incomplete, though; we need to make sure it also works for container restarts.

Changed in tripleo:
milestone: rocky-2 → rocky-3
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/577126

Christian Schwede (cschwede) wrote :

I was able to reproduce this too.

It happens when a container restart hits the default 10-second timeout: the rsync process is killed and leaves its pid file behind, which prevents the next start.

Actually we don't need the pid file at all when using containers, so we should get rid of it. However, it's currently hard-coded into the template in puppetlabs-rsync [1], so we can't simply change that setting. I proposed a patch for this [2], but we need a workaround until that PR gets merged.

I proposed a workaround that relies only on a t-h-t (tripleo-heat-templates) change: https://review.openstack.org/577126

[1] https://github.com/puppetlabs/puppetlabs-rsync/blob/master/templates/header.erb#L4
[2] https://github.com/puppetlabs/puppetlabs-rsync/pull/120
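The timeout explanation can be demonstrated in isolation. `docker stop` and `docker restart` send SIGTERM first. If the container has not exited when the timeout (10 seconds by default) runs out, they send SIGKILL. A process can remove its pid file in a SIGTERM handler, but SIGKILL cannot be caught, so the file stays. The sketch below uses a hypothetical stand-in daemon written in sh. It assumes rsync would clean up on a graceful stop, which this report does not confirm:

```shell
#!/bin/sh
# Sketch: a pid file that a process removes on SIGTERM survives SIGKILL,
# because SIGKILL cannot be trapped. The "daemon" is a hypothetical stand-in.
set -u

# Stand-in daemon, run as its own sh process so that $$ is its real PID:
# install a TERM handler that deletes the pid file, write the pid file, idle.
FAKE_DAEMON='
trap "rm -f \"\$1\"; exit 0" TERM
echo "$$" > "$1"
while :; do sleep 0.1; done
'

# Start the stand-in, stop it with the given signal (TERM or KILL), and
# print whether its pid file was "removed" or "left behind".
stop_and_check() {
    local signal=$1 pidfile=$2 pid
    sh -c "$FAKE_DAEMON" fake_daemon "$pidfile" >/dev/null 2>&1 &
    pid=$!
    # The handler is installed before the file is written, so once the
    # file exists the process is ready to be signalled.
    until [ -e "$pidfile" ]; do sleep 0.05; done
    kill -s "$signal" "$pid"
    wait "$pid" 2>/dev/null || true
    if [ -e "$pidfile" ]; then echo "left behind"; else echo "removed"; fi
}

tmp=$(mktemp -d)
echo "after SIGTERM: pid file $(stop_and_check TERM "$tmp/term.pid")"
echo "after SIGKILL: pid file $(stop_and_check KILL "$tmp/kill.pid")"
rm -rf "$tmp"
```

With this stand-in, the SIGTERM run reports the file as removed and the SIGKILL run reports it as left behind. The SIGKILL case is the one a slow container restart ends up in.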

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/577403

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/577403
Reason: The gate is suffering from timeouts; we need to clear it. Please do not restore or recheck this patch, I'll take care of it when the gate is stable again.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/577403
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=448e04029e5d8bc982cbc3fbd4409304488437ec
Submitter: Zuul
Branch: master

commit 448e04029e5d8bc982cbc3fbd4409304488437ec
Author: Christian Schwede <email address hidden>
Date: Thu Jun 21 11:12:30 2018 +0000

    Disable pid file usage in the swift_rsync container

    The pidfile is useless within containers, but prevents a restart if the
    process has been stopped unclean. This happens for example when a
    restart takes longer than the default timeout of 10 seconds.

    This is mainly a workaround because the pid file setting is hardcoded in
    the used Puppet module template. It should be removed later once the
    setting can be disabled cleanly with an updated Puppet rsync module.

    Related-Bug: 1724559
    Change-Id: Iecf855a785bc1f787234e4e54430c929cb9cb906
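The merged change is described here only by its commit message, so the following is not that patch. It is a hypothetical sketch of the same idea. rsyncd.conf writes no pid file unless `pid file` is set, so removing that line from the generated config is enough to stop `rsync --daemon` from creating one. The sample config and `strip_pid_file` are illustrative assumptions; the real header comes from the puppetlabs-rsync template cited earlier:

```shell
#!/bin/sh
# Hypothetical sketch, not the merged patch: print an rsyncd.conf with any
# "pid file = ..." line removed, so the daemon does not write a pid file.
set -u

# Whitespace inside rsyncd.conf parameter names is insignificant, so both
# "pid file" and "pidfile" are matched; only lowercase spellings are handled.
strip_pid_file() {
    sed -E '/^[[:space:]]*pid[[:space:]]*file[[:space:]]*=/d' "$1"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
# example header, assumed to resemble the puppetlabs-rsync template
pid file = /var/run/rsyncd.pid
uid = nobody
use chroot = yes

[object]
path = /srv/node
EOF

strip_pid_file "$conf"
rm -f "$conf"
```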

Changed in tripleo:
milestone: rocky-3 → rocky-rc1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/577126
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6499f09f8b1206cfb3697ebd7005b5308359c016
Submitter: Zuul
Branch: stable/queens

commit 6499f09f8b1206cfb3697ebd7005b5308359c016
Author: Christian Schwede <email address hidden>
Date: Thu Jun 21 11:12:30 2018 +0000

    Disable pid file usage in the swift_rsync container

    The pidfile is useless within containers, but prevents a restart if the
    process has been stopped unclean. This happens for example when a
    restart takes longer than the default timeout of 10 seconds.

    This is mainly a workaround because the pid file setting is hardcoded in
    the used Puppet module template. It should be removed later once the
    setting can be disabled cleanly with an updated Puppet rsync module.

    Related-Bug: 1724559
    Change-Id: Iecf855a785bc1f787234e4e54430c929cb9cb906
    (cherry picked from commit 448e04029e5d8bc982cbc3fbd4409304488437ec)

tags: added: in-stable-queens
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
status: Triaged → Fix Released