Memcached failed to start with "/var/log/memcached.log: Permission denied" reboot.

Bug #1776889 reported by Sofer Athlan-Guyot
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Bogdan Dobrelya

Bug Description

Hi,

originally found there https://bugzilla.redhat.com/show_bug.cgi?id=1590297.

On a 3 nodes setup, after a update, restart, scale out and re-deploy ,
some[1] of the memcached container fail to restart with that error:

    [root@controller-1 ~]# docker logs memcached
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied
    /bin/bash: /var/log/memcached.log: Permission denied

The container is in that status:

    fa8fc2bf967d 192.168.24.1:8787/rhosp13/openstack-memcached:2018-06-12.1 "/bin/bash -c 'sou..." 14 hours ago Restarting (1) 4 hours ago memcached

each line correspond to a restart.

On the other two controllers the restart was successful.

So hold your belt tight as it's going to be hairy....

The log directory /var/log/containers/memcached is mounted directly in
/var/log where we expect memcached to write its log by adding
">>/var/log/memcached.log 2>&1". In the memcached_init_logs step of
the docker_config step1 config we init the log like this[2]:

   command: ['/bin/bash', '-c', 'source /etc/sysconfig/memcached; touch /var/log/memcached.log && chown ${USER} /var/log/memcached.log']

so far so good ...

But if you look at any memcached directory where the container were
successfully started, then you have[3]:

  - /var/log/memcached.log.1
  - /var/log/memcached.log.2

with log.1 being opened by the memcached process:

    ()[memcached@controller-0 /]$ lsof /var/log/memcached.log.1
    lsof: WARNING: can't stat() xfs file system /etc/sysconfig/memcached (deleted)
          Output information may be incomplete.
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    memcached 7 memcached 1w REG 252,2 41994775 97206843 /var/log/memcached.log.1
    memcached 7 memcached 2w REG 252,2 41994775 97206843 /var/log/memcached.log.1

There is this error about /etc/sysconfig/memcached being deleted, I'll
come back to it later. For now I just don't understand how those
files are created. "log.1" and "log.2" ... it's even more puzzling
as I can't create any file in the /var/log/ directory with permission
denied on a working memcached container.

    [root@controller-0 ~]# docker exec -ti memcached bash
    ()[memcached@controller-0 /]$ ls -lrth /var/log/*
    -rw-r--r--. 1 memcached root 15M Jun 13 19:31 /var/log/memcached.log.2
    -rw-r--r--. 1 memcached root 42M Jun 14 10:05 /var/log/memcached.log.1
    ()[memcached@controller-0 /]$ ls -lrthd /var/log/
    drwxr-xr-x. 2 root root 52 Jun 13 22:01 /var/log/
    ()[memcached@controller-0 /]$ touch /var/log/memcached.log
    touch: cannot touch '/var/log/memcached.log': Permission denied
    ()[memcached@controller-0 /]$ id
    uid=42457(memcached) gid=42457(memcached) groups=42457(memcached),42400(kolla)

and the "log.1" does belong to memcached process:

    ()[memcached@controller-0 /]$ lsof /var/log/memcached.log.2
    lsof: WARNING: can't stat() xfs file system /etc/sysconfig/memcached (deleted)
          Output information may be incomplete.
    ()[memcached@controller-0 /]$ lsof /var/log/memcached.log.1
    lsof: WARNING: can't stat() xfs file system /etc/sysconfig/memcached (deleted)
          Output information may be incomplete.
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    memcached 7 memcached 1w REG 252,2 43203747 97206843 /var/log/memcached.log.1
    memcached 7 memcached 2w REG 252,2 43203747 97206843 /var/log/memcached.log.1

On the host:

    [root@controller-0 ~]# ls -lrth /var/log/containers/memcached/
    total 68M
    -rw-r--r--. 1 42457 root 15M Jun 13 19:31 memcached.log.2
    -rw-r--r--. 1 42457 root 42M Jun 14 10:19 memcached.log.1
    [root@controller-0 ~]# ls -lrthd /var/log/containers/memcached/
    drwxr-xr-x. 2 root root 52 Jun 13 22:01 /var/log/containers/memcached/

Which makes no sense.

So there are two unknown here:
 1. who creates log.1 and rotate log.2 ?
 2. how come the process can write in a directory that it doesn't have
    write access to.

I'm at loss for question 1. For question 2 I guess that the mounting
sequence is flawed somehow, as the /etc/sysconfig/memcached error in
the lsof output seems to indicate. Here more about it, the file
system /etc/sysconfig/memcached is "deleted" but I can cat in it:

    # docker exec -ti memcached bash
    ()[memcached@controller-0 /]$ lsof /etc/sysconfig/memcached
    lsof: WARNING: can't stat() xfs file system /etc/sysconfig/memcached (deleted)
          Output information may be incomplete.
    ()[memcached@controller-0 /]$ cat /etc/sysconfig/memcached
    PORT="11211"
    USER="memcached"
    MAXCONN="8192"
    CACHESIZE="16005"
    OPTIONS="-vv -l 172.17.1.15 -U 0 -t 8 >> /var/log/memcached.log 2>&1"
    ()[memcached@controller-0 /]$

This tends to confirm that something is wrong in the mounting
sequence.

All in all I think we should change the way the container are created to make them
more robust after any reboot operation.

If somebody has hint at what is happening there that would be awesome ... and thanks
to read till the end.

[1] first iteration of the test found 2 memcached failing to restart,
on another run only 1 fail to restart and on another (whithout the
scale out and the reconfig) all three restarted successfully. I can
find the "log.1" in a deployment as well.

[2] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/memcached.yaml#L72..L84
[3] This seems to happen only on rhel platform as in the CI jobs memcached.log seems to be created fine.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/575409

Changed in tripleo:
importance: High → Critical
Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: [Bug 1776889] [NEW] Memcached failed to start with "/var/log/memcached.log: Permission denied" reboot.

> with log.1 being opened by the memcached process:

This part is being addressed in https://review.openstack.org/#/c/574835

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

> 1. who creates log.1 and rotate log.2 ?

There is a containerized logrotate service [0] takes care of those
rotate files for all containers managed by tripleo. It is configured via
puppet [1].

[0]
http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/logrotate-crond.yaml
[1]
http://git.openstack.org/cgit/openstack/puppet-tripleo/tree/manifests/profile/base/logging/logrotate.pp

tags: added: containers
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

The real issue comes from https://github.com/openstack/puppet-tripleo/blob/master/templates/logrotate/containers_logrotate.conf.erb#L1

We start memcached simply adding >> /var/log/memcached.log 2>&1. Thus I saw memcached process which had memcached.log.1 opened. Furthermore, after 14 days, logrotate will delete memcached.log.14 so file will be deleted but opened by memcached process. That creates a real issue as after some time memcached will utilize whole space and it will be hard to find the process. As a normal fix logrotate should restart memcached after rotation (As we cannot send signal because of >>)

As a solution I would add
    prerotate
      bash -c "[[ ! $1 =~ memcached ]]"
    endscript
to exclude memcached files from rotation.

Revision history for this message
Alex Schultz (alex-schultz) wrote :

We should honestly just not log memcached, i have yet to see anything of actual value in the log. It either works or doesn't. It logs to stdout which will get captured in the docker logs if we really really need something

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/575676

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Athlan-Guyot sofer (<email address hidden>) on branch: master
Review: https://review.openstack.org/575409
Reason: For https://review.openstack.org/#/c/575676/2

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Sofer Athlan-Guyot (sofer-athlan-guyot)
Changed in tripleo:
assignee: Sofer Athlan-Guyot (sofer-athlan-guyot) → Emilien Macchi (emilienm)
Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Bogdan Dobrelya (bogdando)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/576116

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/575676
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c9c0fcf945d976dda248b52c5a23a9cf895be777
Submitter: Zuul
Branch: master

commit c9c0fcf945d976dda248b52c5a23a9cf895be777
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Jun 15 10:47:20 2018 +0300

    Log memcached to stdout/err only

    Memcached logging configuration is not friendly.
    We cannot use configuration options as they seemed to be
    ignored because of a bug. We cannot use redirection >>
    as this breaks logrotate signalling SIGHUP to processes
    and expecting them to handle the open file descriptors
    associated with truncated/removed logfiles - this does
    not work with redirected output into a file.

    The only option is not log memcached into a file. There is
    not much of actual value in the log. Keep it logging to
    stdout/stderr in its container, which will get captured in
    the docker logs (and journald via the logging backend).

    Co-authored-by: Sergii Golovatiuk <email address hidden>
    Co-authored-by: Alex Schultz <email address hidden>
    Closes-bug: #1776889

    Change-Id: I3b5aec6b7f7c5fa4f84e3c3eb243c155a6189e2c
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in tripleo:
status: In Progress → Fix Released
no longer affects: tripleo/queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/576116
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=806316c7cfa4ad610613cd8cf66b225ebc79fb49
Submitter: Zuul
Branch: stable/queens

commit 806316c7cfa4ad610613cd8cf66b225ebc79fb49
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Jun 15 10:47:20 2018 +0300

    Log memcached to stdout/err only

    Memcached logging configuration is not friendly.
    We cannot use configuration options as they seemed to be
    ignored because of a bug. We cannot use redirection >>
    as this breaks logrotate signalling SIGHUP to processes
    and expecting them to handle the open file descriptors
    associated with truncated/removed logfiles - this does
    not work with redirected output into a file.

    The only option is not log memcached into a file. There is
    not much of actual value in the log. Keep it logging to
    stdout/stderr in its container, which will get captured in
    the docker logs (and journald via the logging backend).

    Co-authored-by: Sergii Golovatiuk <email address hidden>
    Co-authored-by: Alex Schultz <email address hidden>
    Closes-bug: #1776889

    (cherry picked from commit f43deec168f40835a8824ae88c6b3469af9f61d7)

    Change-Id: I3b5aec6b7f7c5fa4f84e3c3eb243c155a6189e2c
    Signed-off-by: Bogdan Dobrelya <email address hidden>

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.4

This issue was fixed in the openstack/tripleo-heat-templates 8.0.4 release.

Revision history for this message
Michele Baldessari (michele) wrote :

I don't think this bug addressed the logging to stdout/stderr fully because puppet-memcached has a default for the logfile parameter and you can't unset it via puppet/hiera (see https://tickets.puppetlabs.com/browse/PUP-5295).

I opened https://bugs.launchpad.net/tripleo/+bug/1782223 for this purpose

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b4

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b4 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.