docker-puppet.py errors with overlay2

Bug #1693398 reported by Steve Baker on 2017-05-25
Affects: tripleo
Importance: Medium
Assigned to: Unassigned

Bug Description

I think the only reason this isn't affecting CI is that overlay2 isn't really enabled yet due to bug #1692502, so this will become urgent soon.

With the recent switch to overlay2 I'm seeing issues where copying files out to /var/lib/config-data fails because the source glob is finding files which have been deleted, for example in docker-puppet-mysql:

+ cp -a
...
/etc/mtab /etc/my.cnf /etc/my.cnf.d
...

cp: cannot stat '/etc/my.cnf.d/auth_gssapi.cnf': No such file or directory
cp: cannot stat '/etc/my.cnf.d/mariadb-server.cnf': No such file or directory

In this case the files were deleted during the image build, but other errors are for files deleted by the puppet run, for example in docker-puppet-keystone:

cp: cannot stat '/etc/httpd/conf.d/README': No such file or directory
cp: cannot stat '/etc/httpd/conf.d/autoindex.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.d/userdir.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.d/welcome.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-base.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-dav.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-lua.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-mpm.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-proxy.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-ssl.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/00-systemd.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/01-cgi.conf': No such file or directory
cp: cannot stat '/etc/httpd/conf.modules.d/10-wsgi.conf': No such file or directory

We need a bug against RHEL docker overlay2 with a simple reproducer. In the meantime, we could probably work around this in docker-puppet.py by doing a set +e before attempting copies.
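
To illustrate the workaround idea above (carry on when an individual copy fails instead of aborting), here is a minimal Python sketch. It is not the change proposed for docker-puppet.py, which runs the copies from a shell script; the helper name copy_existing and its arguments are hypothetical. Note that, like set +e, it records every per-file failure under a copied directory, not only "No such file or directory".

```python
import os
import shutil


def copy_existing(sources, dest_dir):
    """Copy each path in sources into dest_dir and keep going on failures.

    Returns the list of source paths that could not be copied so the
    caller can log them. Hypothetical helper, not part of docker-puppet.py.
    """
    os.makedirs(dest_dir, exist_ok=True)
    skipped = []
    for src in sources:
        target = os.path.join(dest_dir, os.path.basename(src))
        try:
            if os.path.isdir(src) and not os.path.islink(src):
                # copytree keeps copying after a per-file error and raises
                # shutil.Error at the end with a list of (src, dst, reason).
                shutil.copytree(src, target, symlinks=True, dirs_exist_ok=True)
            else:
                shutil.copy2(src, target, follow_symlinks=False)
        except FileNotFoundError:
            # The name was handed to us (e.g. by a glob) but cannot be opened.
            skipped.append(src)
        except shutil.Error as exc:
            skipped.extend(failed_src for failed_src, _dst, _why in exc.args[0])
    return skipped
```

A caller would log the returned paths rather than fail the whole run, which keeps the missing files visible in the output.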

Fix proposed to branch: master
Review: https://review.openstack.org/467825

Changed in tripleo:
assignee: nobody → Steve Baker (steve-stevebaker)
status: New → In Progress
Steve Baker (steve-stevebaker) wrote :

Here is a one-line reproducer of the problem with overlay2 (single-quoted so that $? is evaluated inside the container rather than by the host shell):

  docker run --user root -ti --rm centos:7 /bin/bash -c 'cd /root ; rm -f /etc/yum.conf ; cp -a /etc/* ~/ ; echo $?'

cp: cannot stat '/etc/yum.conf': No such file or directory
1

Steve Baker (steve-stevebaker) wrote :

Here is another one-liner. Deleted files still show up in 'ls' but cannot be read:

docker run --user root -ti --rm centos:7 /bin/bash -c "rm -f /etc/init.d/README ; ls /etc/init.d ; cat /etc/init.d/README"
README
cat: /etc/init.d/README: No such file or directory
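
The symptom above (a name that is still listed but cannot be stat'ed) can be checked for programmatically. The following is a small Python sketch under that assumption; the function name find_phantom_entries is hypothetical and does not come from docker-puppet.py.

```python
import os


def find_phantom_entries(directory):
    """Return names that os.listdir reports but that cannot be lstat'ed.

    On a healthy filesystem this list is empty. Hypothetical helper for
    illustration only.
    """
    phantoms = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat does not follow symlinks, so a dangling symlink is not
            # mistaken for a phantom entry.
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            phantoms.append(name)
    return phantoms
```

Run inside an affected container after the rm in the one-liner, a call on /etc/init.d would be expected to report README; on an unaffected host it returns an empty list.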

Steve Baker (steve-stevebaker) wrote :

This doesn't occur on newer CentOS images (CentOS-7-x86_64-GenericCloud-1704), which have the XFS filesystem created with ftype=1.

Changed in tripleo:
status: In Progress → Invalid

Change abandoned by Steve Baker (<email address hidden>) on branch: master
Review: https://review.openstack.org/467825
Reason: This doesn't occur on newer CentOS images (CentOS-7-x86_64-GenericCloud-1704), which have the XFS filesystem created with ftype=1.

Steve Baker (steve-stevebaker) wrote :

I'm going to keep this open and would request the priority be Low because this issue may hit upgrades from older releases (but not fresh installs).

Also, the problem may be fixed by the RHEL 7.4 kernel, even with ftype=0 XFS filesystems, so this bug also serves as a reminder to retest once the RHEL 7.4 kernel is available.

Changed in tripleo:
status: Invalid → Confirmed
Changed in tripleo:
status: Confirmed → Triaged
importance: Undecided → Low
Changed in tripleo:
milestone: none → pike-3
Steve Baker (steve-stevebaker) wrote :

If you're seeing rsync or rm failures in docker-puppet.py, run the following command on the host:

  xfs_info / | grep ftype

If it reports ftype=0, your host OS image is too old. For CentOS, the CentOS-7-x86_64-GenericCloud-1704 image works fine for me.
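
The same check can be scripted, for example as a pre-flight step before an upgrade. This is only a sketch: parse_ftype and host_ftype are hypothetical names, and it assumes xfs_info is installed and that the root filesystem is XFS.

```python
import re
import subprocess


def parse_ftype(xfs_info_output):
    """Return the ftype value (0 or 1) found in xfs_info output, or None if absent."""
    match = re.search(r"\bftype=(\d)", xfs_info_output)
    return int(match.group(1)) if match else None


def host_ftype(mount_point="/"):
    """Run xfs_info on mount_point and return its ftype value.

    Raises subprocess.CalledProcessError if xfs_info fails, for example
    when the mount point is not an XFS filesystem.
    """
    result = subprocess.run(
        ["xfs_info", mount_point],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ftype(result.stdout)
```

A result of 0 corresponds to the "host OS image is too old" case described above; None means the field was not present in the output at all.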

Emilien Macchi (emilienm) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in tripleo:
assignee: Steve Baker (steve-stevebaker) → nobody
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Bogdan Dobrelya (bogdando) wrote :

The impact on FF upgrades of systems that still have the older XFS volumes makes the severity at least Medium.

Changed in tripleo:
importance: Low → Medium
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
David Peacock (davidjpeacock) wrote :

Cannot be fixed due to filesystem formatting being a factor.

Changed in tripleo:
status: Triaged → Won't Fix