ceph deployments failing on selinux error

Bug #1657562 reported by Ben Nemec
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Unassigned

Bug Description

See failure in http://logs.openstack.org/98/416298/4/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/f09dc95/logs/postci.txt.gz

I'm not going to copy-paste it here because it's quite long. The specific error is:

Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[fcontext_/srv/data]/returns: ValueError: Type ceph_var_lib_t is invalid, must be a file or device type

Here's a diff of the package changes between a working and the first failing job: https://www.diffchecker.com/npr5vXS7

There are quite a few. It looks like there must have been a pretty significant update in OS packages. The selinux packages in particular look suspicious. Emilien pointed out https://github.com/SELinuxProject/selinux/commit/5b0ad2f00ec730b86eb871d30cb98661dc7a6554

This is doubly suspicious given the error quoted above.

Tags: alert ci
Emilien Macchi (emilienm) wrote :

Package diff since the last working run:
https://www.diffchecker.com/npr5vXS7

Could it be policycoreutils or libsemanage?

Giulio Fidente (gfidente) wrote :

I tried upgrading locally:

 libsemanage = 2.5-4.el7 for package: libsemanage-python-2.5-4.el7.x86_64
 policycoreutils = 2.5-9.el7 for package: policycoreutils-python-2.5-9.el7.x86_64

and then ran:

 semanage fcontext -a -t ceph_var_lib_t '/srv/data(/.*)?' && restorecon -R /srv/data

which worked as intended. I only got that error message when using an invalid type, e.g.:

 semanage fcontext -a -t ceph_var_lib_s '/srv/data(/.*)?' && restorecon -R /srv/data
 ValueError: Type ceph_var_lib_s is invalid, must be a file or device type

I'm still unsure why ceph_var_lib_t would not be available at that stage, though.

Giulio Fidente (gfidente) wrote :

I forgot to mention the versions I upgraded them to:

 libsemanage.x86_64 0:2.5-5.1.el7_3 will be an update
 policycoreutils.x86_64 0:2.5-11.el7_3 will be an update

Giulio Fidente (gfidente) wrote :

Upgrading selinux-policy-targeted didn't reproduce the issue either.

---> Package selinux-policy.noarch 0:3.13.1-102.el7_3.7 will be updated
---> Package selinux-policy.noarch 0:3.13.1-102.el7_3.13 will be an update

Eric Harney (eharney) wrote :

Bug 1633190 appears to have more info about this same error.

Giulio Fidente (gfidente) wrote :

Thanks Eric. Looking at https://bugs.launchpad.net/tripleo/+bug/1633190/comments/2, indeed, if:

 /usr/sbin/semodule -i /usr/share/selinux/packages/ceph.pp

does not run, then the module is not loaded and ceph_var_lib_t is not available. I do see it as a post-install step of ceph-selinux, though.
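One possible defensive workaround, sketched here under stated assumptions, is to re-run that post-install step when the module is missing, before the fcontext is added. The function name `ensure_selinux_module` and the `SEMODULE` override are hypothetical; the override exists only so the logic can be exercised with a stub. It assumes `semodule -l` prints one module per line with the name in the first column. This is not the fix recorded in this bug (the issue went away after a promotion).

```shell
#!/bin/sh
# Hypothetical workaround sketch. SEMODULE may be overridden (e.g. with a
# stub function) for testing; by default it points at the real binary.
SEMODULE=${SEMODULE:-/usr/sbin/semodule}

# ensure_selinux_module NAME PP_FILE
# Returns 0 if a module called NAME is already listed by `semodule -l`
# (exact match on the first column); otherwise loads PP_FILE with
# `semodule -i` and returns that command's exit status.
ensure_selinux_module() {
    name=$1
    pp=$2
    if "$SEMODULE" -l | awk -v mod="$name" '$1 == mod { f = 1 } END { exit !f }'; then
        return 0
    fi
    "$SEMODULE" -i "$pp"
}
```

On an affected node it would be called as `ensure_selinux_module ceph /usr/share/selinux/packages/ceph.pp` before the `semanage fcontext` and `restorecon` commands quoted earlier in this thread.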

Ben Nemec (bnemec) wrote :

It seems like this is only happening to jobs that don't rebuild the images. Maybe something in our image update code isn't playing nice with one of the new packages?

http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/common_functions.sh#n76

Giulio Fidente (gfidente) wrote :

From the test submission at https://review.openstack.org/#/c/422226 it seems the ceph module is indeed not loaded into the SELinux policy; logs at http://logs.openstack.org/26/422226/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/a94a318/logs/overcloud-cephstorage-0/var/log/host_info.txt.gz

+ semanage -l
+ grep -i ceph

does not return the ceph module version; it looks like, for some reason, we do not install the SELinux module when building the base image.
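A stricter version of that diagnostic can be sketched as follows. It assumes the listing comes from `semodule -l` (one module per line, name in the first column) rather than the `semanage -l` shown in the trace, and the function name `report_selinux_module` is hypothetical. Comparing the first column exactly, instead of `grep -i ceph`, keeps a differently named module that merely contains "ceph" from being mistaken for the one we need.

```shell
#!/bin/sh
# Hypothetical diagnostic: read a `semodule -l` listing from stdin.
# If a module named exactly $1 is present, print its line and exit 0;
# otherwise print a warning to stderr and exit 1.
report_selinux_module() {
    awk -v mod="$1" '
        $1 == mod { print; found = 1 }
        END {
            if (!found) print "selinux module " mod " not loaded" > "/dev/stderr"
            exit !found
        }'
}
```

On a node it would be used as `semodule -l | report_selinux_module ceph`; a non-zero exit status would point at the missing-module case described above.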

summary: - Ceph deployments failing on selinux error
+ some ceph jobs failing on selinux error
summary: - some ceph jobs failing on selinux error
+ ceph deployments failing on selinux error
Giulio Fidente (gfidente) wrote :

ovb-nonha failed only 2 times out of the last 15 runs, and on both occasions the failure was legitimate and unrelated to this bug. Dropping importance from Critical.

Changed in tripleo:
importance: Critical → High
Ben Nemec (bnemec) wrote :

The promotion seems to have made this go away.

Changed in tripleo:
status: Triaged → Fix Released