Standalone deployment with Octavia: Octavia containers fail to start - permission issue?

Bug #1812274 reported by Bernard Cafarelli
This bug affects 1 person
Affects    Status          Importance    Assigned to     Milestone
tripleo    Fix Released    High          Brent Eagles

Bug Description

This is a follow-up to bug #1806113

With https://review.openstack.org/#/c/628957 applied on current master, the deployment succeeds with Octavia enabled (by adding -e /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml).
Basic operations (image/network/server) work fine.

However, the Octavia containers show up as "Restarting (2) 11 minutes ago" and API calls do not work:
# openstack loadbalancer list
Unable to establish connection to http://192.168.122.237:9876/v2.0/lbaas/loadbalancers: HTTPConnectionPool(host='192.168.122.237', port=9876): Max retries exceeded with url: /v2.0/lbaas/loadbalancers (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbad27e31d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
# docker ps|grep octa
21e1272c6c8e docker.io/tripleomaster/centos-binary-octavia-health-manager:b32ba15b0e053d6660c35713dec47ff941c11c2e_01511b06 "dumb-init --singl..." 2 days ago Restarting (2) 6 hours ago octavia_health_manager
881f081b6243 docker.io/tripleomaster/centos-binary-octavia-api:b32ba15b0e053d6660c35713dec47ff941c11c2e_01511b06 "dumb-init --singl..." 2 days ago Restarting (2) 6 hours ago octavia_api
58f19b672475 docker.io/tripleomaster/centos-binary-octavia-housekeeping:b32ba15b0e053d6660c35713dec47ff941c11c2e_01511b06 "dumb-init --singl..." 2 days ago Restarting (2) 6 hours ago octavia_housekeeping
a82ca5a66ab8 docker.io/tripleomaster/centos-binary-octavia-worker:b32ba15b0e053d6660c35713dec47ff941c11c2e_01511b06 "dumb-init --singl..." 2 days ago Restarting (2) 6 hours ago octavia_worker

All 4 affected containers show similar output with "docker logs":
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/octavia/octavia.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/octavia/octavia.conf to /etc/octavia/octavia.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/octavia/conf.d/octavia-worker/worker-post-deploy.conf to /etc/octavia/conf.d/octavia-worker/worker-post-deploy.conf
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 411, in main
    execute_config_strategy(config)
  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
    copy_config(config)
  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
    config_file.copy()
  File "/usr/local/bin/kolla_set_configs", line 150, in copy
    self._merge_directories(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 99, in _merge_directories
    self._copy_file(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 82, in _copy_file
    shutil.copy(source, dest)
  File "/usr/lib64/python2.7/shutil.py", line 119, in copy
    copyfile(src, dst)
  File "/usr/lib64/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 13] Permission denied: u'/var/lib/kolla/config_files/src/etc/octavia/conf.d/octavia-worker/worker-post-deploy.conf'

As noted above, the other containers are healthy, so could something specific to the Octavia deployment be triggering this permission issue?
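The traceback shows kolla_set_configs recursing through the tree under /var/lib/kolla/config_files/src and calling shutil.copy on each file, so the first file it cannot open aborts the container start and hides any others. As a diagnostic sketch (find_unreadable is a hypothetical helper, not part of kolla), the Python below walks such a tree and tries to open every file for reading, the step that failed here, collecting every error instead of stopping at the first. It only reproduces an SELinux denial when run in the same context as the container process; run as unconfined root on the host it will usually report nothing.

```python
import os
from typing import List, Tuple


def find_unreadable(root: str) -> List[Tuple[str, str]]:
    """Return sorted (path, error) pairs for entries under root that fail to open.

    Opening for reading is the step that raised EACCES in the traceback
    (shutil.copyfile -> open(src, 'rb')); unlike kolla_set_configs, this keeps
    going after the first failure so every problem file is reported.
    """
    problems: List[Tuple[str, str]] = []

    def record(err: OSError) -> None:
        # Directories os.walk cannot list are reported instead of silently skipped.
        problems.append((str(err.filename), err.strerror or str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb"):
                    pass
            except OSError as err:  # PermissionError is a subclass of OSError
                problems.append((path, err.strerror or str(err)))
    return sorted(problems)
```

Inside an affected container, find_unreadable("/var/lib/kolla/config_files/src") would be expected to list worker-post-deploy.conf with "Permission denied", plus any other file with the same problem.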

Brent Eagles (beagles)
Changed in tripleo:
status: New → Triaged
milestone: none → stein-3
importance: Undecided → High
Ian Main (imain) wrote :

I ran into this myself with the standalone deployment. I started the container in interactive mode with paunch debug and found that SELinux was denying access to the file. After setenforce 0, the container works fine.
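Why suspect SELinux rather than plain file permissions? kolla_set_configs runs as root (uid=0 in the audit record further down), and the file is mode 0644 root:root (per the ls -lZR output further down), so the classic Unix permission check should allow the read. An EACCES in that situation points at a Linux Security Module such as SELinux. The hypothetical helper below makes that check explicit; it models only the owner/group/other read bits plus an optional root override, and ignores POSIX ACLs and other capabilities.

```python
import stat
from typing import Iterable


def dac_allows_read(mode: int, owner_uid: int, owner_gid: int,
                    uid: int, gids: Iterable[int],
                    root_override: bool = True) -> bool:
    """Classic Unix read check: owner bits, else group bits, else other bits.

    root_override models root holding CAP_DAC_OVERRIDE; ACLs are ignored.
    """
    if uid == 0 and root_override:
        return True
    if uid == owner_uid:
        # The owner class decides, even if "other" would be allowed to read.
        return bool(mode & stat.S_IRUSR)
    if owner_gid in set(gids):
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

If this returns True for a file's os.stat() result and the caller's uid and groups (for example os.geteuid() and [os.getegid(), *os.getgroups()]), yet open() still fails with PermissionError, the denial comes from elsewhere, and ausearch -m avc is the next place to look.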

Bernard Cafarelli (bcafarel) wrote :

Good point, I forgot to check SELinux. In permissive mode the containers work fine, and I could create a load balancer, listener, pool and member:
# openstack loadbalancer list
+--------------------------------------+------+----------------------------------+-----------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address     | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+-----------------+---------------------+----------+
| 081e91eb-1d8f-42e9-b02a-d6ad05dbc811 | lb1  | a0b8aa2e3bf74750b11a5ee739b5f3e4 | 192.168.100.224 | ACTIVE              | amphora  |
+--------------------------------------+------+----------------------------------+-----------------+---------------------+----------+

Brent Eagles (beagles)
Changed in tripleo:
assignee: nobody → Brent Eagles (beagles)
Brent Eagles (beagles) wrote :

FTR: ausearch output

time->Thu Jan 31 17:20:46 2019
type=PROCTITLE msg=audit(1548955246.784:5558): proctitle=707974686F6E002F7573722F6C6F63616C2F62696E2F6B6F6C6C615F7365745F636F6E66696773
type=SYSCALL msg=audit(1548955246.784:5558): arch=c000003e syscall=2 success=no exit=-13 a0=1222700 a1=0 a2=1b6 a3=24 items=0 ppid=242497 pid=242498 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="python" exe="/usr/bin/python2.7" subj=system_u:system_r:container_t:s0:c157,c287 key=(null)
type=AVC msg=audit(1548955246.784:5558): avc: denied { read } for pid=242498 comm="python" name="worker-post-deploy.conf" dev="vda2" ino=67939729 scontext=system_u:system_r:container_t:s0:c157,c287 tcontext=system_u:object_r:var_lib_t:s0 tclass=file permissive=0
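The AVC record above carries the key facts: a process in the container_t domain was denied read on a file labeled var_lib_t. As an illustration (the helper names are hypothetical, and the field handling only covers records shaped like the ones above, not the full audit format), the Python below decodes the hex-encoded PROCTITLE, which turns out to be python /usr/local/bin/kolla_set_configs, and pulls the relevant fields out of the AVC line. ausearch -i can also interpret such fields for you.

```python
import re
from typing import Dict, List

# key=value pairs; values are either double-quoted or run to the next space.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_proctitle(hex_value: str) -> List[str]:
    """Decode a hex-encoded proctitle; argv elements are NUL-separated."""
    return bytes.fromhex(hex_value).decode("utf-8", "replace").split("\x00")


def summarize_avc(line: str) -> Dict[str, object]:
    """Pick out who was denied what on which object from one AVC record."""
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    perms = re.search(r"\{ ([^}]*) \}", line)  # e.g. "{ read }"
    return {
        "denied": perms.group(1).split() if perms else [],
        "comm": fields.get("comm"),
        "name": fields.get("name"),
        "tclass": fields.get("tclass"),
        # A context is user:role:type:level[...]; index 2 is the type.
        "source_type": fields["scontext"].split(":")[2],
        "target_type": fields["tcontext"].split(":")[2],
        "permissive": fields.get("permissive") == "1",
    }
```

On the record above this yields denied ['read'], source_type container_t, target_type var_lib_t and name worker-post-deploy.conf: a container process could not read a file still carrying the generic var_lib_t type.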

Brent Eagles (beagles)
Changed in tripleo:
status: Triaged → In Progress
Cédric Jeanneret (cjeanner) wrote :

Heya,

The host location /var/lib/kolla/config_files is created in common/deploy-steps-tasks.yaml[1], with the correct setype and flags.

It would be useful to see "ls -lZd" output for each parent directory of the file, up to /var/lib/kolla/config_files, just to make sure everything is labeled correctly.
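A minimal sketch of that check, assuming Python 3 on Linux: parents_up_to lists the file and each parent directory up to a stop directory, and selinux_label reads the security.selinux extended attribute, which holds the label that ls -Z prints. Both helper names are hypothetical; on a host without SELinux, selinux_label simply returns None.

```python
import os
from typing import List, Optional


def parents_up_to(path: str, stop: str) -> List[str]:
    """Return path followed by each parent directory, ending with stop."""
    path, stop = os.path.abspath(path), os.path.abspath(stop)
    if os.path.commonpath([path, stop]) != stop:
        raise ValueError(f"{path} is not under {stop}")
    chain = [path]
    while chain[-1] != stop:
        chain.append(os.path.dirname(chain[-1]))
    return chain


def selinux_label(path: str) -> Optional[str]:
    """Return the SELinux context that ls -Z shows, or None if unavailable."""
    try:
        raw = os.getxattr(path, "security.selinux")
    except (OSError, AttributeError):  # missing file, no SELinux, or not Linux
        return None
    return raw.rstrip(b"\x00").decode()


# Example, using the path from the traceback above:
# for p in parents_up_to("/var/lib/kolla/config_files/src/etc/octavia/conf.d/"
#                        "octavia-worker/worker-post-deploy.conf",
#                        "/var/lib/kolla/config_files"):
#     print(selinux_label(p), p)
```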

Also, it would be good to know where that file comes from; maybe something in the puppet generation (if any) is involved?

Cheers,

C.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/common/deploy-steps-tasks.yaml#L281-L289

Cédric Jeanneret (cjeanner) wrote :

After a more careful read of the kolla log, I suspect the issue is located in this subtree:
conf.d/octavia-worker/worker-post-deploy.conf

The base (/var/lib/kolla/config_files/src/etc/octavia/) is probably OK, since the installer could copy a file from that location (/var/lib/kolla/config_files/src/etc/octavia/octavia.conf to /etc/octavia/octavia.conf) just before the crash.

So, questions:
- how is the "conf.d/..." tree created?
- what are the SELinux labels for that tree (each directory/file under it, and conf.d as well)?

I suspect some external process creates the tree and does not label it correctly. IIRC, subdirectories and files inherit their parent's label, so maybe something is doing funky things in there?
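The inheritance recalled here applies to newly created files, which by default take their parent directory's type unless a policy type-transition rule says otherwise; a file created elsewhere and then moved (renamed) into place keeps the label it already had. A mismatch with the parent is therefore a useful signal. The sketch below, with hypothetical helper names, takes a mapping of paths to SELinux contexts and reports entries whose type differs from their parent directory's type; the test data is copied from the ls -lZR output in this report.

```python
import posixpath
from typing import Dict, List, Tuple


def selinux_type(context: str) -> str:
    """Return the type field of a 'user:role:type:level' context."""
    return context.split(":")[2]


def not_inheriting(labels: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (path, own_type, parent_type) for entries whose type differs
    from their parent directory's type.

    labels maps relative POSIX paths to full SELinux contexts; entries whose
    parent directory is not in the mapping are skipped.
    """
    mismatches = []
    for path in sorted(labels):
        parent = posixpath.dirname(path)
        if parent and parent != path and parent in labels:
            own, inherited = selinux_type(labels[path]), selinux_type(labels[parent])
            if own != inherited:
                mismatches.append((path, own, inherited))
    return mismatches
```

Only the type field is compared, so a directory created by unconfined_u (like certs and common above) still counts as matching its parent.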

Cheers,

C.

Brent Eagles (beagles) wrote :

The output of ls -lZR in /var/lib/config-data/puppet-generated/octavia:

drwxr-xr-x. root root system_u:object_r:container_file_t:s0 etc

./etc:
drwxr-xr-x. root root system_u:object_r:container_file_t:s0 my.cnf.d
drwxr-xr-x. root root system_u:object_r:container_file_t:s0 octavia

./etc/my.cnf.d:
-rw-r--r--. root root system_u:object_r:container_file_t:s0 tripleo.cnf

./etc/octavia:
drwxr-xr-x. root root unconfined_u:object_r:container_file_t:s0 certs
drwxr-xr-x. root root system_u:object_r:container_file_t:s0 conf.d
-rw-r-----. root 42437 system_u:object_r:container_file_t:s0 octavia.conf

./etc/octavia/certs:
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 ca_01.pem
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 client.pem
drwxr-xr-x. root root unconfined_u:object_r:container_file_t:s0 private

./etc/octavia/certs/private:
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 cakey.pem

./etc/octavia/conf.d:
drwxr-xr-x. root root unconfined_u:object_r:container_file_t:s0 common
drwxr-xr-x. 42437 42437 system_u:object_r:container_file_t:s0 octavia-api
drwxr-xr-x. 42437 42437 system_u:object_r:container_file_t:s0 octavia-health-manager
drwxr-xr-x. 42437 42437 system_u:object_r:container_file_t:s0 octavia-housekeeping
drwxr-xr-x. 42437 42437 system_u:object_r:container_file_t:s0 octavia-worker

./etc/octavia/conf.d/common:
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 post-deploy.conf

./etc/octavia/conf.d/octavia-api:

./etc/octavia/conf.d/octavia-health-manager:
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 manager-post-deploy.conf

./etc/octavia/conf.d/octavia-housekeeping:

./etc/octavia/conf.d/octavia-worker:
-rw-r--r--. root root system_u:object_r:var_lib_t:s0 worker-post-deploy.conf

Brent Eagles (beagles) wrote :

I should have noted that the config files that seem to have the issue are all (I think) generated by Ansible external_deploy_tasks.

Cédric Jeanneret (cjeanner) wrote :

OK, so at least the directories are fine. We can spot the following issues:
- the certificates don't have the right type (ca_01.pem, client.pem and cakey.pem)
- the files located in conf.d don't inherit their parent's type for some reason (post-deploy.conf, manager-post-deploy.conf and worker-post-deploy.conf).

According to [1][2], the config files are created with Ansible's ini_file module. I'd pass setype in each of those calls and check what happens.
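Passing setype to those ini_file tasks (ini_file accepts the common file attributes, including setype) sets only the type part of the label. As an illustration of what that means at the filesystem level, the Python sketch below keeps the user, role and level of an existing context and swaps the type, which is also what chcon -t does. set_selinux_type is a hypothetical helper; it is a one-off relabel of an existing file, whereas setting setype in the task that creates the file keeps the label correct every time the task runs.

```python
import os


def with_type(context: str, new_type: str) -> str:
    """Replace only the type field of 'user:role:type[:level]', like chcon -t."""
    parts = context.split(":", 3)  # the level may itself contain ':' (s0:c1,c2)
    parts[2] = new_type
    return ":".join(parts)


def set_selinux_type(path: str, new_type: str) -> str:
    """Relabel an existing file via its security.selinux xattr.

    Hypothetical helper: Linux only, needs root and SELinux enabled.
    """
    current = os.getxattr(path, "security.selinux").rstrip(b"\x00").decode()
    new_context = with_type(current, new_type)
    # Write it NUL-terminated, the same form stripped when reading above.
    os.setxattr(path, "security.selinux", new_context.encode() + b"\x00")
    return new_context
```

For example, with_type("system_u:object_r:var_lib_t:s0", "container_file_t") gives the label the neighbouring octavia.conf already has.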

The next step will be the certificates: Octavia won't be allowed to access them either, for the same reason. Could you check how they are created, and maybe add the required setype there as well?

Cheers,

C.

[1] https://github.com/openstack/tripleo-common/blob/master/playbooks/roles/octavia-controller-post-config/tasks/main.yml#L30-L38
[2] https://github.com/openstack/tripleo-common/blob/master/playbooks/roles/octavia-controller-config/tasks/octavia.yml#L5-L12
(note: there are probably other calls)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/636610

Changed in tripleo:
milestone: stein-3 → stein-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/636610
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=67a55866b257246551fde8ff774fe68dbc8de628
Submitter: Zuul
Branch: master

commit 67a55866b257246551fde8ff774fe68dbc8de628
Author: Brent Eagles <email address hidden>
Date: Thu Feb 21 20:12:44 2019 +0000

    Octavia: set selinux contexts on ansible generated configuration

    The octavia external deploy tasks creates several files and directories
    and care must be taken to ensure they have the proper selinux context.

    Change-Id: I08be6722a68ce17b7fefc0f9ca3eb8bf9c585418
    Closes-Bug: #1812274

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.6.1

This issue was fixed in the openstack/tripleo-common 10.6.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/701545

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/701549

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/rocky)

Reviewed: https://review.opendev.org/701545
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=b743cfaa25de5f3341de5c39fe11a77dad1af1e1
Submitter: Zuul
Branch: stable/rocky

commit b743cfaa25de5f3341de5c39fe11a77dad1af1e1
Author: Brent Eagles <email address hidden>
Date: Thu Feb 21 20:12:44 2019 +0000

    Octavia: set selinux contexts on ansible generated configuration

    The octavia external deploy tasks creates several files and directories
    and care must be taken to ensure they have the proper selinux context.

    Change-Id: I08be6722a68ce17b7fefc0f9ca3eb8bf9c585418
    Closes-Bug: #1812274
    (cherry picked from commit 67a55866b257246551fde8ff774fe68dbc8de628)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.opendev.org/701549
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=35feb1d6ab51112408530944779628ce07c6045c
Submitter: Zuul
Branch: stable/queens

commit 35feb1d6ab51112408530944779628ce07c6045c
Author: Brent Eagles <email address hidden>
Date: Thu Feb 21 20:12:44 2019 +0000

    Octavia: set selinux contexts on ansible generated configuration

    The octavia external deploy tasks creates several files and directories
    and care must be taken to ensure they have the proper selinux context.

    Change-Id: I08be6722a68ce17b7fefc0f9ca3eb8bf9c585418
    Closes-Bug: #1812274
    (cherry picked from commit 67a55866b257246551fde8ff774fe68dbc8de628)
    (cherry picked from commit b743cfaa25de5f3341de5c39fe11a77dad1af1e1)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common rocky-eol

This issue was fixed in the openstack/tripleo-common rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common queens-eol

This issue was fixed in the openstack/tripleo-common queens-eol release.
