Pike periodic promotion job multinode-1ctlr-featureset016 fails with error running docker 'gnocchi_db_sync' - rados.Rados.connect PermissionDeniedError: error connecting to the cluster

Bug #1734134 reported by Attila Darazs
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Mehdi Abaakouk
Milestone: (none listed)

Bug Description

The full error is:

"Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label', 'config_id=tripleo_step4', '--label', 'container_name=gnocchi_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/usr/bin/bootstrap_host_exec gnocchi_api su gnocchi -s /bin/bash -c \\'/usr/bin/gnocchi-upgrade --sacks-number=128\\'\", \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/var/lib/config-data/gnocchi/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro\", \"/var/lib/config-data/gnocchi/etc/gnocchi/:/etc/gnocchi/:ro\", \"/var/log/containers/gnocchi:/var/log/gnocchi\", \"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\", \"/etc/ceph:/etc/ceph:ro\"], \"image\": \"192.168.24.1:8787/pike/centos-binary-gnocchi-api:b1f71de42ead2c1278343307307984ad1ff00c71_46bdbd6b\", \"detach\": false, \"net\": \"host\", \"privileged\": false}', '--net=host', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/var/lib/config-data/gnocchi/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro', 
'--volume=/var/lib/config-data/gnocchi/etc/gnocchi/:/etc/gnocchi/:ro', '--volume=/var/log/containers/gnocchi:/var/log/gnocchi', '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd', '--volume=/etc/ceph:/etc/ceph:ro', '192.168.24.1:8787/pike/centos-binary-gnocchi-api:b1f71de42ead2c1278343307307984ad1ff00c71_46bdbd6b', '/usr/bin/bootstrap_host_exec', 'gnocchi_api', 'su', 'gnocchi', '-s', '/bin/bash', '-c', \"'/usr/bin/gnocchi-upgrade\", \"--sacks-number=128'\"]. [1]",
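The argument list above is long enough that the bind mount for /etc/ceph is easy to miss. The following is a minimal sketch, not part of the original report, that extracts the `--volume=` arguments from such a list and reports which host path backs a given container path. The function names `volume_mounts` and `source_of` are hypothetical helpers introduced only for illustration.

```python
def volume_mounts(argv):
    """Return (host_path, container_path, mode) for each --volume=... argument."""
    mounts = []
    for arg in argv:
        if not arg.startswith("--volume="):
            continue
        parts = arg[len("--volume="):].split(":")
        if len(parts) == 2:
            host, container = parts
            mode = "rw"  # Docker mounts read-write when no mode is given
        elif len(parts) == 3:
            host, container, mode = parts
        else:
            continue  # not in the src:dst[:mode] form handled by this sketch
        mounts.append((host, container, mode))
    return mounts


def source_of(argv, container_path):
    """Return the host path mounted at container_path, or None if there is none."""
    wanted = container_path.rstrip("/")
    for host, container, _mode in volume_mounts(argv):
        if container.rstrip("/") == wanted:
            return host
    return None
```

Applied to the command in the error message, `source_of(argv, "/etc/ceph")` would return `/etc/ceph`, meaning the container sees the host's own /etc/ceph directory.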

This job started failing at 2017-11-22 17:22. The most recent error log we have is:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-pike/a85f96f/undercloud/home/jenkins/failed_deployment_list.log.txt.gz

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-pike/a85f96f/subnode-2/var/log/containers/gnocchi/gnocchi-upgrade.log.txt.gz#_2017-11-23_11_48_31_991

2017-11-23 11:48:31,786 [12] INFO gnocchi.cli: Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x511f110>
2017-11-23 11:48:31,956 [12] INFO gnocchi.storage.common.ceph: Ceph storage backend use 'rados' python library
2017-11-23 11:48:31,991 [12] CRITICAL root: Traceback (most recent call last):
  File "/usr/bin/gnocchi-upgrade", line 10, in <module>
    sys.exit(upgrade())
  File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 66, in upgrade
    s = storage.get_driver(conf)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 163, in get_driver
    conf.storage, incoming, coord)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/ceph.py", line 48, in __init__
    self.rados, self.ioctx = ceph.create_rados_connection(conf)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/common/ceph.py", line 68, in create_rados_connection
    conn.connect()
  File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
PermissionDeniedError: error connecting to the cluster
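The traceback ends in `rados.Rados.connect`, so the failure can be isolated from gnocchi by attempting the same connection directly. The sketch below is an illustration only: it assumes the python-rados bindings are available in the container, and the function name `check_ceph_connection`, the 5-second default timeout and any paths or user names passed to it are assumptions, not values taken from this report.

```python
def check_ceph_connection(conffile, username, keyring=None, timeout=5):
    """Try to connect to a Ceph cluster; return (ok, message)."""
    try:
        import rados  # python bindings for librados
    except ImportError:
        return False, "python rados bindings are not installed"

    # Pass the keyring path as a config option only when one is given.
    options = {"keyring": keyring} if keyring else {}
    try:
        conn = rados.Rados(conffile=conffile, rados_id=username, conf=options)
        conn.connect(timeout=timeout)
    except rados.Error as exc:
        # A keyring or CephX problem is expected to surface here as
        # PermissionDeniedError, as in the traceback above.
        return False, "%s: %s" % (type(exc).__name__, exc)
    conn.shutdown()
    return True, "connected"
```

Running this as the same user that runs `gnocchi-upgrade`, with the configuration file and keyring the service is configured to use, should show whether the error comes from Ceph access rather than from gnocchi itself.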

summary: Pike periodic promotion job multinode-1ctlr-featureset016 fail with
- "Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label',
- 'config_id=tripleo_step4'
+ error running docker 'gnocchi_db_sync' - rados.Rados.connect
+ PermissionDeniedError: error connecting to the cluster

Giulio Fidente (gfidente) wrote :

It looks like the command [1] does not have privileges to read the CephX keyring.

Permissions on the keyring were changed to be more restrictive in https://review.openstack.org/#/c/508975/20/docker/services/gnocchi-api.yaml but I am not sure why the 'gnocchi' user should not have read permissions on the file.

1. /usr/bin/bootstrap_host_exec gnocchi_api su gnocchi -s /bin/bash -c '/usr/bin/gnocchi-upgrade --sacks-number=SACK_NUM'
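One way to test the keyring hypothesis is to try opening every file under /etc/ceph as the user in question. The sketch below is a hedged illustration and not part of the original comment; `unreadable_files` is a hypothetical helper. It opens each file instead of inspecting mode bits so that permissions granted through ACLs are also taken into account.

```python
import os


def unreadable_files(directory):
    """Return (path, error) pairs for regular files that cannot be opened for reading."""
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            # Actually opening the file exercises mode bits and ACLs alike.
            with open(path, "rb") as handle:
                handle.read(1)
        except (IOError, OSError) as exc:
            problems.append((path, str(exc)))
    return problems
```

The result is only meaningful when the check runs as the 'gnocchi' user inside the container, since root can usually read files regardless of their permissions. An empty list means every file in the directory was readable by that user.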

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/522628

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/522630

Changed in tripleo:
assignee: nobody → Giulio Fidente (gfidente)
status: Triaged → In Progress

Attila Darazs (adarazs) wrote :

We haven't run the master container jobs for a while, so master is probably affected as well. For now the error shows up in *pike*, so please propose the fixes for the stable/pike branch too.

Alex Schultz (alex-schultz) wrote :

This might be a duplicate of bug 1733672 (or vice versa).

Changed in tripleo:
assignee: Giulio Fidente (gfidente) → Mehdi Abaakouk (sileht)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Giulio Fidente (<email address hidden>) on branch: master
Review: https://review.openstack.org/522630
Reason: Should be fixed by https://review.openstack.org/#/c/523715/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/524324

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/523715
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6c1244d7a4210d120c020f10a3937430b5b933fe
Submitter: Zuul
Branch: master

commit 6c1244d7a4210d120c020f10a3937430b5b933fe
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Nov 29 09:49:56 2017 +0100

    gnocchi: mount the correct volume for /etc/ceph

    During step4, /etc/ceph is the one of the host server instead
    of the one generated by kolla.

    This change uses the one generated by kolla and expose it to the
    container.

    Closes-bug: #1734134
    Change-Id: Ia1cca1c5d228ce0a3ef23a7c92f96a20ab958437

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/524324
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4412cd8585657821e36596c16975d781caaca03c
Submitter: Zuul
Branch: stable/pike

commit 4412cd8585657821e36596c16975d781caaca03c
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Nov 29 09:49:56 2017 +0100

    gnocchi: mount the correct volume for /etc/ceph

    During step4, /etc/ceph is the one of the host server instead
    of the one generated by kolla.

    This change uses the one generated by kolla and expose it to the
    container.

    (cherry picked from commit 6c1244d7a4210d120c020f10a3937430b5b933fe)
    Closes-bug: #1734134
    Change-Id: Ia1cca1c5d228ce0a3ef23a7c92f96a20ab958437

tags: added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.6

This issue was fixed in the openstack/tripleo-heat-templates 7.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/522628
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f0e5f05b957378fcb9f15e4c1ab232eafff8232a
Submitter: Zuul
Branch: master

commit f0e5f05b957378fcb9f15e4c1ab232eafff8232a
Author: Giulio Fidente <email address hidden>
Date: Thu Nov 23 15:53:28 2017 +0100

    Remove Cinder UID from CephX keyrings' ACLs

    Like for the other OpenStack services, ACLs are managed by the puppet
    module.

    Depends-On: I0c1bc3d2362c6500b1a515d99f641f8c1468754a
    Change-Id: I48f0711622bfc55054de46b9fb4fd765b7e4df74
    Related-Bug: #1734134

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/527972

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/527972
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d9997b43d1aa0586ec0f3a0474df8476ac341cf5
Submitter: Zuul
Branch: stable/pike

commit d9997b43d1aa0586ec0f3a0474df8476ac341cf5
Author: Giulio Fidente <email address hidden>
Date: Thu Nov 23 15:53:28 2017 +0100

    Remove Cinder UID from CephX keyrings' ACLs

    Like for the other OpenStack services, ACLs are managed by the puppet
    module.

    Depends-On: I0c1bc3d2362c6500b1a515d99f641f8c1468754a
    Change-Id: I48f0711622bfc55054de46b9fb4fd765b7e4df74
    Related-Bug: #1734134
    (cherry picked from commit f0e5f05b957378fcb9f15e4c1ab232eafff8232a)
