new puppet-redis ulimit change broke ha jobs

Bug #1688464 reported by Michele Baldessari
This bug affects 1 person

Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Michele Baldessari

Bug Description

I have seen this in my review: https://review.openstack.org/#/c/462704/

Also seen here:
http://logs.openstack.org/66/462766/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha
http://logs.openstack.org/65/462765/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/9a865a6/console.html

It seems that the fix for my wishlist bug from half a year ago broke the ulimit handling:
https://github.com/arioch/puppet-redis/issues/130

Here are the logs:

2017-05-05 06:16:56.613018 | Warning: Could not find resource 'Service[redis]' in parameter 'notify'
2017-05-05 06:16:56.613057 | (at /etc/puppet/modules/redis/manifests/ulimit.pp:51)
2017-05-05 06:16:56.613124 | Error: Could not find dependent Service[redis] for Augeas[Systemd redis ulimit] at /etc/puppet/modules/redis/manifests/ulimit.pp:44
2017-05-05 06:16:56.613153 | (truncated, view all with --long)
2017-05-05 06:16:56.613190 | overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
2017-05-05 06:16:56.613221 | resource_type: OS::Heat::StructuredDeployment
2017-05-05 06:16:56.613258 | physical_resource_id: ba2836e9-4f1c-4379-8231-4b7a01fab242
2017-05-05 06:16:56.613281 | status: CREATE_FAILED
2017-05-05 06:16:56.613303 | status_reason: |
2017-05-05 06:16:56.613391 | Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-05-05 06:16:56.613437 | deploy_stdout: |
2017-05-05 06:16:56.613523 | Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
2017-05-05 06:16:56.613628 | Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.
2017-05-05 06:16:56.613703 | Notice: Compiled catalog for controller-0-tripleo-ci-a-foo.localdomain in environment production in 7.01 seconds
2017-05-05 06:16:56.613754 | deploy_stderr: |
2017-05-05 06:16:56.613784 | ...

Tags: ci
Bogdan Dobrelya (bogdando) wrote :

There was a similar bug where nova services were expected to be notified within containers, which is not the case. Jistr knows more details. I suppose this bug falls into the same category. The solution is to not notify Service[redis] from Augeas[Systemd redis ulimit] when running puppet tags for the containers' configs.

Bogdan Dobrelya (bogdando) wrote :

Oh, it seems this is not related to the container CI jobs, sorry, my bad!

Michele Baldessari (michele) wrote :

The problem seems to be that the current upstream code is broken when service_manage is set to false (as it is in our redis pacemaker profile). I opened https://github.com/arioch/puppet-redis/issues/197 upstream for this.
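
To illustrate the failure mode: the error in the logs is what Puppet reports when a resource notifies Service[redis] but that service is not in the catalog, which is the situation when service_manage is false. A minimal sketch of one way to guard against it follows. This is not the actual puppet-redis code or the upstream fix; the class name, the drop-in file path, and the Augeas change are assumptions made for illustration, and the typed parameters assume Puppet 4 or later.

```puppet
# Hypothetical class, for illustration only (not part of puppet-redis).
class example_redis_ulimit (
  Boolean $service_manage = true,
  Integer $ulimit         = 10420,
) {
  # Only reference Service['redis'] when the module actually declares it.
  # With service_manage => false the service is absent from the catalog,
  # so an unconditional notify fails with "Could not find dependent
  # Service[redis]".
  $service_notify = $service_manage ? {
    true    => Service['redis'],
    default => undef,
  }

  augeas { 'Systemd redis ulimit':
    # Assumed drop-in path; the real module may use a different file.
    incl    => '/etc/systemd/system/redis.service.d/limit.conf',
    lens    => 'Systemd.lns',
    changes => ["set Service/LimitNOFILE/value ${ulimit}"],
    notify  => $service_notify,
  }
}
```

With service_manage set to false the notify metaparameter resolves to undef, so the catalog no longer contains a reference to a resource that was never declared.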

Michele Baldessari (michele) wrote :

Once the puppet-redis issue is fixed, we will likely need to do the following:
1) Remove the file limit from the redis pcmk profile:
diff --git a/manifests/profile/pacemaker/database/redis.pp b/manifests/profile/pacemaker/database/redis.pp
index 3ef6815..4ede81b 100644
--- a/manifests/profile/pacemaker/database/redis.pp
+++ b/manifests/profile/pacemaker/database/redis.pp
@@ -60,12 +60,12 @@ class tripleo::profile::pacemaker::database::redis (
     # we best explicitely set the file limit only in the pacemaker profile
     # (the base profile does not need it as it is using systemd which has
     # the limits set there)
-    file { '/etc/security/limits.d/redis.conf':
-      content => inline_template("redis soft nofile <%= @redis_file_limit %>\nredis hard nofile <%= @redis_file_limit %>\n"),
-      owner   => '0',
-      group   => '0',
-      mode    => '0644',
-    }

Otherwise puppet-redis and the profile would both declare this file, which would create a duplicate resource.

2) Set the newly introduced hiera key in tht:
diff --git a/puppet/services/pacemaker/database/redis.yaml b/puppet/services/pacemaker/database/redis.yaml
index e702d28..8229fc2 100644
--- a/puppet/services/pacemaker/database/redis.yaml
+++ b/puppet/services/pacemaker/database/redis.yaml
@@ -37,5 +37,6 @@ outputs:
           - get_attr: [RedisBase, role_data, config_settings]
           - redis::service_manage: false
             redis::notify_service: false
+            redis::managed_by_cluster_manager: true
       step_config: |
         include ::tripleo::profile::pacemaker::database::redis
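
Taken together, the two steps above could look roughly like the sketch below once applied. It is an illustration under assumptions, not the merged puppet-tripleo change: the class name and its two parameters are hypothetical, the fallback for an older file limit setting is only one possible way to keep compatibility, and it assumes a puppet-redis version that accepts the redis::ulimit and redis::managed_by_cluster_manager settings named in this report (Puppet 4 or later for the typed parameters).

```puppet
# Hypothetical profile, for illustration only (not the puppet-tripleo code).
class example::profile::pacemaker_redis (
  # Assumed name for a pre-existing file limit setting, kept as a fallback.
  Optional[Integer] $redis_file_limit = undef,
  Integer           $default_ulimit   = 10420,
) {
  # Prefer the older setting when it is still provided, otherwise use the
  # default.
  $ulimit = $redis_file_limit ? {
    undef   => $default_ulimit,
    default => $redis_file_limit,
  }

  # No file resource for /etc/security/limits.d/redis.conf is declared here.
  # With managed_by_cluster_manager set to true, puppet-redis is expected to
  # create that file itself, so declaring it again would be a duplicate.
  class { '::redis':
    service_manage             => false,
    notify_service             => false,
    managed_by_cluster_manager => true,
    ulimit                     => $ulimit,
  }
}
```

In tripleo-heat-templates the same values would normally be supplied as hiera keys rather than class parameters, as in the redis.yaml diff in step 2.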

summary: - new puppet-redis ulimit change broke ci
+ new puppet-redis ulimit change broke ha jobs
Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
Michele Baldessari (michele) wrote :

Incoming fix for puppet-redis is here https://github.com/arioch/puppet-redis/pull/198

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462906

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462908

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/462906
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=926ec0151bf0bee2854cbe1d25f278ee6362eef6
Submitter: Jenkins
Branch: master

commit 926ec0151bf0bee2854cbe1d25f278ee6362eef6
Author: Michele Baldessari <email address hidden>
Date: Fri May 5 12:29:39 2017 +0200

    Remove limits for redis in /etc/security/limits.d

    Now that puppet-redis supports ulimit for cluster managed redis (via
    https://github.com/arioch/puppet-redis/pull/192), we need to remove the
    file snippet as otherwise we will get a duplicate resource error.

    We will need to create a THT change that at the very least sets the
    redis::managed_by_cluster_manager key to true so that
    /etc/security/limits.d/redis.conf gets created.
    We also add code to not break backwards compatibility with the old hiera
    key.

    Change-Id: I4ffccfe3e3ba862d445476c14c8f2cb267fa108d
    Partial-Bug: #1688464

Changed in tripleo:
status: In Progress → Fix Released
tags: removed: alert
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/462908
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=dde4f6d1cf6e4c2968c910fb0f1139ef391c3b6d
Submitter: Jenkins
Branch: master

commit dde4f6d1cf6e4c2968c910fb0f1139ef391c3b6d
Author: Michele Baldessari <email address hidden>
Date: Fri May 5 12:37:01 2017 +0200

    Set puppet-redis managed_by_cluster_manager to true

    Via https://github.com/arioch/puppet-redis/pull/192 puppet-redis grew
    ulimit support also for pacemaker managed redis instances. To be able to
    use that we need to set redis::managed_by_cluster_manager to true.

    We also allow redis::ulimit to be configurable and we set a default of
    10420 which was the default value before the above change.

    Change-Id: I06129870665d7d3bfa09057fd9f0a33a99f98397
    Depends-On: I4ffccfe3e3ba862d445476c14c8f2cb267fa108d
    Closes-Bug: #1688464

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b2 development milestone.
