Failed to add a simple environment file when updating the stack

Bug #1555676 reported by Dmitry Tantsur on 2016-03-10
This bug affects 1 person
Affects: tripleo
Importance: Undecided
Assigned to: Unassigned

Bug Description

During the RDO test days for M3, I was testing the HA configuration with the following command:

 openstack overcloud deploy --templates --control-flavor control --control-scale 3 --compute-flavor compute --compute-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server pool.ntp.org

It went well, but then people told me that I had no chance of running the pingtest in my limited environment (8 GiB undercloud, 4x 4 GiB overcloud nodes) without the settings from https://github.com/redhat-openstack/tripleo-quickstart/blob/master/playbooks/roles/tripleo/overcloud/templates/overcloud-deploy.sh.j2#L18-L40

So I created a file, /home/stack/min.yaml, with the following content:

 parameter_defaults:
  # HeatWorkers doesn't modify num_engine_workers, so handle
  # via heat::config
  controllerExtraConfig:
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

and tried the update:

 openstack overcloud deploy --templates --control-flavor control --control-scale 3 --compute-flavor compute --compute-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /home/stack/min.yaml --ntp-server pool.ntp.org

It failed for me with one failed resource:

 $ heat resource-show overcloud ControllerNodesPostDeployment
 ....
 | logical_resource_id    | ControllerNodesPostDeployment |
 | physical_resource_id   | b1dfc129-91f4-4bce-86d8-fe79aa1c08a4 |
 | required_by            | BlockStorageNodesPostDeployment |
 |                        | CephStorageNodesPostDeployment |
 | resource_name          | ControllerNodesPostDeployment |
 | resource_status        | UPDATE_FAILED |
 | resource_status_reason | resources.ControllerNodesPostDeployment: resources.ControllerOvercloudServicesDeployment_Step4: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 4 |
 | resource_type          | OS::TripleO::ControllerPostDeployment |
 | updated_time           | 2016-03-10T14:15:31 |
 +------------------------+---------------------------------------+

I tried simplifying the yaml file:

 parameter_defaults:
  # HeatWorkers doesn't modify num_engine_workers, so handle
  # via heat::config
  controllerExtraConfig:
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

and it failed with something similar:

| resource_status_reason | resources.ControllerNodesPostDeployment: resources.ControllerOvercloudServicesDeployment_Step4: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: -9 |

$ heat resource-list -n5 overcloud | grep FAILED
| ControllerNodesPostDeployment | b1dfc129-91f4-4bce-86d8-fe79aa1c08a4 | OS::TripleO::ControllerPostDeployment | UPDATE_FAILED | 2016-03-10T14:30:22 | overcloud |
| 1 | 3db70738-20c4-47b1-9c96-cad55212c055 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-03-10T14:32:17 | overcloud-ControllerNodesPostDeployment-dnm46aui4tyi-ControllerOvercloudServicesDeployment_Step4-ya7c5avksgdy |
| ControllerOvercloudServicesDeployment_Step4 | e550647e-bef5-40c6-bcf4-23688bcc6e9c | OS::Heat::StructuredDeployments | UPDATE_FAILED | 2016-03-10T14:32:17 | overcloud-ControllerNodesPostDeployment-dnm46aui4tyi |
| 0 | b6c3ab7c-90a3-4f79-ad31-126f7c73495d | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-03-10T14:32:18 | overcloud-ControllerNodesPostDeployment-dnm46aui4tyi-ControllerOvercloudServicesDeployment_Step4-ya7c5avksgdy |
| 2 | 2e80ea69-30ff-4c7a-bbff-34fc777ef28c | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-03-10T14:32:19 | overcloud-ControllerNodesPostDeployment-dnm46aui4tyi-ControllerOvercloudServicesDeployment_Step4-ya7c5avksgdy |
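A note on reading the two status codes above. This is a minimal sketch, assuming the deployment hook reports a negative code the way Python's subprocess module does, where -N means "terminated by signal N". That convention is an assumption about the hook, not something the output confirms, and `decode_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a deploy_status_code such as the ones reported above.
# Assumption: a negative code N means "terminated by signal -N"
# (the convention used by Python's subprocess module).
decode_status() {
    status=$1
    if [ "$status" -lt 0 ]; then
        sig=$(( 0 - status ))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

decode_status -9
decode_status 4
```

Under that assumption, -9 would be SIGKILL, which is the signal the Linux OOM killer sends, while 4 is an ordinary non-zero exit.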

Dmitry Tantsur (divius) wrote :
Steven Hardy (shardy) wrote :

Marking incomplete as I believe this is a node running out of memory, see the paste:

"FileTypeCrontab could not read root: Cannot allocate memory "

The issue is that we've added a number of new features to the overcloud, and until we have composable services they are all enabled by default. This means our RAM requirements have crept up, so I'd suggest allowing 8G for the controller nodes.

With KSM (enabled by default on CentOS) you can overcommit memory across VMs, so you can, e.g., run 4 or 5 8G VMs on a 32G host and things should work OK.
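To check the KSM suggestion on a given virtualization host, something like the following could be used. It is a rough sketch: `ksm_status` and `overcommit_percent` are hypothetical helper names, and the only system interface assumed is the kernel's standard KSM sysfs directory, /sys/kernel/mm/ksm.

```shell
#!/bin/sh
# Report whether KSM is running. The sysfs directory is a parameter so the
# function can be pointed elsewhere; the default is the kernel's KSM path.
ksm_status() {
    ksm_dir=${1:-/sys/kernel/mm/ksm}
    if [ ! -r "$ksm_dir/run" ]; then
        echo "KSM not available"
        return 1
    fi
    # run: 0 = stopped, 1 = running, 2 = stopped and all pages unmerged
    if [ "$(cat "$ksm_dir/run")" = "1" ]; then
        echo "KSM running, pages_sharing=$(cat "$ksm_dir/pages_sharing")"
    else
        echo "KSM not running"
    fi
}

# Memory promised to VMs as a percentage of host RAM:
# $1 = number of VMs, $2 = GiB per VM, $3 = host GiB.
overcommit_percent() {
    echo $(( $1 * $2 * 100 / $3 ))
}

ksm_status || true
echo "4 x 8G on a 32G host: $(overcommit_percent 4 8 32)% of host RAM"
echo "5 x 8G on a 32G host: $(overcommit_percent 5 8 32)% of host RAM"
```

Four 8G VMs exactly fill a 32G host, and the fifth takes it to 125%, so the fifth VM only fits if KSM is actually merging pages.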

Changed in tripleo:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired