stack-update with SoftwareDeployment 400 response from signal

Bug #1328342 reported by Steve McLellan
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Zane Bitter

Bug Description

A simple stack (example below of a single server) with a single SoftwareConfig resource and single SoftwareDeployment deploys correctly if the template is handed to heat as part of the stack-create call. This appears to be because the resources being added during an update don't make it into the raw_template database table and aren't available to the engine to validate the waitcondition signal.

Log messages below are from a recent single node devstack:

If the SoftwareConfig and SoftwareDeployment are added after the instance is created, as a stack-update, the SoftwareDeployment functions correctly but in /var/lib/upstart/os-collect-config.log I see:

[2014-06-10 02:09:49,050] (heat-config) [DEBUG] Signalling to http://10.24.0.4:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A15a41777be7244daacd242a02b862814%3Astacks%2Ftest1%2Fb2585565-e5e4-418b-a83a-8f75a7a642e5%2Fresources%2Ftest-demo-test_heat_sw_deployment?Timestamp=2014-06-10T02%3A09%3A30Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=63902346e9fb4060b2a4174a5c016566&SignatureVersion=2&Signature=i0OqZrVExlfiKcUIqL7qizzR3zn6boavAcmWsg474RE%3D with {"deploy_stdout": "Testing me some heat software config\n", "deploy_stderr": "", "deploy_status_code": 0}

[2014-06-10 02:09:49,769] (heat-config) [DEBUG] Response <Response [400]>

The 400 Bad Request can be seen in the heat logs as "The Resource (test-demo-test_heat_sw_deployment) could not be found in Stack test1". I haven't confirmed this but it seems likely it's being raised in engine/service.py:resource_signal.

template-show does not show the SoftwareConfig or SoftwareDeployment (raw_template doesn't contain the resources either). This may be related to https://bugs.launchpad.net/heat/+bug/1291905 which also seems to revolve around dependencies, but since I'm not manually creating the WaitCondition in this case, the workaround doesn't work.

Template (the networking components aren't relevant, but make things easier to test):

heat_template_version: '2013-05-23'
outputs:
  test-demo-FloatingIPaddress:
    description: Floating IP assigned
    value:
      get_attr: [test-demo-FloatingIP, floating_ip_address]
resources:
  test-demo:
    properties:
      flavor: m1.small
      image: Ubuntu 13.10 Heat SW config
      key_name: devstack
      networks:
      - port: {get_resource: test-demo-port}
      user_data_format: SOFTWARE_CONFIG
    type: OS::Nova::Server
  test-demo-FloatingIP:
    properties: {floating_network_id: 24f9b76f-f292-404d-b359-ce9b209eda03}
    type: OS::Neutron::FloatingIP
  test-demo-FloatingIpAssoc:
    properties:
      floatingip_id: {get_resource: test-demo-FloatingIP}
      port_id: {get_resource: test-demo-port}
    type: OS::Neutron::FloatingIPAssociation
  test-demo-port:
    properties:
      fixed_ips:
      - subnet_id: edcd39a2-dcc0-4823-a736-4ca3dd90fdd2
      network_id: 02ab2bb0-b165-442d-9b14-6ed20555f846
      security_groups:
      - default
    type: OS::Neutron::Port

Heat SW config (add this as part of stack-update to see the error):

  test-demo-test_heat_sw:
    properties:
      config: '#!/bin/bash

        echo "Testing me some heat software config" | tee /tmp/test-heat.txt

        '
      group: script
      inputs:
      - {default: None, name: previous}
    type: OS::Heat::SoftwareConfig
  test-demo-test_heat_sw_deployment:
    properties:
      config: {get_resource: test-demo-test_heat_sw}
      server: {get_resource: test-demo}
    type: OS::Heat::SoftwareDeployment

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

If possible, can you paste the errors from h-eng and h-api-cfn too?

Changed in heat:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Steve McLellan (sjmc7) wrote :
Download full text (8.0 KiB)

These times don't match with the log above, but messages all the same. The cfn log doesn't seem to mention any errors nor the return code; the only indication of error is the second-to-last line in the engine logs. From the few hours i spent looking at this yesterday the culprit appears to be that the updated template isn't present in the raw_template table and so isn't considered part of the stack by the time the CFN signal comes in. I'm not sure I understand enough of the big picture around stack-update to dive in and fix it though.

On the instance /var/log/upstart/os-collect-config.log:

[2014-06-10 14:33:50,783] (heat-config) [INFO] Completed /var/lib/heat-config/hooks/script
[2014-06-10 14:33:50,789] (heat-config) [DEBUG] Signalling to http://10.24.0.4:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A15a41777be7244daacd242a02b862814%3Astacks%2Ftest1%2Fd8eb8c11-81de-4ed3-81c4-87a0a8a15af0%2Fresources%2Ftest-demo-test_heat_sw_deployment?Timestamp=2014-06-10T14%3A33%3A15Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=538c7f0cb61140d998e04c62647124e1&SignatureVersion=2&Signature=pjub03wXQoSlYjT9gYZkNu0ntFGLtsCMm04REsp3mLU%3D with {"deploy_stdout": "Testing me some heat software config\n", "deploy_stderr": "", "deploy_status_code": 0}
[2014-06-10 14:33:51,254] (heat-config) [DEBUG] Response <Response [400]>
dib-run-parts Tue Jun 10 14:33:51 UTC 2014 55-heat-config completed

h-eng:

2014-06-10 14:33:51.087 DEBUG heat.openstack.common.rpc.amqp received {u'_msg_id': u'4829fe8f22f64569a3538a776b8f8439', u'_context_request_id': u'req-928bfa9b-16d6-4b27-bf72-74cafe1bb78e', u'_context_trust_id': None, u'_context_auth_token': '<SANITIZED>', u'_context_user_id': u'f53f6d31c4a0422a845a5b4d4400d5a9', u'_context_username': None, u'_reply_q': u'reply_c499de3c7e8f442686a2288c5a994a90', u'_context_show_deleted': False, u'namespace': None, u'_context_is_admin': False, u'version': u'1.0', u'_context_user': None, u'method': u'resource_signal', u'_context_roles': [u'heat_stack_user'], u'_context_auth_url': u'http://10.24.0.4:5000/v2.0', u'args': {u'resource_name': u'test-demo-test_heat_sw_deployment', u'details': {u'deploy_stdout': u'Testing me some heat software config\n', u'deploy_stderr': u'', u'deploy_status_code': 0}, u'stack_identity': {u'stack_name': u'test1', u'stack_id': u'd8eb8c11-81de-4ed3-81c4-87a0a8a15af0', u'tenant': u'15a41777be7244daacd242a02b862814', u'path': u''}}, u'_unique_id': u'd8a291698bd9403f93446c85bc8c702f', u'_context_tenant_id': u'7b0358237da24700b8a750544a25cd1c', u'_context_password': '<SANITIZED>', u'_context_trustor_user_id': None, u'_context_aws_creds': u'{"ec2Credentials": {"access": "538c7f0cb61140d998e04c62647124e1", "signature": "pjub03wXQoSlYjT9gYZkNu0ntFGLtsCMm04REsp3mLU="}}', u'_context_tenant': u'15a41777be7244daacd242a02b862814-d8eb8c11-81de-4ed3-81c4-87a0a8a'} from (pid=16255) _safe_log /opt/stack/heat/heat/openstack/common/rpc/common.py:280
2014-06-10 14:33:51.087 DEBUG heat.openstack.common.rpc.amqp unpacked context: {'username': None, 'user_id': u'f53f6d31c4a0422a845a5b4d4400d5a9', 'show_deleted': False, 'roles': [u'heat_stack_user'], 'tenant_id': u'7b0358237da24700b8a750544a25cd1c', 'auth_token': '<SANITIZED>', 'tr...

Read more...

Revision history for this message
Rabi Mishra (rabi) wrote :
Download full text (17.5 KiB)

I could reproduce the issue to some extent. In my case stack update get's stuck with 'UPDATE_IN_PROGRESS'. I could see the CREATE_COMPLETE event for 'OS::Heat::SoftwareConfig' resource , but it's not in the resource list.

[stack@devstack-rabi devstack]$ heat event-list test_stack
+-----------------------------------+--------------------------------------+------------------------+--------------------+----------------------+
| resource_name | id | resource_status_reason | resource_status | event_time |
+-----------------------------------+--------------------------------------+------------------------+--------------------+----------------------+
| test-demo | 221ce347-e4e5-401b-a9c8-72d5528d3c4e | state changed | CREATE_COMPLETE | 2014-06-14T12:32:00Z |
| test-demo | a7fc0f11-9931-40e2-9630-05f7c7c41130 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:31:36Z |
| test-demo-FloatingIP | 592a7f6c-411b-4422-ad91-7b4b7ccf23a8 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:31:35Z |
| test-demo-FloatingIP | 80159eea-f4db-409a-b863-d0a3f83cdc6e | state changed | CREATE_COMPLETE | 2014-06-14T12:31:35Z |
| test-demo-FloatingIpAssoc | 0965909c-c181-49f0-8976-4f5b2e2e8d63 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:31:37Z |
| test-demo-FloatingIpAssoc | 60c0e718-3333-4218-adb5-8db0a44c80b6 | state changed | CREATE_COMPLETE | 2014-06-14T12:31:38Z |
| test-demo-port | 5f9d82aa-19e9-4576-94bb-ac8956c60247 | state changed | CREATE_COMPLETE | 2014-06-14T12:31:35Z |
| test-demo-port | 7679d779-8766-4706-8aba-7f2a57e42ff8 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:31:35Z |
| test-demo-test_heat_sw | be9ff4c2-1e04-466a-a044-7606c216c985 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:32:57Z |
| test-demo-test_heat_sw | f908ac74-9df1-4d82-a9bb-6601755b896b | state changed | CREATE_COMPLETE | 2014-06-14T12:32:58Z |
| test-demo-test_heat_sw_deployment | 80ba265a-c714-4dc7-8215-2daca764afa8 | state changed | CREATE_IN_PROGRESS | 2014-06-14T12:32:59Z |
+-----------------------------------+--------------------------------------+------------------------+--------------------+----------------------+

[stack@devstack-rabi devstack]$ heat resource-list test_stack
+---------------------------+------------------------------------+-----------------+----------------------+
| resource_name | resource_type | resource_status | updated_time |
+---------------------------+------------------------------------+-----------------+----------------------+
| test-demo | OS::Nova::Server | CREATE_COMPLETE | 2014-06-14T12:31:35Z |
| test-demo-FloatingIP | OS::Neutron::FloatingIP | CREATE_COMPLETE | 2014-06-14T12:31:35Z |
| test-demo-port | OS::Neutron::Port | CREATE_COMPLETE | 2014-06-14T12:31:35Z |
| test-demo-...

Changed in heat:
status: Triaged → Confirmed
assignee: nobody → Rabi Mishra (ramishra)
Revision history for this message
Rabi Mishra (rabi) wrote :
Download full text (5.4 KiB)

Probably this is not related to the issue mentioned in the original bug.

It times out after a long time with the following error "HTTPUnauthorized: ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1".

Interesting thing is it works without any issues with 'create-stack' and fails with 'update-stack'. I tried with both signal_transport HEAT_SIGNAL and CFN_SIGNAL

I'll investigate further.

[stack@devstack-rabi devstack]$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+---------------+----------------------+
| 92227c20-856e-4a75-9cb6-29787587c17f | test_stack | UPDATE_FAILED | 2014-06-14T14:04:06Z |
+--------------------------------------+------------+---------------+----------------------+

[stack@devstack-rabi devstack]$ heat resource-list test_stack
+-----------------------------------+------------------------------------+-----------------+----------------------+
| resource_name | resource_type | resource_status | updated_time |
+-----------------------------------+------------------------------------+-----------------+----------------------+
| test-demo | OS::Nova::Server | CREATE_COMPLETE | 2014-06-14T14:04:06Z |
| test-demo-FloatingIP | OS::Neutron::FloatingIP | CREATE_COMPLETE | 2014-06-14T14:04:06Z |
| test-demo-FloatingIpAssoc | OS::Neutron::FloatingIPAssociation | CREATE_COMPLETE | 2014-06-14T14:04:06Z |
| test-demo-port | OS::Neutron::Port | CREATE_COMPLETE | 2014-06-14T14:04:06Z |
| test-demo-test_heat_sw | OS::Heat::SoftwareConfig | CREATE_COMPLETE | 2014-06-14T14:05:15Z |
| test-demo-test_heat_sw_deployment | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2014-06-14T14:05:17Z |
+-----------------------------------+------------------------------------+-----------------+----------------------+

[stack@devstack-rabi devstack]$ heat resource-show test-demo-test_heat_sw_deployment
usage: heat resource-show <NAME or ID> <RESOURCE>
heat resource-show: error: too few arguments
[stack@devstack-rabi devstack]$ heat resource-show test_stack test-demo-test_heat_sw_deployment
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| description | ...

Read more...

Revision history for this message
Steve McLellan (sjmc7) wrote :

Hi Rabi,

The Authorization error i think is a red herring; the token gets invalidated after sufficient time is passed.

The problem (with the SoftwareDeployment getting stuck in IN_PROGRESS) is that when the VM tries to send the signal indicating it's finished the deployment, the server tries to look up the resource it refers to (test-demo-test_heat_sw_deployment). Because the resource is still being created, it doesn't yet exist in the template in the database, so the lookup fails. The server never gets the signal and remains IN_PROGRESS until something interrupts it (like an eventual auth problem).

I haven't attempted a fix yet because I was hoping to talk to Steve about it first, but feel free to give it a go :) The only solution I could come up with was making resources that are being created as part of a stack-update available to the signalling mechanism (as they must be for stack-create).

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

This will most likely be fixed during Zane Bitter's current stack-update work. You should check with Zane whether it is worth you looking into a fix at this point.

Revision history for this message
Rabi Mishra (rabi) wrote :
Download full text (5.5 KiB)

Thanks Steve. I'll check with Zane. Below is the signalling error as part of os-refresh-config:configure

Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,145] (os-refresh-config) [INFO] Completed phase pre-configure
Jun 16 04:58:37 localhost os-collect-config: INFO:os-refresh-config:Completed phase pre-configure
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,147] (os-refresh-config) [INFO] Starting phase configure
Jun 16 04:58:37 localhost os-collect-config: INFO:os-refresh-config:Starting phase configure
Jun 16 04:58:37 localhost os-collect-config: dib-run-parts Mon Jun 16 04:58:37 UTC 2014 Running /opt/stack/os-config-refresh/configure.d/50-os-config-applier
Jun 16 04:58:37 localhost os-collect-config: [2014/06/16 04:58:37 AM] [INFO] writing /var/run/heat-config/heat-config
Jun 16 04:58:37 localhost os-collect-config: [2014/06/16 04:58:37 AM] [INFO] writing /etc/os-collect-config.conf
Jun 16 04:58:37 localhost os-collect-config: [2014/06/16 04:58:37 AM] [INFO] success
Jun 16 04:58:37 localhost os-collect-config: dib-run-parts Mon Jun 16 04:58:37 UTC 2014 50-os-config-applier completed
Jun 16 04:58:37 localhost os-collect-config: dib-run-parts Mon Jun 16 04:58:37 UTC 2014 Running /opt/stack/os-config-refresh/configure.d/55-heat-config
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,347] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,398] (heat-config) [INFO] {"deploy_stdout": "Testing me some heat software config applied\n", "deploy_stderr": "", "deploy_status_code": 0}
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,400] (heat-config) [DEBUG] [2014-06-16 04:58:37,385] (heat-config) [INFO] previous=None
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_server_id=087c9baf-2d41-42df-aa24-6a87b26069fb
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_action=CREATE
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_stack_id=test_stack/4273acb4-2c3f-4af1-8482-2f336b7a87ab
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_resource_name=test-demo-test_heat_sw_deployment
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_auth_url=http://192.168.66.177:5000/v3
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_username=test_stack-test-demo-test_heat_sw_deployment-awgmifz2vixh
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_user_id=f802fee38af2478aa2ad88f9c83b9f83
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,386] (heat-config) [INFO] deploy_password=141c39f1563246ee9f759e9939764e06
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,387] (heat-config) [INFO] deploy_project_id=3678ccf1cd0b44649402d9d574ee691f
Jun 16 04:58:37 localhost os-collect-config: [2014-06-16 04:58:37,387] (heat-config) [DEBUG] Running /var/lib/heat-config...

Read more...

Revision history for this message
Zane Bitter (zaneb) wrote :

Yep, this is the same issue as bug 1291905.

Changed in heat:
assignee: Rabi Mishra (ramishra) → Zane Bitter (zaneb)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/100047
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=40693910beb336518bf06d7e0621b7b14414fc84
Submitter: Jenkins
Branch: master

commit 40693910beb336518bf06d7e0621b7b14414fc84
Author: Zane Bitter <email address hidden>
Date: Mon Jun 16 11:05:18 2014 -0400

    Update: persist current template on change

    After each resource is updated, persist the modified template to the
    database so that any new resources will be accessible to API calls -
    including signals to new WaitConditionHandles. This also ensures that if
    the Heat engine dies completely during an update, the template in the
    database will still be in a consistent state.

    Change-Id: Ie6f234302cf72213d4b0e1f5b963cd8def422498
    Closes-Bug: #1291905
    Closes-Bug: #1206702
    Closes-Bug: #1328342
    Implements: partial-blueprint update-failure-recovery

Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/101288

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/101288
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3ff80e61285b763a9d625692f1d790b9a36dd380
Submitter: Jenkins
Branch: master

commit 3ff80e61285b763a9d625692f1d790b9a36dd380
Author: Zane Bitter <email address hidden>
Date: Fri Jun 27 16:22:45 2014 -0400

    Pass the context when updating raw_templates

    Without a context, the raw_template was being stored using a different DB
    session, with the result that changes did not end up in the database.

    Also, don't use the same raw_template DB entry for the backup stack, since
    that causes competing modifications to the DB that break rollback.

    Change-Id: I6c71c19acac0b87943f36c57c2300cf2b0478aa3
    Closes-Bug: #1331872
    Related-Bug: #1291905
    Related-Bug: #1206702
    Related-Bug: #1328342

Changed in heat:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-2 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.