StarlingX

software config/deploy return signal error

Bug #1817528 reported by Tomas Holmberg on 2019-02-25

This bug affects 1 person

Affects     Status      Importance  Assigned to  Milestone
StarlingX   Won't Fix   Low         wanghao

Bug Description

Title
-----
software config/deploy return signal error

Brief Description
-----------------
The return signal URL for a software config/deploy is built with the internal address instead of the public one. As a result, the signal from the guest to StarlingX is never received.

Severity
--------
Critical

Steps to Reproduce
------------------
Create a stack from the following template:

#############
heat_template_version: 2013-05-23

description: >
  Minimal SoftwareDeployment
parameters:
  flavor:
    type: string
    default: medium_flav
  image:
    type: string
    default: ubuntu2
  network:
    default: net_a
    type: string
  user:
    type: string
    default: admin

resources:
  my_key:
    type: OS::Nova::KeyPair
    properties:
      name: my_key3
      save_private_key: True
      user: {get_param: user}

  my_security_group:
    type: OS::Neutron::SecurityGroup
    properties:
      rules:
        - protocol: tcp
          remote_ip_prefix: 0.0.0.0/0
          port_range_min: 1
          port_range_max: 65535
        - protocol: icmp
          remote_ip_prefix: 0.0.0.0/0

  my_port:
    type: OS::Neutron::Port
    properties:
      network: {get_param: network}
      security_groups:
        - default
        - { get_resource: my_security_group }

  my_software_config:
    type: OS::Heat::SoftwareConfig
    properties:
      inputs:
        - name: message
          default: 'NONE'
      outputs:
        - name: file_content
      group: script
      config: |
        #!/bin/sh -x
        echo "PNYXTER EKO"
        echo "${message}" > /tmp/pnyxter
        cat /tmp/pnyxter > ${heat_outputs_path}.file_content

  my_software_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      signal_transport: CFN_SIGNAL
      config: {get_resource: my_software_config}
      server: {get_resource: my_server}
      input_values:
        message: 'pnyxter mestxt'

  my_server:
    type: OS::Nova::Server
    properties:
      name: th3
      image: {get_param: image}
      flavor: {get_param: flavor}
      key_name: {get_resource: my_key}
      user_data_format: SOFTWARE_CONFIG
      networks:
        - port: {get_resource: my_port}

outputs:
  deploy_output:
    value:
      get_attr: [my_software_deployment, file_content]
  stdout:
    value:
      get_attr: [my_software_deployment, deploy_stdout]
  stderr:
    value:
      get_attr: [my_software_deployment, deploy_stderr]
  status_code:
    value:
      get_attr: [my_software_deployment, deploy_status_code]
  private_key:
    value:
      get_attr: [my_key, private_key]
  public_key:
    value:
      get_attr: [my_key, public_key]

##################

Expected Behavior
------------------
The server resource should look like this:
{
          "type": "String",
          "name": "deploy_signal_id",
          "value": "http://<public_addr>:8000/v1/signal/....",
          "description": "ID of signal to use for signaling output values"
        },

Actual Behavior
----------------
The server resource actually looks like this:
{
          "type": "String",
          "name": "deploy_signal_id",
          "value": "http://<internal_addr>:8000/v1/signal/....",
          "description": "ID of signal to use for signaling output values"
        },
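To tell the two cases apart on a given system, the host part of deploy_signal_id can be compared with the public address. The following is only a minimal sketch: it assumes the deployment inputs are available as a JSON list of entries shaped like the fragment above, the helper name signal_host is hypothetical, and 192.0.2.10 is a placeholder standing in for <public_addr>.

```python
import json
from urllib.parse import urlsplit


def signal_host(inputs):
    """Return the host of the deploy_signal_id input, or None if it is absent."""
    for item in inputs:
        if item.get("name") == "deploy_signal_id":
            return urlsplit(item["value"]).hostname
    return None


# Hypothetical metadata fragment; 192.0.2.10 stands in for <public_addr>.
sample = json.loads("""
[
  {
    "type": "String",
    "name": "deploy_signal_id",
    "value": "http://192.0.2.10:8000/v1/signal/example",
    "description": "ID of signal to use for signaling output values"
  }
]
""")

print(signal_host(sample))
```

If the printed host is the internal address rather than the public one, the guest will most likely be unable to reach the signal URL.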

System Configuration
--------------------
One node

Branch/Pull Time/Commit
-----------------------
ISO from http://mirror.starlingx.cengn.ca/mirror/starlingx/release/1000.10.1/centos/2018.10.0/outputs/iso/

Solution
----------
Add one line to /usr/lib/python2.7/site-packages/heat/engine/resources/signal_responder.py, taken from upstream:
https://github.com/openstack/heat/blob/stable/pike/heat/engine/resources/signal_responder.py#L144

i.e.
--- /usr/lib/python2.7/site-packages/heat/engine/resources/signal_responder.py_org 2019-02-25 09:50:30.061517795 +0000
+++ /usr/lib/python2.7/site-packages/heat/engine/resources/signal_responder.py 2019-02-25 09:50:44.533517917 +0000
@@ -158,6 +158,7 @@
                                              cfg.CONF.heat_api_cfn.bind_port,
                                              "/v1",
                                              signal_type)
+            signal_url = config_url.replace('/waitcondition', signal_type)
         else:
             heat_client_plugin = self.stack.clients.client_plugin('heat')
             endpoint = heat_client_plugin.get_heat_cfn_url()
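
The effect of the added line can be sketched outside of heat as follows. This is an illustration under assumptions, not the heat implementation: the two function names are hypothetical, the addresses are the lab values quoted in the comments of this report, and signal_type is assumed to be '/signal'.

```python
def internal_signal_url(host_addr, bind_port, signal_type):
    """Mimic the StarlingX formatting: the URL is built from the host address."""
    return "%s://%s:%s%s%s" % ("http", host_addr, bind_port, "/v1", signal_type)


def upstream_signal_url(config_url, signal_type):
    """Mimic the added line: reuse the config URL and swap the path suffix."""
    return config_url.replace('/waitcondition', signal_type)


# Lab values quoted in this report; signal_type '/signal' is an assumption.
config_url = "http://128.224.151.57:8000/v1/waitcondition"

print(internal_signal_url("127.168.204.2", 8000, "/signal"))
print(upstream_signal_url(config_url, "/signal"))
```

The first call yields a URL on the internal address, while the second keeps the public host of config_url, which is the address the guest can reach.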

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-02-25:

From Al Bailey:
---------------
The config url is set to: (public endpoint in my lab)
config_url http://128.224.151.57:8000/v1/waitcondition

This is translated by the heat code to the internal url:
http://127.168.204.2:8000/v1/signal

The WRS code that does that is here:
https://github.com/starlingx-staging/stx-heat/blob/master/heat/engine/resources/signal_responder.py#L150

This was done to fix CGTS-3907

If we change that method to be similar to what upstream does, it should work for this customer as long as they are not using https or IPv6.

This is the standard upstream code, which does not have our alterations:
https://github.com/openstack/heat/blob/stable/pike/heat/engine/resources/signal_responder.py#L144

If you want to demonstrate that this fix works in this scenario, this is the file that needs to be changed:
/usr/lib/python2.7/site-packages/heat/engine/resources/signal_responder.py

find these lines (around line 155)
signal_url = "%s://%s:%s%s%s" % ("http",
                                             host_addr,
                                             cfg.CONF.heat_api_cfn.bind_port,
                                             "/v1",
                                             signal_type)

and add the following one line right below it (which basically sets the variable to what upstream is doing)
signal_url = config_url.replace('/waitcondition', signal_type)

Note: This will not work with IPv6 and https
This will however work for the tests highlighted here

After you make this change (and make sure you back up the original file first), you can restart heat engine:

sudo sm-restart service heat-engine

then you should be able to create their stack.
Note: the stack uses a keypair, which may or may not need to be manually cleaned up

Changed in starlingx:
assignee:	nobody → Al Bailey (albailey1974)

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-02-25:

As already confirmed by the reporter, the suggested code change addresses the issue. However, this is just a workaround which is not suitable for submission in the StarlingX 2018.10 release as it would result in IPv6 and https issues with heat.

We expect this to work in newer releases of StarlingX as the code will be based on Stein and will align with the upstream behavior.

Our recommendation is to live with the workaround for now until StarlingX is rebased on Stein.

Tomas, please let us know if you have concerns with this approach.

tags:	added: stx.distro.openstack
Changed in starlingx:
importance:	Undecided → High

Revision history for this message

Tomas Holmberg (wr-tholmber) wrote on 2019-02-26:

I do not have a personal opinion on this. You know better which other parts of the code will not work. However, it would be nice to have some kind of warning.

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-02-27:

Agreed to live with the workaround for stx.2018.10. We will retest in stx.2019.05 after moving to Stein to ensure this is still working. Then we'll close the bug at that time.

tags:	added: stx.2019.05
Changed in starlingx:
status:	New → Triaged

Revision history for this message

Bruce Jones (brucej) wrote on 2019-03-05:

We have re-based to Stein. Need to retest this bug and make sure it has been addressed.

Revision history for this message

Frank Miller (sensfan22) wrote on 2019-04-05:

Lowered priority to medium as a workaround exists.

Changed in starlingx:
importance:	High → Medium

Ken Young (kenyis) on 2019-04-05

tags:

added: stx.2.0
removed: stx.2019.05

Revision history for this message

Bruce Jones (brucej) wrote on 2019-05-28:

This defect needs to be retested since we rebased to upstream Stein and this issue may no longer exist.

tags:

added: stx.retestneeded

Revision history for this message

Al Bailey (albailey1974) wrote on 2019-06-25:

In my retest env (heat running in containerized openstack)
  m1.tiny as the flavor.
  cirros qcow image rather than ubuntu
  Used the tenant networking setup from the containers wiki to setup a network and selected: public-net0

The stack create seems to hang.
I am not getting a signal error. It's just hanging.

I tried both CFN_SIGNAL, HEAT_SIGNAL

I don't know enough about software deployment and signalling to know if my problem is because of my cirros image, or if it is due to a config issue. I don't see any sign of deploy_signal_id anywhere in the logs or the CLI queries.

Revision history for this message

Jan Borren (lmfjbor) wrote on 2019-06-26:

Possibly the Cirros image, as the image that you use needs to contain the agent toolchain: os-collect-config, os-refresh-config and os-apply-config.

Example for adding this agent toolchain to an Ubuntu cloud image:

#!/bin/bash
#
# Details: https://docs.openstack.org/heat/latest/template_guide/software_deployment.html#configuring-with-os-apply-config

git clone https://git.openstack.org/openstack/tripleo-image-elements.git
git clone https://git.openstack.org/openstack/heat-agents.git

pip install --upgrade pip
pip install git+https://git.openstack.org/openstack/diskimage-builder.git

export ELEMENTS_PATH=tripleo-image-elements/elements:heat-agents/
export BASE_ELEMENTS="ubuntu"
export AGENT_ELEMENTS="os-collect-config os-refresh-config os-apply-config"
export DEPLOYMENT_BASE_ELEMENTS="heat-config heat-config-script"
export DEPLOYMENT_TOOL=""
export IMAGE_NAME=ubuntu-software-config

/usr/local/bin/disk-image-create vm $BASE_ELEMENTS $AGENT_ELEMENTS $DEPLOYMENT_BASE_ELEMENTS $DEPLOYMENT_TOOL -o $IMAGE_NAME.qcow2
openstack image create --disk-format qcow2 --container-format bare $IMAGE_NAME < $IMAGE_NAME.qcow2

Revision history for this message

yong hu (yhu6) wrote on 2019-07-16:

#10

in WW30/31, we need to try this recipe described above.

Revision history for this message

yong hu (yhu6) wrote on 2019-07-16:

#11

@Bill to ask help from Hao (FiberHome).

Bill Zvonar (billzvonar) on 2019-07-16

Changed in starlingx:
assignee:	Al Bailey (albailey1974) → wanghao (wanghao749)

Revision history for this message

zhangyifan (zhangyifan) wrote on 2019-07-26:

#12

Can you provide the ubuntu image for me?

Revision history for this message

yong hu (yhu6) wrote on 2019-08-08:

#13

@zhangyifan, you might have several ways to get the guest image:
1. for ubuntu cloud image, you can download one: https://cloud-images.ubuntu.com/releases/
2. get the image from @Jan or @Bill.
3. use cirros http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-08-23:

#14

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags:

added: stx.3.0
removed: stx.2.0

Revision history for this message

yong hu (yhu6) wrote on 2019-11-13:

#15

@wanghao, could you have someone in your team working on this LP?

We might need a follow-up or decision on this one for stx.3.0.

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-12-30:

#16

As per agreement with the community, marking unresolved medium priority bugs (>= 100 days AND not recently reproduced) from stx.3.0 to Low priority / no target release

tags:	removed: stx.3.0
Changed in starlingx:
importance:	Medium → Low

Revision history for this message

Ramaswamy Subramanian (rsubrama) wrote on 2022-05-31:

#17

No progress on this bug for more than 2 years. Candidate for closure.

If there is no update, this issue is targeted to be closed as 'Won't Fix' in 2 weeks.

Revision history for this message

Ramaswamy Subramanian (rsubrama) wrote on 2022-06-14:

#18

Changing the status to 'Won't Fix' as there is no activity.

Changed in starlingx:
status:	Triaged → Won't Fix
