Render deployment file for NetworkDeployment

Bug #1769622 reported by Matthias Runge
This bug affects 6 people
Affects    Status     Importance    Assigned to    Milestone
tripleo    Invalid    Undecided     Unassigned     none

Bug Description

Overcloud deployment fails:

2018-05-07 10:24:55 | TASK [Render deployment file for NetworkDeployment] ****************************
2018-05-07 10:24:56 | changed: [overcloud-controller-0] => {"changed": true, "checksum": "83c26065e733e586bc96a3d02ed19c4a92460677", "dest": "/var/lib/heat-config/tripleo-config-download/NetworkDeployment-24ae0b29-1d99-4c1e-9f47-2905bf368645", "failed": false, "gid": 0, "group": "root", "md5sum": "1bd75b737ab89b020bbccdaf0cc7e779", "mode": "0644", "owner": "root", "secontext": "system_u:object_r:var_lib_t:s0", "size": 4131, "src": "/home/tripleo-admin/.ansible/tmp/ansible-tmp-1525688694.38-159569891596662/source", "state": "file", "uid": 0}
Overcloud configuration failed.
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Check if deployed file exists for NetworkDeployment] *********************
2018-05-07 10:24:56 | ok: [overcloud-controller-0] => {"changed": false, "failed": false, "stat": {"exists": false}}
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Check previous deployment rc for NetworkDeployment] **********************
2018-05-07 10:24:56 | skipping: [overcloud-controller-0] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Remove deployed file for NetworkDeployment when previous deployment failed] ***
2018-05-07 10:24:56 | skipping: [overcloud-controller-0] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Force remove deployed file for NetworkDeployment] ************************
2018-05-07 10:24:56 | skipping: [overcloud-controller-0] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Run deployment NetworkDeployment] ****************************************
2018-05-07 10:24:56 | fatal: [overcloud-controller-0]: FAILED! => {"changed": true, "cmd": "/usr/libexec/os-refresh-config/configure.d/55-heat-config\n exit $(jq .deploy_status_code /var/lib/heat-config/deployed/24ae0b29-1d99-4c1e-9f47-2905bf368645.notify.json)", "delta": "0:00:00.027389", "end": "2018-05-07 10:24:55.048158", "failed": true, "msg": "non-zero return code", "rc": 2, "start": "2018-05-07 10:24:55.020769", "stderr": "jq: error: Could not open file /var/lib/heat-config/deployed/24ae0b29-1d99-4c1e-9f47-2905bf368645.notify.json: No such file or directory", "stderr_lines": ["jq: error: Could not open file /var/lib/heat-config/deployed/24ae0b29-1d99-4c1e-9f47-2905bf368645.notify.json: No such file or directory"], "stdout": "", "stdout_lines": []}
2018-05-07 10:24:56 | ...ignoring
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | TASK [Output for NetworkDeployment] ********************************************
2018-05-07 10:24:56 | fatal: [overcloud-controller-0]: FAILED! => {
2018-05-07 10:24:56 | "failed_when_result": true,
2018-05-07 10:24:56 | "msg": [
2018-05-07 10:24:56 | {
2018-05-07 10:24:56 | "stderr": [
2018-05-07 10:24:56 | "jq: error: Could not open file /var/lib/heat-config/deployed/24ae0b29-1d99-4c1e-9f47-2905bf368645.notify.json: No such file or directory"
2018-05-07 10:24:56 | ]
2018-05-07 10:24:56 | },
2018-05-07 10:24:56 | {
2018-05-07 10:24:56 | "status_code": "2"
2018-05-07 10:24:56 | }
2018-05-07 10:24:56 | ]
2018-05-07 10:24:56 | }
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | NO MORE HOSTS LEFT *************************************************************
2018-05-07 10:24:56 |
2018-05-07 10:24:56 | PLAY RECAP *********************************************************************

For reference, the file /var/lib/heat-config/tripleo-config-download/NetworkDeployment-<hash> exists and contains:

[{"inputs": [{"type": "String", "name": "interface_name", "value": "nic1", "description": "None"}, {"type": "String", "name": "bridge_name", "value": "br-ex", "description": "None"}, {"type": "String", "name": "deploy_server_id", "value": "cf92f6a3-a6de-420e-954b-409cdef5312c", "description": "ID of the server being deployed to"}, {"type": "String", "name": "deploy_action", "value": "CREATE", "description": "Name of the current action being deployed"}, {"type": "String", "name": "deploy_stack_id", "value": "overcloud-Controller-ofum4twktxub-0-w66wntfwybd6-NetworkDeployment-j6yi42qrpxny-TripleOSoftwareDeployment-pxfvojwnmowa/d1325865-247f-465b-9535-79031304e745", "description": "ID of the stack this deployment belongs to"}, {"type": "String", "name": "deploy_resource_name", "value": "TripleOSoftwareDeployment", "description": "Name of this deployment resource in the stack"}, {"type": "String", "name": "deploy_signal_transport", "value": "NO_SIGNAL", "description": "How the server should signal to heat with the deployment output values."}], "group": "os-apply-config", "name": "deployment_resource", "deployment_name": "NetworkDeployment", "outputs": null, "config": "{\n \"os_net_config\": {\n \"network_config\": [\n {\n \"addresses\": [\n {\n \"ip_netmask\": \"192.168.24.15/24\"\n }\n ], \n \"mtu\": 1350, \n \"routes\": [\n {\n \"ip_netmask\": \"169.254.169.254/32\", \n \"next_hop\": \"192.168.24.1\"\n }\n ], \n \"use_dhcp\": false, \n \"type\": \"interface\", \n \"name\": \"nic1\"\n }, \n {\n \"dns_servers\": [\n \"192.168.36.9\", \n \"192.168.36.1\"\n ], \n \"addresses\": [\n {\n \"ip_netmask\": \"10.0.0.17/24\"\n }\n ], \n \"members\": [\n {\n \"ovs_options\": \"bond_mode=balance-slb\", \n \"type\": \"ovs_bond\", \n \"name\": \"bond1\", \n \"members\": [\n {\n \"type\": \"interface\", \n \"primary\": true, \n \"name\": \"nic2\", \n \"mtu\": 1350\n }, \n {\n \"type\": \"interface\", \n \"primary\": false, \n \"name\": \"nic3\", \n \"mtu\": 1350\n }\n ]\n }\n ], \n \"routes\": [\n {\n \"ip_netmask\": \"0.0.0.0/0\", \n \"next_hop\": \"10.0.0.1\"\n }\n ], \n \"use_dhcp\": false, \n \"type\": \"ovs_bridge\", \n \"name\": \"br-ex\"\n }, \n {\n \"use_dhcp\": false, \n \"type\": \"interface\", \n \"addresses\": [\n {\n \"ip_netmask\": \"172.20.0.19/24\"\n }\n ], \n \"name\": \"nic4\", \n \"mtu\": 1350\n }, \n {\n \"use_dhcp\": false, \n \"type\": \"interface\", \n \"addresses\": [\n {\n \"ip_netmask\": \"172.18.0.20/24\"\n }\n ], \n \"name\": \"nic5\", \n \"mtu\": 1350\n }, \n {\n \"use_dhcp\": false, \n \"type\": \"interface\", \n \"addresses\": [\n {\n \"ip_netmask\": \"172.19.0.17/24\"\n }\n ], \n \"name\": \"nic6\", \n \"mtu\": 1350\n }, \n {\n \"dns_servers\": [\n \"192.168.36.9\", \n \"192.168.36.1\"\n ], \n \"addresses\": [\n {\n \"ip_netmask\": \"172.16.0.19/24\"\n }\n ], \n \"members\": [\n {\n \"type\": \"interface\", \n \"primary\": true, \n \"name\": \"nic7\", \n \"mtu\": 1350\n }\n ], \n \"use_dhcp\": false, \n \"type\": \"ovs_bridge\", \n \"name\": \"br-tenant\"\n }\n ]\n }\n}\n", "creation_time": "2018-05-07T10:22:26Z", "id": "24ae0b29-1d99-4c1e-9f47-2905bf368645", "options": {}}]

Matthias Runge (mrunge)
Changed in tripleo:
importance: Undecided → Critical
status: New → Triaged
milestone: none → rocky-2
Changed in tripleo:
importance: Critical → High
Changed in tripleo:
milestone: rocky-2 → rocky-3
wes hayutin (weshayutin) wrote:

Which upstream job are you trying to execute here? Can we have a full log?

Changed in tripleo:
status: Triaged → Incomplete
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Matthias Runge (mrunge) wrote:

I'm simply trying to deploy a "standard configuration", whatever that means, following

https://docs.openstack.org/tripleo-quickstart/latest/devmode-ovb.html#

Due to hard-coded mirrors in the deployment scripts, the recreate CI scripts don't really work in environments other than RDO Cloud.

Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Juan Antonio Osorio Robles (juan-osorio-robles) wrote:

Closing due to the lack of activity here. It can be re-opened if needed.

Changed in tripleo:
status: Incomplete → Invalid
kobig (kobi.ginon) wrote:

Hi,
This bug appears again, intermittently. On my deployment, 5 of 8 blades failed with the same error shown below, while the other 3 passed this step.
It seems there should be some wait or retry that waits for the notify JSON file to appear in the OS.
Can someone take it?

2019-02-28 04:56:46,707 p=793 u=mistral | fatal: [overcloud-ovscompute-bvt-0]: FAILED! => {
    "failed_when_result": true,
    "msg": [
        {
            "stderr": [
                "[2019-02-28 04:56:46,176] (heat-config) [WARNING] Skipping config 4291f27b-ba76-4876-99aa-9355d3e60be7, already deployed",
                "[2019-02-28 04:56:46,176] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/4291f27b-ba76-4876-99aa-9355d3e60be7.json",
                "jq: error: Could not open file /var/lib/heat-config/deployed/4291f27b-ba76-4876-99aa-9355d3e60be7.notify.json: No such file or directory"
            ]
        },
        {
            "status_code": "2"
        }
    ]
}
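When this failure recurs, it can help to check which heat-config state files exist for the deployment. In the log above, heat-config skipped config 4291f27b-... as "already deployed" (so <id>.json was present), while <id>.notify.json, the file the Run deployment task reads, was missing. A minimal POSIX sh sketch, assuming the /var/lib/heat-config/deployed layout shown in the logs; the deployment_state function is hypothetical and is not part of TripleO or heat-agents:

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of TripleO or heat-agents).
# Reports which heat-config state files exist for one deployment id, assuming
# the layout seen in the logs above:
#   <dir>/<id>.json         its presence makes heat-config skip the config as "already deployed"
#   <dir>/<id>.notify.json  file the "Run deployment" task reads deploy_status_code from
deployment_state() (
    # Subshell body: dir, id, marker and notify do not leak into the caller.
    dir=$1
    id=$2
    if [ -e "$dir/$id.json" ]; then marker=present; else marker=missing; fi
    if [ -e "$dir/$id.notify.json" ]; then notify=present; else notify=missing; fi
    echo "deployed=$marker notify=$notify"
)
```

For example, deployment_state /var/lib/heat-config/deployed 4291f27b-ba76-4876-99aa-9355d3e60be7 should print deployed=present notify=missing if the files are still in the state captured in the log above.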

Changed in tripleo:
status: Invalid → Confirmed
kobig (kobi.ginon) wrote:

Hi,
I found this similar issue:
https://bugs.launchpad.net/tripleo/+bug/1792343
and indeed we started seeing this bug more often after we changed the following parameters:
ssh_args = -o ServerAliveInterval=20 -o ControlPersist=60m
The surprise is that, according to the resolved issue, when the connection is broken and then reconnected, the code is still running but the response file does not exist yet. That explanation seems correct, but apparently the parameters suggested by the fix actually make this failure happen more frequently.

So the only thing that worked for me was waiting for the file, as in the example below. I suggest this, or a variation of it, as a fix; it worked for me.

- name: "Run deployment {{ item }}"
  shell: |
    /usr/libexec/os-refresh-config/configure.d/55-heat-config
    # Wait up to max_retries seconds for heat-config to write the notify file
    max_retries=60
    i=0
    while [ ! -e /var/lib/heat-config/deployed/{{ deployment_uuid }}.notify.json ] && [ "$i" -lt "$max_retries" ]; do
      i=$((i + 1))  # POSIX arithmetic; ((i++)) only works when /bin/sh is bash
      echo "waiting for file to sync"
      sleep 1
    done
    # Exit with the deployment's status code; if the file never appeared, jq fails and so does the task
    exit $(jq .deploy_status_code /var/lib/heat-config/deployed/{{ deployment_uuid }}.notify.json)
  become: true
  environment:
    HEAT_SHELL_CONFIG: /var/lib/heat-config/tripleo-config-download/{{ item ~ '-' ~ deployment_uuid }}
  register: deployment_result
  ignore_errors: yes

regards
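The wait loop in the proposed task above could also be factored into a small function with an explicit timeout, so that a notify file that never appears is reported as a timeout on stderr instead of surfacing only as jq's "No such file or directory" error. A minimal POSIX sh sketch; the wait_for_file name and its arguments are hypothetical, and the 60 x 1 s defaults simply mirror the values in the task above:

```shell
#!/bin/sh
# Hypothetical helper: poll for a file once per interval, for a bounded number
# of attempts. Returns 0 as soon as the file exists, 1 on timeout.
wait_for_file() (
    # Subshell body: variables stay local and "exit" only ends this function.
    path=$1
    max_retries=${2:-60}   # defaults mirror the 60 x 1 s budget of the task above
    interval=${3:-1}
    i=0
    while [ ! -e "$path" ]; do
        if [ "$i" -ge "$max_retries" ]; then
            echo "timed out waiting for $path" >&2
            exit 1
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    exit 0
)
```

Inside the task's shell block it could replace the while loop, for example wait_for_file /var/lib/heat-config/deployed/{{ deployment_uuid }}.notify.json 60 1 || exit 1 placed just before the jq line. Running the function body in a subshell keeps its variables from clobbering anything else in the same script.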

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
status: Confirmed → Triaged
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
Matthias Runge (mrunge) wrote:

There has not been any work on this for over a year.

Changed in tripleo:
status: Triaged → Incomplete
importance: High → Undecided
milestone: ussuri-3 → none
status: Incomplete → Invalid