Bug #1546789 “neutron failed to deploy in a multi-node deploymen...” : Bugs : kolla

Steven Dake (sdake) on 2016-02-18

Changed in kolla:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → mitaka-3

Lingfeng Xiong (xionglingfeng) wrote on 2016-02-18:

#1

Hi Steven,
I saw that you set this bug to Triaged. Could you share the root cause and/or possible workarounds?

Lingfeng Xiong (xionglingfeng) wrote on 2016-02-18:

#2

My current workaround is:
1. Remove all compute nodes from the inventory file, leaving only the controller and network node (they are the same host in my environment).
2. Run the deployment with this inventory file.
3. Add the compute nodes back to the inventory file.
4. Modify kolla/ansible/roles/neutron/tasks/bootstrap.yml, changing

  set_fact:
    database_created: "{{ (database.stdout.split('localhost | SUCCESS => ')[1]|$

to

  set_fact:
    database_created: "true"

5. Run the deployment again.

The initial deployment on the controller node must use the original bootstrap.yml, because that step creates the correct database. Only then make the modification and deploy to the compute nodes. If you run the deployment with the modified bootstrap.yml directly on a fresh multi-node environment, the deployment will succeed but neutron cannot start, because the service/endpoint in keystone and the databases in mariadb are missing.

Thiago Gomes (fthiagogv) wrote on 2016-02-23:

#3

same:
https://bugs.launchpad.net/kolla/+bug/1525759
and this
https://bugs.launchpad.net/kolla/+bug/1535422

Thiago Gomes (fthiagogv) on 2016-02-26

Changed in kolla:
assignee:	nobody → Thiago Gomes (fthiagogv)

Sam Yaple (s8m) wrote on 2016-02-26:

#4

Unfortunately, this is a bug in Ansible 1.x. It is fixed in Ansible 2.x, but our playbooks are not compatible with 2.x.

In the Newton cycle we will fix this issue indirectly by switching to Ansible 2.x.

Changed in kolla:
milestone:	mitaka-3 → none

OpenStack Infra (hudson-openstack) wrote on 2016-02-29: Change abandoned on kolla (master)

#5

Change abandoned by Thiago Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/285408

Thiago Gomes (fthiagogv) on 2016-02-29

Changed in kolla:
assignee:	Thiago Gomes (fthiagogv) → nobody

Vikram Hosakote (vhosakot) wrote on 2016-03-14:

#6

I don't think this is a valid kolla bug. As Sam mentioned, this bug will be fixed in Newton when kolla moves to Ansible 2.x.

Martin Matyáš (martinx-maty) wrote on 2016-03-16:

#7

How is this issue going to be handled for Mitaka? Will there be an ansible 1.9.x release with a fix for this?

Technically it may not be a valid bug against Kolla, but it significantly affects Kolla's functionality: deployment of a multi-node configuration fails. The workaround mentioned above works, but it requires manual steps that are not simple to automate. The other workaround, changing the service distribution across nodes, is not a good option either, I think.

Note that another workaround was mentioned on Kolla's IRC channel
http://eavesdrop.openstack.org/irclogs/%23kolla/%23kolla.2016-03-11.log.html#t2016-03-11T19:56:39
and it works for me: tweak site.yml for the neutron role by putting the following services at the top of the section, in this order:
    - neutron-server
    - neutron-dhcp-agent
    - neutron-l3-agent
    - neutron-metadata-agent

in neutron section:
https://github.com/openstack/kolla/blob/906c13eb6148d0c48b5f5ae157cfb10113efe173/ansible/site.yml#L101

Would this be an acceptable fix/workaround to include in kolla directly?

Steven Dake (sdake) on 2016-03-28

Changed in kolla:
milestone:	none → newton-1

Steven Dake (sdake) on 2016-03-28

Changed in kolla:
assignee:	nobody → Michał Jastrzębski (inc007)
status:	Triaged → Confirmed

Ganesh Maharaj Mahalingam (ganesh-mahalingam) wrote on 2016-03-31:

#8

This seems to be an option that works without having to re-order the list in site.yml. Maybe it can be pursued.

diff --git a/ansible/roles/neutron/tasks/bootstrap.yml b/ansible/roles/neutron/tasks/bootstrap.yml
index 30c9006..c149072 100644
--- a/ansible/roles/neutron/tasks/bootstrap.yml
+++ b/ansible/roles/neutron/tasks/bootstrap.yml
@@ -10,8 +10,9 @@
   changed_when: "{{ database.stdout.find('localhost | SUCCESS => ') != -1 and
                     (database.stdout.split('localhost | SUCCESS => ')[1]|from_json).changed }}"
   failed_when: database.stdout.split()[2] != 'SUCCESS'
-  run_once: True
-  delegate_to: "{{ groups['neutron-server'][0] }}"
+  delegate_to: "{{ inventory_hostname }}"
+  until:
+    database.stdout.split()[2] == "SUCCESS"

 - name: Reading json from variable
   set_fact:
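
For readers unfamiliar with the Jinja expressions in the diff above, here is a minimal Python sketch of what the changed_when and failed_when conditions compute. The function names and the sample stdout string are hypothetical and only illustrate the parsing; they are not taken from kolla.

```python
import json

# Marker string that the playbook expressions search for in the command output.
MARKER = "localhost | SUCCESS => "


def database_failed(stdout: str) -> bool:
    """Mirror of: failed_when: database.stdout.split()[2] != 'SUCCESS'."""
    # Whitespace split; the third token is the status word.
    return stdout.split()[2] != "SUCCESS"


def database_changed(stdout: str) -> bool:
    """Mirror of the changed_when expression: marker present and JSON 'changed' is true."""
    if stdout.find(MARKER) == -1:
        return False
    # Everything after the marker is parsed as JSON (the from_json filter).
    return bool(json.loads(stdout.split(MARKER)[1])["changed"])


# Hypothetical output, made up for illustration only.
sample = 'localhost | SUCCESS => {"changed": true, "db": "neutron"}'
print(database_failed(sample))
print(database_changed(sample))
```

Both checks depend on database.stdout being present, which is only the case if the task actually ran.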

Steven Dake (sdake) wrote on 2016-03-31:

#9

Ganesh,

I have a patch up which may be more suitable based upon something Vikram said tonight. Can you give it a spin and see if it works?

Thanks
-steve

Changed in kolla:
importance:	High → Critical
assignee:	Michał Jastrzębski (inc007) → Steven Dake (sdake)

OpenStack Infra (hudson-openstack) wrote on 2016-03-31: Fix proposed to kolla (master)

#10

Fix proposed to branch: master
Review: https://review.openstack.org/299803

Changed in kolla:
status:	Confirmed → In Progress

Ganesh Maharaj Mahalingam (ganesh-mahalingam) wrote on 2016-03-31:

#11

From the logs, this is a plausible theory. When the play starts, a large list is created of all the hosts on which the play should run, based on the order of the hosts. The playbook then goes through each of the tasks/includes and runs them based on the criterion.

delegate_to is broken in ansible <2.0 per these bugs: https://github.com/ansible/ansible/issues/14684 and https://github.com/ansible/ansible/pull/15024.

In all the plays where 'run_once' is enabled, the playbook attempts the task on the first machine in the list (created above) and skips it if that machine doesn't match the criterion.

E.g., Neutron endpoint creation fails: http://paste.openstack.org/show/492636/

It is followed by creating the config drives, which works correctly because those tasks are not 'run_once': http://paste.openstack.org/show/492637/

Changing the order only changes the order of that list, and the key tasks that are run_once are obviously meant to run on the server nodes (neutron in this case). Changing the order should therefore have minimal impact here.

Ganesh Maharaj Mahalingam (ganesh-mahalingam) wrote on 2016-04-01:

#12

The patch above includes the recent change in which the ordering of hosts in 'ansible/site.yml' was changed to put neutron-server at the top of the list. With ansible 1.9.4, as recommended by kolla, delegate_to and run_once have issues: the task is attempted on the first host where the details are gathered and, irrespective of the outcome, it is attempted only once. The first host is apparently chosen from the ordering of the list of hosts in 'ansible/site.yml'. All the other plays have their respective servers at the top of the list, but neutron did not. As a result, the neutron db creation and service registration tasks were always attempted on the first compute node and skipped. This patch should be revisited/reverted once we move to a version of ansible where the delegate_to issues are fixed, to make sure that the code works as expected.
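
The behaviour described above can be modelled with a small Python sketch. It is only a toy model of the theory in this thread, not Ansible's implementation: it assumes the play's host list is built by concatenating groups in the order they are listed, that a run_once task is tried only on the first host of that list, and that delegation is not honoured. The function names, group names and host names are hypothetical.

```python
def build_play_hosts(group_order, inventory):
    """Concatenate groups in the listed order, keeping each host once."""
    hosts = []
    for group in group_order:
        for host in inventory[group]:
            if host not in hosts:
                hosts.append(host)
    return hosts


def run_once_outcome(play_hosts, eligible_hosts):
    """Try the task on the first host only; skip if that host is not eligible."""
    first = play_hosts[0]
    status = "ran" if first in eligible_hosts else "skipped"
    return status, first


# Hypothetical two-group inventory.
inventory = {
    "compute": ["compute-1", "compute-2"],
    "neutron-server": ["controller-1"],
}

# Compute group listed first: the run_once task lands on a compute node.
before = build_play_hosts(["compute", "neutron-server"], inventory)
# neutron-server listed first, as in the site.yml re-ordering workaround.
after = build_play_hosts(["neutron-server", "compute"], inventory)

print(run_once_outcome(before, inventory["neutron-server"]))  # ('skipped', 'compute-1')
print(run_once_outcome(after, inventory["neutron-server"]))   # ('ran', 'controller-1')
```

Under these assumptions, re-ordering changes nothing except which host is first, which is why the workaround has such a small footprint.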

OpenStack Infra (hudson-openstack) wrote on 2016-04-01: Fix merged to kolla (master)

#13

Reviewed: https://review.openstack.org/299803
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=0bba5fe0007b281c0a2ee75e4f1c9f3950413e6f
Submitter: Jenkins
Branch: master

commit 0bba5fe0007b281c0a2ee75e4f1c9f3950413e6f
Author: Steven Dake <email address hidden>
Date: Thu Mar 31 04:04:27 2016 -0400

Workaround ansible bug related to delegate_to

    Currently the delegate_to doesnt happen and the neutron role creation is
    attempted once on the first server and is skipped. The re-ordering of hosts in
    site.yml seems to make the first host to be one inside neutron-server group
    yielding the expected results. This patch needs to be re-visited as soon as a
    version of ansible is chosen that fixes the issues with delegate_to

    Co-Authored-By: Steven Dake <email address hidden>
    Co-Authored-By: Vikram Hosakote <email address hidden>
    Co-Authored-By: Nate Potter <email address hidden>
    Co-Authored-By: Ganesh Mahalingam <email address hidden>
    Change-Id: Ia712b323aa9d750d470a11ee899ab1b3054a903f
    Partial-Bug: #1546789

OpenStack Infra (hudson-openstack) wrote on 2016-04-01: Fix proposed to kolla (stable/mitaka)

#14

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/300655

Steven Dake (sdake) on 2016-04-02

Changed in kolla:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-04-02: Fix merged to kolla (stable/mitaka)

#15

Reviewed: https://review.openstack.org/300655
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=8dc91eafb55297e93bc9b59058e118ceda587c35
Submitter: Jenkins
Branch: stable/mitaka

commit 8dc91eafb55297e93bc9b59058e118ceda587c35
Author: Steven Dake <email address hidden>
Date: Thu Mar 31 04:04:27 2016 -0400

Workaround ansible bug related to delegate_to

    Currently the delegate_to doesnt happen and the neutron role creation is
    attempted once on the first server and is skipped. The re-ordering of hosts in
    site.yml seems to make the first host to be one inside neutron-server group
    yielding the expected results. This patch needs to be re-visited as soon as a
    version of ansible is chosen that fixes the issues with delegate_to

    Co-Authored-By: Steven Dake <email address hidden>
    Co-Authored-By: Vikram Hosakote <email address hidden>
    Co-Authored-By: Nate Potter <email address hidden>
    Co-Authored-By: Ganesh Mahalingam <email address hidden>
    Change-Id: Ia712b323aa9d750d470a11ee899ab1b3054a903f
    Partial-Bug: #1546789
    (cherry picked from commit 0bba5fe0007b281c0a2ee75e4f1c9f3950413e6f)

tags:

added: in-stable-mitaka

Christian Berendt (berendt) wrote on 2016-05-25:

#16

I hit the same bug with Glance when enabling Ceph and running Glance on different nodes than the Ceph monitor services.

After changing the order of the hosts used for the Glance tasks, everything works as expected.

TASK: [glance | Creating Glance database] *************************************
skipping: [de-1-node-1]

TASK: [glance | Reading json from variable] ***********************************
skipping: [de-1-node-2]
skipping: [de-1-node-1]
skipping: [de-1-node-3]
fatal: [de-1-controller-1] => One or more undefined variables: 'dict object' has no attribute 'stdout'
fatal: [de-1-controller-2] => One or more undefined variables: 'dict object' has no attribute 'stdout'
fatal: [de-1-controller-3] => One or more undefined variables: 'dict object' has no attribute 'stdout'
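
The "has no attribute 'stdout'" failures in the log above are what one would expect when a later task reads stdout from a registered result whose task was skipped. A minimal Python sketch of that failure mode follows. The shape of the skipped result dict and the function names are assumptions made for illustration, not taken from Ansible or kolla.

```python
def read_stdout(result: dict) -> str:
    """Unguarded access, analogous to using database.stdout directly."""
    return result["stdout"]


def read_stdout_guarded(result: dict, default: str = "") -> str:
    """Return a default instead of failing when the task did not run."""
    if result.get("skipped") or "stdout" not in result:
        return default
    return result["stdout"]


# Assumed shape of a registered result for a skipped task.
skipped = {"skipped": True, "changed": False}
# Hypothetical result for a task that actually ran.
completed = {"stdout": 'localhost | SUCCESS => {"changed": true}'}

try:
    read_stdout(skipped)
except KeyError as exc:
    print("unguarded access failed:", exc)

print(repr(read_stdout_guarded(skipped)))
print(read_stdout_guarded(completed))
```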

OpenStack Infra (hudson-openstack) wrote on 2016-05-25: Fix proposed to kolla (master)

#17

Fix proposed to branch: master
Review: https://review.openstack.org/321241

OpenStack Infra (hudson-openstack) wrote on 2016-05-26: Change abandoned on kolla (master)

#18

Change abandoned by Christian Berendt (<email address hidden>) on branch: master
Review: https://review.openstack.org/321241
Reason: issues solved with ansible >= 2

Swapnil Kulkarni (coolsvap-deactivatedaccount) on 2016-06-30

Changed in kolla:
status:	Fix Committed → Fix Released

kolla

neutron failed to deploy in a multi-node deployment

Affects		Status	Importance	Assigned to	Milestone
	kolla	Fix Released	Critical	Steven Dake	kolla newton-1 "n1"
	Mitaka	Fix Released	Critical	Steven Dake	kolla mitaka-rc3 "mitaka-rc3"