Error running nova-manage cell_v2 simple_cell_setup when configuring nova with puppet-nova

Bug #1656276 reported by Alfredo Moralejo on 2017-01-13
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Undecided   Sylvain Bauza
Packstack                 Undecided   Unassigned
puppet-nova               Undecided   Alex Schultz
tripleo                   Critical    Unassigned

Bug Description

When installing and configuring nova with puppet-nova (with tripleo, packstack, or puppet-openstack-integration), we get the following errors:

Debug: Executing: '/usr/bin/nova-manage cell_v2 simple_cell_setup --transport-url=rabbit://guest:guest@172.19.2.159:5672/?ssl=0'
Debug: /Stage[main]/Nova::Db::Sync_cell_v2/Exec[nova-cell_v2-simple-cell-setup]/returns: Sleeping for 5 seconds between tries
Notice: /Stage[main]/Nova::Db::Sync_cell_v2/Exec[nova-cell_v2-simple-cell-setup]/returns: Cell0 is already setup.
Notice: /Stage[main]/Nova::Db::Sync_cell_v2/Exec[nova-cell_v2-simple-cell-setup]/returns: No hosts found to map to cell, exiting.

The issue seems to be that "nova-manage cell_v2 simple_cell_setup" runs as part of the nova database initialization, before any compute nodes have been created, and it returns 1 in that case [1]. Note, however, that the previous steps (Cell0 mapping and schema migration) ran successfully.
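
As a rough illustration of the failure mode described above, the sketch below separates "cell mappings created, hosts not mapped yet" from a real failure. It is only a sketch: the function name and the three result labels are made up for this example, and it assumes the exit status of 1 and the "No hosts found" message shown in the log above.

```python
# Message taken from the Puppet log quoted in this report.
NO_HOSTS_MESSAGE = "No hosts found to map to cell, exiting."


def classify_simple_cell_setup(returncode: int, output: str) -> str:
    """Label the outcome of a simple_cell_setup run (hypothetical helper)."""
    if returncode == 0:
        return "complete"
    if returncode == 1 and NO_HOSTS_MESSAGE in output:
        # cell0 was handled, but there were no compute nodes to map yet.
        return "hosts-pending"
    return "failed"
```

A Puppet exec that accepts only exit status 0 cannot tell the last two cases apart, which is why the run is reported as an error.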

I think for nova bootstrap a reasonable orchestrated workflow would be:

1. Create required databases (including the one for cell0).
2. Nova db sync
3. nova cell0 mapping and schema creation.
4. Adding compute nodes
5. mapping compute nodes (by running nova-manage cell_v2 discover_hosts)

For step 3 we'd need simple_cell_setup to return 0 when there are no compute nodes, or a different command.

With the current behavior of nova-manage, the only workflow that works is:

1. Create required databases (including the one for cell0).
2. Nova db sync
3. Adding all compute nodes
4. nova cell0 mapping and schema creation with "nova-manage cell_v2 simple_cell_setup".

Am I right? Is there a better alternative?

[1] https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L1112-L1114
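
The two orderings in the description can be checked with a small sketch. The step names are labels invented for this example (not nova-manage subcommands, apart from simple_cell_setup and discover_hosts), and the only rule encoded is the one reported here: those two commands need compute nodes to be registered first.

```python
# Hypothetical step labels for the two workflows in the description.
PROPOSED = ["create_databases", "db_sync", "cell0_mapping",
            "add_compute_nodes", "discover_hosts"]
CURRENT = ["create_databases", "db_sync",
           "add_compute_nodes", "simple_cell_setup"]

# Steps that, per this report, only succeed once compute nodes exist.
NEEDS_COMPUTES = {"simple_cell_setup", "discover_hosts"}


def first_violation(steps):
    """Return the first step that runs before compute nodes exist, or None."""
    computes_registered = False
    for step in steps:
        if step == "add_compute_nodes":
            computes_registered = True
        elif step in NEEDS_COMPUTES and not computes_registered:
            return step
    return None
```

Both lists pass the check. An ordering that runs simple_cell_setup during database initialization, before add_compute_nodes, is flagged at simple_cell_setup.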

Changed in tripleo:
importance: Undecided → Critical
milestone: none → ocata-3
status: New → Triaged
tags: added: alert ci
tags: added: promotion-blocker
Emilien Macchi (emilienm) wrote :

Everything was working a few hours ago, I think it could be related to https://review.openstack.org/#/c/409890

Alfredo Moralejo (amoralej) wrote :

IIUC, review https://review.openstack.org/#/c/409890 has just uncovered an issue in how we were adding cells: we were mapping cell0 and creating the schema, but not mapping the nodes afterwards, either with "nova-manage cell_v2 discover_hosts" or by running simple_cell_setup again once the compute nodes had been added.

Alfredo Moralejo (amoralej) wrote :

As per discussion in #openstack-nova:

1. Compute nodes MUST be registered before running "nova-manage cell_v2 simple_cell_setup" on the controller.

2. The right workflow to deploy nova would be:

  1. Create required databases (including the one for cell0).
  2. Nova db sync
  3. Adding all compute nodes
  4. nova cell configuration with "nova-manage cell_v2 simple_cell_setup".

3. Expected result should be similar to:

MariaDB [(none)]> select * from nova_api.cell_mappings;
+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------+--------------------------------------------------------------------------+
| created_at | updated_at | id | uuid | name | transport_url | database_connection |
+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------+--------------------------------------------------------------------------+
| 2017-01-13 13:56:15 | NULL | 1 | 00000000-0000-0000-0000-000000000000 | cell0 | none:/// | mysql+pymysql://nova_api:28a3029ff673411a@192.168.121.230/nova_api_cell0 |
| 2017-01-13 14:09:37 | NULL | 3 | ead4f623-4631-4d2d-af4f-4d71c048c66c | NULL | rabbit://guest:guest@192.168.121.230:5672/?ssl=0 | mysql+pymysql://nova:28a3029ff673411a@192.168.121.230/nova |
+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------+--------------------------------------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [(none)]>

Could someone from nova team confirm it?
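
The expected result above can be turned into a quick sanity check. This is a sketch only: it assumes the rows of nova_api.cell_mappings have already been fetched as dictionaries keyed by column name, and the function name and problem strings are invented for this example.

```python
# UUID of the cell0 row as shown in the query output above.
CELL0_UUID = "00000000-0000-0000-0000-000000000000"


def check_cell_mappings(rows):
    """Return a list of problems found in cell_mappings rows (empty if none)."""
    problems = []
    cell0_rows = [r for r in rows if r["uuid"] == CELL0_UUID]
    other_rows = [r for r in rows if r["uuid"] != CELL0_UUID]
    if len(cell0_rows) != 1:
        problems.append("expected exactly one cell0 mapping")
    if not other_rows:
        problems.append("no cell mapping besides cell0")
    for row in other_rows:
        # A regular cell needs both a message queue and a database URL.
        for column in ("transport_url", "database_connection"):
            if not row.get(column):
                problems.append(f"cell {row['uuid']} has no {column}")
    return problems
```

An empty list means the table has the same shape as the output above: one cell0 row plus at least one cell with both URLs set.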

melanie witt (melwitt) wrote :

Hi Alfredo,

Yes, you are correct that compute nodes must be registered before running 'nova-manage cell_v2 simple_cell_setup'. I think the reason for this is to avoid the possibility of creating orphaned cell mappings.

The command creates a cell0 mapping and single cell mapping, and maps any found hosts to the created cell.

The 'nova-manage cell_v2 discover_hosts' command maps unmapped hosts to an existing cell mapping. It requires compute nodes to be registered already.

The 'nova-manage cell_v2 map_cell_and_hosts' command creates a cell mapping and maps found hosts to the created cell. It requires compute nodes to be registered already.
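
The three commands described above can be summarized in a small lookup table. The dataclass and its field names are invented for this sketch; the values only restate the comment above and are not derived from the nova source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellCommand:
    name: str
    creates_cell0_mapping: bool
    creates_cell_mapping: bool
    maps_hosts: bool
    needs_registered_computes: bool


# One entry per command, following the descriptions in the comment above.
COMMANDS = [
    CellCommand("simple_cell_setup", True, True, True, True),
    CellCommand("discover_hosts", False, False, True, True),
    CellCommand("map_cell_and_hosts", False, True, True, True),
]


def usable_without_computes(commands):
    """Names of commands that can run before any compute node is registered."""
    return [c.name for c in commands if not c.needs_registered_computes]
```

With these three entries the function returns an empty list: none of the commands can create a cell mapping on a fresh install with no compute nodes, which is the gap this bug is about.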

Matt Riedemann (mriedem) on 2017-01-13
tags: added: cells nova-manage
Matt Riedemann (mriedem) wrote :

I think we can look at changing the simple_cell_setup command to not return 1 if there are no compute nodes in the deployment yet. And then we'd change the map_cell_and_hosts command to return 1 if there are no compute nodes because that command is specifically for that part of the setup and can't be incomplete.

I'm not entirely sure what that means for simple_cell_setup for it to be incomplete, because if that doesn't fail and people continue on, but forget (or don't know) to map the hosts to cells later, then they can't do anything. That might just be a communication/docs issue though.

If we compare simple_cell_setup to the 'nova-status upgrade check' command when it's checking cells v2, it doesn't fail if there are cell mappings but no compute nodes (no host mappings):

https://github.com/openstack/nova/blob/3ec43d81c324d8229bbb4b5db301175886a049b6/nova/cmd/status.py#L168

It prints a message, but doesn't consider it a failure. That's for the fresh install case that you've created the cell mappings but don't have any computes reporting in yet. I'm thinking we model that in simple_cell_setup also.
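
The exit-code policy proposed in this comment could look roughly like the sketch below. It models the proposal only, not what nova-manage actually does; the function name and the message text are assumptions made for the example.

```python
def proposed_exit_code(command: str, compute_node_count: int):
    """Return (exit_code, message) under the policy proposed in this comment."""
    if compute_node_count > 0:
        return 0, "hosts mapped"
    if command == "simple_cell_setup":
        # Fresh install: cell mappings exist, computes have not reported in.
        return 0, "no compute nodes yet; map hosts later"
    if command == "map_cell_and_hosts":
        # This command exists to map hosts, so having none is a failure.
        return 1, "no compute nodes to map"
    raise ValueError(f"unknown command: {command}")
```

The asymmetry is the point of the proposal: the setup command tolerates an empty deployment, while the command whose whole job is mapping hosts does not.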

Matt Riedemann (mriedem) wrote :

I'm also tracking some of these open questions/issues in the wiki here:

https://wiki.openstack.org/wiki/Nova-Cells-v2

Changed in nova:
assignee: nobody → Sylvain Bauza (sylvain-bauza)
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/420079

Changed in nova:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/420063
Committed: https://git.openstack.org/cgit/openstack/puppet-nova/commit/?id=be0ac8f320ef31329a70e3d4c2dfee871872dace
Submitter: Jenkins
Branch: master

commit be0ac8f320ef31329a70e3d4c2dfee871872dace
Author: Alex Schultz <email address hidden>
Date: Fri Jan 13 16:40:52 2017 +0000

    Revert "Enable cell_v2 setup by default"

    This reverts commit 055f91446fa2e32eb2ee04f0db232bb7fc8cdba3.

    Change-Id: I063c0a062a4f391a932c60e48d3f2c0ba0e941bb
    Related-Bug: #1656276

Reviewed: https://review.openstack.org/420068
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=34f3ab689616517888422582357d0d4a38e0925d
Submitter: Jenkins
Branch: master

commit 34f3ab689616517888422582357d0d4a38e0925d
Author: Alex Schultz <email address hidden>
Date: Fri Jan 13 16:55:56 2017 +0000

    Revert "Specify cell0 db creation"

    This reverts commit 4e3b085a59e7af49d1025986fd80796be338f5fd.

    Change-Id: Id9b3610af7167572b292ba330c3f0aad660fedc4
    Related-Bug: #1656276

Reviewed: https://review.openstack.org/420109
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=3f83f39fc35d95f7bb1252af18a506a3deef8ce0
Submitter: Jenkins
Branch: master

commit 3f83f39fc35d95f7bb1252af18a506a3deef8ce0
Author: Alex Schultz <email address hidden>
Date: Fri Jan 13 17:55:14 2017 +0000

    Revert "Add cell_v2 simple_cell_setup"

    This reverts commit 4f1727990019e9bfbc1e1f265b163952257ada22.

    Change-Id: I579ae52dad47c29acf536dca435fdbfd17340391
    Related-Bug: #1656276

Alan Pevec (apevec) wrote :

Packstack fails even with puppet-nova revert http://git.openstack.org/cgit/openstack/puppet-nova/commit/?id=a41d30806d4ae7b772e2276e9b058aa61e742d38

https://ci.centos.org/job/weirdo-master-promote-packstack-scenario001/869/consoleFull
...
08:28:47 ERROR : Error appeared during Puppet run: 172.19.2.156_controller.pp
08:28:47 Error: /Stage[main]/Nova::Db::Sync_cell_v2/Exec[nova-cell_v2-simple-cell-setup]: Failed to call refresh: /usr/bin/nova-manage cell_v2 simple_cell_setup --transport-url=rabbit://guest:guest@172.19.2.156:5671/?ssl=1 returned 1 instead of one of [0]

Alan Pevec (apevec) wrote :

Packstack needs same partial revert as in puppet-openstack-integration:
https://review.openstack.org/#/c/420167/2/manifests/nova.pp

Related fix proposed to branch: master
Review: https://review.openstack.org/420337

Change abandoned by Alan Pevec (<email address hidden>) on branch: master
Review: https://review.openstack.org/420337
Reason: duplicate of Iffc70e22a762f58c3f946e27cd0064f3e33b892d

Reviewed: https://review.openstack.org/420336
Committed: https://git.openstack.org/cgit/openstack/packstack/commit/?id=0ab6d87daed904233e8ba670981e9b468f29ab9d
Submitter: Jenkins
Branch: master

commit 0ab6d87daed904233e8ba670981e9b468f29ab9d
Author: Alan Pevec <email address hidden>
Date: Sat Jan 14 13:56:29 2017 +0100

    Revert cells v2 support

    Partial revert of a64f86e3a0ea040b9d8d6a035da5e084812110ce
    matching puppet-openstack-integration workaround
     https://review.openstack.org/420167

    Related-bug: 1656276
    Change-Id: Iffc70e22a762f58c3f946e27cd0064f3e33b892d

Seems like it doesn't happen in CI jobs anymore, removing alert and promotion-blocker tags meanwhile.

tags: removed: alert promotion-blocker

Reviewed: https://review.openstack.org/332713
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f7e9f312a064d9809e2093de5b54c18a45a75322
Submitter: Jenkins
Branch: master

commit f7e9f312a064d9809e2093de5b54c18a45a75322
Author: Andrey Volkov <email address hidden>
Date: Tue Aug 16 11:25:54 2016 +0300

    Add nova-manage cell_v2 create_cell command

    Currently, all of the commands that create a new cell require the
    presence of compute hosts, else they won't create a cell mapping.
    In the use case of a fresh install, it's reasonable that compute
    host records may not yet exist at the time of cells v2 setup.

    This provides a way for operators to create a new empty cell at
    setup time and defer adding hosts to the cell until they have
    started their compute hosts later (via the 'discover_hosts'
    command).

    The command optionally accepts a database connection url and
    message queue transport url, else it will take the values from
    the nova.conf. It returns the uuid of the newly created cell.

    Change-Id: I2fd7d854ffa579e550f6002cfb7223d7f40acac6
    Related-Bug: #1656276
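
A minimal sketch of how a deployment tool might build the create_cell command line with the optional URLs mentioned in the commit message. The helper name is hypothetical, and the flag spellings (--transport-url, --database_connection) are an assumption that should be checked against the nova-manage help output for the release in use.

```python
def create_cell_argv(transport_url=None, database_connection=None):
    """Build an argv list for 'nova-manage cell_v2 create_cell'."""
    argv = ["nova-manage", "cell_v2", "create_cell"]
    # Both URLs are optional; when omitted, the values from nova.conf apply.
    if transport_url is not None:
        argv.append(f"--transport-url={transport_url}")
    if database_connection is not None:
        argv.append(f"--database_connection={database_connection}")
    return argv
```

Returning a list rather than a single string keeps the URLs from being re-parsed by a shell when the list is passed to something like subprocess.run.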

Changed in tripleo:
milestone: ocata-3 → ocata-rc1

Reviewed: https://review.openstack.org/420994
Committed: https://git.openstack.org/cgit/openstack/puppet-nova/commit/?id=d2597b32c80c3800d74f360401eb37a69396e0fa
Submitter: Jenkins
Branch: master

commit d2597b32c80c3800d74f360401eb37a69396e0fa
Author: Alex Schultz <email address hidden>
Date: Mon Jan 16 16:27:53 2017 -0700

    Align stars to fix puppet-ci

    1) Stop doing cell v2
    Since cell v2 setup still needs work, we need to stop doing it in the
    beaker tests for now.

    2) Fix libvirt for ubuntu
    Ubuntu updated their libvirt package to 2.5.0 which has the debian
    service and filename conventions.

    Change-Id: Ic315ae015300f536e88ae0b2a8808fcd6126ac37
    Related-Bug: #1656276
    Closes-Bug: #1657251

Fix proposed to branch: master
Review: https://review.openstack.org/422248

Changed in puppet-nova:
assignee: nobody → Oliver Walsh (owalsh)
status: New → In Progress
Changed in puppet-nova:
assignee: Oliver Walsh (owalsh) → Alex Schultz (alex-schultz)

Reviewed: https://review.openstack.org/422248
Committed: https://git.openstack.org/cgit/openstack/puppet-nova/commit/?id=dc2f3a358663c0fb97795666b16dec63cfcc3872
Submitter: Jenkins
Branch: master

commit dc2f3a358663c0fb97795666b16dec63cfcc3872
Author: Oliver Walsh <email address hidden>
Date: Wed Jan 18 21:26:43 2017 +0000

    Implement a proper cell_v2 setup

    Rather than use simple_cell_setup which expects that there are already
    existing computes, this change uses map_cell0 & create_cell to setup
    cell_v2. Once the computes are configured, the cell_v2 discover_hosts
    should be used to finalize the installation.

    In addition, the db syncs need to be reordered as the api db sync
    should run before the cell_v2 setup. The main db sync should run
    after.

    map_cell0/simple_cell_setup now uses main nova DB connection instead
    of the api DB connection.

    Change-Id: I591b451197dc3bd0783978f5e3d2b1c830afe54e
    Closes-Bug: #1656276
    Related-Bug: #1656673
    Co-Authored-By: Alex Schultz <email address hidden>
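
The ordering described in this commit message can be sketched as a list of commands run in sequence. This is an illustration under assumptions, not the puppet-nova implementation: the list names and the run_steps helper are invented, no options are passed (so nova.conf values would apply), and re-running on an already configured system is not handled.

```python
import subprocess

# Controller-side steps, in the order given by the commit message:
# api db sync, then cell_v2 setup, then the main db sync.
CONTROLLER_STEPS = [
    ["nova-manage", "api_db", "sync"],
    ["nova-manage", "cell_v2", "map_cell0"],
    ["nova-manage", "cell_v2", "create_cell"],
    ["nova-manage", "db", "sync"],
]

# Run only once the compute nodes are configured.
AFTER_COMPUTES = [
    ["nova-manage", "cell_v2", "discover_hosts"],
]


def run_steps(steps, runner=subprocess.call):
    """Run each command in order; return the first failing argv, or None."""
    for argv in steps:
        if runner(argv) != 0:
            return argv
    return None
```

The runner parameter exists so the ordering can be exercised with a stub instead of a real nova installation.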

Changed in puppet-nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/puppet-nova 10.2.0 release.

Changed in tripleo:
status: Triaged → Fix Released

Change abandoned by Sylvain Bauza (<email address hidden>) on branch: master
Review: https://review.openstack.org/420079
Reason: We now have a better consensus about leaving the simple_cell_setup call for just an easy upgrade, and rather call each nova-manage method for each step (create cell0, then a cell, then discover hosts for each cell) http://docs.openstack.org/developer/nova/cells.html#first-time-setup

Matt Riedemann (mriedem) on 2017-02-09
Changed in nova:
status: In Progress → Invalid
no longer affects: nova/newton
jethro.sun (jethro-sun7) wrote :

Hi all,

Is there a fix proposed or somehow a temp fix for this?

I bumped into it when using Rally on Packstack. It turns out a correct `database_connection` is populated but there is no `transport_url`, which is probably why I can't finish the instructions in https://docs.openstack.org/developer/nova/cells.html#cells-v2.

Changed in packstack:
status: New → Fix Released