nova_placement service start not coordinated with api db sync on multiple controllers

Bug #1784155 reported by Mike Bayer
This bug affects 2 people
Affects                    Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)   Fix Released  Low         Lee Yarwood
  Rocky                    Fix Released  Low         Martin Schuppert
tripleo                    Fix Released  High        Martin Schuppert

Bug Description

On a loaded HA / Galera environment using VMs, I can fairly consistently reproduce a race condition where the nova_placement service is started on controllers where the database is not yet available. The nova_placement service itself does not seem to tolerate this condition at startup and then fails to recover. Mitigation here can involve either synchronizing these conditions or making nova-placement more resilient.

The symptom of the overcloud deploy failure is that two out of three controllers have the nova_placement container in an unhealthy state:

TASK [Debug output for task which failed: Check for unhealthy containers after step 3] ***
Saturday 28 July 2018 10:19:29 +0000 (0:00:00.663) 0:30:26.152 *********
fatal: [stack2-overcloud-controller-2]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
        "3597b92e9714 192.168.25.1:8787/tripleomaster/centos-binary-nova-placement-api:959e1d7f755ee681b6f23b498d262a9e4dd6326f_4cbb1814 \"kolla_start\" 2 minutes ago Up 2 minutes (unhealthy) nova_placement"
    ]
}
fatal: [stack2-overcloud-controller-1]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
        "322c5ea53895 192.168.25.1:8787/tripleomaster/centos-binary-nova-placement-api:959e1d7f755ee681b6f23b498d262a9e4dd6326f_4cbb1814 \"kolla_start\" 2 minutes ago Up 2 minutes (unhealthy) nova_placement"
    ]
}
ok: [stack2-overcloud-controller-0] => {
    "failed_when_result": false,
    "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": []
}
ok: [stack2-overcloud-compute-0] => {
    "failed_when_result": false,
    "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": []
}

NO MORE HOSTS LEFT *************************************************************

Inspecting placement_wsgi_error.log shows a first stack trace indicating that the nova_placement database is missing the "traits" table:

[Sat Jul 28 10:17:06.525018 2018] [:error] [pid 14] [remote 10.1.20.15:0] mod_wsgi (pid=14): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Sat Jul 28 10:17:06.525067 2018] [:error] [pid 14] [remote 10.1.20.15:0] mod_wsgi (pid=14): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Sat Jul 28 10:17:06.525101 2018] [:error] [pid 14] [remote 10.1.20.15:0] Traceback (most recent call last):
[Sat Jul 28 10:17:06.525124 2018] [:error] [pid 14] [remote 10.1.20.15:0] File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Sat Jul 28 10:17:06.525165 2018] [:error] [pid 14] [remote 10.1.20.15:0] application = init_application()
[Sat Jul 28 10:17:06.525174 2018] [:error] [pid 14] [remote 10.1.20.15:0] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 88, in init_application
[Sat Jul 28 10:17:06.525198 2018] [:error] [pid 14] [remote 10.1.20.15:0] return deploy.loadapp(conf.CONF)
[Sat Jul 28 10:17:06.525205 2018] [:error] [pid 14] [remote 10.1.20.15:0] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/deploy.py", line 111, in loadapp
[Sat Jul 28 10:17:06.525300 2018] [:error] [pid 14] [remote 10.1.20.15:0] update_database()
[Sat Jul 28 10:17:06.525310 2018] [:error] [pid 14] [remote 10.1.20.15:0] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/deploy.py", line 92, in update_database
[Sat Jul 28 10:17:06.525329 2018] [:error] [pid 14] [remote 10.1.20.15:0] resource_provider.ensure_trait_sync(ctx)
[Sat Jul 28 10:17:06.525337 2018] [:error] [pid 14] [remote 10.1.20.15:0] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 146, in ensure_trait_sync
[Sat Jul 28 10:17:06.526277 2018] [:error] [pid 14] [remote 10.1.20.15:0] _trait_sync(ctx)

...

[Sat Jul 28 10:17:06.531950 2018] [:error] [pid 14] [remote 10.1.20.15:0] raise errorclass(errno, errval)
[Sat Jul 28 10:17:06.532049 2018] [:error] [pid 14] [remote 10.1.20.15:0] ProgrammingError: (pymysql.err.ProgrammingError) (1146, u"Table 'nova_placement.traits' doesn't exist") [SQL: u'SELECT traits.name \\nFROM traits'] (Background on this error at: http://sqlalche.me/e/f405)

The log then continues with the following exception repeated every 30 seconds, which is likely the service trying to recover but hitting a problem with how it uses oslo.config within the same process:

[Sat Jul 28 10:18:36.916617 2018] [:error] [pid 14] [remote 10.1.20.15:148] mod_wsgi (pid=14): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Sat Jul 28 10:18:36.916646 2018] [:error] [pid 14] [remote 10.1.20.15:148] mod_wsgi (pid=14): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Sat Jul 28 10:18:36.916664 2018] [:error] [pid 14] [remote 10.1.20.15:148] Traceback (most recent call last):
[Sat Jul 28 10:18:36.916681 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Sat Jul 28 10:18:36.916711 2018] [:error] [pid 14] [remote 10.1.20.15:148] application = init_application()
[Sat Jul 28 10:18:36.916719 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 75, in init_application
[Sat Jul 28 10:18:36.916742 2018] [:error] [pid 14] [remote 10.1.20.15:148] _parse_args([], default_config_files=[conffile])
[Sat Jul 28 10:18:36.916748 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 61, in _parse_args
[Sat Jul 28 10:18:36.916760 2018] [:error] [pid 14] [remote 10.1.20.15:148] logging.register_options(conf.CONF)
[Sat Jul 28 10:18:36.916766 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 250, in register_options
[Sat Jul 28 10:18:36.916779 2018] [:error] [pid 14] [remote 10.1.20.15:148] conf.register_cli_opts(_options.common_cli_opts)
[Sat Jul 28 10:18:36.916785 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2468, in __inner
[Sat Jul 28 10:18:36.916807 2018] [:error] [pid 14] [remote 10.1.20.15:148] result = f(self, *args, **kwargs)
[Sat Jul 28 10:18:36.916813 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2690, in register_cli_opts
[Sat Jul 28 10:18:36.916823 2018] [:error] [pid 14] [remote 10.1.20.15:148] self.register_cli_opt(opt, group, clear_cache=False)
[Sat Jul 28 10:18:36.916828 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2472, in __inner
[Sat Jul 28 10:18:36.916838 2018] [:error] [pid 14] [remote 10.1.20.15:148] return f(self, *args, **kwargs)
[Sat Jul 28 10:18:36.916843 2018] [:error] [pid 14] [remote 10.1.20.15:148] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2682, in register_cli_opt
[Sat Jul 28 10:18:36.916852 2018] [:error] [pid 14] [remote 10.1.20.15:148] raise ArgsAlreadyParsedError("cannot register CLI option")
[Sat Jul 28 10:18:36.916894 2018] [:error] [pid 14] [remote 10.1.20.15:148] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option
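
This repeating failure is easy to reproduce with oslo.config alone: once a ConfigOpts object has parsed its command line, any further register_cli_opts() call raises ArgsAlreadyParsedError, which is what happens here when mod_wsgi re-imports the script and logging.register_options() runs again against the already-parsed global conf.CONF. A minimal standalone sketch (not the nova code itself) of the failure, and of clearing the parsed state so registration works again:

    from oslo_config import cfg

    conf = cfg.ConfigOpts()
    conf.register_cli_opts([cfg.StrOpt('foo')])
    conf([], project='demo')          # parse an (empty) command line

    try:
        # registering another CLI option after parsing fails, as in the
        # placement_wsgi_error.log traceback above
        conf.register_cli_opts([cfg.StrOpt('bar')])
    except cfg.ArgsAlreadyParsedError as exc:
        print(exc)                    # arguments already parsed: cannot register CLI option

    # clearing the parsed state (roughly what "resetting conf.CONF" amounts
    # to) lets CLI options be registered and the command line parsed again
    conf.clear()
    conf.register_cli_opts([cfg.StrOpt('bar')])
    conf([], project='demo')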

In nova_placement.yaml we can see that the placement service is started earlier than other services, in step_3:

      docker_config:
        step_2:
          get_attr: [NovaPlacementLogging, docker_config, step_2]
        # start this early so it is up before computes start reporting
        step_3:
          nova_placement:
            start_order: 1
            image: {get_param: DockerNovaPlacementImage}
            net: host
            user: root
            restart: always

However, in nova-api.yaml we can see that the api DB sync is also in step_3. Although it uses start_order: 0, I'm going to guess that the "start_order" system is not coordinated across controllers:

      docker_config:
        step_2:
          get_attr: [NovaApiLogging, docker_config, step_2]
        step_3:
          nova_api_db_sync:
            start_order: 0
            image: &nova_api_image {get_param: DockerNovaApiImage}
            net: host
            detach: false
            user: root
            volumes: &nova_api_bootstrap_volumes
              list_concat:
                - {get_attr: [ContainersCommon, volumes]}
                - {get_attr: [NovaApiLogging, volumes]}
                -
                  - /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro
                  - /var/lib/config-data/nova/etc/nova/:/etc/nova/:ro
            command: "/usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage api_db sync'"

The placement service always starts correctly on the "bootstrap" controller, since start_order coordinates the ordering there, but this coordination does not carry over to the other two controllers.
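
One way to picture the missing coordination is a gate that polls the nova_placement database until the schema the WSGI app needs (the traits table it trips over above) actually exists before the container is started on the non-bootstrap controllers. The following is only an illustrative sketch using pymysql (the driver from the traceback); the connection parameters are placeholders and this is not what was eventually merged:

    import time

    import pymysql

    def wait_for_placement_schema(host, user, password,
                                  database='nova_placement',
                                  timeout=600, interval=10):
        """Block until the 'traits' table exists or the timeout expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                conn = pymysql.connect(host=host, user=user,
                                       password=password, database=database)
                try:
                    with conn.cursor() as cur:
                        # raises ProgrammingError 1146 until the api db sync
                        # on the bootstrap controller has created the table
                        cur.execute("SELECT 1 FROM traits LIMIT 1")
                    return True
                finally:
                    conn.close()
            except pymysql.err.ProgrammingError:
                pass      # table missing: migrations not finished yet
            except pymysql.err.OperationalError:
                pass      # database not reachable yet
            time.sleep(interval)
        raise RuntimeError("nova_placement schema not ready after %ss" % timeout)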

Mike Bayer (zzzeek)
affects: nova (Ubuntu) → ubuntu
affects: ubuntu → nova
tags: added: placement
Revision history for this message
Chris Dent (cdent) wrote :

We could probably consider handling either of the two errors within placement itself, but I think it would be (much) better to simply let placement die (all the way dead, not whatever mod_wsgi is doing that causes the repeating oslo.config issue) and manage retrying externally.

However, it also looks like the wsgi-script support in PBR needs a way to separate configuration stages and other setup from creating the WSGI "application" symbol.
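
For the "manage retrying externally" option, the external check can be as simple as polling the placement endpoint until the WSGI application has loaded and answers without a server error; the tripleo fix that eventually merged (see below) adds essentially such a wait on the compute nodes. A rough sketch, with the endpoint URL and timings as placeholders:

    import time

    import requests

    def wait_for_placement(endpoint='http://127.0.0.1:8778/', timeout=300):
        """Poll the placement root URL until it stops returning server errors."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                resp = requests.get(endpoint, timeout=5)
                # the version document at / needs no auth token; anything
                # below 500 means the WSGI application initialized
                if resp.status_code < 500:
                    return True
            except requests.exceptions.RequestException:
                pass
            time.sleep(10)
        return False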

Revision history for this message
Chris Dent (cdent) wrote :

Also, for additional info: on a fresh install it is now possible to use a separate database for placement by setting [placement_database]/connection, but its tables are still created by nova-manage api_db sync.

When placement is extracted we'll do two things:

* make a new manage command for db_sync (which runs migrations)
* make a new command that, instead of running all the migrations, will just create the tables via reflection
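
As a rough illustration of the second point: "create the tables via reflection" presumably means building the schema directly from the SQLAlchemy model metadata rather than replaying the migration history. The model below is only a stand-in for the real placement models, which did not yet have an extracted home at the time of this comment:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base

    BASE = declarative_base()

    class Trait(BASE):
        """Stand-in for the real placement 'traits' model."""
        __tablename__ = 'traits'
        id = Column(Integer, primary_key=True)
        name = Column(String(255), nullable=False, unique=True)

    def create_schema(connection_url):
        """Create the tables straight from the model metadata, no migrations."""
        engine = create_engine(connection_url)
        BASE.metadata.create_all(engine)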

Changed in tripleo:
status: New → Triaged
milestone: none → rocky-rc1
importance: Undecided → High
Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Mike Bayer (zzzeek) wrote :

FWIW I just changed the placement container step to step_4 and everything works great. Not sure what the rationale for starting it in step_3 is.

Revision history for this message
Ye Huang (littlemiaowu) wrote :

I clicked the button by mistake and changed this status... I'm sorry, can I withdraw it?

Changed in nova:
status: Triaged → Confirmed
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/603372

Changed in nova:
assignee: nobody → Michele Baldessari (michele)
status: Confirmed → In Progress
Revision history for this message
Michele Baldessari (michele) wrote :

Hi cdent,

we spoke in the lobby at the PTG (the review I added is what we discussed). Let me know if you want to approach it differently.

cheers,
Michele

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/604693

Changed in nova:
assignee: Michele Baldessari (michele) → Lee Yarwood (lyarwood)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/604694

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/604693
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=601aa94a2fb9b0d1a884ddecc7a6a5e1f5f8686b
Submitter: Zuul
Branch: master

commit 601aa94a2fb9b0d1a884ddecc7a6a5e1f5f8686b
Author: Lee Yarwood <email address hidden>
Date: Mon Sep 24 09:01:24 2018 +0100

    placement: Always reset conf.CONF when starting the wsgi app

    This ensures that options loaded during any prior run of the application
    are dropped before being added again during init_application.

    Change-Id: I41b5c7990d4d62a3a397f1686261f3fb7dc1a0be
    Closes-bug: #1784155
    (cherry picked from commit ac88b596c60f6c48c0e4c8e878a3ee70c4c2b756)

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michele Baldessari (<email address hidden>) on branch: master
Review: https://review.openstack.org/603372
Reason: https://review.openstack.org/#/c/604693/

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
Changed in tripleo:
milestone: stein-1 → stein-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/610034
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=00d08a3288f61268f8a823991fadf0d83c36c04d
Submitter: Zuul
Branch: master

commit 00d08a3288f61268f8a823991fadf0d83c36c04d
Author: Sean Mooney <email address hidden>
Date: Fri Oct 12 14:26:14 2018 +0100

    Harden placement init under wsgi

    - This change tries to address an edge case discovered
      when running placement under mod_wsgi: if the placement wsgi
      application is re-initialized, the db_api configure method
      attempts to reconfigure a started transaction factory.

    - Since oslo.db transaction factories do not support
      reconfiguration at runtime, this results in an exception being
      raised, preventing reloading of the Placement API without
      restarting apache to force mod_wsgi to recreate the
      python interpreter.

    - This change introduces a run_once decorator to allow annotating
      functions that should only be executed once for the lifetime of
      an interpreter.

    - This change applies the run_once decorator to the db_api configure
      method, to suppress the attempt to reconfigure the current
      TransactionFactory on application reload.

    Co-Authored-By: Balazs Gibizer <email address hidden>
    Closes-Bug: #1799246
    Related-Bug: #1784155
    Change-Id: I704196711d30c1124e713ac31111a8ea6fa2f1ba
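
The run_once decorator described above is a small piece of machinery; a generic sketch of the idea (not necessarily the exact helper that landed in nova/utils.py) looks like this:

    import functools

    def run_once(func):
        """Execute the wrapped function only once per interpreter lifetime.

        Later calls return the result of the first call instead of running
        the body again, so re-invoking it on wsgi application reload is safe.
        """
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not wrapper.called:
                wrapper.result = func(*args, **kwargs)
                wrapper.called = True
            return wrapper.result
        wrapper.called = False
        wrapper.result = None
        return wrapper

    @run_once
    def configure(conf):
        # stand-in for placement's db_api configure(): the body, which sets
        # up the oslo.db transaction factory, only executes the first time
        print('configuring transaction factory for %s' % conf)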

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/617297

Revision history for this message
Martin Schuppert (mschuppert) wrote :

Fix for tripleo master released with:
https://review.openstack.org/610966

Backport to rocky tracked in:
https://review.openstack.org/618958

Backport to queens tracked in:
https://review.openstack.org/618984

Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/619586
Reason: Clearing the gate. Do not restore this until given the all clear. See http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000368.html

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/619586
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=cc61ff93ec41e149caba31cc21524f37def4d07e
Submitter: Zuul
Branch: master

commit cc61ff93ec41e149caba31cc21524f37def4d07e
Author: Martin Schuppert <email address hidden>
Date: Thu Nov 22 15:08:11 2018 +0100

    Change step to start nova placement and make compute wait for it

    There is a deployment race where nova-placement fails to start if
    the nova api db migrations have not finished before it is started.
    We start nova placement early to make sure it is up before the
    nova-compute services get started. Since in an HA scenario there is
    no synchronization between the nodes within the current deployment
    step, we might have the situation that the placement service gets
    started on C1/2 while the nova api db sync is not yet finished on C0.

    We have two possibilities:
    1) start placement later and verify that nova-computes recover
    correctly
    2) verify that the db migration on the nova_api db finished before
    starting nova-placement on the controllers

    2), which was addressed via https://review.openstack.org/610966,
    showed problems:
    a) the docker/podman container failed to start with a file-not-found
    error, therefore this was reverted in https://review.openstack.org/619607

    b) when the scripts were running on different controllers at the same
    time, the way nova's db_version() is implemented has issues, which
    is being worked on in https://review.openstack.org/619622

    This patch addresses 1): it moves the placement service start to step_4
    and adds an additional task on the computes to wait until the placement
    service is up.

    Closes-Bug: #1784155

    Change-Id: Ifb5ffc4b25f5ca266560bc0ac96c73071ebd1c9f

Revision history for this message
Mike Bayer (zzzeek) wrote :

In the end everything was done: the conf.CONF behavior was made resilient, the placement service was moved to step 4 (as in my workaround), AND computes now wait for it all to happen. Great effort by everyone!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/621501

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/621619

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/622197

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/621619
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a99820a80cd99690c096553dcdc5e7f7472c02f1
Submitter: Zuul
Branch: master

commit a99820a80cd99690c096553dcdc5e7f7472c02f1
Author: Martin Schuppert <email address hidden>
Date: Mon Dec 3 17:04:09 2018 +0100

    nova_compute fails to start in tls-everywhere configuration

    With tls-everywhere enabled, connecting to the keystone endpoint fails
    to retrieve the URL for the placement endpoint because the certificate
    cannot be verified. While verification is disabled for the later check
    of the placement endpoint, it is not disabled for communication with
    keystone. This change disables certificate verification for
    communication with keystone as well.

    Related-Bug: 1784155

    Change-Id: I317dd62f3a555f375d540a63c21a6fb38d37ca96

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/623227

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/623228

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.openstack.org/621501
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3363bcbf8d7a4c16ec673361adf82c76213e1cbd
Submitter: Zuul
Branch: stable/rocky

commit 3363bcbf8d7a4c16ec673361adf82c76213e1cbd
Author: Martin Schuppert <email address hidden>
Date: Thu Nov 22 15:08:11 2018 +0100

    Change step to start nova placement and make compute wait for it

    There is a deployment race where nova-placement fails to start if
    the nova api db migrations have not finished before it is started.
    We start nova placement early to make sure it is up before the
    nova-compute services get started. Since in an HA scenario there is
    no synchronization between the nodes within the current deployment
    step, we might have the situation that the placement service gets
    started on C1/2 while the nova api db sync is not yet finished on C0.

    We have two possibilities:
    1) start placement later and verify that nova-computes recover
    correctly
    2) verify that the db migration on the nova_api db finished before
    starting nova-placement on the controllers

    2), which was addressed via https://review.openstack.org/610966,
    showed problems:
    a) the docker/podman container failed to start with a file-not-found
    error, therefore this was reverted in https://review.openstack.org/619607

    b) when the scripts were running on different controllers at the same
    time, the way nova's db_version() is implemented has issues, which
    is being worked on in https://review.openstack.org/619622

    This patch addresses 1): it moves the placement service start to step_4
    and adds an additional task on the computes to wait until the placement
    service is up.

    Closes-Bug: #1784155

    Change-Id: Ifb5ffc4b25f5ca266560bc0ac96c73071ebd1c9f
    (cherry picked from commit cc61ff93ec41e149caba31cc21524f37def4d07e)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.openstack.org/622197
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=14af0677c87a64dd7ad96fad41d74da1db3d8e07
Submitter: Zuul
Branch: stable/rocky

commit 14af0677c87a64dd7ad96fad41d74da1db3d8e07
Author: Martin Schuppert <email address hidden>
Date: Mon Dec 3 17:04:09 2018 +0100

    nova_compute fails to start in tls-everywhere configuration

    With tls-everywhere enabled, connecting to the keystone endpoint fails
    to retrieve the URL for the placement endpoint because the certificate
    cannot be verified. While verification is disabled for the later check
    of the placement endpoint, it is not disabled for communication with
    keystone. This change disables certificate verification for
    communication with keystone as well.

    Related-Bug: 1784155

    Change-Id: I317dd62f3a555f375d540a63c21a6fb38d37ca96
    (cherry picked from commit a99820a80cd99690c096553dcdc5e7f7472c02f1)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/604694
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74f1db72837fbbddd239680f4c85ae879e2837dd
Submitter: Zuul
Branch: stable/rocky

commit 74f1db72837fbbddd239680f4c85ae879e2837dd
Author: Lee Yarwood <email address hidden>
Date: Mon Sep 24 09:01:24 2018 +0100

    placement: Always reset conf.CONF when starting the wsgi app

    This ensures that options loaded during any prior run of the application
    are dropped before being added again during init_application.

    Change-Id: I41b5c7990d4d62a3a397f1686261f3fb7dc1a0be
    Closes-bug: #1784155
    (cherry picked from commit ac88b596c60f6c48c0e4c8e878a3ee70c4c2b756)
    (cherry picked from commit 601aa94a2fb9b0d1a884ddecc7a6a5e1f5f8686b)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/617297
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74ed2fe69f15d86252b1c7779c916cb93ffc252d
Submitter: Zuul
Branch: stable/rocky

commit 74ed2fe69f15d86252b1c7779c916cb93ffc252d
Author: Sean Mooney <email address hidden>
Date: Fri Oct 12 14:26:14 2018 +0100

    Harden placement init under wsgi

    - This change tries to address an edge case discovered
      when running placement under mod_wsgi: if the placement wsgi
      application is re-initialized, the db_api configure method
      attempts to reconfigure a started transaction factory.

    - Since oslo.db transaction factories do not support
      reconfiguration at runtime, this results in an exception being
      raised, preventing reloading of the Placement API without
      restarting apache to force mod_wsgi to recreate the
      python interpreter.

    - This change introduces a run_once decorator to allow annotating
      functions that should only be executed once for the lifetime of
      an interpreter.

    - This change applies the run_once decorator to the db_api configure
      method, to suppress the attempt to reconfigure the current
      TransactionFactory on application reload.

    NOTE(lyarwood): Simple positional conflict due to
    Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56 and
    I81d13418d75b46fbdb9f6d44889a207528c8d6de not being present in
    stable/rocky.

    Conflicts:
            nova/utils.py

    Co-Authored-By: Balazs Gibizer <email address hidden>
    Closes-Bug: #1799246
    Related-Bug: #1784155
    Change-Id: I704196711d30c1124e713ac31111a8ea6fa2f1ba
    (cherry picked from commit 00d08a3288f61268f8a823991fadf0d83c36c04d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/623227
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f88dfe5a2dc544bc1a46b8606aa729565bd7d17f
Submitter: Zuul
Branch: stable/queens

commit f88dfe5a2dc544bc1a46b8606aa729565bd7d17f
Author: Martin Schuppert <email address hidden>
Date: Thu Nov 22 15:08:11 2018 +0100

    Change step to start nova placement and make compute wait for it

    There is a deployment race where nova-placement fails to start if
    the nova api db migrations have not finished before it is started.
    We start nova placement early to make sure it is up before the
    nova-compute services get started. Since in an HA scenario there is
    no synchronization between the nodes within the current deployment
    step, we might have the situation that the placement service gets
    started on C1/2 while the nova api db sync is not yet finished on C0.

    We have two possibilities:
    1) start placement later and verify that nova-computes recover
    correctly
    2) verify that the db migration on the nova_api db finished before
    starting nova-placement on the controllers

    2), which was addressed via https://review.openstack.org/610966,
    showed problems:
    a) the docker/podman container failed to start with a file-not-found
    error, therefore this was reverted in https://review.openstack.org/619607

    b) when the scripts were running on different controllers at the same
    time, the way nova's db_version() is implemented has issues, which
    is being worked on in https://review.openstack.org/619622

    This patch addresses 1): it moves the placement service start to step_4
    and adds an additional task on the computes to wait until the placement
    service is up.

    Closes-Bug: #1784155

    Change-Id: Ifb5ffc4b25f5ca266560bc0ac96c73071ebd1c9f
    (cherry picked from commit cc61ff93ec41e149caba31cc21524f37def4d07e)
    (cherry picked from commit 3363bcbf8d7a4c16ec673361adf82c76213e1cbd)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/623228
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=605d58b1162cf2c3ca379da0908b614764cf379c
Submitter: Zuul
Branch: stable/queens

commit 605d58b1162cf2c3ca379da0908b614764cf379c
Author: Martin Schuppert <email address hidden>
Date: Mon Dec 3 17:04:09 2018 +0100

    nova_compute fails to start in tls-everywhere configuration

    With tls-everywhere enabled, connecting to the keystone endpoint fails
    to retrieve the URL for the placement endpoint because the certificate
    cannot be verified. While verification is disabled for the later check
    of the placement endpoint, it is not disabled for communication with
    keystone. This change disables certificate verification for
    communication with keystone as well.

    Related-Bug: 1784155

    Change-Id: I317dd62f3a555f375d540a63c21a6fb38d37ca96
    (cherry picked from commit a99820a80cd99690c096553dcdc5e7f7472c02f1)
    (cherry picked from commit 14af0677c87a64dd7ad96fad41d74da1db3d8e07)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.2.0

This issue was fixed in the openstack/tripleo-heat-templates 9.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.3.0

This issue was fixed in the openstack/tripleo-heat-templates 10.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.3.0

This issue was fixed in the openstack/tripleo-heat-templates 8.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
