N->O upgrade fail on controller upgrade step 5.

Bug #1667147 reported by Sofer Athlan-Guyot
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: ocata-rc2
Revision history for this message
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

The actual error is this:

    Feb 22 14:55:37.368551
    centos-7-2-node-osic-cloud1-s3700-7418594-436867
    os-collect-config[61863]: [2017-02-22 14:55:37,367] (heat-config)
    [INFO] {"deploy_stdout": "\nPLAY [localhost]
    ***************************************************************\n\nTASK
    [setup]
    *******************************************************************\nok:
    [localhost]\n\nTASK [Sync sahara_engine DB]
    ***************************************************\nchanged:
    [localhost]\n\nTASK [get bootstrap nodeid]
    ****************************************************\nchanged:
    [localhost]\n\nTASK [set is_bootstrap_node fact]
    **********************************************\nok:
    [localhost]\n\nTASK [Create puppet manifest to set transport_url
    in nova.conf] ****************\nchanged: [localhost]\n\nTASK [Run
    puppet apply to set tranport_url in nova.conf]
    ***********************\nchanged: [localhost]\n\nTASK [Setup
    cell_v2 (map cell0)]
    ***********************************************\nfatal:
    [localhost]: FAILED! => {\"changed\": true, \"cmd\":
    [\"nova-manage\", \"cell_v2\", \"map_cell0\"], \"delta\":
    \"0:00:03.413361\", \"end\": \"2017-02-22 14:55:37.321137\",
    \"failed\": true, \"rc\": 1, \"start\": \"2017-02-22
    14:55:33.907776\", \"stderr\": \"\", \"stdout\": \"An error has
    occurred:\\nTraceback (most recent call last):\\n File
    \\\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\\", line
    1598, in main\\n ret = fn(*fn_args, **fn_kwargs)\\n File
    \\\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\\", line
    1140, in map_cell0\\n
    self._map_cell0(database_connection=database_connection)\\n File
    \\\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\\", line
    1170, in _map_cell0\\n cell_mapping.create()\\n File
    \\\"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\\\",
    line 226, in wrapper\\n return fn(self, *args, **kwargs)\\n File
    \\\"/usr/lib/python2.7/site-packages/nova/objects/cell_mapping.py\\\",
    line 71, in create\\n db_mapping =
    self._create_in_db(self._context, self.obj_get_changes())\\n File
    \\\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\\",
    line 893, in wrapper\\n with sel

    Feb 22 14:55:37.370010
    centos-7-2-node-osic-cloud1-s3700-7418594-436867
    os-collect-config[61863]: f._transaction_scope(context):\\n File
    \\\"/usr/lib64/python2.7/contextlib.py\\\", line 17, in
    __enter__\\n return self.gen.next()\\n File
    \\\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\\",
    line 944, in _transaction_scope\\n allow_async=self._allow_async)
    as resource:\\n File \\\"/usr/lib64/python2.7/contextlib.py\\\",
    line 17, in __enter__\\n return self.gen.next()\\n File
    \\\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\\",
    line 558, in _session\\n bind=self.connection, mode=self.mode)\\n
    File
    \\\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\\",
    line 317, in _create_session\\n self._start()\\n File
    \\\"/usr/lib/python2.7/site-packages/oslo_db/sqla...


Alex Schultz (alex-schultz) wrote :
Marios Andreou (marios-b) wrote :

I am not sure why mcornea and I are not hitting this in our downstream envs. For my latest run yesterday I was patching like [0], which includes the nova patch at https://review.openstack.org/#/c/405241/, so it was definitely applied (it would have nuked the file on my run too, wrt Alex's question in comment #2 above, or it's some kind of race :/ ??)

I have reset to latest puddle today so will be running it again assuming other things don't break before then ;)

[0]
    # NOTE: these patches were applied to openstack-tripleo-heat-templates-6.0.0-0.20170214010958.el7ost.noarch
    # Back up the templates in case you want to do a diff/sanity check:
    sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG

    # https://review.openstack.org/#/c/431398/1 Apply post-upgrade step to not run puppet in post upgrade
    curl "https://review.openstack.org/changes/431398/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/433641/ Apply puppet in non-controller script in step.
    curl "https://review.openstack.org/changes/433641/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/434305/ Add explicit swift check to tripleo_upgrade_node.sh
    curl "https://review.openstack.org/changes/434305/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/434367/ Add manual ceph-osd upgrade if operator prefers this
    curl "https://review.openstack.org/changes/434367/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/435339 Install nova-placement package on upgrade
    curl "https://review.openstack.org/changes/435339/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # Add major-upgrade-converge environment. https://review.openstack.org/#/c/434468/
    curl "https://review.openstack.org/changes/434468/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/405241/ Add nova service support for composable upgrades
    curl "https://review.openstack.org/changes/405241/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # Add Newton to Ocata UpgradeInitCommonCommand https://review.openstack.org/#/c/424715/
    curl "https://review.openstack.org/changes/424715/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.openstack.org/#/c/436150/ Stop nova-api before upgrading package
    curl "https://review.openstack.org/changes/436150/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

    # https://review.op...
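
The repeated curl | base64 -d | patch steps in [0] could be wrapped in a small helper. This is only a sketch: the function names and the THT_DIR variable are hypothetical, and only the URL pattern and the target directory are taken from the commands quoted in this comment. It downloads to a temporary file first, so a failed download is reported instead of being piped into patch.

```shell
#!/bin/sh
# Hypothetical helper names; only the URL pattern and the target
# directory come from the commands quoted in the comment.
THT_DIR=/usr/share/openstack-tripleo-heat-templates

# Build the Gerrit download URL for the current revision of a change.
gerrit_patch_url() {
    printf 'https://review.openstack.org/changes/%s/revisions/current/patch?download\n' "$1"
}

# Download one change, base64-decode it, and apply it to the templates tree.
apply_gerrit_change() {
    tmp=$(mktemp) || return 1
    if curl -fsS "$(gerrit_patch_url "$1")" -o "$tmp" &&
        base64 -d "$tmp" | sudo patch -d "$THT_DIR" -p1; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "failed to apply change $1" >&2
        return 1
    fi
}

# Usage (not run here), keeping the order used in the list:
# for change in 431398 433641 434305; do
#     apply_gerrit_change "$change" || break
# done
```

Stopping at the first failure matters here because later changes may depend on earlier ones having applied cleanly.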


Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

@Alex, I tried the snippet you mentioned and this one https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/nova-compute.yaml#L161-L165 and they didn't nuke the nova.conf. For the puppet part, it cannot be a purge => true as the nova.conf has the compute_level set. It's as if the nova.conf was empty to start with and then filled up by the two changes done to it during the upgrade.

@Marios, it would be interesting to see why you and Marius don't have this error.

Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

Hi,

If you look at another job without the patch that includes nova during the upgrade, you can see that nova is not installed on the overcloud:

   http://logs.openstack.org/01/414601/33/check/gate-tripleo-ci-centos-7-multinode-upgrades-nv/10c90a0/logs/subnode-2/var/log/

There is no nova directory.

So the problem is that the upgrade job doesn't deploy nova, which explains why we have this empty configuration file.

Thanks to Emilien for the pointer.
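
For anyone wanting to guard against this case, a pre-check along these lines might help. It is a minimal sketch, assuming map_cell0 needs a non-empty connection option in the [api_database] section of nova.conf; that assumption is not confirmed anywhere in this report, and the function name conf_has_option is hypothetical.

```shell
#!/bin/sh
# Succeeds only if FILE has a non-empty KEY inside [SECTION].
# Usage: conf_has_option FILE SECTION KEY
conf_has_option() {
    awk -v section="[$2]" -v key="$3" '
        # A section header switches the "inside target section" flag.
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            split($0, kv, "=")
            gsub(/[ \t]/, "", kv[1])
            val = kv[2]
            gsub(/[ \t]/, "", val)
            if (kv[1] == key && val != "") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage (not run here; the section and key are the assumption stated above):
# if conf_has_option /etc/nova/nova.conf api_database connection; then
#     nova-manage cell_v2 map_cell0
# else
#     echo "nova.conf has no api_database connection; skipping map_cell0" >&2
# fi
```

An empty or missing nova.conf, as seen in this job, makes the check fail, so the cell_v2 step would be skipped rather than ending in a traceback.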

Marios Andreou (marios-b) wrote :

So, fwiw, I just confirmed on my fresh deployment today (reset to vanilla Newton this morning) that it passed nova upgrade step 5 OK. It looks like http://paste.openstack.org/raw/600243/ while it lasts (playbook recap for step 5: Feb 23 15:15:10 overcloud-controller-0.localdomain os-collect-config[4595]: localhost : ok=13 changed=11 unreachable=0 failed=0)
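
A recap line like the one quoted here can be checked mechanically. The following is a rough sketch with a hypothetical function name (recap_ok). It reads log text on stdin and succeeds only if at least one play recap line is present and every recap line shows unreachable=0 and failed=0.

```shell
#!/bin/sh
# Reads log text on stdin. Exit 0 only when a recap line was seen and
# none of the recap lines reports unreachable or failed hosts.
recap_ok() {
    awk '
        /ok=[0-9]+ +changed=[0-9]+ +unreachable=[0-9]+ +failed=[0-9]+/ {
            seen = 1
            if ($0 !~ /unreachable=0 +failed=0([^0-9]|$)/) bad = 1
        }
        END { exit((seen && !bad) ? 0 : 1) }
    '
}

# Usage (not run here; the unit name is an assumption):
# journalctl -u os-collect-config | recap_ok && echo "step passed"
```

Requiring at least one recap line avoids treating an empty or truncated log as a pass.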

Emilien Macchi (emilienm) wrote :
Changed in tripleo:
status: New → Fix Released
milestone: none → ocata-rc2
importance: Undecided → Critical