ValueError: Field `instance_uuid' cannot be None

Bug #1633734 reported by Turbo Fredriksson
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Matt Riedemann
Newton                    In Progress   High        Matt Riedemann

Bug Description

I "accidentally" upgraded from Mitaka to Newton a few days ago and I'm still cleaning up "the mess" that introduced (I'm too used to Debian GNU/Linux packages taking care of all that for me).

Anyway, I'm now getting

    ValueError: Field `instance_uuid' cannot be None

in the nova-api log.

I've been looking at http://docs.openstack.org/releasenotes/nova/newton.html#upgrade-notes but I'm not sure what to do.

I've run

    nova-manage db online_data_migrations
    => ERROR nova.db.sqlalchemy.api [req-c08dbccb-d841-4e38-a895-26768f24222b - - - - -] Data migrations for PciDevice are not safe, likely because not all services that access the DB directly are updated to the latest version

    nova-manage db sync
    => ERROR: could not access cell mapping database - has api db been created?

    nova-manage api_db sync
    => Seems to run ok

    nova-manage cell_v2 discover_hosts
    => error: 'module' object has no attribute 'session'

    nova-manage cell_v2 map_cell0
    => Seemed like it ran ok

    nova-manage cell_v2 simple_cell_setup --transport-url rabbit://blabla/
    => Seemed like it ran ok

    nova-manage db null_instance_uuid_scan
    => There were no records found where instance_uuid was NULL.

Other than that, I'm not sure what the problem is.

Turbo Fredriksson (turbo-bayour) wrote :

Looking through the code, it seems it's not the database that's at fault, but the fact that build_request.py:_from_db_object() is called with a 'None' value for "db_req['instance_uuid']", which means that the call by cls._get_all_from_db() in build_request.py:get_all() returns a bogus value somehow/somewhere.

Since I'm admin, I'm assuming that the line in build_request.py:_get_all_from_db()

  query = context.session.query(api_models.BuildRequest)

is the problem.. ?
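
To make the failure mode above concrete, here is a minimal sketch of a non-nullable object field rejecting a NULL database column. `NonNullableField` and `load_build_request` are hypothetical stand-ins written for illustration, not Nova's actual classes; only the error message text is taken from the log above.

```python
class NonNullableField:
    """Hypothetical stand-in for a non-nullable object field."""

    def __init__(self, name):
        self.name = name

    def coerce(self, value):
        # Reject None with the same message text seen in the nova-api log.
        if value is None:
            raise ValueError("Field `%s' cannot be None" % self.name)
        return str(value)


def load_build_request(db_req):
    """Copy a DB row (a plain dict here) into a validated dict."""
    field = NonNullableField("instance_uuid")
    return {
        "id": db_req["id"],
        "instance_uuid": field.coerce(db_req["instance_uuid"]),
    }


# One well-formed row and one stale row with a NULL instance_uuid.
good_row = {"id": 1, "instance_uuid": "987e1a2f-c886-4f18-b798-60ce785e1056"}
stale_row = {"id": 265, "instance_uuid": None}

loaded = load_build_request(good_row)

try:
    load_build_request(stale_row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

If this sketch matches what happens in Nova, the query itself may be fine and the error would come from a row that has no instance_uuid at all.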

Turbo Fredriksson (turbo-bayour) wrote :

Btw, this is Nova v14.0.0.

Praveen N (praveenn)
Changed in nova:
assignee: nobody → Praveen N (praveenn)
MarkMielke (mark-mielke) wrote :

I hit this, and upon investigation I found that it appears to be stale records in nova_api_db.build_requests.

In my case, I think these are lost "scheduling" requests?

MariaDB [nova_api]> select * from build_requests;
+---------------------+------------+----+-----------------+----------------------------------+----------------------------------+--------------------------+-------------------+----------+----------+------------+--------------------------------------+--------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+--------------+----------+-----------+---------------+----------+-----------------------+
| created_at | updated_at | id | request_spec_id | project_id | user_id | display_name | instance_metadata | progress | vm_state | task_state | image_ref | access_ip_v4 | access_ip_v6 | info_cache | security_groups | config_drive | key_name | locked_by | instance_uuid | instance | block_device_mappings |
+---------------------+------------+----+-----------------+----------------------------------+----------------------------------+--------------------------+-------------------+----------+----------+------------+--------------------------------------+--------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+--------------+----------+-----------+---------------+----------+-----------------------+
| 2016-08-10 14:54:33 | NULL | 89 | 281 | a392bb9328fc43f4ae44fe14db3520fd | 0ea46d4298a5447f8e0b552c810b10bd | bp2_prod_1 | {} | 0 | building | scheduling | b2a5a52c-4d39-459c-9001-4e49abfba593 | NULL | NULL | {"nova_object.version": "1.5", "nova_object.changes": ["instance_uuid", "network_info"], "nova_object.name": "InstanceInfoCache", "nova_object.data": {"instance_uuid": "987e1a2f-c886-4f18-b798-60ce785e1056", "network_info": "[]"}, "nova_object.namespace": "nova"} | {"nova_object.version": "1.0", "nova_object.name": "SecurityGroupList", "nova_object.data": {...


Turbo Fredriksson (turbo-bayour) wrote :

Yeah, I have a huge number (123) of those as well.

*************************** 1. row ***************************
           created_at: 2016-08-02 20:26:20
           updated_at: NULL
                   id: 265
      request_spec_id: 415
           project_id: 04ee0e71babe4fd7aa16c3f64a8fca89
              user_id: 4b0e25c70d2b4ad6ba4c50250f2f0b0b
         display_name: admin-auth
    instance_metadata: {<redacted>}
             progress: 0
             vm_state: building
           task_state: scheduling
            image_ref:
         access_ip_v4: NULL
         access_ip_v6: NULL
           info_cache: {"nova_object.version": "1.5", "nova_object.changes": ["instance_uuid", "network_info"], "nova_object.name": "InstanceInfoCache", "nova_object.data": {"instan
      security_groups: {"nova_object.version": "1.0", "nova_object.name": "SecurityGroupList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}
         config_drive: 1
             key_name: Turbo_Fredriksson
            locked_by: NULL
        instance_uuid: NULL
             instance: NULL
block_device_mappings: NULL
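
A quick way to count such rows is to query for a NULL instance_uuid. The sketch below uses an in-memory SQLite table as a stand-in for nova_api.build_requests, with only three of the real columns and made-up sample rows; against a real deployment the same SELECT would be run in the MariaDB client instead.

```python
import sqlite3

# In-memory stand-in for the nova_api database (assumption: only a few of
# the real columns are modelled, and the rows below are sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE build_requests ("
    "id INTEGER PRIMARY KEY, display_name TEXT, instance_uuid TEXT)"
)
conn.executemany(
    "INSERT INTO build_requests (id, display_name, instance_uuid) "
    "VALUES (?, ?, ?)",
    [
        (265, "admin-auth", None),
        (266, "sample-stale", None),
        (300, "sample-ok", "987e1a2f-c886-4f18-b798-60ce785e1056"),
    ],
)


def count_orphaned(connection):
    """Count build requests whose instance_uuid is NULL."""
    # NULL must be matched with IS NULL; "= NULL" never matches any row.
    row = connection.execute(
        "SELECT COUNT(*) FROM build_requests WHERE instance_uuid IS NULL"
    ).fetchone()
    return row[0]


orphaned = count_orphaned(conn)
```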

Turbo Fredriksson (turbo-bayour) wrote :

@MarkMielke Did you just clear the table?

Andrew Laski (alaski) wrote :

In looking at this with Melanie it appears that there is a code path in Mitaka that would allow this to happen. And it then becomes an issue in Newton because there is code that uses the build_requests table for instance list/show requests.

I'm not sure it's worth trying to address the cause of this at this point since we'll want some proactive protection against these bad entries in Newton for deployments that already have these faulty entries. But something like https://review.openstack.org/#/c/357396/8 is what would address it, the cleanup in the 'except Exception' block.

A good fix would be to have an online migration that looks for and cleans these up during upgrade, or to have Nova log these when found and then ignore them in the API response. They correspond to failed instance boots so it's okay to clean up. And there will most likely be an instance record that corresponds to the build request, and if not it's because the instance record could not be created in the db, so it's safe to clean up either way.
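
The "log these when found and then ignore them" option could look roughly like the sketch below. It works on plain dicts rather than Nova objects, and `usable_build_requests` is a hypothetical helper name, not an existing Nova function.

```python
import logging

LOG = logging.getLogger(__name__)


def usable_build_requests(db_rows):
    """Return only rows that have an instance_uuid, logging the rest."""
    usable = []
    for row in db_rows:
        if row.get("instance_uuid") is None:
            # A row without an instance_uuid is a leftover from a failed
            # boot; skip it instead of failing the whole list request.
            LOG.warning(
                "Skipping build request %s: instance_uuid is not set",
                row.get("id"),
            )
            continue
        usable.append(row)
    return usable


# Sample data: one stale row and one healthy row.
rows = [
    {"id": 265, "instance_uuid": None},
    {"id": 300, "instance_uuid": "987e1a2f-c886-4f18-b798-60ce785e1056"},
]
result = usable_build_requests(rows)
```

This keeps the API response working but leaves the bad rows in place, so it would complement a cleanup migration rather than replace it.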

Matt Riedemann (mriedem) wrote :

I'd be OK with adding a newton-only online database migration, which gets run when you upgrade to newton anyway, to sniff out build requests without an instance_uuid set and delete them.

Changed in nova:
status: New → Confirmed
Matt Riedemann (mriedem) wrote :

If I'm reading this correctly too, the problem is the build requests aren't cleaned up from failed schedules in mitaka, and in mitaka the BuildRequest object didn't have an instance_uuid field (in the object or in the data model). That was added in newton here:

https://github.com/openstack/nova/commit/d789f6eef9052a0f8ee1987e7e3ab895581f264f

So yeah, if we get to newton with any build requests which don't have instance_uuid set, they are going to fail to load and be unusable. So we should have a data migration in newton that deletes any build requests that don't have an instance_uuid set, as those are an indication of failed schedules from mitaka and won't work with the newton code.
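
A data migration along those lines could be sketched as a batched delete. The example below runs against an in-memory SQLite table standing in for the API DB; the function name, the reduced schema, and the (found, deleted) return value are assumptions made for illustration, not the code in the actual fix.

```python
import sqlite3


def delete_orphaned_build_requests(conn, max_count):
    """Delete up to max_count build requests that have no instance_uuid.

    Returns (found, deleted) so a caller can loop until found == 0.
    """
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM build_requests "
            "WHERE instance_uuid IS NULL ORDER BY id LIMIT ?",
            (max_count,),
        )
    ]
    for row_id in ids:
        conn.execute("DELETE FROM build_requests WHERE id = ?", (row_id,))
    conn.commit()
    return len(ids), len(ids)


# Sample data: three orphaned rows and two healthy ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE build_requests (id INTEGER PRIMARY KEY, instance_uuid TEXT)"
)
conn.executemany(
    "INSERT INTO build_requests (id, instance_uuid) VALUES (?, ?)",
    [(1, None), (2, "uuid-a"), (3, None), (4, "uuid-b"), (5, None)],
)

first = delete_orphaned_build_requests(conn, max_count=2)
second = delete_orphaned_build_requests(conn, max_count=2)
third = delete_orphaned_build_requests(conn, max_count=2)
remaining = conn.execute("SELECT COUNT(*) FROM build_requests").fetchone()[0]
```

Working in batches bounded by max_count keeps each run short, and repeated runs converge once no orphaned rows remain.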

Matt Riedemann (mriedem) wrote :

BTW, do you have a stacktrace of where the actual "ValueError: Field `instance_uuid' cannot be None" error occurs in the compute API code in Newton?

Matt Riedemann (mriedem) wrote :

Ignore comment 9, I see it pointed out already when loading up the build request with the instance_uuid=None. I'm working on a test to recreate.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/408725

Matt Riedemann (mriedem) wrote :

Hmm, I guess people could skip newton and upgrade from mitaka directly to ocata (people do try to skip releases), so we should probably fix this in master (ocata) too, and then backport to newton.

Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/408727

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/newton)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/408725
Reason: Abandon this for now, needs to be worked on master first for anyone upgrading from mitaka to ocata directly, which people try to do.

Matt Riedemann (mriedem)
Changed in nova:
assignee: Praveen N (praveenn) → nobody
Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/408727
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ab05b90295c64ff7e6925591dd87a467ef85c00f
Submitter: Jenkins
Branch: master

commit ab05b90295c64ff7e6925591dd87a467ef85c00f
Author: Matt Riedemann <email address hidden>
Date: Thu Dec 8 12:09:59 2016 -0500

    Provide an online data migration to cleanup orphaned build requests

    This exhibits the failure reported in bug 1633734 when upgrading
    from mitaka to newton with some bad build request records that
    weren't cleaned up, and were created before API DB migration
    013_build_request_extended_attrs when we didn't have the
    instance_uuid or instance records in the database.

    After 013_build_request_extended_attrs and the object change to
    BuildRequest in a5d3b57c3d4fb785c5f5eebf2559e495595a6b34 if we try
    loading up a 'dirty' build request DB record without the
    instance_uuid it fails with a ValueError, as shown in the functional
    test in this change.

    This also provides an online data migration (which will be backported
    to Newton for upgrades from Mitaka) that will query the API DB for
    build requests where instance_uuid=NULL and delete them.

    Change-Id: I8a05ee01ec7f6a6f88b896f78414fb5487e0071e
    Related-Bug: #1633734

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/408725
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=608105a7c508ed48055dca6e2d327c944ecb18ea
Submitter: Jenkins
Branch: stable/newton

commit 608105a7c508ed48055dca6e2d327c944ecb18ea
Author: Matt Riedemann <email address hidden>
Date: Thu Dec 8 12:09:59 2016 -0500

    Provide an online data migration to cleanup orphaned build requests

    This exhibits the failure reported in bug 1633734 when upgrading
    from mitaka to newton with some bad build request records that
    weren't cleaned up, and were created before API DB migration
    013_build_request_extended_attrs when we didn't have the
    instance_uuid or instance records in the database.

    After 013_build_request_extended_attrs and the object change to
    BuildRequest in a5d3b57c3d4fb785c5f5eebf2559e495595a6b34 if we try
    loading up a 'dirty' build request DB record without the
    instance_uuid it fails with a ValueError, as shown in the functional
    test in this change.

    This also provides an online data migration (which will be backported
    to Newton for upgrades from Mitaka) that will query the API DB for
    build requests where instance_uuid=NULL and delete them.

    Change-Id: I8a05ee01ec7f6a6f88b896f78414fb5487e0071e
    Related-Bug: #1633734
    (cherry picked from commit ab05b90295c64ff7e6925591dd87a467ef85c00f)

tags: added: in-stable-newton