Metadata API cross joining instance_metadata and instance_system_metadata

Bug #1799298 reported by Sergio de Carvalho on 2018-10-22
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Unassigned
Ocata
Medium
Unassigned
Pike
Medium
Unassigned
Queens
Medium
Unassigned
Rocky
Medium
Unassigned

Bug Description

Description
===========

While troubleshooting a production issue we identified that the Nova metadata API is fetching a lot more raw data from the database than seems necessary. The problem appears to be caused by the SQL query used to fetch instance data, which joins the "instance" table with, among others, two metadata tables: "instance_metadata" and "instance_system_metadata". Below is a simplified version of this query which was captured by adding extra logging (the full query is listed at the end of this bug report):

SELECT ...
  FROM (SELECT ...
          FROM `instances`
         WHERE `instances` . `deleted` = ?
           AND `instances` . `uuid` = ?
         LIMIT ?) AS `anon_1`
  LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
    ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . `instance_uuid`
  LEFT OUTER JOIN (`security_group_instance_association` AS `security_group_instance_association_1`
                   INNER JOIN `security_groups` AS `security_groups_1`
                   ON `security_groups_1` . `id` = `security_group_instance_association_1` . `security_group_id`
                   AND `security_group_instance_association_1` . `deleted` = ?
                   AND `security_groups_1` . `deleted` = ? )
    ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `anon_1` . `instances_deleted` = ?
  LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
    ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . `id`
   AND `security_group_rules_1` . `deleted` = ?
  LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
    ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
    ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
    ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `instance_metadata_1` . `deleted` = ?

The instance table has a 1-to-many relationship to both "instance_metadata" and "instance_system_metadata" tables, so the query is effectively producing a cross join of both metadata tables.

Steps to reproduce
==================

To illustrate the impact of this query, add 2 properties to a running instance and verify that it has 2 records in "instance_metadata", as well as other records in "instance_system_metadata" such as base image properties:

> select instance_uuid,`key`,value from instance_metadata where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+--------------------------------------+-----------+--------+
| instance_uuid | key | value |
+--------------------------------------+-----------+--------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value |
+--------------------------------------+-----------+--------+
2 rows in set (0.61 sec)

> select instance_uuid,`key`,valusystem_metadata where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+------------------------+--------------------------------------+
| key | value |
+------------------------+--------------------------------------+
| image_disk_format | qcow2 |
| image_min_ram | 0 |
| image_min_disk | 20 |
| image_base_image_ref | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare |
+------------------------+--------------------------------------+
5 rows in set (0.00 sec)

For this particular instance, the generated query used by the metadata API will fetch 10 records from the database:

+--------------------------------------+-------------------------+---------------------------+--------------------------------+--------------------------------------+
| anon_1_instances_uuid | instance_metadata_1_key | instance_metadata_1_value | instance_system_metadata_1_key | instance_system_metadata_1_value |
+--------------------------------------+-------------------------+---------------------------+--------------------------------+--------------------------------------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 | image_disk_format | qcow2 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value | image_disk_format | qcow2 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 | image_min_ram | 0 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value | image_min_ram | 0 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 | image_min_disk | 20 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value | image_min_disk | 20 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 | image_base_image_ref | 39cd564f-6a29-43e2-815b-62097968486a |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value | image_base_image_ref | 39cd564f-6a29-43e2-815b-62097968486a |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 | image_container_format | bare |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value | image_container_format | bare |
+--------------------------------------+-------------------------+---------------------------+--------------------------------+--------------------------------------+
10 rows in set (0.00 sec)

Of course this is only a problem when instances have a lot of metadata records. An instance with 50 rows in "instance_metadata" and 50 rows in "instance_system_metadata" will fetch 50 x 50 = 2,500 rows from the database. It's not difficult to see how this can escalate quickly. This can be a particularly significant problem in a HA scenario with multiple API nodes pulling data from multiple database nodes.

This issue is affecting clusters running OpenStack Mitaka. I verified that this is not an issue on clusters running Icehouse because, in Icehouse, instance data is pulled as needed, executing separate queries for each table. However, I as far as I could see, this issue could be affecting every release since Mitaka.

Full SQL
========

The generated SQL query below was captured by adding extra logging to _build_instance_get in

https://github.com/openstack/nova/blob/mitaka-eol/nova/db/sqlalchemy/api.py#L2005

SELECT `anon_1` . `instances_created_at` AS `anon_1_instances_created_at`,
       `anon_1` . `instances_updated_at` AS `anon_1_instances_updated_at`,
       `anon_1` . `instances_deleted_at` AS `anon_1_instances_deleted_at`,
       `anon_1` . `instances_deleted` AS `anon_1_instances_deleted`,
       `anon_1` . `instances_id` AS `anon_1_instances_id`,
       `anon_1` . `instances_user_id` AS `anon_1_instances_user_id`,
       `anon_1` . `instances_project_id` AS `anon_1_instances_project_id`,
       `anon_1` . `instances_image_ref` AS `anon_1_instances_image_ref`,
       `anon_1` . `instances_kernel_id` AS `anon_1_instances_kernel_id`,
       `anon_1` . `instances_ramdisk_id` AS `anon_1_instances_ramdisk_id`,
       `anon_1` . `instances_hostname` AS `anon_1_instances_hostname`,
       `anon_1` . `instances_launch_index` AS `anon_1_instances_launch_index`,
       `anon_1` . `instances_key_name` AS `anon_1_instances_key_name`,
       `anon_1` . `instances_key_data` AS `anon_1_instances_key_data`,
       `anon_1` . `instances_power_state` AS `anon_1_instances_power_state`,
       `anon_1` . `instances_vm_state` AS `anon_1_instances_vm_state`,
       `anon_1` . `instances_task_state` AS `anon_1_instances_task_state`,
       `anon_1` . `instances_memory_mb` AS `anon_1_instances_memory_mb`,
       `anon_1` . `instances_vcpus` AS `anon_1_instances_vcpus`,
       `anon_1` . `instances_root_gb` AS `anon_1_instances_root_gb`,
       `anon_1` . `instances_ephemeral_gb` AS `anon_1_instances_ephemeral_gb`,
       `anon_1` . `instances_ephemeral_key_uuid` AS `anon_1_instances_ephemeral_key_uuid`,
       `anon_1` . `instances_host` AS `anon_1_instances_host`,
       `anon_1` . `instances_node` AS `anon_1_instances_node`,
       `anon_1` . `instances_instance_type_id` AS `anon_1_instances_instance_type_id`,
       `anon_1` . `instances_user_data` AS `anon_1_instances_user_data`,
       `anon_1` . `instances_reservation_id` AS `anon_1_instances_reservation_id`,
       `anon_1` . `instances_launched_at` AS `anon_1_instances_launched_at`,
       `anon_1` . `instances_terminated_at` AS `anon_1_instances_terminated_at`,
       `anon_1` . `instances_availability_zone` AS `anon_1_instances_availability_zone`,
       `anon_1` . `instances_display_name` AS `anon_1_instances_display_name`,
       `anon_1` . `instances_display_description` AS `anon_1_instances_display_description`,
       `anon_1` . `instances_launched_on` AS `anon_1_instances_launched_on`,
       `anon_1` . `instances_locked` AS `anon_1_instances_locked`,
       `anon_1` . `instances_locked_by` AS `anon_1_instances_locked_by`,
       `anon_1` . `instances_os_type` AS `anon_1_instances_os_type`,
       `anon_1` . `instances_architecture` AS `anon_1_instances_architecture`,
       `anon_1` . `instances_vm_mode` AS `anon_1_instances_vm_mode`,
       `anon_1` . `instances_uuid` AS `anon_1_instances_uuid`,
       `anon_1` . `instances_root_device_name` AS `anon_1_instances_root_device_name`,
       `anon_1` . `instances_default_ephemeral_device` AS `anon_1_instances_default_ephemeral_device`,
       `anon_1` . `instances_default_swap_device` AS `anon_1_instances_default_swap_device`,
       `anon_1` . `instances_config_drive` AS `anon_1_instances_config_drive`,
       `anon_1` . `instances_access_ip_v4` AS `anon_1_instances_access_ip_v4`,
       `anon_1` . `instances_access_ip_v6` AS `anon_1_instances_access_ip_v6`,
       `anon_1` . `instances_auto_disk_config` AS `anon_1_instances_auto_disk_config`,
       `anon_1` . `instances_progress` AS `anon_1_instances_progress`,
       `anon_1` . `instances_shutdown_terminate` AS `anon_1_instances_shutdown_terminate`,
       `anon_1` . `instances_disable_terminate` AS `anon_1_instances_disable_terminate`,
       `anon_1` . `instances_cell_name` AS `anon_1_instances_cell_name`,
       `anon_1` . `instances_internal_id` AS `anon_1_instances_internal_id`,
       `anon_1` . `instances_cleaned` AS `anon_1_instances_cleaned`,
       `instance_system_metadata_1` . `created_at` AS `instance_system_metadata_1_created_at`,
       `instance_system_metadata_1` . `updated_at` AS `instance_system_metadata_1_updated_at`,
       `instance_system_metadata_1` . `deleted_at` AS `instance_system_metadata_1_deleted_at`,
       `instance_system_metadata_1` . `deleted` AS `instance_system_metadata_1_deleted`,
       `instance_system_metadata_1` . `id` AS `instance_system_metadata_1_id`,
       `instance_system_metadata_1` . `key` AS `instance_system_metadata_1_key`,
       `instance_system_metadata_1` . `value` AS `instance_system_metadata_1_value`,
       `instance_system_metadata_1` . `instance_uuid` AS `instance_system_metadata_1_instance_uuid`,
       `security_groups_1` . `created_at` AS `security_groups_1_created_at`,
       `security_groups_1` . `updated_at` AS `security_groups_1_updated_at`,
       `security_groups_1` . `deleted_at` AS `security_groups_1_deleted_at`,
       `security_groups_1` . `deleted` AS `security_groups_1_deleted`,
       `security_groups_1` . `id` AS `security_groups_1_id`,
       `security_groups_1` . `name` AS `security_groups_1_name`,
       `security_groups_1` . `description` AS `security_groups_1_description`,
       `security_groups_1` . `user_id` AS `security_groups_1_user_id`,
       `security_groups_1` . `project_id` AS `security_groups_1_project_id`,
       `security_group_rules_1` . `created_at` AS `security_group_rules_1_created_at`,
       `security_group_rules_1` . `updated_at` AS `security_group_rules_1_updated_at`,
       `security_group_rules_1` . `deleted_at` AS `security_group_rules_1_deleted_at`,
       `security_group_rules_1` . `deleted` AS `security_group_rules_1_deleted`,
       `security_group_rules_1` . `id` AS `security_group_rules_1_id`,
       `security_group_rules_1` . `parent_group_id` AS `security_group_rules_1_parent_group_id`,
       `security_group_rules_1` . `protocol` AS `security_group_rules_1_protocol`,
       `security_group_rules_1` . `from_port` AS `security_group_rules_1_from_port`,
       `security_group_rules_1` . `to_port` AS `security_group_rules_1_to_port`,
       `security_group_rules_1` . `cidr` AS `security_group_rules_1_cidr`,
       `security_group_rules_1` . `group_id` AS `security_group_rules_1_group_id`,
       `instance_info_caches_1` . `created_at` AS `instance_info_caches_1_created_at`,
       `instance_info_caches_1` . `updated_at` AS `instance_info_caches_1_updated_at`,
       `instance_info_caches_1` . `deleted_at` AS `instance_info_caches_1_deleted_at`,
       `instance_info_caches_1` . `deleted` AS `instance_info_caches_1_deleted`,
       `instance_info_caches_1` . `id` AS `instance_info_caches_1_id`,
       `instance_info_caches_1` . `network_info` AS `instance_info_caches_1_network_info`,
       `instance_info_caches_1` . `instance_uuid` AS `instance_info_caches_1_instance_uuid`,
       `instance_extra_1` . `flavor` AS `instance_extra_1_flavor`,
       `instance_extra_1` . `created_at` AS `instance_extra_1_created_at`,
       `instance_extra_1` . `updated_at` AS `instance_extra_1_updated_at`,
       `instance_extra_1` . `deleted_at` AS `instance_extra_1_deleted_at`,
       `instance_extra_1` . `deleted` AS `instance_extra_1_deleted`,
       `instance_extra_1` . `id` AS `instance_extra_1_id`,
       `instance_extra_1` . `instance_uuid` AS `instance_extra_1_instance_uuid`,
       `instance_metadata_1` . `created_at` AS `instance_metadata_1_created_at`,
       `instance_metadata_1` . `updated_at` AS `instance_metadata_1_updated_at`,
       `instance_metadata_1` . `deleted_at` AS `instance_metadata_1_deleted_at`,
       `instance_metadata_1` . `deleted` AS `instance_metadata_1_deleted`,
       `instance_metadata_1` . `id` AS `instance_metadata_1_id`,
       `instance_metadata_1` . `key` AS `instance_metadata_1_key`,
       `instance_metadata_1` . `value` AS `instance_metadata_1_value`,
       `instance_metadata_1` . `instance_uuid` AS `instance_metadata_1_instance_uuid`
  FROM (SELECT `instances` . `created_at` AS `instances_created_at`,
               `instances` . `updated_at` AS `instances_updated_at`,
               `instances` . `deleted_at` AS `instances_deleted_at`,
               `instances` . `deleted` AS `instances_deleted`,
               `instances` . `id` AS `instances_id`,
               `instances` . `user_id` AS `instances_user_id`,
               `instances` . `project_id` AS `instances_project_id`,
               `instances` . `image_ref` AS `instances_image_ref`,
               `instances` . `kernel_id` AS `instances_kernel_id`,
               `instances` . `ramdisk_id` AS `instances_ramdisk_id`,
               `instances` . `hostname` AS `instances_hostname`,
               `instances` . `launch_index` AS `instances_launch_index`,
               `instances` . `key_name` AS `instances_key_name`,
               `instances` . `key_data` AS `instances_key_data`,
               `instances` . `power_state` AS `instances_power_state`,
               `instances` . `vm_state` AS `instances_vm_state`,
               `instances` . `task_state` AS `instances_task_state`,
               `instances` . `memory_mb` AS `instances_memory_mb`,
               `instances` . `vcpus` AS `instances_vcpus`,
               `instances` . `root_gb` AS `instances_root_gb`,
               `instances` . `ephemeral_gb` AS `instances_ephemeral_gb`,
               `instances` . `ephemeral_key_uuid` AS `instances_ephemeral_key_uuid`,
               `instances` . `host` AS `instances_host`,
               `instances` . `node` AS `instances_node`,
               `instances` . `instance_type_id` AS `instances_instance_type_id`,
               `instances` . `user_data` AS `instances_user_data`,
               `instances` . `reservation_id` AS `instances_reservation_id`,
               `instances` . `launched_at` AS `instances_launched_at`,
               `instances` . `terminated_at` AS `instances_terminated_at`,
               `instances` . `availability_zone` AS `instances_availability_zone`,
               `instances` . `display_name` AS `instances_display_name`,
               `instances` . `display_description` AS `instances_display_description`,
               `instances` . `launched_on` AS `instances_launched_on`,
               `instances` . `locked` AS `instances_locked`,
               `instances` . `locked_by` AS `instances_locked_by`,
               `instances` . `os_type` AS `instances_os_type`,
               `instances` . `architecture` AS `instances_architecture`,
               `instances` . `vm_mode` AS `instances_vm_mode`,
               `instances` . `uuid` AS `instances_uuid`,
               `instances` . `root_device_name` AS `instances_root_device_name`,
               `instances` . `default_ephemeral_device` AS `instances_default_ephemeral_device`,
               `instances` . `default_swap_device` AS `instances_default_swap_device`,
               `instances` . `config_drive` AS `instances_config_drive`,
               `instances` . `access_ip_v4` AS `instances_access_ip_v4`,
               `instances` . `access_ip_v6` AS `instances_access_ip_v6`,
               `instances` . `auto_disk_config` AS `instances_auto_disk_config`,
               `instances` . `progress` AS `instances_progress`,
               `instances` . `shutdown_terminate` AS `instances_shutdown_terminate`,
               `instances` . `disable_terminate` AS `instances_disable_terminate`,
               `instances` . `cell_name` AS `instances_cell_name`,
               `instances` . `internal_id` AS `instances_internal_id`,
               `instances` . `cleaned` AS `instances_cleaned`
          FROM `instances`
         WHERE `instances` . `deleted` = ?
           AND `instances` . `uuid` = ?
         LIMIT ?) AS `anon_1`
  LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
    ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . `instance_uuid`
  LEFT OUTER JOIN (`security_group_instance_association` AS `security_group_instance_association_1`
                   INNER JOIN `security_groups` AS `security_groups_1`
                   ON `security_groups_1` . `id` = `security_group_instance_association_1` . `security_group_id`
                   AND `security_group_instance_association_1` . `deleted` = ?
                   AND `security_groups_1` . `deleted` = ? )
    ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `anon_1` . `instances_deleted` = ?
  LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
    ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . `id`
   AND `security_group_rules_1` . `deleted` = ?
  LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
    ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
    ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
    ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `instance_metadata_1` . `deleted` = ?

Environment
===========

1. Exact version of OpenStack you are running.

OpenStack Mitaka

openstack-nova-api.noarch 1:13.1.4-1.el7

Code suggests this could be affecting every release since Mitaka.

2. Which hypervisor did you use?

qemu-kvm.x86_64 10:1.5.3-141.el7_4.6
qemu-kvm-common.x86_64 10:1.5.3-141.el7_4.6

Reference
=========

Discussion on the OpenStack dev mailing list:

http://lists.openstack.org/pipermail/openstack-dev/2018-October/135945.html

Matt Riedemann (mriedem) on 2018-10-22
tags: added: api db metadata performance
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

I think we can make a couple of simple enhancements by:

1. only join on security_groups if using nova-network

2. only join on system_metadata if using dynamic vendordata providers

Backporting the system_metadata change would be tricky though since we did use system_metadata to perform an online data migration for at least the instance.flavor information. Having said that, we removed that system_metadata <> flavor migration code in Rocky:

https://review.openstack.org/#/c/508357/

And as that commit message says, you couldn't upgrade to Liberty if you hadn't migrated the flavor data:

https://review.openstack.org/#/c/174480/

So maybe it's OK to at least backport this metadata-api join improvement to at least Pike.

Fix proposed to branch: master
Review: https://review.openstack.org/624778

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/624778
Reason: The join in security_groups should be trivial since the table should be empty in neutron cases, and when getting base metadata we also return the password which is stored in instance.system_metadata, so pre-joining instance_system_metadata is actually something we should be doing when we pull the instance from the cell database - it's used more than just vendordata v2 requests, which makes this change essentially not worthwhile.

We either need to optimize how we do the system_metadata joins in the sqlalchemy API code, or we need to reconsider how we store the instance password (move it out of system_metadata).

Matt Riedemann (mriedem) on 2019-02-24
Changed in nova:
status: In Progress → Confirmed
assignee: Matt Riedemann (mriedem) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers