The instance_faults table is too large, leading to slow query speed of command: nova list --all-tenants

Bug #1800755 reported by Sun Mengyun
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Description
===========
Running the command nova list --all-tenants takes 50+ seconds, even though there are only 50 virtual machines.
This is because the command calls the function fill_faults() in nova/objects/instance.py, which queries the
instance_faults database table. If that table holds a large number of records, performance is very poor.

For example, in my OpenStack deployment, many erroneous operations have left more than 250 thousand records in the table, and the query takes 50+ seconds.
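
To illustrate why the cost follows the number of fault rows rather than the number of instances, here is a minimal sketch using an in-memory SQLite database. It is not nova's actual code: the three-column schema and the helper names (build_db, latest_faults_naive, latest_faults_grouped) are assumptions made for the example. The first helper pulls every fault row for the listed instances and keeps the newest one in Python; the second asks the database to return only the newest row per instance.

```python
import sqlite3


def build_db(instances, faults_per_instance):
    """Create a toy instance_faults table (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instance_faults ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "instance_uuid TEXT NOT NULL, "
        "message TEXT)"
    )
    rows = [
        (uuid, f"fault {n}")
        for uuid in instances
        for n in range(faults_per_instance)
    ]
    conn.executemany(
        "INSERT INTO instance_faults (instance_uuid, message) VALUES (?, ?)", rows
    )
    return conn


def latest_faults_naive(conn, uuids):
    """Transfer every fault row for the instances, keep the newest in Python."""
    marks = ",".join("?" * len(uuids))
    latest = {}
    query = (
        "SELECT id, instance_uuid, message FROM instance_faults "
        f"WHERE instance_uuid IN ({marks}) ORDER BY id"
    )
    for _, uuid, message in conn.execute(query, uuids):
        latest[uuid] = message  # later (higher id) rows overwrite earlier ones
    return latest


def latest_faults_grouped(conn, uuids):
    """Let the database return only the newest fault row per instance."""
    marks = ",".join("?" * len(uuids))
    query = (
        "SELECT f.instance_uuid, f.message FROM instance_faults f "
        "JOIN (SELECT MAX(id) AS max_id FROM instance_faults "
        f"      WHERE instance_uuid IN ({marks}) GROUP BY instance_uuid) newest "
        "ON f.id = newest.max_id"
    )
    return dict(conn.execute(query, uuids))
```

Both helpers return the same mapping, but the naive one moves a number of rows proportional to the total fault count. The sketch assumes that a higher id means a newer fault.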

In my opinion, the data will keep growing over time and query performance will keep degrading. So we need a plan to ensure that query performance is not affected by data volume.

Steps to reproduce
==================
This bug is not easy to reproduce unless your instance_faults table is similarly large.

Environment
===========
[root@nail1 ~]# rpm -qa | grep nova
openstack-nova-api-18.0.2-1.el7.noarch
openstack-nova-common-18.0.2-1.el7.noarch
python2-novaclient-11.0.0-1.el7.noarch
openstack-nova-placement-api-18.0.2-1.el7.noarch
openstack-nova-scheduler-18.0.2-1.el7.noarch
openstack-nova-conductor-18.0.2-1.el7.noarch
openstack-nova-novncproxy-18.0.2-1.el7.noarch
python-nova-18.0.2-1.el7.noarch
openstack-nova-compute-18.0.2-1.el7.noarch
openstack-nova-console-18.0.2-1.el7.noarch

hypervisor:
Libvirt + KVM

Sun Mengyun (kmehxhcr)
tags: added: list
Matt Riedemann (mriedem) wrote :

Sounds like bug 1632247 but that was "fixed" a couple of years ago with this change:

https://review.openstack.org/#/c/409943/

I wonder if there has been a regression?

summary: The instance_faults table is too large, leading to slow query speed of
- command: nova list --all-t
+ command: nova list --all-tenants
Matt Riedemann (mriedem) wrote :

I'm not sure why we don't provide some way to purge old faults. The API only shows the latest fault for a given instance. And we don't have any API or nova-manage CLI to list *all* faults for a given instance, so I guess they are just there in the database until the instance is deleted and archived/purged. Seems we could add a nova-manage CLI to allow purging old fault information as long as the latest fault is left intact.
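
A purge that leaves the latest fault intact could look roughly like the sketch below. This is not an existing nova-manage command. The function name purge_old_faults, the simplified schema, and the assumption that the highest id per instance is the latest fault are all hypothetical, and the example runs against in-memory SQLite rather than a real nova database.

```python
import sqlite3


def purge_old_faults(conn):
    """Delete every fault except the newest per instance; return rows deleted.

    Assumes (hypothetically) that the row with the highest id for an
    instance_uuid is that instance's latest fault.
    """
    cur = conn.execute(
        "DELETE FROM instance_faults "
        "WHERE id NOT IN ("
        "  SELECT MAX(id) FROM instance_faults GROUP BY instance_uuid"
        ")"
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a toy table, purge it, and return (deleted count, remaining rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instance_faults ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "instance_uuid TEXT NOT NULL, "
        "message TEXT)"
    )
    rows = [("vm-a", "first"), ("vm-a", "second"), ("vm-a", "third"), ("vm-b", "only")]
    conn.executemany(
        "INSERT INTO instance_faults (instance_uuid, message) VALUES (?, ?)", rows
    )
    deleted = purge_old_faults(conn)
    remaining = conn.execute(
        "SELECT instance_uuid, message FROM instance_faults ORDER BY instance_uuid"
    ).fetchall()
    return deleted, remaining
```

Note that MySQL rejects a DELETE whose subquery reads from the table being deleted from unless the subquery is wrapped in a derived table, so the statement would need adjusting there.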

Matt Riedemann (mriedem) wrote :

This is also likely poor for performance:

https://github.com/openstack/nova/blob/f13debf2f0e5377b9d0b0bbd9422c6a79d2cc611/nova/objects/instance.py#L1259

I don't think that is in the same "nova list" call path, so it shouldn't be related to this issue, but it could be a performance problem if it's ever called with "fault" in expected_attrs. It would need to be audited.

Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Chris Friesen (cbf123) wrote :

Pretty sure that StarlingX purges instance_faults when purging instances. It's in INSTANCES_CHILD_TABLES here:

https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-8fec546e4c39f78d233f8e21dadaa3ffR88

Before we purge soft-deleted instances, we purge entries in the tables in INSTANCES_CHILD_TABLES which refer to the soft-deleted instances that are about to be purged.
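
The ordering described above (child rows first, then the instances themselves) can be sketched as follows. This is not the StarlingX code: the table layout, the contents of the CHILD_TABLES tuple, and the function name purge_soft_deleted are assumptions for illustration, and the example uses in-memory SQLite.

```python
import sqlite3

# Hypothetical stand-in for INSTANCES_CHILD_TABLES; the real list is in the
# StarlingX commit linked above.
CHILD_TABLES = ("instance_faults",)


def purge_soft_deleted(conn):
    """Purge child rows of soft-deleted instances, then the instances."""
    counts = {}
    for table in CHILD_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE instance_uuid IN "
            "(SELECT uuid FROM instances WHERE deleted != 0)"
        )
        counts[table] = cur.rowcount
    cur = conn.execute("DELETE FROM instances WHERE deleted != 0")
    counts["instances"] = cur.rowcount
    conn.commit()
    return counts


def demo():
    """One soft-deleted instance with 3 faults, one live instance with 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER)")
    conn.execute(
        "CREATE TABLE instance_faults ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, instance_uuid TEXT, message TEXT)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?)", [("vm-dead", 1), ("vm-live", 0)]
    )
    faults = [("vm-dead", "x")] * 3 + [("vm-live", "y")] * 2
    conn.executemany(
        "INSERT INTO instance_faults (instance_uuid, message) VALUES (?, ?)", faults
    )
    counts = purge_soft_deleted(conn)
    left = conn.execute("SELECT COUNT(*) FROM instance_faults").fetchone()[0]
    return counts, left
```

Deleting the child rows first means no fault row is ever left pointing at an instance that no longer exists.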
