maasserver_event table grows without bounds, impacting UI performance

Bug #1860619 reported by Adam Beeman on 2020-01-23
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Björn Tillenius

Bug Description

There have been a number of performance-related bugs over time, in various states, so this may duplicate some past concerns, but I could not find a currently active bug which reflects the issue in a recent MAAS version.

I have multiple MAAS regions, each with several hundred servers. Over time, the web interface becomes increasingly slow to load the list of systems. For example, I timed a refresh of the Machines tab on a server with 295 machines: it took 3 minutes and 45 seconds to load!
Inspection of the database shows a huge server events table:

postgres=# \c maasdb
You are now connected to database "maasdb" as user "postgres".
maasdb=# SELECT nspname || '.' || relname AS "relation",
maasdb-# pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
maasdb-# FROM pg_class C
maasdb-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
maasdb-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
maasdb-# AND C.relkind <> 'i'
maasdb-# AND nspname !~ '^pg_toast'
maasdb-# ORDER BY pg_total_relation_size(C.oid) DESC
maasdb-# LIMIT 5;
              relation              | total_size
------------------------------------+------------
 public.maasserver_event            | 24 GB
 public.metadataserver_scriptresult | 73 MB
 public.metadataserver_nodeuserdata | 27 MB
 public.maasserver_node             | 1320 kB
 public.maasserver_neighbour        | 1256 kB
(5 rows)


We have adopted the undesirable practice of truncating the events table periodically with:
truncate table public.maasserver_event;

... and this immediately speeds things up and makes the web interface usable again.
I suspect our heavy use of DHCP contributes to the bloat of this table, because I believe DHCP lease renewals are logged as events in it, though I may be wrong on that.

Is there something we can do to make this more manageable? Either an event log rotation/pruning mechanism, or perhaps we don't need to log as much into this table?
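
As an illustration of what an age-based pruning mechanism could look like (as opposed to truncating the whole table), here is a minimal sketch. It is not MAAS code: it runs against an in-memory SQLite stand-in rather than the real PostgreSQL database, and the `created` timestamp column, the 30-day retention window and the batch size are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_events(conn, cutoff, batch_size=1000):
    """Delete events created before `cutoff`, one batch at a time.

    Returns the total number of rows removed.
    """
    removed = 0
    while True:
        cur = conn.execute(
            "DELETE FROM maasserver_event WHERE id IN ("
            " SELECT id FROM maasserver_event WHERE created < ? LIMIT ?)",
            (cutoff.isoformat(), batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            return removed
        removed += cur.rowcount


# Stand-in table: only the columns this sketch needs (assumed names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maasserver_event (id INTEGER PRIMARY KEY, created TEXT)"
)

# One event per day for the 100 days up to a fixed reference date.
now = datetime(2020, 1, 23)
rows = [((now - timedelta(days=d)).isoformat(),) for d in range(100)]
conn.executemany("INSERT INTO maasserver_event (created) VALUES (?)", rows)
conn.commit()

# Keep the last 30 days (assumed retention window), delete the rest.
removed = prune_events(conn, now - timedelta(days=30), batch_size=25)
remaining = conn.execute("SELECT COUNT(*) FROM maasserver_event").fetchone()[0]
print(removed, remaining)
```

Deleting in small batches rather than in one statement keeps each transaction short, which matters on a table of this size. On PostgreSQL the freed space would still only be reclaimed for reuse after vacuuming.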

$ dpkg -l '*maas*'|cat
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
un maas <none> <none> (no description available)
ii maas-cli 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server common files
ii maas-dhcp 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS DHCP server
un maas-dns <none> <none> (no description available)
ii maas-proxy 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Rack Controller for MAAS
ii maas-region-api 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server provisioning libraries (Python 3)

Adam Beeman (abeeman) wrote :

See also: though that bug says "Fix Committed", I don't believe it's addressing the problem of database bloat.

Adam Beeman (abeeman) wrote :

Correction, is the older bug.

Alberto Donato (ack) on 2020-02-28
Changed in maas:
status: New → Triaged
importance: Undecided → High
Changed in maas:
milestone: none → 2.8.0b1
Alberto Donato (ack) on 2020-04-17
Changed in maas:
milestone: 2.8.0b1 → 2.8.0b2
Alberto Donato (ack) on 2020-04-24
Changed in maas:
milestone: 2.8.0b2 → 2.8.0rc1
Alberto Donato (ack) on 2020-05-01
Changed in maas:
milestone: 2.8.0b3 → 2.8.0rc1
Changed in maas:
assignee: nobody → Björn Tillenius (bjornt)
Alberto Donato (ack) on 2020-05-11
Changed in maas:
milestone: 2.8.0b4 → 2.8.0rc1
Changed in maas:
assignee: Björn Tillenius (bjornt) → Adam Collard (adam-collard)
Changed in maas:
assignee: Adam Collard (adam-collard) → Björn Tillenius (bjornt)
status: Triaged → In Progress
Björn Tillenius (bjornt) wrote :

abeeman (and anyone else experiencing problems), could you please run the following SQL to shed some light on what the top events are?

Long-term, we'll probably need to cull the event table regularly, so that it can't grow too much. But short-term we're going to remove logging of the power queries, which we know cause a lot of events to be logged and aren't useful to have in the logs.

It'd be interesting to see what other events are being issued a lot. We know that installation and commissioning events may grow quite a lot, but that's harder to fix, so it won't be done for 2.8.
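
A per-type breakdown of this kind can be sketched as follows. This is not the query posted in the thread: the join between `maasserver_event.type_id` and `maasserver_eventtype.id`, and the `name` column, are assumptions about the schema, the event type names are placeholders, and the demo uses an in-memory SQLite database instead of the real PostgreSQL one.

```python
import sqlite3


def top_event_types(conn, limit=10):
    """Return (event type name, event count) pairs, most frequent first."""
    return conn.execute(
        "SELECT t.name, COUNT(*) AS event_count"
        " FROM maasserver_event e"
        " JOIN maasserver_eventtype t ON t.id = e.type_id"
        " GROUP BY t.name"
        " ORDER BY event_count DESC, t.name"
        " LIMIT ?",
        (limit,),
    ).fetchall()


# Stand-in schema with assumed table and column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maasserver_eventtype (id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE maasserver_event (id INTEGER PRIMARY KEY, type_id INTEGER)"
)

# Placeholder event types, not real MAAS event names.
conn.executemany(
    "INSERT INTO maasserver_eventtype (id, name) VALUES (?, ?)",
    [(1, "EXAMPLE_A"), (2, "EXAMPLE_B"), (3, "EXAMPLE_C")],
)
# 5 events of type 1, 2 of type 2, 8 of type 3.
events = [(1,)] * 5 + [(2,)] * 2 + [(3,)] * 8
conn.executemany("INSERT INTO maasserver_event (type_id) VALUES (?)", events)
conn.commit()

top = top_event_types(conn, limit=2)
for name, event_count in top:
    print(f"{name:>12} | {event_count}")
```

The same `SELECT` could be run directly in `psql` if the assumed table and column names match the installed schema.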

Changed in maas:
status: In Progress → Fix Committed
Alberto Donato (ack) on 2020-06-04
Changed in maas:
status: Fix Committed → Fix Released
György Szombathelyi (gyurco) wrote :

Having a _DEBUG event on the top might be easily fixable, I guess:

               name | event_count
