[2.1,2.2] cloud-init/curtin http status updates cause high CPU usage
Bug #1648456 reported by David Britton
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Blake Rouse |
Bug Description
With MAAS 2.1+ and cloud-init & curtin on xenial, lots of HTTP POSTs come back for all actions. According to the MAAS folks, each of these POSTs opens a database connection, and each one in turn goes to the web client to update the UI event log.
This combination pegs all CPUs while even a few nodes are deploying.
To reproduce: use MAAS 2.1+ and deploy 10 xenial machines. You will notice very large CPU spikes.
This slowness is a contributing factor in bug #1604962.
Related branches
lp:~blake-rouse/maas/queue-status-messages
- Andres Rodriguez (community): Approve
Diff: 1575 lines (+1274/-25), 13 files modified
src/maasserver/eventloop.py (+14/-3)
src/maasserver/models/event.py (+4/-3)
src/maasserver/models/node.py (+1/-1)
src/maasserver/models/tests/test_event.py (+10/-0)
src/maasserver/preseed.py (+1/-0)
src/maasserver/tests/test_eventloop.py (+19/-2)
src/maasserver/tests/test_plugin.py (+1/-0)
src/maasserver/tests/test_preseed.py (+3/-0)
src/maasserver/tests/test_webapp.py (+87/-4)
src/maasserver/webapp.py (+76/-10)
src/metadataserver/api.py (+2/-2)
src/metadataserver/api_twisted.py (+290/-0)
src/metadataserver/tests/test_api_twisted.py (+766/-0)
Changed in maas:
  status: New → Triaged
  importance: Undecided → Critical
  milestone: none → 2.2.0
  summary:
    - maas2, xenial, cloud-init/curtin http status updates cause extreme slowness in MAAS
    + [2.1,2.2] cloud-init/curtin http status updates cause high CPU usage
  tags: removed: kanban-cross-team
Changed in landscape:
  milestone: none → 16.11
Changed in landscape:
  milestone: 16.11 → 16.12
Changed in landscape:
  status: New → Triaged
  importance: Undecided → Critical
  importance: Critical → High
Changed in landscape:
  milestone: 16.12 → 17.01
Changed in landscape:
  milestone: 17.01 → 17.02
Changed in maas:
  status: Triaged → In Progress
  assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
  status: In Progress → Fix Committed
Changed in maas:
  status: Fix Committed → Fix Released
Changed in landscape:
  milestone: 17.02 → 17.03
no longer affects: landscape
This is a shot in the dark, but could you try the following to see if this improves performance:
$ sudo maas-region dbshell
maasdb=# ALTER DATABASE maasdb SET synchronous_commit TO off;
If that doesn't help, then try this:
maasdb=# ALTER DATABASE maasdb SET commit_delay TO 10000;
If you later want to return to the default settings, you can do:
maasdb=# ALTER DATABASE maasdb RESET synchronous_commit;
maasdb=# ALTER DATABASE maasdb RESET commit_delay;
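If you want to confirm which overrides are currently in effect (this verification query is my addition, not part of the original suggestion; it reads PostgreSQL's standard `pg_db_role_setting` catalog), you can run it from the same `dbshell`:

```sql
-- Show any per-database setting overrides recorded for maasdb
maasdb=# SELECT unnest(setconfig)
         FROM pg_db_role_setting
         WHERE setdatabase = (SELECT oid FROM pg_database
                              WHERE datname = 'maasdb');
```

An `ALTER DATABASE ... RESET` removes the corresponding row, so after resetting both settings this query should return no rows.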
Be sure not to leave the `dbshell` open. I've seen weird behavior in MAAS when this happens (maybe because it's taking up a database connection; not sure).
The other thing you can try is Django connection pooling. Open the MAAS Django settings file, such as:
$ sudo vi $(dpkg -L maas-region-api | grep settings.py$)
Then find the spot in the file that configures the database. It should look like this:
# Database access configuration.
try:
    with RegionConfiguration.open() as config:
        DATABASES = {
            'default': {
                'ENGINE': 'django.db.backends.postgresql_psycopg2',
                'NAME': config.database_name,
                'USER': config.database_user,
                'PASSWORD': config.database_pass,
                'HOST': config.database_host,
                'PORT': str(config.database_port),
            }
        }
Change it to look like this:
# Database access configuration.
try:
    with RegionConfiguration.open() as config:
        DATABASES = {
            'default': {
                'ENGINE': 'django.db.backends.postgresql_psycopg2',
                'NAME': config.database_name,
                'USER': config.database_user,
                'PASSWORD': config.database_pass,
                'HOST': config.database_host,
                'PORT': str(config.database_port),
                'CONN_MAX_AGE': 600,
            }
        }
Note the addition of CONN_MAX_AGE. This enables Django persistent database connections (a simple form of connection pooling), so each request no longer has to open a fresh connection.
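To illustrate what CONN_MAX_AGE buys you, here is a toy model of the reuse-or-reopen decision (my own sketch for illustration; it is not Django's actual implementation, and the class name is made up):

```python
class ToyPersistentConnection:
    """Toy model of CONN_MAX_AGE: reuse a connection until it is older
    than max_age seconds, then open a new one. Illustration only."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.opened_at = None   # time the current connection was opened
        self.opens = 0          # how many times we had to (re)connect

    def handle_request(self, now):
        # max_age == 0 models the default: a fresh connection per request.
        expired = (self.opened_at is None
                   or self.max_age == 0
                   or now - self.opened_at >= self.max_age)
        if expired:
            self.opened_at = now
            self.opens += 1

# 100 requests, one per second
default = ToyPersistentConnection(max_age=0)
pooled = ToyPersistentConnection(max_age=600)
for t in range(100):
    default.handle_request(t)
    pooled.handle_request(t)

print(default.opens)  # 100 connects: one per request
print(pooled.opens)   # 1 connect: reused for all 100 requests
```

With many nodes POSTing status updates concurrently, that per-request connection setup is exactly the overhead the bug description points at, which is why a ten-minute max age can make a visible difference.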
Disclaimer: all of the above is at your own risk. While my limited research leads me to believe it might help, I have not tested these options extensively and make no guarantees. But please let me know if it works. ;-)