[2.1,2.2] cloud-init/curtin http status updates cause high CPU usage

Bug #1648456 reported by David Britton on 2016-12-08
This bug affects 2 people
Affects: MAAS
Importance: Critical
Assigned to: Blake Rouse

Bug Description

With MAAS 2.1+, cloud-init and curtin on xenial send lots of HTTP POSTs back to MAAS for all actions. According to the MAAS folks, each of these POSTs opens its own database connection, and each one in turn is pushed to the web client to update the UI event log.

This combination pegs all CPUs even when only a few nodes are deploying.

To repro:

Use MAAS 2.1+ and deploy 10 xenial machines. You will see very large CPU spikes.

This slowness is the contributing factor in bug #1604962.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.2.0
summary: - maas2, xenial, cloud-init/curtin http status updates cause extreme
- slowness in MAAS
+ [2.1,2.2] cloud-init/curtin http status updates cause high CPU usage
tags: removed: kanban-cross-team
David Britton (dpb) on 2016-12-14
Changed in landscape:
milestone: none → 16.11
Mike Pontillo (mpontillo) wrote:

This is a shot in the dark, but could you try the following to see if this improves performance:

    $ sudo maas-region dbshell

    maasdb=# ALTER DATABASE maasdb SET synchronous_commit TO off;

If that doesn't help, then try this:

    maasdb=# ALTER DATABASE maasdb SET commit_delay TO 10000;

If you later want to return to the default settings, you can do:

    maasdb=# ALTER DATABASE maasdb RESET synchronous_commit;
    maasdb=# ALTER DATABASE maasdb RESET commit_delay;

Be sure not to leave the `dbshell` open. I've seen weird behavior in MAAS when this happens (maybe because it's taking up a database connection; not sure).
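For context on the second setting: in PostgreSQL, `commit_delay` is measured in microseconds, so 10000 means a 10 ms wait, and it only takes effect when at least `commit_siblings` other transactions are active. The idea is group commit: accept a little extra latency per commit so that several commits share one WAL flush. The Python below is a deliberately simplified toy model, not PostgreSQL's actual algorithm; `count_wal_flushes` is a hypothetical helper, and the model assumes flushes are instantaneous and ignores `commit_siblings`.

```python
def count_wal_flushes(commit_times_us, commit_delay_us):
    """Toy group-commit model: count how many WAL flushes a commit stream needs.

    Hypothetical helper, not PostgreSQL code. A commit that arrives while no
    flush is pending schedules one commit_delay_us later; every commit that
    arrives at or before that time rides along in the same flush.
    """
    flushes = 0
    pending_flush_at = None
    for t in sorted(commit_times_us):
        if pending_flush_at is not None and t <= pending_flush_at:
            continue  # joins the already-scheduled flush
        flushes += 1
        pending_flush_at = t + commit_delay_us
    return flushes


# Assumed workload: 100 commits arriving 1 ms (1000 us) apart.
commits = [i * 1000 for i in range(100)]
print(count_wal_flushes(commits, 0))      # 100: one flush per commit
print(count_wal_flushes(commits, 10000))  # 10: about 11 commits per flush
```

In this model the price is that each commit may wait up to 10 ms longer. `synchronous_commit = off` goes further by not waiting for the flush at all, at the risk of losing the most recent transactions (but not database consistency) if the server crashes.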

The other thing you can try is Django's persistent database connections. Open the MAAS Django settings file, for example:

    $ sudo vi $(dpkg -L maas-region-api | grep settings.py$)

Then find the spot in the file that configures the database. It should look like this:

    # Database access configuration.
    try:
        with RegionConfiguration.open() as config:
            DATABASES = {
                'default': {
                    'ENGINE': 'django.db.backends.postgresql_psycopg2',
                    'NAME': config.database_name,
                    'USER': config.database_user,
                    'PASSWORD': config.database_pass,
                    'HOST': config.database_host,
                    'PORT': str(config.database_port),
                }
            }

Change it to look like this:

    # Database access configuration.
    try:
        with RegionConfiguration.open() as config:
            DATABASES = {
                'default': {
                    'ENGINE': 'django.db.backends.postgresql_psycopg2',
                    'NAME': config.database_name,
                    'USER': config.database_user,
                    'PASSWORD': config.database_pass,
                    'HOST': config.database_host,
                    'PORT': str(config.database_port),
                    'CONN_MAX_AGE': 600,
                }
            }

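Note the addition of CONN_MAX_AGE. This enables Django's persistent database connections: instead of opening a new connection for every request and closing it afterwards, Django keeps the connection open and reuses it for up to 600 seconds.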
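To illustrate what CONN_MAX_AGE changes, here is a small Python model of the reuse rule: Django opens a connection on first use and, at request boundaries, closes it only once it is older than CONN_MAX_AGE seconds (the default, 0, means close after every request). The `PersistentConnection` class and `simulate` function are hypothetical, `sqlite3` stands in for PostgreSQL so the sketch runs anywhere, and the clock is faked so the result is deterministic.

```python
import sqlite3
import time


class PersistentConnection:
    """Hypothetical model of Django's CONN_MAX_AGE reuse rule (not Django code)."""

    def __init__(self, database, max_age, clock=time.monotonic):
        self.database = database
        self.max_age = max_age  # seconds; 0 means close after every request
        self.clock = clock
        self.conn = None
        self.opened_at = None
        self.opens = 0  # how many real connections were created

    def get(self):
        # Open lazily on first use, then reuse until end_request() closes it.
        if self.conn is None:
            self.conn = sqlite3.connect(self.database)
            self.opened_at = self.clock()
            self.opens += 1
        return self.conn

    def end_request(self):
        # Close only once the connection has reached max_age.
        if self.conn is not None and self.clock() - self.opened_at >= self.max_age:
            self.conn.close()
            self.conn = None


def simulate(max_age, n_requests, seconds_between):
    """Run n_requests fake requests on a fake clock; return connections opened."""
    now = [0.0]
    db = PersistentConnection(":memory:", max_age, clock=lambda: now[0])
    for _ in range(n_requests):
        db.get().execute("SELECT 1")  # the request's database work
        db.end_request()
        now[0] += seconds_between
    return db.opens


print(simulate(0, 100, 1.0))    # 100
print(simulate(600, 100, 1.0))  # 1
```

With 100 requests one second apart, CONN_MAX_AGE = 0 opens 100 connections while 600 opens just one. This is reuse, not a shared pool: in Django each thread keeps its own persistent connection.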

Disclaimer: all of the above is at your own risk. While my limited research leads me to believe it might help, I have not tested these options extensively and make no guarantees. But please let me know if it works. ;-)

Changed in landscape:
milestone: 16.11 → 16.12
Changed in landscape:
status: New → Triaged
importance: Undecided → Critical
importance: Critical → High
Changed in landscape:
milestone: 16.12 → 17.01
Chad Smith (chad.smith) on 2017-02-10
Changed in landscape:
milestone: 17.01 → 17.02
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Chad Smith (chad.smith) on 2017-03-16
Changed in landscape:
milestone: 17.02 → 17.03
David Britton (dpb) on 2017-03-17
no longer affects: landscape