[2.0rc2] RackController.get_image_sync_status causes huge load on regiond process

Bug #1604465 reported by Blake Rouse on 2016-07-19
This bug affects 2 people
Affects    Status  Importance  Assigned to    Milestone
MAAS               Critical    Blake Rouse
MAAS 2.0           Critical    Blake Rouse

Bug Description

The call in handlers/controller.py to get the image sync status causes huge load on the regiond process. The call stack looks like this:

ControllerHandler.check_images
node.get_image_sync_status()
BootResource.objects.boot_images_are_in_sync(boot_images) <- This is where it gets slow.

The reason it is slow is the way MAAS calculates the size of each largefile in the database. It does this by opening each largeobject, seeking to the end, and reading the position to get the size. This was fine in 1.9, when the cluster page was not polled like it is today in MAAS 2.0.

    @property
    def size(self):
        """Size of content."""
        with self.content.open('rb') as stream:
            stream.seek(0, os.SEEK_END)
            size = stream.tell()
        return size

This needs to be converted to a field in the database holding the current size of the largefile, instead of calculating it from the largeobject. This will speed up the BootResource.objects.boot_images_are_in_sync method. Additionally, a more advanced SQL query could be used to make fewer calls when checking whether the images are in sync.
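The idea above can be sketched as follows. This is a minimal illustration, not the actual MAAS model: `size_by_seek` mirrors the slow property shown earlier, and `LargeFileRecord` is a hypothetical stand-in for a row that persists the size as a plain integer column, computed once at write time so reads never open the largeobject.

```python
import io
import os


def size_by_seek(stream):
    # Old approach: open the large object and seek to the end on
    # every size lookup, hitting the database each time.
    stream.seek(0, os.SEEK_END)
    return stream.tell()


class LargeFileRecord:
    """Hypothetical sketch of persisting the size next to the content:
    compute it once when the content is written, so checking sync
    status becomes a cheap column read."""

    def __init__(self, content: bytes):
        self.content = content
        self.size = len(content)  # stored as an integer field on the row


record = LargeFileRecord(b"x" * 4096)
assert record.size == size_by_seek(io.BytesIO(record.content))
```

With the size stored as a column, boot_images_are_in_sync could compare sizes with a single aggregate query rather than seeking through every largeobject per poll.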

This is currently affecting OIL; to alleviate the issue, the following workaround was applied:

class RackController(Controller):

    ...blah...

    def get_image_sync_status(self, boot_images=None):
        return 'unknown'
        ... the original function...


tags: added: oil
Larry Michel (lmic) wrote:

Adding the top output showing the load:

ubuntu@maas2-production:~$ top

top - 22:09:40 up 1 day, 17:51, 1 user, load average: 5.61, 5.84, 5.63
Tasks: 71 total, 2 running, 69 sleeping, 0 stopped, 0 zombie
%Cpu(s): 38.2 us, 5.4 sy, 0.0 ni, 54.6 id, 0.0 wa, 0.0 hi, 1.8 si, 0.0 st
KiB Mem : 16300676 total, 13767796 free, 1751532 used, 781348 buff/cache
KiB Swap: 16645628 total, 15951660 free, 693968 used. 13767796 avail Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21436 maas 20 0 1221024 249004 8400 S 111.3 1.5 3813:45 twistd3
21438 maas 20 0 1810808 262696 8556 S 111.3 1.6 4895:17 twistd3
21429 maas 20 0 1223360 245912 8384 S 107.3 1.5 3464:28 twistd3
21434 maas 20 0 1147128 247360 8512 S 70.7 1.5 1884:17 twistd3
25142 postgres 20 0 295304 21944 19888 S 19.3 0.1 0:00.58 postgres
24964 postgres 20 0 296604 102552 99740 S 11.0 0.6 0:01.97 postgres
25125 postgres 20 0 295304 21940 19888 S 8.3 0.1 0:00.85 postgres
25126 postgres 20 0 295304 21936 19884 S 8.3 0.1 0:00.84 postgres
25137 postgres 20 0 295304 21936 19888 R 8.3 0.1 0:00.73 postgres
25139 postgres 20 0 295304 21936 19884 S 8.3 0.1 0:00.50 postgres
25140 postgres 20 0 295304 18584 16584 S 8.3 0.1 0:00.43 postgres
25154 postgres 20 0 295304 18588 16592 S 6.3 0.1 0:00.19 postgres

Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
status: Triaged → In Progress
milestone: 2.0.0 → 2.1.0
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
milestone: 2.0.1 → none
tags: added: auto-sanity tpe-lab
tags: added: taipei-lab
removed: tpe-lab
