Concurrent API calls don't get balanced between regiond processes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Released | Critical | Jacopo Rota | |
| 3.2 | Fix Released | Critical | Jacopo Rota | |
| 3.3 | Fix Released | Critical | Jacopo Rota | |
| 3.4 | Fix Released | Critical | Jacopo Rota | |
| 3.5 | Fix Released | Critical | Jacopo Rota | |
Bug Description
[Problem Description]
I noticed that parallel requests to the MAAS API are all being
handled by a single regiond process instead of being balanced
between the multiple spawned processes.
Below is the timing of a single machines read request:
ubuntu@cli01:~$ time maas admin machines read > /dev/null
real 0m40.534s
user 0m1.445s
sys 0m0.161s
When running the same request simultaneously from more than one client,
the times increase significantly, and checking the load on the server
shows that only one regiond process is at high CPU usage:
top -u maas on the server:
top - 16:13:58 up 2:46, 1 user, load average: 1.33, 0.98, 0.95
Tasks: 296 total, 1 running, 294 sleeping, 0 stopped, 1 zombie
%Cpu(s): 12.8 us, 0.0 sy, 0.0 ni, 87.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15991.5 total, 3062.5 free, 5723.7 used, 7205.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 7811.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1316 maas 20 0 3907428 3.5g 20828 S 100.3 22.3 20:25.14 regiond
803 maas 20 0 2608 476 408 S 0.0 0.0 0:00.00 sh
868 maas 20 0 977104 118120 20316 S 0.0 0.7 1:08.37 regiond
869 maas 20 0 7236 516 452 S 0.0 0.0 0:00.03 tee
938 maas 20 0 2608 484 412 S 0.0 0.0 0:00.00 sh
939 maas 20 0 381836 125828 15976 S 0.0 0.8 17:47.29 rackd
940 maas 20 0 7236 452 388 S 0.0 0.0 0:00.01 tee
1200 maas 20 0 63480 53720 9900 S 0.0 0.3 0:00.58 maas-common
1309 maas 20 0 614948 137300 20488 S 0.0 0.8 0:13.89 regiond
1310 maas 20 0 389428 113756 20356 S 0.0 0.7 0:11.40 regiond
1312 maas 20 0 388144 113616 20348 S 0.0 0.7 0:11.61 regiond
1314 maas 20 0 389436 113864 20560 S 0.0 0.7 0:11.35 regiond
1315 maas 20 0 461876 113704 20400 S 0.0 0.7 0:11.76 regiond
1317 maas 20 0 461876 113640 20288 S 0.0 0.7 0:11.38 regiond
1318 maas 20 0 389440 113788 20408 S 0.0 0.7 0:11.17 regiond
1322 maas 20 0 388148 113556 20384 S 0.0 0.7 0:11.39 regiond
1324 maas 20 0 388404 113396 20244 S 0.0 0.7 0:11.20 regiond
1325 maas 20 0 387868 112900 20416 S 0.0 0.7 0:09.33 regiond
1326 maas 20 0 462140 113856 20480 S 0.0 0.7 0:11.10 regiond
1435 maas 20 0 384700 4068 3456 S 0.0 0.0 0:00.02 rsyslogd
Notice that only PID 1316 is consuming CPU.
Another point is that all the client requests seem to finish at the same time,
which could indicate they are all blocked on the same thing.
ubuntu@cli01:~$ time maas admin machines read > /dev/null
real 1m44.542s
user 0m1.189s
sys 0m0.176s
ubuntu@cli02:~$ time maas admin machines read > /dev/null
real 1m44.545s
user 0m1.155s
sys 0m0.206s
ubuntu@cli03:~$ time maas admin machines read > /dev/null
real 1m43.999s
user 0m1.180s
sys 0m0.157s
This behavior is seen in MAAS 3.2, 3.3, and 3.4.
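For reference, a minimal sketch to reproduce the concurrency from a single host, assuming the maas CLI is already logged in with the "admin" profile used in the transcripts above (names and worker count are illustrative, not part of the original report):

    # Fire the same "maas admin machines read" call concurrently and time each one.
    # Assumes the maas CLI is already logged in with the "admin" profile.
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    def timed_read(i):
        start = time.monotonic()
        subprocess.run(
            ["maas", "admin", "machines", "read"],
            stdout=subprocess.DEVNULL,
            check=True,
        )
        return i, time.monotonic() - start

    with ThreadPoolExecutor(max_workers=3) as pool:
        for i, elapsed in pool.map(timed_read, range(3)):
            print(f"request {i}: {elapsed:.1f}s")

If the requests are being serialized by a single worker, the per-request times grow roughly linearly with the number of concurrent calls, matching the figures above.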
Related branches
- MAAS Lander: Approve
- Jacopo Rota: Approve
- Diff: 452 lines (+122/-52), 7 files modified:
  snap/local/tree/bin/run-regiond (+1/-1)
  src/maasserver/regiondservices/http.py (+17/-5)
  src/maasserver/regiondservices/tests/test_http.py (+20/-7)
  src/maasserver/templates/http/regiond.nginx.conf.template (+3/-1)
  src/maasserver/tests/test_workers.py (+51/-20)
  src/maasserver/webapp.py (+9/-7)
  src/maasserver/workers.py (+21/-11)
- MAAS Lander: Approve
- Jacopo Rota: Approve
- Diff: 444 lines (+121/-52), 7 files modified:
  snap/local/tree/bin/run-regiond (+1/-1)
  src/maasserver/regiondservices/http.py (+17/-5)
  src/maasserver/regiondservices/tests/test_http.py (+20/-7)
  src/maasserver/templates/http/regiond.nginx.conf.template (+3/-1)
  src/maasserver/tests/test_workers.py (+51/-20)
  src/maasserver/webapp.py (+9/-7)
  src/maasserver/workers.py (+20/-11)
- MAAS Lander: Approve
- Jacopo Rota: Approve
- Diff: 444 lines (+121/-52), 7 files modified:
  snap/local/tree/bin/run-regiond (+1/-1)
  src/maasserver/regiondservices/http.py (+17/-5)
  src/maasserver/regiondservices/tests/test_http.py (+20/-7)
  src/maasserver/templates/http/regiond.nginx.conf.template (+3/-1)
  src/maasserver/tests/test_workers.py (+51/-20)
  src/maasserver/webapp.py (+9/-7)
  src/maasserver/workers.py (+20/-11)
- Adam Collard (community): Approve
- MAAS Lander: Approve
- Alberto Donato (community): Approve
- Diff: 449 lines (+121/-52), 7 files modified:
  snap/local/tree/bin/run-regiond (+1/-1)
  src/maasserver/regiondservices/http.py (+17/-5)
  src/maasserver/regiondservices/tests/test_http.py (+20/-7)
  src/maasserver/templates/http/regiond.nginx.conf.template (+3/-1)
  src/maasserver/tests/test_workers.py (+51/-20)
  src/maasserver/webapp.py (+9/-7)
  src/maasserver/workers.py (+20/-11)
Changed in maas:
status: New → Confirmed
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
I was able to reproduce this. Thanks for reporting!
We start multiple regiond processes and each of them is running

    def _makeEndpoint(self):
        """Make the endpoint for the webapp."""
        socket_path = os.getenv(
            "MAAS_HTTP_SOCKET_PATH",
            get_maas_data_path("maas-regiond-webapp.sock"),
        )
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if os.path.exists(socket_path):
            os.unlink(socket_path)
        s.bind(socket_path)
        # Use a backlog of 50, which seems to be fairly common.
        s.listen(50)
        # Adopt this socket into Twisted's reactor setting the endpoint.
        endpoint = AdoptedStreamServerEndpoint(reactor, s.fileno(), s.family)
        endpoint.socket = s  # Prevent garbage collection.
        return endpoint
Unfortunately only one process is going to process the requests, as there is no load balancing for unix sockets (ignore the race condition on os.path.exists; I noticed that when we hit the race an exception is thrown and it is silently ignored).

When we execute

    if os.path.exists(socket_path):
        os.unlink(socket_path)

we remove the connection of the worker that was using that unix socket.
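To make that effect concrete, here is a small standalone demo (the path and the worker names are made up for illustration, this is not MAAS code): two workers bind the same AF_UNIX path in turn, and the second unlink/bind leaves the first worker listening on a socket that no client can reach through the path anymore.

    # Demo of the shared-socket-path problem: only the last worker to bind
    # the path receives connections. Names and path are hypothetical.
    import os
    import socket

    PATH = "/tmp/demo-regiond.sock"

    def bind_worker(path):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if os.path.exists(path):
            os.unlink(path)  # detaches any earlier worker from this path
        s.bind(path)
        s.listen(50)
        return s

    worker_a = bind_worker(PATH)
    worker_b = bind_worker(PATH)  # unlinks worker_a's entry and takes over

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(PATH)  # this connection can only reach worker_b

    worker_b.settimeout(1)
    worker_b.accept()
    print("worker_b accepted the connection")

    worker_a.settimeout(1)
    try:
        worker_a.accept()
    except socket.timeout:
        print("worker_a never sees any connections")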
Unless I missed something, the only fix is to create a dedicated unix socket for each process and let nginx round-robin the requests across those sockets.
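A rough sketch of that direction (the data directory, socket naming, upstream name, and helper functions below are assumptions for illustration, not the merged change): each worker binds its own socket path, and the matching nginx upstream block lists all of them, which nginx load-balances round-robin by default.

    # Sketch: one unix socket per regiond worker plus the nginx upstream
    # that balances across them. All names here are hypothetical.
    import os
    import socket

    DATA_DIR = "/var/lib/maas"  # assumption; the real path comes from the MAAS environment

    def make_worker_socket(worker_id: int) -> socket.socket:
        """Bind a per-worker socket instead of one shared path."""
        path = os.path.join(DATA_DIR, f"maas-regiond-webapp.{worker_id}.sock")
        if os.path.exists(path):
            os.unlink(path)  # now only removes this worker's own stale socket
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.bind(path)
        s.listen(50)
        return s

    def render_nginx_upstream(num_workers: int) -> str:
        """Render the matching nginx upstream block (round-robin by default)."""
        servers = "\n".join(
            f"    server unix:{DATA_DIR}/maas-regiond-webapp.{i}.sock;"
            for i in range(num_workers)
        )
        return f"upstream regiond-webapp {{\n{servers}\n}}\n"

    if __name__ == "__main__":
        print(render_nginx_upstream(4))

With one socket per worker, concurrent requests land on different regiond processes instead of all queuing behind the single process that last bound the shared path.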