Reporting speed of %SS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenVista/GT.M Integration | New | Wishlist | Unassigned |
Bug Description
The %SS routine can take several seconds to complete its report, depending on the number of running mumps processes and the speed of the server.
Currently it operates by signaling each mumps process, which then dumps its data into a temporary global.
It might be worthwhile to change the way %SS operates so that a background process collects the data continuously, and %SS only needs to read that information when it is called. The data would be about as stale as it is now, because the current approach already incurs a delay while the data is being collected. But access to the data would be instantaneous.
The freshness of the data could be improved by prioritizing updates for new mumps processes ahead of refreshing the data of existing processes. Also, the refresh rate could be self-tuning to be less frequent on processes that consistently return the same result a certain number of times and for a certain length of time. Once a mumps process starts, the %CPU or routine may sometimes change, but mostly the information is fairly static. So, for existing processes with unchanging results, the data collection routine could simply verify that the process still exists and update the %CPU, both of which can be done quickly without calling the interrupt. Usually, most processes would be in this optimized cycle, which could perform the interrupt update infrequently without a meaningful loss of information. With the collection process using the quick update for most processes, it could maintain much fresher data for the few processes that are new, changing, or gone, and the impact of the collection process would be minimized substantially. So it might be possible to keep the data mostly accurate to within a second.
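The self-tuning refresh described above could be sketched roughly as follows. This is only an illustration in Python, not the %SS implementation: the names `Tracked`, `record_result`, and `plan_cycle`, and the constants for the interval limits and stability threshold, are all assumptions made for the example. The list of live pids is assumed to come from a cheap process-table check rather than from the interrupt.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# All three tuning values are assumptions for this sketch.
MIN_INTERVAL = 1.0    # seconds between full refreshes for new or changing processes
MAX_INTERVAL = 60.0   # ceiling on the interval for stable processes
STABLE_THRESHOLD = 3  # identical results required before backing off


@dataclass
class Tracked:
    """What the collector remembers about one mumps process."""
    pid: int
    routine: Optional[str] = None
    stable_count: int = 0
    interval: float = MIN_INTERVAL
    next_due: float = 0.0


def record_result(entry: Tracked, routine: str, now: float) -> None:
    """Store a full (interrupt-based) result and retune the refresh interval."""
    if routine == entry.routine:
        entry.stable_count += 1
        if entry.stable_count >= STABLE_THRESHOLD:
            # Unchanged often enough: interrupt this process less frequently.
            entry.interval = min(entry.interval * 2, MAX_INTERVAL)
    else:
        # The routine changed: go back to the fastest refresh rate.
        entry.stable_count = 0
        entry.interval = MIN_INTERVAL
    entry.routine = routine
    entry.next_due = now + entry.interval


def plan_cycle(
    live_pids: Iterable[int], tracked: Dict[int, Tracked], now: float
) -> Tuple[List[int], List[int], List[int]]:
    """Split one cycle's work into new pids, vanished pids, and pids due for a full refresh."""
    live = set(live_pids)
    known = set(tracked)
    new = sorted(live - known)    # collect these first
    gone = sorted(known - live)   # drop these without any interrupt
    due = sorted(
        (p for p in live & known if tracked[p].next_due <= now),
        key=lambda p: tracked[p].next_due,
    )
    return new, gone, due
```

A collector loop would handle `new` first, remove `gone`, and then interrupt only the pids in `due`. Every other live process would get just the quick existence and %CPU update.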
As a data point: running %SS twice in a row, 8 seconds apart, on a production system with 274 mumps processes showed 6 processes that had changed routines in that time and just one new process.
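To put that sample in proportion, here is a small calculation in Python. The helper name `churn` is made up for this example, and it assumes 274 is a fair process count for the whole 8-second window.

```python
def churn(total: int, changed: int, new: int, seconds: float) -> dict:
    """Summarize how much of the process list moved between two %SS runs."""
    return {
        "changed_fraction": changed / total,          # processes that switched routines
        "touched_fraction": (changed + new) / total,  # switched routines or newly started
        "events_per_second": (changed + new) / seconds,
    }


# Figures from the two runs described above.
sample = churn(total=274, changed=6, new=1, seconds=8)
```

In this one sample about 2.2% of processes changed routines and about 2.6% were either changed or new, which is just under one event per second across the whole system. A single pair of runs is of course not a measurement of typical load.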
I'm not sure how much I like this proposal, though I could be convinced.
Is the problem here that the %SS takes too long? If so, how about spawning off multiple MUPIPs to run in parallel?
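A rough sketch of the parallel idea, in Python for illustration only. It assumes the signal is sent with `mupip intrpt <pid>` and that the time is spent in issuing those commands one after another, neither of which is confirmed here. The function names and the worker count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def mupip_intrpt(pid: int) -> bool:
    """Run `mupip intrpt <pid>` for one process; True if the command exited with status 0."""
    result = subprocess.run(["mupip", "intrpt", str(pid)], capture_output=True)
    return result.returncode == 0


def signal_all(
    pids: Iterable[int],
    signal_one: Callable[[int], bool] = mupip_intrpt,
    workers: int = 8,  # assumed value; would need tuning on a real server
) -> Dict[int, bool]:
    """Signal every pid using a pool of workers instead of one at a time."""
    pid_list = list(pids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with pid_list.
        return dict(zip(pid_list, pool.map(signal_one, pid_list)))
```

If the delay is actually in waiting for each process to reach a point where it can handle the interrupt, parallel signaling alone would not help much.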
Also, how about being able to provide a pid for an immediate %SS? Often we know which process we want to look at, and the rest is not important.
My concern is that stale data would lead to disastrous red herrings. The scenario I can see is that there's a client listener that does READ^XWBRW (?) about 99% of the time and so it's lowered in priority. You run a test using the client and do ^%SS simultaneously. You can't count on the results from the %SS being recent if the daemon has decided to "nice" it way back down. You would need a way to say "Seriously, I mean it, do %SS for pid xyz, now!"
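One way to picture that override, sketched in Python. Everything here is hypothetical: the cache layout, the `collect` callback standing in for the interrupt-based query, and the `max_age` and `force` parameters.

```python
from typing import Callable, Dict, Tuple

# pid -> (time collected, routine name); an assumed layout for the daemon's data.
Cache = Dict[int, Tuple[float, str]]


def get_status(
    pid: int,
    cache: Cache,
    collect: Callable[[int], str],
    now: float,
    max_age: float = 1.0,
    force: bool = False,
) -> str:
    """Return the routine for pid, bypassing the cache when force is set or the entry is too old."""
    cached = cache.get(pid)
    if not force and cached is not None and now - cached[0] <= max_age:
        return cached[1]
    routine = collect(pid)  # the slow path: query the process right now
    cache[pid] = (now, routine)
    return routine
```

With `force=True` the caller always gets a fresh answer for that one pid, no matter how far back the daemon has pushed it. An age limit on ordinary reads would bound how stale any other entry can be.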