Reporting speed of %SS

Bug #867918 reported by Derek_ on 2011-10-04
This bug affects 1 person
Affects: OpenVista/GT.M Integration

Bug Description

The %SS routine can take several seconds to complete its report, depending on the number of running mumps processes and the speed of the server.

Currently it operates by signaling each mumps process which then dumps that process's data into a temporary global.

It might be worthwhile to change how %SS operates so that a background process collects the data continuously, and %SS only needs to read that information when it is called. The data would be about as stale as it is now, because the current collection already takes several seconds to complete. But access to the data would be instantaneous.
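
A minimal Python sketch of this split between collecting and reading, not taken from the actual %SS code. The `StatusCache` class and its methods are hypothetical stand-ins for the global that a background collector would maintain; the point is that each row carries its collection time, so the report is an instant read that can also show how old each row is.

```python
import time


class StatusCache:
    """Hypothetical stand-in for the global a background collector would fill."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._rows = {}  # pid -> (routine, collected_at)

    def store(self, pid, routine):
        """Called by the collector whenever it has fresh data for a pid."""
        self._rows[pid] = (routine, self._clock())

    def report(self):
        """Called by the report: no signaling, just a read plus each row's age."""
        now = self._clock()
        return [
            (pid, routine, now - collected_at)
            for pid, (routine, collected_at) in sorted(self._rows.items())
        ]
```

Reporting the age next to each row would let the reader judge how far to trust an entry, instead of assuming every row is equally fresh.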

The freshness of the data could be improved by prioritizing updates for new mumps processes ahead of refreshing the data of existing processes. Also, the refresh rate could be self-tuning to be less frequent on processes that consistently return the same result a certain number of times and for a certain length of time. Once a mumps process starts, the %CPU or routine may sometimes change, but mostly the information is fairly static. So, for existing processes with unchanging results, the data collection routine could simply verify that the process still exists and update the %CPU, both of which can be done quickly without calling the interrupt. Usually, most processes would be in this optimized cycle, which could perform the interrupt update infrequently without a meaningful loss of information. With the collection process using the quick update for most processes, it could maintain much fresher data for the few processes that are new, changing, or gone, and the impact of the collection process would be minimized substantially. So it might be possible to keep the data mostly accurate to within a second.
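
The scheduling described above could be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the function names, the doubling backoff, and the 1-second and 60-second bounds are not part of %SS. The liveness check uses signal 0, which on POSIX systems checks that a pid exists without delivering anything to it.

```python
import os

BASE_INTERVAL = 1.0   # seconds between full (interrupt) refreshes; assumed value
MAX_INTERVAL = 60.0   # upper bound on the backoff; assumed value


def next_interval(unchanged_polls, base=BASE_INTERVAL, cap=MAX_INTERVAL):
    """Double the refresh interval per consecutive unchanged result, up to cap."""
    return min(cap, base * 2 ** min(unchanged_polls, 30))


def process_exists(pid):
    """Cheap POSIX liveness check: signal 0 delivers nothing, only checks the pid."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def refresh_order(due_at, running_pids):
    """Never-seen pids first, then known pids, most overdue first."""
    new = sorted(p for p in running_pids if p not in due_at)
    known = sorted((p for p in running_pids if p in due_at), key=lambda p: due_at[p])
    return new + known
```

A process whose routine changes would have its unchanged count reset to 0, which puts it back on the short interval.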

As a data point, running %SS twice, 8 seconds apart, on a production system with 274 mumps processes showed 6 processes that had changed routines in that time and just one new process.

(jeff-apple) wrote :

I'm not sure how much I like this proposal, though I could be convinced.

Is the problem here that the %SS takes too long? If so, how about spawning off multiple MUPIPs to run in parallel?

Also, how about being able to provide a pid for immediate %SS? Often we know which process we want to look at and the rest is not important.

My concern is that stale data would lead to disastrous red herrings. The scenario I can see is that there's a client listener that does READ^XWBRW (?) about 99% of the time and so it's lowered in priority. You run a test using the client and do ^%SS simultaneously. You can't count on the results from the %SS being recent if the daemon has decided to "nice" it way back down. You would need a way to say "Seriously, I mean it, do %SS for pid xyz, now!"

Derek_ (derek-name) wrote :

Currently the data from %SS is a mixture varying substantially in freshness, since it takes several seconds to run. The data for the first process that it checked is several seconds old by the time you get it. With the change I'm suggesting, stale data would still be possible, but much less likely. The set of running PIDs, for example, would be very accurate for what is running at the time of the call, unlike the current %SS. If you really need to know accurately what routine a process is in at a very particular point in time, then you wouldn't be able to use the current %SS for that either.

I agree that calling it for a specific PID would be a great feature, and that would give you instant and accurate information.
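
A per-PID request that bypasses the collector's schedule might look like the Python sketch below. The `lookup` function, its parameters, and the `collect` callable (which would interrupt one process and return its current routine) are hypothetical names used only for illustration.

```python
import time


def lookup(cache, pid, collect, max_age=1.0, force=False, clock=time.monotonic):
    """Return (routine, age_seconds) for pid.

    cache   -- dict mapping pid -> (routine, collected_at)
    collect -- hypothetical callable that interrupts pid and returns its routine
    The cached row is reused unless it is missing, older than max_age,
    or the caller passes force=True.
    """
    now = clock()
    entry = cache.get(pid)
    if force or entry is None or now - entry[1] > max_age:
        entry = (collect(pid), now)
        cache[pid] = entry
    return entry[0], now - entry[1]
```

With `force=True` the result reflects a collection made for that call, whatever interval the collector had settled on for that process.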

bhaskar (bhaskar) wrote :

Using ZSYSTEM to fire off a number of MUPIP INTRPT commands has a fair amount of overhead. You can instead use the lightweight $ZSIGPROC(pid,signal) function to send a signal; it returns 0 for success and 1 for failure. The downside of the API is that you have to specify the signal numerically, which makes the code less portable. Also, the documented and recommended MUPIP INTRPT API will remain upward compatible in the (unlikely) event we decide to change the mechanism.

GTM>zsystem "ls -l GTM_JOBEXAM*"_$job_"_*"
ls: cannot access GTM_JOBEXAM*25801_*: No such file or directory

GTM>write $zsigproc($job,10)
GTM>zsystem "ls -l GTM_JOBEXAM*"_$job_"_*"
-rw-r--r-- 1 kbhaskar gtc 1612 2011-10-04 21:57 GTM_JOBEXAM.ZSHOW_DMP_25801_3

GTM>write $zsigproc(1,10)

Yes, $zsigproc() should be documented. It will be one day.
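
On the portability point: signal numbers differ between platforms, so a hard-coded 10 is only correct where the intended signal has that number. The Python sketch below resolves a number from its name on the current POSIX system. It assumes the 10 in the session above stands for SIGUSR1, which is its number on Linux x86; `signal_number` is a hypothetical helper name.

```python
import signal


def signal_number(name):
    """Resolve a signal name such as 'SIGUSR1' to this platform's number."""
    return int(getattr(signal, name))


# Print the local number, e.g. to generate the value passed to $zsigproc().
print(signal_number("SIGUSR1"))
```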

Derek_ (derek-name) wrote :

Thanks, Bhaskar. We don't use ZSYSTEM and MUPIP INTRPT for this, though. From INTRPT^MSCZJOBU, we call a custom C program, gtmsignal, once to send the signal to all mumps processes. I guess we could use $zsigproc() instead if it had a "*" mode. But I think that part of the process is already very efficient.



Derek_ (derek-name) wrote :

I forgot that gtmsignal does call mupip to send the signal. I don't know if there is a reason why it doesn't just use kill() for this, except maybe to ensure that if FIS were to change the internal implementation, we would still be going through the provided interface.

(jeff-apple) wrote :

We may have done it this way in order for a non-privileged user to send signals to other users' processes. gtmsignal is suid, I think.

Jon Tai (jontai) wrote :

As Jeff mentioned, gtmsignal was written so it could be a standalone binary with suid privileges. Derek is correct as to why we call mupip instead of kill() directly -- FIS has stated that you should not rely on the fact that mupip uses regular signals in its implementation.
