SIP Checkin Speed vs Debug Logging

Bug #1823338 reported by John Merriam on 2019-04-05
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenSRF
Undecided
Unassigned

Bug Description

Confirmed with OpenSRF 3.0.2 and Evergreen 3.1.8

Hello. Since our upgrade from OpenSRF 2.5.2 to 3.0.2 and Evergreen 2.12.12 to 3.1.8 we have had an issue where SIP checkins have been taking 200+ms longer to complete on average compared to before the upgrade. This has caused a problem with an automated item sorter used by one of our libraries. The exception rate of items that could not be sorted before the SIP checkin response was received by the sorter increased dramatically.

After trying many different things with our SIPServer configuration which made no difference we believed that the problem was not with SIPServer (SIPServer was updated as part of the upgrade). To try to confirm that we started profiling the software using Devel::NYTProf.

After profiling SIPServer and the openi-ils.circ drones we did not see a major problem with either of them. We could see that calls to the checkin function were taking about 500ms. Then we profiled the open-ils.storage drones. We found that the actual database checkin call was only taking about 50ms. That left 450ms. We then found that one debug logging statement was taking about 100ms to process on average and seemed to be being called at least twice per checkin.

That debug logging statement is the 'Available methods ...' one in src/perl/lib/OpenSRF/Application.pm of OpenSRF. A diff is attached which shows what we have done (just comment out that debug logging statement).

It is possible that disabling this logging statement may speed up other things besides SIP checkins. I am unsure how many other code paths also hit this particular logging statement.

Since this debug logging statement appears to have been there unchanged for quite some time, the question is why is this a problem now? One possibility is that this debug logging statement has always taken this long but now something else is slower. In that case we are just compensating for that slowness elsewhere by fixing this issue.

When we upgraded OpenSRF and Evergreen we also upgraded from Ubuntu 14 to Ubuntu 16. That makes me think that another possibility is that something is now different in the underlying system (Perl was upgraded with Ubuntu...) and this logging statement is now slow because of that.

Either way, this simple fix is working for us. SIP checkins are taking about the same amount of time to process with this fix as they were before our upgrade.

John Merriam (jmerriam) wrote :
John Merriam (jmerriam) wrote :

OpenSRF branch with the debug log line removed instead of commented is working/user/jmerriam/LP1823338: https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/jmerriam/LP1823338

John Merriam (jmerriam) on 2019-04-05
tags: added: pullrequest
John Merriam (jmerriam) wrote :

We might have a way to remove this delay without removing the log line. I would say hold off on merging this until we see what comes of that. It involves changes to the logger that allow delayed exec subroutines in addition to strings for the $msg. If that pans out I will push a new branch that uses that method instead of just removing it.

John Merriam (jmerriam) wrote :

********

LP1824181 https://bugs.launchpad.net/opensrf/+bug/1824181

and

LP1824184 https://bugs.launchpad.net/opensrf/+bug/1824184

have an alternate and more comprehensive fix for this issue. LP1824184 contains a fix for the particular logging statement in this bug. That fix depends on LP1824181.

********

Galen Charlton (gmc) on 2019-05-07
Changed in opensrf:
milestone: none → 3.1.1
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers