Default ejabberd max_stanza_size can be exceeded when chunking (MARC)XML-heavy responses

Bug #1709710 reported by Adam Bowling on 2017-08-09
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenSRF
High
Unassigned

Bug Description

Installation README for OpenSRF originally included instruction to set "max_stanza_size" in ejabberd equal to "2000000." That seemed a helpful step in setting up OpenSRF to serve requests. Without the increase, many operations, including OPAC searches, timeout or fail without returning results.

OpenSRF 2.4.2
OpenSRF 2.5.0

Mike Rylander (mrylander) wrote :

This should not be true with OpenSRF 2.5+ due to chunking. Do you have indications that it is still necessary in 2.5?

FWIW, I've experience with quite a number of instances where increasing the max stanza size is not necessary. Perhaps there is something else going on?

Adam Bowling (abowling) wrote :

Indeed. Two separate instances on test installations this week where OPAC results failed, but were instantly successful upon changing upping the max_stanza_size.

Adam Bowling (abowling) wrote :

FWIW, 16.04 + 2.12.2 + 2.5.0 was the combo.

description: updated
Galen Charlton (gmc) wrote :

Adam, where any 'XML stanza is too big' messages showing up in the OpenSRF logs?

Ben Shum (bshum) wrote :

Build a new VM with Ubuntu 16.04, installed OpenSRF 2.5.0 with standard instructions and default max_stanza_size unadjusted, and installed older Evergreen 2.12.2. Loaded concerto records, and search retrieved results (for "the" and "mozart", etc.) just fine and without any issues.

Been using the chunking from OpenSRF 2.5 series (and master) for all my test VMs with Evergreen master development on Ubuntu 16.04 without any issues so far.

Given that OpenSRF 2.5 chunking only works with Evergreen 2.12+ (and the bug target says 2.11??) I'd like to double check from Adam that he doesn't have a mismatching setup with Evergreen 2.11 and not 2.12 and OpenSRF 2.5?

Changed in evergreen:
status: New → Incomplete
Mike Rylander (mrylander) wrote :

Adam,

If you can, would you please add the following line to your opensr_core.xml in the //config/opensrf section, as a peer to <router_name> and <loglevel>:

    <msg_size_warn>64000</msg_size_warn>

That will cause log WARN-level messages that will look like this:

  Sending large message of XXX bytes to YYY@ZZZ/AAA

Existence of those log messages, and confirmation of what Ben asked above, should help significantly in identifying the cause of your issue.

While upping the stanza size seems to have helped you, it really shouldn't be necessary with OpenSRF 2.5+, and we should find out why it still was for you.

Thanks!

Mike Rylander (mrylander) wrote :

Adam,

Also, do you know what the default max stanza size was before you changed it?

Thanks again.

Dan Scott (denials) wrote :

We're seeing this on EG 2.12 / OpenSRF 2.5.0 as well, I think, if the following log message is an indication:

open-ils.cstore[32713]: [WARN:32713:transport_session.c:869:blah] Text of error message received from Jabber: XML stanza is too big

Will add the stanza Mike suggested and report back once we can restart our services.

Dan Scott (denials) wrote :

Default for ejabberd on Ubuntu 16.04 (our current setting) is 65536:

  max_stanza_size: 65536

Dan Scott (denials) wrote :

Our one case of this occurring today seems to be linked to an SRU search:

/usr/sbin/apache2[25960]: [ERR :25960:SuperCat.pm:2089:blah] Failed to process record for SRU search

Galen Charlton (gmc) wrote :

So, I've tracked it down -- the default chunk size is a bit too low for certain MARCXML-heavy responses to be chunked properly. Consider a request like this:

open-ils.cstore open-ils.cstore.json_query {"from":["unapi.biblio_record_entry_feed","{234,233,230,235,223,215,244,212,221,225}","marcxml","{holdings_xml,mra,acp,acnp,acns,bmp,cbs}","CONS","0","acn=>5,acp=>5",null,null,"1"]}

The result will include a lot of XML containing quotes and brackets:

  <datafield tag="028" ind1="3" ind2="0"><subfield code="a">HL50498715...

which gets converted to JSON as part of the payload of an OpenSRF transport message:

  <datafield tag=\"028\" ind1=\"3\" ind2=\"0\"><subfield code="a">HL50498715...

which, if too large, gets converted to JSON in chunks:

  <datafield tag=\\\"028\\\" ind1=\\\"3\\\" ind2=\\\"0\\\"><subfield code="a">HL50498715...

and converted to XML, and wrapped in an XMPP message, resulting in

  &lt;datafield tag=\\\\\\\&quot;028\\\\\\\&quot; ind1=\\\\\\\&quot;3\\\\\\\&quot; ind2=\\\\\\\&quot;0\\\\\\\&quot;&gt;&lt;subfield code=\\\\\\\&quot;a\\\\\\\&quot;&gt;HL50498715&lt;

The maximum chunk size is currently OSRF_MSG_CHUNK_SIZE == 2 * OSRF_MSG_BUNDLE_SIZE == 2 * 25600 == 51200, but that gets applied when splitting up the original payload as initially converted to JSON. If the payload is primarily XML and has a lot of quotes and brackets that end up getting escaped and converted to entities... I've seen chunks of 51200 bytes of payload ended up become XMPP messages larger than 66000 bytes.

Galen Charlton (gmc) wrote :

The following works for me, although it doesn't provide a guarantee that the max stanza size won't be exceeded.

diff --git a/include/opensrf/osrf_app_session.h b/include/opensrf/osrf_app_session.h
index 7ff732b..7e44dcf 100644
--- a/include/opensrf/osrf_app_session.h
+++ b/include/opensrf/osrf_app_session.h
@@ -38,7 +38,7 @@ typedef struct osrf_app_request_struct osrfAppRequest;
 /**
        @brief Default size of output buffer.
 */
-#define OSRF_MSG_BUNDLE_SIZE 25600 /* 25K */
+#define OSRF_MSG_BUNDLE_SIZE 20480 /* 20K */
 #define OSRF_MSG_CHUNK_SIZE (OSRF_MSG_BUNDLE_SIZE * 2)

 /**

Galen Charlton (gmc) on 2017-08-17
Changed in evergreen:
assignee: Adam Bowling (abowling) → nobody
status: Incomplete → Fix Committed
status: Fix Committed → Confirmed
Changed in opensrf:
milestone: none → 2.5.1
status: New → Confirmed
summary: - OpenSRF Installation Instructions for ejabberd
+ Default ejabberd max_stanza_size can be exceeded when chunking (MARC
+ )XML-heavy responses
Changed in evergreen:
milestone: 2.11.8 → 2.12.5
Dan Scott (denials) wrote :

Upping max_stanza_size to 2000000 (easier than rebuilding & reinstalling OpenSRF and Evergreen on our production server) resolved our issues with using the "Import via Z39.50" interface, which would hang when retrieving a full set of 25 results.

This also appears to have resolved an issue where people were unable to access the reporter interface (we've been running in production for 8 years now, so perhaps we have a lot of reports and folders).

Having enabled the opensrf_core.xml flag that Mike requested, I've seen (in the past half hour since enabling it) "Sending large message of XXX bytes" messages for open-ils.reporter and open-ils.search, the largest value of XXX being 67872.

Changed in evergreen:
milestone: 2.12.5 → 2.12.6
Mike Rylander (mrylander) wrote :

Here's a branch that looks at the to-be-escaped partial response string to see how much extra space the chunk will need, base on how many XML-ish characters are present, and scales the chunk size for that message appropriately:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/miker/lp-1709710-smarter-chunk-length

tags: added: pullrequest
Galen Charlton (gmc) wrote :

I've taken an initial look at Mike's patch, which so far seems to be an improvement. However, there was a compilation error, which I've corrected in an amended patch here:

collab/gmcharlt/lp1709710-smarter-chunk-length

Mike Rylander (mrylander) wrote :

Thanks, Galen. I was confused because it was all working in my local tests ... I should check in my changes!

Galen Charlton (gmc) wrote :

I've force-pushed to collab/gmcharlt/lp1709710-smarter-chunk-length a final set of patches:

- a further cleaned-up version of Mike's patch that ensures that the final completion status message and fixes const correctness and makes whitespace match local conventions
- a unit test

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/collab/gmcharlt/lp1709710-smarter-chunk-length

Bill Erickson (berick) on 2017-09-08
Changed in opensrf:
assignee: nobody → Bill Erickson (berick)
Bill Erickson (berick) wrote :

Code looks good. Tests pass.

Do we need to tell the Perl length() incantation in the patch to calculate bytes instead of characters? We have related examples of {use bytes; $l = length($s);} in the code. I can make that change if it sounds sane.

Changed in opensrf:
assignee: Bill Erickson (berick) → nobody
Galen Charlton (gmc) wrote :

It does sound like a good idea -- please run with it.

Bill Erickson (berick) wrote :

Pushed a branch:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1709710-smarter-chunk-length-and-bytes

Includes sign-offs for the original 2 commits.

Adds a new commit to calculate string byte counts instead of length in the Perl chunking/bundling code. Note, after a little more reading, decided to go with the newer recommended approach of counting bytes via length(Encode::encode_utf8($str)). For reference, https://perldoc.perl.org/bytes.html and 'perldoc -f length'.

This is a slightly larger change than I wanted to rubber-stamp for myself, so leaving here for additional sign-off.

Jason Stephenson (jstephenson) wrote :

Bill, I think you're right about using encode_utf8. I'm "testing" this branch and it seems to have resolved my issues with bug 1714699. I'll kick this around some more and see how it shakes out.

Jason Stephenson (jstephenson) wrote :

It works for me. I added my signoff here: working/user/dyrcona/lp1709710-smarter-chunk-length-and-bytes -> user/dyrcona/lp1709710-smarter-chunk-length-and-bytes

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/dyrcona/lp1709710-smarter-chunk-length-and-bytes

In case anyone else wants to test before pushing to the main repository. I think it's ready to go, but it being a weekend, thought I'd give someone else a chance to evaluate the branch.

Galen Charlton (gmc) on 2017-09-11
Changed in opensrf:
assignee: nobody → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

The follow-up changes work for me, and bib records with Arabic and Chinese text are successfully getting ingested and returned. Pushed to master and rel_2_5. Thanks, Mike, Bill, and Jason!

Changed in opensrf:
status: Confirmed → Fix Committed
importance: Undecided → High
assignee: Galen Charlton (gmc) → nobody
Changed in opensrf:
status: Fix Committed → Fix Released
no longer affects: evergreen
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers