OpenSRF

Default ejabberd max_stanza_size can be exceeded when chunking (MARC)XML-heavy responses

Bug #1709710 reported by Adam Bowling on 2017-08-09

This bug affects 4 people

Affects		Status	Importance	Assigned to	Milestone
	OpenSRF	Fix Released	High	Unassigned	OpenSRF 2.5.1

Bug Description

Installation README for OpenSRF originally included instruction to set "max_stanza_size" in ejabberd equal to "2000000." That seemed a helpful step in setting up OpenSRF to serve requests. Without the increase, many operations, including OPAC searches, timeout or fail without returning results.

OpenSRF 2.4.2
OpenSRF 2.5.0

See original description

Tags:

Revision history for this message

Mike Rylander (mrylander) wrote on 2017-08-09:

This should not be true with OpenSRF 2.5+ due to chunking. Do you have indications that it is still necessary in 2.5?

FWIW, I've experience with quite a number of instances where increasing the max stanza size is not necessary. Perhaps there is something else going on?

Revision history for this message

Adam Bowling (abowling) wrote on 2017-08-09:

Indeed. Two separate instances on test installations this week where OPAC results failed, but were instantly successful upon changing upping the max_stanza_size.

Revision history for this message

Adam Bowling (abowling) wrote on 2017-08-09:

FWIW, 16.04 + 2.12.2 + 2.5.0 was the combo.

description:

updated

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-09:

Adam, where any 'XML stanza is too big' messages showing up in the OpenSRF logs?

Revision history for this message

Ben Shum (bshum) wrote on 2017-08-10:

Build a new VM with Ubuntu 16.04, installed OpenSRF 2.5.0 with standard instructions and default max_stanza_size unadjusted, and installed older Evergreen 2.12.2. Loaded concerto records, and search retrieved results (for "the" and "mozart", etc.) just fine and without any issues.

Been using the chunking from OpenSRF 2.5 series (and master) for all my test VMs with Evergreen master development on Ubuntu 16.04 without any issues so far.

Given that OpenSRF 2.5 chunking only works with Evergreen 2.12+ (and the bug target says 2.11??) I'd like to double check from Adam that he doesn't have a mismatching setup with Evergreen 2.11 and not 2.12 and OpenSRF 2.5?

Changed in evergreen:
status:	New → Incomplete

Revision history for this message

Mike Rylander (mrylander) wrote on 2017-08-11:

Adam,

If you can, would you please add the following line to your opensr_core.xml in the //config/opensrf section, as a peer to <router_name> and <loglevel>:

<msg_size_warn>64000</msg_size_warn>

That will cause log WARN-level messages that will look like this:

Sending large message of XXX bytes to YYY@ZZZ/AAA

Existence of those log messages, and confirmation of what Ben asked above, should help significantly in identifying the cause of your issue.

While upping the stanza size seems to have helped you, it really shouldn't be necessary with OpenSRF 2.5+, and we should find out why it still was for you.

Thanks!

Revision history for this message

Mike Rylander (mrylander) wrote on 2017-08-11:

Adam,

Also, do you know what the default max stanza size was before you changed it?

Thanks again.

Revision history for this message

Dan Scott (denials) wrote on 2017-08-16:

We're seeing this on EG 2.12 / OpenSRF 2.5.0 as well, I think, if the following log message is an indication:

open-ils.cstore[32713]: [WARN:32713:transport_session.c:869:blah] Text of error message received from Jabber: XML stanza is too big

Will add the stanza Mike suggested and report back once we can restart our services.

Revision history for this message

Dan Scott (denials) wrote on 2017-08-16:

Default for ejabberd on Ubuntu 16.04 (our current setting) is 65536:

max_stanza_size: 65536

Revision history for this message

Dan Scott (denials) wrote on 2017-08-16:

#10

Our one case of this occurring today seems to be linked to an SRU search:

/usr/sbin/apache2[25960]: [ERR :25960:SuperCat.pm:2089:blah] Failed to process record for SRU search

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-17:

#11

So, I've tracked it down -- the default chunk size is a bit too low for certain MARCXML-heavy responses to be chunked properly. Consider a request like this:

open-ils.cstore open-ils.cstore.json_query {"from":["unapi.biblio_record_entry_feed","{234,233,230,235,223,215,244,212,221,225}","marcxml","{holdings_xml,mra,acp,acnp,acns,bmp,cbs}","CONS","0","acn=>5,acp=>5",null,null,"1"]}

The result will include a lot of XML containing quotes and brackets:

<datafield tag="028" ind1="3" ind2="0"><subfield code="a">HL50498715...

which gets converted to JSON as part of the payload of an OpenSRF transport message:

<datafield tag=\"028\" ind1=\"3\" ind2=\"0\"><subfield code="a">HL50498715...

which, if too large, gets converted to JSON in chunks:

<datafield tag=\\\"028\\\" ind1=\\\"3\\\" ind2=\\\"0\\\"><subfield code="a">HL50498715...

and converted to XML, and wrapped in an XMPP message, resulting in

<datafield tag=\\\\\\\"028\\\\\\\" ind1=\\\\\\\"3\\\\\\\" ind2=\\\\\\\"0\\\\\\\"><subfield code=\\\\\\\"a\\\\\\\">HL50498715<

The maximum chunk size is currently OSRF_MSG_CHUNK_SIZE == 2 * OSRF_MSG_BUNDLE_SIZE == 2 * 25600 == 51200, but that gets applied when splitting up the original payload as initially converted to JSON. If the payload is primarily XML and has a lot of quotes and brackets that end up getting escaped and converted to entities... I've seen chunks of 51200 bytes of payload ended up become XMPP messages larger than 66000 bytes.

Revision history for this message

Galen Charlton (gmc) wrote on 2017-08-17:

#12

The following works for me, although it doesn't provide a guarantee that the max stanza size won't be exceeded.

diff --git a/include/opensrf/osrf_app_session.h b/include/opensrf/osrf_app_session.h
index 7ff732b..7e44dcf 100644
--- a/include/opensrf/osrf_app_session.h
+++ b/include/opensrf/osrf_app_session.h
@@ -38,7 +38,7 @@ typedef struct osrf_app_request_struct osrfAppRequest;
/**
@brief Default size of output buffer.
*/
-#define OSRF_MSG_BUNDLE_SIZE 25600 /* 25K */
+#define OSRF_MSG_BUNDLE_SIZE 20480 /* 20K */
#define OSRF_MSG_CHUNK_SIZE (OSRF_MSG_BUNDLE_SIZE * 2)

/**

Galen Charlton (gmc) on 2017-08-17

Changed in evergreen:
assignee:	Adam Bowling (abowling) → nobody
status:	Incomplete → Fix Committed
status:	Fix Committed → Confirmed
Changed in opensrf:
milestone:	none → 2.5.1
status:	New → Confirmed
summary:	- OpenSRF Installation Instructions for ejabberd + Default ejabberd max_stanza_size can be exceeded when chunking + (MARC)XML-heavy responses
Changed in evergreen:
milestone:	2.11.8 → 2.12.5

Revision history for this message

Dan Scott (denials) wrote on 2017-08-18:

#13

Upping max_stanza_size to 2000000 (easier than rebuilding & reinstalling OpenSRF and Evergreen on our production server) resolved our issues with using the "Import via Z39.50" interface, which would hang when retrieving a full set of 25 results.

This also appears to have resolved an issue where people were unable to access the reporter interface (we've been running in production for 8 years now, so perhaps we have a lot of reports and folders).

Having enabled the opensrf_core.xml flag that Mike requested, I've seen (in the past half hour since enabling it) "Sending large message of XXX bytes" messages for open-ils.reporter and open-ils.search, the largest value of XXX being 67872.

Evergreen Bug Maintenance (bugmaster) on 2017-08-25

Changed in evergreen:
milestone:	2.12.5 → 2.12.6

Revision history for this message

Mike Rylander (mrylander) wrote on 2017-08-28:

#14

Here's a branch that looks at the to-be-escaped partial response string to see how much extra space the chunk will need, base on how many XML-ish characters are present, and scales the chunk size for that message appropriately:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/miker/lp-1709710-smarter-chunk-length

tags:

added: pullrequest

Revision history for this message

Galen Charlton (gmc) wrote on 2017-09-02:

#15

I've taken an initial look at Mike's patch, which so far seems to be an improvement. However, there was a compilation error, which I've corrected in an amended patch here:

collab/gmcharlt/lp1709710-smarter-chunk-length

Revision history for this message

Mike Rylander (mrylander) wrote on 2017-09-05:

#16

Thanks, Galen. I was confused because it was all working in my local tests ... I should check in my changes!

Revision history for this message

Galen Charlton (gmc) wrote on 2017-09-08:

#17

I've force-pushed to collab/gmcharlt/lp1709710-smarter-chunk-length a final set of patches:

- a further cleaned-up version of Mike's patch that ensures that the final completion status message and fixes const correctness and makes whitespace match local conventions
- a unit test

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/collab/gmcharlt/lp1709710-smarter-chunk-length

Bill Erickson (berick) on 2017-09-08

Changed in opensrf:
assignee:	nobody → Bill Erickson (berick)

Revision history for this message

Bill Erickson (berick) wrote on 2017-09-08:

#18

Code looks good. Tests pass.

Do we need to tell the Perl length() incantation in the patch to calculate bytes instead of characters? We have related examples of {use bytes; $l = length($s);} in the code. I can make that change if it sounds sane.

Changed in opensrf:
assignee:	Bill Erickson (berick) → nobody

Revision history for this message

Galen Charlton (gmc) wrote on 2017-09-08:

#19

It does sound like a good idea -- please run with it.

Revision history for this message

Bill Erickson (berick) wrote on 2017-09-08:

#20

Pushed a branch:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1709710-smarter-chunk-length-and-bytes

Includes sign-offs for the original 2 commits.

Adds a new commit to calculate string byte counts instead of length in the Perl chunking/bundling code. Note, after a little more reading, decided to go with the newer recommended approach of counting bytes via length(Encode::encode_utf8($str)). For reference, https://perldoc.perl.org/bytes.html and 'perldoc -f length'.

This is a slightly larger change than I wanted to rubber-stamp for myself, so leaving here for additional sign-off.

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2017-09-10:

#21

Bill, I think you're right about using encode_utf8. I'm "testing" this branch and it seems to have resolved my issues with bug 1714699. I'll kick this around some more and see how it shakes out.

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2017-09-10:

#22

It works for me. I added my signoff here: working/user/dyrcona/lp1709710-smarter-chunk-length-and-bytes -> user/dyrcona/lp1709710-smarter-chunk-length-and-bytes

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/dyrcona/lp1709710-smarter-chunk-length-and-bytes

In case anyone else wants to test before pushing to the main repository. I think it's ready to go, but it being a weekend, thought I'd give someone else a chance to evaluate the branch.

Galen Charlton (gmc) on 2017-09-11

Changed in opensrf:
assignee:	nobody → Galen Charlton (gmc)

Revision history for this message

Galen Charlton (gmc) wrote on 2017-09-11:

#23

The follow-up changes work for me, and bib records with Arabic and Chinese text are successfully getting ingested and returned. Pushed to master and rel_2_5. Thanks, Mike, Bill, and Jason!

Changed in opensrf:
status:	Confirmed → Fix Committed
importance:	Undecided → High
assignee:	Galen Charlton (gmc) → nobody

Evergreen Bug Maintenance (bugmaster) on 2017-09-11

Changed in opensrf:
status:	Fix Committed → Fix Released

Evergreen Bug Maintenance (bugmaster) on 2017-09-21

no longer affects:

evergreen

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.