Network Error when deleting all copies in bucket
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Evergreen | Won't Fix | Medium | Unassigned | |
Bug Description
Evergreen version: master 20111205 + mvlc modifications (affects all versions)
OpenSRF version: 2.0.1ish
PostgreSQL version: 9.1.2
Linux distribution: Ubuntu 10.04.3 LTS
When staff try to delete all copies from a bucket using the client, they very often get a network error when the bucket holds more than a few hundred copies. Specifically, I've seen it this week with two buckets, each containing 2,000+ copies.
While looking through the client code, I see that a fleshed retrieve is done before a call to update fleshed copies. I imagine that the network error comes from all of the fleshed data being sent from the client to the server. The resulting XMPP message likely exceeds our max_stanza_size parameter, which I believe is presently set to 2,000,000.
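Back-of-the-envelope numbers make the problem clear. The per-copy sizes below are hypothetical (real fleshed copy serializations vary widely), but they illustrate how a few thousand fleshed copies can blow past a 2,000,000-byte stanza limit while a bare id list stays tiny:

```python
# Rough, illustrative estimate of XMPP payload size for a batch copy update.
# The per-copy byte counts are assumptions, not measured Evergreen values.

MAX_STANZA_SIZE = 2_000_000  # bytes; the max_stanza_size cited above

FLESHED_COPY_BYTES = 1_500   # assumed size of one fleshed copy in transit
ID_ONLY_BYTES = 10           # assumed size of one copy id plus delimiter

def payload_bytes(n_copies, per_copy):
    """Approximate total message size for n_copies."""
    return n_copies * per_copy

n = 2000
print(payload_bytes(n, FLESHED_COPY_BYTES))  # 3000000: exceeds the limit
print(payload_bytes(n, ID_ONLY_BYTES))       # 20000: well under the limit
```

Even with generous assumptions, the id-only payload is two orders of magnitude smaller, which is why shrinking the message beats raising max_stanza_size.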
In lieu of raising max_stanza_size even higher, I would like to modify the client to use a more space-efficient deletion algorithm. A couple of options come to mind:
1. Create a batch copy delete by id method in open-ils.cat that accepts a list of copy ids. The client could then just send the list of ids back to the server, thus sending the bare minimum of information.
2. Retrieve only the acp information for each copy and send that back, thus avoiding all of the fleshed data taking up extra space in transit.
I prefer option 1, but can see that option 2 might be made to work with few or no back-end additions.
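Option 1 could look something like the sketch below: the client sends only copy ids, chunked so no single request grows unboundedly. The method name and session interface here are placeholders, not the actual OpenSRF API:

```python
# Sketch of option 1: send only copy ids to a (hypothetical) batch delete
# method in open-ils.cat, chunking the list to keep each message small.

def chunked(ids, size=500):
    """Yield successive slices of at most `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def delete_copies(session, copy_ids):
    # "open-ils.cat.asset.copy.batch.delete" is a placeholder name for
    # the new method proposed above; it does not exist yet.
    for batch in chunked(copy_ids):
        session.request("open-ils.cat.asset.copy.batch.delete", batch)
```

With 500 ids per request, even a 10,000-copy bucket stays far below the stanza limit on every message.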
Also, I wonder if there is some reason to continue using the update fleshed copies method here that I don't see.
I'll start a branch to work on solution #1 unless/until someone with more insight says it's a bad idea because....
Changed in evergreen:
- status: In Progress → New
- assignee: Jason Stephenson (jstephenson) → nobody
- milestone: 2.2.0alpha2 → 2.2.0beta1
- status: New → Confirmed
- importance: Undecided → Medium
- milestone: 2.2.0beta1 → 2.2.0rc1
- milestone: 2.2.0rc1 → 2.2.0
- milestone: 2.2.0 → none
Well, I came up with a quicker and probably better solution.
I've added an FM_ACP_UNFLESHED_RETRIEVE to the client to retrieve just the unfleshed copy information. This works with the batch copy update call already being used.
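In outline, the unfleshed approach looks like this: fetch bare copy objects (no nested fleshed data), flag each as deleted, and hand them to the existing batch update. All method names and the deleted-flag field below are illustrative placeholders, not the actual Evergreen client/OpenSRF calls:

```python
# Illustrative sketch of the unfleshed-retrieve fix. Method names and the
# "isdeleted" field are placeholders standing in for the real API.

def delete_bucket_copies(session, copy_ids):
    # Retrieve bare (unfleshed) copies: no nested org, status, or call
    # number objects come along, so the payload stays small.
    copies = [session.request("open-ils.search.asset.copy.retrieve", cid)
              for cid in copy_ids]
    for copy in copies:
        copy["isdeleted"] = True  # mark each copy for deletion
    # Reuse the batch update call the client already makes.
    session.request("open-ils.cat.asset.copy.fleshed.batch.update", copies)
```

The key point is that the expensive part of the original path was the fleshed data riding along in both directions; the update call itself is unchanged.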
Along the way, I've noticed the cat.properties string for cat.batch_operation_failed has a %1$2 placeholder tacked onto the end that isn't used anywhere in the client code. I'll remove that as well.
I still sometimes get a network error when deleting a bucket with over 200 copies, but the deletion still succeeds. I suspect this network error comes from a timeout value somewhere. I'll see if I can do something about that as well.