Shouldn't repub_state get changed to 4 post-derive?

Bug #827970 reported by paul.n
This bug affects 1 person
Affects: Internet Archive - Tech Support
Status: Confirmed
Importance: Low
Assigned to: Jude Coelho

Bug Description

Currently there's a modify xml task to change repub_state from -2 to 4 prior to the derive.php task.

It would seem to me that:

- the repub state should be changed by the task that runs right after a successful derive
- or, perhaps, a new repub state should be designated to indicate post-derive status

In the latter solution you would then have the following states and definitions:

-1 loaded but not scanned
-2 opened at scribe
4 post-upload, deriving now
? book finished deriving

I've marked this "bug" as low priority.

thanks,

paul

paul.n (paul-n)
description: updated
summary: - Should repub_state get changed to -4 pre-derive?
+ Shouldn't repub_state get changed to 4 post-derive?
description: updated
Revision history for this message
Jude Coelho (judec) wrote :

I think repub state 4 is traditionally meant to indicate a book has derived, as the change occurs after a derive task for all items.

I'm curious about why we use the scheme we do. We have, as you say:

-1 loaded but not scanned
-2 opened at scribe
3 at foldout station (deprecated)
4 post-upload, deriving now

Wouldn't it make more sense to have a scheme like:

0 - loaded, waiting for scanning
1 - opened at scribe for scanning
2 - scanned, waiting for repub
3 - repubed and uploaded, waiting for derive
4 - derived

Does this conflict with anything else in our operation, like microfilm?
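
One way to picture the proposed 0-4 scheme is as an ordered enumeration. This is only a sketch in Python (the real tasks are PHP scripts such as derive.php), and the names ProposedRepubState, is_fully_derived and next_state are hypothetical; only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class ProposedRepubState(IntEnum):
    """Hypothetical names for the proposed scheme; values as listed above."""
    LOADED = 0            # loaded, waiting for scanning
    OPENED_AT_SCRIBE = 1  # opened at scribe for scanning
    SCANNED = 2           # scanned, waiting for repub
    REPUBBED = 3          # repubed and uploaded, waiting for derive
    DERIVED = 4           # derived


def is_fully_derived(repub_state: int) -> bool:
    # With values that only ever rise, "finished" is a plain comparison.
    return repub_state >= ProposedRepubState.DERIVED


def next_state(state: ProposedRepubState) -> ProposedRepubState:
    # Advance one step; the final state stays where it is.
    return ProposedRepubState(min(state + 1, ProposedRepubState.DERIVED))
```

The point of the ordering is that a check like is_fully_derived needs no knowledge of the history behind each value, which is not true of the current -1, -2, 4 sequence.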

Changed in ia-techsupport:
assignee: nobody → Jude Coelho (judec)
status: New → Confirmed
Revision history for this message
paul.n (paul-n) wrote :

Due to the use of OpenBookFactory in the earlier days of microfilm, there were additional repub_states used.
I know "3" has significance, and "6" is the microfilm equivalent of "4" in books. We also have -127 (skip repub).

I do think we should at least work on a table detailing all the different repub_states for the various projects and start from there if we are interested in potentially reassessing the scheme.
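
As a starting point for such a table, the values mentioned so far in this thread could be collected in one place. The sketch below is Python for illustration only; the structure, the "unspecified" grouping and the describe helper are assumptions, and every meaning would need to be checked against the actual code before being relied on.

```python
# Values and meanings are taken only from this thread.
# None marks a value whose meaning the thread does not give.
KNOWN_REPUB_STATES = {
    "books": {
        -1: "loaded but not scanned",
        -2: "opened at scribe",
        3: "at foldout station (deprecated)",
        4: "post-upload, deriving now",
    },
    "microfilm": {
        3: None,  # said to have significance; meaning not stated here
        6: "equivalent of 4 in books",
    },
    # The thread does not say which projects use this value.
    "unspecified": {
        -127: "skip repub",
    },
}


def describe(project: str, repub_state: int) -> str:
    """Look up a state for a project, falling back to the unspecified group."""
    for scope in (project, "unspecified"):
        states = KNOWN_REPUB_STATES.get(scope, {})
        if repub_state in states:
            return states[repub_state] or "meaning not documented in this thread"
    return "unknown repub_state"
```

For example, describe("microfilm", 6) returns "equivalent of 4 in books", while describe("books", 6) returns "unknown repub_state", which is the kind of gap a real table would surface.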

However, my bug was specifically addressing the last state of a book, 4 (6 for microfilm). I think these states should happen at the end of, or after, derive.php successfully finishes.

I think one of the reasons we use the "format" field a lot is that repub_state cannot currently be trusted to identify a fully derived item.

Revision history for this message
Hank Bromley (hank-archive) wrote :

Been meaning to reply to Paul's initial report since it was posted. Sorry for being so slow!

Our use of repub_state is indeed all confused, due to piecemeal changes over time, and is ripe for being totally rethought. At one point, repub_state of 6 was used to indicate the book had finished deriving, and some microfilm pathways still produce that result - we have 125K items with repub_state=6. (Actually, having just checked the code, the 6 is being set on microfilm at the end of the postprocessor book-op, and staying that way after the derive for reasons that will become clear in the next paragraph.)

Then at some point a function was added to the deriver code and set to run at the end of each derive, called nonMicrofilmBookSetRepubState4(). It changed repub_state to 4 for anything that had mediatype=texts and wasn't microfilm. The comment introducing the function says:

  // If you are a book but not a microfilm book, then set your
  // repub_state to 4. This is meant to be invoked near the
  // end of a derive, so that it is not done if the derive
  // has bombed out earlier. It is for the benefit of scribe 1 books.

For scribe 2 books, the modify_xml.php Paul initially asked about gets queued as a "next_cmd" for the archive.php. That meant we were setting repub_state to 4 twice for scribe 2 books, once at end of the derive, and again in the follow-on modify_xml. Once we began redrowing modify_xml's that produced no change, all the follow-on modify_xml's began redrowing. At that point (April 2010) I tweaked nonMicrofilmBookSetRepubState4() to skip the repub_state change for scribe 2 books, as a short-term solution, pending clarification of the whole mess. My comment in the code reads:

    // FIXXX - April 2010, we need to clean up repub_state values and how
    // they're set - no one seems clear on why we set repub_state to 4 after
    // deriving (shouldn't it be 6?), nor why we do it both here and via the
    // separate modify_xml.php task for Scribe 2 books. The modify_xml is now
    // redrowing due to repub_state already being set to 4 here; as a short-term
    // solution, skip it here for Scribe 2

    // (probably want actually to drop the extra modify_xml task and set
    // repub_state here for all texts items [to 4 or to 6? still unclear])
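
The behavior described above, including the April 2010 skip for Scribe 2 books, can be sketched roughly as follows. This is an illustration in Python, not the actual PHP of nonMicrofilmBookSetRepubState4(); the function name and the is_microfilm and is_scribe2 fields are hypothetical stand-ins, since the thread does not say how the real code detects either case.

```python
def set_repub_state_after_derive(meta: dict, skip_scribe2: bool = True) -> dict:
    """Return a copy of meta with repub_state set to 4 where the rule applies.

    Meant to model a step run near the end of a derive, so it is never
    reached if the derive has failed earlier.
    """
    updated = dict(meta)

    # Rule from the original comment: a book, but not a microfilm book.
    # "is_microfilm" is a hypothetical field used only for this sketch.
    is_book = meta.get("mediatype") == "texts" and not meta.get("is_microfilm", False)
    if not is_book:
        return updated

    # April 2010 tweak: leave Scribe 2 items to the follow-on modify_xml task.
    # "is_scribe2" is likewise a hypothetical field.
    if skip_scribe2 and meta.get("is_scribe2", False):
        return updated

    updated["repub_state"] = 4
    return updated
```

Calling it with skip_scribe2=False models the earlier behavior, where a Scribe 2 book was set to 4 here and then again by the follow-on modify_xml.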

And, of course, I never got around to starting the discussion I intended to, in order to get that all clarified, so that's where things remained for a while.

Then, in an effort to reduce the number of green rows, Tracey noticed that we had all these modify_xml tasks hanging around, green, for the duration of the derives on their items. Well, that seemed pretty silly, given how fast modify_xml's are compared to derives. So she added an automatic priority adjustment that makes the modify_xml's run before the derives rather than after them, as they used to (running after was what had been causing the redrows that my previous code change was a response to).
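
The effect of that priority adjustment on task order can be shown with a toy queue. The thread does not describe how the adjustment is actually implemented, so the priority numbers, the FAST_TASKS set and the run_order function below are all assumptions made for this Python sketch.

```python
import heapq

# Hypothetical priorities: a lower number runs first.
DEFAULT_PRIORITY = 0
FAST_TASK_BOOST = -1
FAST_TASKS = {"modify_xml.php"}


def run_order(tasks: list) -> list:
    """Given command names in submission order, return the execution order."""
    heap = []
    for seq, cmd in enumerate(tasks):
        priority = DEFAULT_PRIORITY + (FAST_TASK_BOOST if cmd in FAST_TASKS else 0)
        # seq breaks ties so equal-priority tasks keep their submission order.
        heapq.heappush(heap, (priority, seq, cmd))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With a derive submitted first and the modify_xml second, run_order(["derive.php", "modify_xml.php"]) puts modify_xml.php first, so repub_state is already 4 before the derive has finished.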

So what we have now seriously makes no sense at all.

That means we get to decide what we should do instead!

I have no strong feelings on what repub_state we should use to signify that an item has finished deriving, beyond wanting u...


Revision history for this message
Hank Bromley (hank-archive) wrote :

See related Bug 264143.

Revision history for this message
Jude Coelho (judec) wrote :

I'm fine with whatever we do. My main suggestion would be that, for clarity's sake, the values should rise from -1 or 0 at the beginning of a book's journey and end at the final value, be that 4 or 6, with the states in between signifying the steps in the process toward becoming a fully derived book. Going from -1 to -2 to 4 feels wrong, as it seems confusing to someone who doesn't understand the history of the convention.
