Invalid date information in items

Bug #683953 reported by paul.n
This bug affects 1 person
Affects: Internet Archive - Tech Support
Status: In Progress
Importance: Undecided
Assigned to: Jeff Sharpe

Bug Description

Three items from Provo --

classschedule19291930brig
classschedule19301931brig
classschedule19311932brig

had their date information changed from "1929" to an invalid date. This caused the items to have no date associated with them at all in the search engine.

What to do:

1) Change the date to be valid (valid formats below)
2) Confirm that this change has been made by replying to this bug report

For reference:

"...The valid formats for date are:
YYYY or YYYY-MM-DD or YYYY-MM-DD HH:MM:SS

Anything other than that in the date field (like YYYY-YYYY) is going to cause problems, such as no value being entered at all in the search engine.

In fact, that ("no value entered at all in the search engine") is exactly what has now happened with classschedule19301931brig. You can see here that the search engine now has a blank value for "date" for that item, because the value provided was in an invalid format:

http://www.archive.org/advancedsearch.php?q=identifier%3Aclassschedule19301931brig&fl[]=date&fl[]=identifier&output=tables ..." - Hank
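A minimal Python sketch of a check for the three accepted formats. It assumes the search engine accepts exactly the shapes listed above and nothing else; `is_valid_ia_date` is a hypothetical helper, not the Archive's own validation code. The calendar check (rejecting, say, month 13) is an extra assumption, since the list above only describes the shape of the value.

```python
import re
from datetime import datetime

# The three date shapes listed as valid in this bug report, each paired
# with the strptime format used to confirm the digits form a real date/time.
_VALID_FORMATS = [
    (re.compile(r"\d{4}"), "%Y"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
]


def is_valid_ia_date(value: str) -> bool:
    """Return True if value matches one of the accepted date formats."""
    for pattern, fmt in _VALID_FORMATS:
        if pattern.fullmatch(value):
            try:
                datetime.strptime(value, fmt)
                return True
            except ValueError:
                # Right shape but impossible date (assumed to be rejected too).
                return False
    return False


for candidate in ["1929", "1929-09-01", "1929-09-01 08:00:00",
                  "1929-1930", "1930 1930", "[n.d.]"]:
    print(f"{candidate!r:24} valid={is_valid_ia_date(candidate)}")
```

Both failure shapes from this thread ("YYYY-YYYY" and the hyphen-stripped "YYYY YYYY") come out as invalid.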

(assigning bug to Jeff Sharpe)

paul.n (paul-n)
Changed in ia-techsupport:
assignee: nobody → Jeff Sharpe (jeffs)
status: New → Confirmed
description: updated
Jeff Sharpe (jeffs) wrote : Re: [Bug 683953] [NEW] Invalid date information in items

Should be fixed now.
Jeff

Jeff Sharpe (jeffs) wrote :

I changed these. They should be fixed.
Jeff

paul.n (paul-n) wrote :

Thank you, Jeff.
Closing bug.

Changed in ia-techsupport:
status: Confirmed → Fix Released
Hank Bromley (hank-archive) wrote :

No, still invalid. That is apparent from the search engine link above, which shows the field is still blank in the search engine index.

All three items simply had the hyphens removed, leaving a date like "1930 1930", which does not match any of the three valid formats.

Jeff Sharpe (jeffs) wrote : Re: [Bug 683953] Re: Invalid date information in items

Hi Hank,
I was wondering about that. I will take out the second year. I did see
that the year showed up on the details page and was hoping that was enough,
but obviously it wasn't.
Thanks,
Jeff
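Removing the second year could be scripted along these lines. This is a hypothetical helper (`first_year` is not part of any Archive tool), and it assumes the first year of each academic-year range is the value to keep. That matches the original "1929" for classschedule19291930brig, but it is worth confirming item by item.

```python
import re

# Matches the two bad shapes seen in this bug: a hyphenated academic-year
# range ("1930-1931") and the same range with the hyphen removed ("1930 1931").
_YEAR_RANGE = re.compile(r"(\d{4})[- ](\d{4})")


def first_year(value: str) -> str:
    """Collapse a two-year range to its first year; leave other values unchanged."""
    match = _YEAR_RANGE.fullmatch(value.strip())
    return match.group(1) if match else value


for bad in ["1930-1931", "1930 1930", "1929", "1929-09-01"]:
    print(f"{bad!r} -> {first_year(bad)!r}")
```

Values that are already valid, including full YYYY-MM-DD dates, pass through untouched.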

Hank Bromley (hank-archive) wrote :

Yuck, a whole bunch more of Marisa's modify_xml tasks ran after we started this discussion (they'd already been submitted, but were held up by the catalog backlog until last night). We still have all these with invalid dates:

http://www.us.archive.org/metamgr.php?&srt=updated&ord=desc&w_date=*-*&w_identifier=classschedul*&fs_date=on&fs_identifier=on&fs_mediatype=on&fs_format=on&fs_collection=on&fs_curatestate=on&fs_noindex=on&off=0&lim=25

And Jeff, when making changes like this, can you please use modify_xml.php instead of editxml, so that the task history shows the changes?
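One caveat with the metamgr query above: assuming `*` in w_date=*-* is a plain wildcard, it also matches perfectly valid YYYY-MM-DD dates, so its result list can include items that need no fix. A sketch of a second pass that keeps only the genuinely invalid values, assuming the identifier and date columns have already been pulled from the metamgr listing (that export step is not shown). The sample rows are made up for illustration.

```python
import re

# Accepted shapes: YYYY, YYYY-MM-DD, YYYY-MM-DD HH:MM:SS.
VALID_DATE = re.compile(r"\d{4}(-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?)?")


def invalid_dates(rows):
    """Yield (identifier, date) pairs whose date is not in an accepted format."""
    for identifier, date in rows:
        if not VALID_DATE.fullmatch(date):
            yield identifier, date


# Hypothetical rows shaped like the identifier/date columns of the listing.
sample = [
    ("exampleitemA", "1933-1934"),   # year range: invalid
    ("exampleitemB", "1940-09-01"),  # contains "-" but is valid
    ("exampleitemC", "1960 1961"),   # hyphen stripped: still invalid
]
for identifier, date in invalid_dates(sample):
    print(identifier, repr(date))
```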

paul.n (paul-n)
Changed in ia-techsupport:
status: Fix Released → In Progress
Jeff Sharpe (jeffs) wrote :

Hi,
Marisa informed me she did about 75 of those last night. I will start on
those tomorrow. She also said she has done many serials like that in the
past with no problems.

Thanks,
Jeff

paul.n (paul-n) wrote :

Yeah, I asked Hank if we should prevent people from being able to enter
these incorrectly in the first place. The issue is that there isn't much
feedback, just as you were able to enter an invalid date and it
went undetected.

I guess at some point Hank will have to generate a master list of date
issues and we'll have to fix them. If Marisa knows which other ones she did,
that would be helpful.

So there are two issues:

1) the system should be better at preventing people from doing things
that aren't allowed
2) it's impossible for folks to remember all the details, but they
should at least remember one rule: if you aren't sure, ask. I think in
this case, Marisa noticed something she thought was a problem and just
tried to fix it herself in a way she thought was correct. In most cases
this approach should be okay, but see issue #1.

Paul

Hank Bromley (hank-archive) wrote :

> She also said she has done many serials like that in the past with no problems.

Well, perhaps no "problems" in the sense of immediate feedback telling her there was an error, but all of these items probably have no "date" values in the search engine now.

paul.n (paul-n) wrote :

I thought I was replying directly to Jeff in my previous response. But since it's here, I should clarify a bit:

   * If invalid formats are entered via modify_xml, I think it would be helpful if the user were blocked from submitting or, more likely, if a red-row were generated. This way we would at least know there was a problem.
   * I'm actually not sure if it's possible to get a "master" list of all invalid date formats in all items. Looking closer at the link Hank provided, there is an easy way to find at least the books that include a "-" in the date field. It finds 126:

http://www.us.archive.org/metamgr.php?&srt=updated&ord=desc&w_date=*-*&w_mediatype=texts&w_collection=americana&w_sponsor=Brigham*Young*University&w_scanner=scribe1.provo.archive.org&w_repub_state=4%20or%20-1&fs_date=on&fs_identifier=on&fs_mediatype=on&fs_format=on&fs_collection=on&fs_sponsor=on&fs_scandate=on&fs_scanner=on&fs_curatestate=on&fs_repub_state=on&fs_size=on&fs_noindex=on&off=0&lim=25

There should probably be a project to find all invalid formats for other centers, but I think there are ways to do it without Hank creating the kind of master list I suggested in my previous response.
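Paul's first suggestion (blocking the submission or generating a red-row) comes down to a check run before a metadata change is queued. A sketch of such a check, assuming edits arrive as a dict of field names to new values; `date_errors` and that input shape are hypothetical, and how modify_xml would actually call it or raise a red-row is not shown.

```python
import re

# Accepted shapes: YYYY, YYYY-MM-DD, YYYY-MM-DD HH:MM:SS.
VALID_DATE = re.compile(r"\d{4}(-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?)?")


def date_errors(identifier: str, changes: dict) -> list:
    """Return error messages for a proposed metadata edit.

    An empty list means the edit may proceed. A non-empty list is what a
    submission form could show (to block the edit) or what a task could
    record (to flag it for review).
    """
    errors = []
    date = changes.get("date")
    if date is not None and not VALID_DATE.fullmatch(date):
        errors.append(
            f"{identifier}: date {date!r} is not YYYY, YYYY-MM-DD, "
            "or YYYY-MM-DD HH:MM:SS"
        )
    return errors


# A proposed edit of the kind that caused this bug.
print(date_errors("classschedule19301931brig", {"date": "1930-1931"}))
```

Edits that do not touch the date field pass through with no errors, so the check only adds friction where the search engine would otherwise silently drop the value.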

Hank Bromley (hank-archive) wrote :

Here's another way to find invalid dates - this search query returns 200 BYU items that have no date value in the search engine:

http://www.archive.org/advancedsearch.php?q=sponsor%3A%22Brigham+Young+University%22+AND+NOT+date%3A1*+AND+NOT+date%3A2*+AND+collection%3Aamericana&fl[]=date&fl[]=identifier&rows=250&output=tables
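The advancedsearch URLs in this thread can be generated rather than hand-edited. A small sketch using Python's standard `urllib.parse.urlencode`; `advanced_search_url` is a hypothetical helper, and the parameter names (q, fl[], rows, output) are simply those appearing in the URLs quoted here. The `NOT date:1* AND NOT date:2*` clause assumes every indexed date starts with 1 or 2 (years 1000 to 2999), so only items with no indexed date are left.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://www.archive.org/advancedsearch.php"


def advanced_search_url(query, fields, rows=None, output="tables"):
    """Build an advancedsearch.php URL like the ones quoted in this thread."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]
    if rows is not None:
        params.append(("rows", str(rows)))
    params.append(("output", output))
    # safe="*[]" keeps wildcards and the fl[] brackets literal, as in the
    # hand-written URLs; ':' and '"' are still percent-encoded.
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe="*[]")


# BYU americana items with no date value in the search index.
print(advanced_search_url(
    'sponsor:"Brigham Young University" AND NOT date:1* '
    "AND NOT date:2* AND collection:americana",
    ["date", "identifier"],
    rows=250,
))
```

Calling it with "identifier:classschedule19301931brig" and the same two fields (no rows) reproduces the single-item link from the bug description.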

Below is a metamgr query to show a histogram of the date values in all those items (long and ugly URL - sorry!). Some are legit - e.g., when the date isn't known, the MARC record may specify "[n.d.]" for the date, and that won't be considered valid by the search engine. But most of the values result from someone manually inserting something invalid.

http://www.us.archive.org/metamgr.php?f=histogram&group=date&w_identifier=pourunregard00mr%20OR%20pourvivre00goza%20OR%20prologdichtung00wern%20OR%20rponsedunefavo00minn%20OR%20traitepratiqueda02lewa%20OR%20stephensmonthlym00step%20OR%20streikbrecherfes00bern%20OR%20sundayschoolteac00sund%20OR%20supremepardon00lano%20OR%20undergraduatecat19941995brig%20OR%20undergraduatecat19951996brig%20OR%20undergraduatecat19961998brig%20OR%20undergraduatecat19971998brig%20OR%20undergraduatecat19981999brig%20OR%20undergraduatecat19992000brig%20OR%20undergraduatecat20002001brig%20OR%20undergraduatecat20012002brig%20OR%20undergraduatecat20022003brig%20OR%20undergraduatecat20032004brig%20OR%20undergraduatecat20042005brig%20OR%20undergraduatecat20052006brig%20OR%20undergraduatecat20062007brig%20OR%20undergraduatecat20072008brig%20OR%20traitepratiqueda01lewa%20OR%20wihenbergvorfunf00bern%20OR%20zeugnissefurdiee00reit%20OR%20zurlandesbefesti00zuri%20OR%20zustandewahrendd00wild%20OR%20zurallensteiners00somm%20OR%20vorfunfzehnjahre00leip%20OR%20uberdielagedesbu00hann%20OR%20classschedule19331934brig%20OR%20classschedule19401941brig%20OR%20classschedule19361937brig%20OR%20classschedule19561957brig%20OR%20classschedule19611962brig%20OR%20classschedule19411942brig%20OR%20classschedule19471948brig%20OR%20classschedule19511952brig%20OR%20classschedule19481949brig%20OR%20classschedule19351936brig%20OR%20classschedule19601961brig%20OR%2050griefsfaitsarg00laro%20OR%20adamgraemeofmoss00olip%20OR%20annualcatalogue19131914brig%20OR%20annualcatalogue19141915brig%20OR%20annualcatalogue19151916brig%20OR%20annualcatalogue19161917brig%20OR%20annualcatalogue19171918brig%20OR%20annualcatalogue19181919brig%20OR%20annualcatalogue19191920brig%20OR%20annualcatalogue19201921brig%20OR%20annualcatalogue19211922brig%20OR%20annualcatalogue19221923brig%20OR%20annualcatalogue19231924brig%20OR%20annualcatalogue19231926brig%20OR%20annualcatalogue19241925brig%20OR%20annualcatalogue19251926brig%20OR%20annualcatalogue19261927brig%20OR%20annualcatalogue19271928brig%20OR%20annualcatalogue19281929brig%20OR%20annualcatalogue19291930brig%20OR%20annualcatalogue19301931brig%20OR%20annualcatalogue19311932brig%20OR%20annualcataloguei19411942brig%20OR%20annualcataloguei19421943brig%20OR%20annualcataloguei19431944brig%20OR%20annualcataloguei19441945brig%20OR%20annualcataloguei19451946brig%20OR%20annualcataloguei19461947brig%20OR%20annualcataloguei19481949brig%20OR%20annualcataloguei19491950brig%20OR%20annualcataloguei19501951brig%20OR%20annu...
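metamgr builds the histogram server-side. The same tally can be sketched offline for a list of date values, e.g. the date column of the advancedsearch results above. It reports MARC placeholders such as "[n.d.]", which are legitimate when the date is unknown, separately from values someone typed in by hand. The `NO_DATE_MARKERS` set and `date_histogram` are assumptions for illustration, and the sample values are made up.

```python
import re
from collections import Counter

# Accepted shapes: YYYY, YYYY-MM-DD, YYYY-MM-DD HH:MM:SS.
VALID_DATE = re.compile(r"\d{4}(-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?)?")
# Assumed set of MARC "no date" placeholders; extend as new ones turn up.
NO_DATE_MARKERS = {"[n.d.]", "n.d."}


def date_histogram(dates):
    """Count invalid date values, split into MARC placeholders and other values."""
    placeholders, manual = Counter(), Counter()
    for value in dates:
        value = value.strip()
        if VALID_DATE.fullmatch(value):
            continue  # valid values are not part of the problem report
        bucket = placeholders if value in NO_DATE_MARKERS else manual
        bucket[value] += 1
    return placeholders, manual


placeholders, manual = date_histogram(
    ["[n.d.]", "1913-1914", "1913-1914", "1923 1926", "1950"]
)
print("MARC placeholders:", placeholders.most_common())
print("needs fixing:", manual.most_common())
```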

Jeff Sharpe (jeffs) wrote :

There are 5 left from this list, and only one of them, classschedule19691970brig,
allows me access to its details page.

Jeff

Hank Bromley (hank-archive) wrote :

Those other four are dark, and I see you've already submitted a modify_xml to correct that last one, which will run when the current derive finishes. So I guess this batch is done, but we probably still have a bunch more from other contributors.

Jude Coelho (judec) wrote :

Any update on this? It's been inactive for a long time. Jeff, if the issue is resolved, please mark the bug "Fix Released".
