Comment 2 for bug 423523

LA2 (lars-aronsson) wrote :

Replacing every digit in death_date with 9 and counting the resulting patterns gives the following as the most common values (from the authors.json.gz dump of 29 July 2009, which has 6.47 million author records):

occurrences value pattern comment
 164289 9999. trailing period should be removed
 125446 9999 nice year
   4598 * just an asterisk, should be blank?
   4153 . just a period, should be blank
   3499 , just a comma, should be blank
   3311 9999, trailing comma should be removed
   2738 [from old catalog] imported from LoC? garbage?
   2485 9999?
   2395 ) just a closing parenthesis, should be blank
   2150 9999. [from old catalog] LoC?
   1717 .· just a period and a mid-dot, should be blank
   1453 ca. 9999
   1400 999
    765 .* just a period and an asterisk, should be blank
    762 ca. 9999.
    701 9999 or 9.
    424 9999] trailing bracket should be removed
    406 999. trailing period should be removed
    314 9999) trailing parenthesis should be removed
    263 ca. 999
    233 ed. huh?
    225 9999 or 99. trailing period should be removed
    207 9999?. trailing period should be removed
    191 ). just a closing parenthesis and a period, should be blank
    173 999?
    172 9999.· trailing period and mid-dot should be removed
    162 ] just a closing bracket, should be blank
    129 9999.* trailing period and asterisk should be removed
    121 9999 or 9
    114 c huh?
     98 99
     97 9999, [from old catalog] LoC?
     86 comp. huh?
     86 9999 or 99
     84 ca. 999. trailing period should be removed
     83 · just a mid-dot, should be blank
     62 999 B.C. keep trailing period after B.C.
     61 999 or 9. trailing period should be removed
     56 99. trailing period should be removed
     50 99th cent. keep trailing period after cent.
     42 99.99.9999 is this day.month.year or month.day.year?
     38 .... four periods?
     35 l999 lowercase L instead of digit 1
     33 9999.  [from old catalog] period and two spaces before the bracket, LoC import?
     32 ca.9999 should have space after ca.
     31 9999). trailing parenthesis and period should be removed
     30 9
     28 ca. 999 B.C.
     27 9999 . trailing space and period should be removed
     26 99/99/9999 is this day/month/year or month/day/year?
     26 ?
     25 l999. lowercase L instead of digit 1
     25 99 B.C.
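
For anyone who wants to reproduce or extend the counts above, here is a minimal sketch of the digit-to-9 pattern count. It assumes the dump holds one JSON object per line with death_date stored as a string; I have not verified that layout here. The function names (digit_pattern, count_patterns, count_patterns_in_dump) are made up for illustration.

```python
import gzip
import json
import re
from collections import Counter


def digit_pattern(value):
    """Replace every ASCII digit with 9, e.g. 'ca. 1850.' -> 'ca. 9999.'."""
    return re.sub(r"[0-9]", "9", value)


def count_patterns(lines, field="death_date"):
    """Count digit patterns of `field` over an iterable of JSON lines.

    Assumption: each non-empty line is one JSON object (one author record).
    Records without the field, or with a non-string value, are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        value = record.get(field)
        if isinstance(value, str):
            counts[digit_pattern(value)] += 1
    return counts


def count_patterns_in_dump(path, field="death_date"):
    """Open a gzip-compressed dump and count patterns for `field`."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        return count_patterns(dump, field)
```

Calling count_patterns_in_dump("authors.json.gz").most_common(50) would then give (pattern, occurrences) pairs sorted by frequency, which can be printed in the same layout as the table. Passing field="birth_date" would run the same check on another field, assuming that field exists in the records.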