Replacing every digit in death_date with 9 and counting the resulting patterns gives the following most common values (from the authors.json.gz dump of 29 July 2009, which has 6.47 million author records):
occurrences value pattern comment
164289 9999. trailing period should be removed
125446 9999 nice year
4598 * just an asterisk, should be blank?
4153 . just a period, should be blank
3499 , just a comma, should be blank
3311 9999, trailing comma should be removed
2738 [from old catalog] imported from LoC? garbage?
2485 9999?
2395 ) just a closing parenthesis, should be blank
2150 9999. [from old catalog] LoC?
1717 .· just a period and a mid-dot, should be blank
1453 ca. 9999
1400 999
765 .* just a period and an asterisk, should be blank
762 ca. 9999.
701 9999 or 9.
424 9999] trailing bracket should be removed
406 999. trailing period should be removed
314 9999) trailing parenthesis should be removed
263 ca. 999
233 ed. huh?
225 9999 or 99. trailing period should be removed
207 9999?. trailing period should be removed
191 ). just parenthesis and period, should be blank
173 999?
172 9999.· trailing period and mid-dot should be removed
162 ] just a closing bracket, should be blank
129 9999.* trailing period and asterisk should be removed
121 9999 or 9
114 c huh?
98 99
97 9999, [from old catalog] LoC?
86 comp. huh?
86 9999 or 99
84 ca. 999. trailing period should be removed
83 · just a mid-dot, should be blank
62 999 B.C. keep trailing period after B.C.
61 999 or 9. trailing period should be removed
56 99. trailing period should be removed
50 99th cent. keep trailing period after cent.
42 99.99.9999 is this day.month.year or month.day.year?
38 .... four periods?
35 l999 lowercase L instead of digit 1
33 9999. [from old catalog] period and two spaces before the bracket, LoC import?
32 ca.9999 should have space after ca.
31 9999). trailing parenthesis and period should be removed
30 9
28 ca. 999 B.C.
27 9999 . trailing space and period should be removed
26 99/99/9999 is this day/month/year or month/day/year?
26 ?
25 l999. lowercase L instead of digit 1
25 99 B.C.
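
The counting behind the table above can be sketched roughly as follows. This is only an illustration, not the script that produced the figures: the helper names (mask_digits, count_patterns, death_dates, tidy) are made up here, and the loader assumes the dump holds one JSON object per line with death_date stored as a string, which may not match the actual file layout. The tidy function is a guess at the cleanup the comments column suggests (drop trailing punctuation, but keep the period after "B.C." and "cent.").

```python
import gzip
import json
import re
from collections import Counter

DIGIT = re.compile(r"[0-9]")

# Endings whose final period is part of the value (per the table comments).
KEEP_PERIOD = ("B.C.", "cent.")


def mask_digits(value):
    """Replace every ASCII digit with 9, e.g. '1873.' becomes '9999.'."""
    return DIGIT.sub("9", value)


def count_patterns(values):
    """Count how often each digit-masked pattern occurs."""
    return Counter(mask_digits(v) for v in values)


def death_dates(path):
    """Yield death_date strings from a gzipped dump.

    Assumption: one JSON object per line. Adjust if the dump differs.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            value = json.loads(line).get("death_date")
            if isinstance(value, str):
                yield value


def tidy(value):
    """Hypothetical cleanup: strip trailing . , ) ] * and mid-dot.

    Values made only of such punctuation become the empty string.
    """
    value = value.strip()
    while value and value[-1] in ".,)]*·" and not value.endswith(KEEP_PERIOD):
        value = value[:-1].rstrip()
    return value


# Small in-memory sample instead of the real dump.
sample = ["1873.", "1920", "ca. 1650", "1901.", "l899", "450 B.C."]
for pattern, n in count_patterns(sample).most_common():
    print(n, pattern)
```

To run it against a real dump, pass the generator straight in, for example count_patterns(death_dates("authors.json.gz")). Note that tidy deliberately leaves question marks alone, since patterns such as 9999? appear to carry meaning, and it would also shorten odd values like "ed." that probably need manual review instead.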