parse() handling of strings containing only integers is inconsistent

Bug #1399821 reported by Chris
This bug affects 1 person
Affects: dateutil
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

We have a set of strings from our data and we're trying to test whether each one is a date or something else. The code looks something like this:

from dateutil import parser

def to_datetime(my_random_string):
    try:
        return parser.parse(my_random_string)
    except (ValueError, OverflowError):
        # not a date, do something else
        return None

Unfortunately, a string such as "113" is parsed inconsistently. In this case I had the string "113" and got back

datetime(113, 12, 5, 0, 0)

(12/5 is today's date)

I can see how this would sort of be a feature, but digit-only strings of different lengths do different things.

One and two digits affect the day only:

parser.parse("1") - returns datetime(2014, 12, 1, 0, 0)
parser.parse("10") - returns datetime(2014, 12, 10, 0, 0)

Three digits affect the year, as above. So do four digits:

parser.parse("113") - returns datetime(113, 12, 5, 0, 0)
parser.parse("1130") - returns datetime(1130, 12, 5, 0, 0)

5 digits
parser.parse("11301") - ValueError: year out of range (wat?)

6 digits
parser.parse("113011") - ValueError: month must be 1..12

7 digits
parser.parse("1130111") - ValueError: year is out of range

8 digits
parser.parse("11301111") - returns datetime(1130, 11, 11, 0, 0)

In my opinion 8 digits makes sense, but everything else is extremely ambiguous and should probably raise an exception.
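
Until the parser itself is stricter, a caller could apply that rule before handing anything to dateutil. The following is only a minimal sketch and not dateutil code: parse_eight_digits is a hypothetical helper name, and it assumes the eight digits are laid out as YYYYMMDD.

```python
import re
from datetime import datetime


def parse_eight_digits(text):
    """Hypothetical guard: accept a digit-only string only if it is YYYYMMDD."""
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError("not a digit-only string: %r" % text)
    if len(text) != 8:
        # 1 to 7 digits (or more than 8) is treated as ambiguous.
        raise ValueError("ambiguous digit-only string: %r" % text)
    # strptime raises ValueError itself for an invalid month or day.
    return datetime.strptime(text, "%Y%m%d")
```

Strings that are not digit-only would still go to parser.parse() as before; only the all-digit case is intercepted.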

However, there's another gotcha even with an 8-digit date. Calling strftime() on the resulting datetime raises an exception saying the year must be 1900 or later.
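
As a small illustration of one way around that limit, assuming an ISO-style string is all that is needed: datetime.isoformat() builds its output from the fields directly rather than going through strftime(), so it accepts the early year.

```python
from datetime import datetime

early = datetime(1130, 11, 11)
# isoformat() does not call strftime(), so the pre-1900 year is accepted.
stamp = early.isoformat()
print(stamp)  # 1130-11-11T00:00:00
```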

Alternatively, I propose that such strings be parsed consistently by looking for a year first, followed by a month, followed by a day. Rather than falling back to today's date for missing values, fall back to the first month or day that would be valid. Any year before 1900 should be rejected.

Examples:

parser.parse("0") - datetime(2000, 1, 1, 0, 0)

parser.parse("10") - datetime(2010, 1, 1, 0, 0)

parser.parse("100") - ValueError - year out of range

parser.parse("1000") - ValueError - year out of range

parser.parse("1900") - datetime(1900, 1, 1, 0, 0)

parser.parse("10000") - ValueError - year out of range

# next 1 or 2 digits considered month
parser.parse("19001") - datetime(1900, 1, 1, 0, 0)

parser.parse("190012") - datetime(1900, 12, 1, 0, 0)

parser.parse("190013") - ValueError - month out of range

# Next 1 or 2 digits considered date

parser.parse("1900121") - datetime(1900, 12, 1, 0, 0)

parser.parse("19001220") - datetime(1900, 12, 20, 0, 0)

parser.parse("19001232") - ValueError - day out of range
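
The rules behind these examples can be sketched in a few lines. This is a rough standalone illustration, not a patch against dateutil: parse_digits and MIN_YEAR are hypothetical names, and mapping one- and two-digit input to 2000 + n is an assumption extrapolated from the "0" and "10" examples.

```python
import re
from datetime import datetime

MIN_YEAR = 1900  # years before this are rejected, per the proposal


def parse_digits(text):
    """Hypothetical sketch of the proposed year, month, day digit parsing."""
    if not re.fullmatch(r"[0-9]{1,8}", text):
        raise ValueError("expected 1 to 8 ASCII digits: %r" % text)

    if len(text) <= 2:
        # Assumption: one- and two-digit values are years in the 2000s.
        year, rest = 2000 + int(text), ""
    else:
        # Up to four leading digits form the year (three digits stand alone).
        year, rest = int(text[:4]), text[4:]

    if year < MIN_YEAR:
        raise ValueError("year out of range")

    # Next one or two digits are the month, then one or two for the day.
    # Missing parts fall back to the first valid value, not today's date.
    month = int(rest[:2]) if rest else 1
    day = int(rest[2:]) if len(rest) > 2 else 1

    if not 1 <= month <= 12:
        raise ValueError("month out of range")
    try:
        return datetime(year, month, day)
    except ValueError:
        raise ValueError("day out of range")
```

With this sketch, parse_digits("1900121") gives datetime(1900, 12, 1, 0, 0) and parse_digits("190013") raises ValueError, matching the examples above.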

I could work up a patch if this seems sane.

jarondl (jarondl) wrote:

A more 'strict' parsing behavior would be very welcome if you can help with a patch. Please note that development activity has officially moved to https://github.com/dateutil/dateutil
