DateTime doesn't parse all ISO8601-valid strings

Bug #143784 reported by Tim Hicks on 2006-09-12
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
DateTime
Medium
Dorneles Tremea

Bug Description

The DateTime.DateTime constructor successfully parses ISO8601 strings like '2006-09-12T20:33:11Z' but does not recognise them like '20060912T20:33:11Z'. According to http://www.w3.org/TR/NOTE-datetime, the latter is not valid, but http://www.cl.cam.ac.uk/~mgk25/iso-time.html suggests otherwise. I'm not sure which is correct, but I have a problem in that Ecto (the remote blogging tool) is sending strings like the latter and I need to parse them.

I'm uploading a diff that enables the behaviour that I'm after.

Tim Hicks (tim-sitefusion) wrote :
Tim Hicks (tim-sitefusion) wrote :

Uploaded: testdatetime.diff

And here is a patch that tests the new behaviour.

The problem is that the first patch uploaded here causes another test to fail. I cannot see how or why, and probing with DateTime manually from the python prompt, I'm unable to reproduce the test failure.

If anyone with a better grasp of this has any ideas, I'd love to hear them.

Andreas Jung (ajung) wrote :

ISO8601 allows both YYYY-MM-DD and YYYYMMDD. DateTime should support *both*. A related patch/unittest must support both.

Tim Hicks (tim-sitefusion) wrote :

The patch is supposed to support both formats, and indeed, it does seem to. The test also tests both formats.

The problem is that another, apparenly unrelated test, now fails (with details below). I'm really not sure why this separate test fails now. If anyone can suggest anything to try after looking at the failure below, I'd love to hear it.

"""
tim@teresa 2.8.8 $ bin/zopectl test --libdir lib/python/DateTime
Running tests via: /usr/bin/python /home/tim/zope/freethink/2.8.8/bin/test.py -v --config-file /home/tim/zope/freethink/2.8.8/etc/zope.conf --libdir lib/python/DateTime
Running unit tests at level 1
Running unit tests from /home/tim/zope/freethink/2.8.8/lib/python/DateTime
Parsing /home/tim/zope/freethink/2.8.8/etc/zope.conf
..............F.................
======================================================================
FAIL: testInternationalDateformat (tests.testDateTime.DateTimeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tim/zope/freethink/2.8.8/lib/python/DateTime/tests/testDateTime.py", line 343, in testInternationalDateformat
    self.assertEqual(d_us, d_int)
  File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
    raise self.failureException, \
AssertionError: DateTime('1990/01/02') != DateTime('1990/01/01')

----------------------------------------------------------------------
Ran 32 tests in 30.515s

FAILED (failures=1)
"""

Dorneles Tremea (dtremea) wrote :

Status: Pending => Accepted

 Supporters added: dtremea

Uploaded: DateTime-ISO8601.patch

Hey folks,

> The patch is supposed to support both formats, and indeed, it
> does seem to. The test also tests both formats.
>
> The problem is that another, apparenly unrelated test, now fails
> (with details below). I'm really not sure why this separate test
> fails now. If anyone can suggest anything to try after looking
> at the failure below, I'd love to hear it.

I read the ISO8601 3rd edition standard specification[1], from 2004-12-01, and added support for these features:

 - reduce date format
 - week format
 - day of year format
 - hours/minutes fractions
 - optional date/time separators
 - optional seconds/minutes
 - timezones with only one digit for hour

Attached is the patch with tests for each feature. Additionally,
it fixes the failing test introduced by the initial work on this
issue.

Please review it and give me a '+1', so I can commit my first
Zope code contribution!

Any feedback is welcome.

[1] http://www.omg.org/docs/ISO-stds/06-08-01.pdf

PS: Don't be afraid of that 45 lines regex, it's just for the
sake of readability and understanding... :-)

Tim Hicks (tim-sitefusion) wrote :

I don't have time to test this new patch, but it sounds like a *considerable* improvement to mine. If others are happy, so am I!

Thanks Dorneles :).

Dorneles Tremea (dtremea) wrote :

Looks like this fix broke 3 CMF tests:

http://mail.zope.org/pipermail/cmf-tests/2006-November/003324.html

After further investigation, I would say that "YYYY-MM-DD" dates
are in the ISO8601 format, and thus GMT based.

Dates in format "YYYY-MM-DD HH:MM:SS" are also in ISO format, but
Zope wasn't recognizing they as such. Now Zope does.

I argue that CMF was misusing DateTime, and that the new behavior
is the correct, otherwise we'll never get a consistent behavior
for the ISO dates.

I argue that the specific getBeginAndEndTimes method from
CalendarTool.py should also use the timezone, analog to what the
getEventsForThisDay method does.

Thoughts?

Stefan H. Holek (stefanholek) wrote :

Status: Resolved => Accepted

FWIW, I am all for backward compatiblity in this case.

"Naive" ISO (w/o timezone) used to mean local time. I'd be very reluctant to change this behavior, even though I agree it is "wrong". There are currently tests breaking in CMF and ATContentTypes because of this, and I am worried that there is a lot of code out there relying on the local time interpretation.

I could however imagine a zope.conf flag, say, 'datetime-strict-iso-parsing' that would be *off* by default, at least on the maintenance branches (2.9, 2.10).

Tres Seaver (tseaver) wrote :

This change has to be reverted on the release branches, as it
breaks semantics of peoples *persistent data* with no possibility
of correct BBB.

On the trunk, such a change should come with an *option" to enable
the new semantics; that option should be *disabled* through
at least the next release, at which point it might safely be
enabled.

I will revert on the 2.9 and 2.10 branches.

Tres Seaver (tseaver) wrote :

Note as well that docstrings are not allowed for testcase methods:
the testrunner displays them, if present, rather than the name
of the method, which is not acceptable.

Tres Seaver (tseaver) wrote :

There is also a bunch of "gratuitous" janitorial change (e.g., removing periods at the end of doctstring sentences) which is
both debatable on the merits (An imperative sentence *ought* to
end with a period) and inappropriate for code long in maintenance
mode (it makes finding the "real" changes, or applying other
patches, needlessly difficult.

Dorneles Tremea (dtremea) wrote :

Hey Tres,

sorry for not have fixed this on the last weekend as I promised, but my goal is to do it later today. In fact, I have 80% from it done, which means we add the ISO8601 support and *keep* the backward compatibility.

Some remarks: about the docstrings on tests, yep, I'm aware of it,
that's why I changed them to comments (and not the other way round).

About the cosmetical code changes, sorry for that, but I just standardized with the remaining code from the same file... :-)

Tres Seaver (tseaver) wrote :

> = Comment - Entry #13 by dtremea on Dec 13, 2006 12:19 pm

> sorry for not have fixed this on the last weekend as I promised, but my
> goal is to do it later today. In fact, I have 80% from it done, which
> means we add the ISO8601 support and *keep* the backward compatibility.

I'm sorry if my reversions today on the 2.9 and 2.10 branchs
make your merge harder. :(

> Some remarks: about the docstrings on tests, yep, I'm aware of it,
> that's why I changed them to comments (and not the other way round).

My bad -- I think I was looking at a reversed diff.

> About the cosmetical code changes, sorry for that, but I just
> standardized with the remaining code from the same file... :-)

I will assume that was my bad, too.

In general, I would avoid committing "cleanup" changes like those
at the same time as "substantive" changes, and would only commit such "cleanups" on the trunk.

Brett Carter (bacarter) wrote :

This patch appears to break backwards API compatibility. DateTime() objects now assume any date entered is in GMT unless a timezone is specified. This is contrary to zope 2.9.5 behavior where the timezone is assumed to be the system default. The new code is also inconsistent:

print "\n>>>>>> ", DateTime().timezone()
print "\n>>>>>> ", DateTime('2006-02-27').timezone()
print "\n>>>>>> ", DateTime('2006-02-27 12:30:00').timezone()
print "\n>>>>>> ", DateTime().greaterThan(DateTime('2007-02-27 01:00:00'))

Prints:
>>>>>> US/Pacific
>>>>>> GMT+0
>>>>>> GMT+0
>>>>>> True

Note that a DateTime() with no string returns an instance in the the machine's timezone (Pacific) however all other instances are in GMT. This really screws up the final comparison, where we end up comparing a GMT time to a Pacific time.

John Eikenberry (jae) wrote :

Looks like it is mistakenly parsing the '-' dates as iso8601.
The big iso8601Match regex matches the first example below, which it shouldn't (due to the space).

Eg.

>>> DateTime('2007-02-27 13:09:01.279').timezone()
'GMT+0'
>>> DateTime('2007/02/27 13:09:01.279').timezone()
'US/Eastern'

Tres Seaver (tseaver) wrote :

Agreed, the referenced spec[1] does not permit the space
as a separator between the date and time portions (slthough
in some forms it may be elided altogether).

[1] http://www.omg.org/docs/ISO-stds/06-08-01.pdf

John Eikenberry (jae) wrote :

The regex has a line that allows a whitespace where it shouldn't.
Around line 19 of the regex is this line.

  (?:[T ] # one T or one whitespace

This should be (or some proximate of this).

  (?:T? # one optional T

From my reading of the spec this whitespace is not allowed. Below is the section of the document that discusses this. It mentions the 'T' being optionally omitted, but nothing about allowing a whitespace.

--- snippet from page 19 of the iso8601 spec ---
The character [T] shall be used as time designator to indicate the start of the representation of the time of day
component in these expressions. The hyphen [-] and the colon [:] shall be used, in accordance with 4.4.4, as
separators within the date and time of day expressions, respectively, when required.

NOTE By mutual agreement of the partners in information interchange, the character [T] may be omitted in
applications where there is no risk of confusing a date and time of day representation with others defined in this
International Standard.

mustapha (mustapha-headnet) wrote :

Hi,

from here:
http://en.wikipedia.org/wiki/ISO_8601#Combined_representations
the whitespace is allowed to separate date and time

bacarter, eikenberry,

I found out that the method __parse_iso8601 sets the TimeZone to GMT+0 when a timezone is not given with the time. This is not allowed in ISO8601. In the case when a timezone is not given local timezone should be used.

I modified the line before the 'return' statement in __parse_iso8601 method from:

tz = 'GMT%+03d%02d' % (hour_off, min_off)

To:

tz = fields['hour_off'] and 'GMT%+03d%02d' % (hour_off, min_off) or None

That way the method _parse_args will set tz to the local timezone if no tz is given.

This modification makes the test in testDateTime.testISO8601 method fail at

self.assertEqual(ref0, isoDt)

with:
ref0 = DateTime('2002/5/2 8:00am GMT')
isoDt = DateTime('2002-05-02T08:00:00') #here will tz = local timezone

and that's normal: the assertEqual returns True only if your local timezone is GMT+0 but if you are leaving at GMT+1 like me the assertEqual will fail.

Tres Seaver (tseaver) wrote :

> from here:
> http://en.wikipedia.org/wiki/ISO_8601#Combined_representations
> the whitespace is allowed to separate date and time

First, wikipedia is not authoritative against the spec, which I
cited below.

Second, it happens to be correct here, where it says:

  The date and time representations may sometimes appear in
  proximity, separated by a space or other characters, in which
  case they occupy two separate fields in a data system, rather
  than a single combined representation.

In fact, the only datetime format it shows it describes as
'<date>T<time>'.

> bacarter, eikenberry,
>
> I found out that the method __parse_iso8601 sets the TimeZone to GMT+0
> when a timezone is not given with the time. This is not allowed in
> ISO8601. In the case when a timezone is not given local timezone should
> be used.

Again, the wikipedia page you cite doesn't support that; it says:

  If no time zone information is given with a time, the time zone is
  assumed to be in some conventional local timezone. While it may be
  safe to assume a local zone when used between two people in the same
  area, it is ambiguous when used in communication between multiple
  timezones. It is usually preferable to indicate a time zone using the
  standard’s notation.

The "some conventional local timezone" makes the choice of zone an
implmentation detail: in Zope's case, we chose UTC.

Brett Carter (bacarter) wrote :

> Again, the wikipedia page you cite doesn't support that; it says:
>
> If no time zone information is given with a time, the time zone is
> assumed to be in some conventional local timezone. While it may be
> safe to assume a local zone when used between two people in the same
> area, it is ambiguous when used in communication between multiple
> timezones. It is usually preferable to indicate a time zone using the
> standard’s notation.

Agreed Wikipedia isn't the standard, but it's a simple description of an (ahem) detailed specification.

> The "some conventional local timezone" makes the choice of zone an
> implmentation detail: in Zope's case, we chose UTC.
>

I disagree. It's fairly clear by the word 'local' that the timezone should be based on the localtime in my mind.
In fact, the spec you cite clearly defines 'local time' (2.1.16) as:
"locally applicable time of day such as standard time of day, or a *non-UTC* based time of day"

More importantly, as I pointed out before, this patch breaks backwards compatibility with previous versions of zope. In my mind, and in the pythonic way, that trumps specification correctness any day.

As further evidence, ZMySQLDA does a simple DateTime() cast on mysql date, timestamp, and datetime columns:
From ZMySQLDA/db.py, line 144:

def DateTime_or_None(s):
    try: return DateTime(s)
    except: return None

Thus prior to zope 2.9.6, a Date column in mysql returned via ZMySQL are in localtime. As of 2.9.6 they're now all in GMT.

John Eikenberry (jae) wrote :

> The "some conventional local timezone" makes the choice of zone an
> implmentation detail: in Zope's case, we chose UTC.

I disagree with this interpretation of the specification. I believe it states
(though in a somewhat convoluted manner) that the timezone is local when
non-specified.

Here are quotes from the appropriate sections of the specification that I think are relevant to this

Section 4.2.2.1 (General) states:

"In the representations of local time as defined below no provisions have been
made to prevent ambiguities in expressions that result from discontinuities in
the time scale of local time (e.g. daylight saving time)."

Section 4.2.2.2 (Complete representation):

"When the application identifies the need for an expression of local time then
the complete representation shall be a single numeric expression comprising six
digits in the basic format, where [hh] represents hours, [mm] minutes and [ss]
seconds."

Section 4.2.4 (UTC of day):

"To express UTC of day the representations specified in 4.2.2.2 through 4.2.2.4
shall be used, followed immediately, without space, by the UTC designator [Z]."

Section 4.3.2 (Complete representations)

"The zone designator is empty if use is made of local time in accordance with
4.2.2.2 through 4.2.2.4, it is the UTC designator [Z] if use is made of UTC of
day in accordance with 4.2.4 and it is the difference-component if use is made
of local time and the difference from UTC in accordance with 4.2.5.2."

This clearly states that if the time is provided and either no timezone is
given or no Z (UTC designator) is present, then the local timezone is used.

While these areas of the specification are discussing how the time is
represented and interpreted, I think it idicates the use of timezone when
dealing with dates as well as they are implicit in the date. Though this could
possibly be viewed as an implementation detail, I think it is a stretch to
assume that the implicit time also implies an implicit UTC designator or
specified zone when this does counter to the assumed timezone in other cases
where it is non-specified.

John Eikenberry (jae) wrote :

Note that one problem with the current implementation given my previous note, that it should default to the local timezone when not specified, is that the implementation currently ignores the 'Z' UTC designator as it depends on the current defaulting to UTC for this. So part of any fix for this would need to deal with parsing this 'Z' UTC indicator and setting the timezone as appropriate.

John Eikenberry (jae) wrote :

Uploaded: iso8601fixes.patch

Here is a patch containing my fixes for:

'T' time of day designator (comment #18)
unspecified timezone incorrectly defaulting to UTC (comment #22)
'Z' UTC designator handling with change in default (comment #23)

Dorneles Tremea (dtremea) wrote :

Hey folks,

in the past months, CMF people also raised some more broken behavior:

http://<email address hidden>/msg22090.html

I would ask for anyone interested in help to solve this issue to
contribute with tests (or pseudo-tests, like some already present
in this thread) so we know what should be fixed.

I'm asking this, because ATM there's *no failing* tests in Zope.
All we discovered was poited out by different projects...

/me willing to have this issue fixed before Andreas release the
next versions on Sunday afternoon...

affects: zope2 → datetime
Thimo Kraemer (thimo-kraemer) wrote :

This timezone issue should be fixed. Really.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers