2014.3 tz change in US/Central looks wrong

Bug #1319939 reported by Jeff Reback
This bug affects 7 people
Affects   Status    Importance   Assigned to   Milestone
pytz      Opinion   Undecided    Unassigned

Bug Description

Did these really change like this?

>>> pytz.__version__
'2014.3'
>>> pytz.timezone('US/Central')
<DstTzInfo 'US/Central' LMT-1 day, 18:09:00 STD>
>>> pytz.timezone('Europe/Berlin')
<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>

>>> pytz.__version__
'2014.2'
>>> pytz.timezone('US/Central')
<DstTzInfo 'US/Central' CST-1 day, 18:00:00 STD>
>>> pytz.timezone('Europe/Berlin')
<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>

Jeff Reback (jeff-reback) wrote :

Not a bug, just a change in TZ. Sorry, I was not localizing properly in some test cases!

Jeff Reback (jeff-reback) wrote :

This can be closed.

Stuart Bishop (stub) wrote :

yer, we get some earlier transition times now with 2014c. The pytz test suite had a similar bug ;)

Changed in pytz:
status: New → Invalid
Aigars Mahinovs (aigarius) wrote :

Wait, Europe/Berlin having a +0053 offset is an invalid bug? We get production crashes because of that.

Aigars Mahinovs (aigarius) wrote :

How is this not a critical bug?

~/env/bin/python -c 'import pytz; print pytz.__version__; print repr(pytz.timezone("Europe/Berlin"))'
2014.2
<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>

~/env/bin/python -c 'import pytz; print pytz.__version__; print repr(pytz.timezone("Europe/Berlin"))'
2014.3
<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>

Stuart Bishop (stub) wrote :

@aigarius - you are displaying the timezone details at an arbitrary point in history. What you are seeing is what the IANA database tells us the timezone was before Mar 31 1893 (Local Mean Time 53 minutes ahead of UTC)
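
A minimal sketch of the difference, assuming a pytz release from 2014.3 onward (where the zone object's default tzinfo is the historical LMT entry). Attaching the zone object directly keeps that default, while `localize()` selects the offset in force on the given date. The date is arbitrary and only for illustration.

```python
from datetime import datetime

import pytz

berlin = pytz.timezone("Europe/Berlin")
naive = datetime(2014, 3, 1, 13, 14)

# replace() attaches the zone's default tzinfo, regardless of the date.
attached = naive.replace(tzinfo=berlin)

# localize() looks up the tzinfo that applies to this particular date.
localized = berlin.localize(naive)

print(attached.strftime("%Y-%m-%d %H:%M %Z%z"))   # historical default offset
print(localized.strftime("%Y-%m-%d %H:%M %Z%z"))  # 2014-03-01 13:14 CET+0100
```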

Aigars Mahinovs (aigarius) wrote :

Shouldn't that have some kind of saner default value, such as the timezone for now()? There is this common use case:

dt = datetime(2014, 3, 1, 13, 14) # We get a datetime without a timezone from somewhere, like MySQL
dt = dt.replace(tzinfo=pytz.timezone("Europe/Berlin")) # Set the timezone to the one we know that time was set in

The other sane alternative would be to completely refuse to give a timezone for a country when a point in time is not specified.

Aigars Mahinovs (aigarius) wrote :

Please change the default timepoint for timezone selection to be now(). Otherwise this is still a regression.

Changed in pytz:
status: Invalid → New
Bernhard M. Wiedemann (ubuntubmw) wrote :
Revision history for this message
nicpottier (nicpottier) wrote :

This change has broken our use cases as well. Though I can understand the rationale behind saying "garbage in, garbage out" because there is no concept of "now" when asking for the timezone, it does seem like the principle of least surprise applies here, and pytz should return the timezone for now(), all other things being equal.

Bryan Helmig (bry4n) wrote :

This was definitely a surprise to us as well. The now() default does seem smarter than a default from 100+ years ago.

Stuart Bishop (stub) wrote :

This is not a regression. Pytz has always behaved this way. If the recent changes to the data broke things, it indicates incorrect usage and a lurking bug was triggered.

These lurking bugs are why I have always been nervous of changing the behaviour. It was obvious in most timezones that something is wrong when your timestamps are reporting a timezone from 100 years ago.

I can change the default. It does however mean that bugs like the following are less likely to be noticed, silently corrupting data:

>>> tz = pytz.timezone('Australia/Melbourne')
>>> dt = datetime(2014, 4, 10, 13, 59, tzinfo=tz)

In this case, I have constructed a 7 month old timestamp using today's timezone definition. This is exactly the sort of thing you would do when writing a log parser. It incorrectly reports daylight saving time:

>>> dt.strftime('%Y-%m-%d %H:%M:%S %Z%z')
'2014-04-10 13:59:00 AEDT+1100'

If we normalize, we get the correct timezone displayed but the time has been wound back one hour:

>>> tz.normalize(dt).strftime('%Y-%m-%d %H:%M:%S %Z%z')
'2014-04-10 12:59:00 AEST+1000'
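
For comparison, a sketch of how a log parser could avoid both problems by calling `localize()` on each naive timestamp, so the wall-clock time is kept and the offset is looked up per date. The sample timestamps and variable names are made up for illustration.

```python
from datetime import datetime

import pytz

tz = pytz.timezone("Australia/Melbourne")

# Hypothetical naive timestamps, as a log parser might read them.
raw = ["2014-01-10 13:59:00", "2014-04-10 13:59:00"]

# localize() keeps the wall-clock time and picks the offset valid on that date.
parsed = [tz.localize(datetime.strptime(s, "%Y-%m-%d %H:%M:%S")) for s in raw]

for dt in parsed:
    print(dt.strftime("%Y-%m-%d %H:%M:%S %Z%z"))
```

The January entry should come out with the +1100 daylight saving offset and the April entry with +1000, both still at 13:59.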

Even if you are dealing with 'now' rather than older timestamps, there is still a race condition. The timezone instance is generated before the datetime instance, so even "datetime.now(pytz.timezone('Australia/Melbourne'))" could give you a timestamp out by one hour if you were unlucky and the DST transition occurred while that statement was being run.

So, yes, using 'now' for the default timezone information is less of a surprise, but it might not be smarter as a surprise could lead you to the problem. It has already led many people to the mailing lists or bug trackers, and certainly many more to search results.

All that said, I'm still open to changing this 10 year old behaviour. The rationale of helping people notice bugs seems dubious now, and I can't stop people from shooting themselves in the foot when they think the issues don't affect their use case.

If I have time, I may investigate making datetime(..., tzinfo=foo) work as expected, which would remove the need for localize and normalize entirely. The last time I looked, I decided it was impossible with the current Python datetime implementation, but that was several years ago. If not, all this weirdness will be sorted when pytz is integrated with Python 3 (first step, find a volunteer to modify some C code).

Changed in pytz:
status: New → Opinion
Aigars Mahinovs (aigarius) wrote :

The only way to really fix that is to break the API: add a mandatory parameter to timezone(), the datetime for which the timezone should be retrieved. That would likely have to be a datetime in the UTC timezone to remove ambiguity in datetime intervals where DST shifts the time backwards. It could also be useful to make it possible to pass a timezone name into the datetime constructor instead of a timezone object, so that the datetime constructor could pass the requested time for the timezone construction.

I am not really sure what should be done if one requests times in the DST time shift hours: when time is shifted back there are two candidates for the timezone in a location for the same date, hour and minute value; when time is shifted forward there is an hour for which there is actually no valid timezone.
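
As a sketch of how pytz itself treats these two cases: `localize()` accepts an `is_dst` flag that picks one of the two candidates, and with `is_dst=None` it raises an exception instead of guessing. The `classify` helper below is a hypothetical name used only for this illustration.

```python
from datetime import datetime

import pytz

tz = pytz.timezone("Europe/Berlin")

# 2014-10-26 02:30 occurred twice in Berlin (clocks went back from 03:00 to 02:00).
ambiguous = datetime(2014, 10, 26, 2, 30)
first = tz.localize(ambiguous, is_dst=True)    # the earlier one, summer time
second = tz.localize(ambiguous, is_dst=False)  # the later one, standard time

# 2014-03-30 02:30 never occurred (clocks jumped from 02:00 to 03:00).
missing = datetime(2014, 3, 30, 2, 30)


def classify(naive):
    """Hypothetical helper: report how pytz sees a naive local time."""
    try:
        tz.localize(naive, is_dst=None)  # None means "do not guess"
    except pytz.AmbiguousTimeError:
        return "ambiguous"
    except pytz.NonExistentTimeError:
        return "nonexistent"
    return "ok"


print(first.utcoffset(), second.utcoffset())
print(classify(ambiguous), classify(missing), classify(datetime(2014, 6, 1, 12, 0)))
```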

Bryan Helmig (bry4n) wrote :

I definitely appreciate the thought process behind this - I hadn't originally considered the impossibility of modifying the datetime object based on tzinfo.

That said, I do think an explicit and documented default is much nicer than "whichever way the source data files are packed". It is about as defined as an ambiguous behavior can get due to the implementation restriction - I suppose.

jfs (jfs+lp) wrote :

@Stuart Bishop (stub) This is wrong: *"The timezone instance is
generated before the datetime instance, so even
"datetime.now(pytz.timezone('Australia/Melbourne'))" could give you a
timestamp out by one hour if you were unlucky and the DST transition
occurred while that statement was being run."*

**It does not matter how long datetime.now(tz) runs; the result contains
the correct tzinfo object for the corresponding date/time value.**

"now(tz)" is equivalent to "tz.fromutc(utcnow())". tz.fromutc sets the
correct tzinfo object using _transition_info.

Imagine the following implementation:

    def now(cls, tz):
        # tz may be created a week ago: we don't care as long as the
        # tz database version is acceptable
        time.sleep(300)
        # don't care how long it takes; whatever it returns is the
        # *current time*
        utc_time = cls.utcnow()
        # don't care what is the default tzinfo; it will be chosen by
        # fromutc()
        time.sleep(300)
        time.sleep(300)
        # this sets the correct tzinfo for the utc_time
        return tz.fromutc(utc_time)

The result time may not be the current time (10min late) but the tzinfo
(now(tz).tzinfo) is correct for the corresponding time (utc_time). It
does not matter whether DST transition happens while the method runs (it
won't change utc_time, it won't change tz._utc_transition_times). The
_result won't be wrong by one hour_ i.e., **abs(now(tz) - now(tz)) won't
jump even if DST transition happens between/during the calls.**

If you meant that close now() calls may return drastically different
local times then it is true e.g., two now() calls 10 minutes apart may
return 1:55 and 3:05 local times but it is the expected result for a
"Spring forward" transition (the time difference is still 10 minutes for
aware datetime objects returned by now(tz)).
There is no race condition: both local times are correct.
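
A small sketch of the 1:55 / 3:05 case, using two fixed UTC instants in place of real now() calls so the result is repeatable. The zone and date (US/Central, 9 March 2014) are picked for illustration; `fromutc()` is called on naive datetimes holding UTC values.

```python
from datetime import datetime

import pytz

tz = pytz.timezone("US/Central")

# Two UTC instants ten minutes apart, on either side of the 2014 spring-forward.
before = tz.fromutc(datetime(2014, 3, 9, 7, 55))
after = tz.fromutc(datetime(2014, 3, 9, 8, 5))

print(before.strftime("%H:%M %Z%z"))  # 01:55 CST-0600
print(after.strftime("%H:%M %Z%z"))   # 03:05 CDT-0500

# Aware subtraction goes through the UTC offsets, so the gap stays ten minutes.
gap = after - before
print(gap)
```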

If you meant that you can't use the "now(tz).tzinfo" object for the very
next moment in time then that is also true, which is why tz.normalize
exists: "next_ = tz.normalize(now(tz) + moment)". But "now(tz).tzinfo"
itself is correct for "now(tz).replace(tzinfo=None)" date/time value.
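
A sketch of that normalize step, with a fixed localized time standing in for now(tz) so the example is repeatable. The zone and date are chosen for illustration only.

```python
from datetime import datetime, timedelta

import pytz

tz = pytz.timezone("US/Central")

# Stand-in for now(tz): five minutes before the 2014 spring-forward.
start = tz.localize(datetime(2014, 3, 9, 1, 55))

# Plain arithmetic keeps the old tzinfo: wall clock 02:05, still tagged CST.
stale = start + timedelta(minutes=10)

# normalize() re-derives the tzinfo for the new instant: 03:05 CDT.
next_ = tz.normalize(stale)

print(stale.strftime("%H:%M %Z%z"))
print(next_.strftime("%H:%M %Z%z"))
```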

The point is: you can use "datetime.now(tz)" even during DST transitions
while e.g., "tz.localize(datetime.now())" may fail.

Please, clarify your statement. People quote you:
http://stackoverflow.com/questions/2331592/datetime-datetime-utcnow-why-no-tzinfo/2331635#comment49722480_2331640

Unrelated: I'm glad that the default tzinfo is harder to confuse with
the current timezone rules: it allows bugs to be caught sooner: "Errors
should never pass silently. Unless explicitly silenced." Correct code
continues to work whatever the default.

Stan Knutson (7-stan) wrote :

I'm sorry, this bug breaks other APIs.

In particular, the datetime.datetime constructor now returns things that are clearly "wrong" in any sense of the word.

Here is a test case where the datetime.datetime gives the completely incorrect result, with tz offset wrong by 7 minutes!

I worry this breaks other things in our application.

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytz
>>> from pytz import UTC
>>> PACIFIC = pytz.timezone("US/Pacific-New")
>>> import datetime
>>> pytz.__version__
'2015.4'
>>> # next two dates are "the same" epoch timestamp.
>>> datetm_utc = datetime.datetime(2015,5,7,1,0,0,0, UTC)
>>> print datetm_utc
2015-05-07 01:00:00+00:00
>>> datetm_pacific = datetime.datetime(2015,5,6,18,0,0,0,PACIFIC)
>>> print datetm_pacific
2015-05-06 18:00:00-07:53
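
For reference, a sketch of the same comparison written with `localize()` instead of passing the zone to the constructor. It uses "US/Pacific" rather than "US/Pacific-New", on the assumption that the latter name may be missing from newer tz data releases.

```python
import datetime

import pytz

PACIFIC = pytz.timezone("US/Pacific")

datetm_utc = datetime.datetime(2015, 5, 7, 1, 0, 0, 0, pytz.UTC)

# Build the naive wall-clock time first, then let pytz pick the offset for that date.
datetm_pacific = PACIFIC.localize(datetime.datetime(2015, 5, 6, 18, 0, 0, 0))

print(datetm_utc)      # 2015-05-07 01:00:00+00:00
print(datetm_pacific)  # 2015-05-06 18:00:00-07:00
print(datetm_utc == datetm_pacific)
```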
