Use compiled timezone database

Bug #71227 reported by James Henstridge
Affects: pytz
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

As a start, I've put together a branch that parses the binary time zone data files at runtime instead of generating Python source files. This provides equivalent functionality, and by replacing the pytz/zoneinfo directory with a symlink to /usr/share/zoneinfo, it will use the system database (we still need to ship the compiled database to remain cross platform).

Stuart Bishop (stub) wrote : Re: [Bug 71227] Use system timezone database

James Henstridge wrote:
> Public bug reported:
>
> So that there is less stuff to update, it would be nice if pytz could
> use the system time zone database in /usr/share/zoneinfo. This would
> reduce the number of packages that need to be updated when new time zone
> data becomes available.
>
> As a start, I've put together a branch that parses the binary time zone
> data files at runtime instead of generating Python source files. This
> provides equivalent functionality, and by replacing the pytz/zoneinfo
> directory with a symlink to /usr/share/zoneinfo, it will use the system
> database.

Gustavo maintains a similar package that works this way.

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/

James Henstridge (jamesh) wrote : Re: Use system timezone database

Do you have any particular reason for preferring code generation over this technique? The branch I provided should behave almost identically to the generated code. It:

1. builds the same tzinfo instances at runtime, and makes sure you get the same instance each time you request a named time zone.

2. is self-contained, not relying on the system time zone database.

3. works as a Python egg, using pkg_resources to look up the compiled time zone files.

Furthermore, it has the following benefits:

1. it can be updated by slotting in new compiled time zone files, rather than needing to regenerate Python source files.

2. it would make it possible to support a "localtime" timezone (the $TZ environment variable, falling back to /etc/localtime), which could be useful for applications that want to display times in the user's local time zone without requiring additional configuration.
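
The lookup order in the last point could be sketched roughly as follows. This is only an illustration, not code from the branch: the function name `local_zone_source` and its return convention are hypothetical, and it assumes $TZ holds a zone name such as "Australia/Perth" (POSIX-style rule strings like "EST5EDT" would need extra handling).

```python
import os


def local_zone_source(environ=None, default_path="/etc/localtime"):
    """Decide where the "localtime" zone definition should come from.

    Hypothetical helper: returns ("name", zone_name) when $TZ is set,
    otherwise ("file", default_path).
    """
    env = os.environ if environ is None else environ
    tz = env.get("TZ")
    if tz:
        # POSIX allows a leading colon before a zone name; drop it.
        if tz.startswith(":"):
            tz = tz[1:]
        return ("name", tz)
    # No usable $TZ: fall back to the system's compiled local zone file.
    return ("file", default_path)
```

A caller would then either look the name up among the compiled zone files or parse the fallback file directly.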

Stuart Bishop (stub) wrote :

I've no particular objection to switching to reading compiled zoneinfo files at runtime rather than converting the files into Python classes. I've never seen the existing implementation as a problem so never bothered to work on alternatives.

We want to ensure that:
 - Startup performance is the same or better
 - Memory footprint is the same or better
 - tarball and egg file sizes are the same or smaller
 - Old pickles can still be unpickled
 - It works on all architectures.

description: updated
James Henstridge (jamesh) wrote :

- Startup performance is the same or better

I haven't checked this one yet. Will have to do some profiling.

- Memory footprint is the same or better

If anything, the memory footprint should be slightly lower, since we don't load up code objects and modules for the time zones.

- tarball and egg file sizes are the same or smaller

Should be slightly smaller. With generated code, "Australia/Perth" takes up 935 bytes for the .py file and 1461 for the .pyc. Using the binary time zone data, there is a single 356 byte file.

So that is a saving of 579 bytes in the dist tarball case (where we only have the .py file), and 2040 bytes in the egg case (where we have both the .py and .pyc files). The actual space saving after compression would be smaller, but it adds up when you multiply it by the number of time zones.
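
For reference, the arithmetic behind those two figures, using only the file sizes quoted above (the variable names are just for illustration):

```python
# Sizes in bytes for "Australia/Perth", as quoted above.
py_size = 935       # generated .py file
pyc_size = 1461     # compiled .pyc file
tzfile_size = 356   # binary time zone data file

# Dist tarball ships only the .py file.
tarball_saving = py_size - tzfile_size

# Egg ships both the .py and the .pyc file.
egg_saving = py_size + pyc_size - tzfile_size

print(tarball_saving, egg_saving)  # prints: 579 2040
```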

- Old pickles can still be unpickled

In my branch, I haven't done the time zone name munging in pytz.timezone() so some old pickles may fail. This could be fixed by doing the reverse of the old munging process in pytz.timezone() or pytz._p().

- It works on all architectures.

The tzinfo reading code uses the struct module with big endian standard size/alignment. This should give identical results on all architectures.
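
As a rough illustration of why this is architecture independent, here is a minimal sketch of reading the fixed 44-byte header of a compiled time zone file with a ">" (big-endian, standard size, no alignment) struct format. It is not the code from the branch: the function name `parse_tzfile_header` is hypothetical, and the field names follow the tzfile(5) man page.

```python
import struct

# ">" selects big-endian byte order with standard sizes and no padding,
# so the layout is the same on every platform:
#   4s  magic "TZif"
#   c   format version byte
#   15x reserved bytes (skipped)
#   6l  six 4-byte counts
TZ_HEADER = struct.Struct(">4sc15x6l")


def parse_tzfile_header(data):
    """Return the header counts of a compiled time zone file as a dict."""
    (magic, version, ttisgmtcnt, ttisstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = TZ_HEADER.unpack_from(data)
    if magic != b"TZif":
        raise ValueError("not a compiled time zone file")
    return {
        "version": version,
        "ttisgmtcnt": ttisgmtcnt,  # UT/local indicators
        "ttisstdcnt": ttisstdcnt,  # standard/wall indicators
        "leapcnt": leapcnt,        # leap second records
        "timecnt": timecnt,        # transition times
        "typecnt": typecnt,        # local time types
        "charcnt": charcnt,        # bytes of abbreviation strings
    }
```

The counts tell the reader how many transition times, local time types, and abbreviation bytes follow the header, so the rest of the file can be unpacked with the same byte-order prefix.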

James Henstridge (jamesh) wrote :

I did some quick performance checks using timeit with the following arguments:
    timeit.py -s 'import pytz' 'tzinfo = pytz.timezone("Australia/Perth")'

Before:
  100000 loops, best of 3: 8.85 usec per loop
After:
  100000 loops, best of 3: 2.22 usec per loop

In both cases, much less work is done on loops after the first. I think my branch is quicker here because it caches previously used tzinfo objects in a dictionary rather than importing the corresponding Python module each time. The same caching could be added to the generated-code approach, without runtime tzfile loading, to get a similar speedup.
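
The dictionary caching amounts to something like the sketch below. The names `_tzinfo_cache` and `cached_timezone`, and the `loader` callback, are hypothetical stand-ins for illustration rather than the actual code in the branch.

```python
# Maps zone name -> tzinfo object built on first request.
_tzinfo_cache = {}


def cached_timezone(name, loader):
    """Return the tzinfo for `name`, building it at most once.

    `loader` is a hypothetical callable that builds a tzinfo object
    from a zone name (by parsing a tzfile or importing a module).
    """
    try:
        return _tzinfo_cache[name]
    except KeyError:
        tz = _tzinfo_cache[name] = loader(name)
        return tz
```

Besides the speedup, this also gives the property that repeated requests for a named zone return the same instance.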

In a single pass that loads each time zone in pytz.all_timezones, the times seem fairly similar when the Python source or binary tz data are in cache. This is difficult to profile, but the new code doesn't seem noticeably faster or slower than the old code.

James Henstridge (jamesh) wrote :

According to Stuart, he included this change in the 2007c release.

Changed in pytz:
status: Unconfirmed → Fix Released