oursql

Prepared statements used for all queries

Bug #1046706 reported by BrindleFly on 2012-09-06

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	oursql	New	Undecided	Unassigned

Bug Description

I am new to Python and oursql. In diagnosing why queries were so much slower in a Python vs. Ruby application, I noticed you are creating, using and then discarding prepared statements for EVERY query (see cursor.execute). As best as I can tell, this is to avoid dealing with the string interpolation of SQL arguments (e.g. which seems to be handled with varying degrees of success in MySQLdb module). It seems though this is a defect, since the two step process of a prepared statement adds overhead that can be substantial in some use cases. This overhead is typically only of use if you plan on actually re-using the prepared statement.

Assuming my understanding above is correct, I would suggest the current interface should always be dynamic, since you do not intend to cache statements.

I suspect most Python programmers who switched from MySQLdb do not realize the additional overhead they incur for executing a query, since this module defaults to dynamic SQL.

Revision history for this message

David Hull (hull) wrote on 2012-10-10:

The fact that oursql uses prepared statements is one of the things I like best about oursql.

I would like it if oursql would automatically cache the prepared statements, or would give us some way to cache them ourselves.

Revision history for this message

BrindleFly (joe-kinsella) wrote on 2012-10-18:

What do you like about prepared statements? Since you can't access them, they provide you no external benefit as a caller, but give you a quantifiable performance degradation. The only reason I can think to like them is it allows the oursql implementation to avoid writing code to handle string interpolation of SQL arguments. This feels like a pretty expensive way to avoid client-side string interpolation.

Prepared statements can add some substantial overhead to each individual transaction, so are really only used when you plan to retain the statement for frequent use. I'd suggest we default to a dynamic interface like most other sql libraries, but make available the option to use prepared statements.

Revision history for this message

Cody Baker (jcodybaker) wrote on 2012-12-24:

oursql.statement.cache.patch Edit (5.2 KiB, text/plain)

BrindleFly, there are some advantages to prepared statements--notably the streaming string/binary value streaming. That said, this performance issue is a major headache. I've read through the code and implemented a statement cache. This should allow us to reap the perfomance benefits of prepared queries for applications which repeated statements. The cache exists at the cursor level. If the application executes identical statements (presumably with different values) against the same cursor then this patch should eleminate the double network command penalty that oursql sees, and should see decreased CPU utilization because the server doesn't need to prepare the query again and the client doesn't need to escape values. That said, if the application recreates the cursor between queries or uses a large number of non-identical statements w/ small values then a mysql connector which uses the mysql_real_query interface (like MySQLdb) rather than mysql_stmt_* interface may be a better fit.

Essentially this patch saves the _statement objects into a fixed size LRU queue. If the query is identical, it will issue a free the results and reuse the statement. Because the statements are not immediately closed, result sets which have not been completly retrieved will remain allocated on the server side until the statement is reused or the statement falls off the end of the cache. In the case of large data sets this can be a substantial burden to the server. It is not enabled by default.

I'm still doing some testing for edge cases, but it seems to give a pretty substantial performance increase.

# Test table
CREATE TABLE `t1` (
`id` int(11) NOT NULL
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;

# oursql.wo.cache.py - Without Cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor()
for i in range(100000):
c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.wo.cache.py

real 0m10.896s
user 0m1.915s
sys 0m3.542s

# oursql.w.cache.py - With statement cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor(statement_cache_size=5)
for i in range(100000):
c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.w.cache.py

real 0m6.401s
user 0m1.438s
sys 0m1.952s

BrindleFly, there are some advantages to prepared statements--notably the streaming string/binary value streaming.  That said, this performance issue is a major headache.  I've read through the code and implemented a statement cache.  This should allow us to reap the perfomance benefits of prepared queries for applications which repeated statements.  The cache exists at the cursor level. If the application executes identical statements (presumably with different values) against the same cursor then this patch should eleminate the double network command penalty that oursql sees, and should see decreased CPU utilization because the server doesn't need to prepare the query again and the client doesn't need to escape values.  That said, if the application recreates the cursor between queries or uses a large number of non-identical statements w/ small values then a mysql connector which uses the mysql_real_query interface (like MySQLdb) rather than mysql_stmt_* interface  may be a better fit.

Essentially this patch saves the _statement objects into a fixed size LRU queue.  If the query is identical, it will issue a free the results and reuse the statement.  Because the statements are not immediately closed, result sets which have not been completly retrieved will remain allocated on the server side until the statement is reused or the statement falls off the end of the cache.  In the case of large data sets this can be a substantial burden to the server. It is not enabled by default.

I'm still doing some testing for edge cases, but it seems to give a pretty substantial performance increase.

# Test table
CREATE TABLE `t1` (
  `id` int(11) NOT NULL
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;

# oursql.wo.cache.py - Without Cache
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor()
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.wo.cache.py

real	0m10.896s
user	0m1.915s
sys	0m3.542s

# oursql.w.cache.py - With statement cache 
import oursql
db = oursql.connect(host='localhost', user='root', db='test')
c = db.cursor(statement_cache_size=5)
for i in range(100000):
    c.execute('INSERT INTO `t1` VALUES (?)',(i,))

(venv)Codys-MacBook-Air-2:oursql-0.9.3.1 cody$ time python ~/Desktop/oursql.w.cache.py

real	0m6.401s
user	0m1.438s
sys	0m1.952s

Revision history for this message

dayf (dayf) wrote on 2013-04-04:

The API should allow to prepare and return statements on connection (so that users can cache and re-use prepared statements themselves) and offer execute on the statement with parameters, as in http://en.wikipedia.org/wiki/Prepared_statement

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

oursql.statement.cache.patch Edit

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.