Incorrect substitution by cursor.execute when tuple args contains '%s'

Bug #691836 reported by blep on 2010-12-18
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Connector/Python
Status tracked in Trunk
0.3
Fix Released
Critical
Geert JM Vanderkelen
Trunk
Triaged
Critical
Geert JM Vanderkelen

Bug Description

Description:

cursor.execute substitute %s occurrence found in tuple args value, causing an incorrect query to be produced has is shown below:

c.execute( 'REPLACE INTO TestBug VALUES (%s, %s, %s)', (1, 'Format%s', 3) )
MYSQL QUERY: b"REPLACE INTO TestBug VALUES (1, 'Format3', %s)"

The MYSQL QUERY trace is the content of stmt in MySQLCursor.execute before "self.db().protocol.cmd_query(stmt)".

How to reproduce:
CREATE TABLE TestBug (
 id INT UNSIGNED NOT NULL UNIQUE,
 name VARCHAR(255) NOT NULL,
    count INT NOT NULL,
 PRIMARY KEY (id)
);

db = mysql.connector.Connect( ... )
c = db.cursor()
c.execute( 'REPLACE INTO TestBug VALUES (%s, %s, %s)', (1, 'Format%s', 3) )
MYSQL QUERY: b"REPLACE INTO TestBug VALUES (1, 'Format3', %s)"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\lib\site-packages\mysql\connector\cursor.py", line 311, in execute
    res = self.db().protocol.cmd_query(stmt)
  File "C:\Python31\lib\site-packages\mysql\connector\protocol.py", line 137, in deco
    return func(*args, **kwargs)
  File "C:\Python31\lib\site-packages\mysql\connector\protocol.py", line 495, in cmd_query
    return self.handle_cmd_result(self.conn.recv())
  File "C:\Python31\lib\site-packages\mysql\connector\connection.py", line 180, in recv_plain
    errors.raise_error(buf)
  File "C:\Python31\lib\site-packages\mysql\connector\errors.py", line 84, in raise_error
    raise get_mysql_exception(errno,errmsg)
mysql.connector.errors.ProgrammingError: 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s)' at line 1

Environment info:

* MySQL Connector/Python version (and/or revision)
python -c "import mysql.connector as db; print( db.__version__ )"
(0, 3, 0, 'devel', 287)

* Python version
python --version
Python 3.1.1

mysql> SELECT VERSION()
| 5.0.51b-community-nt |

* Platform/OS: Windows XP SP2 32 bits

Related branches

blep (blep) wrote :

Attached is a patch I used as a work-around for the issue. It ensures the following test pass:

def test_subtargs():
    assert _subst_tuple_args( b'A%s, %sB', (b'1',b'2') ) == b'A1, 2B'
    assert _subst_tuple_args( b'A%s, %sB', (b'%s1',b'2') ) == b'A%s1, 2B'
    assert _subst_tuple_args( b"A%s, 'B%C'", (b'%s1',) ) == b"A%s1, 'B%C'"
    assert _subst_tuple_args( b"AB", () ) == b"AB"
    try:
        assert _subst_tuple_args( b"AB", (1,) ) == '', 'Too many arg exception expected'
    except ValueError:
        pass
    try:
        assert _subst_tuple_args( b'A%s, %sB', (b'1',) ) == '', 'Missing arg exception expected'
    except IndexError:
        pass

tags: added: py3k

I can't reproduce the problem with the following test case (please review):

def main():
    cnx = mysql.connector.connect(database='test')
    cur = cnx.cursor()
    cur.execute('DROP TABLE IF EXISTS t1')
    cur.execute("CREATE TABLE t1 (id INT NOT NULL, c1 VARCHAR(30), c2 INT)")
    cur.execute("REPLACE INTO t1 VALUES (%s,%s,%s)", (1,'ham%s',2))
    cur.execute("SELECT * FROM t1")
    print(cur.fetchall())
    cnx.close()

The result is:

[(1, 'ham%s', 2)]

Changed in myconnpy:
assignee: nobody → Geert JM Vanderkelen (geertjmvdk)
status: New → Incomplete
tags: added: cursor
blep (blep) wrote :

I'm reproducing the issue with the reworked sample you provided with python 3.1.

The code causing the issue is:
http://bazaar.launchpad.net/~geertjmvdk/myconnpy/main/annotate/head%3A/python3/mysql/connector/cursor.py#L307

                elif isinstance(params, (list,tuple)):
                    for p in self._process_params(params):
                        stmt = stmt.replace(b'%s',p,1)

It replaces '%s' found in the statement as many times as it there is parameters, each replacement starting over from the beginning of the statement where previous parameters' value have already been substitued. This means that if one parameter value contains '%s', this occurrence of '%s' will be considered for substitution for the next parameter...

With your test case, you end up with the following statement:
"REPLACE INTO t1 VALUES (1,'ham2',%s)"

I figured why you do not reproduce the issue with python 2.6: the implementation of cursor.execute use a very different algorithm for substitution:
http://bazaar.launchpad.net/~geertjmvdk/myconnpy/main/annotate/head%3A/python2/mysql/connector/cursor.py#L307

stmt = operation % self._process_params(params)

which does not suffer from the bug I pointed out (but potentially introduce other issues as any format escape sequence such as "%r" or "%d" would also be substitued, though it is unlikely someone stumble on this).

Seems like I did something wrong with my test case before, now I can indeed reproduce the problem.
And indeed.. it's a silly bug there.

I can't use the %s substitution like in Python 2 because I'm working with bytes in Python 3.1 code.

Changed in myconnpy:
status: Incomplete → Confirmed
importance: Undecided → Critical

I think an easier fix (maybe even faster) is to first replace all %s by something else:

                elif isinstance(params, (list,tuple)):
                    stmt = stmt.replace(b'%s',b'%mys')
                    for p in self._process_params(params):
                        stmt = stmt.replace(b'%mys',p,1)

I'm using '%mys' because that would (hopefully) never or rarely clash.
What you think?

blep (blep) wrote :

From my point of view the proposed fix only makes the bug less likely to occur, but does not fix it.

The likelihood of occurrence of '%mys' in a sequence of characters is very difficult to predict when you are storing blob/text. The proposed fix for example would fail if I were to store the python source or this bug description as text as '%mys' occurs in it. When you are storing data from the web, you want something that always works.

Also, while I did not do any performance tests, using replace without start index hint, cause the string to be rescanned from the start which can be slow when you are manipulating large text/blob.

That being said there may be a simpler fix than the one I did (though it also check if all args where used in addition to fixing the bug).

A fix is now committed and I've used the approach you used, with regular expression.
Thanks for this, I learned a few new tricks about re in Python :)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers