Please recompile sqlite 3.11 with -DSQLITE_ENABLE_FTS3_TOKENIZER

Bug #1546911 reported by James Henstridge
This bug affects 3 people
Affects                 Status        Importance  Assigned to     Milestone
mediascanner2 (Ubuntu)  Invalid       Critical    Unassigned
sqlite3 (Ubuntu)        Fix Released  Undecided   Łukasz Zemczak

Bug Description

The recent upload of sqlite 3.11 to xenial-proposed has rendered mediascanner2 non-functional. From the release notes, it seems the ability to register new full text search tokenizers has been disabled by default:

http://sqlite.org/releaselog/3_11_0.html

This means that mediascanner2 fails to open the index. We can't switch to any of the built-in tokenizers because they don't handle CJK text, so the only option seems to be to re-enable this functionality despite it being a potential security vulnerability for apps that let untrusted code run arbitrary SQL.

Changed in mediascanner2 (Ubuntu):
importance: Undecided → Critical
status: New → Confirmed
James Henstridge (jamesh) wrote :

And as a simple test case for the problem, run the following:

    $ sqlite3 :memory:
    SQLite version ...
    Enter ".help" for usage hints.
    sqlite> select fts3_tokenizer('foo', fts3_tokenizer('porter'));

On older versions this would register the tokenizer "foo". With 3.11.0, it spits out the following error:

    Error: fts3tokenize: disabled - rebuild with -DSQLITE_ENABLE_FTS3_TOKENIZER
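
The same check can be scripted. The following is a minimal Python sketch, not part of the original report, that runs the statement above through the standard `sqlite3` module and reports whether the two-argument call succeeded. The helper name `probe_two_arg_tokenizer` is hypothetical, and the exact error text depends on how the linked SQLite library was built, so the code only distinguishes success from failure.

```python
import sqlite3


def probe_two_arg_tokenizer(conn):
    """Return (ok, detail) for the two-argument fts3_tokenizer() call.

    Hypothetical helper: it runs the same statement as the shell test case
    and treats any SQLite error as "not available".
    """
    sql = "SELECT fts3_tokenizer('foo', fts3_tokenizer('porter'))"
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        # Covers builds with the feature disabled as well as builds
        # that do not include FTS3 at all.
        return False, str(exc)
    return True, "tokenizer alias registered"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ok, detail = probe_two_arg_tokenizer(conn)
    print("two-argument fts3_tokenizer():", "ok" if ok else "failed", "-", detail)
    conn.close()
```

Note that Python may link against a different SQLite library than the `sqlite3` command-line shell, so the two results can differ on the same machine.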

Changed in sqlite3 (Ubuntu):
assignee: nobody → Łukasz Zemczak (sil2100)
status: New → Confirmed
Tyler Hicks (tyhicks) wrote :

Hi James - Is it possible to use the 1-argument variant of fts3_tokenizer()? See the second example in https://sqlite.org/fts3.html#f3tknzr

James Henstridge (jamesh) wrote :

Tyler: no, it isn't. The one-argument version only lets you query for the existence of a particular named tokenizer. The two-argument version is needed to register a new named tokenizer. When they disabled this they didn't offer an alternative for fts3/fts4 users, so the documentation just says to turn the feature back on if you need it, which is a bit unsatisfying.

It looks like there is a new API to register tokenizers using the new fts5 API, but that still seems to be under development so the entire backend is disabled in the current release:

    $ sqlite3 :memory:
    SQLite version 3.11.0 2016-02-15 17:29:24
    Enter ".help" for usage hints.
    sqlite> create virtual table f1 using fts5(a, content='');
    Error: no such module: fts5

It isn't clear that this code is at a point where databases would be compatible from release to release, so it is probably not appropriate to even consider yet.
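
A scripted version of the module check above, as a rough Python sketch. The helper name `has_virtual_table_module` and the probe table name are hypothetical. It assumes the module accepts a single column argument, which holds for the fts3, fts4 and fts5 modules but not necessarily for every virtual table module.

```python
import sqlite3


def has_virtual_table_module(conn, module):
    """Return True if CREATE VIRTUAL TABLE ... USING <module> succeeds.

    Hypothetical helper; the probe table lives in the temp schema and is
    dropped again, so the main database is left untouched.
    """
    if not module.isidentifier():
        raise ValueError("module name must be a plain identifier")
    try:
        conn.execute(f"CREATE VIRTUAL TABLE temp.module_probe USING {module}(a)")
    except sqlite3.Error:
        # For a missing module SQLite reports "no such module: <name>".
        return False
    conn.execute("DROP TABLE temp.module_probe")
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for name in ("fts3", "fts4", "fts5"):
        print(name, "available:", has_virtual_table_module(conn, name))
    conn.close()
```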

From a few web searches, I'm guessing this is the reason it was disabled:

http://chichou.0ginr.com/blog/1336/abuse-sqlite3-ext-to-bypass-php-security-restrictions

So it is a problem when an application runs untrusted SQL under the control of the attacker (and in this case, combined with untrusted PHP code under the control of the attacker). That seems like a pretty buggy application to start with.

Łukasz Zemczak (sil2100) wrote :

I think we're all leaning a bit towards actually enabling this flag in our sqlite3 packages. In case the security team gives a +1 on it, I have already prepared the modified package for upload.

That being said, I guess mediascanner2 needs to start thinking about the future. Since the documentation describes this function as seldom used and no longer enabled by default, we can suspect that it will go away completely in a future sqlite release. We would need to be prepared for that. For xenial we should be safe, but I suspect it is not guaranteed to stay in later cycles.

Tyler Hicks (tyhicks) wrote :

I agree that applications shouldn't be running untrusted SQL/PHP. We can enable the flag in our sqlite3 package for now but, as Łukasz mentioned, I think it would be best if James could work with upstream to get a proper tokenizer in place in the future.

James Henstridge (jamesh) wrote :

Well, one other major user of this API is Thunderbird. In fact, the tokenizer we use in mediascanner is based on the one they developed. It looks like a number of other people have extracted the Mozilla tokenizer as well, since none of the built-in options gives the same multi-language compromise.

I doubt that they plan to remove this API completely, since it would render existing databases unreadable. It might eventually go away when they deprecate the fts3/fts4 modules, but that won't happen until the replacement fts5 module is ready for prime time. We'll definitely need to cross that bridge at some point.

Łukasz Zemczak (sil2100) wrote :

I uploaded the modified sqlite3 to xenial-proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sqlite3 - 3.11.0-1ubuntu1

---------------
sqlite3 (3.11.0-1ubuntu1) xenial; urgency=medium

  * debian/rules: compile SQLite with SQLITE_ENABLE_FTS3_TOKENIZER to re-enable
    the two-argument version of fts3_tokenizer() used by mediascanner2
    (LP: #1546911)

 -- Łukasz 'sil2100' Zemczak <email address hidden> Fri, 19 Feb 2016 13:12:22 +0100
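
One possible way to confirm that an installed library was built with the flag is to look at `PRAGMA compile_options`. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes the build reports this particular flag through the pragma (listed without the `SQLITE_` prefix), which has not been checked against 3.11.0 itself. An empty result only means the flag was not reported, for example when compile-option diagnostics are omitted from the build.

```python
import sqlite3


def compile_options(conn):
    """Return the set of option strings reported by PRAGMA compile_options."""
    return {row[0] for row in conn.execute("PRAGMA compile_options")}


def reports_fts3_tokenizer_flag(conn):
    """True if the linked library lists ENABLE_FTS3_TOKENIZER (assumed name)."""
    # Option names are reported without the "SQLITE_" prefix.
    return "ENABLE_FTS3_TOKENIZER" in compile_options(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("SQLite library version:", sqlite3.sqlite_version)
    print("flag reported:", reports_fts3_tokenizer_flag(conn))
    conn.close()
```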

Changed in sqlite3 (Ubuntu):
status: Confirmed → Fix Released
Changed in mediascanner2 (Ubuntu):
status: Confirmed → Invalid