Not indexing documents unless all fields are in the index expression clause

Bug #1214538 reported by Victor Choueiri on 2013-08-20
This bug affects 3 people
Affects: U1DB Qt/ QML | Importance: Critical | Assigned to: Christian Dywan
Affects: u1db-qt (Ubuntu) | Importance: Undecided | Assigned to: Unassigned

Bug Description

It seems that queries only return documents in which every field is indexed, regardless of whether or not those fields are being queried.

I've included a number of examples below; I hope they clarify the situation.

Examples:

Database definition:

    U1db.Database {
        id: theDB
        path: "indextest.db"
    }

Two sample documents: the first has only a "title" field, the second has both "title" and "num" fields:

    U1db.Document {
        database: theDB
        create: true
        docId: "vic"
        defaults: {"title": "vic"}
    }

    U1db.Document {
        database: theDB
        create: true
        docId: "xqwzts"
        defaults: {
            "title": "xqwzts",
            "num": "123"
        }
    }

Index only the title field since it is what I care about querying:

    U1db.Index {
        database: theDB
        id: theIndex
        expression: ["title"]
    }

An empty query which should match all documents in the index:

    U1db.Query {
        id: theQuery
        index: theIndex
    }

Expected result: both documents should be returned

Actual result: only the first document is returned
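
For comparison, the behaviour I expected can be sketched in plain JavaScript. This is only a model of the semantics, not u1db-qt code: `expectedIndexMembers` and `sampleDocs` are hypothetical names made up for illustration, and the rule it encodes (a document is indexed when it has every field named in the expression, whatever other fields it carries) is my assumption about how the index should work.

```javascript
// Hypothetical model of the expected behaviour; not part of u1db-qt.
// A document belongs to the index when it has a value for every field
// named in the index expression. Extra fields are ignored.
function expectedIndexMembers(docs, expression) {
    return Object.keys(docs).filter(function (docId) {
        var contents = docs[docId];
        return expression.every(function (field) {
            return Object.prototype.hasOwnProperty.call(contents, field);
        });
    });
}

// The two sample documents from this report.
var sampleDocs = {
    "vic": {"title": "vic"},
    "xqwzts": {"title": "xqwzts", "num": "123"}
};

// Index on "title" only: both document IDs are logged.
console.log(expectedIndexMembers(sampleDocs, ["title"]));
```

Under this model the extra "num" field on the second document has no effect on whether it is indexed by "title".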

Detailed output:

JSON.stringify(theDB.listDocs())

    {"0":"vic","1":"xqwzts"}

JSON.stringify(theIndex)

    {"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title"}}

JSON.stringify(theQuery)

    {"index":{"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title"}},"objectName":"","results":[{"title":"vic"}],"documents":{"0":"vic"}}

Specifying the query to search for title = "*" makes no difference:

    U1db.Query {
        id: theQuery
        index: theIndex
        query: [{"title": "*"}]
    }

JSON.stringify(theQuery)

    {"index":{"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title"}},"objectName":"","results":[{"title":"vic"}],"documents":{"0":"vic"},"query":[{"title":"*"}]}

It seems the second document simply isn't in the index, so no query will return it.

To add the second document to the index we must add all fields of the document to the index expression:

    U1db.Index {
        database: theDB
        id: theIndex
        expression: ["title", "num"]
    }

And remove the query clause to have the query return all documents in the index:

    U1db.Query {
        id: theQuery
        index: theIndex
    }

So now the output becomes:

JSON.stringify(theIndex)

    {"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title","1":"num"}}

JSON.stringify(theQuery)

    {"index":{"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title","1":"num"}},"objectName":"","results":[{"title":"vic"},{"num":"123"}],"documents":{"0":"vic","1":"xqwzts"}}

But now, for a query clause to return a document, the clause has to include every field of that document that is also in the index:

    U1db.Query {
        id: theQuery
        index: theIndex
        query: [{"title": "*"}]
    }

JSON.stringify(theQuery)

    {"index":{"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title","1":"num"}},"objectName":"","results":[{"title":"vic"}],"documents":{"0":"vic"},"query":[{"title":"*"}]}

In order to actually return all the documents with title=* we also have to specify that num=*:

    U1db.Query {
        id: theQuery
        index: theIndex
        query: [{"title": "*"}, {"num": "*"}]
    }

JSON.stringify(theQuery)

    {"index":{"database":{"error":"","objectName":"","path":"indextest.db"},"name":"","objectName":"","expression":{"0":"title","1":"num"}},"objectName":"","results":[{"title":"vic"},{"num":"123"}],"documents":{"0":"vic","1":"xqwzts"},"query":[{"title":"*"},{"num":"*"}]}

Conclusion:

- In order to be included in the index, ALL fields in a document must be included in the index's expression clause.
- In order to be returned by a query, ALL fields in a document [which are also in the index] must be included in the query's query clause.
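
The two rules above can be written down as a small JavaScript model and checked against the outputs pasted earlier. It is a sketch of the observed behaviour only, not the actual u1db-qt implementation: the function names and `reportDocs` are hypothetical, and it only covers wildcard ("*") clauses like the ones used in this report.

```javascript
// Hypothetical model of the behaviour observed in this report; not u1db-qt code.
function observedIndexMembers(docs, expression) {
    // Rule 1: every field of the document must appear in the index expression.
    return Object.keys(docs).filter(function (docId) {
        return Object.keys(docs[docId]).every(function (field) {
            return expression.indexOf(field) !== -1;
        });
    });
}

function observedQueryResults(docs, expression, query) {
    // Collect the field names used in the query clause, e.g. [{"title": "*"}].
    var queried = [];
    (query || []).forEach(function (clause) {
        queried = queried.concat(Object.keys(clause));
    });
    return observedIndexMembers(docs, expression).filter(function (docId) {
        if (queried.length === 0) {
            return true; // an empty query returns everything in the index
        }
        // Rule 2: every indexed field of the document must also be queried.
        return Object.keys(docs[docId]).every(function (field) {
            return expression.indexOf(field) === -1 ||
                   queried.indexOf(field) !== -1;
        });
    });
}

// The two sample documents from this report.
var reportDocs = {
    "vic": {"title": "vic"},
    "xqwzts": {"title": "xqwzts", "num": "123"}
};

// Index on both fields, query on title only: logs [ 'vic' ]
console.log(observedQueryResults(reportDocs, ["title", "num"], [{"title": "*"}]));
```

With the index on ["title"] alone the model also returns only "vic", and with both fields in the index and in the query it returns both documents, which matches the "documents" values in the JSON.stringify(theQuery) outputs above.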

This is pretty problematic with large/complicated documents where including every single field in the expression/query clause is not really feasible.

description: updated
Changed in u1db-qt:
status: New → Confirmed
Changed in u1db-qt:
importance: Undecided → Critical
assignee: nobody → Christian Dywan (kalikiana)
milestone: none → 1.0
Christian Dywan (kalikiana) wrote :

My attempts to fix it apparently break the omission of deleted docs, and investigating the other failing tests is very tedious because qmltestrunner tends not to output any results if it hangs or crashes. The second branch may show some entry points if somebody else wants to take a look.

PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:u1db-qt at revision 114, scheduled for release in u1db-qt, milestone 1.0

Changed in u1db-qt:
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package u1db-qt - 0.1.5+14.04.20140313-0ubuntu1

---------------
u1db-qt (0.1.5+14.04.20140313-0ubuntu1) trusty; urgency=low

  [ CI bot ]
  * No change rebuild against Qt 5.2.1.

  [ Christian Dywan ]
  * Adopt xvfb.sh script from ui toolkit to run tests
  * Sort out build warnings and make them always fatal.
  * Implement Database.removeDoc method and use it in unit test
    Functionally this is equivalent to replacing the doc with an empty
    one. (LP: #1243395)
  * Use new-style qmlrunner log option to enable stdout.
  * Query improvements and more advanced example. (LP: #1271977,
    #1271972, #1266478)
  * Store whole document contents in the results and unit test that.
    (LP: #1271973)
  * Reverse query logic to check non-matching and internally convert
    between query syntaxes. (LP: #1284194, #1214538, #1215898)
  * Revert r113 and update unit test to verify previous behavior
 -- Ubuntu daily release <email address hidden> Thu, 13 Mar 2014 23:12:40 +0000

Changed in u1db-qt (Ubuntu):
status: New → Fix Released