Inconsistent use of column alias In ORDER BY clause that specified collation

Bug #861763 reported by Timour Katchaounov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Confirmed
Wishlist
Timour Katchaounov

Bug Description

This bug is created to track MySQL BUG#59449, and to review and
either apply or reimplement a contributed patch for this bug.

Changed in maria:
importance: Undecided → Critical
assignee: nobody → Timour Katchaounov (timour)
Revision history for this message
Timour Katchaounov (timour) wrote :

Much shorter test case:

CREATE TABLE Language (
   LanguageID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
   LanguageName VARCHAR(40) NOT NULL,
   PRIMARY KEY (LanguageID),
   KEY LanguageName (LanguageName) )
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci;

INSERT INTO Language VALUES
(1, 'English'),
(2, 'Spanish'),
(3, 'American Sign Language'),
(4, 'Argentinian Sign Language');

-- works as expected - no collation - sorted by alias to a function result (alias has name matching a real field from a queried table)
SELECT LanguageID,
       substr(LanguageName, 2) as LanguageName
  FROM Language
 ORDER BY LanguageName;

-- broken - collated - sorted by alias to a function (coalesce) result (alias has name matching a real field from a queried table)
SELECT LanguageID,
       substr(LanguageName, 2) as LanguageName
  FROM Language
 ORDER BY LanguageName COLLATE utf8_spanish_ci;

-- broken - collated - sorted by alias to a column (alias has name matching a real field from a queried table)
SELECT LanguageID as LanguageName,
       substr(LanguageName, 2) as LanguageNameSubstr
  FROM Language
 ORDER BY LanguageName COLLATE utf8_spanish_ci;

drop table Language;

Changed in maria:
status: New → In Progress
Revision history for this message
Timour Katchaounov (timour) wrote :

Analysis:
This is an architectural problem in the way ORDER BY expressions
and function arguments are resolved.

Order by expressions are resolved by the procedure find_order_in_list().
This procedure correctly defines the name resolution order to search
first in the SELECT clause and to prefer aliases. The name resolution is
implemented via the call:

  /* Lookup the current GROUP/ORDER field in the SELECT clause. */
  select_item= find_item_in_list(order_item, fields, &counter,
                                 REPORT_EXCEPT_NOT_FOUND, &resolution);

Note however that find_item_in_list() works only for fields (Item_field)
and not functions.

Thus functions in the ORDER clause are not resolved against the
SELECT clause, and find_order_in_list() proceeds until it calls
Item::fix_fields() for these Items.

For functions, fix_fields() calls recursively fix_fields() for its
arguments. Finally we end up calling Item_field::fix_fields() for
the 'LanguageName' argument of e.g. the Collate function.

However Item_field::fix_fields() is designed to resolve fields
mainly in the WHERE and other clauses, where the search order
is different from the one in the ORDER clause. Specifically,
fix_fields() searches first in the FROM clause. As a result, matching
fields in the FROM clause have priority over aliases in the SELECT
clause.

This explains why the first example is resolved correctly, and the
next two - incorrectly.

Revision history for this message
Timour Katchaounov (timour) wrote :

Solution:

In summary, the problem that needs to be solved is that
column references in the ORDER/GROUP BY clauses must
be resolved in a different way from column references in
the WHERE and SELECT clauses.

The implementation in find_order_in_list() is limited only
to Item_field and Item_ref. In general the approach to resolve
ORDER/GROUP BY clauses in a different way compared to the
rest seems wrong.

As discussed with Sanja, the right way to have a flexible and
generic name resolution that supports different name resolution
algorithms, is to reuse the Item::fix_fields mechanism in the
following way.

Items have to be marked correctly with their location in the query.
The name resolution algorithms for each group of clauses have to
be implemented as separate methods of the most suitable class.
Then the corresponding fix_fields() method should select the
correct name resolution algorithm based on the location of the
item in the query.

Notice that when e.g. an Item_field in an ORDER/GROUP BY clause
is inside a subquery, the name resolution algorithm must be changed
so that it first searches the subquery itself in the same way the WHERE
clause is resolved, and only if not found in the subquery, name resolution
should proceed to search the outer query, giving preference to aliases.

This is a small re-engineering task, and not a simple bug fix. My estimate
is at least 4-5 full developer days if there are no nasty surprises. Changing
name resolution is quite risky, so this task should be planned for a release
after 5.3.

Changed in maria:
status: In Progress → Confirmed
importance: Critical → Medium
status: Confirmed → Triaged
Changed in maria:
importance: Medium → Wishlist
status: Triaged → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.