LOAD DATA INFILE doesn't work with multibyte ENCLOSED BY (although SELECT INTO OUTFILE does)

Bug #308457 reported by Stewart Smith
Affects        Status        Importance  Assigned to  Milestone
Drizzle        Fix Released  Low         Jay Pipes
MySQL Server   Unknown       Unknown

Bug Description

CREATE TABLE t1 (c1 VARCHAR(256));
INSERT INTO t1 (c1) VALUES ('☠');
SELECT HEX(c1) FROM t1;
HEX(c1)
E298A0
SELECT * INTO OUTFILE 'MYSQLTEST_VARDIR/tmp/bug32533.txt' FIELDS ENCLOSED BY '☢' FROM t1;
TRUNCATE t1;
SELECT HEX(LOAD_FILE('MYSQLTEST_VARDIR/tmp/bug32533.txt'));
HEX(LOAD_FILE('MYSQLTEST_VARDIR/tmp/bug32533.txt'))
E298A2E298A0E298A20A
LOAD DATA INFILE 'MYSQLTEST_VARDIR/tmp/bug32533.txt' INTO TABLE t1 FIELDS ENCLOSED BY '☢';

drizzletest: At line 90: query 'LOAD DATA INFILE '$file' INTO TABLE t1 FIELDS ENCLOSED BY '☢'' failed: 1083: Field separator argument is not what is expected; check the manual
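The HEX(LOAD_FILE(...)) output above can be read byte by byte to confirm what SELECT INTO OUTFILE actually wrote. The following is only an illustrative sketch, not part of Drizzle: `bytes_from_hex` and `find_all` are hypothetical helper names. It decodes the dump and locates the enclosure bytes E2 98 A2 (the UTF-8 encoding of '☢').

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn a HEX() dump such as "E298A2" into raw bytes.
std::string bytes_from_hex(const std::string &hex)
{
  if (hex.size() % 2 != 0)
    throw std::invalid_argument("hex string must have an even length");
  std::string out;
  out.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2)
    out.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  return out;
}

// Hypothetical helper: byte offsets of every non-overlapping occurrence
// of `needle` in `haystack`.
std::vector<std::size_t> find_all(const std::string &haystack,
                                  const std::string &needle)
{
  std::vector<std::size_t> offsets;
  if (needle.empty())
    return offsets;
  for (std::size_t pos = haystack.find(needle);
       pos != std::string::npos;
       pos = haystack.find(needle, pos + needle.size()))
    offsets.push_back(pos);
  return offsets;
}
```

Applied to E298A2E298A0E298A20A, the enclosure is found at byte offsets 0 and 6, with the three bytes of '☠' (E2 98 A0) between them and a trailing newline (0A). So the file written by SELECT INTO OUTFILE is well formed; only the LOAD DATA INFILE side rejects the delimiter.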

Stewart Smith (stewart) wrote :

outfile_loaddata test

Jay Pipes (jaypipes) wrote :

Lines 137-142 in drizzled/sql_load.cc:

  if (escaped->length() > 1 || enclosed->length() > 1)
  {
    my_message(ER_WRONG_FIELD_TERMINATORS,ER(ER_WRONG_FIELD_TERMINATORS),
        MYF(0));
    return(true);
  }

where enclosed= ex->enclosed;

ex is of type sql_exchange, and enclosed is of type String (not std::string).

I changed the above code to be:

    my_error(ER_WRONG_FIELD_TERMINATORS,MYF(0),enclosed->c_ptr(), enclosed->length());

and changed the ER_WRONG_FIELD_TERMINATORS like so:

from:

N_("Field separator argument is not what is expected; check the manual"),

to:

N_("Field separator argument '%-.32s' with length '%d' is not what is expected; check the manual"),

And ran the test case above. I got the following error:

1083: Field separator argument '☢' with length '3' is not what is expected; check the manual
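The length of 3 in that message is a byte count, not a character count: '☢' is one character encoded as three UTF-8 bytes. The sketch below is an illustration only. `utf8_char_count` and both predicate names are hypothetical, and the character-based check is one possible relaxation, not necessarily what the attached patch does. It contrasts a byte-length test with a character-count test.

```cpp
#include <cstddef>
#include <string>

// Count UTF-8 characters by skipping continuation bytes (10xxxxxx).
// Assumes `s` is already valid UTF-8; no validation is done here.
std::size_t utf8_char_count(const std::string &s)
{
  std::size_t count = 0;
  for (unsigned char byte : s)
    if ((byte & 0xC0) != 0x80)
      ++count;
  return count;
}

// Same shape as the length() > 1 test quoted above:
// anything longer than one byte is rejected.
bool rejected_by_byte_length(const std::string &delim)
{
  return delim.size() > 1;
}

// Hypothetical alternative: reject only delimiters that contain
// more than one character, regardless of how many bytes it takes.
bool rejected_by_char_count(const std::string &delim)
{
  return utf8_char_count(delim) > 1;
}
```

With the delimiter "\xE2\x98\xA2", the byte-length test rejects it (3 bytes) while the character-count test accepts it (1 character). A single-byte delimiter such as a double quote passes both.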

Now, checking the manual, the LOAD DATA INFILE syntax does state that ENCLOSED BY and ESCAPED BY take a "char", but it gives no indication of whether a multi-byte utf8 "char" is allowed. My instinct is that if SELECT INTO OUTFILE works with a utf8 character, then LOAD DATA INFILE should too!

Therefore, a fix for this bug is attached. The patch includes a file used in the following test and result:

bug308457.test:

--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
CREATE TABLE t1 (c1 VARCHAR(256));
INSERT INTO t1 (c1) VALUES ('☠');
SELECT HEX(c1) FROM t1;
TRUNCATE t1;
SELECT HEX(LOAD_FILE('../std_data_ln/bug308457.txt'));
LOAD DATA INFILE '../std_data_ln/bug308457.txt' INTO TABLE t1 FIELDS ENCLOSED BY '☢';

bug308457.result:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 VARCHAR(256));
INSERT INTO t1 (c1) VALUES ('☠');
SELECT HEX(c1) FROM t1;
HEX(c1)
E298A0
TRUNCATE t1;
SELECT HEX(LOAD_FILE('../std_data_ln/bug308457.txt'));
HEX(LOAD_FILE('../std_data_ln/bug308457.txt'))
E298A2E298A0E298A20A
LOAD DATA INFILE '../std_data_ln/bug308457.txt' INTO TABLE t1 FIELDS ENCLOSED BY '☢';

Jay Pipes (jaypipes) wrote :

Assigning to me. Stewart, please review the patch. If OK, I will apply it to my local enable-test-suite branch and commit.

Changed in drizzle:
assignee: nobody → jaypipes
importance: Undecided → Low
status: New → In Progress
Stewart Smith (stewart) wrote : Re: [Bug 308457] Re: LOAD DATA INFILE doesn't work with multibyte ENCLOSED BY (although SELECT INTO OUTFILE does)

Looks okay.

Any reason why we have to limit to 4 char... things should 'just work'
if we removed that whole error path, right?
--
Stewart Smith

Jay Pipes (jaypipes) wrote : Re: [Bug 308457] Re: LOAD DATA INFILE doesn't work with multibyte ENCLOSED BY (although SELECT INTO OUTFILE does)

Stewart Smith wrote:
> Looks okay.
>
> Any reason why we have to limit to 4 char... things should 'just work'
> if we removed that whole error path, right?

I was just sticking to the manual, which states a "character" for a
delimiter. But AFAICT, no, there's no reason it couldn't be a string.

-j
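To illustrate the point that the delimiter could just as well be a string, here is a minimal sketch that treats the enclosure as an opaque byte string of any length. `strip_enclosure` is a hypothetical helper, not a function in drizzled/sql_load.cc, and it deliberately ignores escaping and enclosure sequences embedded inside the field, both of which a real LOAD DATA reader has to handle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if `field` starts and ends with `enclosure`,
// store the text between the two copies in `inner` and return true.
// The enclosure is compared byte for byte, so a multi-byte UTF-8
// character or a longer string works the same as a single byte.
bool strip_enclosure(const std::string &field,
                     const std::string &enclosure,
                     std::string &inner)
{
  const std::size_t n = enclosure.size();
  if (n == 0)
  {
    inner = field;  // nothing to strip
    return true;
  }
  if (field.size() < 2 * n)
    return false;   // too short to hold both enclosures
  if (field.compare(0, n, enclosure) != 0)
    return false;   // missing opening enclosure
  if (field.compare(field.size() - n, n, enclosure) != 0)
    return false;   // missing closing enclosure
  inner = field.substr(n, field.size() - 2 * n);
  return true;
}
```

For the test file in this bug, the line E2 98 A2 E2 98 A0 E2 98 A2 with the enclosure E2 98 A2 yields the three bytes of '☠'. Nothing in this comparison depends on the enclosure being a single byte or a single character.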

Stewart Smith (stewart) wrote : Re: [Bug 308457] Re: LOAD DATA INFILE doesn't work with multibyte ENCLOSED BY (although SELECT INTO OUTFILE does)

On Thu, Dec 18, 2008 at 03:06:06PM -0000, Jay Pipes wrote:
> Stewart Smith wrote:
> > Looks okay.
> >
> > Any reason why we have to limit to 4 char... things should 'just work'
> > if we removed that whole error path, right?
>
> I was just sticking to the manual, which states a "character" for a
> delimiter. But AFAICT, no, there's no reason it couldn't be a string.

FYI this has just been marked a D2 "serious" bug for mysql - you may get
to write "fixed in drizzle" :)
--
Stewart Smith

Jay Pipes (jaypipes)
Changed in drizzle:
milestone: none → cirrus
status: In Progress → Fix Released