"csvtool" returns wrong output when escaping double quotes

Bug #1714760 reported by negora on 2017-09-03
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ocaml-csv (Ubuntu)
Undecided
Unassigned

Bug Description

Hello:

I'm using "csvtool" to get a column from a CSV file with the subcommand "col". The file has one column with escaped double quotes, which makes the command return the wrong output. There are escaped quotes at the beginning of the column, in the middle, and at the end. But "csvtool" neither removes the double quotes that surround the column (the non-escaped ones) nor un-escapes the escaped quotes (it should transform "" to ").

Suppose that you have a file called "test.csv" with this single line:

  "foo","""Hello "" World!""","bar"

It has 3 columns, all of them quoted. The one in the middle contains 3 doubled double quotes (escaped quotes).

When I execute these commands, I expect the following output:

  $ csvtool col 1 test.csv
  foo

  $ csvtool col 2 test.csv
  "Hello " World!"

  $ csvtool col 3 test.csv
  bar

But I get this output:

  $ csvtool col 1 test.csv
  foo

  $ csvtool col 2 test.csv
  """Hello "" World!"""

  $ csvtool col 3 test.csv
  bar

"csvtool" seems to return the 2nd column as is, without any transformation.

This is the information of my current install:

  * Distribution: Ubuntu 16.04.3 LTS
  * Architecture: amd64
  * Kernel: 4.4.0-92.115
  * libc6: 2.23-0ubuntu9
  * csvtool: 1.4.2-1

Thank you.

negora (negora) on 2017-09-03
summary: - Wrong output when escaping double quotes
+ "csvtool" returns wrong output when escaping double quotes
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ocaml-csv (Ubuntu):
status: New → Confirmed
Sven K. (wagalaweia) wrote :

After stumbling upon the same issue, I realized that this is not a bug, but intended behavior.

The output of csvtool is always a valid csv-formatted record again. It removes any obsolete double quotes, but it keeps all *necessary* double quotes.

In your example, csvtool gives the following output

$ csvtool col 1,2,3 test.csv
foo,"""Hello "" World!""",bar

which is indeed a correct CSV record. Moreover, the double quotes are necessary, since the unescaped output

foo,"Hello " World!",bar

would yield a corrupt CSV record.

The same even holds true for the output's separator character, as defined by the -u flag. Consider the following example:

$ echo "\"single,field\",next one" | csvtool col 1,2 -
"single,field",next one

Again, the surrounding quotes are *not* removed: the comma inside the field makes them mandatory. However, when another output separator is specified with -u ";", the double quotes become obsolete and are removed:

$ echo "\"single,field\",next one" | csvtool col 1,2 -u ";" -
single,field;next one
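
The "quote only when necessary" behavior described above can be illustrated with Python's standard csv module. This is only a sketch of the same idea, not csvtool's actual implementation; the helper name write_record is hypothetical.

```python
import csv
import io


def write_record(fields, sep=","):
    """Serialize one record, quoting only the fields that need it.

    Hypothetical helper for illustration; it mimics minimal quoting,
    it is not how csvtool is implemented.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=sep, quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerow(fields)
    return buf.getvalue().rstrip("\n")


# The field containing double quotes must be quoted and escaped again.
print(write_record(["foo", '"Hello " World!"', "bar"]))

# A comma inside a field forces quotes only while "," is the separator.
print(write_record(["single,field", "next one"]))
print(write_record(["single,field", "next one"], sep=";"))
```

Under these assumptions, the first call should reproduce the escaped record from the report, and the last two calls should differ only in whether the first field is quoted.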

Summarized: csvtool focuses on generating valid CSV output, *not* on always extracting individual field values in their unescaped form. Unfortunately, this means that one needs another command-line tool to post-process the sometimes-escaped, sometimes-unescaped output of csvtool when using it in scripts to extract a single column.

It could be a feature request (and indeed a very helpful feature) to support a flag that forces the output to be unescaped without any surrounding double quotes, e.g., csvtool --unquoted col 2 test.csv could give "Hello " World!" then, which is intended for further processing, rather than """Hello "" World!""".
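
Until such a flag exists, one possible workaround is to parse the file directly and print the unescaped field values. The following is a minimal sketch using Python's standard csv module; the function name raw_column is hypothetical, and the input is assumed to be comma-separated with double-quote escaping, as in the example above.

```python
import csv
import io


def raw_column(csv_text, col):
    """Return the unescaped values of a 1-based column, one per record.

    Hypothetical helper; assumes "," as separator and "" as escaped quote.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    # Skip empty records so a trailing blank line does not raise IndexError.
    return [row[col - 1] for row in reader if row]


line = '"foo","""Hello "" World!""","bar"\n'
for value in raw_column(line, 2):
    print(value)
```

For the single-line test.csv from the report, this should print the second field without its surrounding quotes and with each "" collapsed to a single ".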
