CSV-Import: auto-test for separation character: Semicolon

Bug #1393472 reported by Tobias Zeuch on 2014-11-17

Bug Description

This also came up at the Mahara Barcamp 2014: Microsoft Excel chooses the separation character based on the system language setting, which for German users is a semicolon instead of a comma. As a result, CSV files generated on a German-language Windows system usually cannot be imported directly; they first have to be opened in a plain-text editor and the semicolons replaced with commas. Alternatively, the language setting in Windows(!) has to be changed before saving the document as CSV so that the file is exported correctly.
Apparently, Moodle lets you specify whether your file is semicolon-separated, but I think it should be easy to check automatically whether the first line is separated by commas or by semicolons.

Changed in mahara:
status: New → Confirmed
importance: Undecided → Wishlist
Aaron Wells (u-aaronw) wrote :

Any system for auto-detecting the delimiter would need to have a confirmation screen to make sure it has detected the delimiter correctly. Since that would require an extra page-load and extra click anyway, I think it would be better to simply copy Moodle and have an option for the user to say "semicolon" or "comma".

Or, if we wanted to get fancy, we could attach some JavaScript to the file upload form that examines the file, guesses the delimiter, and ticks the appropriate box on the form.

Tobias Zeuch (tobias-zeuch-8) wrote :

I don't see the necessity for a confirmation. The first line, which defines the fields to import, has a pretty strict syntax: a couple of fields are required, and only a fixed set of fields is valid. If one of the required fields is not identifiable or an invalid field appears, the import is canceled. So basically the first line of a valid input file can only contain semicolons or commas, but never both. Or am I missing something?
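To make the argument concrete, here is a minimal sketch in Python (not Mahara's PHP code) of how the header check itself can settle the delimiter. The field names and the function `header_delimiter` are hypothetical and chosen only for illustration, and the sketch assumes the header names are not quoted.

```python
# Hypothetical field names, for illustration only; not Mahara's actual list.
REQUIRED_FIELDS = {"username", "email"}
VALID_FIELDS = REQUIRED_FIELDS | {"firstname", "lastname", "password"}


def header_delimiter(first_line, candidates=(",", ";")):
    """Return the one candidate that yields a valid header, else None."""
    matches = []
    for delim in candidates:
        # Split the header on this candidate and normalise whitespace.
        names = {field.strip() for field in first_line.strip().split(delim)}
        # Valid means: every required field is present and no unknown field appears.
        if REQUIRED_FIELDS <= names and names <= VALID_FIELDS:
            matches.append(delim)
    # Accept the result only if exactly one candidate produced a valid header.
    return matches[0] if len(matches) == 1 else None
```

With the wrong delimiter the whole line collapses into one unknown field name, so the required fields are missing and that candidate is rejected. A header that mixes both characters fails for both candidates, and the import would be canceled as before.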

Aaron Wells (u-aaronw) wrote :

Oh, good point. The first line has to be the column names, so it should indeed be pretty easy to tell whether they've used commas or semicolons.

Alright, fair enough, we could auto-detect pretty easily then. :)

tags: added: csvupload
Tobias Zeuch (tobias-zeuch-8) wrote :

Thanks for the confirmation :)

I added a patch that just checks for a comma or semicolon in the first line of the file, which holds the column names. In my tests it worked well for user uploads and group uploads.
I used an array for the separation characters to check, just in case Microsoft uses yet another character in yet another language setting. If so, it could be added easily.
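
For readers without the patch at hand, the approach can be sketched roughly as follows. This is an illustrative Python version, not the committed PHP code; the names `DELIMITERS`, `detect_delimiter`, and `read_rows` are assumptions, as is the fallback to a comma when no candidate is found.

```python
import csv
import io

# Candidate separators, checked in order. Extend the tuple if another
# language setting turns out to use a different character.
DELIMITERS = (",", ";")


def detect_delimiter(header_line, candidates=DELIMITERS):
    """Return the first candidate that occurs in the header line, or None."""
    for delim in candidates:
        if delim in header_line:
            return delim
    return None


def read_rows(text):
    """Parse CSV text using the delimiter detected from its first line."""
    first_line = text.splitlines()[0] if text else ""
    # Assumption: fall back to a comma when no candidate is found,
    # for example in a file with a single column.
    delim = detect_delimiter(first_line) or ","
    return list(csv.reader(io.StringIO(text), delimiter=delim))
```

Detection looks only at the header line, which is safe because none of the header field names contains a comma or a semicolon; data rows may contain either character inside quoted values without affecting the result.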

Reviewed: https://reviews.mahara.org/3992
Committed: http://gitorious.org/mahara/mahara/commit/7be6438a3908e41ed8bafbc7a3e3361ee4a14274
Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit 7be6438a3908e41ed8bafbc7a3e3361ee4a14274
Author: Tobias Zeuch <email address hidden>
Date: Wed Nov 19 10:44:42 2014 +0100

Make CSV-Import autodetect delimiter

Bug 1393472: Automatically detect the delimiter used to separate the fields in the
csv file. The problem this should solve is that Microsoft Excel uses a
semicolon instead of a comma if the language setting of Windows is set to German.
The first line of the csv file contains the header fields, so we use that line
to check whether it contains a comma or a semicolon, since neither of these appears in
the header field names themselves.

Change-Id: I833041eb2169fc5ccc9557e0debf3f03c8daf7cc
Signed-off-by: Tobias Zeuch <email address hidden>

Robert Lyon (robertl-9) on 2014-12-11
Changed in mahara:
assignee: nobody → Tobias Zeuch (tobias-zeuch-8)
milestone: none → 15.04.0
status: Confirmed → Fix Committed
tags: added: nominatedfeature
Robert Lyon (robertl-9) on 2015-04-17
Changed in mahara:
status: Fix Committed → Fix Released