Importing a CSV file can fail due to unexpected thousands separator

Bug #370571 reported by Steve Dodier-Lazaro
Affects: wxBanker
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

A CSV file can fail to parse if, for instance, it contains "+1 000" instead of "+1000". Spaces inside numbers should be stripped during parsing to avoid this.

Also, I didn't check, but it might be good to verify that currency symbols are stripped out of the amount's value as well (for instance €, $, DOL, EUR, etc.).

Finally, I suggest you change the layout of the import form and use something similar to:

1 | 2 | 3 | 4 | ...
For each column, a list of options (Date/Desc/Amount).

This should be much easier to understand than the current approach.

Cordially, SD.


Revision history for this message
Karel Kolman (kolmis) wrote :

I guess the right way to solve the first issue is to add a "thousands separator" setting next to the decimal one in the CSV import dialog.

As for the second one, removing non-digit characters from both ends of the column value might be an option; I'll think about that.

About the CSV import dialog design: I actually prefer the import dialog mapping wxBanker keys to CSV file columns rather than the other way around, as you're suggesting. Bank-exported CSV files will usually have 10+ columns, and you're mapping to only three wxBanker values (date, amount, description), so the current way is easier. The chaining of many CSV columns into the description field seems intuitive too. Moneydance (or was that some other Java personal banking app?) has a CSV import tool that works the way you'd like: you map column 1 to something, column 2 to something, etc. I absolutely hated it when trying it out.

Revision history for this message
Michael Rooney (mrooney) wrote :

Yes, we could easily just remove spaces in the amount field for now, but a more robust solution would probably be to remove every character that isn't a digit, a negative sign, or a thousands or decimal separator. As Karel said, this depends on having a user-specified thousands separator, which is probably a good idea anyway.

Changed in wxbanker:
importance: Undecided → Medium
milestone: none → 0.6
status: New → Confirmed
Revision history for this message
Michael Rooney (mrooney) wrote :

Ah, I see your branch now, let me merge that in!

Revision history for this message
Michael Rooney (mrooney) wrote :

I think just ignoring all characters that aren't digits, a negative sign, or the decimal separator means we don't actually need the thousands separator, since it gets replaced with nothing anyway. I made a new implementation of parseAmount (also pushed up):

    def parseAmount(self, val, settings):
        val = val.replace(settings['decimalSeparator'], '.')
        amountStr = ""
        for char in val:
            if char.isdigit() or char in "-.":
                amountStr += char

        return float(amountStr)

What do you think about using this and removing the thousands sep from the UI?
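Lifted out of its class so it runs standalone (a sketch; in wxBanker parseAmount is a method taking self), the version above already handles the space-separated amount from the original report:

```python
def parseAmount(val, settings):
    # Normalize the decimal separator to '.' up front, then keep
    # only digits, the minus sign, and '.'.
    val = val.replace(settings['decimalSeparator'], '.')
    amountStr = ""
    for char in val:
        if char.isdigit() or char in "-.":
            amountStr += char
    return float(amountStr)

# The "+1 000" case from the bug report, assuming a ',' decimal separator:
print(parseAmount("+1 000", {'decimalSeparator': ','}))  # prints 1000.0
```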

summary: - Importing a CSV file can fail due to "big numbers"
+ Importing a CSV file can fail due to unexpected thousands separator
Revision history for this message
Karel Kolman (kolmis) wrote :

I'm in favor of your solution!

Revision history for this message
Michael Rooney (mrooney) wrote :

Okay, I've reverted the changes to csvimportframe. Hopefully this should be fairly robust! I made one more change, which was to replace the decimal separator AFTER the loop. Otherwise, if your separator is a comma, for example, and the value contains periods elsewhere, those periods won't be removed. The new way allows for representations like "1.000,23":

    def parseAmount(self, val, settings):
        amountStr = ""
        for char in val:
            if char in "-1234567890"+settings['decimalSeparator']:
                amountStr += char

        amountStr = amountStr.replace(settings['decimalSeparator'], '.')
        return float(amountStr)
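Again lifted out of its class for illustration, the reordered version handles both problem inputs from this report (assuming a ',' decimal separator):

```python
def parseAmount(val, settings):
    # Filter first (dropping a '.' used as a thousands separator),
    # THEN normalize the decimal separator to '.'.
    amountStr = ""
    for char in val:
        if char in "-1234567890" + settings['decimalSeparator']:
            amountStr += char
    amountStr = amountStr.replace(settings['decimalSeparator'], '.')
    return float(amountStr)

print(parseAmount("1.000,23", {'decimalSeparator': ','}))  # prints 1000.23
print(parseAmount("+1 000", {'decimalSeparator': ','}))    # prints 1000.0
```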

Changed in wxbanker:
status: Confirmed → Fix Committed
Revision history for this message
Karel Kolman (kolmis) wrote :

Great. I suggest you change the settings parameter to pass only the decimalSeparator.

I also created a simple regression test for this in my csvimport branch. I pretty much hate those doctests and feel that unit tests should live in their own space (directory), so I placed the parseAmount test case in the newly created test directory. Could you consider using this approach?

Revision history for this message
Michael Rooney (mrooney) wrote : Re: [Bug 370571] Re: Importing a CSV file can fail due to unexpected thousands separator

I agree, the doctests are on their way out, and new tests are great as
unit tests. I'll check out your structure and perhaps separate out the
tests; it seems more sustainable. I didn't do that initially only
because I don't know how to tell unittest to run tests across
different files and integrate the results.

Revision history for this message
Karel Kolman (kolmis) wrote :

You might look into test/alltests.py in my latest commit to the csvimport branch.

It seems fairly easy: you specify a list of modules and tell the TestLoader to create a TestSuite from them. The TextTestRunner then runs the tests and creates the text output (used in unittest.main() also, I believe). The code is:

    import unittest

    modules = ['test1', 'othertests']
    suite = unittest.TestLoader().loadTestsFromNames(modules)
    unittest.TextTestRunner(verbosity=2).run(suite)
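A self-contained sketch of the same pattern, with an in-file TestCase standing in for the module names (which are hypothetical here), and a simplified parse_amount as the code under test:

```python
import unittest

def parse_amount(val, decimal_separator):
    # Same strategy as parseAmount above: keep digits, the minus sign,
    # and the decimal separator, then normalize the separator to '.'.
    kept = "".join(c for c in val if c in "-0123456789" + decimal_separator)
    return float(kept.replace(decimal_separator, "."))

class ParseAmountTest(unittest.TestCase):
    def test_spaces_stripped(self):
        self.assertEqual(parse_amount("+1 000", ","), 1000.0)

    def test_thousands_separator(self):
        self.assertEqual(parse_amount("1.000,23", ","), 1000.23)

# Build a suite and run it with text output, mirroring the
# TestLoader/TextTestRunner pattern above.
suite = unittest.TestLoader().loadTestsFromTestCase(ParseAmountTest)
unittest.TextTestRunner(verbosity=2).run(suite)
```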

Revision history for this message
Michael Rooney (mrooney) wrote :

On Thu, May 7, 2009 at 2:36 PM, Karel Kolman <email address hidden> wrote:
> You might look into test/alltests.py in my latest commit to the
> csvimport branch.

Thanks Karel, I've pushed up some commits which merged your branch and
separated out the unittests into separate files in the test directory.
It is a much nicer framework indeed.

Michael Rooney (mrooney)
Changed in wxbanker:
status: Fix Committed → Fix Released