Can't generate BOM with python scripts - regular expression error in python BOM generation

Bug #1833822 reported by Travis Ayres
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
KiCad
Fix Released
Medium
Ian McInerney

Bug Description

Command error. Return code 1
Error messages:
Traceback (most recent call last):
  File "D:\Program Files\KiCad\bin\scripting\plugins/bom_sorted_by_ref.py", line 49, in <module>
    components = net.getInterestingComponents()
  File "D:\Program Files\KiCad\bin\scripting\plugins/kicad_netlist_reader.py", line 624, in getInterestingComponents
    ret.sort(key=lambda g: f(g.getRef()))
  File "D:\Program Files\KiCad\bin\scripting\plugins/kicad_netlist_reader.py", line 624, in <lambda>
    ret.sort(key=lambda g: f(g.getRef()))
  File "D:\Program Files\KiCad\bin\scripting\plugins/kicad_netlist_reader.py", line 623, in f
    return re.sub(r'([A-z]+)[0-9]+', r'\1', v) + '%08i' % int(re.sub(r'[A-z]+([0-9]+)', r'\1', v))
ValueError: invalid literal for int() with base 10: 'SW_-11'

Win 10 x64, Kicad 5.1.2 release build
Pcbnew netlist attached - not sure if that's what I should be attaching.

The problem is the regular expression to parse the netlist, it looks like.

Revision history for this message
Travis Ayres (trayres) wrote :
Changed in kicad:
milestone: none → 5.1.3
tags: added: python
Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

@Travis, please include your kicad version information from the about dialog.

Changed in kicad:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

The issue appears to the '.' character now allowed in references. The regular expression will have to be changed to accommodate this.

Revision history for this message
Seth Hillbrand (sethh) wrote :

We'll need a schematic or project here.

Changed in kicad:
status: Triaged → Incomplete
Revision history for this message
Wayne Stambaugh (stambaughw) wrote :

Here is the one I used: https://github.com/beagleboard/beaglebone-blue. You will have to import from eagle. Any schematic that has something like U1.2, P3.1, etc will trigger the bug.

Revision history for this message
Ian McInerney (imcinerney) wrote :

@Wayne/Seth, the issue is more than just a '.' character. The reference designator in eeschema seems to now accept any symbol (e.g. the string !J1+54.gr:fd1-fd_gf'fd"?%)(@hi is a valid reference for a part according to eeschema). The only limitation seems to be no spaces, tabs, carriage returns or line feeds (it uses SCH_FIELD_VALIDATOR to check this). The python regex for sorting the reference field seems to not accept any symbols, only letter and numbers.

I think this now begs the question, what is a valid sorting of the reference ID now that symbols are involved? It appears the Python BOM plugins have a custom function f() to extract the key to use for the sort order. I am not sure about the history of that extraction function tough. One possible way to fix it would be to just remove the call to f() in the lambda functions for the three calls to sort (lines 624, 668, 671 in kicad_netlist_reader.py) and leave the sort key as the reference designator itself instead of a custom-parsed version of the reference designator. This will allow Python to use its normal string sorting function to sort the designators, which can handle the symbols.

Even removing the regex from the sorting won't completely fix this though, since doing that now moves the error to another location. It then errors about needing to provide an escape character in the csv.writer() call (this is using that jibberish reference field I gave in the first paragraph). Since it appears it wants to escape the " and ( characters even though they are not the delimiters. Some searching seems to suggest that adding: escapechar='', quotechar='' to the csv.writer() call should quench its needs to escape things. It should be safe to null out those options since we already prevent \t from being used in the field.

Of course all this is predicated on the idea that the reference designator should be able to contain any character and not have many limitations.

Note, all this testing I just did was on the 5.1 branch with

Application: Eeschema
Version: (5.1.2-151-gc951ca6f4), release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.59.0 OpenSSL/1.1.0i zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) libssh/0.8.6/openssl/zlib nghttp2/1.32.1
Platform: Linux 5.0.16-100.fc28.x86_64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8) GTK+ 3.22
    Boost: 1.66.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.59.0
    Compiler: Clang 6.0.1 with C++ ABI 1002

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=ON
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=OFF
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=OFF
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Revision history for this message
Ian McInerney (imcinerney) wrote :

Also, for my tests I just used the pic_programmer demo project and replaced the text for J1 with the string !J1+54.gr:fd1-fd_gf'fd"?%)(@hi1

Changed in kicad:
status: Incomplete → Confirmed
Changed in kicad:
assignee: nobody → Ian McInerney (imcinerney)
status: Confirmed → In Progress
Revision history for this message
Ian McInerney (imcinerney) wrote :

Ok, I figured out what that regex was supposed to be doing. It was injecting zero padding into the component references so that the numerical order was more natural (e.g. 100 after 99). I figured out a better way to write that regex that will do the natural sorting, and it can now handle the symbols.

Additionally, I tracked down the csv export error to including double quotes in the reference field, so those have been added into the invalid character list for the field (unless anyone can think of a reason why they should be allowed).

The patch for this is attached.

Revision history for this message
Seth Hillbrand (sethh) wrote :

Thanks for the patch, Ian. I like the natural number sorting a lot. And this is good future-proofing for removing the number requirement.

I am leery about adding additional restricted characters to the reference field. Everytime we have done this in the past, we have found that there are users whose schematics or symbol libraries are now broken because we changed their requirements.

Is there any other way to fix this? Would it work to escape the double-quotes when exporting?

Revision history for this message
jean-pierre charras (jp-charras) wrote :

The python fix is very good.

I do not have issues with a " inside a field (that can be found in any field, not only REF),
just with your fix in kicad_netlist_reader.py

The csv writer correctly quote this " and I can open without issue the .csv file in LibreOffice.

Revision history for this message
Ian McInerney (imcinerney) wrote :

Hmm, it looks like the quoting strategy for the bom_sorted_by_ref is different than the ones for bom_csv_*. So it appears that the csv BOMs can handle the quote and properly escape it, but that was disabled for the tab delimiter one.

@JP, I am guessing you were using the csv ones not the tab one for your testing? When I tried putting a quote character through the tab delimiter one (bom_sorted_by_ref.py), it threw the error about needing to quote but not being able to.

It is simple enough to modify that BOM generator to properly quote, then it shouldn't need the additional restriction on entering " into the field. I'll see about updating the patch for that.

Revision history for this message
jean-pierre charras (jp-charras) wrote :

It looks like bom_sorted_by_ref.py uses a not good csv option for us:
It uses quoting=csv.QUOTE_NONE
bom_csv_sorted_by_ref.py uses quoting=csv.QUOTE_ALL
Using quoting=csv.QUOTE_MINIMAL works for me.

The actual issue is in this bom script.

Revision history for this message
Ian McInerney (imcinerney) wrote :

Yes, the quoting is the issue with that script. While QUOTE_MINIMAL would fix the issue, it would still be inconsistent with the other generators. I have updated it to use QUOTE_ALL so there is consistency between all the generators in their handling of the quotes.

The attached patch should now fix both of the issues (sorting + quoting) and replaces the previous one.

Revision history for this message
KiCad Janitor (kicad-janitor) wrote :

Fixed in revision 3b645ed305d9e2ffacc9eca02cecc4f234d23cb5
https://git.launchpad.net/kicad/patch/?id=3b645ed305d9e2ffacc9eca02cecc4f234d23cb5

Changed in kicad:
status: In Progress → Fix Committed
tags: added: eeschema
Changed in kicad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.