Cannot parse symbol libs with non-english symbol names

Bug #1806206 reported by Aleksandr Sh
This bug affects 2 people
Affects: KiCad
Status: Fix Released
Importance: Medium
Assigned to: Wayne Stambaugh

Bug Description

I can create symbols with non-english names like "Резистор" in the symbol editor and save the lib.
But after reopening KiCad, it cannot load the lib (and cannot load the cache if the symbol was added to schematic). Try to open the attached lib.

Non-English names in footprint libs seem to work fine.

Application: kicad
Version: (6.0.0-rc1-dev-1291-g61b749f0b), release build
Libraries:
    wxWidgets 3.0.4
    libcurl/7.61.1 OpenSSL/1.1.1 (WinSSL) zlib/1.2.11 brotli/1.0.6 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) nghttp2/1.34.0
Platform: Windows 8 (build 9200), 64-bit edition, 64 bit, Little endian, wxMSW
Build Info:
    wxWidgets: 3.0.4 (wchar_t,wx containers,compatible with 2.8)
    Boost: 1.68.0
    OpenCASCADE Community Edition: 6.9.1
    Curl: 7.61.1
    Compiler: GCC 8.2.0 with C++ ABI 1013

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=OFF
    KICAD_SCRIPTING=ON
    KICAD_SCRIPTING_MODULES=ON
    KICAD_SCRIPTING_PYTHON3=OFF
    KICAD_SCRIPTING_WXPYTHON=ON
    KICAD_SCRIPTING_WXPYTHON_PHOENIX=OFF
    KICAD_SCRIPTING_ACTION_MENU=ON
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=ON

Aleksandr Sh (dsa-t) wrote :
Maciej Suminski (orsonmmz) wrote :

I have managed to load the library without any problems on Linux, so the problem might be Windows specific. What is the error message you are getting?

Wayne Stambaugh (stambaughw) wrote :

I can confirm that the provided library does not open on Windows. What's odd is that it fails to open with an "expected Y or N in input/source at line 6, offset 23" error, but the file looks fine to me.

Changed in kicad:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 5.1.0
jean-pierre charras (jp-charras) wrote :

@Wayne,

It is not odd!
The parser is just broken.

This line is a UTF8 line, but the parser parses it as a char (ASCII8) line.

The parser stops at the middle of the UTF8 symbol name because it found a ' ', i.e. a separator, because the UTF8 "char" is not correctly parsed.
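
The failure mode described above can be sketched in a few lines. This is an illustrative Python sketch, not KiCad's parser code: naive_token, ascii_isspace and codepage_like_isspace are hypothetical names, and the idea that the byte 0xA0 gets classified as whitespace (it is a no-break space in several single-byte Windows code pages) is an assumption about the mechanism that this thread does not confirm.

```python
def naive_token(data, is_separator):
    """Return the leading bytes of data up to the first separator byte."""
    end = 0
    while end < len(data) and not is_separator(data[end]):
        end += 1
    return data[:end]


def ascii_isspace(byte):
    # Only plain ASCII whitespace counts as a separator.
    return byte in b" \t\r\n"


def codepage_like_isspace(byte):
    # ASSUMPTION: a byte-wise, locale-dependent test that also treats
    # 0xA0 (no-break space in some single-byte code pages) as whitespace.
    return byte in b" \t\r\n\xa0"


# Hypothetical input: a symbol name followed by one more field.
line = "Резистор R".encode("utf-8")

# The first letter, U+0420, is encoded as the two bytes D0 A0.
print(line[:2].hex())

# Treating only ASCII whitespace as a separator keeps the name intact.
print(naive_token(line, ascii_isspace).decode("utf-8"))

# The code-page-like test cuts the token inside the first character.
print(naive_token(line, codepage_like_isspace))
```

Under that assumption the token ends after a single byte, so every following field is read from the wrong offset, which would be consistent with an "expected Y or N" error on a line that looks fine in a text editor.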

Wayne Stambaugh (stambaughw) wrote :

@JP, wouldn't this same problem occur with the old sscanf() based parser? If so, then we probably should not allow utf8 characters as symbol names which would effectively be a file format change. Fixing the parser is an option if we want to allow utf8 characters in legacy file formats.

Wayne Stambaugh (stambaughw) wrote :

@JP, I just tested 4.0.7 and the sscanf parser does indeed work with the library included in this bug report so I will have to fix the parser. Given that this was released in 5.0.0, I can believe it took this long to find the bug.

jean-pierre charras (jp-charras) wrote :

@Wayne,

I am guessing the probability of encountering this bug is low: the parser stops reading a token if a space is found.
So encountering a UTF8 sequence that contains the 0x20 byte (and is not the "space" char) is perhaps not frequent.
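
For reference, which byte values can occur inside a multi-byte UTF-8 sequence can be checked exhaustively. The sketch below is only a sanity check of the encoding, not anything taken from KiCad; the helper name bytes_used_in_multibyte_utf8 is made up for illustration. It suggests that a literal 0x20 byte never appears inside a multi-byte sequence, while high bytes such as 0xA0 do, so any false "space" would have to come from how high bytes are classified.

```python
def bytes_used_in_multibyte_utf8():
    """Collect every byte value that occurs in the UTF-8 encoding of a
    non-ASCII code point (hypothetical helper, for illustration only)."""
    used = set()
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        used.update(chr(cp).encode("utf-8"))
    return used


used = bytes_used_in_multibyte_utf8()

# All bytes of multi-byte sequences have the high bit set.
print(hex(min(used)))

# The ASCII space byte is never part of a multi-byte sequence ...
print(0x20 in used)

# ... but 0xA0 is a valid continuation byte.
print(0xA0 in used)
```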

Changed in kicad:
assignee: nobody → Wayne Stambaugh (stambaughw)
status: Triaged → In Progress
Wayne Stambaugh (stambaughw) wrote :

I just pushed the fix for this. Please let me know if you are still having issues.

Aleksandr Sh (dsa-t) wrote :

I wonder if these characters with 0A are handled correctly:
ĊȊЊԊ؊܊ࠊऊਊଊఊഊช༊ညᄊሊጊᐊᔊᘊᜊ᠊ᤊᨊᬊᴊḊἊ ℊ∊⌊␊┊☊✊⠊⤊⨊⬊Ⰺⴊ⸊⼊《ㄊ㈊㌊㐊㔊㘊

I'm not sure why you would use them in component fields, though.

Wayne Stambaugh (stambaughw) wrote :

@Aleksandr, I just copied the character string you provided and successfully created a symbol with that name. Obviously KiCad cannot display those characters in the name field, but it does save and load the symbol with that name correctly. There are some differences between the characters I am seeing on this bug report page and what Windows is displaying, but my guess is that this is a font mapping issue.

Aleksandr Sh (dsa-t) wrote :

Right, UTF-8 does not allow the 0x0A byte anywhere except for the actual line feed character, therefore there are no problems with line feeds in field values.
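
A small check of this point, using three of the characters from the list above (U+010A, U+020A and U+040A, written as escapes so the code does not depend on the page's font). This is a sketch for illustration; contains_line_feed_byte is a hypothetical helper name.

```python
def contains_line_feed_byte(text, encoding):
    """Report whether the encoded form of text contains a 0x0A byte."""
    return b"\n" in text.encode(encoding)


# Code points whose low byte is 0x0A: U+010A, U+020A, U+040A.
names = "\u010a\u020a\u040a"

for ch in names:
    utf16 = ch.encode("utf-16-le").hex()
    utf8 = ch.encode("utf-8").hex()
    print(f"U+{ord(ch):04X} utf-16-le={utf16} utf-8={utf8}")

# In UTF-16 the 0x0A byte shows up, in UTF-8 it does not.
print(contains_line_feed_byte(names, "utf-16-le"))
print(contains_line_feed_byte(names, "utf-8"))
```

A line-oriented reader that splits on the 0x0A byte is therefore safe for UTF-8 input, whereas the same characters would be a problem in a byte-wise reader fed UTF-16.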

KiCad Janitor (kicad-janitor) wrote :

Fixed in revision a61a51f26e9e400431c9ac58994281263284010f
https://git.launchpad.net/kicad/patch/?id=a61a51f26e9e400431c9ac58994281263284010f

Changed in kicad:
status: In Progress → Fix Committed
Changed in kicad:
status: Fix Committed → Fix Released