KiCad

Eeschema allows invalid text paste

Bug #1798144 reported by Seth Hillbrand on 2018-10-16

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	KiCad	Fix Released	Critical	Seth Hillbrand	KiCad 5.0.2

Bug Description

Pasting unicode (e.g. 😕) into a net label in Eeschema causes segfault.

In master, the fault is caught but the GAL context lock is not released and a wxdialog is shown. This leads to a second context lock attempt and deadlock.

Application: kicad
Version: (5.0.1-7-gcccf4cf20), debug build
Libraries:
    wxWidgets 3.0.2
    libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3
Platform: Linux 4.9.0-8-amd64 x86_64, 64 bit, Little endian, wxGTK
Build Info:
    wxWidgets: 3.0.2 (wchar_t,wx containers,compatible with 2.8) GTK+ 2.24
    Boost: 1.62.0
    OpenCASCADE Community Edition: 6.8.0
    Curl: 7.52.1
    Compiler: GCC 6.3.0 with C++ ABI 1010

Build settings:
    USE_WX_GRAPHICS_CONTEXT=OFF
    USE_WX_OVERLAY=OFF
    KICAD_SCRIPTING=OFF
    KICAD_SCRIPTING_MODULES=OFF
    KICAD_SCRIPTING_WXPYTHON=OFF
    KICAD_SCRIPTING_ACTION_MENU=OFF
    BUILD_GITHUB_PLUGIN=ON
    KICAD_USE_OCE=ON
    KICAD_USE_OCC=OFF
    KICAD_SPICE=OFF

Revision history for this message

Seth Hillbrand (sethh) wrote on 2018-10-16:

Testing this further, this is only true for any >16-bit unicode character. You can type it in or paste. The confused face (linux) is Ctrl-Shift-u 0001f615

Revision history for this message

KiCad Janitor (kicad-janitor) wrote on 2018-10-16:

Fixed in revision b37bc694768128cb051ce368d43dd262ab94598c
https://git.launchpad.net/kicad/patch/?id=b37bc694768128cb051ce368d43dd262ab94598c

Changed in kicad:
status:	New → Fix Committed
assignee:	nobody → Seth Hillbrand (sethh)

Revision history for this message

Dick Hollenbeck (dickelbeck) wrote on 2018-10-22:

Seth,

I am not so sure about this patch. This is one version of the source to the called function:

wxCharBuffer wxSafeConvertWX2MB(const wchar_t *ws)
{
if ( !ws )
return wxCharBuffer();

    wxCharBuffer buf(wxConvLibc.cWX2MB(ws));
    if ( !buf )
        buf = wxMBConvUTF8(wxMBConvUTF8::MAP_INVALID_UTF8_TO_OCTAL).cWX2MB(ws);

return buf;
}

A couple of things I understand to be true:

1) On windows, the default encoding in process (aka locale) is not utf8. So wxConvLibc.cWX2MB will not produce utf8. And since you are starting from a 16 bit character, this will not fail, and you will end up with something other than UTF8 on Windows, because the if test fails.

2) On linux, the default encoding in process is often utf8, but again, starting now with a 32 bit character, why would wxConvLibc.cWX2MB() ever fail, and if the in process encoding, aka locale, is not UTF8, you do not have UTF8 in the result.

3) On Windows, since the input character is 16 bits wide, you cannot be sure that the input string is actually unicode, since not all unicode can be represented with 16 bit characters. The input could be UTF16.

Questions:

1) On windows, does WX assume that these 16 bit wchar's are UTF16, and if so, is
wxMBConvUTF8(wxMBConvUTF8::MAP_INVALID_UTF8_TO_OCTAL).cWX2MB(ws) going to first convert each character to a 32 bit platform neutral unicode character before going to UTF8, like libiconv does?

2) Why don't you use just the 2nd part of the above function straight away. This paradigm of trying one thing then another is the right one when coming from 8 bit characters, but not when coming from 32 bit characters where you in the 32 bit case cannot fail.

The 16 bit case is simply unfortunately different. The wx designers made a tough call in even using that bit width. I suspect it is actually UTF16.

In the Debug build there was support in this module to test if the 8 bit string was actually UTF8, but I don't know if those tests were valid for unicode content wider than 16 bits.

All this to say that your fix is in my view not producing UTF8, and once you let non UTF8 characters into this class, it can be very hard to find the source of long delayed problems which pop up much later when that string is converted or used much later.

HTH,

Dick

Seth,

I am not so sure about this patch.  This is one version of the source to the called function:

wxCharBuffer wxSafeConvertWX2MB(const wchar_t *ws)
{
    if ( !ws )
        return wxCharBuffer();

wxCharBuffer buf(wxConvLibc.cWX2MB(ws));
    if ( !buf )
        buf = wxMBConvUTF8(wxMBConvUTF8::MAP_INVALID_UTF8_TO_OCTAL).cWX2MB(ws);

return buf;
}

A couple of things I understand to be true:

1) On windows, the default encoding in process (aka locale) is not utf8.  So wxConvLibc.cWX2MB will not produce utf8.  And since you are starting from a 16 bit character, this will not fail, and you will end up with something other than UTF8 on Windows, because the if test fails.

Questions:

1) On windows, does WX assume that these 16 bit wchar's are UTF16, and if so, is 
wxMBConvUTF8(wxMBConvUTF8::MAP_INVALID_UTF8_TO_OCTAL).cWX2MB(ws) going to first convert each character to a 32 bit platform neutral unicode character before going to UTF8, like libiconv does?

2) Why don't you use just the 2nd part of the above function straight away.  This paradigm of trying one thing then another is the right one when coming from 8 bit characters, but not when coming from 32 bit characters where you in the 32 bit case cannot fail.

The 16 bit case is simply unfortunately different.  The wx designers made a tough call in even using that bit width.  I suspect it is actually UTF16.

In the Debug build there was support in this module to test if the 8 bit string was actually UTF8, but I don't know if those tests were valid for unicode content wider than 16 bits.

HTH,

Dick

Revision history for this message

Wayne Stambaugh (stambaughw) wrote on 2018-10-22:

@Seth, would you please address Dick's concerns regarding your changes to UTF8 when you get a chance. This is important to get correct and since it has been changed in the 5.0 branch. I want to make sure it's fixed for the upcoming 5.0.2 release.

Revision history for this message

Seth Hillbrand (sethh) wrote on 2018-12-07:

Hi Dick/Wayne-

Apologies that I somehow missed this question. Here are the answers to Dick's two questions

1) xWX2MB() from wxWidgets strconv.cpp converts wchar via wxDecodeSurrogate which returns UTF-32 before conversion back to UTF-8.

2) We allow the multi-try conversion as the first step is lossless and the second step is lossy, converting invalid chars to their octal representation. They both call the same underlying routines.

This has changed a bit since you wrote the first message, so please have a look and see if you have any additional questions. I'll do my best to not overlook the message.

KiCad Janitor (kicad-janitor) on 2018-12-12

Changed in kicad:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1803131

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.