Race condition in CMake

Bug #1831643 reported by Seth Hillbrand on 2019-06-04
This bug affects 1 person
Affects: KiCad
Importance: High
Assigned to: Seth Hillbrand

Bug Description

See [1][2] for details.

Custom targets cannot safely be consumed by multiple targets that are built in parallel (e.g. with make -jN).

[1] https://lists.launchpad.net/kicad-developers/msg40860.html
[2] https://lists.launchpad.net/kicad-developers/msg40890.html
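A minimal sketch of the pattern the CMake documentation for add_custom_command recommends in this situation. A generated file should be the OUTPUT of exactly one custom command, driven by a single add_custom_target. Every consumer then depends on that target through add_dependencies, rather than listing the generated file as a source in several targets that may build in parallel. The shell function below only writes a throwaway demo project. The project layout and the names write_demo_project, gen_lexer, lexer.h, lib_a and lib_b are hypothetical and are not taken from KiCad's actual build files.

```shell
#!/bin/sh
# Hypothetical helper: write a tiny CMake project into directory $1 that
# generates one header and shares it between two OBJECT libraries, with
# the generation step ordered before both of them.
write_demo_project() {
    dir=$1
    mkdir -p "$dir" || return 1

    # Template for the "generated" header (stands in for a lexer header).
    printf '#define LEXER_OK 1\n' > "$dir/lexer.h.in"

    # Two consumers of the generated header.
    printf '#include "lexer.h"\nint a(void) { return LEXER_OK; }\n' > "$dir/a.c"
    printf '#include "lexer.h"\nint b(void) { return LEXER_OK; }\n' > "$dir/b.c"

    # Quoted delimiter: the ${...} below is CMake syntax, not shell expansion.
    cat > "$dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.5)
project(race_demo C)

# Exactly one rule produces lexer.h.
add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/lexer.h
    COMMAND ${CMAKE_COMMAND} -E copy
            ${CMAKE_CURRENT_SOURCE_DIR}/lexer.h.in
            ${CMAKE_CURRENT_BINARY_DIR}/lexer.h
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/lexer.h.in
    COMMENT "Generating lexer.h")

# A single custom target drives that rule...
add_custom_target(gen_lexer DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/lexer.h)

add_library(lib_a OBJECT a.c)
add_library(lib_b OBJECT b.c)
target_include_directories(lib_a PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
target_include_directories(lib_b PRIVATE ${CMAKE_CURRENT_BINARY_DIR})

# ...and every consumer waits for it, so the header is generated once,
# before either library starts compiling.
add_dependencies(lib_a gen_lexer)
add_dependencies(lib_b gen_lexer)
EOF
}
```

With CMake and a C compiler on the PATH, something like `write_demo_project /tmp/race_demo && mkdir -p /tmp/race_demo/build && cd /tmp/race_demo/build && cmake .. && make -j8` should print "Generating lexer.h" once, before either object library compiles. Listing lexer.h as a source of both lib_a and lib_b instead would give each target its own copy of the generation rule, which is the case the documentation warns may conflict under a parallel build.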

Simon Richter (sjr) wrote :

FWIW, I've run a git bisect on the KiCad tree with the following test script:

    #! /bin/sh -ex
    count=0
    for i in `seq 1 10`
    do
            git clean -fdx || exit 125
            mkdir Build || exit 125
            cd Build || exit 125
            cmake -DKICAD_SPICE=OFF -DCMAKE_BUILD_TYPE=Debug .. || exit 125
            make -j32 || count=$(($count + 1))
            cd ..
            if [ $count -ne 0 ] && [ $count -ne $i ]; then exit 1; fi
    done
    if [ $count -eq 10 ]; then exit 125; fi

Basically, this makes up to 10 attempts per version, and marks a version as "bad" if it got a mix of successful and failed builds, "skip" if the build always fails, and "good" if all ten builds succeed.
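The verdict logic can be isolated and checked without building anything. The function below is a hypothetical re-implementation of the script's exit-code rules, not part of the original script, generalized to any number of attempts. It takes a string of per-attempt outcomes (p for a passing build, f for a failing one) and prints the verdict that git bisect run derives from the exit code (0 is good, 125 is skip, 1 is bad).

```shell
#!/bin/sh
# Hypothetical helper mirroring the bisect script's verdict rules.
# $1 is a string of outcomes, one character per build attempt:
#   p = build succeeded, f = build failed.
# Prints good/bad/skip and returns the exit code the script would use.
bisect_verdict() {
    outcomes=$1
    i=0       # attempts so far
    count=0   # failed attempts so far
    while [ -n "$outcomes" ]; do
        c=${outcomes%"${outcomes#?}"}   # first character
        outcomes=${outcomes#?}          # drop it
        i=$((i + 1))
        if [ "$c" = f ]; then
            count=$((count + 1))
        fi
        # A mix of passes and failures: the script exits 1 ("bad") at once.
        if [ "$count" -ne 0 ] && [ "$count" -ne "$i" ]; then
            echo bad
            return 1
        fi
    done
    # Every attempt failed: exit 125 so git bisect skips this commit.
    if [ "$i" -gt 0 ] && [ "$count" -eq "$i" ]; then
        echo skip
        return 125
    fi
    # Every attempt succeeded.
    echo good
    return 0
}
```

For example, `bisect_verdict pppf` prints bad, matching the script's early exit as soon as both a pass and a failure have been seen.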

The result is

    There are only 'skip'ped commits left to test.
    The first bad commit could be any of:
    0617bffce0ebc2549adc05d6ed85b8166fb8e00f
    840e08fa7886814e816b2c37646da5ea6550e156
    300f5cb0821c7aca92c13c67f2df6dee88890a63
    We cannot bisect more!

My suspicion would be 0617bffce0, which turns eeschema into an OBJECT library to allow the unit tests to work. As mentioned in my mail, though, it is likely not the root cause; it probably just shifted the timing so that we actually run into the race.

Seth Hillbrand (sethh) wrote :

Great, thanks Simon!

I think I have a CMake solution here. It needs a few minutes of testing after work today, and then it should be ready.

Seth Hillbrand (sethh) wrote :

@Simon, @SteveFalco -

Can you both test the attached patch and see if it resolves the build issue for you?

Steven Falco (stevenfalco) wrote :

Unfortunately, it may make things worse. This time I got the error:

    /builddir/build/BUILD/kicad-r15990-0ea75cb4/eeschema/dialogs/panel_sym_lib_table.cpp:29:10: fatal error: lib_table_lexer.h: No such file or directory
       29 | #include <lib_table_lexer.h>
          | ^~~~~~~~~~~~~~~~~~~
    compilation terminated.

And I did not see the file lib_table_lexer.h anywhere in the build area.

I've attached the complete build log. The error occurs around line 3405.

Seth Hillbrand (sethh) wrote :

Wow, Steve, you find all the interesting things.

Looks like we forgot to make the library table a dependency of eeschema. We've just always gotten it by luck before.

Here's a corrected patch.

Steven Falco (stevenfalco) wrote :

Just (un)lucky, I guess. Anyway, the attached log shows what I get with the new patch. It looks like common/lib_table_keywords.cpp is not found.

Seth Hillbrand (sethh) wrote :

One. More. Time. This time with _feeling_

Steven Falco (stevenfalco) wrote :

It worked once. But it will be hard to prove it will never fail. :-)

I'll run it a few more times and report if I see any errors.

KiCad Janitor (kicad-janitor) wrote :

Fixed in revision c6af38477d07b2b464c17b1232557f433047baa8
https://git.launchpad.net/kicad/patch/?id=c6af38477d07b2b464c17b1232557f433047baa8

Changed in kicad:
status: Triaged → Fix Committed

John Beard (john-j-beard) wrote :

I hate to bear bad news, especially when people have worked so hard on a fix (and my commit exposed the problem), but Jenkins MSVC just failed to build master at beb434801 (in one out of four builds):

    C:\Jenkins\workspace\windows-kicad-msvc-head\build\debug\cpu\x86\label\msvc\src\eeschema\dialogs\panel_sym_lib_table.cpp(29): fatal error C1083: Cannot open include file: 'lib_table_lexer.h': No such file or directory [C:\Jenkins\workspace\windows-kicad-msvc-head\build\debug\cpu\x86\label\msvc\build\eeschema\eeschema_kiface_objects.vcxproj]

https://jenkins.simonrichter.eu/job/windows-kicad-msvc-head/build=debug,cpu=x86,label=msvc/3106/

It also failed in a different one of the four builds on 838e8aef0 yesterday:

https://jenkins.simonrichter.eu/job/windows-kicad-msvc-head/build=debug,cpu=x86,label=msvc/3101/

Reopening this bug, as I assume it's the same thing.

Changed in kicad:
status: Fix Committed → Triaged

KiCad Janitor (kicad-janitor) wrote :

Fixed in revision 720889edd0db48e91cf87564e1b3ea6d333e97db
https://git.launchpad.net/kicad/patch/?id=720889edd0db48e91cf87564e1b3ea6d333e97db

Changed in kicad:
status: Triaged → Fix Committed

Seth Hillbrand (sethh) wrote :

Yeah, I had noticed that. Thanks! On the plus side, it gave me an opportunity to clean up the dependency chain I had built up while trying to fix it last time.

Nick Østergaard (nickoe) wrote :

I still have issues with this on some platforms (Ubuntu and Fedora), even with -j2.

Are there some debug prints I can add, or some debug options I can pass to CMake/make?

Seth Hillbrand (sethh) wrote :

Please send me the link to the build log. What error are you getting?

Nick Østergaard (nickoe) wrote :

    eeschema/dialogs/panel_sym_lib_table.cpp:29:10: fatal error: lib_table_lexer.h: No such file or directory

From https://copr.fedorainfracloud.org/coprs/g/kicad/kicad/build/934856/

https://copr-be.cloud.fedoraproject.org/results/@kicad/kicad/fedora-28-x86_64/00934856-kicad/builder-live.log

(You may want to be patient while your browser loads builder-live.log, or download it with wget instead.)

Seth Hillbrand (sethh) on 2019-06-13
Changed in kicad:
status: Fix Committed → Triaged
status: Triaged → In Progress

Seth Hillbrand (sethh) wrote :

Fixed in 8112a8ade

Please test when you have a chance.

Changed in kicad:
status: In Progress → Fix Committed

Nick Østergaard (nickoe) wrote :

@Seth, it looks like that fixes it; at least the build that had been consistently failing is green now. I expect the next Fedora nightlies to be green as well, unless some other bug has found its way in.

https://copr.fedorainfracloud.org/coprs/g/kicad/kicad/build/935608/

Thank you!

Changed in kicad:
status: Fix Committed → Fix Released