Invalid character in 'bareword' DB field crashes IOC

Bug #1505247 reported by Ralph Lange
This bug affects 1 person
Affects      Status         Importance   Assigned to      Milestone
EPICS Base   Fix Released   Undecided    Andrew Johnson

Bug Description

When a DB file contains a 'bareword' as a field value, invalid characters in that bareword may crash the IOC.

Example (replacing dbExample1.db in the example template from Base, note the "wrong" opening double quotes):

    record(calc, "$(user):calcExample")
    {
        field(CALC, “some-nonsense$%%&")
    }

The IOC sometimes just complains:

    dbLoadTemplate "db/user.substitutions"
    Error: Invalid character '▒'
     at or before "▒" in file "db/dbExample1.db" line 3
    Error: Invalid character '▒'
    Error: Invalid character '▒'
    Error: Invalid character '$'
    Error: syntax error
    Error: dbRecordHead: tempList not empty
     at or before ")" in file "db/dbExample2.db" line 1
    Error: dbRecordBody: tempList not empty
    dbLoadRecords "db/dbSubExample.db", "user=langer"
    ## Set this to see messages from mySub
    #var mySubDebug 1
    ## Run this to trace the stages of iocInit
    #traceIocInit
    cd "/home/ralph/work/CODAC/TTT/iocBoot/ioccrashtest"
    iocInit
    Starting iocInit

But sometimes it crashes:

    dbLoadTemplate "db/user.substitutions"
    Error: Invalid character '▒'
     at or before "▒" in file "db/dbExample1.db" line 3
    Error: Invalid character '▒'
    Error: Invalid character '▒'
    Error: Invalid character '$'
    Error: syntax error
    Error: dbRecordHead: tempList not empty
     at or before ")" in file "db/dbExample2.db" line 1
    [ 12:54:13 ]
    ralph @ machine : ~/work/CODAC/TTT $

Tags: dbstatic
Revision history for this message
Ralph Lange (ralph-lange) wrote :

Fun fact:
Having three "Invalid character" messages is related to the fact that the "left double quotation mark" character is encoded in UTF-8 as three bytes: e2 80 9c.
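The byte count above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `utf8_hex` is made up for this example and has nothing to do with the IOC code.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


print(utf8_hex("\u201c"))  # left double quotation mark -> e2 80 9c
print(utf8_hex('"'))       # plain ASCII double quote   -> 22
```

A lexer that works byte by byte sees the first case as three separate invalid characters, hence three messages.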

mdavidsaver (mdavidsaver) wrote : Re: [Bug 1505247] Re: Invalid character in 'bareword' DB field crashes IOC

On 10/12/2015 11:16 AM, Ralph Lange wrote:
> Fun fact:
> Having three "Invalid character" messages is related to the fact that the "left double quotation mark" character is encoded in UTF-8 as three bytes: e2 80 9c.
>

FYI I did start the process of pulling in updated versions of flex and
antelope (aka byacc) which have 8-bit character support (prereq for
unicode).

https://code.launchpad.net/~epics-core/epics-base/lexyacc-update

I gave up for an unrelated reason. My target feature (bison style
destructor support) wasn't fully implemented (only for non-reentrant
parser). However, aside from some compiler warnings (unused static
functions) it works with existing code.

Andrew Johnson (anj) wrote :

For barewords the list of allowed characters is fairly strict and is well-documented in the AppDevGuide. I get the same "Invalid character" error if I try to use '!' or '$' in a bareword value for example.

I agree that this should not crash the IOC, but I am unable to reproduce a crash with 3.15.3-pre1. Do you have any hints or ideas on how to trigger one? I was editing the standard dbExample2.db file.

This also raises the question of character encoding and what to do about characters above 0x7f, which I will call high-bit characters. The IOC does accept them in quoted strings, treating them just like any other character. However, it pays no attention to the LANG environment variable, and when truncating a string it will quite happily break a multi-byte character in the middle of its sequence, as in your error messages above. This behavior is OK for single-byte character encodings but not for UTF-8, which we don't really support.
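As a rough illustration of the truncation problem (a Python sketch, not the IOC's C code; `truncate_bytes` and `is_valid_utf8` are hypothetical helpers), cutting a UTF-8 string at an arbitrary byte offset can leave an incomplete sequence behind:

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Encode text as UTF-8 and cut it at a byte limit, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


value = "\u201cquoted\u201d"                    # curly quotes, 3 bytes each in UTF-8
print(is_valid_utf8(truncate_bytes(value, 3)))  # True: cut lands on a character boundary
print(is_valid_utf8(truncate_bytes(value, 2)))  # False: cut splits e2 80 9c after two bytes
```

With a single-byte encoding every byte offset is a character boundary, which is why byte-wise truncation is harmless there.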

It is actually quite easy to modify the above 'Invalid character' message to print the hex ordinal of the bad character like this, but fixing the 'at or before' message after it is much harder:

    dbLoadTemplate "db/user.substitutions"
    Error: Invalid character 0xe2
     at or before "�" in file "db/dbExample2.db" line 4
    Error: Invalid character 0x80
    Error: Invalid character 0x9c
    Error: syntax error
    Error: dbRecordHead: tempList not empty
     at or before ")" in file "db/dbExample2.db" line 1
    Error: dbRecordBody: tempList not empty
We shouldn't be printing unprintable characters anyway, so that is a bug which I will fix on the 3.14 branch. However, I'm not sure that we can do much better for UTF-8 encoding with the existing parser. I don't want to get into trying to recognize and support UTF-8 quotation mark characters in the lexer; there are so many different ones in different locales that it would be a nightmare.

Thoughts?

Ralph Lange (ralph-lange) wrote :

My scope for this bug is only to have the IOC not crash.
This should be fixed now, in 3.14.

8 bit, UTF-8 support etc are IMHO 3.16 material of low priority.

@Andrew: Note that I changed the dbExample1.db file, and the crash occurred while the IOC was reading dbExample2.db. Changing the latter might keep the crash from happening.
I made no other changes, and the crashes occur on current trunk, using 64-bit Linux.

Andrew Johnson (anj) wrote :

Crashing achieved using your version of dbExample1.db on both 3.14 and 3.15 branches. Could be more than one bug, but I think these are two different symptoms of a single problem:

1. dbRecordBody() was calling dbFreeEntry(0), which wasn't being checked for. I will fix that although it may be unnecessary.
2. The parser's tempList was getting corrupted.

I think this is all as a result of the parser trying to continue when it sees bad characters. If I change the calls to yyerror() into yyerrorAbort() inside the dbLex.l file I can't trigger the crashes any more and the tempList corruption goes away. I still get the Invalid character reports for the left-doublequote character though, which are useful.

I believe the % characters in your bareword string may have had something to do with the crash problem; they were probably resulting in the lexer returning a tokenCDEFS for the end of your nonsense value (which is why those chars were not reported as invalid).

Patch (3.14) attached for your approval, I can commit this tomorrow.

Ralph Lange (ralph-lange) wrote :

Thanks a lot!

This patch works fine on 3.14 and (slightly modified) on 3.15, keeping the IOC from crashing and getting nicer hex output for non-printables.

Please commit whenever you find time.

Andrew Johnson (anj)
Changed in epics-base:
status: New → Fix Committed
assignee: nobody → Andrew Johnson (anj)
Revision history for this message
Ben Franksen (bfrk) wrote :

On 10/12/2015 06:27 PM, mdavidsaver wrote:
> On 10/12/2015 11:16 AM, Ralph Lange wrote:
>> Fun fact: Having three "Invalid character" messages is related to
>> the fact that the "left double quotation mark" character is encoded
>> in UTF-8 as three bytes: e2 80 9c.
>
> FYI I did start the process of pulling in updated versions of flex
> and antelope (aka byacc) which have 8-bit character support (prereq
> for unicode).
>
> https://code.launchpad.net/~epics-core/epics-base/lexyacc-update
>
> I gave up for an unrelated reason. My target feature (bison style
> destructor support) wasn't fully implemented (only for non-reentrant
> parser). However, aside from some compiler warnings (unused static
> functions) it works with existing code.

I recommend using re2c instead of the cranky flex. No need to patch and
thus to bundle it with base. It is, however, not a drop-in replacement.
For instance, the user has to write the code that supplies the input
data (the text to be scanned); re2c just generates code to recognize
regular expressions, no more, no less.

Cheers
Ben
--
"Make it so they have to reboot after every typo." ― Scott Adams

Andrew Johnson (anj)
Changed in epics-base:
status: Fix Committed → Fix Released