Bug #1935037 “Invalid charactor in field name” : Bugs : EPICS Base

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-26 (last edit on 2021-08-26):

#1

As a test I removed the check for [a-zA-Z0-9] and characters like '-' work perfectly fine.
Only space is a problem.
Why is the limitation to [a-zA-Z0-9] there?

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-26:

#2

The character set restriction of PVD fields (in groups or otherwise) is meant to mirror that of C89 keywords.

> int abc;

... is valid

> int a-bc;

... is not

This was a somewhat arbitrary decision made many years ago. imo. changing it is primarily a question of backwards compatibility. Anything rejected by (or crashing) existing/older clients is a non-starter.

Be aware that there is no separate versioning for the PVD encoding. imo. an oversight in the design. The negotiated PVA protocol version is the only way to communicate handle changes. You should also be aware that the existing pvData (de)serialization APIs don't expose the negotiated protocol version. So making even seemingly small changes to the PVD encoding format would be complicated.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-26:

#3

Also, I've taken advantage of this restriction with PVXS to allow an extended syntax for traversing into structures, unions, and arrays. eg.

> pvxs::Value top;
> top["value[5]->ufld1"] = ...

This takes a 'top' Structure, traverses to the 'value' field, which is an array of Unions. Then indexing the 5th element, and then accessing/selecting the 'ufld1' member of the Union.

Right now my parsing doesn't handle a Structure field named "in-valid", although it could (no following '>').

Revision history for this message

Niko Kivel (niko-kivel) wrote on 2021-08-26: Re: [Bug 1935037] Re: Invalid charactor in field name

#4

If this limitation stays like that, it's a complete show-stopper for me.
I'd like to create structures for EtherCAT terminal/slave data. There are
potentially hundreds of PVs to add to pva-groups. All based on their PV
names. Since the PV names have to adhere to the naming convention, they
have '-' between entity name and suffix, i.e. BI02-TimeRise,
master7-Online, BO03-Array, ...
Proposal:
I'd be perfectly happy with a replacement mechanism. I.e. upon adding a PV
to the group, replace non-supported characters with '_'.
Maybe with a flag to prevent automatic conversion? And cause the Error to
be thrown?

On Thu, Aug 26, 2021 at 6:25 PM mdavidsaver <email address hidden>
wrote:

> The character set restriction of PVD fields (in groups or otherwise) is
> meant to mirror that of C89 keywords.
>
> > int abc;
>
> ... is valid
>
> > int a-bc;
>
> ... is not
>
> This was a somewhat arbitrary decision made many years ago. imo.
> changing it is primarily a question of backwards compatibility.
> Anything rejected by (or crashing) existing/older clients is a non-
> starter.
>
> Be aware that there is no separate versioning for the PVD encoding.
> imo. an oversight in the design. The negotiated PVA protocol version is
> the only way to communicate handle changes. You should also be aware
> that the existing pvData (de)serialization APIs don't expose the
> negotiated protocol version. So making even seemingly small changes to
> the PVD encoding format would be complicated.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1935037
>
> Title:
> Invalid charactor in field name
>
> Status in EPICS Base:
> New
>
> Bug description:
> When creating a group in QSRV, the allowed characters are too
> restrictive
>
> error:
> Error Group not created: Invalid charactor '-' (45) in field name
> "BI02-TimeRise" must be A-Z, a-z, 0-9, or '_'
>
> expected behaviour:
> pva structures should allow for a more liberal set of characters that
> just what is covered by \w regex metacharacter.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/epics-base/+bug/1935037/+subscriptions
>
>

If this limitation stays like that, it's a complete show-stopper for me.
I'd like to create structures for EtherCAT terminal/slave data. There are
potentially hundreds of PVs to add to pva-groups. All based on their PV
names. Since the PV names have to adhere to the naming convention, they
have '-' between entity name and suffix, i.e. BI02-TimeRise,
master7-Online, BO03-Array, ...
Proposal:
I'd be perfectly happy with a replacement mechanism. I.e. upon adding a PV
to the group, replace non-supported characters with '_'.
Maybe with a flag to prevent automatic conversion? And cause the Error to
be thrown?

On Thu, Aug 26, 2021 at 6:25 PM mdavidsaver <1935037@bugs.launchpad.net>
wrote:

> The character set restriction of PVD fields (in groups or otherwise) is
> meant to mirror that of C89 keywords.
>
> > int abc;
>
> ... is valid
>
> > int a-bc;
>
> ... is not
>
> This was a somewhat arbitrary decision made many years ago.  imo.
> changing it is primarily a question of backwards compatibility.
> Anything rejected by (or crashing) existing/older clients is a non-
> starter.
>
> Be aware that there is no separate versioning for the PVD encoding.
> imo. an oversight in the design.  The negotiated PVA protocol version is
> the only way to communicate handle changes.  You should also be aware
> that the existing pvData (de)serialization APIs don't expose the
> negotiated protocol version.  So making even seemingly small changes to
> the PVD encoding format would be complicated.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1935037
>
> Title:
>   Invalid charactor in field name
>
> Status in EPICS Base:
>   New
>
> Bug description:
>   When creating a group in QSRV, the allowed characters are too
>   restrictive
>
>   error:
>   Error Group not created: Invalid charactor '-' (45) in field name
> "BI02-TimeRise" must be A-Z, a-z, 0-9, or '_'
>
>   expected behaviour:
>   pva structures should allow for a more liberal set of characters that
> just what is covered by \w regex metacharacter.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/epics-base/+bug/1935037/+subscriptions
>
>

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27 (last edit on 2021-08-27):

#5

I would prefer to traverse the "normal" structure way with '.' instead of '->'.
In particular as using '.' in qsrv creates a structure (with char restriction removed from validateFieldName):

    info(Q:group, {
        "PV" : {
            "my.field": {+type:plain, +channel:VAL},
            "my.other-field": {+type:plain, +channel:VAL},
        }

$ pvget PV
PV structure
    structure record
        structure _options
            uint queueSize 0
            boolean atomic true
    structure my
        double field 0
        double other-field 0

Being able to access the fields in a simple way would be nice:

$ pvget PV.my.other-field

Revision history for this message

Timo Korhonen (ttkorhonen) wrote on 2021-08-27:

#6

+1 from me to Dirk's suggestion.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27 (last edit on 2021-08-27):

#7

In order to allow easy structure and array access while having the most flexibility in field names, I suggest to define a *small* set of disallowed chars:
* control codes and space (<= 0x20)
* structure and array access chars: . [ ]
* MAYBE some few other chars like $() because of macros and \ and " because they need to be escaped in JSON.

I would generally allow all other unicode (UTF-8) chars, in particular accented/modified Latin characters like äöüëßáóúíýàòùǹñøœåš, punktuation like {}'-+/*;:#% and Asian characters like 文本, ข้อความ. There is no reason (any more since unicode) to force everyone in the world to the limited charset used in English. The JSON parser understands unicode (that is part of the JSON standard).

Also UTF-8 is not particularly difficult to use:
* Nothing changes for English (ASCII)
* A 0x00 byte only appears in \u0000 a the end of a string, thus functions like strlen() and strdup() work perfectly fine getting the right amount of memory bytes.
* A multibyte character is easy to detect (high bit set) and traversing backwards through multibyte characters is easy too, thus truncated strings (-> strncpy) are easy to fix if they cut a character in half at the end.
* Calculating the screen width (number of chars) for adjusting screen output is easy too. (Just don't use strlen() for that).

Instead of blocking the use of UTF-8 with new restrictions, we should rather aim to support it fully in EPICS DBs (record names, string values, macro names).

Revision history for this message

Ralph Lange (ralph-lange) wrote on 2021-08-27:

#8

How will CALC-like expressions distinguish between a "-" operator and a "-" as part of a name?
(Same applies to the other operators, obviously, and brackets etc.)

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27 (last edit on 2021-08-27):

#9

Which CALC allows names? The CALC expressions in calc, calcout, access security, etc only allow predefined function names and field variables A,B,C,...

If you have a CALC that allows record names as arguments, quote names containing special chars.

I do not think that we need to turn records and PVA stucture field names into C code, do we?

Revision history for this message

Ralph Lange (ralph-lange) wrote on 2021-08-27:

#10

That's why I wrote "CALC-like expressions".
Clients may want to allow things beyond the narrow limits of the existing calc/calcout records.

Fun fact: same thing caused Siemens to create all names for OPC UA with an extra set of double quotes around every name part in the hierarchy. Since special characters need to be quoted inside field values in EPICS databases...
field(INP, "@PLC1 ns=3;s=\"DB1\".\"struct\ 1\".\"element\ 2\"")

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27:

#11

So they even allow quotes to be part of the string! That is much more than we ask for.
Funnier fact: They write invalid XML by not escaping their quotes in the quoted strings.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#12

> I would prefer to traverse the "normal" structure way with '.' instead of

'.' is used to traverse a Structure. '->' is selecting a member of a Union.

https://mdavidsaver.github.io/pvxs/value.html#_CPPv4N4pvxs5ValueixERKNSt6stringE

> Attempt to access a descendant field.
>
> Argument may be:
>
> name of a child field. eg. “value”
> name of a descendant field. eg “alarm.severity”
> element of an array of structures. eg “dimension[0]”
> name of a union field. eg. “->booleanValue”
>
> These may be composed. eg.
>
> “dimension[0].size”
> ”value->booleanValue”

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#13

> I suggest to define a *small* set of disallowed chars:

I would like to see suggestions expressed in terms of what would be allowed.

Currently this can be expressed as:

> [a-zA-Z_][a-zA-Z0-9_]*

https://github.com/mdavidsaver/pvxs/blob/98edf61de8f2c3d826604d57f7094adbd2a3b308/src/type.cpp#L196

https://github.com/epics-base/pvDataCPP/blob/d3b4976ea2b0d78075511f14d7f7bf9620dd4d14/src/factory/FieldCreateFactory.cpp#L1294

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27:

#14

> '.' is used to traverse a Structure. '->' is selecting a member of a Union.
WHY?

> I would like to see suggestions expressed in terms of what would be allowed.
[^][.()$ 0-9][^][.()$ ]*

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#15

Provided that the questions of compatibility are answered positively, I don't have an objection to allowing utf-8 (aka. 0x7f -> 0xff). However, I don't like the idea of allowing allowing more characters in the 0x00 -> 0x7e range beyond the current [a-zA-Z0-9_] set.

I gave the example of PVXS to illustrate how I am able to exploit a restricted character set in a (fairly fast) parser. This is not something I imagined back when this restriction was added.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#16

@Dirk, by "what would be allowed." I meant a positive regex. Omissions are too easy with a negative character set. For example, quote characters.

> WHY?

My main point is that a restricted character makes it easier for future syntax to be designed around PVD field names. Be it CALC expressions in records or in OPI systems like cs-studio.

cf. "pva://SomePVName/subfield/subelement" in cs-studio. Which practically excludes '/' in both record and field names.

https://control-system-studio.readthedocs.io/en/latest/core/pv/doc/index.html#pv-access

As for the specific case of PVXS. Having a different separator for structure and union allows for fewer std::map lookups, giving faster parsing (PVD does a lot of these lookups...).

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#17

https://epics.anl.gov/core-talk/2021/msg01204.php

On 8/27/21 6:23 AM, Mooney, Tim M. via Core-talk wrote:
> Several calc records allow variable names, like $left, $target, etc. Transform for sure.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#18

> Provided that the questions of compatibility are answered positively ...

For PVXS, field name conformance is only enforced for locally defined types. I've added a test to check that an "in-valid" field name can be decoded, and re-encoded (eg. by a PVA gateway). Some non-conformant names won't be accessible locally though.

https://github.com/mdavidsaver/pvxs/commit/2ab0d662bf3f74f32915d0e07383d8c30217fdd5

I don't have time at present to do this testing for pvDataCPP/AccessCPP, pvDataJava/AccessJava, or Kay's new PVA.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#19

@Niko,

> I'd be perfectly happy with a replacement mechanism. I.e. upon adding a PV
> to the group, replace non-supported characters with '_'.

QSRV does not compute group or member names automatically. So I think this is a request for whatever person or tool is creating the group definition JSON for you.

> There are potentially hundreds of PVs to add to pva-groups.

Hundreds? Please let us know how this works out.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-08-27:

#20

> QSRV does not compute group or member names automatically. So I think this is a request for whatever person or tool is creating the group definition JSON for you.

When the "tool" is dbLoadTemplate and you use macros, then you would to provide two of them, one for the originam PV name and one for the PVA field name. Without a mechanism to replace chars automatically, this is very error prone. Something like ${MACRO "but all - replaced with _"}, maybe: ${MACRO/[^0-9a-zA-Z]/_}

Revision history for this message

Niko Kivel (niko-kivel) wrote on 2021-08-27:

#21

>
>
> > I'd be perfectly happy with a replacement mechanism. I.e. upon adding a
> PV
> > to the group, replace non-supported characters with '_'.
>
> QSRV does not compute group or member names automatically. So I think
> this is a request for whatever person or tool is creating the group
> definition JSON for you.
>

no tool, simple substitution + template
https://github.com/paulscherrerinstitute/ecmccfg/blob/pva/db/Beckhoff_1XXX/ecmcEL1008.substitutions
https://github.com/paulscherrerinstitute/ecmccfg/blob/pva/db/generic/ecmc_binaryInput-chX.template

in this case there is no suffix, but you get the idea.

> > There are potentially hundreds of PVs to add to pva-groups.
>
> Hundreds? Please let us know how this works out.

What exactly do you want to know? I assume that pva/QRSV is tested and
benchmarked? I even more desperately hope that this was done with more than
just 'hundreds' of fields in groups!

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-08-27:

#22

Personally, no. I've tested with ~10, not hundreds, of group fields.

I don't know of any reason why it won't "work" in the strictest sense of not crashing, and eventually yielding consistent data. My curiosity is about performance. There is a strong application specific component here as groups tie in with locking is done with database records (aka. dbScanLock() ).

When a group is read out, on GET or when a monitor update is triggered, QSRV will lock *all* of the individual records in *sequence*. This means waiting for any record processing to complete, and preventing subsequent processing. So inclusion in a group couples scan times of the constituent records with each other, and with QSRV.

The worst case time to read out a group is the sum of the worst case processing times of the constituent records in isolation. Further, because the constituents are no longer in isolation, this also becomes the worst case scan time of each record. Each might have to wait for all the others, and QSRV.

EPICS Base

Invalid charactor in field name

Bug Description

Other bug subscribers

Remote bug watches