Change the semantics of built-in 'is'

Bug #708469 reported by Matt Giuca
Affects   Status          Importance   Assigned to   Milestone
Mars      Fix Committed   Medium       Matt Giuca

Bug Description

The built-in 'is' function currently has the following semantics (as defined in the manual and implemented in the interpreter):
- For ints, this is the same as eq,
- For functions, this is an error (same as eq),
- For everything else (arrays and user-defined types), it performs reference equality.

This can't be implemented in the LLVM back-end or other low-level back-ends without a type dictionary (which is quite unnecessary) -- all because of the special cases for int and function. Change the semantics to state simply that 'is' performs reference equality on all types. If the two values have the same identity (they are the result of the same allocation operation, or they reference the same static object in the program source), it returns true; otherwise it returns false.

The key point for implementers is that this allows a great deal of implementation freedom (and carries some degree of unspecified behaviour): because Mars is a pure language, it is not fully specified what it means for two values to have the same identity. We specify the following:
- If two objects are unequal (according to 'eq'), they MUST NOT have the same identity. For functions (where 'eq' is an error), if two functions do not return the same results for all inputs (including errors and nontermination), they MUST NOT have the same identity -- note that an implementation can always satisfy this because the rule only goes one way, and the simplest way to do so is to give each function object a different identity.
- If two function objects are the same function from the program source, they MUST have the same identity. Note that this does not apply to CGCs.
- If two objects are the result of the same allocation operation, they MUST have the same identity.

For anything not covered by the above rules, the result is unspecified: it may be determined by the implementation, and need not even be consistent. The idea is that 'is' requires no special effort on the part of implementers; it can just use whatever underlying identity mechanism is available. The above three rules are trivial to satisfy, and anything else depends on the implementation.
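As a rough illustration, two of the three rules can be phrased as checks against a candidate 'is' implementation. This is only a sketch in Python, not Mars code: `violated_rule` is a hypothetical helper, Python's own `is` and `==` stand in for an implementation's identity test and for 'eq', and the rule about functions from the program source is not modelled.

```python
import operator

def violated_rule(is_fn, eq_fn, x, y, same_allocation):
    """Return a description of the rule that is_fn breaks for (x, y), or None.

    same_allocation is supplied by the caller: True if x and y are known
    to be the result of the same allocation operation.
    """
    # Rule 1: unequal values must not share an identity.
    if not eq_fn(x, y) and is_fn(x, y):
        return "rule 1: unequal values share an identity"
    # Rule 3: values from the same allocation must share an identity.
    if same_allocation and not is_fn(x, y):
        return "rule 3: same allocation, different identities"
    # Anything else is unspecified, so either answer is acceptable.
    return None

xs = [1, 2]
ys = xs        # same allocation as xs
zs = [1, 2]    # equal to xs, but a separate allocation

# Reference comparison passes both checks.
print(violated_rule(operator.is_, operator.eq, xs, ys, True))
print(violated_rule(operator.is_, operator.eq, xs, zs, False))

# Treating all equal values as identical also passes these two checks.
print(violated_rule(operator.eq, operator.eq, xs, zs, False))

# An 'is' that always answers true breaks rule 1 for unequal values.
always_true = lambda a, b: True
print(violated_rule(always_true, operator.eq, [1], [2], False))
```

The first three calls print `None`; the last one reports the rule 1 violation. Passing these checks is necessary but not sufficient, since an implementation's notion of identity also has to agree with how its impure operations decide what is mutated.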

The only other requirement is that 'is' has to use the same notion of identity as the other impure functions and operations use when determining what will be mutated. So the definition of "what does it mean to have the same identity" is no longer part of the 'is' specification; it is more of a language-level matter (though it only affects impure constructs).

These rules allow a lot of implementation freedom. Firstly, they allow any optimisation where the compiler realises two values are equal and so gives them the same identity (any equal objects are allowed to have the same identity). This includes conservatively realising that two functions compute the same thing and combining them, and interning values such as array constants. They also allow an unboxed Int representation, where any two equal unboxed Ints will be considered to have the same identity. They even allow a sufficiently detailed string representation of a function (such as that provided by the current Mars interpreter) to be used as the function's identity -- unequal functions will never be given the same string.

Importantly, an implementation which uses untagged boxed and unboxed values (such as the LLVM back-end) can implement 'is' without type dictionaries, simply by comparing the bit patterns of the values. Boxed values will give a proper reference comparison, and unboxed values will give an equality comparison, which is sufficient as long as no mutation of the unboxed value is allowed.
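The bit-pattern idea can be sketched with a toy model in Python. This is not the Mars or LLVM implementation: the `Heap` class, the addresses, and `is_` are all made up for illustration. Every value is modelled as a single machine word, where an unboxed Int is the integer itself and a boxed value is the address of its heap cell.

```python
class Heap:
    """Toy heap: maps word-sized addresses to payloads (hypothetical model)."""

    def __init__(self):
        self.cells = {}
        self.next_addr = 0x1000  # arbitrary starting address

    def alloc(self, payload):
        """Allocate a fresh cell and return its address (the boxed word)."""
        addr = self.next_addr
        self.cells[addr] = payload
        self.next_addr += 8
        return addr

def is_(word_a, word_b):
    """'is' as a raw word comparison; no type information is consulted.

    Assumes both words have the same static type, so an unboxed Int is
    never compared against an address.
    """
    return word_a == word_b

heap = Heap()
a = heap.alloc([1, 2, 3])
b = heap.alloc([1, 2, 3])  # equal contents, separate allocation
c = a                      # same allocation as a

print(is_(a, c))  # boxed, same allocation
print(is_(a, b))  # boxed, different allocations
print(is_(7, 7))  # unboxed Ints with equal bit patterns
```

Here `is_(a, c)` and `is_(7, 7)` are true while `is_(a, b)` is false, even though the cells at `a` and `b` hold equal arrays. The comparison never needs to know whether a word is boxed or unboxed.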

The current interpreter implementation fits the new semantics, except for functions (which produce an error). To fix this, compare the string results of 'show' on the two functions instead.
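A minimal sketch of that approach, again in Python rather than in the interpreter's own code. The `FuncValue` type and the output format of `show` here are assumptions made for illustration; the real interpreter's function representation and 'show' output may look quite different.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuncValue:
    """Hypothetical function value: a source-level name plus any bound arguments."""
    name: str
    bound_args: tuple = ()

def show(f):
    """Hypothetical string representation, detailed enough to tell unequal functions apart."""
    if not f.bound_args:
        return f.name
    return f.name + "(" + ", ".join(repr(arg) for arg in f.bound_args) + ")"

def is_function(f, g):
    """Identity test for functions: compare their string representations."""
    return show(f) == show(g)

add1_a = FuncValue("add", (1,))
add1_b = FuncValue("add", (1,))  # built separately, same representation
add2 = FuncValue("add", (2,))

print(show(add1_a))
print(is_function(add1_a, add1_b))
print(is_function(add1_a, add2))
```

In this model the two `add(1)` values are given the same identity and `add(2)` is not, which stays within the rules as long as the representation never maps unequal functions to the same string.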

Matt Giuca (mgiuca)
Changed in mars:
milestone: 0.3.1 → 0.4
Matt Giuca (mgiuca) wrote :

Fixed in trunk r1297.

Changed in mars:
status: Triaged → Fix Committed