Type variables can't be introduced in the body of a function

Bug #482947 reported by Matt Giuca
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mars
Triaged
Wishlist
Matt Giuca

Bug Description

Mars doesn't allow a new type variable to be mentioned in the body of a function. I believe this was by design, but it needs more thought.

Assuming a function id :: a -> a, run this code:

# Define id for ints in a roundabout way
def int_id(x :: Int) :: Int:
    var f :: a -> a
    # f is the id function (works on any type)
    f = id
    return f(x)

Syntax error: Undefined type variable "a".

It should be happy to introduce a local variable f :: a -> a.

This rule was introduced to prevent something like "var x :: a" (because it is "meaningless" to have a variable of an unspecified type). However, it's conceivable to have a variable of such a type (e.g., we could introduce a primitive 'default :: a', which always returns the "default" value for any type). In this case, x could be used several times, each time producing a different default value, given different types.

The only thing we want to prevent is code which explicitly assigns a concrete type to such a variable. But that's already prevented (e.g., "Type error in statement 'x = 4'; Term: 4; Type: Int; Expected: a).

So it *should* be possible to introduce type variables in a function's body. The only problem is that currently, all type variables in a function's body are (from the internal viewpoint) existentially quantified (there exists a type for x; we don't know what it is). Type vars introduced in the function's body would be universally quantified, like globals (for all types, x has a value; we can pick any we like). Not sure how easy this would be to implement.

Tags: types
Revision history for this message
Matt Giuca (mgiuca) wrote :

Currently, the strategy for ensuring that the type variables in function arguments/return are suitable for *all* values (universal quantification) is to bind them to dummy atoms. All that should be required is to *not* bind the type variables introduced in the body of the function, and then they should be existentially quantified (if they get bound, they get bound).

Revision history for this message
Matt Giuca (mgiuca) wrote :

This needs to be done properly. Note the following example:

def foo :: Int:
    var x :: a
    var i :: Int
    var s :: Array(Int)
    x = error("xxx")
    i = x
    s = x
    return 0

Currently, this will produce the error "Syntax error: Undefined type variable "a"."

If we implement this fix, then it *should* allow the variable x (of type "a") to be treated like a global variable of type a -- arbitrarily cast to any other type *without* becoming bound. Type variables should only be bound on calls to functions (during the call).

The line "i = x" should be allowed, since x :: a. Treat x as an Int. (This would be a better example if we had the "default" builtin, mentioned above, rather than error). Now this assignment should *not* bind the type variable "a" to the type Int.

This is important because the next line, "s = x" should also be allowed, treating x as an Array(Int). A naive implementation of the fix for this bug may result in the following error:

Type error in statement 's = x'
        Term: x
        Type: Int
        Expected: Array(Int).

Another thing to test is that it should be *restricted* from assignments of values with too narrow a type. A variable "x :: a" means "x can be used as any type" (for reading), NOT "x can be assigned any type". The above example only works because I assign error("foo") :: a. The following should NOT work:

def bar :: Int:
    var x :: a
    x = 4
    return 0

This should produce:

Type error in statement 'x = 4'
        Term: 4
        Type: Int
        Expected: a.

This is consistent with type variables in the arguments/return (and already works that way for such variables).

Revision history for this message
Matt Giuca (mgiuca) wrote :

Note also that mentioning the same variable in the body as already mentioned in the head should refer to the *same* variable (and thus be inflexible, as it is universally quantified). This is different from Haskell, where referring to a type variable in the body is considered to be a different variable.

Consider:

def foo(x :: a) :: Int:
    var y :: a
    y = error("foo")
    return y

This (rightly) produces:

Type error in statement 'return y'
        Term: y
        Type: a
        Expected: Int.

As a can be bound to anything the caller wants (not necessarily Int). However, with this bug fixed, the following code should be legal:

def foo(x :: a) :: Int:
    var y :: b
    y = error("foo")
    return y

Now y is a different type variable to x, so it may be read as any type.

Revision history for this message
Matt Giuca (mgiuca) wrote :

Compare to Haskell and Mercury:

Mercury behaves exactly like Mars if we remove the artificial limitation that type variables can't be introduced in the body (this can be done in parsem.m). Type variables are all quantified over the whole function/predicate, so any one binding causes them to become permanently bound. This means Mercury fails most of the above test cases. You could argue that this means we shouldn't have to fix it, but I think it makes lifted functions not work entirely correctly. Importantly, if I lift a parameterised function, I can only use it on a single type. However, Mercury *does* behave as we want on the most recent example above; the re-use of a header type variable in the body is considered to be the same variable. Much like Mars prevents re-binding of such variables by binding them to dummy atoms, Mercury considers header type variables to be existentially-quantified on the inside of a function.

Haskell has the opposite behaviour, fixing all the Mercury problems but not respecting the type variable names bound in the head. This is mostly the behaviour we want. In Haskell, type expressions automatically universally quantify type variables. Most of the above examples work, except that re-using the name 'a' is considered to be a fresh variable.

We need a behaviour (which mostly mimics Haskell), in which "unification" is asymmetrical. For example, on assignment statements (v = e), it must be possible for the type of e to be more general than the type of v, but not vice versa. For example, an assignment of an (a -> a) to a (Int -> Int) is possible (as a -> a is a subset of Int -> Int, as all a -> a functions are members of Int -> Int). However, an assignment of an (Int -> Int) to an (a -> a) is not possible.

Also, if assigning an (a -> a) to an (Int -> Int), it should *not* cause 'a' to be bound to Int. It's like a class cast in Java -- it's just changing the type for this assignment, and doesn't place any constraints on the existing thing of type (a -> a).

If we add type inference, then we also need to support the possibility of either side not having a known type. In this case, one can assign e to a (Int -> Int), in which case, e should be annotated (Int -> Int). This should be provisional -- if e is later assigned to a (Float -> Float), then e should be generalised to (a -> a).

Revision history for this message
Matt Giuca (mgiuca) wrote :

One thought: Why should we allow local variable assignments to be universally quantified when arguments are always existentially quantified? Under this scheme, we can't accept something of type (forall a. a -> a) as an argument, since any argument of type (a -> a) is internally existentially quantified. Why are locals so special?

I tried Haskell (which *technically* defines let variables as simple lambdas; therefore local variables *can't* be so special), and got a surprising result:

foo = (x, i, s)
    where
    x = error "xxx"
    i = x :: Int
    s = x :: Float

-- bar same as foo
bar = let x = error "xxx" in
      let i = x :: Int in
      let s = x :: Float in
      (x, i, s)

-- baz same as bar
-- But type error!
baz = (\x ->
          let i = x :: Int in
          let s = x :: Float in
          (x, i, s)) (error "xxx")

foo and bar check out OK, but baz gives this error:
Type error in type annotation
*** Term : x
*** Type : Int
*** Does not match : Float

Interestingly behaving like Mercury... so let bindings *aren't* the same as lambdas! In this last case, x's type can't be inferred, since if it was inferred as 'a', that would mean the caller can choose a concrete type, *not* that the caller must supply a universally-typed value.

This can probably be solved with explicit existential typing.

Matt Giuca (mgiuca)
Changed in mars:
milestone: none → 0.3
Revision history for this message
Matt Giuca (mgiuca) wrote :

Read this section of the Haskell report carefully:
http://www.haskell.org/onlinereport/decls.html#sect4.5

Do we have to deal with monomorphism, etc? (We don't have nested functions, which eases things a bit.)

Revision history for this message
Matt Giuca (mgiuca) wrote :

Note on comment #5, the Haskell report addresses this:
http://www.haskell.org/onlinereport/exps.html#sect3.12

"This translation does not preserve the static semantics because the use of case precludes a fully polymorphic typing of the bound variables."

I'm pretty sure this is what they mean.

Revision history for this message
Matt Giuca (mgiuca) wrote :

A better example:

def f() :: Pair(a -> a, b -> b):
    var g :: c -> c
    g = id
    return Pair(g, g)

This currently says Syntax error: Undefined type variable "c".
If you change g to a -> a, it says:
Type error in statement 'return Pair(g, g)'
        Term: Pair(g, g)
        Type: Pair(a -> a, a -> a)
        Expected: Pair(a -> a, b -> b).

It should consider g to have type forall c. c -> c, and f to have type forall a, b. () -> Pair(a -> a, b -> b).

Therefore, in the body, a and b are bound while c is free. Unifying forall c. c -> c with a -> a is easy, and makes no bindings, so it can be unified with both.

Note that this does work:

def g() :: Pair(a -> a, a -> a):
    return Pair(id, id)

Currently, when you refer to a global variable, it implicitly assumes all of its type variables are forall, and instantiates fresh copies of those variables, so they always unify with anything. The fix would be to remove this implicit instantiation, and instead explicitly store the types of all globals with a "forall" for each type variable. Now, referring to globals is not a special case. Instead, on unification, each universally quantified variable is freshly instantiated, so it can unify with a different type each time.

Revision history for this message
Matt Giuca (mgiuca) wrote :

> Do we have to deal with monomorphism, etc?
No, the "monomorphism restriction" only applies to constrained type variables (i.e., this is an issue with type classes). It's basically there to prevent values which are not known functions from having an unknown type (and therefore representing a number of different values, depending on the eventual type). Values with an unknown type which is not constrained are OK, because the only value possible is _|_.

Revision history for this message
Matt Giuca (mgiuca) wrote :

An even better example -- actually use a universally-quantified object multiple times with different types.
(g could be something more useful, like a -> Pair(a, a)).

def f() :: Pair(Int, Array(Int)):
    var g :: a -> a
    var i :: Int
    var a :: Array(Int)
    g = id
    i = g(4)
    a = g([1,2,3])

Revision history for this message
Matt Giuca (mgiuca) wrote :

OK, so the solution is to keep each function with its own "type environment" (as it currently has), but to add explicit universal quantification to types, and make it so for each variable, all type variables which aren't in the function's type environment become universally quantified (in fact, explicit quantification may be unnecessary).

Unifying types checks that universally-quantified variables are consistent but does not bind them in the type environment. Therefore, every time they are used they can be bound differently. Global variables are simply universally quantified. Therefore, we don't have to fix bug #488595 -- there doesn't have to be a "public varset".

Revision history for this message
Matt Giuca (mgiuca) wrote :

This is more complicated than I thought. There are several solutions:
1. Make all body type vars monomorphic. Allow type variables to be introduced in the body of a function, but they will be monomorphic (that would be easy). The original example of id would work, but note that a would be early-bound to Int, and f could not be used as a generic id function. (So if you return Pair(f, f), then the type would be Pair(a -> a, a -> a).)
2. Make body type vars polymorphic if certain conditions apply. This is very tricky, and requires an explicit forall in the internal type representation (not necessarily in the syntax -- see Haskell). The reason it needs to be explicit in the type representation is that types with polymorphic variables would have forall on the outside, but then they could be put into a wider expression (such as a Pair), and need to have more granular forall quantification. For example, if f = id, and you return Pair(f, f), then the type of the Pair must be Pair(forall a. a -> a, forall a. a -> a), so that the ultimate type is Pair(b -> b, c -> c). This requires the generalisation step from Haskell.

Note that (1) is ALREADY in place, but only for implicitly-typed variables (you can do this currently with pattern bindings, soon with bug #483082, you'll be able to do it for any variable). You get monomorphism. So all we'd fix would be to allow explicit monomorphic type variables to be declared in the function body.

Note that a third option is simply to do nothing -- allow implicit monomorphic variables but not explicit ones. This is possibly desirable, because if we allowed explicit variables, the following code would be allowed:

def f() :: Int:
    var x :: a
    x = Int
    return x

This isn't a serious problem -- it's just confusing. The type variable a would be bound to Int. Note that we can't make a a rigid type variable (like we do for arguments), because then it would prohibit any assignment at all.

I'd like (2) but I'll refrain -- it requires a lot of work and changes to the data structure.

So possibly the solution will be WontFix, but we'll have implicit monomorphic type variables allowed.

Revision history for this message
Matt Giuca (mgiuca) wrote :

Marking as Won't Fix. One day, may reassess. In the meantime, you can't mention a new type variable in the body of a function (but you can introduce implicit monomorphic type variables by assigning without declaring variables).

Changed in mars:
status: Triaged → Won't Fix
Matt Giuca (mgiuca)
Changed in mars:
milestone: 0.3 → none
Revision history for this message
Matt Giuca (mgiuca) wrote :

Actually, I'm bringing this back as a bug. The reason it was flagged WontFix is that if you declare a variable to have a type which is a non-head variable (e.g., var x :: a, where a is not in the function heading), it could get bound, and that wouldn't make much sense.

The solution is not to *completely* disallow explicit type variables in the body, but to make them rigid, like head variables. This means it is illegal to bind them. Thus if you say var x :: a, you can't assign an Int to it, and can only use it in such a way that its type could be anything (e.g., you could assign "error" to it). This is almost useless, but it makes it possible to explicitly declare any variable (otherwise, some fairly-useless code could only be expressed with implicit variable typing).

Consider:

def foo() :: Int:
    f = id
    x = error("bad")
    y = f(x)
    return 0

Without this fix, there is no way to explicitly declare any of those variables. This fix allows the following code:

def foo() :: Int:
    var f :: a -> a
    var x :: a
    var y :: a
    f = id
    x = error("bad")
    y = f(x)
    return 0

Which is fine because a is never bound.

This doesn't allow the original example of this bug (currently a parse error), which is now a type error:

def int_id(x :: Int) :: Int:
    var f :: a -> a
    f = id
    return f(x)

A type error because a gets bound to Int. It would be fine if the declaration was changed to "var f :: Int -> Int" (which is what f would be inferred as). Remember, these are still monomorphic.

Changed in mars:
importance: Low → Wishlist
status: Won't Fix → Triaged
Revision history for this message
Matt Giuca (mgiuca) wrote :

This bug is now documented as part of the language spec. If the bug is fixed, fix the language spec (doc/ref/procedures, under Local variables).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.