Bug #482947 “Type variables can't be introduced in the body of a...” : Bugs : Mars

Revision history for this message

Matt Giuca (mgiuca) wrote on 2009-11-15:

#1

Currently, the strategy for ensuring that the type variables in function arguments/return are suitable for *all* values (universal quantification) is to bind them to dummy atoms. All that should be required is to *not* bind the type variables introduced in the body of the function, and then they should be existentially quantified (if they get bound, they get bound).

Revision history for this message

Matt Giuca (mgiuca) wrote on 2009-11-15:

#2

This needs to be done properly. Note the following example:

def foo :: Int:
    var x :: a
    var i :: Int
    var s :: Array(Int)
    x = error("xxx")
    i = x
    s = x
    return 0

Currently, this will produce the error "Syntax error: Undefined type variable "a"."

If we implement this fix, then it *should* allow the variable x (of type "a") to be treated like a global variable of type a -- arbitrarily cast to any other type *without* becoming bound. Type variables should only be bound on calls to functions (during the call).

The line "i = x" should be allowed, since x :: a. Treat x as an Int. (This would be a better example if we had the "default" builtin, mentioned above, rather than error). Now this assignment should *not* bind the type variable "a" to the type Int.

This is important because the next line, "s = x" should also be allowed, treating x as an Array(Int). A naive implementation of the fix for this bug may result in the following error:

Type error in statement 's = x'
        Term: x
        Type: Int
        Expected: Array(Int).

Another thing to test is that it should be *restricted* from assignments of values with too narrow a type. A variable "x :: a" means "x can be used as any type" (for reading), NOT "x can be assigned any type". The above example only works because I assign error("foo") :: a. The following should NOT work:

def bar :: Int:
    var x :: a
    x = 4
    return 0

This should produce:

Type error in statement 'x = 4'
        Term: 4
        Type: Int
        Expected: a.

This is consistent with type variables in the arguments/return (and already works that way for such variables).

Revision history for this message

Matt Giuca (mgiuca) wrote on 2009-11-15:

#3

Note also that mentioning the same variable in the body as already mentioned in the head should refer to the *same* variable (and thus be inflexible, as it is universally quantified). This is different from Haskell, where referring to a type variable in the body is considered to be a different variable.

Consider:

def foo(x :: a) :: Int:
    var y :: a
    y = error("foo")
    return y

This (rightly) produces:

Type error in statement 'return y'
        Term: y
        Type: a
        Expected: Int.

As a can be bound to anything the caller wants (not necessarily Int). However, with this bug fixed, the following code should be legal:

def foo(x :: a) :: Int:
    var y :: b
    y = error("foo")
    return y

Now y is a different type variable to x, so it may be read as any type.

Revision history for this message

Matt Giuca (mgiuca) wrote on 2009-11-16:

#4

Compare to Haskell and Mercury:

Mercury behaves exactly like Mars if we remove the artificial limitation that type variables can't be introduced in the body (this can be done in parsem.m). Type variables are all quantified over the whole function/predicate, so any one binding causes them to become permanently bound. This means Mercury fails most of the above test cases. You could argue that this means we shouldn't have to fix it, but I think it makes lifted functions not work entirely correctly. Importantly, if I lift a parameterised function, I can only use it on a single type. However, Mercury *does* behave as we want on the most recent example above; the re-use of a header type variable in the body is considered to be the same variable. Much like Mars prevents re-binding of such variables by binding them to dummy atoms, Mercury considers header type variables to be existentially-quantified on the inside of a function.

Haskell has the opposite behaviour, fixing all the Mercury problems but not respecting the type variable names bound in the head. This is mostly the behaviour we want. In Haskell, type expressions automatically universally quantify type variables. Most of the above examples work, except that re-using the name 'a' is considered to be a fresh variable.

We need a behaviour (which mostly mimics Haskell), in which "unification" is asymmetrical. For example, on assignment statements (v = e), it must be possible for the type of e to be more general than the type of v, but not vice versa. For example, an assignment of an (a -> a) to a (Int -> Int) is possible (as a -> a is a subset of Int -> Int, as all a -> a functions are members of Int -> Int). However, an assignment of an (Int -> Int) to an (a -> a) is not possible.

Also, if assigning an (a -> a) to an (Int -> Int), it should *not* cause 'a' to be bound to Int. It's like a class cast in Java -- it's just changing the type for this assignment, and doesn't place any constraints on the existing thing of type (a -> a).

If we add type inference, then we also need to support the possibility of either side not having a known type. In this case, one can assign e to a (Int -> Int), in which case, e should be annotated (Int -> Int). This should be provisional -- if e is later assigned to a (Float -> Float), then e should be generalised to (a -> a).

Compare to Haskell and Mercury:

Mercury behaves exactly like Mars if we remove the artificial limitation that type variables can't be introduced in the body (this can be done in parsem.m). Type variables are all quantified over the whole function/predicate, so any one binding causes them to become permanently bound. This means Mercury fails most of the above test cases. You could argue that this means we shouldn't have to fix it, but I think it makes lifted functions not work entirely correctly. Importantly, if I lift a parameterised function, I can only use it on a single type. However, Mercury *does* behave as we want on the most recent example above; the re-use of a header type variable in the body is considered to be the same variable. Much like Mars prevents re-binding of such variables by binding them to dummy atoms, Mercury considers header type variables to be existentially-quantified on the inside of a function.

Haskell has the opposite behaviour, fixing all the Mercury problems but not respecting the type variable names bound in the head. This is mostly the behaviour we want. In Haskell, type expressions automatically universally quantify type variables. Most of the above examples work, except that re-using the name 'a' is considered to be a fresh variable.

We need a behaviour (which mostly mimics Haskell), in which "unification" is asymmetrical. For example, on assignment statements (v = e), it must be possible for the type of e to be more general than the type of v, but not vice versa. For example, an assignment of an (a -> a) to a (Int -> Int) is possible (as a -> a is a subset of Int -> Int, as all a -> a functions are members of Int -> Int). However, an assignment of an (Int -> Int) to an (a -> a) is not possible.

Also, if assigning an (a -> a) to an (Int -> Int), it should *not* cause 'a' to be bound to Int. It's like a class cast in Java -- it's just changing the type for this assignment, and doesn't place any constraints on the existing thing of type (a -> a).

If we add type inference, then we also need to support the possibility of either side not having a known type. In this case, one can assign e to a (Int -> Int), in which case, e should be annotated (Int -> Int). This should be provisional -- if e is later assigned to a (Float -> Float), then e should be generalised to (a -> a).

Revision history for this message

Matt Giuca (mgiuca) wrote on 2009-11-16:

#5

One thought: Why should we allow local variable assignments to be universally quantified when arguments are always existentially quantified? Under this scheme, we can't accept something of type (forall a. a -> a) as an argument, since any argument of type (a -> a) is internally existentially quantified. Why are locals so special?

I tried Haskell (which *technically* defines let variables as simple lambdas; therefore local variables *can't* be so special), and got a surprising result:

foo = (x, i, s)
    where
    x = error "xxx"
    i = x :: Int
    s = x :: Float

-- bar same as foo
bar = let x = error "xxx" in
      let i = x :: Int in
      let s = x :: Float in
      (x, i, s)

-- baz same as bar
-- But type error!
baz = (\x ->
          let i = x :: Int in
          let s = x :: Float in
          (x, i, s)) (error "xxx")

foo and bar check out OK, but baz gives this error:
Type error in type annotation
*** Term : x
*** Type : Int
*** Does not match : Float

Interestingly behaving like Mercury... so let bindings *aren't* the same as lambdas! In this last case, x's type can't be inferred, since if it was inferred as 'a', that would mean the caller can choose a concrete type, *not* that the caller must supply a universally-typed value.

This can probably be solved with explicit existential typing.

Matt Giuca (mgiuca) on 2009-11-18

Changed in mars:
milestone:	none → 0.3

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-03-22:

#6

Read this section of the Haskell report carefully:
http://www.haskell.org/onlinereport/decls.html#sect4.5

Do we have to deal with monomorphism, etc? (We don't have nested functions, which eases things a bit.)

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-03-22:

#7

Note on comment #5, the Haskell report addresses this:
http://www.haskell.org/onlinereport/exps.html#sect3.12

"This translation does not preserve the static semantics because the use of case precludes a fully polymorphic typing of the bound variables."

I'm pretty sure this is what they mean.

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-03-22:

#8

A better example:

def f() :: Pair(a -> a, b -> b):
    var g :: c -> c
    g = id
    return Pair(g, g)

This currently says Syntax error: Undefined type variable "c".
If you change g to a -> a, it says:
Type error in statement 'return Pair(g, g)'
        Term: Pair(g, g)
        Type: Pair(a -> a, a -> a)
        Expected: Pair(a -> a, b -> b).

It should consider g to have type forall c. c -> c, and f to have type forall a, b. () -> Pair(a -> a, b -> b).

Therefore, in the body, a and b are bound while c is free. Unifying forall c. c -> c with a -> a is easy, and makes no bindings, so it can be unified with both.

Note that this does work:

def g() :: Pair(a -> a, a -> a):
return Pair(id, id)

Currently, when you refer to a global variable, it implicitly assumes all of its type variables are forall, and instantiates fresh copies of those variables, so they always unify with anything. The fix would be to remove this implicit instantiation, and instead explicitly store the types of all globals with a "forall" for each type variable. Now, referring to globals is not a special case. Instead, on unification, each universally quantified variable is freshly instantiated, so it can unify with a different type each time.

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-04-28:

#9

> Do we have to deal with monomorphism, etc?
No, the "monomorphism restriction" only applies to constrained type variables (i.e., this is an issue with type classes). It's basically there to prevent values which are not known functions from having an unknown type (and therefore representing a number of different values, depending on the eventual type). Values with an unknown type which is not constrained are OK, because the only value possible is _|_.

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-04-28:

#10

An even better example -- actually use a universally-quantified object multiple times with different types.
(g could be something more useful, like a -> Pair(a, a)).

def f() :: Pair(Int, Array(Int)):
    var g :: a -> a
    var i :: Int
    var a :: Array(Int)
    g = id
    i = g(4)
    a = g([1,2,3])

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-04-28:

#11

OK, so the solution is to keep each function with its own "type environment" (as it currently has), but to add explicit universal quantification to types, and make it so for each variable, all type variables which aren't in the function's type environment become universally quantified (in fact, explicit quantification may be unnecessary).

Unifying types checks that universally-quantified variables are consistent but does not bind them in the type environment. Therefore, every time they are used they can be bound differently. Global variables are simply universally quantified. Therefore, we don't have to fix bug #488595 -- there doesn't have to be a "public varset".

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-04-30:

#12

This is more complicated than I thought. There are several solutions:
1. Make all body type vars monomorphic. Allow type variables to be introduced in the body of a function, but they will be monomorphic (that would be easy). The original example of id would work, but note that a would be early-bound to Int, and f could not be used as a generic id function. (So if you return Pair(f, f), then the type would be Pair(a -> a, a -> a).)
2. Make body type vars polymorphic if certain conditions apply. This is very tricky, and requires an explicit forall in the internal type representation (not necessarily in the syntax -- see Haskell). The reason it needs to be explicit in the type representation is that types with polymorphic variables would have forall on the outside, but then they could be put into a wider expression (such as a Pair), and need to have more granular forall quantification. For example, if f = id, and you return Pair(f, f), then the type of the Pair must be Pair(forall a. a -> a, forall a. a -> a), so that the ultimate type is Pair(b -> b, c -> c). This requires the generalisation step from Haskell.

Note that (1) is ALREADY in place, but only for implicitly-typed variables (you can do this currently with pattern bindings, soon with bug #483082, you'll be able to do it for any variable). You get monomorphism. So all we'd fix would be to allow explicit monomorphic type variables to be declared in the function body.

Note that a third option is simply to do nothing -- allow implicit monomorphic variables but not explicit ones. This is possibly desirable, because if we allowed explicit variables, the following code would be allowed:

def f() :: Int:
    var x :: a
    x = Int
    return x

This isn't a serious problem -- it's just confusing. The type variable a would be bound to Int. Note that we can't make a a rigid type variable (like we do for arguments), because then it would prohibit any assignment at all.

I'd like (2) but I'll refrain -- it requires a lot of work and changes to the data structure.

So possibly the solution will be WontFix, but we'll have implicit monomorphic type variables allowed.

This is more complicated than I thought. There are several solutions:
1. Make all body type vars monomorphic. Allow type variables to be introduced in the body of a function, but they will be monomorphic (that would be easy). The original example of id would work, but note that a would be early-bound to Int, and f could not be used as a generic id function. (So if you return Pair(f, f), then the type would be Pair(a -> a, a -> a).)
2. Make body type vars polymorphic if certain conditions apply. This is very tricky, and requires an explicit forall in the internal type representation (not necessarily in the syntax -- see Haskell). The reason it needs to be explicit in the type representation is that types with polymorphic variables would have forall on the outside, but then they could be put into a wider expression (such as a Pair), and need to have more granular forall quantification. For example, if f = id, and you return Pair(f, f), then the type of the Pair must be Pair(forall a. a -> a, forall a. a -> a), so that the ultimate type is Pair(b -> b, c -> c). This requires the generalisation step from Haskell.

Note that (1) is ALREADY in place, but only for implicitly-typed variables (you can do this currently with pattern bindings, soon with bug #483082, you'll be able to do it for any variable). You get monomorphism. So all we'd fix would be to allow explicit monomorphic type variables to be declared in the function body.

Note that a third option is simply to do nothing -- allow implicit monomorphic variables but not explicit ones. This is possibly desirable, because if we allowed explicit variables, the following code would be allowed:

def f() :: Int:
    var x :: a
    x = Int
    return x

This isn't a serious problem -- it's just confusing. The type variable a would be bound to Int. Note that we can't make a a rigid type variable (like we do for arguments), because then it would prohibit any assignment at all.

I'd like (2) but I'll refrain -- it requires a lot of work and changes to the data structure.

So possibly the solution will be WontFix, but we'll have implicit monomorphic type variables allowed.

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-03:

#13

Marking as Won't Fix. One day, may reassess. In the meantime, you can't mention a new type variable in the body of a function (but you can introduce implicit monomorphic type variables by assigning without declaring variables).

Changed in mars:
status:	Triaged → Won't Fix

Matt Giuca (mgiuca) on 2010-05-03

Changed in mars:
milestone:	0.3 → none

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-05:

#14

Actually, I'm bringing this back as a bug. The reason it was flagged WontFix is that if you declare a variable to have a type which is a non-head variable (e.g., var x :: a, where a is not in the function heading), it could get bound, and that wouldn't make much sense.

The solution is not to *completely* disallow explicit type variables in the body, but to make them rigid, like head variables. This means it is illegal to bind them. Thus if you say var x :: a, you can't assign an Int to it, and can only use it in such a way that its type could be anything (e.g., you could assign "error" to it). This is almost useless, but it makes it possible to explicitly declare any variable (otherwise, some fairly-useless code could only be expressed with implicit variable typing).

Consider:

def foo() :: Int:
    f = id
    x = error("bad")
    y = f(x)
    return 0

Without this fix, there is no way to explicitly declare any of those variables. This fix allows the following code:

def foo() :: Int:
    var f :: a -> a
    var x :: a
    var y :: a
    f = id
    x = error("bad")
    y = f(x)
    return 0

Which is fine because a is never bound.

This doesn't allow the original example of this bug (currently a parse error), which is now a type error:

def int_id(x :: Int) :: Int:
    var f :: a -> a
    f = id
    return f(x)

A type error because a gets bound to Int. It would be fine if the declaration was changed to "var f :: Int -> Int" (which is what f would be inferred as). Remember, these are still monomorphic.

Changed in mars:
importance:	Low → Wishlist
status:	Won't Fix → Triaged

Revision history for this message

Matt Giuca (mgiuca) wrote on 2010-05-06:

#15

This bug is now documented as part of the language spec. If the bug is fixed, fix the language spec (doc/ref/procedures, under Local variables).

Mars

Type variables can't be introduced in the body of a function

Bug Description

Other bug subscribers

Remote bug watches