Check for type recursion without boxing #3407

rtjoa · 2024-12-24T16:33:59Z

Summary

We only allow recursion in unboxed types to occur through boxes, otherwise the type is uninhabitable and usually also infinite-size.

Because check_well_founded already ruled out recursion through structural types (in our case, we care about unboxed tuples and aliases), we just look for a cycle in nominal unboxed types ([@@unboxed] types and unboxed records), tracking the set of seen paths. This gives an algorithm that is simpler than check_well_founded (whose correctness relies on some subtle properties of Tconstr's abbrev_memo).

We newly disallow recursive singleton [@@unboxed] types (type t = { t : t } [@@unboxed]), which are uninhabitable anyway.

Specification

For each group of mutually recursive type declarations, we define the following "type contains" transitive relation on type expressions:

Unboxed records and variants defined in the group contain their fields
- If type 'a t = #{ ...; lbl : u; ... }, type 'a t = { lbl : u } [@@unboxed], or type 'a t = U of u [@@unboxed] is in the recursive group, then 'a t contains u
Abbreviations defined in the group contain their expansions
- If type 'a t = u is in the recursive group then 'a t contains u
Types not in the recursive group contain the parameters indicated by their layout
- 'a t contains 'a if layout_of 'a or any occurs in 'a t's layout
- We don't yet have layout_of, so currently only consider any
Unboxed tuples contain their components: #(u_1 * u_2 * ...) contains all u_i

('a is a metavariable for zero or more type parameters; t is a metavariable for type constructors; u is a metavariable for type expressions)

If a path starting from the type expression on the LHS of a declaration contains two types with the same head type constructor, and that repeated type is an unboxed record or variant, then we give an error.

Examples

type t = #{ t : t list }
(* Allowed, [t list] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type t = #{ x : #(int * u) }
and u = T of t [@@unboxed]
(* Banned, t -> #(int * u) -> u -> t *)

type 'a id = #{ a : 'a }
type t = int id id
(* Allowed, [int id] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type 'a id = #{ a : 'a }
and t = int id id
(* Sadly, banned. t -> int id id -> int id; id appears twice *)

See typing-layouts-unboxed-records/recursive.ml for more.

Alternatives

Layout variables. We could introduce layout variables (for this check only), finding infinite-size types via the occurs check. Such an implementation might be more principled, accepting finite-size recursive types like type t = #{ t : t }. The current implementation gives more informative error traces, and leaves the flexiblity to switch to a less restrictive check in the future.

The algorithm from the Unboxed data constructors paper. This would accept types like type 'a id = #{ a : 'a } and t = int id id. I think this could be an appealing final solution, but some details are a bit subtle. See discussion on ocaml/ocaml#10479.

Generalize check_well_founded. My first implementation, which roughly generalizes check_well_founded over the "type contains" relation, accepts type 'a id = #{ a : 'a } and t = int id id. (On rtjoa.check-well-founded-unboxed-recursion, unpolished.) I'd hesitate to increase our reliance on check_well_founded, which I believe roughly implements the same algorithm as the paper above in a tricky way.

Review: @goldfirere

goldfirere

I'm not quite convinced here. This mostly works. But it misses some cases (so it's not fully sound) and catches some cases it shouldn't (so it's not fully complete). The old check -- reporting when e.g. value & (value & value) doesn't subjkind with value & value is still in place, so the lack of soundness doesn't matter all that much. And maybe the lack of completeness doesn't bite in practice? But given these drawbacks, and the fact that the old check worked (but with a somewhat confusing error message), I think I lean mildly against merging.

testsuite/tests/letrec-check/unboxed.ml

testsuite/tests/typing-layouts-unboxed-records/recursive.ml

typing/ctype.mli

typing/typedecl.ml

typing/ctype.mli

goldfirere · 2024-12-26T18:55:40Z

Actually, maybe I'm wrong about "mildly against merging". I agree that this improves the error message in the majority case, and that's pretty cool. And the implementation is simple enough. But at the least I'd leave an internal ticket about revisiting this in the future.

rtjoa

@goldfirere Thanks for the helpful review.

Here are my goals for this check:

Improve the programming experience for unboxed records
Prohibit all infinite-size types (types that would have a cyclic layout if we had layout variables without an occurs check)

It's because we are sound but not complete w.r.t. "no infinite-size types" that the error message complains about a stronger property, "no recursion without boxing," (it would be confusing to see a "no infinite-size types" error on type t = #{ t : t }).

So I'd agree that w.r.t. the stronger property of "no recursion without boxing," we are both unsound (your example with type recursion hidden behind an abstract signature, and your recursive module example) and incomplete (int good good). But I think that this is okay: the stronger property is only mentioned as it's a better error message.

I agree we'll eventually want to be more complete (e.g. allow int good good) - I'll create an internal ticket corresponding to the CR above check_unboxed_recursion.

testsuite/tests/typing-layouts-unboxed-records/recursive.ml

typing/ctype.mli

lpw25 · 2024-12-29T09:37:54Z

It is a little unclear from the description what the definition of recursion is. Does the new check still allow more useful things like:

type t = { t : t list } [@@unboxed]

What about if list where abstract?

rtjoa · 2024-12-30T08:04:12Z

@lpw25 The new check would allow that, including if list is abstract. A description of what exactly we allow (will also update comments):

For each group of mutually recursive type declarations, we define the following "type contains" relation on type expressions:

Unboxed records and variants defined in the group contain their fields
- If type 'a t = #{ ...; lbl : u; ... }, type 'a t = { lbl : u } [@@unboxed], or type 'a t = U of u [@@unboxed] is in the recursive group, then 'a t contains u
Abbreviations defined in the group contain their expansions
- If type 'a t = u is in the recursive group then 'a t contains u
Types not in the recursive group are treated as abstract
- 'a t contains 'a if layout_of 'a occurs in 'a t's layout
- For now, as we don't have layout_of, types not in the recursive group are skipped because they all have a finite-size layout
Unboxed tuples contain their components: #(u_1 * u_2 * ...) contains all u_i
The relation is transitive

('a is a metavariable for zero or more type parameters; t is a metavariable for type constructors; u is a metavariable for type expressions)

If a path starting from the type expression on the LHS of a declaration contains two types with the same head type constructor, and that repeated type is an unboxed record or variant, then we give an error.

Examples (@goldfirere - the third is newly allowed with a bug fix):

type t = #{ t : t list }
(* Allowed, [t list] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type t = #{ x : #(int * u) }
and u = T of t [@@unboxed]
(* Banned, t -> #(int * u) -> u -> t *)

type 'a id = #{ a : 'a }
type t = int id id
(* Allowed, [int id] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type 'a id = #{ a : 'a }
and t = int id id
(* Sadly, banned. t -> int id id -> int id; id appears twice *)

lpw25 · 2024-12-30T11:13:45Z

types not in the recursive group are skipped because they all have a finite-size layout

Presumably this is not the case for something like:

type ('a : any) t : any

is an any type just rejected directly currently?

rtjoa · 2024-12-30T18:02:27Z

is an any type just rejected directly currently?

Currently, an any type is rejected because all unboxed record/variant fields must be representable.

We might be concerned about programs like the following once we both drop this restriction and add layout_of:

module Id : sig
  type ('a : any) t : any
end = struct
  type ('a : any) t = 'a
end

type t = #{ t : t Id.t }

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).

Another example worth thinking about is the following, once we have layout_of:

type 'a id : any = 'a
type t = #{ t : t id }

Speculatively: I think the check will reject this (when updated appropriately for layout_of) because we'll give id the layout layout_of 'a. We won't give id layout any because the only way jkind annotations can make a layout less specific is by preventing a sort variable from being defaulted to value.

lpw25 · 2024-12-30T18:50:39Z

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).

What about if your Id module were a functor parameter?

rtjoa · 2024-12-30T21:50:55Z

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).

What about if your Id module were a functor parameter?

If the functor parameter abstracts t as an any type, I think it should work similarly to the first example from my last comment: we'll accept it, but it should be fine since the type won't be visibly cyclic to constrain_type_jkind. (Another perspective: we want to accept this program, as rejecting it would break the abstraction boundary.)

module type S = sig
  type ('a : any) t : any
end

module F(Id : S) = struct
  type t = #{ t : t Id.t }
end

module Id : S = struct
  type ('a : any) t = 'a
end

module M = F(Id)

If the functor parameter exposes t's definition or abstracts t as a type with layout_of 'a, it should work similarly to the second example from my last comment: we'll reject F.t by considering t Id.t to contain t via its layout_of layout.

module type S = sig
  type ('a : any) t = 'a (* changed line *)
end

module F(Id : S) = struct
  type t = #{ t : t Id.t }
end
(* Error: the definition of t contains recursion without boxing
     t contains t Id.t
     t Id.t contains t
*)

Does this sound right?

lpw25 · 2024-12-31T07:15:14Z

My concern was with your M.t above. It is cyclic in the way you are worried about and visibly so. Will that cause problems for constrain_type_jkind?

lpw25 · 2024-12-31T07:20:32Z

Actually, your M.t isn’t quite the example I had in mind: you need to drop the : S from the definition of the Id module to make the problematic case.

rtjoa · 2024-12-31T08:22:44Z

I see the problem now, thanks. Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

rtjoa · 2024-12-31T23:09:10Z

Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

I made this change to future-proof against unboxed records with unrepresentable layouts, and refactored so we can more easily support layout_of later.

ccasin · 2025-01-03T13:37:11Z

I see the problem now, thanks. Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

I thought about it briefly and didn't come up with any examples where this heuristic misses anything or is overly conservative in a way that made me very sad (though it is certainly conservative).

I propose we (1) have Ryan check that this isn't so conservative that it rejects anything in our code base and then (2) merge this PR, unless @lpw25 can come up with another counter example. It will be much easier to figure out whether a better solution exists once we start working on allowing non-values in records and on layout_of, so as long as we're conservative I think we shouldn't invest a ton of time in perfecting this.

rtjoa · 2025-01-03T20:52:12Z

(1) have Ryan check that this isn't so conservative that it rejects anything in our code base

The check doesn't seem to reject anything:

I rebased this branch onto 5.2.0minus-4 as rtjoa.5.2.0minus-4-with-unboxed-recursion-check, very slightly modifying it because 5.2.0minus-4 doesn't have unboxed records. The check still works the same - many of the tests could be adapted to [@@unboxed] records, see 7e2da96 and 31b062e. I then used this to build our code base (see with-unboxed-recursion-check internally).

goldfirere

This looks all good from my perspective. I still think we'll eventually switch to use layout variables, but there's no reason not to merge this in the meantime.

typing/typedecl.ml

rtjoa requested a review from ccasin December 24, 2024 16:34

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 4 times, most recently from 8b38c59 to eb42fd2 Compare December 24, 2024 16:57

rtjoa requested review from goldfirere and removed request for ccasin December 24, 2024 17:34

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 2 times, most recently from 83ae0ff to 2247db3 Compare December 25, 2024 02:52

rtjoa added typing unboxed types labels Dec 25, 2024

goldfirere reviewed Dec 26, 2024

View reviewed changes

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from c5fc9cf to 19f511f Compare December 27, 2024 00:27

rtjoa commented Dec 27, 2024

View reviewed changes

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from d3459e6 to 3b9c7e7 Compare December 30, 2024 08:54

rtjoa mentioned this pull request Jan 2, 2025

Move unboxed records to stable #3419

Merged

rtjoa referenced this pull request Jan 3, 2025

Some final touches

25eadb9

rtjoa mentioned this pull request Jan 3, 2025

with-kinds #3284

Open

42 tasks

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 2 times, most recently from 5d2fea9 to 277b1d8 Compare January 6, 2025 13:22

goldfirere approved these changes Jan 9, 2025

View reviewed changes

typing/typedecl.ml Show resolved Hide resolved

rtjoa added 2 commits January 10, 2025 10:14

Check for type recursion without boxing (rebased)

7fb1a2c

Comment suggested by Richard

426888b

rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from 277b1d8 to 426888b Compare January 10, 2025 15:21

rtjoa enabled auto-merge (squash) January 10, 2025 15:22

rtjoa merged commit c30ec74 into main Jan 10, 2025
24 checks passed

rtjoa deleted the rtjoa.check-type-recursion-without-indirection branch January 10, 2025 15:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for type recursion without boxing #3407

Check for type recursion without boxing #3407

rtjoa commented Dec 24, 2024 •

edited

Loading

goldfirere left a comment

goldfirere commented Dec 26, 2024

rtjoa left a comment •

edited

Loading

lpw25 commented Dec 29, 2024 •

edited

Loading

rtjoa commented Dec 30, 2024 •

edited

Loading

lpw25 commented Dec 30, 2024

rtjoa commented Dec 30, 2024 •

edited

Loading

lpw25 commented Dec 30, 2024

rtjoa commented Dec 30, 2024

lpw25 commented Dec 31, 2024

lpw25 commented Dec 31, 2024

rtjoa commented Dec 31, 2024 •

edited

Loading

rtjoa commented Dec 31, 2024

ccasin commented Jan 3, 2025 •

edited

Loading

rtjoa commented Jan 3, 2025 •

edited

Loading

goldfirere left a comment

Check for type recursion without boxing #3407

Check for type recursion without boxing #3407

Conversation

rtjoa commented Dec 24, 2024 • edited Loading

Summary

Specification

Examples

Alternatives

goldfirere left a comment

Choose a reason for hiding this comment

goldfirere commented Dec 26, 2024

rtjoa left a comment • edited Loading

Choose a reason for hiding this comment

lpw25 commented Dec 29, 2024 • edited Loading

rtjoa commented Dec 30, 2024 • edited Loading

lpw25 commented Dec 30, 2024

rtjoa commented Dec 30, 2024 • edited Loading

lpw25 commented Dec 30, 2024

rtjoa commented Dec 30, 2024

lpw25 commented Dec 31, 2024

lpw25 commented Dec 31, 2024

rtjoa commented Dec 31, 2024 • edited Loading

rtjoa commented Dec 31, 2024

ccasin commented Jan 3, 2025 • edited Loading

rtjoa commented Jan 3, 2025 • edited Loading

goldfirere left a comment

Choose a reason for hiding this comment

rtjoa commented Dec 24, 2024 •

edited

Loading

rtjoa left a comment •

edited

Loading

lpw25 commented Dec 29, 2024 •

edited

Loading

rtjoa commented Dec 30, 2024 •

edited

Loading

rtjoa commented Dec 30, 2024 •

edited

Loading

rtjoa commented Dec 31, 2024 •

edited

Loading

ccasin commented Jan 3, 2025 •

edited

Loading

rtjoa commented Jan 3, 2025 •

edited

Loading