Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for type recursion without boxing #3407

Merged
merged 2 commits into from
Jan 10, 2025

Conversation

rtjoa
Copy link
Contributor

@rtjoa rtjoa commented Dec 24, 2024

Summary

We only allow recursion in unboxed types to occur through boxes, otherwise the type is uninhabitable and usually also infinite-size.

Because check_well_founded already ruled out recursion through structural types (in our case, we care about unboxed tuples and aliases), we just look for a cycle in nominal unboxed types ([@@unboxed] types and unboxed records), tracking the set of seen paths. This gives an algorithm that is simpler than check_well_founded (whose correctness relies on some subtle properties of Tconstr's abbrev_memo).

We newly disallow recursive singleton [@@unboxed] types (type t = { t : t } [@@unboxed]), which are uninhabitable anyway.

Specification

For each group of mutually recursive type declarations, we define the following "type contains" transitive relation on type expressions:

  • Unboxed records and variants defined in the group contain their fields
    • If type 'a t = #{ ...; lbl : u; ... }, type 'a t = { lbl : u } [@@unboxed], or type 'a t = U of u [@@unboxed] is in the recursive group, then 'a t contains u
  • Abbreviations defined in the group contain their expansions
    • If type 'a t = u is in the recursive group then 'a t contains u
  • Types not in the recursive group contain the parameters indicated by their layout
    • 'a t contains 'a if layout_of 'a or any occurs in 'a t's layout
    • We don't yet have layout_of, so currently only consider any
  • Unboxed tuples contain their components: #(u_1 * u_2 * ...) contains all u_i

('a is a metavariable for zero or more type parameters; t is a metavariable for type constructors; u is a metavariable for type expressions)

If a path starting from the type expression on the LHS of a declaration contains two types with the same head type constructor, and that repeated type is an unboxed record or variant, then we give an error.

Examples

type t = #{ t : t list }
(* Allowed, [t list] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type t = #{ x : #(int * u) }
and u = T of t [@@unboxed]
(* Banned, t -> #(int * u) -> u -> t *)

type 'a id = #{ a : 'a }
type t = int id id
(* Allowed, [int id] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type 'a id = #{ a : 'a }
and t = int id id
(* Sadly, banned. t -> int id id -> int id; id appears twice *)

See typing-layouts-unboxed-records/recursive.ml for more.

Alternatives

Layout variables. We could introduce layout variables (for this check only), finding infinite-size types via the occurs check. Such an implementation might be more principled, accepting finite-size recursive types like type t = #{ t : t }. The current implementation gives more informative error traces, and leaves the flexiblity to switch to a less restrictive check in the future.

The algorithm from the Unboxed data constructors paper. This would accept types like type 'a id = #{ a : 'a } and t = int id id. I think this could be an appealing final solution, but some details are a bit subtle. See discussion on ocaml/ocaml#10479.

Generalize check_well_founded. My first implementation, which roughly generalizes check_well_founded over the "type contains" relation, accepts type 'a id = #{ a : 'a } and t = int id id. (On rtjoa.check-well-founded-unboxed-recursion, unpolished.) I'd hesitate to increase our reliance on check_well_founded, which I believe roughly implements the same algorithm as the paper above in a tricky way.


Review: @goldfirere

@rtjoa rtjoa requested a review from ccasin December 24, 2024 16:34
@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 4 times, most recently from 8b38c59 to eb42fd2 Compare December 24, 2024 16:57
@rtjoa rtjoa requested review from goldfirere and removed request for ccasin December 24, 2024 17:34
@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 2 times, most recently from 83ae0ff to 2247db3 Compare December 25, 2024 02:52
Copy link
Collaborator

@goldfirere goldfirere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite convinced here. This mostly works. But it misses some cases (so it's not fully sound) and catches some cases it shouldn't (so it's not fully complete). The old check -- reporting when e.g. value & (value & value) doesn't subjkind with value & value is still in place, so the lack of soundness doesn't matter all that much. And maybe the lack of completeness doesn't bite in practice? But given these drawbacks, and the fact that the old check worked (but with a somewhat confusing error message), I think I lean mildly against merging.

testsuite/tests/letrec-check/unboxed.ml Show resolved Hide resolved
typing/ctype.mli Outdated Show resolved Hide resolved
typing/ctype.mli Outdated Show resolved Hide resolved
typing/ctype.mli Outdated Show resolved Hide resolved
typing/typedecl.ml Outdated Show resolved Hide resolved
typing/ctype.mli Outdated Show resolved Hide resolved
@goldfirere
Copy link
Collaborator

Actually, maybe I'm wrong about "mildly against merging". I agree that this improves the error message in the majority case, and that's pretty cool. And the implementation is simple enough. But at the least I'd leave an internal ticket about revisiting this in the future.

@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from c5fc9cf to 19f511f Compare December 27, 2024 00:27
Copy link
Contributor Author

@rtjoa rtjoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldfirere Thanks for the helpful review.

Here are my goals for this check:

  1. Improve the programming experience for unboxed records
  2. Prohibit all infinite-size types (types that would have a cyclic layout if we had layout variables without an occurs check)

It's because we are sound but not complete w.r.t. "no infinite-size types" that the error message complains about a stronger property, "no recursion without boxing," (it would be confusing to see a "no infinite-size types" error on type t = #{ t : t }).

So I'd agree that w.r.t. the stronger property of "no recursion without boxing," we are both unsound (your example with type recursion hidden behind an abstract signature, and your recursive module example) and incomplete (int good good). But I think that this is okay: the stronger property is only mentioned as it's a better error message.

I agree we'll eventually want to be more complete (e.g. allow int good good) - I'll create an internal ticket corresponding to the CR above check_unboxed_recursion.

typing/ctype.mli Outdated Show resolved Hide resolved
@lpw25
Copy link
Collaborator

lpw25 commented Dec 29, 2024

It is a little unclear from the description what the definition of recursion is. Does the new check still allow more useful things like:

type t = { t : t list } [@@unboxed]

What about if list where abstract?

@rtjoa
Copy link
Contributor Author

rtjoa commented Dec 30, 2024

@lpw25 The new check would allow that, including if list is abstract. A description of what exactly we allow (will also update comments):

For each group of mutually recursive type declarations, we define the following "type contains" relation on type expressions:

  • Unboxed records and variants defined in the group contain their fields
    • If type 'a t = #{ ...; lbl : u; ... }, type 'a t = { lbl : u } [@@unboxed], or type 'a t = U of u [@@unboxed] is in the recursive group, then 'a t contains u
  • Abbreviations defined in the group contain their expansions
    • If type 'a t = u is in the recursive group then 'a t contains u
  • Types not in the recursive group are treated as abstract
    • 'a t contains 'a if layout_of 'a occurs in 'a t's layout
    • For now, as we don't have layout_of, types not in the recursive group are skipped because they all have a finite-size layout
  • Unboxed tuples contain their components: #(u_1 * u_2 * ...) contains all u_i
  • The relation is transitive

('a is a metavariable for zero or more type parameters; t is a metavariable for type constructors; u is a metavariable for type expressions)

If a path starting from the type expression on the LHS of a declaration contains two types with the same head type constructor, and that repeated type is an unboxed record or variant, then we give an error.

Examples (@goldfirere - the third is newly allowed with a bug fix):

type t = #{ t : t list }
(* Allowed, [t list] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type t = #{ x : #(int * u) }
and u = T of t [@@unboxed]
(* Banned, t -> #(int * u) -> u -> t *)

type 'a id = #{ a : 'a }
type t = int id id
(* Allowed, [int id] is not in this recursive group, and
   has layout [value] so it doesn't contain anything *)

type 'a id = #{ a : 'a }
and t = int id id
(* Sadly, banned. t -> int id id -> int id; id appears twice *)

@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from d3459e6 to 3b9c7e7 Compare December 30, 2024 08:54
@lpw25
Copy link
Collaborator

lpw25 commented Dec 30, 2024

types not in the recursive group are skipped because they all have a finite-size layout

Presumably this is not the case for something like:

type ('a : any) t : any

is an any type just rejected directly currently?

@rtjoa
Copy link
Contributor Author

rtjoa commented Dec 30, 2024

is an any type just rejected directly currently?

Currently, an any type is rejected because all unboxed record/variant fields must be representable.

We might be concerned about programs like the following once we both drop this restriction and add layout_of:

module Id : sig
  type ('a : any) t : any
end = struct
  type ('a : any) t = 'a
end

type t = #{ t : t Id.t }

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).


Another example worth thinking about is the following, once we have layout_of:

type 'a id : any = 'a
type t = #{ t : t id }

Speculatively: I think the check will reject this (when updated appropriately for layout_of) because we'll give id the layout layout_of 'a. We won't give id layout any because the only way jkind annotations can make a layout less specific is by preventing a sort variable from being defaulted to value.

@lpw25
Copy link
Collaborator

lpw25 commented Dec 30, 2024

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).

What about if your Id module were a functor parameter?

@rtjoa
Copy link
Contributor Author

rtjoa commented Dec 30, 2024

But I think it will be okay to accept this, since the type also won't be visibly cyclic to constrain_type_jkind (which is the cause of the bad errors this check is trying to avoid).

What about if your Id module were a functor parameter?

If the functor parameter abstracts t as an any type, I think it should work similarly to the first example from my last comment: we'll accept it, but it should be fine since the type won't be visibly cyclic to constrain_type_jkind. (Another perspective: we want to accept this program, as rejecting it would break the abstraction boundary.)

module type S = sig
  type ('a : any) t : any
end

module F(Id : S) = struct
  type t = #{ t : t Id.t }
end

module Id : S = struct
  type ('a : any) t = 'a
end

module M = F(Id)

If the functor parameter exposes t's definition or abstracts t as a type with layout_of 'a, it should work similarly to the second example from my last comment: we'll reject F.t by considering t Id.t to contain t via its layout_of layout.

module type S = sig
  type ('a : any) t = 'a (* changed line *)
end

module F(Id : S) = struct
  type t = #{ t : t Id.t }
end
(* Error: the definition of t contains recursion without boxing
     t contains t Id.t
     t Id.t contains t
*)

Does this sound right?

@lpw25
Copy link
Collaborator

lpw25 commented Dec 31, 2024

My concern was with your M.t above. It is cyclic in the way you are worried about and visibly so. Will that cause problems for constrain_type_jkind?

@lpw25
Copy link
Collaborator

lpw25 commented Dec 31, 2024

Actually, your M.t isn’t quite the example I had in mind: you need to drop the : S from the definition of the Id module to make the problematic case.

@rtjoa
Copy link
Contributor Author

rtjoa commented Dec 31, 2024

I see the problem now, thanks. Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

@rtjoa
Copy link
Contributor Author

rtjoa commented Dec 31, 2024

Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

I made this change to future-proof against unboxed records with unrepresentable layouts, and refactored so we can more easily support layout_of later.

@ccasin
Copy link
Contributor

ccasin commented Jan 3, 2025

I see the problem now, thanks. Current best guess for the fix: we consider a type not in the recursive group whose layout contains any to contain all of its type parameters.

I thought about it briefly and didn't come up with any examples where this heuristic misses anything or is overly conservative in a way that made me very sad (though it is certainly conservative).

I propose we (1) have Ryan check that this isn't so conservative that it rejects anything in our code base and then (2) merge this PR, unless @lpw25 can come up with another counter example. It will be much easier to figure out whether a better solution exists once we start working on allowing non-values in records and on layout_of, so as long as we're conservative I think we shouldn't invest a ton of time in perfecting this.

rtjoa referenced this pull request Jan 3, 2025
@rtjoa rtjoa mentioned this pull request Jan 3, 2025
42 tasks
@rtjoa
Copy link
Contributor Author

rtjoa commented Jan 3, 2025

(1) have Ryan check that this isn't so conservative that it rejects anything in our code base

The check doesn't seem to reject anything:

I rebased this branch onto 5.2.0minus-4 as rtjoa.5.2.0minus-4-with-unboxed-recursion-check, very slightly modifying it because 5.2.0minus-4 doesn't have unboxed records. The check still works the same - many of the tests could be adapted to [@@unboxed] records, see 7e2da96 and 31b062e. I then used this to build our code base (see with-unboxed-recursion-check internally).

@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch 2 times, most recently from 5d2fea9 to 277b1d8 Compare January 6, 2025 13:22
Copy link
Collaborator

@goldfirere goldfirere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks all good from my perspective. I still think we'll eventually switch to use layout variables, but there's no reason not to merge this in the meantime.

typing/typedecl.ml Show resolved Hide resolved
@rtjoa rtjoa force-pushed the rtjoa.check-type-recursion-without-indirection branch from 277b1d8 to 426888b Compare January 10, 2025 15:21
@rtjoa rtjoa enabled auto-merge (squash) January 10, 2025 15:22
@rtjoa rtjoa merged commit c30ec74 into main Jan 10, 2025
24 checks passed
@rtjoa rtjoa deleted the rtjoa.check-type-recursion-without-indirection branch January 10, 2025 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants