Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

union, intersection and overlaps? for Ranges #15106

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

CTC97
Copy link
Contributor

@CTC97 CTC97 commented Oct 21, 2024

Apologies for the earlier PR from my forked master branch.

Based on the request from @jgaskins. Behaviors below:

a = 1..5
b = 3..7
a.union(b)        # => 1..7
a.intersection(b) # => 3..5

c = 1..10
d = 1...15
c.union(d)        # => 1...15
c.intersection(d) # => 1..10

e = 1...10
f = 1..15
e.union(f)        # => 1..15
e.intersection(f) # => 1...10

g = 1..5
h = 10..20
g.union(h)        # => 0..0
g.intersection(h) # => 0..0

(1...10).union(1..10)        # => 1..10
(1...10).intersection(1..10) # => 1...10

(1...10).union(10..20) # => 1..20

(1..5).union(5..10) # => 1..10

(1...10).intersection(7..12) # => 7...10
(2..10).intersection(0..8)   # => 2..8

(1...10).overlaps?(10..11)   # => false
(1..10).overlaps?(5..9) # => true

Fixes #14487
Fixes #14488
Fixes #10148


P.S., very excited to be making a small contribution to a lovely language.

Looking forward to review!

@jgaskins
Copy link
Contributor

This looks like a great start. A couple areas of focus:

  • Ranges can be nil on either end
  • It would be awesome to see some tests — this functionality can be super confusing and, honestly, I wouldn't be able to tell whether any implementation was correct when all edge cases are considered

src/range.cr Outdated
# ```
def union(other : Range)
if self.end < other.begin || other.end < self.begin
return 0..0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can get away with hardcoding a Range(Int32, Int32) specifically. My own use case for this functionality is actually Range(Time, Time), so I'd get compile-time errors with this return value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and thank you for the quick review!

I'll get on these changes asap! ✌️⭐

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we revert from 0..0 to the originally suggested nil? I think it makes sense, but open to other options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using any of the four endpoints here instead of 0 should probably work too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me nil seems like a more logical return value than using any of the existing endpoints which might be confusing.

Totally open to your reasoning though!

Copy link
Contributor

@ysbaddaden ysbaddaden Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think the intersection of a list returning a nil vs an empty list: which feels more logical? Which is less of a burden?

Returning nil means that the methods always return a nilable type (Range(B, E) | Nil) and we must always check if the value is nil or not before we can do anything.

Returning an empty range means we always return a Range(B, E) that would act as a NOOP object, and if we need to know if the range is empty, we can call Range#empty?

Copy link
Contributor

@ysbaddaden ysbaddaden Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: 0..0 wasn't an empty range, it's a range with a single value (starts at 0 and terminates at 0 inclusive):

(0..0).to_a # => [0]
(0...0).to_a # => []

@Blacksmoke16 Blacksmoke16 changed the title union, intersection and overlaps? for Ranges union, intersection and overlaps? for Ranges Oct 21, 2024
@Blacksmoke16
Copy link
Member

Some specs would be a good idea as well 👍.

@jgaskins
Copy link
Contributor

Tests and specs? In this economy?

@CTC97
Copy link
Contributor Author

CTC97 commented Oct 22, 2024

Added some specs (edit: will go back and add tests for other types of Ranges) and went with the nil return as originally suggested in @jgaskins' issue.

On error handling - should I be adding any type handling in the functions themselves or is an error message like below sufficient?

(1..5).union(time_range)

---

In src/range.cr:263:25

 263 | if self.end < other.begin || other.end < self.begin
                           ^----
Error: expected argument #1 to 'Int32#<' to be Float32, Float64, Int128, Int16, Int32, Int64, Int8, Number, UInt128, UInt16, UInt32, UInt64 or UInt8, not Time

or

(time_range).union(1..5)

---

In src/range.cr:263:25

 263 | if self.end < other.begin || other.end < self.begin
                           ^----
Error: expected argument #1 to 'Time#<' to be Time, not Int32

@straight-shoota
Copy link
Member

I don't think we can do much about improving these error message anyways. That would require knowing for which types the comparison operator is defined, and we cannot know this. We mustn't restrict it to identical types because there are many examples where different typs are still comparable.

@CTC97
Copy link
Contributor Author

CTC97 commented Oct 23, 2024

Added some additional tests for other datatypes (let me know if there are any more that should be spec'd), but running into an interesting issue - detailed over one of the tests.

  # TODO: This test will fail - 
  #   In an integer setting this case is doable. We can simply check adjaceny with +-1.
  #   However, how do we define an adjacent range for other data types? 
  #     Examples of adjacency ambiguity:
  #       - Floats: Is 4.1's upper adjacent value 4.2 or 4.11, etc.?
  #       - Times: Is a time's adjacent value +1 second or +1 hour, etc.?
  # 
  #   How do we define adjacency?
  #   
  # (1..5).union(6..10).should eq(1..10)      # Adjacent ranges

Some of these intended adjacencies would depend on the user's use case, but I'm not sure there's a reliable way to allow the user that kind of customizability. Is there a standard we should follow?

@straight-shoota
Copy link
Member

straight-shoota commented Oct 23, 2024

Adjacency can only work for types with (meaningfully) discrete values. This should be indicated by #succ and #pred methods defined on the item type. Range#each (and #reverse_each) are based on that.
Technically, most (all?) comparable types have discrete steps in some form, but they're not always meaningful. Float#next_float and time + 1.nanosecond would return adjacent values, but those are defined in the representation and not very useful in practice.

So for types that define #succ or #pred, we could discover adjacency. For all others, to form a union the ranges must at least touch. This can match exclusive ranges though: (1.0...5.0).union(5.0..10.0).should eq(1.0..10.0). This would be the only case where two ranges with continuous values could form a valid union without overlap.

#
# How do we define adjacency?
#
# (1..5).union(6..10).should eq(1..10) # Adjacent ranges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the Range documentation you'd use #succ and #pred:

Ranges typically involve integers, but can be created using arbitrary objects as long as they define succ (or pred for #reverse_each), to get the next element in the range, and < and #==, to know when the range reached the end:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the Range documentation you'd use #succ and #pred:

Ranges typically involve integers, but can be created using arbitrary objects as long as they define succ (or pred for #reverse_each), to get the next element in the range, and < and #==, to know when the range reached the end:

Making this change this weekend. ✨

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 23, 2024

What about some edge cases? For example empty ranges:

(1...1).intersect(0..10) #=> empty

src/range.cr Outdated Show resolved Hide resolved
@CTC97
Copy link
Contributor Author

CTC97 commented Oct 25, 2024

What about some edge cases? For example empty ranges:

(1...1).intersect(0..10) #=> empty

I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)

src/range.cr Outdated Show resolved Hide resolved
Copy link
Contributor

@Sija Sija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil is implicit with no return value.

src/range.cr Outdated Show resolved Hide resolved
src/range.cr Outdated Show resolved Hide resolved
@CTC97
Copy link
Contributor Author

CTC97 commented Nov 5, 2024

Apologies for the untimely updates here - been a bit of a hectic few weeks, but will be jumping back into this asap! I have not forgotten! 😅

src/range.cr Outdated Show resolved Hide resolved
src/range.cr Outdated Show resolved Hide resolved
src/range.cr Outdated Show resolved Hide resolved
src/range.cr Outdated Show resolved Hide resolved
@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 14, 2024

I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)

Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?

@CTC97
Copy link
Contributor Author

CTC97 commented Nov 16, 2024

Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?

I could imagine some edge effects, but only if the end user was comparing data to either the beginning or end of the empty range without considering context of the full object (i.e., both ends).

I don't think it's unreasonable to expect users to handle that sort of return if their use case could result that way, but I'm open to other thoughts.

I think I'm also a little unclear as to why nil isn't a valid return in such a case, but maybe I'm missing something.

@ysbaddaden
Copy link
Contributor

It's not that nil is an invalid value, but it's cumbersome. Anytime you intersect you MUST deal with a nilable, when an empty range would be a transparent NOOP.

@oprypin oprypin mentioned this pull request Dec 2, 2024
@CTC97
Copy link
Contributor Author

CTC97 commented Dec 4, 2024

Coming back to this - what do people think is a reasonable empty range to return in the edge cases discussed above? I'm not sure if we've reached a solid conclusion.

@CTC97
Copy link
Contributor Author

CTC97 commented Dec 6, 2024

I implemented empty ranges with the logic @straight-shoota describes in #10148, which describes a #clamp function (which is interchangeable with #intersect):

If both ranges are disparate ((-5..5).clamp(10..100)), the result should probably be an empty range. This can be expressed as an exclusive range with identical begin and end (x...x). But what should x be? Probably best to use the begin of the clamped range (-5...-5).

I think this is a good resolution as it logically makes sense and won't require the user to handle nil.

I've also added #10148 to this PR, but I think there's a case to be made for giving #intersection an alias #clamp, which users might find intuitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants