Possible bug: binding of do/end blocks is surprising and doesn't match the documentation #15303

Open
keidax opened this issue Dec 21, 2024 · 1 comment

keidax commented Dec 21, 2024

Background

I started digging into this issue while working on a tree-sitter grammar for Crystal. I created a forum post. Based on the responses there, a GitHub issue seems like the next step.

My intent in opening this issue is to:

  • understand how much of the current behavior is intentional
  • discuss whether the behavior should be changed

Many of the examples below could be "fixed", or made to behave more predictably, simply by adding parentheses. However, I would like to see the Crystal grammar defined more precisely, not just find a working code snippet.

Do/end blocks in Crystal

I've noticed several surprising ways in which the Crystal parser handles do/end blocks. By "surprising" I mean different from Ruby, or unexplained by the documentation.

The documentation for do/end blocks says:

"The difference between using do ... end and { ... } is that do ... end binds to the left-most call, while { ... } binds to the right-most call"
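For reference, Ruby behaves as the quoted rule describes for a simple two-call chain. Here is a minimal Ruby sketch, assuming two hypothetical methods, `outer` and `inner` (not part of the preludes below), that record whether they were given a block:

```ruby
# Hypothetical helpers: each records its name and whether it received a block.
CALLS = []

def outer(_arg = nil)
  CALLS << [:outer, block_given?]
end

def inner
  CALLS << [:inner, block_given?]
end

# do ... end attaches to the left-most call (outer).
outer inner do
  :unused
end
DO_END_CALLS = CALLS.dup

CALLS.clear

# { ... } attaches to the right-most call (inner).
outer inner { :unused }
BRACE_CALLS = CALLS.dup

p DO_END_CALLS # => [[:inner, false], [:outer, true]]
p BRACE_CALLS  # => [[:inner, true], [:outer, false]]
```

`inner` is logged first in both cases because arguments are evaluated before the call that receives them.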

All the examples below use these methods:

Crystal prelude
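# Each method is defined twice: once without a block and once with one.
# Crystal treats a method that yields as a separate overload from one that doesn't,
# so both definitions coexist. This lets us see which method a block binds to,
# without compiler errors.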

def a(*args)
  puts "a did not receive block"
end
def a(*args)
  puts "a received block"
  yield
end

def b(*args)
  puts "b did not receive block"
end
def b(*args)
  puts "b received block"
  yield
end

def c(*args)
  puts "c did not receive block"
end
def c(*args)
  puts "c received block"
  yield
end

def d(*args)
  puts "d did not receive block"
end
def d(*args)
  puts "d received block"
  yield
end

def e(*args)
  puts "e did not receive block"
end
def e(*args)
  puts "e received block"
  yield
end

And here's an equivalent set of methods in Ruby that optionally accept a block:

Ruby prelude
def a(*args)
  if block_given?
    puts "a received block"
    yield
  else
    puts "a did not receive block"
  end
end

def b(*args)
  if block_given?
    puts "b received block"
    yield
  else
    puts "b did not receive block"
  end
end

def c(*args)
  if block_given?
    puts "c received block"
    yield
  else
    puts "c did not receive block"
  end
end

def d(*args)
  if block_given?
    puts "d received block"
    yield
  else
    puts "d did not receive block"
  end
end

def e(*args)
  if block_given?
    puts "e received block"
    yield
  else
    puts "e did not receive block"
  end
end

Surprise One

The first surprising behavior is that do/end blocks don't bind to the left-most call! They bind to the second call from the right:

a b c d e do
  puts "in block"
end

## Outputs ##
# e did not receive block
# d received block
# in block
# c did not receive block
# b did not receive block
# a did not receive block

In this example, the documentation implies that the block should bind to a, but it actually binds to d.

Compared to Ruby

In Ruby, the do/end block actually does bind to the left-most call:

a b c d e do
  puts "in block"
end

## Outputs ##
# e did not receive block
# d did not receive block
# c did not receive block
# b did not receive block
# a received block
# in block
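Put differently, Ruby appears to read the unparenthesized chain as nested calls with the block attached to the outermost one. Here is a small sketch to check that, assuming stand-in versions of a through e that append to a hypothetical `LOG` array instead of printing, plus a hypothetical `trace` helper:

```ruby
LOG = []

# Stand-ins for the prelude's a..e: same messages, recorded in LOG instead of printed.
%w[a b c d e].each do |name|
  Object.send(:define_method, name) do |*_args, &blk|
    LOG << "#{name} #{blk ? 'received block' : 'did not receive block'}"
    blk&.call
  end
end

# Hypothetical helper: run a snippet and return the messages it logged.
def trace
  LOG.clear
  yield
  LOG.dup
end

# The chain exactly as written in the example above.
IMPLICIT = trace do
  a b c d e do
    LOG << "in block"
  end
end

# The same chain with every call parenthesized by hand.
EXPLICIT = trace do
  a(b(c(d(e)))) do
    LOG << "in block"
  end
end

puts IMPLICIT
p IMPLICIT == EXPLICIT # => true
```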

Surprise Two

The next surprising behavior is that multiple do/end blocks may be passed to one chain of method calls. The blocks bind right-to-left, starting from the second call from the right, and do/end blocks may be mixed with {} blocks:

# Newlines added for clarity, the example works the same with newlines removed
a b c d e do
  puts "in block 1"
end do
  puts "in block 2"
end {
  puts "in block 3"
} do
  puts "in block 4"
end

## Outputs ##
# e did not receive block
# d received block
# in block 1
# c received block
# in block 2
# b received block
# in block 3
# a received block
# in block 4

The documentation does not explain this behavior.

Compared to Ruby

Ruby allows at most one {} block and one do/end block per call chain.

a b c d e { puts "in block 1" } do puts "in block 2" end

## Outputs ##
# e received block
# in block 1
# d did not receive block
# c did not receive block
# b did not receive block
# a received block
# in block 2
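Here is a sketch of how Ruby groups this line, assuming logging stand-ins for a through e (`LOG` and `trace` are hypothetical helpers, not part of the prelude). The {} block goes to e, the right-most call, and the do/end block goes to a, the left-most one:

```ruby
LOG = []

# Stand-ins for the prelude's a..e: same messages, recorded in LOG instead of printed.
%w[a b c d e].each do |name|
  Object.send(:define_method, name) do |*_args, &blk|
    LOG << "#{name} #{blk ? 'received block' : 'did not receive block'}"
    blk&.call
  end
end

# Hypothetical helper: run a snippet and return the messages it logged.
def trace
  LOG.clear
  yield
  LOG.dup
end

# One {} block and one do/end block on an unparenthesized chain.
MIXED_IMPLICIT = trace do
  a b c d e { LOG << "in block 1" } do
    LOG << "in block 2"
  end
end

# The same chain with the grouping spelled out by hand.
MIXED_EXPLICIT = trace do
  a(b(c(d(e { LOG << "in block 1" })))) do
    LOG << "in block 2"
  end
end

puts MIXED_IMPLICIT
p MIXED_IMPLICIT == MIXED_EXPLICIT # => true
```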

Surprise Three

Simply adding non-block arguments can change the block binding:

a b c d e 1, 2 do
  puts "in block"
end

## Outputs ##
# e received block
# in block
# d did not receive block
# c did not receive block
# b did not receive block
# a did not receive block

Just by adding some positional arguments to e, the block now binds to e instead of d!

Compared to Ruby

Adding positional arguments doesn't change the block binding in Ruby.

a b c d e 1, 2 do
  puts "in block"
end

## Outputs ##
# e did not receive block
# d did not receive block
# c did not receive block
# b did not receive block
# a received block
# in block

Surprise Four

This surprise was pointed out by @straight-shoota in the forum post. Adding parentheses inside the method call chain changes the block binding in unexpected ways.

First, what happens if we give d some more arguments?

a b c d 1, 2, 3, e  do
  puts "in block"
end

## Outputs ##
# e did not receive block
# d received block
# in block
# c did not receive block
# b did not receive block
# a did not receive block

The block binds to d, as we now expect from Surprise One. But what if we wrapped one of those integers in a harmless pair of parentheses?

a b c d 1, (2), 3, e  do
  puts "in block"
end

## Outputs ##
# e received block
# in block
# d did not receive block
# c did not receive block
# b did not receive block
# a did not receive block

Just like Surprise Three, the block changes binding from d to e!

Compared to Ruby

Wrapping arguments in parentheses doesn't change the block binding in Ruby.

a b c d 1, (2), 3, e  do
  puts "in block"
end

## Outputs ##
# e did not receive block
# d did not receive block
# c did not receive block
# b did not receive block
# a received block
# in block
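Here is a quick check of that claim, together with the Ruby claim from Surprise Three, again assuming hypothetical logging stand-ins for a through e (`LOG` and `trace` are illustrative helpers). In Ruby, neither extra arguments nor parenthesized arguments move the block away from a:

```ruby
LOG = []

# Stand-ins for the prelude's a..e: same messages, recorded in LOG instead of printed.
%w[a b c d e].each do |name|
  Object.send(:define_method, name) do |*_args, &blk|
    LOG << "#{name} #{blk ? 'received block' : 'did not receive block'}"
    blk&.call
  end
end

# Hypothetical helper: run a snippet and return the messages it logged.
def trace
  LOG.clear
  yield
  LOG.dup
end

# Surprise Three's shape: positional arguments on the right-most call.
WITH_ARGS = trace do
  a b c d e 1, 2 do
    LOG << "in block"
  end
end

# Surprise Four's shape: one argument wrapped in parentheses.
WITH_PARENS = trace do
  a b c d 1, (2), 3, e do
    LOG << "in block"
  end
end

puts WITH_ARGS
p WITH_ARGS == WITH_PARENS # => true
```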

Summary

  • Surprise One confuses me. The behavior described above deviates from both the Crystal documentation and Ruby's behavior. But it has also been this way for a long time, and I couldn't find any other issues or forum posts about it, so apparently it doesn't affect many people. Personally I think Ruby's behavior is much more intuitive, but switching to that behavior would be a breaking change.

  • Surprise Two follows logically from Surprise One. I think the ability to pass multiple {} blocks is a superset of what's possible in Ruby. It's not the most readable pattern, but I would be satisfied if this were documented as valid Crystal syntax:

    a b c { "block for c" } { "block for b" } { "block for a" }

  • Surprises Three and Four look like parser bugs to me. I can't think of any good reason why the current behavior should be preferred, other than backwards compatibility.

Versions

I mainly produced these examples with Crystal 1.14.0. I also tested them with older versions, going back to 1.4.0, and the behavior seems consistent.

The Ruby examples were tested with Ruby 3.3.6.

crysbot commented Dec 21, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/do-end-block-behavior-differs-from-ruby/7545/6
