Parallel Mapping using Celluloid
Celluloid Futures are wicked sweet, and when combined with a pmap
implementation AND a supervisor to keep the max threads down, you can be wicked
sweet too!
We use Celluloid Futures and a Celluloid pool to execute blocks in parallel.
pmap returns an array of values once all of the Futures have completed and returned their values (or nil).
The pool helps make sure you don't exceed your connection resources. A common use case is Rails, where you can easily exceed the default ActiveRecord connection pool size.
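The idea of mapping in parallel while capping the number of workers can be sketched with nothing but Ruby's standard library. This is not the gem's implementation (which relies on Celluloid Futures and a Celluloid pool); bounded_pmap is a hypothetical name used only for illustration, and plain threads stand in for actors.

```ruby
require 'thread'

# Stdlib-only illustration; bounded_pmap is a hypothetical helper,
# not part of celluloid-pmap.
def bounded_pmap(items, max_threads)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  results = Array.new(items.size)

  # Never start more workers than max_threads (or than there are items).
  workers = Array.new([max_threads, items.size].min) do
    Thread.new do
      loop do
        item, index = begin
          jobs.pop(true) # non-blocking pop raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        # Store by index so results come back in input order.
        results[index] = yield(item)
      end
    end
  end

  workers.each(&:join) # wait for every worker, like waiting on every Future
  results
end

bounded_pmap([1, 2, 3, 4], 2) { |n| n * 10 } # => [10, 20, 30, 40]
```

As with pmap, the caller gets the full array back only after every block has finished, and at most max_threads blocks run at any moment.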
Tony Arcieri created Celluloid, along with the simple_pmap example from which this codebase started.
I've used this implementation in several production systems over the last year. All of the complexity lives in Celluloid (not 1.0 yet, but highly stable in my experience).
I've been implementing the same initializer code in every project I've worked on for the last 6 months, so it was time to take a stand, man.
- 1.9.3
- ruby-head (2.0)
- jruby-19mode
- jruby-head
- rbx-19mode
Add this line to your application's Gemfile:
gem 'celluloid-pmap'
Default usage will execute in parallel. Simply pass a block to an Enumerable (like an Array):
puts "You'll see the puts happen instantly, and the sleep in parallel"
[55,65,75,85].pmap{|limit| puts "I can't drive #{limit}!"; sleep(rand)}
Or something more real-world?
User.active.all.pmap do |user|
  stripe_user = Stripe::Customer.retrieve user.stripe_customer_token
  user.invoices = BuildsInvoicesFromStripeUser.build(stripe_user)
  user.save
end
Problem: When using with ActiveRecord, you can quickly run out of connections.
Answer: Specify the max number of threads (actors) to create at once!
puts "You should see two distinct groups of timestamps, 3 seconds apart"
[1,2,3].pmap(2){|speed_limit| puts Time.now.tap { sleep(3) }}
=> You should see two distinct groups of timestamps, 3 seconds apart
2013-01-29 21:15:01 -0600
2013-01-29 21:15:01 -0600
2013-01-29 21:15:04 -0600
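The grouping in that output follows from simple arithmetic: with a cap of 2, the three blocks run in two waves. A rough sketch of the estimate, assuming every block takes the same amount of time (estimated_wall_time is a hypothetical helper, not something the gem provides):

```ruby
# Hypothetical helper for illustration only. Assumes every block takes
# the same number of seconds, so work proceeds in equal-sized waves.
def estimated_wall_time(item_count, max_threads, seconds_per_item)
  waves = (item_count.to_f / max_threads).ceil
  waves * seconds_per_item
end

estimated_wall_time(3, 2, 3) # => 6 (two waves of 3 seconds)
estimated_wall_time(3, 3, 3) # => 3 (everything fits in one wave)
```

Real blocks rarely take identical time, so treat this as an upper-level estimate rather than a guarantee.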
By default, pmap uses as many threads as the number of CPU cores Celluloid detects on the system.
Use pmap when you need the response right away (well, right away in the workflow sense). This is crazy good in IRB too: destroying multiple records in parallel is nice.
pmap will help:
- When the blocks are IO bound (like database or web queries)
- When you're running JRuby or Rubinius
- When you're running C Extensions
pmap will not speed up:
- Pure math or ruby computations*
*except if you're on JRuby or Rubinius, where this will still speed those along quite nicely.
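To see why IO-bound blocks benefit, here is a small stdlib-only comparison. It uses plain threads rather than Celluloid, and sleep stands in for a database or web call; fake_io and elapsed are hypothetical names chosen for this sketch.

```ruby
# sleep stands in for an IO-bound call; waiting threads do not block each other.
fake_io = ->(n) { sleep(0.2); n * 2 }

# Hypothetical timing helper: returns the seconds the block took to run.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

sequential_result = nil
sequential_time = elapsed { sequential_result = [1, 2, 3, 4].map(&fake_io) }

threaded_result = nil
threaded_time = elapsed do
  threads = [1, 2, 3, 4].map { |n| Thread.new { fake_io.call(n) } }
  threaded_result = threads.map(&:value) # value waits for the thread and returns its result
end

puts "sequential: #{sequential_time.round(2)}s, threaded: #{threaded_time.round(2)}s"
```

Both runs produce the same values, but the threaded run only has to wait for roughly one sleep instead of four. Swap the sleep for a tight arithmetic loop on MRI and the gap largely disappears, which is the caveat in the list above.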
Ben Scheirman (@subdigital) originally used the awesome Celluloid He-Man image in a presentation on background workers. "He-Man and the Masters of the Universe," AND "She-Ra: Princess of Power" are copyright Mattel.
More information on He-Man can be found at the unspeakably wow site: http://castlegrayskull.org
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request