RFC 2: ExecutionContext #15302
Draft
ysbaddaden wants to merge 20 commits into crystal-lang:master from ysbaddaden:feature/execution-contexts
+3,127 −167
Conversation
- Avoid spaces in fiber names (easier to columnize)
- Support more types (Bool, Char, Time::Span)
Writes the message to a growable, in-memory buffer before writing it to STDERR in a single write (possibly atomic, depending on PIPE_BUF), instead of issuing many writes to the IO that would be intermingled with other writes and become completely unintelligible.
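A minimal sketch of the pattern, assuming we only need to format one line and flush it in a single call (not the actual implementation in this commit):

```crystal
# Build the whole message in memory, then hand it to the OS in a single write.
message = IO::Memory.new
message << "fiber=" << (Fiber.current.name || "?") << " event=resume" << '\n'
STDERR.write(message.to_slice)
```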
Allows changing the `Thread#name` property without affecting the system name used by the OS, which affects ps, top, gdb, ... We usually want to change the system name, except for the main thread: if we name the main thread "DEFAULT" then ps will report our process as "DEFAULT" instead of "myapp" (oops).
Also adds `Thread.each(&)` and `Fiber.each(&)`.
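For illustration, a quick sketch of what these iterators enable (names may be nil unless they were set explicitly):

```crystal
# List the live threads and fibers by name.
Thread.each { |thread| puts thread.name }
Fiber.each { |fiber| puts fiber.name }
```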
Simple abstraction on top of a mutex and condition variable to synchronize the execution of a set of threads.
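A rough sketch of the idea, assuming a plain counter guarded by the existing Thread::Mutex and Thread::ConditionVariable primitives (the actual Thread::WaitGroup in this PR may differ):

```crystal
class WaitGroupSketch
  def initialize(@count : Int32)
    @mutex = Thread::Mutex.new
    @cond = Thread::ConditionVariable.new
  end

  # Called by each participating thread when it finishes its work.
  def done : Nil
    @mutex.synchronize do
      @count -= 1
      @cond.broadcast if @count == 0
    end
  end

  # Blocks the calling thread until every participant has called #done.
  def wait : Nil
    @mutex.synchronize do
      while @count > 0
        @cond.wait(@mutex)
      end
    end
  end
end
```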
In an MT environment the main thread's fiber may be resumed by any thread, and that thread may terminate the program, returning from a thread other than the process' main thread, which is problematic. This patch instead explicitly exits from `main` and `wmain`. For backward compatibility reasons (win32's `wmain` and wasi's `__main_argc_argv` call `main`), the default `main` still returns, and is only replaced for UNIX targets.
- Add the `ExecutionContext` module;
- Add the `ExecutionContext::Scheduler` module;
- Add the `execution_context` compile-time flag.

When the `execution_context` flag is set:

- Don't load `Crystal::Scheduler`;
- Plug `ExecutionContext` instead of `Crystal::Scheduler`.
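A sketch of how such a compile-time flag can switch implementations; the require paths are illustrative, not the actual prelude wiring:

```crystal
{% if flag?(:execution_context) %}
  # the new ExecutionContext-based runtime
  require "./execution_context"
{% else %}
  # the legacy scheduler
  require "./crystal/scheduler"
{% end %}
```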
Introduces the first EC scheduler, which runs in a single thread. Uses the same queues (Runnables, GlobalQueue) as the multi-threaded scheduler that will come next. The Runnables local queue could be simplified (no parallel accesses, hence no need for atomics) at the expense of duplicating the implementation. The scheduler doesn't need to actively park the thread, since the event loops always block (when told to), even when there are no events, which acts as parking the thread.
Reduces discrepancies with the IOCP::FiberEvent and fixes a couple issues: 1. No need to tie the event to a specific event loop; 2. Clear wake_at _after_ dequeueing the timer (MT bug).
ExecutionContext will change the `Thread#name` property of the main thread but won't set the system name. The spec doesn't make that distinction and we currently have no means to get the system name.
Introduces the second EC scheduler that runs in multiple threads. Uses the thread-safe queues (Runnables, GlobalQueue). Contrary to the ST scheduler, the MT scheduler needs to actively park the thread in addition to waiting on the event loop, because only one thread is allowed to run the event loop.
Introduces the last EC scheduler, which runs a single fiber in a single thread. Contrary to the other schedulers, concurrency is disabled. Like the ST scheduler, it doesn't need to actively park the thread and merely waits on the event loop.
The method is called from IO::FileDescriptor and Socket finalizers, which means they can be run from any thread during GC collections, yet calling an instance method means accessing the current event loop, which may not have been instantiated yet for the thread. Fix: replace typeof with a backend_class accessor. The API won't change if we start having two potential event loop implementations in a single binary (e.g. io_uring with a fallback to epoll).
Qard reviewed on Dec 21, 2024:
Looks very cool. I’m excited for this! 🚀
Here finally comes the first draft pull request for RFC 2 extracted from the execution_context shard.
Status
All three execution contexts have been implemented:
- ExecutionContext::SingleThreaded to create a concurrent-only context (the fibers run on a single thread, but will run in parallel to fibers in other contexts); this is the default context (unless you specify `-Dmt`).
- ExecutionContext::MultiThreaded to create a concurrent+parallel context with work stealing (fibers in the context may be resumed by any thread); this is the default context if you specify `-Dmt`.
- ExecutionContext::Isolated to run a single fiber in a dedicated thread (no concurrency, no parallelism), while still being able to communicate with other fibers in other contexts normally (Channel, Mutex, ...), do IO operations, or spawn fibers (transparently to another context).

Both the single and multi threaded contexts share the same queues and overall logic, but with different optimizations. The isolated context doesn't need any queues and relies on a special loop. It's the only context that can shut down for the time being (we may implement cooperative shutdown or shrinking/growing the MT context).
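A hypothetical usage sketch, based on the execution_context shard this PR was extracted from; the constructor arguments and the context-level #spawn shown here are assumptions and may differ in the final API:

```crystal
workers = ExecutionContext::MultiThreaded.new("workers", 4) # 4 threads, work stealing
single = ExecutionContext::SingleThreaded.new("single")     # 1 thread, concurrency only

done = Channel(Nil).new(2)

workers.spawn do
  # may be resumed by any of the "workers" threads
  done.send(nil)
end

single.spawn do
  # always runs on its context's single thread, in parallel to other contexts
  done.send(nil)
end

2.times { done.receive }
```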
Alongside the execution contexts, a monitoring thread is running. For the moment it is limited to collecting fiber stacks regularly, but it shall evolve (in subsequent pull requests) to handle many more situations by monitoring the execution contexts. For example, the shard has a proof-of-concept for a cooperative yield for fibers that have been running for too long (which may be checked at cancellation points or manually in CPU-heavy loops). See the TODO in monitor.cr for more exciting ideas.
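Until that lands, a CPU-heavy loop can cooperate manually with the existing Fiber.yield API; a minimal sketch (the helper and its yield threshold are made up for illustration):

```crystal
def crunch(data : Array(Int32)) : Int64
  sum = 0_i64
  data.each_with_index do |value, i|
    sum += value
    # periodically let other fibers of the context run
    Fiber.yield if i % 10_000 == 0
  end
  sum
end
```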
Stability

Over the development of the shard, the schedulers proved hard to rid of race conditions, but all known races have been squashed, and the schedulers have proved to be quite stable (and fast).
So far both the ST and MT contexts can run the crystal std specs... save for:
Usage
The feature is opt-in for the time being.
You must compile your application with both the -Dexecution_context (to use the ExecutionContext schedulers) and -Dpreview_mt compile-time flags.
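For example, assuming an app.cr entry point (the file name is illustrative):

```crystal
# Build with: crystal build -Dexecution_context -Dpreview_mt app.cr
#
# Plain `spawn` keeps working and schedules the fiber on the default context.
channel = Channel(String).new
spawn { channel.send("hello from the default context") }
puts channel.receive
```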
Notes

A number of individual commits peripheral to the current feature have already been extracted into individual pull requests. The ones that haven't been are those that may still change as this PR continues to evolve, or that may be dropped (e.g. Thread::WaitGroup).