RFC 2: ExecutionContext #15302

Draft: wants to merge 20 commits into master
Conversation

ysbaddaden (Contributor)

Here finally comes the first draft pull request for RFC 2 extracted from the execution_context shard.

Status

All three execution contexts have been implemented:

  • ExecutionContext::SingleThreaded creates a concurrency-only context: its fibers run on a single thread, but in parallel to fibers in other contexts. This is the default context (unless you specify -Dmt).

  • ExecutionContext::MultiThreaded creates a concurrent and parallel context with work stealing: fibers in the context may be resumed by any of its threads. This is the default context if you specify -Dmt.

  • ExecutionContext::Isolated runs a single fiber in a dedicated thread (no concurrency, no parallelism), while still being able to communicate normally with fibers in other contexts (Channel, Mutex, ...), do IO operations, or spawn fibers (transparently into another context).
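Assuming the constructor names and `#spawn` API from the execution_context shard (all identifiers below come from the shard and may change as this draft evolves), usage could look roughly like this:

```crystal
# Sketch based on the execution_context shard; the exact API may differ
# in the final stdlib version.

# Concurrency-only context: fibers share one thread.
workers = ExecutionContext::SingleThreaded.new("workers")
workers.spawn(name: "job") do
  # runs concurrently with other fibers in `workers`,
  # in parallel to fibers in other contexts
end

# Concurrent + parallel context: work stealing across 4 threads.
pool = ExecutionContext::MultiThreaded.new("pool", size: 4)
pool.spawn do
  # may be resumed by any of the pool's 4 threads
end

# One fiber pinned to a dedicated thread: no concurrency, no
# parallelism, but Channel, Mutex, IO and spawning still work.
ExecutionContext::Isolated.new("main-loop") do
  # e.g. run a blocking C library's event loop here
end
```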

Both the single and multi threaded contexts share the same queues and overall logic, but with different optimizations. The isolated context doesn't need any queues and relies on a special loop. It's the only context that can shut down for the time being (we may implement cooperative shutdown, or shrinking/growing the MT context).

Alongside the execution contexts, a monitoring thread is running. For the moment it is limited to collecting fiber stacks regularly, but it shall evolve (in subsequent pull requests) to handle many more situations by monitoring the execution contexts. For example, the shard has a proof-of-concept for a cooperative yield for fibers that have been running for too long (it may be checked at cancellation points, or manually in CPU-heavy loops). See the TODO in monitor.cr for more exciting ideas.

Stability

Over the course of developing the shard, the schedulers proved hard to rid of race conditions; still, all known races have been squashed, and the schedulers have proved to be quite stable (and fast).

So far both the ST and MT contexts can run the crystal std specs... save for:

  • MT: some sporadic segfaults 😭 maybe because the same-thread fiber assumption is being broken, or maybe an issue with threads starting up or shutting down in parallel to GC collections.
  • ST: a GC bug where the GC will sometimes enter an infinite loop trying to allocate a large object (see Infinite loop when trying to allocate large object (v8.2.8) ivmai/bdwgc#691); the issue can't be reproduced with GC 8.3 (unreleased).

Usage

The feature is opt-in for the time being.

You must compile your application with both the -Dexecution_context compile-time flag (to use the ExecutionContext schedulers) and the -Dpreview_mt compile-time flag.
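Concretely, building with both flags looks like this (the file name is illustrative):

```shell
crystal build -Dexecution_context -Dpreview_mt myapp.cr
```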

Notes

A number of individual commits peripheral to the current feature have already been extracted into individual pull requests. Those that haven't been extracted may still change as this PR continues to evolve, or may be dropped entirely (e.g. Thread::WaitGroup).

- Avoid spaces in fiber names (easier to columnize)
- Support more types (Bool, Char, Time::Span)
Writes a message to a growable, in-memory buffer before writing to
STDERR in a single write (possibly atomic, depending on PIPE_BUF),
instead of issuing many small writes to the IO that would be
intermingled with other writes and become completely unintelligible.
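The technique reads, in plain Crystal, roughly like this (a hypothetical sketch, not the actual patch):

```crystal
# Accumulate the whole message in memory first...
buf = IO::Memory.new
buf << "fiber=main thread=0x1234 message=hello" << '\n'

# ...then emit it in a single write: concurrent threads' messages can
# no longer interleave mid-line (writes of up to PIPE_BUF bytes to a
# pipe are atomic on POSIX systems).
STDERR.write(buf.to_slice)
```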
Allows changing the Thread#name property without affecting the system
name used by the OS, which affects ps, top, gdb, ...

We usually want to change the system name, except for the main thread.
If we name the thread "DEFAULT" then ps will report our process as
"DEFAULT" instead of "myapp" (oops).
Also adds `Thread.each(&)` and `Fiber.each(&)`.
Simple abstraction on top of a mutex and condition variable to
synchronize the execution of a set of threads.
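Such a primitive could be sketched as follows (hypothetical; the actual Thread::WaitGroup in this PR may differ and may still be dropped):

```crystal
# Hypothetical sketch of a wait-group built on a mutex + condition
# variable; not the actual Thread::WaitGroup from this PR.
class SketchWaitGroup
  def initialize(@count : Int32)
    @mutex = Thread::Mutex.new
    @cond = Thread::ConditionVariable.new
  end

  # Called by each thread when it finishes its work.
  def done : Nil
    @mutex.synchronize do
      @count -= 1
      @cond.broadcast if @count == 0
    end
  end

  # Blocks the caller until every thread has called #done.
  def wait : Nil
    @mutex.synchronize do
      @cond.wait(@mutex) until @count == 0
    end
  end
end
```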
In an MT environment the main thread's fiber may be resumed by any
thread, and that thread may terminate the program, returning from a
thread other than the process' main thread, which may be problematic.

This patch instead explicitly exits from `main` and `wmain`.

For backward compatibility reasons (win32's `wmain` and wasi's
`__main_argc_argv` call `main`), the default `main` still returns, and
is replaced for UNIX targets.
- Add the `ExecutionContext` module;
- Add the `ExecutionContext::Scheduler` module;
- Add the `execution_context` compile-time flag.

When the `execution_context` flag is set:

- Don't load `Crystal::Scheduler`;
- Plug `ExecutionContext` instead of `Crystal::Scheduler`.
Introduces the first EC scheduler that runs in a single thread. Uses the
same queues (Runnables, GlobalQueue) as the multi-threaded scheduler
that will come next. The Runnables local queue could be simplified (no
parallel accesses, hence no need for atomics) at the expense of
duplicating the implementation.

The scheduler doesn't need to actively park the thread, since the event
loops always block (when told to), even when there are no events, which
acts as parking the thread.
Reduces discrepancies with the IOCP::FiberEvent and fixes a couple
issues:

1. No need to tie the event to a specific event loop;
2. Clear wake_at _after_ dequeueing the timer (MT bug).
ExecutionContext will change the Thread#name property of the main thread
but won't set the system name. The spec doesn't distinguish between the
two, and we currently have no means to get the system name.
Introduces the second EC scheduler that runs in multiple threads. Uses
the thread-safe queues (Runnables, GlobalQueue).

Contrary to the ST scheduler, the MT scheduler needs to actively park
the thread in addition to waiting on the event loop, because only one
thread is allowed to run the event loop.
Introduces the last EC scheduler that runs a single fiber in a single
thread. Contrary to the other schedulers, concurrency is disabled.

Like the ST scheduler, the scheduler doesn't need to actively park the
thread and merely waits on the event loop.
The method is called from IO::FileDescriptor and Socket finalizers,
which means they can be run from any thread during GC collections, yet
calling an instance method means accessing the current event loop, which
may not have been instantiated yet for the thread.

Fix: replace typeof for backend_class accessor

The API won't change if we start having two potential event loop
implementations in a single binary (e.g. io_uring with a fallback to
epoll).
@Qard left a comment

Looks very cool. I’m excited for this! 🚀

Project status: Review

2 participants