Why does CSP use a synchronous execution model? #203
-
I have been studying the CSP documentation and was wondering about the design motivations for the execution model. Many engines, such as Flink and Heron, rely on an asynchronous execution model and use watermarks to mark barriers for synchronization. CSP uses a synchronous model instead.
Replies: 2 comments 4 replies
-
Can you elaborate on what you mean by "asynchronous" execution? Are you referring to the fact that the csp engine is single-threaded?
-
Also, Flink uses watermarks to handle out-of-order events, since per their docs: "When it comes to supporting event time, Flink’s streaming runtime builds on the pessimistic assumption that events may come out-of-order, i.e. an event with timestamp t may come after an event with timestamp t+1." Since…
Parallel or distributed csp still remains a topic of discussion and an area for future growth. The primary reason that the engine runs on a single thread is GIL constraints. Since users can write pure Python nodes, which must acquire the GIL, multithreading node execution gets hairy. Rank-level parallelization (what you are suggesting, passing nodes off into a thread pool) will only work to our advantage if all nodes use a C++ implementation.
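To make the rank-level idea concrete, here is a rough sketch in plain Python. It is not the csp API and not how the engine is implemented: the graph, `compute_ranks`, `group_by_rank`, `run_rank_parallel` and `concat` are all hypothetical names, and it assumes "rank" means the longest distance of a node from any source. Nodes in the same rank do not depend on each other, so they are the ones that could be handed to a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph (not csp): node name -> list of upstream node names.
GRAPH = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["c", "d"],
}

def compute_ranks(graph):
    """Assumed definition: rank = longest path length from any source node."""
    ranks = {}

    def rank_of(node):
        if node not in ranks:
            # Sources have no upstream nodes, so max() falls back to -1 -> rank 0.
            ranks[node] = 1 + max((rank_of(up) for up in graph[node]), default=-1)
        return ranks[node]

    for node in graph:
        rank_of(node)
    return ranks

def group_by_rank(graph):
    """Return the nodes grouped into lists, ordered by increasing rank."""
    by_rank = defaultdict(list)
    for node, rank in compute_ranks(graph).items():
        by_rank[rank].append(node)
    return [sorted(by_rank[r]) for r in sorted(by_rank)]

def run_rank_parallel(graph, work, max_workers=4):
    """Execute one rank at a time; nodes within a rank are submitted to the pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for nodes in group_by_rank(graph):
            # All upstream results already exist, so these calls are independent.
            futures = {
                n: pool.submit(work, n, [results[u] for u in graph[n]])
                for n in nodes
            }
            # Waiting on every future acts as the barrier before the next rank.
            for n, fut in futures.items():
                results[n] = fut.result()
    return results

def concat(name, inputs):
    """Stand-in for a node's computation: records which inputs it consumed."""
    return name + "(" + ",".join(inputs) + ")"

levels = group_by_rank(GRAPH)
outputs = run_rank_parallel(GRAPH, concat)
```

Because `concat` is pure Python, the worker threads here still take turns holding the GIL, so this only illustrates the scheduling structure, not a speedup. That is the constraint described above.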
A secondary reason is that, even with C++ nodes, many are very quick computations, so the overhead of maintaining the thread pool and its synchronization is greater than the cost of the node itself. For example, baselib nodes like sample, merge etc. are all extremely…