Skip to content

Declarative concurrency and stream processing library for Scala

License

Notifications You must be signed in to change notification settings

MateuszKowalewski/libretto

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Libretto

Maven Central

Declarative concurrency and stream processing library for Scala.

Motivation

... or why another concurrency and stream processing library.

Libretto grew out of frustration with existing libraries. Here is an attempt to summarize the pain points.

  • Dissatisfaction with effects being a prerequisite for concurrency.

    Existing libraries tend to build concurrency on top of effects: completing a Promise is an effect, spawning an actor or a fiber is an effect, enqueueing a message in a queue is an effect.

    Libretto rejects the assumption that effects are a prerequisite for concurrency.

    Although stream processing libraries do provide some high-level operations for declarative concurrency that abstract away from the underlying effects, we haven't come across a library with a set of high-level operations that is expressive enough for a moderately complex application.

  • Underdelivery on the promise of writing mere descriptions or blueprints of programs.

    Substantial parts of such "descriptions" are opaque Scala functions.

    Moreover, even in an IO monad-style program, we still manipulate live objects with identities, such as fibers, mutable references, queues, "pre-materialized" blueprints, ...

    That does not fit our idea of writing mere descriptions of programs.

  • The cognitive load of making sure all the wires are connected.

    In ensuring that

    • each Promise is completed and completed only once,
    • every spawned computation (Future, actor, fiber) is awaited,
    • all streams are consumed, all sinks are fed,
    • all messages from a queue are processed (or intentionally discarded),

    programmers are typically left on their own, with no help from the type system.

  • Callbacks everywhere.

    It is disappointing how callback-ridden our programs, including our functional programs, are when it comes to concurrency and accessing resources.

    By a callback we mean a function object that performs effects (on resources it has captured) and is passed to someone else to be executed.

    One common example of a callback-style interface is an HTTP server taking, during construction, a request handler callback. The request handler has captured resources, such as a reference to mutable state or a database connector, and performs side-effects on them.

    The problem with callbacks is that they create non-local interaction, which makes it hard to reason about programs.

  • Lack of expressivity of the high-level stream operations.

    Building custom dataflow topologies with existing libraries, if at all possible, requires escaping to a different paradigm, one which is more low-level and imperative.

    We are talking about that case when you needed a slight variation on a stream operator and ended up "manually" shoveling messages between queues, mutable references and/or promises.

  • Inability to express protocols of interaction in types.

    In practice this leads to having to handle a lot of illegal state.

    To give an example, consider a protocol between two components, Alice and Bob, according to which Alice sends A to Bob, Bob replies with B or C, and then Alice replies with D if Bob sent B and with E if Bob sent C.

    In a common (but unsatisfactory) implementation, Bob formally accepts any of A, D, E at all times and raises or replies with an error if the protocol is violated, i.e. if it is not in a state in which it can accept the message.

Features

  • Concurrent by design

    Libretto programs are naturally concurrent, without special control-flow operators.

  • Session types

  • Statically checked linearity

    That every producer is connected to a consumer, etc., is guaranteed statically, i.e. before execution(*).

    (*) For programs written in point-free style, this is at compile time. For programs written using λ-notation, it is at assembly time, i.e. when converting to the point-free representation.

  • Direct-style programming

    All interaction of a component with its environment is stated by its type signature (much like inputs and outputs of a pure function).

    This is in contrast to callback-style which leads to spooky action at a distance. See also What's the problem with callbacks?.

  • Expressiveness

    Libretto provides a self-contained DSL that is powerful enough to express many different concepts without using escape hatches to the underlying layer.

    Streams themselves and dynamic stream topologies are expressible without needing built-in support.

  • Graphical notation

    We are often able to visualize the architecture of a system as a diagram, but there is usually a big gap between the diagram and the code.

    Libretto primitives have graphical notation that composes into larger diagrams as we compose the program, preserving close correspondence between code and graphical representation.

  • Programs as data

    Libretto programs are just data structures. This opens new possibilities: they can be inspected, manipulated, optimized, given different interpretations, used for code generation, sent over the network, ... It also means that many different Libretto implementations are possible.

    See also Are Libretto programs any more amenable to inspection than IO monad programs?

Libretto in a Nutshell

Libretto takes inspiration from linear logic, with a twist of interpreting the multiplicative conjunction, ⊗, in a concurrent fashion.

The correspondence of linear logic to certain monoidal categories led to a point-free notation that enforces linearity statically, despite the lack of linear types in the host language.

There is also lambda syntax, whose linearity checking is deferred until assembly time (i.e. when constructing the Libretto blueprint, before execution).

Primitives for racing and recursion were added.

Documentation

Scaladoc (Found something undocumented or not documented clearly? We should do better. Do not hesitate to submit a documentation request.)

Tutorial

Presentations

Typed Interaction with Session Types (using Scala and Libretto). Functional Scala 2022 [video] [slides]

Custom Stream Operators Made Safe And Simple with Libretto. Scalar 2023 [video] [slides]

Concurrent All The Way Down (Functional Concurrency with Libretto). LambdaDays 2023 [slides]

Caveats

It is all too common for software projects to highlight only the good parts or to overstate their features. We value your time and want to be absolutely honest about the limitations. However, it is hard to know what we are missing. That's why we need your help.

Do you find that libretto does not hold up to some of its promises? Do you find that the project description omits some important limitation? Do you feel that you have spent too much time to find out something that could have been presented more readily? Please, let us know.

Batteries not included. The streams library is under-developed. I/O library (file, network) is non-existent at the moment. There are currently no connectors or wrappers for things like HTTP servers, DB connectors, message brokers, ...

Not yet a good story for supervision/error recovery.

Flawed proof-of-concept implementation. The current implementation is leaky. It is not a design flaw, just an implementation flaw.

Best practices not established. Programming in Libretto is quite different from conventional concurrency libraries. It is not clear what the best practices should be.

Not used to this level of concurrency. In conventional programming languages and libraries, concurrency is a scarce special thing. In Libretto, concurrency is the default and sequentiality is a special thing. The consequences might be surprising.

Q&A

Did not find an answer to your question? Do not hesitate to ask us.

What is Libretto for?

Libretto is a library for implementing concurrent systems and stream processing in Scala. Its focus is on the programming model, i.e. the generic glue to structure concurrent programs, rather than any specific use cases.

Who is Libretto for?

You are more likely to appreciate Libretto if you:

  • cannot function without static types;
  • are into functional and/or declarative programming;
  • like when the compiler is guiding you (type-driven development);
  • hate falling back to imperative code (like the IO monad) when it comes to concurrency;
  • have hit an expressiveness limit of an existing library when you needed a slightly non-trivial data-flow topology.

Is Libretto production-ready?

No. See also Caveats.

The goal for now is to present a different approach to concurrent programming and to excite a small number of enthusiasts to play with it, explore it and push it further.

The expectation is that people will find the Libretto approach worthwhile regardless of unstable API, non-existent ecosystem of libraries, flawed proof-of-concept implementation, or unknown performance characteristics.

What do you mean by declarative, anyway?

Declarative programming is a programming paradigm in which we describe the computation without giving instructions to the underlying execution model. Examples of such instructions are updating a shared mutable variable or spawning or joining a thread/fiber/actor.

Instructions to the underlying execution model are side-effects, so we can say that declarative programs are programs without side-effects. In particular, all expressions in a declarative program must be referentially transparent.

Aren't IO monad-style programs referentially transparent as well?

Not really.

They are referentially transparent from the point of view of the host language (Scala). The expression IO { println("Hi!") } is referentially transparent in that given

val x = IO { println("Hi!") }

we can freely replace occurences of x by IO { println("Hi!") } (and vice versa) without changing the meaning of the program. There is no denying that this already has great benefits for compositionality and reasoning about programs.

However, if we view the IO monad as a new semantic level and having to wrap some expressions in IO { ... } as just a syntactic annoyance for the sake of embedding IO programs in Scala, then referential transparency at this new IO level would require the following two programs to have the same meaning

for {
  x <- IO { println("Hi!") }
  y <- IO { println("Hi!") }
} yield y
for {
  x <- IO { println("Hi!") }
  y <- IO { x }
} yield y

but they do not.

Saying that (the expressions in) these programs are referentially transparent (even though at the Scala level they are) is somewhat like saying that any program is referentially transparent when viewed as a text file in a file system.

Note: The way Libretto circumvents the issue of programs like these having different behavior is linearity: If, by some stretch of the imagination, these were Libretto programs, the first one would not typecheck, because x is unused.

How can Libretto statically guarantee that each resource is consumed exactly once when Scala does not have linear type system features?

Libretto programs are not composed of Scala functions A => B. They are composed of Libretto's linear functions A -⚬ B, which are linear by construction.

What exactly are the primitives of Libretto, from which everything else is derived?

There is a hierarchy of DSLs, partially ordered by their power. At the bottom, i.e. the weakest, is currently CoreDSL.

An extension of CoreDSL that is of particular interest is Scaletto. It adds support for using Scala values and pure Scala functions, managing resources that are Scala objects, performing effects on those resources, and marking thread-blocking Scala calls.

Does libretto support dynamic stream topologies?

Yes.

By dynamic stream topologies we mean data flows that do not have a rigid structure predetermined at compile-time, but depend on runtime values or runtime load.

Do libretto streams support feedback loops?

Yes, loops, where a stream from system's output is fed back into the system's input can be implemented easily. Note, however, that sending data around the system might not be a great way to do feedback. As data in Libretto can flow in both directions, the feedback can flow backwards through the system.

Is there support for timers?

Yes, the basic operation is delay:

def delay(d: FiniteDuration): Done -⚬ Done

How do I communicate with the external world (I/O)?

CoreDSL does not have any means of I/O.

Its extension, Scaletto, has means to wrap Scala resources and perform effects on them.

Another approach, although much more involved, is to

  • define a custom extension of CoreDSL that provides all the primitives necessary for a particular problem domain,
  • provide implementation of that custom DSL,
  • write programs against the custom DSL.

Does libretto support supervision of a subsystem?

Not yet.

See #8.

Can I execute different parts of the system on different execution contexts/thread pools?

Not yet.

The current implementation uses one main thread pool and another one for blocking operations. This is sufficient for most applications.

See #6.

Does libretto have fibers, as known from ZIO or Cats Effect?

No.

Fibers are used to manually control concurrency. In Libretto, any two computations that do not causally depend on each other take place concurrently, without any explicit instructions like spawn-ing a fiber.

Are Libretto programs any more amenable to inspection than IO monad programs?

IO monad programs are hidden inside an impenetrable Scala function after the first flatMap, and thus not accessible to any kind of inspection. Libretto, too, allows to incorporate Scala functions into the program. One might therefore ask whether Libretto programs are any more amenable to inspection than IO monad programs.

The answer is yes.

The difference is that in Libretto, uses of opaque Scala functions are local. This is unlike IO monad programs, where a Scala function passed to flatMap produces all the rest of the program, at runtime, thus making it impossible to inspect the program before it is run.

Given a Libretto program

f > mapVal(g) > h

where g is a Scala function and f, h are Libretto arrows, the content of g is opaque, but both f and h can be inspected before the program is run.

Does libretto support session types?

Yes.

Why are there two different function-like symbols? What is the difference between -⚬ and =⚬?

-⚬ =⚬
Function as code Function as data
Describes a program. Lives in a running program.
Tangible: we create and manipulate Scala values of this type. Intangible: there are no Scala values of this type. A type like A =⚬ B can appear only to the left or right of -⚬.
Value in the meta language (Scala). As such, it can be used any number of times (including zero) by Scala code. Resource in a Libretto program. As such, it must be consumed (evaluated) exactly once, because it might have captured other resources that are consumed on evaluation.
Morphism Object, Internal Hom

In category theory, one does not look inside objects. Everything is expressed in terms of morphisms. In particular, objects in a category, which in our case are types like =⚬, |*|, |+|, |&|, One, Val, ..., are not viewed as collections of elements. Indeed, there are no values of these types. We express everything in terms of morphisms, -⚬. We manipulate morphisms as values in the underlying meta language (Scala).

Notice how the distinction between a morphism (-⚬), an internal hom (=⚬) and a mapping of the underlying meta theory (=>) avoids a lot of confusion. In the category of Scala functions, all three of these roles are played by =>.

In the signature of curry on Scala functions, => appears in all three of these roles. Can you tell which occurrences play which roles?

def curry[A, B, C]: ((A, B) => C) => (A => (B => C))

The Libretto version of curry makes the roles clear:

def curry[A, B, C]: ((A |*| B) -⚬ C) => (A -⚬ (B =⚬ C))

Is the Libretto function -⚬ an Arrow?

No.

By Arrow we mean the Arrow typeclass.

-⚬ is not an Arrow, because

  • Libretto's concurrent pair |*| is not Scala's pair (_, _), or even a categorical product. It is only a monoidal product (tensor product), which is weaker.
  • Lifting arbitrary Scala function A => B to Libretto function A -⚬ B (the arr operation) is not supported.

How to type the symbol used in -⚬ and =⚬?

is Unicode code point U+26AC (medium small white circle).

You can use any input method that let's you enter a Unicode symbol via its code. For example, on macOS you can add Unicode Hex Input to your System Preferences > Keyboard > Input Sources. Using this input source, you can type by pressing Alt+26ac.

IntelliJ IDEA

In IntelilJ IDEA, we can define a Live Template that expands -o (where o is the Latin letter 'o') to -⚬:

  1. Go to Preferences > Editor > Live Templates > scala.
  2. Click the Add button and choose Live Template.
  3. Fill out the definition:
    • Abbreviation: -o (with the Latin letter 'o')
    • Description: e.g. lollipop
    • Template text: -⚬ (with the Unicode code point U+26AC)
    • Expand with: e.g. Space

Now, when you type -o␣ in a Scala file, the editor expands it to -⚬.

Visual Studio Code

In Visual Studio Code, we can add a user snippet triggered by -o (where o is the Latin letter 'o') which expands to -⚬:

  1. In Preferences > User Snippets click on New Global Snippets file...
  2. Enter file name, e.g. libretto.
  3. In the pre-generated JSON file, add the field
      "lollipop": {
        "prefix": "-o",
        "body": "-⚬",
        "description": "types the lollipop arrow (-⚬)"
      }
    and save.

Now, when you type -o in the editor, a popup should appear with a suggestion to insert -⚬, which you confirm by pressing Enter. (Sometimes you might need to press Ctrl+Space for the suggestion popup to appear.)

Does Libretto prevent deadlocks?

No. It is easy to write a Libretto program that gets stuck because of cyclic dependencies.

Some concurrency libraries that claim to prevent deadlocks use the word deadlock in a very specific sense, namely system threads waiting on each other in a cycle. Thread is not even a term from Libretto's vocabulary, but you can safely assume that any implementation of Libretto does not cause deadlocks on the level of threads, either.

Could deadlocks be prevented statically, in principle?

Yes.

Trivially, by restricting the DSL so that it is impossible to set up a loop in the data flow. However, the resulting loss of expressiveness would make it unusable for most applications.

A sufficiently expressive deadlock-free DSL would certainly require more sophisticated types. We are leaving that for further research and most likely for a different host language than Scala.

Why do we need negative values?

To avoid callbacks in bidirectional dataflows.

A negative value Neg[A] could alternatively be expressed as a callback Val[A] =⚬ One, but we want to avoid callbacks.

What's the problem with callbacks?

  • Indirection, resource capture, side-effects, non-local interaction.
    • The consumer of a value, instead of stating in its interface the need for a value to be supplied, prepares a function object that typically captures some resources (i.e. it is a closure) and performs side-effects on them.
    • The producer of a value, instead of stating in its interface that it produces a value, is aksed to invoke the callback, which from its perspective is some foreign code, on the produced value.
    • Through the captured resources and side-effects on them, the invocation of a callback causes spooky action at a distance, which negatively affects the ability to reason about programs.
  • Moreover, components that produce or consume callbacks are higher-order linear functions in Libretto. Higher-order functions are more powerful than first-order functions. (In terms of category theory, higher-order functions require closedness of the category.) By the principle of least power, we should avoid higher-order functions, and thus callbacks, when possible.

How does Libretto compare to lchannels?

lchannels is another Scala library with support for session types. Its encoding of session types is based on continuation-passing style classes and ordinary Scala functions. As a result, it is more lightweight in terms of the required machinery, but linearity is only checked at runtime. lchannel's heavy reliance on Scala functions preclude it from having Libretto's level of concurrency and analyzability, but those are not among stated goals of the project.

Copyright and License

Copyright 2020–2023 Tomas Mikula

This software is distributed under the Mozilla Public License 2.0, see the LICENSE file for details.

About

Declarative concurrency and stream processing library for Scala

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Scala 99.6%
  • CSS 0.4%