Slow-Auto, Inconvenient-Semi

escaping false dichotomy with sanely-automatic derivation

Mateusz Kubuszok

Note

Most of you have probably used some Scala library based on type class derivation.

Most of you probably started with so-called automatic derivation, and at some point it bit you with long compilation times or hard-to-debug errors.

Some of you moved to so-called semi-automatic derivation, which is much clumsier, but tries to solve some of the issues with automatic derivation.

In this presentation, I’ll try to show that these issues are not an inherent cost of using type class derivation, but that we all did it wrong.

About me

breaking things in Scala for 9 years
a little bit of open source - including co-authoring Chimney for over 7 years now
blog at Kubuszok.com
a niche ebook, Things you need to know about JVM (that matter in Scala)

Note

I’ve been working with Scala for over 9 years,

and for 7 of them I’ve been a maintainer of a library with rather complex metaprogramming.

Additionally, I’ve been blogging, speaking at meetups and conferences like this one, and I also wrote a book about Scala and the JVM, so maybe I am not here by accident.

But enough about me. Let’s see today’s game plan.

Agenda

what is a type class
what is type class derivation
automatic and semi-automatic derivation à la Circe
semi-automatic derivation à la Jsoniter
sanely-automatic derivation à la Chimney
does it matter to library users how these approaches differ

Note

We’ll start by briefly explaining what a type class is, since some of you might not be familiar with the term.

Then we’ll explain what we mean by type class derivation, because it has nothing to do with integrals and calculus.

Then we take a look at what people mean by automatic and semi-automatic derivation - in libraries which almost always followed Circe’s approach.

Then we show that not all libraries follow this approach, when it comes to semi-automatic derivation.

And some libraries have some improvements also when it comes to automatic derivation.

Hopefully, by the end of this talk, you will understand that how a library’s author implements the whole mechanism impacts you, the users.

But I have to warn you: this is not a Shapeless/Mirrors/Magnolia/macros tutorial.

Examples

https://github.com/MateuszKubuszok/derivation-benchmarks

Note

We will be talking about how these design choices of library authors affect you, the users.

If you are curious about the code that generated the numbers or error messages we compare, you can follow this link and investigate the code at your own pace.

As an exercise for the reader.

So let’s start.

Type class

interface
with type parameters
whose implementation can be automatically provided based on their type only

Note	Type class, what is it? As far as I’m concerned, it is an interface. But it’s an interface with some type parameter, so that each implementation that we’d use could be distinguished only by type, and we could let the compiler pass it around for us. For example:

!

trait Encoder[A] {
  def apply(a: A): Json // <-- JSON as data
}
object Encoder {
  given encodeString: Encoder[String] = ...
  given encodeInt: Encoder[Int] = ...
  given encodeDouble: Encoder[Double] = ...
}

extension [A](value: A) {
  def asJson(using encoder: Encoder[A]): Json = encoder(value)
}

"value".asJson // using Encoder.encodeString
1024.asJson // using Encoder.encodeInt
3.13.asJson // using Encoder.encodeDouble

Note

We have some data type representing JSONs.

We want to be able to encode our type, whatever it is, to JSON.

For starters, we have implementations for the primitives.

And some extension methods, so that the code would look nice.

Then, with the mechanism called implicits or, on Scala 3, using and given, these implementations can be passed for us automatically.
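A minimal, self-contained Scala 3 sketch of the slide above may help. The `Json` enum is a hypothetical stand-in for a real library's JSON type, and the instance bodies are one possible way to fill in the slide's `...`, not what any particular library does.

```scala
// Hypothetical minimal JSON AST, standing in for a real library's Json type.
enum Json:
  case Str(value: String)
  case Num(value: Double)

trait Encoder[A]:
  def apply(a: A): Json

object Encoder:
  // Instances in the companion object belong to the implicit scope of Encoder[X],
  // so the compiler finds them without any import.
  given encodeString: Encoder[String] = (a: String) => Json.Str(a)
  given encodeInt: Encoder[Int] = (a: Int) => Json.Num(a.toDouble)
  given encodeDouble: Encoder[Double] = (a: Double) => Json.Num(a)

extension [A](value: A)
  // The compiler picks the Encoder[A] instance based only on the static type A.
  def asJson(using encoder: Encoder[A]): Json = encoder(value)
```

The call sites only mention values; which `given` is used is decided entirely by the static type of `value`.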

However, we only provided some implementations. (next slide)

!

What if nobody wrote the implementation explicitly for my type?

case class Address(value: String)
case class User(name: String, address: Address)

Address("Paper St. 19").asJson // ???
User("John Smith", Address("Paper St. 19")).asJson // ???

No given instance of type Encoder[Address] was found for parameter encoder of
method asJson in object ...
No given instance of type Encoder[User] was found for parameter encoder of
method asJson in object ...

Note

What if nobody wrote the implementation explicitly for my type?

We can have some Address and some User defined.

What is going to happen when we try to encode them?

The answer is that the compilation would fail, because there are no implementations for these types.

That’s where the derivation comes in.

Type class derivation

(If you don’t understand this diagram, you probably haven’t spent 600h on a topic that most sane people avoid.)

Note

Type class derivation.

We’ll have several of these pictures, which you can study at your own pace at home.

For now what I want you to remember:

derivation is about picking the implementations for some parts of your type, and combining them together into the implementation for the whole type
for case classes, it means that you have to have an implementation for each field’s type
for sealed traits and enums, it means that you have to have an implementation for each subtype
because it combines the implementations bottom-up, someone (usually the library’s author) has to provide implementations for the smallest building blocks, usually primitives
and, because there is no magic, someone has to define how these small blocks can be combined; usually that’s also the author of the library
and if that sounds confusing, it’s only because you haven’t spent way too much time on this subject
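To make the list above concrete, here is a hand-written Scala 3 sketch of what a derivation mechanism would produce for `Address` and `User`, plus a small hypothetical sum type `Contact`. Each instance only asks for instances of the types one level down, which is the bottom-up composition described above. The encoding scheme (field names as keys, subtype name as a wrapping key) is an assumption for illustration.

```scala
// Hypothetical minimal JSON AST and type class, same shape as on the slides.
enum Json:
  case Str(value: String)
  case Obj(fields: List[(String, Json)])

trait Encoder[A]:
  def apply(a: A): Json

object Encoder:
  // The smallest building block, usually provided by the library's author.
  given encodeString: Encoder[String] = (a: String) => Json.Str(a)

// Product type: needs an Encoder for each field's type, combined into a JSON object.
case class Address(value: String)
object Address:
  given addressEncoder(using str: Encoder[String]): Encoder[Address] =
    (a: Address) => Json.Obj(List("value" -> str(a.value)))

case class User(name: String, address: Address)
object User:
  given userEncoder(using str: Encoder[String], addr: Encoder[Address]): Encoder[User] =
    (u: User) => Json.Obj(List("name" -> str(u.name), "address" -> addr(u.address)))

// Sum type: needs an Encoder for each subtype, chosen by matching on the value.
sealed trait Contact
case class Email(address: String) extends Contact
case class Phone(number: String) extends Contact
object Contact:
  given emailEncoder(using str: Encoder[String]): Encoder[Email] =
    (e: Email) => Json.Obj(List("address" -> str(e.address)))
  given phoneEncoder(using str: Encoder[String]): Encoder[Phone] =
    (p: Phone) => Json.Obj(List("number" -> str(p.number)))
  given contactEncoder(using email: Encoder[Email], phone: Encoder[Phone]): Encoder[Contact] =
    (c: Contact) =>
      c match {
        case e: Email => Json.Obj(List("Email" -> email(e)))
        case p: Phone => Json.Obj(List("Phone" -> phone(p)))
      }
```

Derivation automates exactly this: writing the field-by-field and subtype-by-subtype glue so that nobody has to.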

Maybe an example would help.

Derivation à la Circe

!

trait Encoder[A] {
  def apply(a: A): Json // <-- JSON as data
}

extension [A](value: A) {
  def asJson(using encoder: Encoder[A]): Json = encoder(value)
}

case class Address(value: String)
case class User(name: String, address: Address)

Address("Paper St. 19")
// { "value": "Paper St. 19" }
User("John Smith", Address("Paper St. 19"))
// { "name": "John Smith", "address": { "value": "Paper St. 19" } }

import MagicImportOfSomethingThatCreatesEncoders.given

Address("Paper St. 19").asJson // generates Encoder[Address] on demand
User("John Smith", Address("Paper St. 19")).asJson // ditto but for User

import ImportOfSomethingThatLetsYouCreateEncoders.deriveEncoder

given addressEncoder: Encoder[Address] = deriveEncoder[Address]
given userEncoder: Encoder[User] = deriveEncoder[User]

Address("Paper St. 19").asJson // using addressEncoder
User("John Smith", Address("Paper St. 19")).asJson // using userEncoder

Note

As a reminder, we have this Encoder, which should turn your case class into JSON.

It has this nice extension method.

And we want to encode these `case class`es.

The library’s author makes some assumptions, for instance, that each case class should be encoded as a JSON object: each field’s name turns into that object’s key, and its value is encoded with the encoder for its type.

Automatic derivation assumes that all the missing implementations that you have not provided yourself have an automatic fallback, often enabled with an import. You add that import, the fallback becomes available in the implicit scope, and everything works. (Sometimes this is implemented in the companion objects, and then it cannot be disabled.)

Semi-automatic derivation assumes that you want to define these implicits yourself, but you don’t want to write their implementations. It gives you a method which creates a new implementation, and even if there is an implicit of the demanded type in scope, it ignores it (so that you won’t end up with a cyclical dependency during initialization).

If you don’t write these implicits/givens yourself, you’ll keep on getting "implicit not found" errors.

But let’s take a look a bit closer.

Automatic derivation of Address

implicitly[Encoder[Address]] // <-- using Encoder[Address]

Note

What happens when we try to summon an instance with automatic derivation? Using Address as an example.

(Reminder: this and the following diagrams are also something you can study at your own pace at home.)

First of all, automatic derivation should be a fallback, so the compiler tries to find an existing implementation and fails.

Then it sees that we have a case class, and we just happen to have a mechanism implemented by the library’s authors which would:

obtain the implementation for each field
combine them together

Now, the semi-automatic.

Semi-automatic derivation of Address

deriveEncoder[Address] // <-- creates new Encoder[Address]

Note	Here we can see that there is no try-existing-then-use-fallback part. We move directly into creating a new instance. If it cannot be created, the compilation fails, even if such an instance exists and is in scope. Is there any difference when we try these approaches with `User`?

Automatic derivation of User

implicitly[Encoder[User]] // <-- using Encoder[User]

Note

Well, there is, the diagram is much bigger. Why?

Because, with automatic derivation in scope, the compiler automatically derives, as a fallback, not only the implementation for the type we asked for, but also the implementations for the types nested in this type.

Here, it triggers the automatic derivation of Address.

Is it true for semi-automatic derivation as well?

Semi-automatic derivation of User

deriveEncoder[User] // <-- creates new Encoder[User]

Note

No. Unless we imported automatic derivation next to semi-automatic, or created that implicit Encoder of Address ourselves, the compilation would fail.

Semi-automatic derivation in Circe, and libraries based on its approach, are not recursive.
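The difference can be sketched with Scala 3 Mirrors. This is a minimal illustration, not Circe's actual implementation: `deriveEncoder`, `auto.autoEncoder`, `labelsOf`, `summonEncoders` and the `Json` enum are hypothetical names. `deriveEncoder` only looks up field encoders through implicit search, so it does not recurse on its own; the automatic variant is the same code exposed as a fallback `given`, and implicit search re-entering that `given` is what makes it recursive.

```scala
import scala.deriving.Mirror
import scala.compiletime.{constValue, erasedValue, summonInline}

enum Json:
  case Str(value: String)
  case Obj(fields: List[(String, Json)])

trait Encoder[A]:
  def apply(a: A): Json

object Encoder:
  given encodeString: Encoder[String] = (a: String) => Json.Str(a)

// Field names of a case class, read from its Mirror at compile time.
inline def labelsOf[T <: Tuple]: List[String] =
  inline erasedValue[T] match
    case _: EmptyTuple => Nil
    case _: (t *: ts)  => constValue[t].toString :: labelsOf[ts]

// Each field's Encoder comes from implicit search (no recursion here).
inline def summonEncoders[T <: Tuple]: List[Encoder[?]] =
  inline erasedValue[T] match
    case _: EmptyTuple => Nil
    case _: (t *: ts)  => summonInline[Encoder[t]] :: summonEncoders[ts]

// Semi-automatic: always builds a new instance; every field type needs an Encoder in scope.
inline def deriveEncoder[A](using m: Mirror.ProductOf[A]): Encoder[A] =
  val labels = labelsOf[m.MirroredElemLabels]
  val encoders = summonEncoders[m.MirroredElemTypes]
  new Encoder[A] {
    def apply(a: A): Json =
      val values = a.asInstanceOf[Product].productIterator.toList
      Json.Obj(labels.zip(values).zip(encoders).map { case ((label, value), enc) =>
        label -> enc.asInstanceOf[Encoder[Any]].apply(value)
      })
  }

// Automatic: a fallback given enabled with `import auto.given`. Because summonEncoders
// uses implicit search, deriving User re-enters this given to derive Address.
object auto:
  inline given autoEncoder[A](using m: Mirror.ProductOf[A]): Encoder[A] = deriveEncoder[A]

case class Address(value: String)
case class User(name: String, address: Address)

object semiExample:
  given addressEncoder: Encoder[Address] = deriveEncoder[Address]
  // Compiles only because addressEncoder is in scope.
  given userEncoder: Encoder[User] = deriveEncoder[User]

object autoExample:
  import auto.given
  // No givens for Address or User here: both are derived on demand.
  def encode(user: User): Json = summon[Encoder[User]].apply(user)
```

Removing `addressEncoder` from `semiExample` should make `userEncoder` fail to compile, which is the non-recursive behaviour described above.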

In case you look at all of this and ask yourself… (next slide)

OK, but where is the code?

Wouldn’t it be easier to understand with some examples?

Note	…where is the code? Reminder: (next slide)

1. We are focusing on user-side of the derivation story

Note	One. Our goal is to see how something that someone else wrote affects us.

2. Code is in the link

https://github.com/MateuszKubuszok/derivation-benchmarks

Note	Two. You can look at the code whenever you want.

3. If you really need the derivation-internals-explanation-experience

Note	Three. If we actually tried to show it and explain it during this presentation, it would take half the conference. The whole audience would be traumatised, and we would still not get to the point I’m trying to make. So, getting back to the main topic.

Why do people bother with semi-automatic derivation?

1. They want to make sure that they use the same implementation everywhere

Note	We’re not going to question that use case. If you want to have the same implementation everywhere, you define it only once and reuse.

2. "Speed"

Note

But the other reason people have strong preference for semi is speed.

There were a lot of horror stories about a single implicit compiling for several minutes. (I saw some myself).

A lot of people investigated this (compiler benchmarks, flame graphs, time spent in different compilation phases) and found that the cause was automatic derivation.

Semi-automatic derivation solved their problems.

But is it still true today?

!

// We use Circe:
// trait Encoder[A] { ... } turns A -> Json
// trait Decoder[A] { ... } turns Json -> Decoder.Result[A], i.e. Either[DecodingFailure, A]
import io.circe.{Decoder, DecodingFailure, Encoder, Json}
import io.circe.syntax._ // provides .asJson

case class Out(...) // <-- really big case class with nested case classes

// value -> Json -> value again
def roundTrip(out: Out): (Json, Either[DecodingFailure, Out]) = {
  val json = out.asJson // <-- encode as Json using Encoder[Out]
  val parsed = json.as[Out] // <-- decode from Json using Decoder[Out]
  json -> parsed
}

// Semi-automatic version will just have this (with import io.circe.generic.semiauto._):
implicit val in1Decoder: Decoder[In1] = deriveDecoder
implicit val in1Encoder: Encoder[In1] = deriveEncoder
implicit val in2Decoder: Decoder[In2] = deriveDecoder
implicit val in2Encoder: Encoder[In2] = deriveEncoder
implicit val in3Decoder: Decoder[In3] = deriveDecoder
implicit val in3Encoder: Encoder[In3] = deriveEncoder
implicit val in4Decoder: Decoder[In4] = deriveDecoder
implicit val in4Encoder: Encoder[In4] = deriveEncoder
implicit val in5Decoder: Decoder[In5] = deriveDecoder
implicit val in5Encoder: Encoder[In5] = deriveEncoder
implicit val outDecoder: Decoder[Out] = deriveDecoder
implicit val outEncoder: Encoder[Out] = deriveEncoder
// instead of the automatic derivation import (import io.circe.generic.auto._).

This shouldn’t be hard on the compiler, right?

Note

I defined a very mean nested case class. All you need to know is that it’s 5 levels deep.

I want to use Circe, both Encoder and Decoder, and do a round trip: encode this case class and then decode it, for example to test if I get the same value, or to run some benchmarks.

I try to do it once with automatic derivation, and once with the semi-automatic approach.

Both will be single-file modules which only generate a few codecs. How bad can it be?

!

(less is better)

Note

On Scala 2.13, not so bad. The compilation times are quite close and do not deserve the bad press.

Of course, that is if we ignore the fact that both need at least 12 seconds on a cold JVM to compile a single short file.

But on Scala 3, automatic derivation needs 46 seconds to compile a single file on a cold JVM! Semi-automatic derivation, on the other hand, needs 10 seconds, and only 1 on a hot JVM.

What about runtime?

Compilation time in seconds

                     Scala 2      Scala 3
                     cold  hot    cold  hot
circeGenericAuto       14    4      46   16
circeGenericSemi       12    3      10    1
circeMagnoliaAuto      13    2      65   32
circeMagnoliaSemi      12    7      12    2
jsoniterScalaSanely     -    -       9    1
jsoniterScalaSemi      10    4       8    1

!

Scala 2.13.14

[info] Benchmark                          Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto    thrpt  10   7.319 ± 0.011  ops/ms
[info] JsonRoundTrips.circeGenericSemi    thrpt  10   6.775 ± 0.013  ops/ms

Scala 3.3.3

[info] Benchmark                            Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto     thrpt   10   0.490 ± 0.432  ops/ms
[info] JsonRoundTrips.circeGenericSemi     thrpt   10   4.607 ± 0.014  ops/ms

(more is better)

Note	Scala 2.13 shows very small differences between auto and semi in benchmarks as well. Scala 3, on the other hand… semi-automatic derivation is about 1/3 slower than on Scala 2.13 (4.607 vs 6.775 ops/ms)! And automatic derivation is a disaster: roughly an order of magnitude slower than that (0.490 ops/ms). Can we explain these results?

Auto vs Semi on Scala 2

PR #5649 - Faster compilation of inductive implicits (closed)
PR #6481 - Topic/inductive implicits 2.13.x (closed)
PR #6580 - Prune polymorphic implicits more aggressively (merged)
PR #7012 - Speed up implicit resolution by avoiding allocations when traversing TypeRefs in core (merged)
and more

Compile time in seconds, by HList size:
1) baseline: scalac 2.13.x
2) scalac 2.13.x with matchesPtInst

HList size    1)    2)
        50     4     3
       100     7     3
       150    15     4
       200    28     4
       250    48     5
       300    81     6
       350   126     8
       400   189    11
       450   322    13
       500   405    16

Note

The author of Shapeless spent a lot of time contributing to the compiler to optimize implicit resolution. He made a whole series of PRs.

2 of them got closed, but the 3rd one finally got merged and released as a part of 2.13.0-M5.

And then there was another one.

We can see that he was pretty happy with the result, because he boasted about how much the compilation times went down.

Some of that work was ported to Scala 3, but perhaps not everything, or maybe these optimizations do not play well with how Mirrors work.

And all the bad press that automatic derivation gets probably comes from before these PRs. Or maybe people were deriving exactly the same implicit 50 times.

Putting these optimizations aside (next slide)

!

Could something else improve performance?

Note	Before Scala 3, some people believed so. For instance, by replacing Shapeless.

Magnolia

alternative to Shapeless/Mirrors
boasts about:
- better API
- better performance
- better compilation times
- better error messages when derivation fails

Note	It was created as an alternative to Shapeless for the most common use cases, with a better API, performance, compilation times, and error messages. Let’s start with the last claim.

Error messages

Semi-automatic derivation

case class Street(name: Either[String, Nothing]) // <-- should not be able to derive name
case class Address(street: Street)
case class User(name: String, address: Address)

implicit val streetEncoder: Encoder[Street] = deriveEncoder
implicit val addressEncoder: Encoder[Address] = deriveEncoder
implicit val userEncoder: Encoder[User] = deriveEncoder

user.asJson

Shapeless' errors

could not find Lazy implicit value of type DerivedAsObjectEncoder[Street]
   implicit val streetEncoder: Encoder[Street] = deriveEncoder
                                                 ^

Mirrors' errors

  implicit val streetEncoder: Encoder[Street] = deriveEncoder
                                                ^^^^^^^^^^^^^
Failed to find an instance of Encoder[Either[String, Nothing]]

Magnolia’s errors

magnolia: could not find Encoder.Typeclass for type Either[String,Nothing]
     in parameter 'name' of product type Street
   implicit val streetEncoder: Encoder[Street] = EncoderSemi.derived
                                                             ^

Note

Here we have some nested case classes: User has an Address, Address has a Street, and the last one has an Either[String, Nothing] field, which cannot be encoded out of the box.

What kind of errors will we get?

Shapeless tells us that it cannot find an implicit for the Street type.

Mirrors tell us that they cannot find an implicit for the type of the bad field (without naming that field or the class that has it, but at least we know which type is the problem).

Magnolia tells us which field has a type that cannot be encoded, and in which case class this field is defined. Nice!

But it’s semi-automatic derivation. What about automatic?

!

Automatic derivation

case class Street(name: Either[String, Nothing])
case class Address(street: Street)
case class User(name: String, address: Address)

user.asJson

Shapeless/Mirrors/Magnolia

could not find implicit value for parameter encoder: Encoder[User]
     user.asJson
          ^

Note	Unfortunately, no matter which library we used, none of them could tell us anything useful: implicit not found for `User`. All of them are equally unhelpful. OK, so let’s take a look at the compilation times.

Round trip (reminder)

// Out - the outermost type of a deeply nested, nasty case class structure
def roundTrip(out: Out): (Json, Decoder.Result[Out]) = {
  val json = out.asJson // encode
  val parsed = json.as[Out] // decode
  json -> parsed
}

!

(less is better)

Note

Perhaps at the times of Scala 2.12 the difference was bigger, but it seems that the compilation times on Scala 2.13 are close.

The small spike on a hot JVM might be a measurement error.

But something weird happens on Scala 3: Magnolia is always worse than Mirrors! Why?

Because on Scala 3 Magnolia is implemented with Mirrors, so it adds its own overhead on top of them.

How about benchmarks?

!

Scala 2.13.14

[info] Benchmark                          Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto    thrpt  10   7.319 ± 0.011  ops/ms
[info] JsonRoundTrips.circeGenericSemi    thrpt  10   6.775 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto   thrpt  10   7.689 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi   thrpt  10   7.838 ± 0.013  ops/ms

Scala 3.3.3

[info] Benchmark                            Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto     thrpt   10   0.490 ± 0.432  ops/ms
[info] JsonRoundTrips.circeGenericSemi     thrpt   10   4.607 ± 0.014  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto    thrpt   10   0.077 ± 0.039  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi    thrpt   10   5.590 ± 0.013  ops/ms

(more is better)

Note

There are some small differences on Scala 2.13 (Magnolia is a bit faster), but still, the results are very close.

On Scala 3, semi-automatic Magnolia seems to do better than semi-automatic Mirrors, curiously, but automatic Magnolia is several times slower than even automatic Mirrors (0.077 vs 0.490 ops/ms)!

I suspect that it might be about inlining: a bit too much inlining.

!

Shapeless/Mirrors/Magnolia - different APIs, same approach.

Did anyone try something else?

Note

It seems that Shapeless/Mirrors/Magnolia mostly offer different APIs, and we don’t care about that in this talk.

They have slightly different errors with semi-automatic derivation.

And sometimes ridiculously bad performance on Scala 3 with automatic derivation.

But for us, the users, it’s mostly the same DX.

Did anyone try something else?

Jsoniter Scala

prioritizes performance
no automatic derivation
no need to derive intermediate instances

How?

Note	Jsoniter Scala is a library with performance at heart. It intentionally has no automatic derivation. Why? Because intermediate type class instances can hurt performance. But how can you have no intermediate instances for intermediate types? Apparently Jsoniter handles it somehow. How?

!

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

// Yes, only 1 codec, no need to manually derive implicits for nested cases
implicit val outCodec: JsonValueCodec[Out] =
  JsonCodecMaker.make(CodecMakerConfig.withAllowRecursiveTypes(true))

def roundTrip(out: Out): (String, Either[Throwable, Out]) = {
  val str = writeToString(out)
  val parsed = scala.util.Try(readFromString[Out](str)).toEither
  str -> parsed
}

Note

Quite simply: its semi-automatic derivation is recursive and handles intermediate types in the same macro expansion.

You tell it to derive an implicit and it will handle all the nested case classes, and so on, inside that implicit’s implementation.

So the mechanism is a bit different to what we see in Circe-like libraries.
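A rough sketch of the recursive approach, assuming Scala 3 Mirrors and `summonFrom` instead of the hand-written macro Jsoniter Scala actually uses. Implicit search is consulted only for an existing instance of a field type (an override or a built-in); otherwise the nested case class is derived inside the same expansion, without creating an intermediate `given`. All names are hypothetical, and unlike Jsoniter this sketch keeps `String` as a `given` for brevity.

```scala
import scala.deriving.Mirror
import scala.compiletime.{constValue, erasedValue, summonFrom}

enum Json:
  case Str(value: String)
  case Obj(fields: List[(String, Json)])

trait Encoder[A]:
  def apply(a: A): Json

object Encoder:
  given encodeString: Encoder[String] = (a: String) => Json.Str(a)

inline def labelsOf[T <: Tuple]: List[String] =
  inline erasedValue[T] match
    case _: EmptyTuple => Nil
    case _: (t *: ts)  => constValue[t].toString :: labelsOf[ts]

// Use an existing Encoder if one is in scope, otherwise derive the nested
// case class right here, in the same expansion.
inline def fieldEncoder[T]: Encoder[T] =
  summonFrom {
    case enc: Encoder[T]        => enc
    case m: Mirror.ProductOf[T] => makeEncoder[T](using m)
  }

inline def fieldEncoders[T <: Tuple]: List[Encoder[?]] =
  inline erasedValue[T] match
    case _: EmptyTuple => Nil
    case _: (t *: ts)  => fieldEncoder[t] :: fieldEncoders[ts]

// One call covers the whole tree of nested case classes. Non-recursive data types only:
// a type containing itself would exceed the compiler's inlining limit.
inline def makeEncoder[A](using m: Mirror.ProductOf[A]): Encoder[A] =
  val labels = labelsOf[m.MirroredElemLabels]
  val encoders = fieldEncoders[m.MirroredElemTypes]
  new Encoder[A] {
    def apply(a: A): Json =
      val values = a.asInstanceOf[Product].productIterator.toList
      Json.Obj(labels.zip(values).zip(encoders).map { case ((label, value), enc) =>
        label -> enc.asInstanceOf[Encoder[Any]].apply(value)
      })
  }

case class Address(value: String)
case class User(name: String, address: Address)

object recursiveExample:
  // A single derivation: no Encoder[Address] is defined anywhere.
  val userEncoder: Encoder[User] = makeEncoder[User]

object overrideExample:
  // A user-provided instance is picked up instead of deriving Address.
  given addressOverride: Encoder[Address] = (a: Address) => Json.Str(a.value)
  val userEncoder: Encoder[User] = makeEncoder[User]
```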

Recursive semi-automatic derivation

Note	If we look at this diagram, it looks more complex. Why? Because everything that was previously delegated to the compiler (the typer and implicit search) is now handled "manually" in the same macro, with if-elses, loops, or good old recursion. If we zoom out a bit… (next slide)

Recursive semi-automatic derivation

Circe-like derivation:
delegates everything to implicit search
types supported OOTB are handled via implicits in companion object

Jsoniter-like derivation:
uses implicit search only for overrides
types supported OOTB are handled by the macro, implicit scope is empty by default

Note	…we might suspect why people prefer to develop things in the Circe style: it’s much easier for the developer not to think about these things! You write some implicits and it works, while with macros you have to manually write conditional code and create trees. (next slide)

!

OK, but what does this gibberish mean for users?

Note	Probably you are asking yourself this question, so let’s get to the numbers.

!

(less is better)

Note

Jsoniter Scala compiled faster than all of the other approaches (Shapeless, Mirrors, Magnolia, whether automatic or semi-automatic) on a cold JVM: 10 seconds on Scala 2.13 and 8 seconds on Scala 3. On a hot JVM, that goes down to 4 seconds and 1 second respectively. And we are only talking about the compilation time, not the actual performance!

!

Scala 2.13.14

[info] Benchmark                          Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto    thrpt  10   7.319 ± 0.011  ops/ms
[info] JsonRoundTrips.circeGenericSemi    thrpt  10   6.775 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto   thrpt  10   7.689 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi   thrpt  10   7.838 ± 0.013  ops/ms
[info] JsonRoundTrips.jsoniterScalaSemi   thrpt  10  20.081 ± 0.151  ops/ms

Scala 3.3.3

[info] Benchmark                            Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto     thrpt   10   0.490 ± 0.432  ops/ms
[info] JsonRoundTrips.circeGenericSemi     thrpt   10   4.607 ± 0.014  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto    thrpt   10   0.077 ± 0.039  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi    thrpt   10   5.590 ± 0.013  ops/ms
[info] JsonRoundTrips.jsoniterScalaSemi    thrpt   10  21.480 ± 0.070  ops/ms

(more is better)

Note	In benchmarks, it was roughly 2.5 to 4 times faster than the fastest Circe result (20.081 vs 7.838 ops/ms on Scala 2.13, 21.480 vs 5.590 ops/ms on Scala 3). And I have to admit: I am cheating. Jsoniter parses and writes Strings, while Circe decodes from and encodes to its Json AST. If Circe had to first parse a String into Json and then Json into a case class… I suspect it would be even worse.

!

But can it be automatic?

Note	Jsoniter’s approach, while promising, is still not as easy for newcomers as automatic derivation; it requires some ceremony, after all. Can we get rid of it?

Automatic derivation à la Chimney

Solution

import scala.quoted.*

trait TypeClass[A] extends TypeClass.AutoDerived[A] { ... }
object TypeClass {

  // semi-automatic derivation of TypeClass[A]
  inline def derived[A]: TypeClass[A] = ${ derivedImpl[A] }

  trait AutoDerived[A] { ... }
  object AutoDerived extends AutoDerivedLowPriorityImplicits
  trait AutoDerivedLowPriorityImplicits {

    // automatic derivation of TypeClass.AutoDerived[A]
    inline given derived[A]: AutoDerived[A] = ${ derivedImpl[A] }
  }
}

extension [A](value: A)
  // uses TypeClass[A] defined by the user manually or with TypeClass.derived,
  // falling back on automatic derivation
  def method(using TypeClass.AutoDerived[A]) = ...

// allowed to try summoning TypeClass[Sth].
// NOT allowed to try summoning TypeClass.AutoDerived[Sth]!
def derivedImpl[A: Type](using Quotes): Expr[TypeClass[A]] = ...

(Disclaimer: understanding this code is not necessary to understand its implications on the next slides)

(Solutions for New Prioritization of Givens in Scala 3.7 available at the checkout)

Note

We have 2 separate types: one intended to be used by the user, and one used only for automatic derivation.

Extension methods and other summoning should try to use the one we exposed to the user, and then fall back on automatic derivation.

Both automatic and semiautomatic derivation can only use the type intended for users, so a macro never calls itself.

If you don’t get it, don’t worry, it’s enough if you just understand the implications.

If you have questions about givens and 3.7 we can talk about it later.

(In case I forgot: summonFrom for ordering the summons the old way + opaque type for the result of such ordered summoning.)
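A runnable approximation of the slide's design, assuming Scala 3 and using inline Mirror-based derivation instead of a quoted macro. One deliberate substitution: rather than relying on low-priority implicits (whose prioritization changed in Scala 3.7), the fallback is guarded with `scala.util.NotGiven`, so it is only eligible when no `Encoder[A]` can be found. This illustrates the idea and is not Chimney's code; `Encoder`, `Json` and all helpers are hypothetical.

```scala
import scala.deriving.Mirror
import scala.compiletime.{constValue, erasedValue, summonFrom}
import scala.util.NotGiven

enum Json:
  case Str(value: String)
  case Obj(fields: List[(String, Json)])

// User-facing type class: defined manually or with Encoder.derived.
trait Encoder[A] extends Encoder.AutoDerived[A]

object Encoder:
  // Only entry points such as asJson ask for this; derivation never summons it.
  trait AutoDerived[A]:
    def apply(a: A): Json

  object AutoDerived:
    // Automatic fallback, eligible only when no Encoder[A] exists.
    inline given autoDerived[A](using NotGiven[Encoder[A]], Mirror.ProductOf[A]): AutoDerived[A] =
      derived[A]

  given encodeString: Encoder[String] = new Encoder[String] {
    def apply(a: String): Json = Json.Str(a)
  }

  // Semi-automatic and recursive: it only ever summons Encoder, never AutoDerived.
  inline def derived[A](using m: Mirror.ProductOf[A]): Encoder[A] =
    val labels = labelsOf[m.MirroredElemLabels]
    val encoders = fieldEncoders[m.MirroredElemTypes]
    new Encoder[A] {
      def apply(a: A): Json =
        val values = a.asInstanceOf[Product].productIterator.toList
        Json.Obj(labels.zip(values).zip(encoders).map { case ((label, value), enc) =>
          label -> enc.asInstanceOf[Encoder[Any]].apply(value)
        })
    }

  inline def fieldEncoder[T]: Encoder[T] =
    summonFrom {
      case enc: Encoder[T]        => enc
      case m: Mirror.ProductOf[T] => derived[T](using m)
    }

  inline def fieldEncoders[T <: Tuple]: List[Encoder[?]] =
    inline erasedValue[T] match
      case _: EmptyTuple => Nil
      case _: (t *: ts)  => fieldEncoder[t] :: fieldEncoders[ts]

  inline def labelsOf[T <: Tuple]: List[String] =
    inline erasedValue[T] match
      case _: EmptyTuple => Nil
      case _: (t *: ts)  => constValue[t].toString :: labelsOf[ts]

extension [A](value: A)
  // An existing Encoder[A] (which is also an AutoDerived[A]) wins; otherwise the fallback.
  def asJson(using enc: Encoder.AutoDerived[A]): Json = enc(value)

case class Address(value: String)
case class User(name: String, address: Address)

// A custom instance, used directly and also when nested inside derived types.
case class Tag(value: String)
object Tag:
  given tagEncoder: Encoder[Tag] = new Encoder[Tag] {
    def apply(a: Tag): Json = Json.Str("#" + a.value)
  }
case class Post(title: String, tag: Tag)
```

Because `derived` only ever summons `Encoder`, the fallback can never re-enter itself, while `asJson` still feels like automatic derivation to the user.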

!

Can we test it outside Chimney?

Yes.

Note	I know that it works in Chimney, but we are using JSON examples for now. The answer is "yes".

Sanely-automatic derivation

I implemented a wrapper around Jsoniter (Scala 3 only) which works like this:

import jsonitersanely.* // <-- 1 import, like with std automatic derivation

def roundTrip(out: Out): (String, Either[Throwable, Out]) = {
  val str = write(out)
  val parsed = scala.util.Try(read[Out](str)).toEither
  str -> parsed
}

Note	The approach, which I named sanely-automatic as opposed to semi-automatic, is something I implemented for Jsoniter. Since I couldn’t just edit the Jsoniter code, I made a wrapper, and only for Scala 3 because it was easier for me. As you can see, it’s used just like automatic derivation on Circe.

!

How does it compare to Circe or normal Jsoniter Scala?

Note	Did we manage to avoid all of the issues of automatic derivation without the ceremony of semi-automatic derivation?

!

(less is better)

Note

I would say "yes". Sanely-automatic derivation has almost the same compilation times as Jsoniter (9 and 1 seconds vs 8 and 1 on Scala 3), much faster than automatic derivation in Circe and on par with its semi-automatic derivation.

!

Scala 2.13.14

[info] Benchmark                          Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto    thrpt  10   7.319 ± 0.011  ops/ms
[info] JsonRoundTrips.circeGenericSemi    thrpt  10   6.775 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto   thrpt  10   7.689 ± 0.013  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi   thrpt  10   7.838 ± 0.013  ops/ms
[info] JsonRoundTrips.jsoniterScalaSemi   thrpt  10  20.081 ± 0.151  ops/ms

Scala 3.3.3

[info] Benchmark                            Mode  Cnt   Score   Error   Units
[info] JsonRoundTrips.circeGenericAuto     thrpt   10   0.490 ± 0.432  ops/ms
[info] JsonRoundTrips.circeGenericSemi     thrpt   10   4.607 ± 0.014  ops/ms
[info] JsonRoundTrips.circeMagnoliaAuto    thrpt   10   0.077 ± 0.039  ops/ms
[info] JsonRoundTrips.circeMagnoliaSemi    thrpt   10   5.590 ± 0.013  ops/ms
[info] JsonRoundTrips.jsoniterScalaSemi    thrpt   10  21.480 ± 0.070  ops/ms
[info] JsonRoundTrips.jsoniterScalaSanely  thrpt   10  21.408 ± 0.070  ops/ms

(more is better)

Note	Benchmarks are virtually the same. We get the fastest compilation, the fastest bytecode, and no manually written implicits!

!

But Jsoniter parsing Strings vs Circe parsing Json might be apples vs oranges.

Can we have a fairer comparison?

Note	However, you could remind me: these libraries have different philosophies, designs, etc. The results might be an artifact of something other than just the way they implemented the derivation. And you would be right, which is why I also implemented something else.

A fairer comparison

!

trait FastShowPretty[A] {

  def showPretty(
    value:   A,
    sb:      StringBuilder,
    indent:  String = "  ",
    nesting: Int = 0
  ): StringBuilder
}

implicit class FastShowPrettyOps[A](private val value: A) {

  def showPretty(indent: String = "  ", nesting: Int = 0)(
    implicit fsp: FastShowPretty[A]
  ): String =
    fsp.showPretty(value, new StringBuilder, indent, nesting).toString()
}

case class Street(name: String)
case class Address(street: Street)
case class User(name: String, address: Address)

println(User("John", Address(Street("Paper St"))).showPretty())

User(
  name = "John",
  address = Address(
    street = Street(
      name = "Paper St"
    )
  )
)

Note

Some of you might be familiar with the Show type class: it’s basically toString, but "better".

It also has a ShowPretty variant, which adds some nice indentation.

I decided to use that pretty variant, but instead of concatenating Strings, like in the original, I append them to a StringBuilder.

This is how I’d like to use that type class, and what kind of output I’d like to see.
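For reference, this is roughly what derived code has to do for the example above, written by hand so that it compiles on both Scala 2.13 and Scala 3. `showCaseClass` is a hypothetical helper, not part of any library, and the instances simply follow the output format shown on the slide.

```scala
trait FastShowPretty[A] {
  def showPretty(value: A, sb: StringBuilder, indent: String = "  ", nesting: Int = 0): StringBuilder
}

object FastShowPretty {
  // The smallest building block: strings are shown in quotes.
  implicit val stringShow: FastShowPretty[String] = new FastShowPretty[String] {
    def showPretty(value: String, sb: StringBuilder, indent: String, nesting: Int): StringBuilder =
      sb.append('"').append(value).append('"')
  }

  // Hypothetical helper: appends `Name(`, one `field = value` line per field
  // (one level deeper), and the closing `)` at the current nesting level.
  def showCaseClass(name: String, sb: StringBuilder, indent: String, nesting: Int)(
      fields: (String, (StringBuilder, Int) => StringBuilder)*
  ): StringBuilder = {
    sb.append(name).append("(\n")
    fields.zipWithIndex.foreach { case ((label, showField), idx) =>
      sb.append(indent * (nesting + 1)).append(label).append(" = ")
      showField(sb, nesting + 1)
      sb.append(if (idx < fields.size - 1) ",\n" else "\n")
    }
    sb.append(indent * nesting).append(")")
  }
}

case class Street(name: String)
object Street {
  implicit val show: FastShowPretty[Street] = new FastShowPretty[Street] {
    def showPretty(value: Street, sb: StringBuilder, indent: String, nesting: Int): StringBuilder =
      FastShowPretty.showCaseClass("Street", sb, indent, nesting)(
        "name" -> ((b: StringBuilder, n: Int) =>
          implicitly[FastShowPretty[String]].showPretty(value.name, b, indent, n))
      )
  }
}

case class Address(street: Street)
object Address {
  implicit val show: FastShowPretty[Address] = new FastShowPretty[Address] {
    def showPretty(value: Address, sb: StringBuilder, indent: String, nesting: Int): StringBuilder =
      FastShowPretty.showCaseClass("Address", sb, indent, nesting)(
        "street" -> ((b: StringBuilder, n: Int) =>
          implicitly[FastShowPretty[Street]].showPretty(value.street, b, indent, n))
      )
  }
}

case class User(name: String, address: Address)
object User {
  implicit val show: FastShowPretty[User] = new FastShowPretty[User] {
    def showPretty(value: User, sb: StringBuilder, indent: String, nesting: Int): StringBuilder =
      FastShowPretty.showCaseClass("User", sb, indent, nesting)(
        "name" -> ((b: StringBuilder, n: Int) =>
          implicitly[FastShowPretty[String]].showPretty(value.name, b, indent, n)),
        "address" -> ((b: StringBuilder, n: Int) =>
          implicitly[FastShowPretty[Address]].showPretty(value.address, b, indent, n))
      )
  }
}
```

Every approach compared below has to end up doing the same appends; they differ in how the per-field pieces are found and glued together.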

!

automatic and semi-automatic derivation using Shapeless (Scala 2)
automatic and semi-automatic derivation using Mirrors (Scala 3)
automatic and semi-automatic derivation using Magnolia (Scala 2 & 3)
sanely-automatic derivation with macros and Chimney macro commons (Scala 2 & 3)

Note	Then I implemented it with Shapeless on Scala 2, Mirrors on Scala 3, Magnolia on both 2 and 3, and sanely-automatic derivation with macros on both 2 and 3. For starters, I implemented sanely-automatic derivation in a naive way: inlining everything. Then I ran the numbers for my evil, nested case class.

!

(less is better)

Note

It seems that some results are the same as in the JSON experiments.

Scala 2.13 approaches are close.

Semi-automatic results on Scala 3 are slightly better.

Automatic results on Scala 3 are much worse.

The naive macro implementation isn’t very bad, especially considering how convenient it is, but it is slower to compile than semi-automatic derivation.

Compilation time (s)        Scala 2      Scala 3
                            cold  hot    cold  hot
showGenericProgrammingAuto    15    5      53   29
showGenericProgrammingSemi    10    2      10    2
showMagnoliaAuto              10    1      43   15
showMagnoliaSemi              10    2       9    1
showSanely                    14    4      16    5

!

Scala 2.13.14

[info] Benchmark                                Mode  Cnt  Score   Error   Units
[info] ShowOutputs.showGenericProgrammingAuto  thrpt   10  2.651 ± 0.012  ops/ms
[info] ShowOutputs.showGenericProgrammingSemi  thrpt   10  2.829 ± 0.033  ops/ms
[info] ShowOutputs.showMagnoliaAuto            thrpt   10  3.621 ± 0.017  ops/ms
[info] ShowOutputs.showMagnoliaSemi            thrpt   10  3.745 ± 0.028  ops/ms
[info] ShowOutputs.showSanely                  thrpt   10  2.202 ± 0.359  ops/ms

Scala 3.3.3

[info] Benchmark                                Mode  Cnt  Score   Error   Units
[info] ShowOutputs.showGenericProgrammingAuto  thrpt   10  0.156 ± 0.013  ops/ms
[info] ShowOutputs.showGenericProgrammingSemi  thrpt   10  3.492 ± 0.013  ops/ms
[info] ShowOutputs.showMagnoliaAuto            thrpt   10  0.090 ± 0.023  ops/ms
[info] ShowOutputs.showMagnoliaSemi            thrpt   10  3.918 ± 0.012  ops/ms
[info] ShowOutputs.showSanely                  thrpt   10  2.204 ± 0.396  ops/ms

(more is better)

Note	Similarly, in the benchmarks our naive sanely-automatic derivation isn’t terrible, but you would probably prefer anything else (other than automatic derivation on Scala 3). So was it a failed experiment?

!

But wait.

Jsoniter had one more trick. It "caches" subroutines as defs.

Note	If you need to handle the same type again, you don’t derive the code for it a second time; you just call the def you defined when you handled it the first time.

Would that make a difference?
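A hand-written sketch of the idea, under the assumption that the macro expands a call into a single block with one local `def` per distinct type: every occurrence of a type calls its `def` instead of re-inlining the body. The `Order` type and all `show*` names are hypothetical and only illustrate the shape of such code; they are not what Jsoniter or the benchmark's macro actually emits.

```scala
case class Street(name: String)
case class Address(street: Street)
case class User(name: String, address: Address)
// Hypothetical type with two fields of the same type, to make the reuse visible
case class Order(buyer: User, seller: User)

// Sketch of what a macro caching per-type subroutines could expand
// `order.showPretty()` into: one local def per distinct type, one shared buffer.
def showOrder(order: Order, indent: String = "  ", nesting: Int = 0): String = {
  val sb = new StringBuilder

  def showString(value: String): StringBuilder =
    sb.append("\"").append(value).append("\"")

  def showStreet(value: Street, level: Int): StringBuilder = {
    sb.append("Street(\n").append(indent * (level + 1)).append("name = ")
    showString(value.name)
    sb.append("\n").append(indent * level).append(")")
  }

  def showAddress(value: Address, level: Int): StringBuilder = {
    sb.append("Address(\n").append(indent * (level + 1)).append("street = ")
    showStreet(value.street, level + 1)
    sb.append("\n").append(indent * level).append(")")
  }

  def showUser(value: User, level: Int): StringBuilder = {
    sb.append("User(\n").append(indent * (level + 1)).append("name = ")
    showString(value.name)
    sb.append(",\n").append(indent * (level + 1)).append("address = ")
    showAddress(value.address, level + 1)
    sb.append("\n").append(indent * level).append(")")
  }

  // Both fields are Users, so both calls reuse the single showUser def
  // instead of inlining the User -> Address -> Street rendering twice.
  sb.append("Order(\n").append(indent * (nesting + 1)).append("buyer = ")
  showUser(order.buyer, nesting + 1)
  sb.append(",\n").append(indent * (nesting + 1)).append("seller = ")
  showUser(order.seller, nesting + 1)
  sb.append("\n").append(indent * nesting).append(")")
  sb.toString
}

val order = Order(
  User("John", Address(Street("Paper St"))),
  User("Jane", Address(Street("Main St")))
)
println(showOrder(order))
```

With naive inlining, the body of `showUser` (and, transitively, `showAddress` and `showStreet`) would be emitted once per field of type `User`; with caching, each type's code is generated once, no matter how often the type appears.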

!

(less is better)

Note

It turns out that caching the results of derivation inside the same macro is not that difficult, and one non-invasive PR later, we ran the numbers again.

We beat all the other approaches on cold compilation and match the best of them on hot compilation. How about benchmarks?

Compilation time (s)        Scala 2      Scala 3
                            cold  hot    cold  hot
showGenericProgrammingAuto    15    5      53   29
showGenericProgrammingSemi    10    2      10    2
showMagnoliaAuto              10    1      43   15
showMagnoliaSemi              10    2       9    1
showSanely                     6    1       7    1

!

Scala 2.13.14

[info] Benchmark                                Mode  Cnt  Score   Error   Units
[info] ShowOutputs.showGenericProgrammingAuto  thrpt   10  2.651 ± 0.012  ops/ms
[info] ShowOutputs.showGenericProgrammingSemi  thrpt   10  2.829 ± 0.033  ops/ms
[info] ShowOutputs.showMagnoliaAuto            thrpt   10  3.621 ± 0.017  ops/ms
[info] ShowOutputs.showMagnoliaSemi            thrpt   10  3.745 ± 0.028  ops/ms
[info] ShowOutputs.showSanely                  thrpt   10  4.811 ± 0.026  ops/ms

Scala 3.3.3

[info] Benchmark                                Mode  Cnt  Score   Error   Units
[info] ShowOutputs.showGenericProgrammingAuto  thrpt   10  0.156 ± 0.013  ops/ms
[info] ShowOutputs.showGenericProgrammingSemi  thrpt   10  3.492 ± 0.013  ops/ms
[info] ShowOutputs.showMagnoliaAuto            thrpt   10  0.090 ± 0.023  ops/ms
[info] ShowOutputs.showMagnoliaSemi            thrpt   10  3.918 ± 0.012  ops/ms
[info] ShowOutputs.showSanely                  thrpt   10  4.800 ± 0.042  ops/ms

(more is better)

Note	Again, the fastest! The code that required as little ceremony as a single import is both the fastest to compile and the fastest to run! But we haven’t talked about debugging these macros yet, have we?
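To put numbers on it, here are the throughput ratios computed from the two benchmark tables: cached sanely-automatic derivation against the best semi-automatic result (Magnolia), and against the naive inlining version.

```latex
\begin{aligned}
\text{Scala 2.13:}\quad & \frac{4.811}{3.745} \approx 1.28, & \quad \frac{4.811}{2.202} \approx 2.18 \\
\text{Scala 3.3:}\quad  & \frac{4.800}{3.918} \approx 1.23, & \quad \frac{4.800}{2.204} \approx 2.18
\end{aligned}
```

That is roughly 23 to 28% more throughput than the fastest semi-automatic variant, and about twice the throughput of the naive version (whose error bars, ±0.36 and ±0.40 ops/ms, are much wider).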

Bonus: debugging

case class Street(name: Either[String, Nothing]) // <-- this should fail the derivation
case class Address(street: Street)
case class User(name: String, address: Address)

// scalacOptions += "-Xmacro-settings:fastshowpretty.logging=true"
def printObject(out: User): String = out.showPretty()

[error] .../ShowSanely.scala:12:54: Failed to derive showing for value : example.ShowSanely.User:
[error] No build-in support nor implicit for type scala.Nothing
[error]   def printObject(out: User): String = out.showPretty()
[error]                                                      ^

[info] .../ShowSanely.scala:12:54: Logs:
[info]  - Started derivation for value : example.ShowSanely.User
[info]  - Attempting rule ImplicitRule
[info]  - Skipped summoning example.showmacros.FastShowPretty[example.ShowSanely.User]
[info]  - Attempting rule CachedDefRule
[info]  - Attempting rule BuildInRule
[info]  - Attempting rule ProductRule
[info]  - Checking if def for example.ShowSanely.User exists
[info]  - Started deriving def for example.ShowSanely.User
[info]    - Started derivation for string : java.lang.String
[info]    - Attempting rule ImplicitRule
[info]    - Attempting rule CachedDefRule
[info]    - Attempting rule BuildInRule
[info]    - Successfully shown java.lang.String: sb.append("\"").append(string).append("\"")
[info]    - Started derivation for address : example.ShowSanely.Address
[info]    - Attempting rule ImplicitRule
[info]    - Attempting rule CachedDefRule
[info]    - Attempting rule BuildInRule
[info]    - Attempting rule ProductRule
[info]    - Checking if def for example.ShowSanely.Address exists
[info]    - Started deriving def for example.ShowSanely.Address
[info]      - Started derivation for street : example.ShowSanely.Street
[info]      - Attempting rule ImplicitRule
[info]      - Attempting rule CachedDefRule
[info]      - Attempting rule BuildInRule
[info]      - Attempting rule ProductRule
[info]      - Checking if def for example.ShowSanely.Street exists
[info]      - Started deriving def for example.ShowSanely.Street
[info]        - Started derivation for either : scala.util.Either[java.lang.String, scala.Nothing]
[info]        - Attempting rule ImplicitRule
[info]        - Attempting rule CachedDefRule
[info]        - Attempting rule BuildInRule
[info]        - Attempting rule ProductRule
[info]        - Attempting rule SumTypeRule
[info]        - Checking if def for scala.util.Either[java.lang.String, scala.Nothing] exists
[info]        - Started deriving def for scala.util.Either[java.lang.String, scala.Nothing]
[info]          - Started derivation for left : scala.util.Left[java.lang.String, scala.Nothing]
[info]          - Attempting rule ImplicitRule
[info]          - Attempting rule CachedDefRule
[info]          - Attempting rule BuildInRule
[info]          - Attempting rule ProductRule
[info]          - Checking if def for scala.util.Left[java.lang.String, scala.Nothing] exists
[info]          - Started deriving def for scala.util.Left[java.lang.String, scala.Nothing]
[info]            - Started derivation for string : java.lang.String
[info]            - Attempting rule ImplicitRule
[info]            - Attempting rule CachedDefRule
[info]            - Attempting rule BuildInRule
[info]            - Successfully shown java.lang.String: sb.append("\"").append(string).append("\"")
[info]          - Cached result of def for scala.util.Left[java.lang.String, scala.Nothing]
[info]          - Successfully shown scala.util.Left[java.lang.String, scala.Nothing]: show_nothing$u005D(left, nesting)
[info]          - Started derivation for right : scala.util.Right[java.lang.String, scala.Nothing]
[info]          - Attempting rule ImplicitRule
[info]          - Attempting rule CachedDefRule
[info]          - Attempting rule BuildInRule
[info]          - Attempting rule ProductRule
[info]          - Checking if def for scala.util.Right[java.lang.String, scala.Nothing] exists
[info]          - Started deriving def for scala.util.Right[java.lang.String, scala.Nothing]
[info]            - Started derivation for nothing : scala.Nothing
[info]            - Attempting rule ImplicitRule
[info]            - Attempting rule CachedDefRule
[info]            - Attempting rule BuildInRule
[info]          - Cached result of def for scala.util.Right[java.lang.String, scala.Nothing]
[info]        - Cached result of def for scala.util.Either[java.lang.String, scala.Nothing]
[info]      - Cached result of def for example.ShowSanely.Street
[info]    - Cached result of def for example.ShowSanely.Address
[info]  - Cached result of def for example.ShowSanely.User
[info]   def printObject(out: User): String = out.showPretty()
[info]                                                      ^

Note

Again, we use the example with nested case classes, this time with one nasty field.

We are able to provide quite a good error message!

And with a single scalac option, we can also see how the code is generated, thanks to structured logging!

Even if the error message alone left us with some doubts, with the whole log we know exactly what happened.
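The idea behind such a log can be sketched in a few lines: derivation tries rules in order, and each nested derivation logs one level deeper, so a failure can be reported together with the path that led to it. This is a toy model under stated assumptions: `DerivationLog`, `Tpe`, `Prim`, `Record` and `derive` are hypothetical names, the type model is a stand-in for the compiler's types, and only two rules are modelled; it is not the Chimney macro commons API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical nested logger: entries in a nested scope get two extra spaces of indentation
final class DerivationLog {
  private val entries = ListBuffer.empty[String]
  private var depth   = 0

  def info(msg: String): Unit = {
    entries += ("  " * depth) + "- " + msg
    ()
  }

  def nested[A](thunk: => A): A = {
    depth += 1
    try thunk
    finally depth -= 1
  }

  def render: String = entries.mkString("\n")
}

// Toy stand-in for the compiler's view of types
sealed trait Tpe { def name: String }
final case class Prim(name: String) extends Tpe
final case class Record(name: String, fields: List[(String, Tpe)]) extends Tpe

// Types this toy derivation supports out of the box
val builtIns: Set[String] = Set("String", "Int")

// Tries the rules in order, logging each attempt; stops at the first unsupported field
def derive(label: String, tpe: Tpe, log: DerivationLog): Either[String, Unit] = {
  log.info(s"Started derivation for $label : ${tpe.name}")
  tpe match {
    case Prim(n) =>
      log.info("Attempting rule BuildInRule")
      if (builtIns(n)) { log.info(s"Successfully shown $n"); Right(()) }
      else Left(s"No built-in support nor implicit for type $n")
    case Record(_, fields) =>
      log.info("Attempting rule ProductRule")
      log.nested {
        fields.foldLeft[Either[String, Unit]](Right(())) { case (acc, (field, fieldTpe)) =>
          acc.flatMap(_ => derive(field, fieldTpe, log))
        }
      }
  }
}

val log = new DerivationLog
val userTpe = Record("User", List(
  "name"    -> Prim("String"),
  "address" -> Record("Address", List("street" -> Prim("Nothing")))
))
val result = derive("value", userTpe, log)
println(result)
println(log.render)
```

Because every nested derivation runs inside `nested`, the indentation of the log mirrors the shape of the type, which is what lets a reader follow the path from `User` down to the offending field.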

Summary

Note

Last year’s survey showed that 53% of developers complained about compile times, and that effort should be made to make the compiler faster.

Perhaps it’s not the compiler. Perhaps we have just been doing derivation the wrong way.

We saw that, through great effort, automatic derivation was optimized to be as performant as semi-automatic derivation on Scala 2.13. It took several years.

It hasn’t yet happened on Scala 3.

And then we could beat all that effort with a single PR to a macro that simply doesn’t follow the popular conventions. To me, that means the conventions are at fault.

This doesn’t mean that Shapeless, Mirrors, Magnolia, and all the current effort around inline defs and compiletime ops was in vain. If you compare libraries implemented with these tools to libraries implemented with macros, you may realize that many of our popular libraries wouldn’t be faster without Shapeless.

They would never have been created in the first place. These tools make it much easier to start learning metaprogramming, and even with them it’s difficult.

But if we want to have user-friendly libraries in Scala, and I know we all do, we should start challenging the current solutions.

So we should start giving library maintainers feedback, in a polite and respectful way, that we can do better: that it’s the libraries that can make compilation times shorter, generated code faster, and error messages better.

And I’m fine if that doesn’t happen through the sanely-automatic derivation I invented, as long as things improve.

Thank you!

https://github.com/MateuszKubuszok/derivation-benchmarks
