Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in 2.2.3 #148

Closed
gianm opened this issue May 20, 2014 · 21 comments
Closed

Performance regression in 2.2.3 #148

gianm opened this issue May 20, 2014 · 21 comments
Milestone

Comments

@gianm
Copy link

gianm commented May 20, 2014

Version 2.2.3 does untyped deserialization dramatically slower (about 150x in the benchmark I am using) than 2.2.2. I'm testing with a json object that has a few levels of nesting (maps within maps; some keys are lists too) with a loop that looks like this:

  def timeJacksonJava(reps: Int) = {
    var ret: Seq[AnyRef] = null
    val jm = new ObjectMapper
    for (i <- 0 until reps) {
      val xs = bytess map (jm.readValue(_, classOf[AnyRef]))
      if (ret == null) {
        ret = xs
      }
    }
    ret
  }

  def timeJacksonScala(reps: Int) = {
    var ret: Seq[AnyRef] = null
    val jm = new ObjectMapper()
    jm.registerModule(DefaultScalaModule)
    for (i <- 0 until reps) {
      val xs = bytess map (jm.readValue(_, classOf[AnyRef]))
      if (ret == null) {
        ret = xs
      }
    }
    ret
  }

bytess is a Seq[Array[Byte]] where each byte array is a json object on the order of a few kb. jackson-module-scala 2.2.2 runs roughly the same speed as jackson itself, but 2.2.3 is much slower.

@cowtowncoder
Copy link
Member

Can you verify if 2.3.3 has the same issue?

@gianm
Copy link
Author

gianm commented May 20, 2014

Just tried it, looks like 2.3.3 is slow too.

@gianm
Copy link
Author

gianm commented May 20, 2014

Btw just to be totally clear-- timeJacksonJava runs about the same speed in 2.2.2, 2.2.3, & 2.3.3 (~170MB/s on my machine). timeJacksonScala runs at about 170MB/s as well on 2.2.2, but ~1MB/s on 2.2.3+

@brharrington
Copy link
Contributor

I noticed a similar issue and the cause seems to be the changes for #89. In particular a profile of our application shows the time is being spent calling ctxt.findRootValueDeserializer. For my example, if I set DeserializationFeature.USE_JAVA_ARRAY_FOR_JSON_ARRAY the problem goes away because it bypasses these changes. I also confirmed the problem goes away for me if I build a jar with the changes to mapArray reverted, though a dataset with lots of objects would likely still have problems. A simple test case that illustrates the problem:

  val testJson = {
    val str = new StringBuilder
    str.append("{\"values\":[")
    (0 until 100000).foreach { i =>
      str.append(s"[${i.toDouble}],")
    }
    str.append(s"[${0.0}]")
    str.append("]}")
    str.toString
  }

  def main(args: Array[String]): Unit = {
    (0 until 10).foreach { i =>
      val s = System.nanoTime
      val m = deserialize[Map[String,AnyRef]](testJson)
      val e = System.nanoTime
      println(s"${m.size}, ${(e - s) / 1e6}ms")
    }
  }

If I run with DeserializationFeature.USE_JAVA_ARRAY_FOR_JSON_ARRAY enabled I get:

1, 530.666ms
1, 36.227ms
1, 19.585ms
1, 47.563ms
1, 17.945ms
1, 23.39ms
1, 19.716ms
1, 18.649ms
1, 22.985ms
1, 31.618ms

If I run it without that flag I get:

1, 27709.518ms
1, 26620.834ms
1, 23866.794ms
1, 23579.977ms
1, 23839.36ms
1, 23629.038ms
1, 23505.023ms
1, 23448.91ms
1, 23807.761ms
1, 23553.336ms

@deromka
Copy link

deromka commented Nov 11, 2014

I also noticed significant performance degradation with the jackson scala module library starting from 2.2.3, here are my tests:

Recently I performed performance tests on one of our server side components and saw very significant impact on performance. After some investigations I have managed to isolate the culprit of the performance degradation, it is the Jackson library (for json parsing), here are the results:

In my tests I use some relatively big json string ~11KBytes.
The test performs deserialization from json to object 10000 times, repeats it 10 times and makes an average.

def measure(repeat : Int, times : Int) = {
    var start = 0L
    var end = 0L
    var diff = 0L
    var sum = 0L
    (1 to repeat).foreach { rep =>
      start = System.currentTimeMillis()
      (1 to times).par.foreach { i =>
        val hh = JsonLayer.JSON_MAPPER.readValue(householdString("hh" + i * rep), classOf[Household])
        //val st = JsonLayer.JSON_MAPPER.writeValueAsString(hh)
      }
      end = System.currentTimeMillis()
      diff = (end - start)
      sum += diff
      logger.debug("Round " + rep + " took " + diff + " millis")
    }
    sum
  }

  @Test
  def householdJsonPerformance() =
  {
    val repeat = 10
    val times = 10000
    var start = 0L
    var end = 0L
    var sum  = 0L
    var diff = 0L

    // warmup
    diff = measure(repeat, times)
    logger.debug("Round warm up took " + diff + " millis")

    Measure.time(this.logger, "performance household deserialize/serialize") {
          sum = measure(repeat, times)
      }
    logger.debug("Average per run " + (sum/repeat) + " millis")
  }

Jackson libraries version: 2.4.3

jackson-core-2.4.3.jar jackson-annotations-2.4.3.jar jackson-databind-2.4.3.jar

The only difference between the tests is following jar were updated each time:

jackson-module-scala_2.10-{version}.jar

The results of the test above with different versions of the jars:

2.2.2

 Round 1 took 2921 millis   
 Round 2 took 842 millis   
 Round 3 took 677 millis   
 Round 4 took 597 millis   
 Round 5 took 556 millis   
 Round 6 took 590 millis   
 Round 7 took 697 millis   
 Round 8 took 597 millis   
 Round 9 took 577 millis   
 Round 10 took 548 millis   
 Round warm up took 8602 millis   
 [ Executing performance household deserialize/serialize ...]   
 Round 1 took 474 millis   
 Round 2 took 458 millis   
 Round 3 took 465 millis   
 Round 4 took 493 millis   
 Round 5 took 519 millis   
 Round 6 took 445 millis   
 Round 7 took 446 millis   
 Round 8 took 485 millis   
 Round 9 took 485 millis   
 Round 10 took 456 millis   
 [ performance household deserialize/serialize took 4731 ms ]   
**Average per run 472 millis</strong>  millis**

2.2.3

Round 1 took 8228 millis</br>
Round 2 took 5428 millis</br>   
Round 3 took 3771 millis</br>   
Round 4 took 3798 millis</br>   
Round 5 took 3560 millis</br>   
Round 6 took 3558 millis</br>   
Round 7 took 3847 millis</br>   
Round 8 took 3709 millis</br>   
Round 9 took 3496 millis</br>   
Round 10 took 3836 millis</br>   
Round warm up took 43231 millis   </br>
[ Executing performance household deserialize/serialize ...]   </br>
Round 1 took 5355 millis   </br>
Round 2 took 6018 millis   </br>
Round 3 took 3962 millis   </br>
Round 4 took 3481 millis   </br>
Round 5 took 3821 millis   </br>
Round 6 took 3739 millis   </br>
Round 7 took 3545 millis   </br>
Round 8 took 3594 millis   </br>
Round 9 took 3615 millis   </br>
Round 10 took 3595 millis   </br>
[ performance household deserialize/serialize took 40726 ms ]   </br>
**Average per run 4072 millis**

2.3.3

Round 1 took 8631 millis   
Round 2 took 5342 millis   
Round 3 took 3617 millis   
Round 4 took 3578 millis   
Round 5 took 3633 millis   
Round 6 took 3740 millis   
Round 7 took 3541 millis   
Round 8 took 3833 millis   
Round 9 took 3678 millis   
Round 10 took 3596 millis   
Round warm up took 43189 millis   
[ Executing performance household deserialize/serialize ...]   
Round 1 took 3753 millis   
Round 2 took 3408 millis   
Round 3 took 4728 millis   
Round 4 took 4270 millis   
Round 5 took 3565 millis   
Round 6 took 3521 millis   
Round 7 took 3492 millis   
Round 8 took 3573 millis   
Round 9 took 3959 millis   
Round 10 took 4094 millis   
[ performance household deserialize/serialize took 38365 ms ]   
**Average per run 3836 millis**

2.4.3

 Round 1 took 8331 millis   
 Round 2 took 4903 millis   
 Round 3 took 3568 millis   
 Round 4 took 3545 millis   
 Round 5 took 3494 millis   
 Round 6 took 3597 millis   
 Round 7 took 3637 millis   
 Round 8 took 3629 millis   
 Round 9 took 3612 millis   
 Round 10 took 3550 millis   
 Round warm up took 41866 millis   
 [ Executing performance household deserialize/serialize ...]   
 Round 1 took 3611 millis   
 Round 2 took 3510 millis   
 Round 3 took 3508 millis   
 Round 4 took 3923 millis   
 Round 5 took 4848 millis   
 Round 6 took 4819 millis   
 Round 7 took 3960 millis   
 Round 8 took 3580 millis   
 Round 9 took 4092 millis   
 Round 10 took 4480 millis   
 [ performance household deserialize/serialize took 40332 ms ]   
**Average per run 4033 millis**

To summarize:

There is performance degradation in jackson scala module 2_10 libraries starting from 2.2.3 version (2.2.2 version works fine)

In my tests above version jackson-module-scala_2.10-2.2.2 work ~7.5 times faster that jackson-module-scala_2.10-2.2.3

When this bug will be fixed?

@nbauernfeind
Copy link
Member

@deromka Can you re-run your test using the jar from 2.4.3 but with this module instead of the DefaultScalaModule:

new JacksonModule
     with IteratorModule
     with EnumerationModule
     with OptionModule
     with SeqModule
     with IterableModule
     with TupleModule
     with MapModule
     with SetModule
     with ScalaClassIntrospectorModule
     with UntypedObjectDeserializerModule {
  override def getModuleName = "DefaultScalaModule"
}

Version 2.2.3 added ScalaStdValueInstantiatorsModule which analyzes all methods/fields that are scala getter/setters.

@nbauernfeind
Copy link
Member

@deromka Nevermind. I just did something similar to your test and that module is definitely not to blame.

@deromka
Copy link

deromka commented Nov 11, 2014

No problem, I can run some additional tests

On Tue, Nov 11, 2014 at 5:28 PM, Nate Bauernfeind notifications@github.com
wrote:

@deromka https://github.com/deromka Nevermind. I just did something
similar to your test and that module is definitely not to blame.


Reply to this email directly or view it on GitHub
#148 (comment)
.

@cowtowncoder
Copy link
Member

I would strongly recommend taking a profiler snapshot (with sampling). Given magnitude of deviation, it should be relatively easy what is taking time here.

@cowtowncoder
Copy link
Member

Also: one other thing that may or may not be useful: this benchmark project:

https://github.com/FasterXML/jackson-benchmarks

is something I wrote for 2.4, and have used it for regular Java benchmarking, over most data formats Jackson supports. Perhaps it'd be possible to build a simplified version of problematic data type(s), and add a Scala test; so once we figure out and solve the current issue, we could perhaps guard against regression.

@deromka
Copy link

deromka commented Nov 12, 2014

Hi,

I have done profiling on both versions - 2.2.2 and 2.4.3

2.2.2 screenshot

image

2.4.3 screenshot

image

So as it seen in the hotspots the reflection methods consumes about 50% cpu

java.lang.reflect.Method.getDeclaredAnnotations
java.lang.Class.getInterfaces
java.lang.Class.getDeclaredMethods
java.lang.Class.getDeclaredAnnotations
java.lang.Class.getDeclaredConstructors

These are called from

com.fasterxml.jackson.module.scala.deser.UntypedObjectDeserializer$$anonfun$deserializeAbstractType$1.appy

Please check that

On Tue, Nov 11, 2014 at 7:26 PM, Tatu Saloranta notifications@github.com
wrote:

Also: one other thing that may or may not be useful: this benchmark
project:

https://github.com/FasterXML/jackson-benchmarks

is something I wrote for 2.4, and have used it for regular Java
benchmarking, over most data formats Jackson supports. Perhaps it'd be
possible to build a simplified version of problematic data type(s), and add
a Scala test; so once we figure out and solve the current issue, we could
perhaps guard against regression.


Reply to this email directly or view it on GitHub
#148 (comment)
.

@cowtowncoder
Copy link
Member

Thank you. I think that there are a few things in UntypedObjectDeserializer.scala that could be optimized (starting with per-deser-call construction of helper types); but the biggest thing probably is that it is not currently being cached.

Version 2.4.4 of jackson-databind may help here, actually, as the base implementation (UntypedObjectDeserializer.java) now implemented isCachable() to return true.
But it would probably make sense to override it on Scala sub-class as well, just in case.

If you can build Scala module locally, defining:

    @Override
    public boolean isCachable() { return true; }

(in suitable Scala version)

should help; so if it'd be easy enough to do, you may want to test to see if and how it would change performance.

@christophercurrie
Copy link
Member

@cowtowncoder, with respect to per-deser-call construction of types, how immutable can I expect the results of DeserializationContext#getFactory and DeserializationContext#findRootValueDeserializer to be? Can I cache the results of these calls, do I need to cache them using the DeserializationContext as a key, or some other mechanism?

In general, I've been assuming that since DeserializationContext is passed in to mapArray and mapObject, that the value could be different every time, and thus I could get different results from those calls and can't cache them. If there's more caching I can do of those values, or a way that I could get them from the DeserizalizationConfig and not the DeserializationContext, I'm sure that would help performance here a lot.

@cowtowncoder
Copy link
Member

@christophercurrie Good question. The answer should be considered static, and may be fully cached. This is related to ObjectMapper being "configure-fully-then-use", so that additional registrations or changes after first use are not guaranteed to have any effect.
Dynamic lookups are mostly needed for serializers, although same rule applies there. But type information may be incomplete (static declared type vs actual runtime type).

The main limitation is that deserializers themselves must be single-threaded, so most commonly deserializers resolve dependencies by implementing createContextual(...) method of ContextualDeserializer, and by constructing a new deserializer instance with new/changed deserializers.
The reason for doing it this way instead of trying to locate dependencies during construction (which is difficult for non-core deserializers due to other reasons as well) is that this gives access to annotations for the property, and class declaring property.

@christophercurrie
Copy link
Member

Follow up question: can I rely upon ObjectMapper to synchronize access to my UntypedObjectDeserializer, or do I need to synchronize my caching logic?

@cowtowncoder
Copy link
Member

Depends on what you mean; since deserializers need to be thread-safe after construction (usually by keeping them stateless), no synchronization is needed. The only places where state can be change is constructor and resolve(); call to latter is fully synchronized. createContextual() is not thread-safe, but it should not modify state but rather create a new instance.
That new instance will only be used for particular property (although if no changes are made, original may be shared), but will still be accessed concurrently.

Maybe it would be easier to consider specific case. Are you think of specific implementation here?

christophercurrie added a commit that referenced this issue Nov 25, 2014
@christophercurrie
Copy link
Member

So, resolve is new to me, glad I know it exists now, because it solves exactly the problem I was trying to solve, which is that I need DeserializationContext for some things that shouldn't change in later calls.

That said, I tried to override UntypedObjectDeserializer with the equivalent code for the Scala module. In particular, providing the correct list serializer for the default type in Scala. This doesn't work, because of circular contextualization: the list serializer tries to contextualize its value serializer, but since the value type is Object, it ends up calling createContextual in UntypedObjectDeserializer again, and around we go.

The default implementation for this works because it shortcuts contextualization if the list serializer isn't overridden. The unit test for overriding the list serializer don't fail, because the unit test ListSerializer isn't contextual.

I don't know if this is a bug, or just a fact of life of the UntypedObjectSerializer. But it seems problematic that it can be contextualized before it's resolved. You can clearly see this behavior by applying this patch and running the tests.

The real reason I'm concerned is that I suspect that this shortcutting in contextualization means that overridden deserializers won't be consistently applied; some copies of the contextualized deserializer won't have the overrides, and some will. I don't have a test developed for this yet, but it's on my to-do list.

@christophercurrie christophercurrie added this to the 2.4.5 milestone Nov 26, 2014
@christophercurrie
Copy link
Member

Some work has been done on this issue in the 2.4 branch, which I need to release in the next few days; if any of you have the chance to run your local perf tests on 2.4.4-SNAPSHOT, your feedback would be appreciated.

@christophercurrie
Copy link
Member

FYI, version 2.4.4 was released today. Hopefully it will improve things for you.

@gianm
Copy link
Author

gianm commented Dec 3, 2014

@christophercurrie 2.4.4 runs well in our benchmarks, thanks!

@christophercurrie
Copy link
Member

Thank you! My apologies that this took so long to resolve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants