[SPARK-24601] Update Jackson to 2.9.6 #21596
Conversation
The tests already failed given my earlier tries, and there are many concerns about this. Also see and address the concerns in #9759.
This was more than a year ago; we should eventually upgrade.
I agree we should upgrade, but just changing the pom version will not work. I have an internal patch for this and I needed small changes in a few other places...
Which is, btw, a way of saying you should run
I tried this locally too. I didn't mean that we shouldn't, but let's fix the test cases (or main code) too and address the other concerns.
I could not get the tests working locally. :-) Let me give it another try.
ok to test
Test build #92166 has finished for PR 21596 at commit
Please make sure that performance doesn't degrade after upgrading Jackson. You can check that with JsonBenchmarks.
Bump jackson from 2.6.7 & 2.6.7.1 to 2.7.7. In spark-shell:
expected result: 3000
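The authoritative performance check is Spark's JsonBenchmark suite, as requested above. For a rough local sanity check in spark-shell, the probed expression can be wrapped in a small timing helper. The helper below is an illustrative sketch, not part of Spark, and it only runs the body once; the real benchmark does warm-up rounds and averages over many iterations.

```scala
// Ad-hoc timing helper for local sanity checks: runs the body once and
// prints the elapsed wall-clock time in milliseconds.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label: $elapsedMs%.2f ms")
  result
}

// Example: the 3000-character payload used by the get_json_object probe above.
val value = "x" * 3000
val json = time("build json") { s"""{"big": "$value"}""" }
println(json.length) // 3011: 3000 payload chars plus 11 chars of JSON framing
```

Running the same probe before and after the dependency bump gives a crude before/after comparison when the full benchmark suite is too slow to iterate with.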
Yes, let me add it and run the tests locally.
I've pushed the changes; the tests are still running locally.
Test build #92209 has finished for PR 21596 at commit
Hmm, I see a relevant failing test.
From the test, it looks like it is inserting the null value.
Edit: I agree this field should not be rendered: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala#L44
Ok, I've found the issue: FasterXML/jackson-module-scala#325. Changing this to
Test build #92229 has finished for PR 21596 at commit
Oops, a dangling file. Sorry for triggering so many builds, guys.
Test build #92228 has finished for PR 21596 at commit
Test build #92231 has finished for PR 21596 at commit
Test build #92230 has finished for PR 21596 at commit
val value = "x" * 3000
checkEvaluation(
  GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}"""))
  , NonFoldableLiteral("$.big")), value)
nit: ->
checkEvaluation(
  GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}""")),
    NonFoldableLiteral("$.big")), value)
Fixed, much nicer indeed
When looking at the history server, we have a similar issue, from the stage_list_json test: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92231/testReport/org.apache.spark.deploy.history/HistoryServerSuite/stage_list_json/
I've formatted the JSON. Expected:
[{
"status": "COMPLETE",
"stageId": 3,
"attemptId": 0,
"numTasks": 8,
"numActiveTasks": 0,
"numCompleteTasks": 8,
"numFailedTasks": 0,
"numKilledTasks": 0,
"numCompletedIndices": 8,
"executorRunTime": 162,
"executorCpuTime": 0,
"submissionTime": "2015-02-03T16:43:07.191GMT",
"firstTaskLaunchedTime": "2015-02-03T16:43:07.191GMT",
"completionTime": "2015-02-03T16:43:07.226GMT",
"inputBytes": 160,
"inputRecords": 0,
"outputBytes": 0,
"outputRecords": 0,
"shuffleReadBytes": 0,
"shuffleReadRecords": 0,
"shuffleWriteBytes": 0,
"shuffleWriteRecords": 0,
"memoryBytesSpilled": 0,
"diskBytesSpilled": 0,
"name": "count at <console>:17",
"details": "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$line19.$read$$iwC$$iwC$$iwC.<init>(<console>:22)\n$line19.$read$$iwC$$iwC.<init>(<console>:24)\n$line19.$read$$iwC.<init>(<console>:26)\n$line19.$read.<init>(<console>:28)\n$line19.$read$.<init>(<console>:32)\n$line19.$read$.<clinit>(<console>)\n$line19.$eval$.<init>(<console>:7)\n$line19.$eval$.<clinit>(<console>)\n$line19.$eval.$print(<console>)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\norg.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)\norg.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)\norg.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)",
"schedulingPool": "default",
"rddIds": [6, 5],
"accumulatorUpdates": [],
"killedTasksSummary": {}
}, ... ]
Found:
[{
"status": "COMPLETE",
"stageId": 3,
"attemptId": 0,
"numTasks": 8,
"numActiveTasks": 0,
"numCompleteTasks": 8,
"numFailedTasks": 0,
"numKilledTasks": 0,
"numCompletedIndices": 8,
"executorRunTime": 162,
"executorCpuTime": 0,
"submissionTime": "2015-02-03T16:43:07.191GMT",
"firstTaskLaunchedTime": "2015-02-03T16:43:07.191GMT",
"completionTime": "2015-02-03T16:43:07.226GMT",
"failureReason": null,
"inputBytes": 160,
"inputRecords": 0,
"outputBytes": 0,
"outputRecords": 0,
"shuffleReadBytes": 0,
"shuffleReadRecords": 0,
"shuffleWriteBytes": 0,
"shuffleWriteRecords": 0,
"memoryBytesSpilled": 0,
"diskBytesSpilled": 0,
"name": "count at <console>:17",
"description": null,
"details": "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$line19.$read$$iwC$$iwC$$iwC.<init>(<console>:22)\n$line19.$read$$iwC$$iwC.<init>(<console>:24)\n$line19.$read$$iwC.<init>(<console>:26)\n$line19.$read.<init>(<console>:28)\n$line19.$read$.<init>(<console>:32)\n$line19.$read$.<clinit>(<console>)\n$line19.$eval$.<init>(<console>:7)\n$line19.$eval$.<clinit>(<console>)\n$line19.$eval.$print(<console>)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\norg.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)\norg.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)\norg.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)",
"schedulingPool": "default",
"rddIds": [6, 5],
"accumulatorUpdates": [],
"tasks": null,
"executorSummary": null,
"killedTasksSummary": {}
}, ... ]
We observe that the null-valued fields (failureReason, description, tasks, executorSummary) are now rendered in the output instead of being omitted.
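These extra `"failureReason": null`, `"description": null`, `"tasks": null`, and `"executorSummary": null` entries line up with the jackson-module-scala issue mentioned above: under `Include.NON_NULL`, a Scala `None` no longer counts as `null`, so `Option` fields get rendered as JSON `null` instead of being dropped. A pure-Scala sketch of the two inclusion rules (an illustrative model of the behaviour; the real decision is made inside jackson-databind):

```scala
// Simplified model of Jackson's field-inclusion decision, based on the
// semantics described in FasterXML/jackson-module-scala#325.
sealed trait Inclusion
case object NonNull   extends Inclusion // drop fields whose value is literally null
case object NonAbsent extends Inclusion // additionally drop "absent" values like None

def include(value: Any, rule: Inclusion): Boolean = rule match {
  case NonNull => value != null
  case NonAbsent =>
    value match {
      case null           => false
      case opt: Option[_] => opt.isDefined // None counts as absent
      case _              => true
    }
}

// With the upgraded Jackson, None is no longer coerced to null, so NON_NULL
// keeps the field and serializes it as JSON null -- the extra fields above.
assert(include(None, NonNull))    // field kept  -> "failureReason": null
assert(!include(None, NonAbsent)) // field dropped, matching the expected output
assert(!include(null, NonNull))   // plain nulls are still dropped
```

This is why switching the annotation from NON_NULL to NON_ABSENT restores the old wire format.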
@Fokko, btw, I believe #21596 (comment) is a valid comment. I think it would be nicer if we execute this benchmark, update the results in the file, and push it here.
BTW, I believe it's okay to use Jenkins resources if that's a faster way to resolve this issue in any event.
@HyukjinKwon Any idea if this has any chance of getting merged? The 2.4 branch was cut a while ago. The number of merge conflicts has been minimal in all the times I've rebased onto master. I think we should bump the Jackson version at some point, since old versions of Jackson aren't compatible with the newer ones. Please let me know.
Test build #96330 has finished for PR 21596 at commit
Force-pushed from 44b8d1b to 0b0b8fc.
I think we're (at least I am) not clear on 2.5 vs 3.0 now (correct me if I'm mistaken). The change itself looks okay, but I think it's targeted at 3.0.0 as discussed. I can take action here when I'm clear on this.
Test build #96335 has finished for PR 21596 at commit
Force-pushed from 0b0b8fc to f4d294c.
Test build #96377 has finished for PR 21596 at commit
Force-pushed from f4d294c to 2bab06f.
Test build #96413 has finished for PR 21596 at commit
So we keep the results in the git commit history and we can do an easy comparison.
Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. Behaviour has changed since 2.7.x: None is no longer considered null (FasterXML/jackson-module-scala#325). The Javadoc describes the relevant inclusion value as one that "indicates that only properties with null value, or what is considered empty, are not to be included": https://fasterxml.github.io/jackson-annotations/javadoc/2.7/com/fasterxml/jackson/annotation/JsonInclude.Include.html. Change the flags to NON_ABSENT.
Since:
scala> None == null
res0: Boolean = false
we need to implement the isEmpty function to verify whether the Option[_] is empty or not.
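The distinction is easy to verify in a plain Scala REPL. Because `None` is a real singleton object rather than the `null` reference, a null comparison can never detect an empty `Option`, which is why the serializer has to consult `isEmpty` instead:

```scala
// None is the empty Option singleton, not the null reference.
val absent: Option[String] = None

assert(absent != null)       // None is an object, so it is never equal to null
assert(absent.isEmpty)       // emptiness must be asked for explicitly
assert(Option(null).isEmpty) // Option(...) is the bridge that folds null into None
assert(Some("x").isDefined)
```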
Rebased onto master. @HyukjinKwon can we target this for 2.5?
Test build #96926 has finished for PR 21596 at commit
retest this please
Test build #96928 has finished for PR 21596 at commit
Merged to master.
## What changes were proposed in this pull request?

In #21596, Jackson is upgraded to 2.9.6. There are some deprecated API warnings in SQLListener. Create a trivial PR to fix them.

```
[warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
[warn]
[warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType))
[warn]
[warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
[warn]
[warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType))
```

## How was this patch tested?

Existing unit tests.

Closes #22848 from gengliangwang/fixSQLListenerWarning.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
It sounds like Jackson has an important behavior change that might impact us.
import com.fasterxml.jackson.annotation.JsonInclude.Include
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
val mapper = new ObjectMapper with ScalaObjectMapper
mapper.setSerializationInclusion(Include.NON_NULL)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
mapper.registerModule(DefaultScalaModule)
println(mapper.writeValueAsString(Map("abc" -> null)))
Previously, it outputs
@gatorsmile Are you completely sure about that? The Spark REST API
If you're doing proper Scala and try to avoid the
Closes apache#21596 from Fokko/fd-bump-jackson.
Authored-by: Fokko Driesprong <fokkodriesprong@godatadriven.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
Hi all,
Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. I bumped into some issues with Azure CosmosDB, which uses a more recent version of Jackson. This can be fixed by adding exclusions, and then it works without any issues, so there are no breaking changes in the APIs.
I would also consider bumping the version of Jackson in Spark, and suggest keeping the dependencies up to date, since this issue will pop up more frequently in the future.
Jira: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24601
What changes were proposed in this pull request?
Bump Jackson to 2.9.6
How was this patch tested?
Compiled and tested it locally to see if anything broke.
Please review http://spark.apache.org/contributing.html before opening a pull request.