Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24601] Update Jackson to 2.9.6 #21596

Closed
wants to merge 3 commits into from
Closed

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Jun 20, 2018

Hi all,

Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. I bumped into some issues with Azure CosmosDB that is using a more recent version of Jackson. This can be fixed by adding exclusions and then it works without any issues. So no breaking changes in the API's.

I would also consider bumping the version of Jackson in Spark. I would suggest to keep up to date with the dependencies, since in the future this issue will pop up more frequently.

Jira: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24601

What changes were proposed in this pull request?

Bump Jackson to 2.9.6

How was this patch tested?

Compiled and tested it locally to see if anything broke.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@HyukjinKwon
Copy link
Member

The tests will be already failed given my tries before and there are many concerns about this. Also see and address the concerns in #9759.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 20, 2018

This was more than a year ago, we should eventually upgrade..

@vanzin
Copy link
Contributor

vanzin commented Jun 20, 2018

I agree we should upgrade, but just changing the pom version will not work. I have an internal patch for this and I needed small changes in a few other places...

@vanzin
Copy link
Contributor

vanzin commented Jun 20, 2018

Which is, btw, a way of saying you should run dev/run-tests locally, at least, when changing a dependency.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jun 21, 2018

I tried this in my local too. I didn't mean that we shouldn't but let's fix the test cases (or main codes) too and address other concerns.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 21, 2018

I could not get the tests working locally. :-) Let me give it another try.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jun 21, 2018

Test build #92166 has finished for PR 21596 at commit ba454b0.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member

MaxGekk commented Jun 21, 2018

Please, make sure that performance doesn't degrade after upgrading Jackson. You can check that by JsonBenchmarks.

@cxzl25
Copy link
Contributor

cxzl25 commented Jun 22, 2018

#20738

Bump jackson from 2.6.7&2.6.7.1 to 2.7.7
Jackson(>=2.7.7) fixes the possibility of missing tail data when the length of the value is in a range
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7
https://github.com/FasterXML/jackson-core/issues/30

spark-shell:

val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
res0: Array[org.apache.spark.sql.Row] = Array([2991])

expect result : 3000
actual result : 2991

@HyukjinKwon
Copy link
Member

@cxzl25, 2.7.x has a regression so we had to revert it back. Please see #9759.

@HyukjinKwon
Copy link
Member

@Fokko, mind adding the test described in #20738 here too?

@Fokko
Copy link
Contributor Author

Fokko commented Jun 22, 2018

Yes, let me add it and run the tests locally

@Fokko Fokko force-pushed the fd-bump-jackson branch from ba454b0 to dfcc039 Compare June 22, 2018 12:11
@Fokko
Copy link
Contributor Author

Fokko commented Jun 22, 2018

I've pushed the changes, the tests are still running locally.

@SparkQA
Copy link

SparkQA commented Jun 22, 2018

Test build #92209 has finished for PR 21596 at commit dfcc039.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 22, 2018

Hmm, I see a relevant failing test:

org.scalatest.exceptions.TestFailedException: "...430","name":"scope1"[,"parent":null]}" did not equal "...430","name":"scope1"[]}"

From the test core/src/test/scala/org/apache/spark/rdd/RDDOperationScopeSuite.scala:

  test("json de/serialization") {
    val scope1Json = scope1.toJson
    val scope2Json = scope2.toJson
    val scope3Json = scope3.toJson
    assert(scope1Json === s"""{"id":"${scope1.id}","name":"scope1"}""")
    assert(scope2Json === s"""{"id":"${scope2.id}","name":"scope2","parent":$scope1Json}""")
    assert(scope3Json === s"""{"id":"${scope3.id}","name":"scope3","parent":$scope2Json}""")
    assert(RDDOperationScope.fromJson(scope1Json) === scope1)
    assert(RDDOperationScope.fromJson(scope2Json) === scope2)
    assert(RDDOperationScope.fromJson(scope3Json) === scope3)
  }

It looks like it is inserting the null value ,"parent":null, which was omitted before. I would say that this is rather a design choice, than a bug.

Edit: I agree this field should not be rendered: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala#L44

@Fokko Fokko force-pushed the fd-bump-jackson branch from 0d2e186 to 9107328 Compare June 22, 2018 21:01
@Fokko
Copy link
Contributor Author

Fokko commented Jun 22, 2018

Ok, I've found the issue:

FasterXML/jackson-module-scala#325

Changing this to NON_ABSENT should fix this.

@SparkQA
Copy link

SparkQA commented Jun 22, 2018

Test build #92229 has finished for PR 21596 at commit 9107328.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Fokko Fokko force-pushed the fd-bump-jackson branch from 9107328 to 0cb0ea5 Compare June 22, 2018 21:06
@Fokko
Copy link
Contributor Author

Fokko commented Jun 22, 2018

Oops, a dangling file. Sorry for triggering so many builds guys.

@Fokko Fokko force-pushed the fd-bump-jackson branch from 0cb0ea5 to 1fd1d10 Compare June 22, 2018 21:08
@SparkQA
Copy link

SparkQA commented Jun 23, 2018

Test build #92228 has finished for PR 21596 at commit 0d2e186.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 23, 2018

Test build #92231 has finished for PR 21596 at commit 1fd1d10.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 23, 2018

Test build #92230 has finished for PR 21596 at commit 0cb0ea5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val value = "x" * 3000
checkEvaluation(
GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}"""))
, NonFoldableLiteral("$.big")), value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ->

checkEvaluation(
  GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}""")),
  NonFoldableLiteral("$.big")), value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, much nicer indeed

@Fokko
Copy link
Contributor Author

Fokko commented Jun 23, 2018

When looking at the history server, we have a similar issue. From at the list command https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92231/testReport/org.apache.spark.deploy.history/HistoryServerSuite/stage_list_json/ :

I've formatted the json, expected:

[{
	"status": "COMPLETE",
	"stageId": 3,
	"attemptId": 0,
	"numTasks": 8,
	"numActiveTasks": 0,
	"numCompleteTasks": 8,
	"numFailedTasks": 0,
	"numKilledTasks": 0,
	"numCompletedIndices": 8,
	"executorRunTime": 162,
	"executorCpuTime": 0,
	"submissionTime": "2015-02-03T16:43:07.191GMT",
	"firstTaskLaunchedTime": "2015-02-03T16:43:07.191GMT",
	"completionTime": "2015-02-03T16:43:07.226GMT",
	"inputBytes": 160,
	"inputRecords": 0,
	"outputBytes": 0,
	"outputRecords": 0,
	"shuffleReadBytes": 0,
	"shuffleReadRecords": 0,
	"shuffleWriteBytes": 0,
	"shuffleWriteRecords": 0,
	"memoryBytesSpilled": 0,
	"diskBytesSpilled": 0,
	"name": "count at <console>:17",
	"details": "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$line19.$read$$iwC$$iwC$$iwC.<init>(<console>:22)\n$line19.$read$$iwC$$iwC.<init>(<console>:24)\n$line19.$read$$iwC.<init>(<console>:26)\n$line19.$read.<init>(<console>:28)\n$line19.$read$.<init>(<console>:32)\n$line19.$read$.<clinit>(<console>)\n$line19.$eval$.<init>(<console>:7)\n$line19.$eval$.<clinit>(<console>)\n$line19.$eval.$print(<console>)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\norg.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)\norg.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)\norg.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)",
	"schedulingPool": "default",
	"rddIds": [6, 5],
	"accumulatorUpdates": [],
	"killedTasksSummary": {}
}, ... ]

found:

[{
	"status": "COMPLETE",
	"stageId": 3,
	"attemptId": 0,
	"numTasks": 8,
	"numActiveTasks": 0,
	"numCompleteTasks": 8,
	"numFailedTasks": 0,
	"numKilledTasks": 0,
	"numCompletedIndices": 8,
	"executorRunTime": 162,
	"executorCpuTime": 0,
	"submissionTime": "2015-02-03T16:43:07.191GMT",
	"firstTaskLaunchedTime": "2015-02-03T16:43:07.191GMT",
	"completionTime": "2015-02-03T16:43:07.226GMT",
	"failureReason": null,
	"inputBytes": 160,
	"inputRecords": 0,
	"outputBytes": 0,
	"outputRecords": 0,
	"shuffleReadBytes": 0,
	"shuffleReadRecords": 0,
	"shuffleWriteBytes": 0,
	"shuffleWriteRecords": 0,
	"memoryBytesSpilled": 0,
	"diskBytesSpilled": 0,
	"name": "count at <console>:17",
	"description": null,
	"details": "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$line19.$read$$iwC$$iwC$$iwC.<init>(<console>:22)\n$line19.$read$$iwC$$iwC.<init>(<console>:24)\n$line19.$read$$iwC.<init>(<console>:26)\n$line19.$read.<init>(<console>:28)\n$line19.$read$.<init>(<console>:32)\n$line19.$read$.<clinit>(<console>)\n$line19.$eval$.<init>(<console>:7)\n$line19.$eval$.<clinit>(<console>)\n$line19.$eval.$print(<console>)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\norg.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)\norg.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)\norg.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)\norg.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)",
	"schedulingPool": "default",
	"rddIds": [6, 5],
	"accumulatorUpdates": [],
	"tasks": null,
	"executorSummary": null,
	"killedTasksSummary": {}
}, ... ]

We observe that the "failureReason": null, is listed, while it should have been omitted.

@HyukjinKwon
Copy link
Member

@Fokko, btw, I believe #21596 (comment) is a valid comment. I think it would be nicer if we execute this benchmark and update the results in the file, and push it into here.

@HyukjinKwon
Copy link
Member

BTW, I believe it's okay to use Jenkins resource if that's a faster way to resolve this issue in any event.

@Fokko
Copy link
Contributor Author

Fokko commented Sep 20, 2018

@HyukjinKwon Any idea if this has any change of getting merged? The 2.4 branch has been cut a while ago.

The number of merge conflicts are minimal in all the times I've rebased onto master. I think this should be done some time to bump the Jackson version, since old version of Jackson aren't compatible with the newer ones. Please let me know.

@SparkQA
Copy link

SparkQA commented Sep 20, 2018

Test build #96330 has finished for PR 21596 at commit 44b8d1b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

I think we're (at least i am) not clear on 2.5 vs 3.0 now (correct me if im mistaken). The change itself looks okay but I think it's targeted to 3.0.0 as discussed. I can take an action here when I'm clear on this.

@SparkQA
Copy link

SparkQA commented Sep 20, 2018

Test build #96335 has finished for PR 21596 at commit 0b0b8fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96377 has finished for PR 21596 at commit f4d294c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96413 has finished for PR 21596 at commit 2bab06f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Fokko Driesprong added 3 commits October 3, 2018 23:12
So we keep the results in the git commit history and we can do an
easy comparison
Jackson is incompatible with upstream versions, therefore bump
the Jackson version to a more recent one.

Behaviour has changed since 2.7.x that None is not considered null
anymore: FasterXML/jackson-module-scala#325

This will only include value that indicates that only properties with
null value, or what is considered empty, are not to be included.
https://fasterxml.github.io/jackson-annotations/javadoc/2.7/com/fasterxml/jackson/annotation/JsonInclude.Include.html

Change the flags to NON_ABSENT
Since:
scala> None == null
res0: Boolean = false

We need to implement the isEmpty function to verify if the Option[_]
is Empty or not.
@Fokko
Copy link
Contributor Author

Fokko commented Oct 4, 2018

Rebased onto master. @HyukjinKwon can we target this for 2.5?

@SparkQA
Copy link

SparkQA commented Oct 4, 2018

Test build #96926 has finished for PR 21596 at commit b53c8f2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Oct 4, 2018

Test build #96928 has finished for PR 21596 at commit b53c8f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in ab1650d Oct 5, 2018
asfgit pushed a commit that referenced this pull request Oct 26, 2018
## What changes were proposed in this pull request?

In #21596, Jackson is upgraded to 2.9.6.
There are some deprecated API warnings in SQLListener.
Create a trivial PR to fix them.

```
[warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
[warn]
[warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType))
[warn]
[warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
[warn]
[warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType))
```

## How was this patch tested?

Existing unit tests.

Closes #22848 from gengliangwang/fixSQLListenerWarning.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
@Fokko Fokko deleted the fd-bump-jackson branch October 28, 2018 07:10
@gatorsmile
Copy link
Member

It sounds like Jackson have an important behavior change that might impact us.

import com.fasterxml.jackson.annotation.JsonInclude.Include
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

val mapper = new ObjectMapper with ScalaObjectMapper
mapper.setSerializationInclusion(Include.NON_NULL)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
mapper.registerModule(DefaultScalaModule)

println(mapper.writeValueAsString(Map("abc" -> null)))

Previously, it outputs {"abc":null}, but now it is {}, unless we explicitly set it to Include.ALWAYS. The upgrade looks risky to me.

@Fokko
Copy link
Contributor Author

Fokko commented Nov 5, 2018

@gatorsmile Are you completely sure about that? The Spark REST API null values are always omitted. Where is this behaviour different?

mapper.setSerializationInclusion(Include.NON_ABSENT):

scala> println(mapper.writeValueAsString(Map("abc" -> null)))
{}

mapper.setSerializationInclusion(Include.ALWAYS):

scala> println(mapper.writeValueAsString(Map("abc" -> null)))
{"abc":null}

If you're doing proper Scala and try to avoid the null's in general, with None in combination with mapper.setSerializationInclusion(Include.NON_ABSENT) you'll get:

scala> println(mapper.writeValueAsString(Map("abc" -> None)))
{"abc":null}

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Hi all,

Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. I bumped into some issues with Azure CosmosDB that is using a more recent version of Jackson. This can be fixed by adding exclusions and then it works without any issues. So no breaking changes in the API's.

I would also consider bumping the version of Jackson in Spark. I would suggest to keep up to date with the dependencies, since in the future this issue will pop up more frequently.

## What changes were proposed in this pull request?

Bump Jackson to 2.9.6

## How was this patch tested?

Compiled and tested it locally to see if anything broke.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#21596 from Fokko/fd-bump-jackson.

Authored-by: Fokko Driesprong <fokkodriesprong@godatadriven.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

In apache#21596, Jackson is upgraded to 2.9.6.
There are some deprecated API warnings in SQLListener.
Create a trivial PR to fix them.

```
[warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
[warn]
[warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(objectType, objectType))
[warn]
[warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
[warn]
[warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory is deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], Array(longType, longType))
```

## How was this patch tested?

Existing unit tests.

Closes apache#22848 from gengliangwang/fixSQLListenerWarning.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.