Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Launch dotnet backend jvm bridge process ondemand #562

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

laneser
Copy link
Contributor

@laneser laneser commented Jun 22, 2020

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

ref #539

The SparkSession builder pattern could detect the backend jvm bridge is running or not, then launch the jvm bridge process on-demand.

In old way, spark-submit launch jvm bridge, the jvm bridge launch dotnet process.
But that is inconvenient to debug.

The SparkSession builder pattern doesn't need jvm bridge launched first.

To launch a default SparkSession

var spark = SparkSession.Builder().GetOrCreate();

And dotnet run could do the spark job.

To launch a local SparkSession

var spark = SparkSession.Builder().Master("local").GetOrCreate();

No need to setup those config at spark-submit.

But still support the old method, because we could detect if the jvm bridge is started or not.

@imback82
Copy link
Contributor

Thanks @laneser for the PR. However, I am not fully convinced with this approach:

  • This requires another set of documentation: e.g., it will not work if SPARK_HOME is not set.
  • There are objects you can create other than SparkSession which invokes getting the JvmBridge, e.g., SparkConf.

Is there anything we can to update the current doc to make the experience better instead of introducing another option?

I am open for further discussion to make the debugging experience seamless. cc: @suhsteve

@laneser
Copy link
Contributor Author

laneser commented Jun 22, 2020

Thanks for your information!
I will try to setup the dotnet spark jvm bridge through the spark.driver.extraClassPath and let buildSparkSubmitCommand do the same things for dotnet spark.

@laneser laneser closed this Jun 22, 2020
@laneser laneser reopened this Jun 22, 2020
@laneser
Copy link
Contributor Author

laneser commented Jun 22, 2020

I think the JVMBridgeHelper works like

pyspark launch gateway method

https://github.com/apache/spark/blob/feeca63198466640ac461a2a34922493fa6162a8/python/pyspark/context.py#L319

https://github.com/apache/spark/blob/95aec091e4d8a45e648ce84d32d912f585eeb151/python/pyspark/java_gateway.py#L40

The SparkConf could not work if jvm bridge does not exists.

I will try to solve the E2E test fail, hope I could figure out why the JVMBridgeHelper does not work with SparkFixture in the unit test.

Also I agreed that the helper should try to find spark home if SPARK_HOME environement is not defined.

Hoping the dotnet spark listed in the https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession , We could have same experience just like other spark tools.

@laneser
Copy link
Contributor Author

laneser commented Jun 23, 2020

Though the checks have passed, but I have no idea why it was not working before... 😿

Base automatically changed from master to main March 18, 2021 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants