
[Documentation] Take your application to production #349

Open

wants to merge 5 commits into main
Conversation

@elvaliuliuliu (Contributor) commented Nov 23, 2019

Currently, customers ask how they can run a Spark .NET application in different scenarios. This PR gathers the most commonly asked scenarios and provides general instructions on how customers can package their applications and submit jobs in each of them.

I have closed the previous PR #345 (because I could not edit it there) and moved all the previous comments here so that we can continue the discussion. Thanks for your understanding, and sorry for any inconvenience.

@@ -0,0 +1,85 @@
Taking your .NET for Apache Spark Application to Production
Contributor Author
Comment from @suhsteve: "This title somehow makes me feel like after reading this I will know how to make my Spark application production-ready. Is this the purpose of this document? Or just to outline different spark-submit scenarios."

I believe the purpose of this doc is to tell users how to move their application to production. I am thinking of laying out the different scenarios along with instructions on how to move to production in each of them. Any suggestions to make this doc more precise and explicit would be really appreciated.

#### 2. SparkSession code references a function from a NuGet package that has been installed in the csproj
This would be the use case when `SparkSession` code references a function from a NuGet package in the same project (e.g. mySparkApp.csproj).
#### 3. SparkSession code references a function from a DLL on the user's machine
This would be the use case when `SparkSession` code references business logic (UDFs) on the user's machine (e.g. `SparkSession` code in mySparkApp.csproj and businessLogic.dll on a different machine).
Contributor Author
Comment from @suhsteve: "Why would businessLogic.dll be on a different machine?"

This would be one of the scenarios, wouldn't it?

@suhsteve (Member) commented Dec 9, 2019
It may make more sense if this scenario description was something like loading assemblies dynamically at runtime.

Contributor Author
Got it, but I am wondering how we can structure the description of this scenario to make it clearer to users.
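One way to make that description concrete would be a minimal sketch like the following, where businessLogic.dll is loaded at runtime via reflection. The assembly path and the `BusinessLogic.Udfs.Normalize` type/method names are hypothetical, and the DLL would still need to be shipped to the executors (e.g. with `--files`):

```csharp
using System;
using System.Reflection;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main(string[] args)
    {
        SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();

        // Hypothetical: business logic compiled separately from mySparkApp.csproj
        // and loaded dynamically at runtime.
        Assembly assembly = Assembly.LoadFrom("businessLogic.dll");
        Type udfs = assembly.GetType("BusinessLogic.Udfs");       // hypothetical type
        MethodInfo normalize = udfs.GetMethod("Normalize");       // static string Normalize(string)

        // Turn the reflected method into a typed delegate and register it as a UDF.
        var normalizeFunc =
            (Func<string, string>)normalize.CreateDelegate(typeof(Func<string, string>));
        Func<Column, Column> normalizeUdf = Udf<string, string>(normalizeFunc);

        DataFrame df = spark.Read().Json(args[0]);
        df.Select(normalizeUdf(df["name"])).Show();
        spark.Stop();
    }
}
```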

#### 2. SparkSession code and business logic in the same project, but different .cs files
This would be the use case when you have `SparkSession` code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs, and both in mySparkApp.csproj).
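As a minimal sketch of this layout (member names are hypothetical; the file split mirrors the example above, with the UDF body living in BusinessLogic.cs and Program.cs only wiring it up):

```csharp
// Program.cs -- SparkSession code that wires up the UDF from BusinessLogic.cs.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json(args[0]);

            // The UDF body itself lives in BusinessLogic.cs.
            Func<Column, Column> greet = Udf<string, string>(BusinessLogic.Greet);
            df.Select(greet(df["name"])).Show();
            spark.Stop();
        }
    }
}

// BusinessLogic.cs -- plain C# with no Spark dependency.
namespace MySparkApp
{
    public static class BusinessLogic
    {
        public static string Greet(string name) => $"Hello, {name}!";
    }
}
```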

### Package your application
Contributor Author
Comment from @suhsteve: "I don't know if this section is too useful."

I think it will be very helpful if we can put more detailed instructions here. Any suggestions?


## How to deploy your application when you have a single dependency
### Scenarios
#### 1. SparkSession code and business logic in the same Program.cs file
Contributor Author
@suhsteve I will keep the number here to make it clearer.
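For reference, scenario 1 could be illustrated with a single-file sketch like the one below (app name and input path are hypothetical), where the business logic is just an inline lambda next to the `SparkSession` code:

```csharp
// Program.cs -- SparkSession code and business logic in one file.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json(args[0]);

            // Business logic defined inline as a lambda UDF.
            Func<Column, Column> upper =
                Udf<string, string>(s => s == null ? null : s.ToUpper());
            df.Select(upper(df["name"])).Show();
            spark.Stop();
        }
    }
}
```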


# Table of Contents
This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
Contributor Author
Comment from @bamurtaugh: "What does it mean to take an app to production? Perhaps add a couple words/sentence defining that (does it just mean running on-prem? Deploying to cloud? Building and running spark-submit? CI/CD?)"

Great point! @rapoth, could you please help elaborate on this a little more?

- [Package your application](#package-your-application-1)
- [Launch your application](#launch-your-application-1)

## How to deploy your application when you have a single dependency
Contributor Author
Comment from @bamurtaugh: "What does single dependency mean? I think it could help users to include a short explanation here or at the top of the document of what a dependency means in the .NET for Spark context."

Actually, I am not so sure we should use single dependency and multiple dependencies to define and separate these scenarios. @rapoth and @imback82, any suggestions? Thanks.

```
--deploy-mode cluster \
--files <some dir>\<dotnet version>\mySparkApp.dll \
<some dir>\<dotnet version>\microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
dotnet <some dir>\<dotnet version>\mySparkApp.dll <app arg 1> <app arg 2> ... <app arg n>
```
Member
`dotnet` is used in this example, but won't this fail in user scenarios where dotnet may not be available on their cluster?

Contributor Author
Yes, I agree, but I have included the prerequisites in this example.

Or we could also show both invocations, `mySparkApp args` and `dotnet mySparkApp.dll args`, giving users both options depending on their cluster environment.

"archives": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogics.zip#udfs”, "adl://<cluster name>.azuredatalakestore.net/<some dir>/myLibraries.zip”],
"args": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.zip","mySparkApp","<app arg 1>","<app arg 2>,"...","<app arg n>"]
}
```
Member
You should say somewhere that scenarios 1 and 2 can also be submitted using these submission examples.

Contributor Author
Do you mean scenarios 1 and 2 in the single-dependency section?
They can submit using the example here, but they would have to zip the single DLL first; wouldn't that be more work for the user in that case? How about we also show the `mySparkApp args` usage I mentioned earlier?

Member
If users are submitting manually, then yes, but in "production" use they would most likely automate the packaging and submission. It would be good for them to know that they don't have to take different steps for single vs. multiple dependencies.

Contributor Author
If we would like to unify these two sections, should we remove the command example in the first section?

I just feel like we need to add more context to this whole instruction in general; any suggestions would be really appreciated.

Base automatically changed from master to main March 18, 2021 16:48