[Documentation] Take your application to production #349
Conversation
@@ -0,0 +1,85 @@
Taking your .NET for Apache Spark Application to Production
comments from @suhsteve "This title somehow makes me feel like after reading this I will know how to make my spark application production ready. Is this the purpose of this document ? Or just to outline different spark-submit scenarios."
I believe the purpose of this doc should tell users how to move their application to production. I am thinking of putting different scenarios along with the instruction on how to move it to production based on these different scenarios. Any suggestions to make this doc more precise and explicit would be really appreciate.
docs/take-to-prod.md
#### 2. SparkSession code references a function from a NuGet package that has been installed in the csproj
This would be the use case when `SparkSession` code references a function from a NuGet package installed in the same project (e.g. mySparkApp.csproj).
#### 3. SparkSession code references a function from a DLL on the user's machine
This would be the use case when `SparkSession` code references business logic (UDFs) in a DLL on the user's machine (e.g. `SparkSession` code in mySparkApp.csproj and businessLogic.dll built on a different machine).
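As a hedged sketch of scenario 3, the calling code might look like the following; the `BusinessLogic` namespace and `Transformations.Clean` method are hypothetical stand-ins for whatever businessLogic.dll actually exposes:

```csharp
// Program.cs in mySparkApp.csproj -- references businessLogic.dll, an
// assembly compiled elsewhere. All names below are illustrative.
using BusinessLogic;                      // namespace exposed by businessLogic.dll
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json(args[0]);

            // The UDF body lives in businessLogic.dll, so the workers must be
            // able to locate and load that assembly at runtime.
            var clean = Udf<string, string>(Transformations.Clean);
            df.Select(clean(df["value"])).Show();
            spark.Stop();
        }
    }
}
```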
comments from @suhsteve :"Why does businessLogic.dll be from a different machine ?"
This would be one of the scenarios, wouldn't it?
It may make more sense if this scenario description was something like loading assemblies dynamically at runtime.
Got it, but I am wondering how we can structure the description of this scenario to be more understandable to users?
#### 2. SparkSession code and business logic in the same project, but different .cs files
This would be the use case when you have `SparkSession` code and business logic (UDFs) in different .cs files within the same project (e.g. `SparkSession` in Program.cs, business logic in BusinessLogic.cs, and both in mySparkApp.csproj).
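For illustration, a minimal sketch of this layout (the class, method, and column names are hypothetical, not from this doc):

```csharp
// BusinessLogic.cs -- business logic (UDF bodies) in its own file,
// but within the same mySparkApp.csproj.
namespace MySparkApp
{
    public static class BusinessLogic
    {
        public static string AddSuffix(string s) => s + "_processed";
    }
}
```

```csharp
// Program.cs -- SparkSession code in a separate file, same project.
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json(args[0]);

            // The UDF wraps a method defined in BusinessLogic.cs.
            var addSuffix = Udf<string, string>(BusinessLogic.AddSuffix);
            df.Select(addSuffix(df["name"])).Show();
            spark.Stop();
        }
    }
}
```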
### Package your application
comments from @suhsteve "I don't know if this section is too useful."
I think it will be very helpful if we can put more detailed instruction here. Any suggestions?
## How to deploy your application when you have a single dependency
### Scenarios
#### 1. SparkSession code and business logic in the same Program.cs file
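For illustration, a minimal sketch of this scenario (the app name, input path, and column name are hypothetical, not from this doc):

```csharp
// Program.cs -- SparkSession code and business logic (a UDF) in one file.
// Hypothetical example: "mySparkApp" and the "name" column are illustrative.
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json(args[0]);

            // Business logic defined inline, right next to the SparkSession code.
            var addSuffix = Udf<string, string>(s => s + "_processed");
            df.Select(addSuffix(df["name"])).Show();

            spark.Stop();
        }
    }
}
```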
@suhsteve I will keep the number here to make it clearer.
# Table of Contents
This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
Comments from @bamurtaugh: "What does it mean to take an app to production? Perhaps add a couple words/sentence defining that (does it just mean running on-prem? Deploying to cloud? Building and running `spark-submit`? CI/CD?)"
Great point! @rapoth Could you please help with elaborating this a little more?
- [Package your application](#package-your-application-1)
- [Launch your application](#launch-your-application-1)
## How to deploy your application when you have a single dependency
Comments from @bamurtaugh: "What does single dependency mean? I think it could help users to include a short explanation here or at the top of the document of what a dependency means in the .NET for Spark context."
Actually, I am not so sure we should use single dependency and multiple dependencies to define and separate these scenarios. @rapoth and @imback82, any suggestions? Thanks.
```
# Completed for readability: the first three lines (spark-submit, --class,
# --master) are not in the excerpt and are assumed per dotnet/spark conventions.
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master yarn \
  --deploy-mode cluster \
  --files <some dir>\<dotnet version>\mySparkApp.dll \
  <some dir>\<dotnet version>\microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
  dotnet <some dir>\<dotnet version>\mySparkApp.dll <app arg 1> <app arg 2> ... <app arg n>
```
`dotnet` is used in this example, but won't this fail in user scenarios where `dotnet` may not be available on their cluster?
Yes, I agree, but I have put the pre-requisites in this example.
Or we can also put an example like `mySparkApp args` or `dotnet mySparkApp.dll args`, which gives both options depending on the cluster environment.
"archives": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogics.zip#udfs”, "adl://<cluster name>.azuredatalakestore.net/<some dir>/myLibraries.zip”], | ||
"args": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.zip","mySparkApp","<app arg 1>","<app arg 2>,"...","<app arg n>"] | ||
} | ||
``` |
You should say somewhere that scenarios 1 and 2 can also be submitted using these submission examples.
Do you mean scenarios 1 and 2 in the single-dependency section?
They can submit using the example here, but they would have to zip the single dll first; wouldn't that be more work for the user in such a case? How about we specify the other usage example, `mySparkApp args`, as I mentioned earlier?
If users are manually submitting then yes, but in "production" use, users would most likely automate the packaging and submission. It would be good to know that they don't have to take different steps if they have single vs multiple dependencies.
If we would like to unify these two sections, should we remove the command example in the first section?
I just feel like we need to add more context to this whole instruction in general; any suggestions would be really appreciated.
Currently, customers are asking how they can run .NET for Apache Spark applications in different scenarios. This PR gathers the most commonly asked scenarios and provides general instructions on how customers can package their applications and submit jobs in each of them.
I have closed the previous PR #345 (because I couldn't edit it there) and moved all the previous comments here so that we can discuss them in one place. Thanks for your understanding, and sorry for any inconvenience.