Commit: docs: Update cloud deployment doc (#5119)
* Update cloud deployment doc
* Update the messaging
* Update messaging

Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
Sherlock113 authored Dec 11, 2024
1 parent ca0a370 commit 4105899
Showing 4 changed files with 47 additions and 92 deletions.
2 changes: 1 addition & 1 deletion README.md
```bash
bentoml cloud login
bentoml deploy .
```

![bentocloud-ui](./docs/source/_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png)

</details>

111 changes: 46 additions & 65 deletions docs/source/get-started/cloud-deployment.rst
Cloud deployment
================

BentoCloud is an Inference Management Platform and Compute Orchestration Engine built on top of BentoML's open-source serving framework. It provides a complete stack for building fast and scalable AI systems with any model, on any cloud.

Why developers love BentoCloud:

- **Flexible Pythonic APIs** for building inference APIs, batch jobs, and compound AI systems
- **Blazing fast cold start** with a container infrastructure stack rebuilt for ML/AI workloads
- Support for **any ML frameworks and inference runtimes** (vLLM, TensorRT, Triton, etc.)
- **Streamlined workflows** across development, testing, deployment, monitoring, and CI/CD

Log in to BentoCloud
--------------------

1. Visit the `BentoML website <https://www.bentoml.com/>`_ to sign up.
2. Install BentoML.

   .. code-block:: bash

      pip install bentoml

3. Log in to BentoCloud with the ``bentoml cloud login`` command. Follow the on-screen instructions to :ref:`create a new API token <creating-an-api-token>`.

   .. code-block:: bash

      bentoml cloud login
Deploy your first model
-----------------------

Perform the following steps to quickly deploy the :doc:`hello-world` example to BentoCloud.

1. Make sure you have already cloned the `project repository <https://github.com/bentoml/quickstart>`_.
2. In the root directory of this project, run ``bentoml deploy``. Optionally, use the ``-n`` flag to set a name.

   .. code-block:: bash

      bentoml deploy . -n my-first-bento

3. On the BentoCloud console, navigate to the **Deployments** page, and click your Deployment. Once it's up and running, you can interact with it using the **Form** section on the **Playground** tab.

   .. image:: ../_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png
      :alt: A summarization model running on BentoCloud

Call the Deployment endpoint
----------------------------

1. Retrieve the Deployment URL via CLI. Replace ``my-first-bento`` if you use another name.

   .. code-block:: bash

      bentoml deployment get my-first-bento -o json | jq ."endpoint_urls"

   .. note::

      Ensure ``jq`` is installed for processing JSON output.
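If ``jq`` is not available, the same field can be pulled out with Python's standard library. This is a minimal sketch: the sample payload below is illustrative, and only the ``endpoint_urls`` key (the field the ``jq`` filter selects) is assumed to exist in the real ``bentoml deployment get -o json`` output.

```python
import json

def endpoint_urls(payload: str) -> list[str]:
    """Extract the endpoint_urls field, mirroring `jq ."endpoint_urls"`."""
    return json.loads(payload).get("endpoint_urls", [])

# Illustrative payload; the real command output contains more fields,
# but only endpoint_urls is used here.
sample = '{"name": "my-first-bento", "endpoint_urls": ["https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai"]}'
print(endpoint_urls(sample))
```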

2. Create :doc:`a BentoML client </build-with-bentoml/clients>` to call the exposed endpoint. Replace the example URL with your Deployment's URL:

   .. code-block:: python

      import bentoml

      client = bentoml.SyncHTTPClient("https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai")
      result: str = client.summarize(
          text="Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.",
      )
      print(result)
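``SyncHTTPClient`` maps each service method to an HTTP route, so the same call can be made with any HTTP tool. The sketch below builds (but does not send) an equivalent request with the standard library; the ``/summarize`` route and the JSON body shape are assumptions based on BentoML's default convention of exposing a method ``summarize(text=...)`` at ``POST /summarize``.

```python
import json
import urllib.request

def build_summarize_request(base_url: str, text: str) -> urllib.request.Request:
    """Build the POST request a BentoML HTTP client would send.
    Assumes default route naming: method `summarize` -> POST /summarize."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/summarize",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_summarize_request(
    "https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai", "Breaking News: ..."
)
print(req.full_url)
```

Sending ``req`` with ``urllib.request.urlopen`` should then return the summary in the response body.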
Configure scaling
-----------------

The replica count defaults to ``1``. You can update the minimum and maximum replicas allowed for scaling:

.. code-block:: bash

   bentoml deployment update my-first-bento --scaling-min 0 --scaling-max 3
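The same bounds can also be kept in a version-controlled deployment configuration file. This is a sketch under assumptions: the file name is hypothetical, and the ``scaling.min_replicas``/``scaling.max_replicas`` keys are inferred to mirror the ``--scaling-min``/``--scaling-max`` flags above, so check the BentoCloud configuration reference for the exact schema.

```yaml
# config-file.yaml (hypothetical name); assumed schema mirroring
# the --scaling-min / --scaling-max CLI flags
name: my-first-bento
scaling:
  min_replicas: 0   # scale to zero when idle
  max_replicas: 3
```

It would then be applied with something like ``bentoml deployment update -f config-file.yaml``.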
Cleanup
-------

To terminate this Deployment, click **Stop** in the top right corner of its details page or simply run:

.. code-block:: bash

   bentoml deployment terminate my-first-bento
More resources
--------------

If you are a first-time user of BentoCloud, we recommend you read the following documents to get started:

- Deploy :doc:`example projects </examples/overview>` to BentoCloud
- :doc:`/scale-with-bentocloud/deployment/manage-deployments`
26 changes: 0 additions & 26 deletions docs/source/get-started/deployment.rst

This file was deleted.
