Commit: docs: Update cloud deployment doc (#5119)
* Update cloud deployment doc
* Update the messaging
* Update messaging

Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
Sherlock113 authored Dec 11, 2024
1 parent ca0a370 commit 4105899
Showing 4 changed files with 47 additions and 92 deletions.
2 changes: 1 addition & 1 deletion README.md
```bash
bentoml cloud login
bentoml deploy .
```

![bentocloud-ui](./docs/source/_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png)

</details>

111 changes: 46 additions & 65 deletions docs/source/get-started/cloud-deployment.rst
Cloud deployment
================

BentoCloud is an Inference Management Platform and Compute Orchestration Engine built on top of BentoML's open-source serving framework. It provides a complete stack for building fast and scalable AI systems with any model, on any cloud.

Why developers love BentoCloud:

- **Flexible Pythonic APIs** for building inference APIs, batch jobs, and compound AI systems
- **Blazing fast cold start** with a container infrastructure stack rebuilt for ML/AI workloads
- Support for **any ML frameworks and inference runtimes** (vLLM, TensorRT, Triton, etc.)
- **Streamlined workflows** across development, testing, deployment, monitoring, and CI/CD

Log in to BentoCloud
--------------------

1. Visit the `BentoML website <https://www.bentoml.com/>`_ to sign up.
2. Install BentoML.

   .. code-block:: bash

      pip install bentoml

3. Log in to BentoCloud with the ``bentoml cloud login`` command. Follow the on-screen instructions to :ref:`create a new API token <creating-an-api-token>`.

   .. code-block:: bash

      bentoml cloud login
Deploy your first model
-----------------------

Perform the following steps to quickly deploy the :doc:`hello-world` example to BentoCloud.

1. Make sure you have already cloned the `project repository <https://github.com/bentoml/quickstart>`_.
2. In the root directory of this project, run ``bentoml deploy``. Optionally, use the ``-n`` flag to set a name.

   .. code-block:: bash

      bentoml deploy . -n my-first-bento

3. On the BentoCloud console, navigate to the **Deployments** page, and click your Deployment. Once it's up and running, you can interact with it using the **Form** section on the **Playground** tab.

   .. image:: ../_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png
      :alt: A summarization model running on BentoCloud

Call the Deployment endpoint
----------------------------

1. Retrieve the Deployment URL via CLI. Replace ``my-first-bento`` if you use another name.

   .. code-block:: bash

      bentoml deployment get my-first-bento -o json | jq ."endpoint_urls"

   .. note::

      Ensure ``jq`` is installed for processing JSON output.
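If ``jq`` is not available, the same field can be pulled out with Python's standard library. This is a minimal sketch: the sample payload below is illustrative, and only the ``endpoint_urls`` key (the field the ``jq`` filter selects) is assumed to exist in the real ``bentoml deployment get -o json`` output.

```python
import json

def endpoint_urls(payload: str) -> list[str]:
    """Extract the endpoint_urls field, mirroring `jq ."endpoint_urls"`."""
    return json.loads(payload).get("endpoint_urls", [])

# Illustrative payload; the real command output contains more fields,
# but only endpoint_urls is used here.
sample = '{"name": "my-first-bento", "endpoint_urls": ["https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai"]}'
print(endpoint_urls(sample))
```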

2. Create :doc:`a BentoML client </build-with-bentoml/clients>` to call the exposed endpoint. Replace the example URL with your Deployment's URL:

   .. code-block:: python

      import bentoml

      client = bentoml.SyncHTTPClient("https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai")
      result: str = client.summarize(
          text="Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.",
      )
      print(result)
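``SyncHTTPClient`` maps each service method to an HTTP route, so the same call can be made with any HTTP tool. The sketch below builds (but does not send) an equivalent request with the standard library; the ``/summarize`` route and the JSON body shape are assumptions based on BentoML's default convention of exposing a method ``summarize(text=...)`` at ``POST /summarize``.

```python
import json
import urllib.request

def build_summarize_request(base_url: str, text: str) -> urllib.request.Request:
    """Build the POST request a BentoML HTTP client would send.
    Assumes default route naming: method `summarize` -> POST /summarize."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/summarize",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_summarize_request(
    "https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai", "Breaking News: ..."
)
print(req.full_url)
```

Sending ``req`` with ``urllib.request.urlopen`` should then return the summary in the response body.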
Configure scaling
-----------------

The replica count defaults to ``1``. You can update the minimum and maximum replicas allowed for scaling:

.. code-block:: bash

   bentoml deployment update my-first-bento --scaling-min 0 --scaling-max 3
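The same bounds can also be kept in a version-controlled deployment configuration file. This is a sketch under assumptions: the file name is hypothetical, and the ``scaling.min_replicas``/``scaling.max_replicas`` keys are inferred to mirror the ``--scaling-min``/``--scaling-max`` flags above, so check the BentoCloud configuration reference for the exact schema.

```yaml
# config-file.yaml (hypothetical name); assumed schema mirroring
# the --scaling-min / --scaling-max CLI flags
name: my-first-bento
scaling:
  min_replicas: 0   # scale to zero when idle
  max_replicas: 3
```

It would then be applied with something like ``bentoml deployment update -f config-file.yaml``.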
Cleanup
-------

To terminate this Deployment, click **Stop** in the top right corner of its details page or simply run:

.. code-block:: bash

   bentoml deployment terminate my-first-bento
More resources
--------------

If you are a first-time user of BentoCloud, we recommend you read the following documents to get started:

- Deploy :doc:`example projects </examples/overview>` to BentoCloud
- :doc:`/scale-with-bentocloud/deployment/manage-deployments`
26 changes: 0 additions & 26 deletions docs/source/get-started/deployment.rst

This file was deleted.
