docs: Update cloud deployment doc #5119

Merged · 3 commits · Dec 11, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -133,7 +133,7 @@ bentoml cloud login
bentoml deploy .
```

-![bentocloud-ui](./docs/source/_static/img/bentocloud/get-started/bentocloud-playground-quickstart.png)
+![bentocloud-ui](./docs/source/_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png)

</details>

111 changes: 46 additions & 65 deletions docs/source/get-started/cloud-deployment.rst
@@ -2,22 +2,26 @@
Cloud deployment
================

-BentoCloud offers serverless infrastructure tailored for AI inference, allowing you to efficiently deploy, manage, and scale any models in the cloud. It operates in conjunction with BentoML to facilitate the easy creation and deployment of high-performance AI API services with custom code. As the original creators of BentoML and its ecosystem tools like OpenLLM, we seek to improve cost efficiency of your inference workload with our serverless infrastructure optimized for GPUs and fast autoscaling.
+BentoCloud is an Inference Management Platform and Compute Orchestration Engine built on top of BentoML's open-source serving framework. It provides a complete stack for building fast and scalable AI systems with any model, on any cloud.

-Specifically, BentoCloud features:
+Why developers love BentoCloud:

-- Optimized infrastructure for deploying any model, including the latest large language models (LLMs), Stable Diffusion models, and user-customized models built with various ML frameworks.
-- Autoscaling with scale-to-zero support so you only pay for what you use.
-- Flexible APIs for continuous integration and deployments (CI/CD).
-- Built-in observability tools for monitoring model performance and troubleshooting.
+- **Flexible Pythonic APIs** for building inference APIs, batch jobs, and compound AI systems
+- **Blazing fast cold start** with a container infrastructure stack rebuilt for ML/AI workloads
+- Support for **any ML frameworks and inference runtimes** (vLLM, TensorRT, Triton, etc.)
+- **Streamlined workflows** across development, testing, deployment, monitoring, and CI/CD

Log in to BentoCloud
--------------------

1. Visit the `BentoML website <https://www.bentoml.com/>`_ to sign up.
-2. After your BentoCloud account is approved, install BentoML by running ``pip install bentoml``.
-3. Log in to BentoCloud with the ``bentoml cloud login`` command. Follow the on-screen instructions to create a new API token.
+2. Install BentoML.
+
+   .. code-block:: bash
+
+      pip install bentoml
+
+3. Log in to BentoCloud with the ``bentoml cloud login`` command. Follow the on-screen instructions to :ref:`create a new API token <creating-an-api-token>`.

.. code-block:: bash

@@ -30,90 +34,67 @@ Log in to BentoCloud
Deploy your first model
-----------------------

-Perform the following steps to quickly deploy an example application on BentoCloud. It is a summarization service powered by a Transformer model `sshleifer/distilbart-cnn-12-6 <https://huggingface.co/sshleifer/distilbart-cnn-12-6>`_.
+Perform the following steps to quickly deploy the :doc:`hello-world` example to BentoCloud.

-1. Install the dependencies.
-
-   .. code-block:: bash
-
-      pip install bentoml torch transformers
-
-2. Create a BentoML Service in a ``service.py`` file as below. The pre-trained model is pulled from Hugging Face.
-
-   .. code-block:: python
-
-      from __future__ import annotations
-      import bentoml
-      from transformers import pipeline
-
-      EXAMPLE_INPUT = "Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century."
-
-      @bentoml.service(
-          resources={"cpu": "2"},
-          traffic={"timeout": 10},
-      )
-      class Summarization:
-          def __init__(self) -> None:
-              self.pipeline = pipeline('summarization')
-
-          @bentoml.api
-          def summarize(self, text: str = EXAMPLE_INPUT) -> str:
-              result = self.pipeline(text)
-              return result[0]['summary_text']
-
-   .. note::
-
-      You can test this Service locally by running ``bentoml serve service:Summarization``.
-
-3. Create a ``bentofile.yaml`` file as below.
-
-   .. code-block:: yaml
-
-      service: 'service:Summarization'
-      labels:
-        owner: bentoml-team
-        project: gallery
-      include:
-        - '*.py'
-      python:
-        packages:
-          - torch
-          - transformers
-
-4. Deploy the application to BentoCloud. The deployment status is displayed both in your terminal and the BentoCloud console.
-
-   .. code-block:: bash
-
-      bentoml deploy .
-
-5. On the BentoCloud console, navigate to the **Deployments** page, and click your Deployment. On its details page, you can see the sample input and summarize it with the application once it is up and running.
-
-   .. image:: ../_static/img/bentocloud/get-started/bentocloud-playground-quickstart.png
-
-   Interact with it using the Form, Python client, or CURL command on the **Playground** tab. Here is an example of creating a Python client to interact with it. Replace the endpoint URL with your own.
+1. Make sure you have already cloned the `project repository <https://github.com/bentoml/quickstart>`_.
+2. In the root directory of this project, run ``bentoml deploy``. Optionally, use the ``-n`` flag to set a name.
+
+   .. code-block:: bash
+
+      bentoml deploy . -n my-first-bento
+
+3. On the BentoCloud console, navigate to the **Deployments** page, and click your Deployment. Once it's up and running, you can interact with it using the **Form** section on the **Playground** tab.
+
+   .. image:: ../_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png
+      :alt: A summarization model running on BentoCloud
+
+Call the Deployment endpoint
+----------------------------
+
+1. Retrieve the Deployment URL via CLI. Replace ``my-first-bento`` if you use another name.
+
+   .. code-block:: bash
+
+      bentoml deployment get my-first-bento -o json | jq ."endpoint_urls"
+
+   .. note::
+
+      Ensure ``jq`` is installed for processing JSON output.
+
+2. Create :doc:`a BentoML client </build-with-bentoml/clients>` to call the exposed endpoint. Replace the example URL with your Deployment's URL:

   .. code-block:: python

      import bentoml

-     client = bentoml.SyncHTTPClient("https://summarization-example--aws-ca-1.mt1.bentoml.ai")
+     client = bentoml.SyncHTTPClient("https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai")
      result: str = client.summarize(
          text="Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.",
      )
      print(result)
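If ``jq`` is not installed, the same ``endpoint_urls`` field can be pulled out with Python's standard ``json`` module. The payload below is a hand-written stand-in for the CLI's JSON output — the field values are illustrative, not real:

```python
import json

# Stand-in for the output of `bentoml deployment get <name> -o json`
# (values here are illustrative placeholders).
sample = (
    '{"name": "my-first-bento",'
    ' "endpoint_urls": ["https://my-first-bento-e3c1c7db.mt-guc1.bentoml.ai"]}'
)

deployment = json.loads(sample)
urls = deployment["endpoint_urls"]  # same field jq selects above
print(urls[0])
```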

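A freshly created Deployment can briefly refuse connections while it cold-starts. A small retry wrapper like the sketch below can smooth that over; it is illustrative only and not part of the BentoML API, and ``flaky_summarize`` is a stub standing in for a real ``client.summarize`` call:

```python
import time

def call_with_retry(fn, attempts=5, delay=0.1):
    """Call fn, retrying on ConnectionError up to `attempts` times."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc

# Stub for a real client call: fails twice while "warming up", then succeeds.
calls = {"n": 0}
def flaky_summarize():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("deployment not ready")
    return "Whiskers the cat made a record-breaking leap."

print(call_with_retry(flaky_summarize))
```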
-6. To terminate this Deployment, click **Stop** in the top right corner of its details page or simply run:
-
-   .. code-block:: bash
+Configure scaling
+-----------------
+
+The replica count defaults to ``1``. You can update the minimum and maximum replicas allowed for scaling:
+
+.. code-block:: bash
+
+   bentoml deployment update my-first-bento --scaling-min 0 --scaling-max 3
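Conceptually, autoscaling keeps the replica count between ``--scaling-min`` and ``--scaling-max``; with a minimum of ``0``, the Deployment scales to zero when idle. The toy function below illustrates that clamping — it is a simplification for intuition, not BentoCloud's actual autoscaler:

```python
def target_replicas(in_flight: int, per_replica: int,
                    scaling_min: int, scaling_max: int) -> int:
    """Toy autoscaler: one replica per `per_replica` in-flight requests,
    clamped to the [scaling_min, scaling_max] range."""
    desired = -(-in_flight // per_replica)  # ceiling division
    return max(scaling_min, min(scaling_max, desired))

print(target_replicas(0, 10, 0, 3))    # idle -> scales to zero
print(target_replicas(25, 10, 0, 3))   # ceil(25/10) = 3 replicas
print(target_replicas(100, 10, 0, 3))  # demand capped at scaling_max
```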

-      bentoml deployment terminate summarization
+Cleanup
+-------
+
+To terminate this Deployment, click **Stop** in the top right corner of its details page or simply run:
+
+.. code-block:: bash
+
+   bentoml deployment terminate my-first-bento

-Resources
----------
+More resources
+--------------

-If you are a first-time user of BentoCloud, we recommend you read the following documents to get familiar with BentoCloud:
+If you are a first-time user of BentoCloud, we recommend you read the following documents to get started:

- Deploy :doc:`example projects </examples/overview>` to BentoCloud
- :doc:`/scale-with-bentocloud/deployment/manage-deployments`
26 changes: 0 additions & 26 deletions docs/source/get-started/deployment.rst

This file was deleted.
