diff --git a/versioned_docs/version-2.0/concepts/evaluation/evaluation.mdx b/versioned_docs/version-2.0/concepts/evaluation/evaluation.mdx index 5896297a..ed3e164f 100644 --- a/versioned_docs/version-2.0/concepts/evaluation/evaluation.mdx +++ b/versioned_docs/version-2.0/concepts/evaluation/evaluation.mdx @@ -313,7 +313,7 @@ The outputs are just the output of that step, which is usually the LLM response. The evaluator for this is usually some binary score for whether the correct tool call was selected, as well as some heuristic for whether the input to the tool was correct. The reference tool can be simply specified as a string. -There are several benefits to this type of evaluation. It allows you to evaluate individual actions, which lets you hone in where your application may be failing. They are also relatively fast to run (because they only involve a single LLM call) and evaluation often uses simple heuristc evaluation of the selected tool relative to the reference tool. One downside is that they don't capture the full agent - only one particular step. Another downside is that dataset creation can be challenging, particular if you want to include past history in the agent input. It is pretty easy to generate a dataset for steps early on in an agent's trajectory (e.g., this may only include the input prompt), but it can be difficult to generate a dataset for steps later on in the trajectory (e.g., including numerous prior agent actions and responses). +There are several benefits to this type of evaluation. It allows you to evaluate individual actions, which lets you hone in on where your application may be failing. They are also relatively fast to run (because they only involve a single LLM call) and evaluation often uses simple heuristic evaluation of the selected tool relative to the reference tool. One downside is that they don't capture the full agent - only one particular step. Another downside is that dataset creation can be challenging, particularly if you want to include past history in the agent input. It is pretty easy to generate a dataset for steps early on in an agent's trajectory (e.g., this may only include the input prompt), but it can be difficult to generate a dataset for steps later on in the trajectory (e.g., including numerous prior agent actions and responses). :::tip diff --git a/versioned_docs/version-2.0/concepts/usage_and_billing/usage_limits.mdx b/versioned_docs/version-2.0/concepts/usage_and_billing/usage_limits.mdx index 6319b4cb..26768d11 100644 --- a/versioned_docs/version-2.0/concepts/usage_and_billing/usage_limits.mdx +++ b/versioned_docs/version-2.0/concepts/usage_and_billing/usage_limits.mdx @@ -7,7 +7,7 @@ This page assumes that you have already read our guide on [data retention](./dat ## How usage limits work LangSmith lets you configure usage limits on tracing. Note that these are _usage_ limits, not _spend_ limits, which -mean they let you limit the quantity of occurrances of some event rather than the total amount you will spend. +mean they let you limit the quantity of occurrences of some event rather than the total amount you will spend. 
LangSmith lets you set two different monthly limits, mirroring our Billable Metrics discussed in the aforementioned data retention guide: diff --git a/versioned_docs/version-2.0/how_to_guides/datasets/version_datasets.mdx b/versioned_docs/version-2.0/how_to_guides/datasets/version_datasets.mdx index c3609063..c706a614 100644 --- a/versioned_docs/version-2.0/how_to_guides/datasets/version_datasets.mdx +++ b/versioned_docs/version-2.0/how_to_guides/datasets/version_datasets.mdx @@ -34,7 +34,7 @@ You can also tag versions of your dataset using the SDK. Here's an example of ho ```python from langsmith import Client -fromt datetime import datetime +from datetime import datetime client = Client() diff --git a/versioned_docs/version-2.0/how_to_guides/evaluation/unit_testing.mdx b/versioned_docs/version-2.0/how_to_guides/evaluation/unit_testing.mdx index 0445763f..f843c95d 100644 --- a/versioned_docs/version-2.0/how_to_guides/evaluation/unit_testing.mdx +++ b/versioned_docs/version-2.0/how_to_guides/evaluation/unit_testing.mdx @@ -199,7 +199,7 @@ The following metrics are available off-the-shelf: | `embedding_distance` | Cosine distance between two embeddings | expect.embedding_distance(prediction=prediction, expectation=expectation) | | `edit_distance` | Edit distance between two strings | expect.edit_distance(prediction=prediction, expectation=expectation) | -You can also log any arbitrary feeback within a unit test manually using the `client`. +You can also log any arbitrary feedback within a unit test manually using the `client`. ```python from langsmith import unit, Client diff --git a/versioned_docs/version-2.0/how_to_guides/monitoring/webhooks.mdx b/versioned_docs/version-2.0/how_to_guides/monitoring/webhooks.mdx index 6f3e513b..355cf833 100644 --- a/versioned_docs/version-2.0/how_to_guides/monitoring/webhooks.mdx +++ b/versioned_docs/version-2.0/how_to_guides/monitoring/webhooks.mdx @@ -182,7 +182,7 @@ stub = Stub("auth-example", image=Image.debian_slim().pip_install("langsmith")) @web_endpoint(method="POST") # We set up a `secret` query parameter def f(data: dict, secret: str = Query(...)): - # You can import dependencies you don't have locally inside Modal funxtions + # You can import dependencies you don't have locally inside Modal functions from langsmith import Client # First, we validate the secret key we pass diff --git a/versioned_docs/version-2.0/how_to_guides/prompts/manage_prompts_programatically.mdx b/versioned_docs/version-2.0/how_to_guides/prompts/manage_prompts_programatically.mdx index 61cf3768..82c52fda 100644 --- a/versioned_docs/version-2.0/how_to_guides/prompts/manage_prompts_programatically.mdx +++ b/versioned_docs/version-2.0/how_to_guides/prompts/manage_prompts_programatically.mdx @@ -223,7 +223,7 @@ openai_response = oai_client.chat.completions.create(**openai_payload)`), ## List, delete, and like prompts -You can also list, delete, and like/unline prompts using the `list prompts`, `delete prompt`, `like prompt` and `unlike prompt` methods. +You can also list, delete, and like/unlike prompts using the `list prompts`, `delete prompt`, `like prompt` and `unlike prompt` methods. See the [LangSmith SDK client](https://github.com/langchain-ai/langsmith-sdk) for extensive documentation on these methods. Usage. There, you will able to see a graph of the daily nunber of billable LangSmith traces from the last 30, 60, or 90 days. Note that this data is delayed by 1-2 hours and so may trail your actual number of runs slightly for the current day. 
+Under the Settings section for your Organization you will see a subsection for Usage. There, you will be able to see a graph of the daily number of billable LangSmith traces from the last 30, 60, or 90 days. Note that this data is delayed by 1-2 hours and so may trail your actual number of runs slightly for the current day. ### I have a question about my bill... diff --git a/versioned_docs/version-2.0/self_hosting/configuration/external_postgres.mdx b/versioned_docs/version-2.0/self_hosting/configuration/external_postgres.mdx index db4ca952..ff31db05 100644 --- a/versioned_docs/version-2.0/self_hosting/configuration/external_postgres.mdx +++ b/versioned_docs/version-2.0/self_hosting/configuration/external_postgres.mdx @@ -14,7 +14,7 @@ However, you can configure LangSmith to use an external Postgres database (**str - A provisioned Postgres database that your LangSmith instance will have network access to. We recommend using a managed Postgres service like: - [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.PostgreSQL.html) - [Google Cloud SQL](https://cloud.google.com/curated-resources/cloud-sql#section-1) - - [Azure Database for PostgreSQ](https://azure.microsoft.com/en-us/products/postgresql#features) + - [Azure Database for PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql#features) - Note: We only officially support Postgres versions >= 14. - A user with admin access to the Postgres database. This user will be used to create the necessary tables, indexes, and schemas. - This user will also need to have the ability to create extensions in the database. We use/will try to install the btree_gin, btree_gist, pgcrypto, citext, and pg_trgm extensions. diff --git a/versioned_docs/version-2.0/self_hosting/release_notes.mdx b/versioned_docs/version-2.0/self_hosting/release_notes.mdx index 3b7a8436..333f7ef0 100644 --- a/versioned_docs/version-2.0/self_hosting/release_notes.mdx +++ b/versioned_docs/version-2.0/self_hosting/release_notes.mdx @@ -17,7 +17,7 @@ This release adds a number of new features, improves the performance of the Thre - [Resource tags to organize your Workspace in LangSmith](https://changelog.langchain.com/announcements/resource-tags-to-organize-your-workspace-in-langsmith) - [Generate synthetic examples to enhance a LangSmith dataset](https://changelog.langchain.com/announcements/generate-synthetic-examples-to-enhance-a-langsmith-dataset) -- [Enhanced trace comparison view and savable custom trace filters](https://changelog.langchain.com/announcements/trace-comparison-view-saving-custom-trace-filters) +- [Enhanced trace comparison view and saveable custom trace filters](https://changelog.langchain.com/announcements/trace-comparison-view-saving-custom-trace-filters) - [Defining, validating and updating dataset schemas](https://changelog.langchain.com/announcements/define-validate-and-update-dataset-schemas-in-langsmith) - [Multiple annotators can review a run in LangSmith](https://changelog.langchain.com/announcements/multiple-annotators-can-review-a-run-in-langsmith) - [Support for filtering runs within the trace view](https://changelog.langchain.com/announcements/filtering-runs-within-the-trace-view) diff --git a/versioned_docs/version-2.0/self_hosting/scripts/delete_traces.mdx b/versioned_docs/version-2.0/self_hosting/scripts/delete_traces.mdx index 2e06fe01..d387d08d 100644 --- a/versioned_docs/version-2.0/self_hosting/scripts/delete_traces.mdx +++ 
b/versioned_docs/version-2.0/self_hosting/scripts/delete_traces.mdx @@ -6,7 +6,7 @@ table_of_contents: true # Deleting Traces -The LangSmith UI does not currently support the deletion of an invidual trace. This, however, can be accomplished by directly removing the trace from all materialized views in ClickHouse (except the runs_history views) and the runs and feedback tables themselves. +The LangSmith UI does not currently support the deletion of an individual trace. This, however, can be accomplished by directly removing the trace from all materialized views in ClickHouse (except the runs_history views) and the runs and feedback tables themselves. This command can either be run using a trace ID as an argument or using a file that is a list of trace IDs. diff --git a/versioned_docs/version-2.0/tutorials/Administrators/manage_spend.mdx b/versioned_docs/version-2.0/tutorials/Administrators/manage_spend.mdx index 6f8ccdd7..88178a41 100644 --- a/versioned_docs/version-2.0/tutorials/Administrators/manage_spend.mdx +++ b/versioned_docs/version-2.0/tutorials/Administrators/manage_spend.mdx @@ -228,7 +228,7 @@ use this feature, please read more about its functionality [here](../../concepts ### Set dev/staging limits and view total spent limit across workspaces -Following the same logic for our dev and staging environments, whe set limits at 10% of the production +Following the same logic for our dev and staging environments, we set limits at 10% of the production limit on usage for each workspace. While this works with our usage pattern, setting good dev and staging limits may vary depending on diff --git a/versioned_docs/version-2.0/tutorials/Developers/agents.mdx b/versioned_docs/version-2.0/tutorials/Developers/agents.mdx index f53468fc..ef18faa3 100644 --- a/versioned_docs/version-2.0/tutorials/Developers/agents.mdx +++ b/versioned_docs/version-2.0/tutorials/Developers/agents.mdx @@ -160,7 +160,7 @@ class State(TypedDict): ### SQL Assistant -Use [prompt based roughtly on what is shown here](https://python.langchain.com/v0.2/docs/tutorials/sql_qa/#agents). +Use a [prompt based roughly on what is shown here](https://python.langchain.com/v0.2/docs/tutorials/sql_qa/#agents). ```python from langchain_core.runnables import Runnable, RunnableConfig @@ -178,8 +178,8 @@ class Assistant: # Invoke the tool-calling LLM result = self.runnable.invoke(state) # If it is a tool call -> response is valid - # If it has meaninful text -> response is valid - # Otherwise, we re-prompt it b/c response is not meaninful + # If it has meaningful text -> response is valid + # Otherwise, we re-prompt it b/c response is not meaningful if not result.tool_calls and ( not result.content or isinstance(result.content, list) @@ -478,7 +478,7 @@ def check_specific_tool_call(root_run: Run, example: Example) -> dict: """ Check if the first tool call in the response matches the expected tool call. 
""" - # Exepected tool call + # Expected tool call expected_tool_call = 'sql_db_list_tables' # Run diff --git a/versioned_docs/version-2.0/tutorials/Developers/evaluation.mdx b/versioned_docs/version-2.0/tutorials/Developers/evaluation.mdx index 8eaec72c..e6ac8b7c 100644 --- a/versioned_docs/version-2.0/tutorials/Developers/evaluation.mdx +++ b/versioned_docs/version-2.0/tutorials/Developers/evaluation.mdx @@ -19,7 +19,7 @@ At a high level, in this tutorial we will go over how to: - _Track results over time_ - _Set up automated testing to run in CI/CD_ -For more information on the evaluation workflows LangSmith supports, check out the [how-to guides](../../how_to_guides). +For more information on the evaluation workflows LangSmith supports, check out the [how-to guides](../../how_to_guides), or see the reference docs for [evaluate](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._runner.evaluate.html) and its asynchronous [aevaluate](https://langsmith-sdk.readthedocs.io/en/latest/evaluation/langsmith.evaluation._arunner.aevaluate.html) counterpart. Lots to cover, let's dive in! @@ -192,7 +192,7 @@ Now we're ready to run evaluation. Let's do it! ```python -from langsmith.evaluation import evaluate +from langsmith import evaluate experiment_results = evaluate( langsmith_app, # Your AI system @@ -200,6 +200,18 @@ experiment_results = evaluate( evaluators=[evaluate_length, qa_evaluator], # The evaluators to score the results experiment_prefix="openai-3.5", # A prefix for your experiment names to easily identify them ) + +# Note: If your system is async, you can use the asynchronous `aevaluate` function +# import asyncio +# from langsmith import aevaluate +# +# experiment_results = asyncio.run(aevaluate( +# my_async_langsmith_app, # Your AI system +# data=dataset_name, # The data to predict and grade over +# evaluators=[evaluate_length, qa_evaluator], # The evaluators to score the results +# experiment_prefix="openai-3.5", # A prefix for your experiment names to easily identify them +# )) + ``` This will output a URL. If we click on it, we should see results of our evaluation! 
diff --git a/versioned_docs/version-2.0/tutorials/Developers/optimize_classifier.mdx b/versioned_docs/version-2.0/tutorials/Developers/optimize_classifier.mdx index 94b59e91..1620d06e 100644 --- a/versioned_docs/version-2.0/tutorials/Developers/optimize_classifier.mdx +++ b/versioned_docs/version-2.0/tutorials/Developers/optimize_classifier.mdx @@ -237,10 +237,10 @@ import numpy as np def find_similar(examples, topic, k=5): inputs = [e.inputs['topic'] for e in examples] + [topic] - embedds = client.embeddings.create(input=inputs, model="text-embedding-3-small") - embedds = [e.embedding for e in embedds.data] - embedds = np.array(embedds) - args = np.argsort(-embedds.dot(embedds[-1])[:-1])[:5] + vectors = client.embeddings.create(input=inputs, model="text-embedding-3-small") + vectors = [e.embedding for e in vectors.data] + vectors = np.array(vectors) + args = np.argsort(-vectors.dot(vectors[-1])[:-1])[:k] examples = [examples[i] for i in args] return examples ``` diff --git a/versioned_docs/version-2.0/tutorials/Developers/swe-benchmark.mdx b/versioned_docs/version-2.0/tutorials/Developers/swe-benchmark.mdx index 4bdcf505..10b8ce06 100644 --- a/versioned_docs/version-2.0/tutorials/Developers/swe-benchmark.mdx +++ b/versioned_docs/version-2.0/tutorials/Developers/swe-benchmark.mdx @@ -39,7 +39,7 @@ df['version'] = df['version'].apply(lambda x: f"version:{x}") ### Save to CSV -To upload the data to LangSmith, we first need to save it to a CSV, which we can do using the `to_csv` function provided by pandas. Make sure to save this file somewhere that is easily accesible to you. +To upload the data to LangSmith, we first need to save it to a CSV, which we can do using the `to_csv` function provided by pandas. Make sure to save this file somewhere that is easily accessible to you. ```python df.to_csv("./../SWE-bench.csv",index=False) @@ -55,7 +55,7 @@ Next, select `Key-Value` as the dataset type. Lastly head to the `Create Schema` Once you have populated the `Input fields` (and left the `Output fields` empty!) you can click the blue `Create` button in the top right corner, and your dataset will be created! -### Upload CSV to LangSmith Programatically +### Upload CSV to LangSmith Programmatically Alternatively you can upload your csv to LangSmith using the sdk as shown in the code block below: @@ -178,7 +178,7 @@ def convert_runs_to_langsmith_feedback( else: feedback_for_instance.append({"key":"resolved-patch","score":0}) else: - # The instance did not run succesfully + # The instance did not run successfully feedback_for_instance += [{"key":"completed-patch","score":0},{"key":"resolved-patch","score":0}] feedback_for_all_instances[prediction['run_id']] = feedback_for_instance
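The `convert_runs_to_langsmith_feedback` hunk above builds `feedback_for_all_instances`, a mapping from run IDs to lists of `{"key", "score"}` dicts. As a minimal sketch of how such a mapping could be posted back to LangSmith with the SDK's `create_feedback` method; `upload_feedback` is a hypothetical helper name, and the SWE-bench tutorial's own upload step may differ in detail:

```python
from langsmith import Client

client = Client()

def upload_feedback(feedback_for_all_instances: dict) -> None:
    """Attach each {'key', 'score'} entry as feedback on its corresponding run.

    Assumes the run IDs in the mapping belong to runs in the LangSmith
    project used for the experiment.
    """
    for run_id, feedback_items in feedback_for_all_instances.items():
        for item in feedback_items:
            # create_feedback records one scored feedback entry against the run.
            client.create_feedback(run_id, key=item["key"], score=item["score"])
```

Each call attaches a single scored feedback entry to the run it names.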