Commit

Merge pull request #553 from john0isaac/fixes/chores
Fixes/chores
leestott authored Aug 7, 2024
2 parents 963bd54 + 3e4b224 commit fa6bb54
Showing 26 changed files with 32 additions and 38 deletions.
2 changes: 1 addition & 1 deletion 02-exploring-and-comparing-different-llms/README.md
@@ -65,7 +65,7 @@ Image source: [2108.07258.pdf (arxiv.org)](https://arxiv.org/pdf/2108.07258.pdf?

Another way to categorize LLMs is whether they are open source or proprietary.

-Open-source models are models that are made available to the public and can be used by anyone. They are often made available by the company that created them, or by the research community. These models are allowed to be inspected, modified, and customized for the various use cases in LLMs. However, they are not always optimized for production use, and may not be as performant as proprietary models. Plus, funding for open-source models can be limited, and they may not be maintained long term or may not be updated with the latest research. Examples of popular open source models include [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://sapling.ai/llm/bloom?WT.mc_id=academic-105485-koreyst) and [LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-koreyst).
+Open-source models are models that are made available to the public and can be used by anyone. They are often made available by the company that created them, or by the research community. These models are allowed to be inspected, modified, and customized for the various use cases in LLMs. However, they are not always optimized for production use, and may not be as performant as proprietary models. Plus, funding for open-source models can be limited, and they may not be maintained long term or may not be updated with the latest research. Examples of popular open source models include [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://huggingface.co/bigscience/bloom) and [LLaMA](https://llama.meta.com).

Proprietary models are models that are owned by a company and are not made available to the public. These models are often optimized for production use. However, they are not allowed to be inspected, modified, or customized for different use cases. Plus, they are not always available for free, and may require a subscription or payment to use. Also, users do not have control over the data that is used to train the model, which means they should entrust the model owner with ensuring commitment to data privacy and responsible use of AI. Examples of popular proprietary models include [OpenAI models](https://platform.openai.com/docs/models/overview?WT.mc_id=academic-105485-koreyst), [Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-koreyst) or [Claude 2](https://www.anthropic.com/index/claude-2?WT.mc_id=academic-105485-koreyst).

@@ -62,7 +62,7 @@ LLM 模型有许多不同类型,您选择的模型取决于您的用途、您

对 LLM 进行分类的另一种方法是它们是开源的还是专有的。

-开源模型是向公众开放并且任何人都可以使用的模型。 它们通常由创建它们的公司或研究团体提供。 这些模型可以针对 LLMs 的各种用例进行检查、修改和定制。 然而,它们并不总是针对生产用途进行优化,并且可能不如专有模型具备高性能。 此外,开源模型的资金可能有限,并且它们可能无法长期维护或可能无法根据最新研究进行更新。 流行的开源模型的例子包括 [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html)[Bloom](https://sapling.ai/llm/bloom)[ LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-koreyst)
+开源模型是向公众开放并且任何人都可以使用的模型。 它们通常由创建它们的公司或研究团体提供。 这些模型可以针对 LLMs 的各种用例进行检查、修改和定制。 然而,它们并不总是针对生产用途进行优化,并且可能不如专有模型具备高性能。 此外,开源模型的资金可能有限,并且它们可能无法长期维护或可能无法根据最新研究进行更新。 流行的开源模型的例子包括 [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html)[Bloom](https://huggingface.co/bigscience/bloom)[ LLaMA](https://llama.meta.com)

专有模型是公司拥有的模型,不向公众提供。 这些模型通常针对生产用途进行了优化。 但是,不允许针对特定的使用场景进行检查、修改或定制它们。 另外,它们并不总是免费提供,可能需要订阅或付费才能使用。 此外,用户无法控制用于训练模型的数据,这意味着他们应该委托模型所有者确保对数据隐私和负责任地使用人工智能的承诺。 流行的专有模型的例子包括 [OpenAI 模型](https://platform.openai.com/docs/models/overview)[Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-koreyst)[Claude 2](https://www.anthropic.com/index/claude-2)

@@ -62,7 +62,7 @@ LLM モデルには、さまざまな種類があり、どのモデルを選択

大規模言語モデル(LLM)を分類する別の方法として、それがオープンソースなのか、もしくはプロプライエタリな物なのか、という観点もあります。

-オープンソース・モデルは、一般に公開され、誰でも利用できるモデルです。これらは多くの場合、そのモデルを開発した企業や研究コミュニティによって提供されます。これらのモデルは、LLM の様々な用途に合わせて検証、変更、カスタマイズの許可がされています。しかし、常に本番環境での利用に最適化されているわけではなく、プロプライエタリモデルほど高いパフォーマンスを発揮しない場合もあります。さらに、オープンソース・モデルの資金調達は限られており、長期的に継続できない可能性や、最新の研究に基づいて更新されていない可能性もあります。[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-yoterada)、[Bloom](https://sapling.ai/llm/bloom?WT.mc_id=academic-105485-yoterada)、[LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-yoterada) などが人気のオープンソース・モデルの例です。
+オープンソース・モデルは、一般に公開され、誰でも利用できるモデルです。これらは多くの場合、そのモデルを開発した企業や研究コミュニティによって提供されます。これらのモデルは、LLM の様々な用途に合わせて検証、変更、カスタマイズの許可がされています。しかし、常に本番環境での利用に最適化されているわけではなく、プロプライエタリモデルほど高いパフォーマンスを発揮しない場合もあります。さらに、オープンソース・モデルの資金調達は限られており、長期的に継続できない可能性や、最新の研究に基づいて更新されていない可能性もあります。[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-yoterada)[Bloom](https://huggingface.co/bigscience/bloom)[LLaMA](https://llama.meta.com) などが人気のオープンソース・モデルの例です。

プロプライエタリ・モデルは、企業が所有し一般には公開されていないモデルです。これらのモデルは、通常本番環境での利用に最適化されています。しかし異なるユースケースに対して、検証、変更、カスタマイズは許可されていません。また、常に無料で利用できるわけではなく、利用するためには、サブスクリプション等による支払いが必要な場合もあります。さらに、利用者はモデルをトレーニングする際に使用するデータをコントロールできず、データのプライバシーや、責任ある AI の原則に基づく使用をモデル・プロバイダが保証しているのを信用しなければなりません。[OpenAI のモデル](https://platform.openai.com/docs/models/overview?WT.mc_id=academic-105485-yoterada)、[Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-yoterada)、[Claude 2](https://www.anthropic.com/index/claude-2?WT.mc_id=academic-105485-yoterada) などが人気のプロプライエタリ・モデルです。

@@ -61,7 +61,7 @@ Foundation Model이라는 용어는 [스탠포드 연구원들에 의해 만들

LLM을 분류하는 또 다른 방법은 오픈 소스인지 독점 모델인지에 따라 나눌 수 있습니다.

-오픈 소스 모델은 일반에 공개되어 누구나 사용할 수 있는 모델입니다. 이러한 모델은 일반적으로 해당 모델을 개발한 회사나 연구 커뮤니티에 의해 제공됩니다. 이러한 모델은 검토, 수정 및 사용 사례에 맞게 사용자 정의할 수 있습니다. 그러나 이러한 모델은 항상 프로덕션 환경에 최적화되지 않을 수 있으며, 독점 모델만큼 성능이 우수하지 않을 수도 있습니다. 또한, 오픈 소스 모델의 자금 지원은 제한적일 수 있으며, 장기적으로 유지되지 않거나 최신 연구로 업데이트되지 않을 수도 있습니다. 대표적인 오픈 소스 모델로는 [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://sapling.ai/llm/bloom?WT.mc_id=academic-105485-koreyst)[LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-koreyst)이 있습니다.
+오픈 소스 모델은 일반에 공개되어 누구나 사용할 수 있는 모델입니다. 이러한 모델은 일반적으로 해당 모델을 개발한 회사나 연구 커뮤니티에 의해 제공됩니다. 이러한 모델은 검토, 수정 및 사용 사례에 맞게 사용자 정의할 수 있습니다. 그러나 이러한 모델은 항상 프로덕션 환경에 최적화되지 않을 수 있으며, 독점 모델만큼 성능이 우수하지 않을 수도 있습니다. 또한, 오픈 소스 모델의 자금 지원은 제한적일 수 있으며, 장기적으로 유지되지 않거나 최신 연구로 업데이트되지 않을 수도 있습니다. 대표적인 오픈 소스 모델로는 [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://huggingface.co/bigscience/bloom)[LLaMA](https://llama.meta.com)이 있습니다.

독점 모델 (Proprietary models)은 회사에 소유되어 일반에 공개되지 않는 모델입니다. 이러한 모델은 일반적으로 프로덕션 환경에 최적화되어 있습니다. 그러나 이러한 모델은 사용자가 검토, 수정 또는 사용 사례에 맞게 사용자 정의할 수 없습니다. 또한, 이러한 모델은 항상 무료로 제공되지 않을 수 있으며, 사용을 위해 구독 또는 결제가 필요할 수 있습니다. 또한, 사용자는 모델을 훈련하는 데 사용되는 데이터를 제어할 수 없으므로 데이터 프라이버시와 AI의 책임있는 사용을 보장하기 위해 모델 소유자에게 의존해야 합니다. 대표적인 독점 모델로는 [OpenAI 모델](https://platform.openai.com/docs/models/overview?WT.mc_id=academic-105485-koreyst), [Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-koreyst)[Claude 2](https://www.anthropic.com/index/claude-2?WT.mc_id=academic-105485-koreyst)가 있습니다.

@@ -65,7 +65,7 @@ Fonte da imagem: [2108.07258.pdf (arxiv.org)](https://arxiv.org/pdf/2108.07258.p

Outra maneira de categorizar os Modelos de Linguagem de Grande Escala (LLMs) é se eles são de código aberto ou proprietários.

-Os modelos de código aberto são modelos que são disponibilizados ao público e podem ser usados por qualquer pessoa. Eles são frequentemente disponibilizados pela empresa que os criou ou pela comunidade de pesquisa. Esses modelos podem ser inspecionados, modificados e personalizados para diversos casos de uso em LLMs. No entanto, nem sempre são otimizados para uso em produção e podem não ser tão eficientes quanto os modelos proprietários. Além disso, o financiamento para modelos de código aberto pode ser limitado, e eles podem não ser mantidos a longo prazo ou não ser atualizados com as pesquisas mais recentes. Exemplos de modelos de código aberto populares incluem [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://sapling.ai/llm/bloom?WT.mc_id=academic-105485-koreyst) e [LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-koreyst).
+Os modelos de código aberto são modelos que são disponibilizados ao público e podem ser usados por qualquer pessoa. Eles são frequentemente disponibilizados pela empresa que os criou ou pela comunidade de pesquisa. Esses modelos podem ser inspecionados, modificados e personalizados para diversos casos de uso em LLMs. No entanto, nem sempre são otimizados para uso em produção e podem não ser tão eficientes quanto os modelos proprietários. Além disso, o financiamento para modelos de código aberto pode ser limitado, e eles podem não ser mantidos a longo prazo ou não ser atualizados com as pesquisas mais recentes. Exemplos de modelos de código aberto populares incluem [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst), [Bloom](https://huggingface.co/bigscience/bloom) e [LLaMA](https://llama.meta.com).

Os modelos proprietários são modelos de propriedade de uma empresa e não são disponibilizados ao público. Esses modelos são frequentemente otimizados para uso em produção. No entanto, não podem ser inspecionados, modificados ou personalizados para diferentes casos de uso. Além disso, nem sempre estão disponíveis gratuitamente e podem exigir uma assinatura ou pagamento para uso. Além disso, os usuários não têm controle sobre os dados usados para treinar o modelo, o que significa que devem confiar ao proprietário do modelo o compromisso com a privacidade dos dados e o uso responsável da IA. Exemplos de modelos proprietários populares incluem [modelos da OpenAI](https://platform.openai.com/docs/models/overview?WT.mc_id=academic-105485-koreyst), [Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-koreyst) ou [Claude 2](https://www.anthropic.com/index/claude-2?WT.mc_id=academic-105485-koreyst).

@@ -64,7 +64,7 @@ LLMs 可以根據其架構、訓練數據和使用案例進行多種分類。了

另一種分類 LLMs 的方式是它們是開放原始碼還是專有的。

-開放原始碼模型是公開提供給大眾使用的模型,任何人都可以使用。這些模型通常由創建它們的公司或研究社群提供。這些模型允許被檢查、修改和自訂,以適應LLM的各種使用案例。然而,它們並不總是針對生產使用進行最佳化,性能可能不如專有模型。此外,開放原始碼模型的資金可能有限,可能不會長期維護或更新最新的研究。受歡迎的開放原始碼模型範例包括[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst)[Bloom](https://sapling.ai/llm/bloom?WT.mc_id=academic-105485-koreyst)[LLaMA](https://sapling.ai/llm/llama?WT.mc_id=academic-105485-koreyst)
+開放原始碼模型是公開提供給大眾使用的模型,任何人都可以使用。這些模型通常由創建它們的公司或研究社群提供。這些模型允許被檢查、修改和自訂,以適應LLM的各種使用案例。然而,它們並不總是針對生產使用進行最佳化,性能可能不如專有模型。此外,開放原始碼模型的資金可能有限,可能不會長期維護或更新最新的研究。受歡迎的開放原始碼模型範例包括[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html?WT.mc_id=academic-105485-koreyst)[Bloom](https://huggingface.co/bigscience/bloom)[LLaMA](https://llama.meta.com)

專有模型是由公司擁有且不對公眾開放的模型。這些模型通常針對生產用途進行最佳化。然而,它們不允許被檢查、修改或針對不同的使用案例進行自訂。此外,它們並不總是免費提供,可能需要訂閱或支付費用才能使用。而且,使用者無法控制用於訓練模型的數據,這意味著他們應該信任模型擁有者來確保對數據隱私和負責任使用 AI 的承諾。流行的專有模型範例包括[OpenAI models](https://platform.openai.com/docs/models/overview?WT.mc_id=academic-105485-koreyst)[Google Bard](https://sapling.ai/llm/bard?WT.mc_id=academic-105485-koreyst)[Claude 2](https://www.anthropic.com/index/claude-2?WT.mc_id=academic-105485-koreyst)

8 changes: 2 additions & 6 deletions 07-building-chat-applications/python/aoai-assignment.ipynb
@@ -92,9 +92,7 @@
},
"source": [
"### Build your first prompt \n",
-"This short exercise will provide a basic introduction for submitting prompts to an OpenAI model for a simple task \"summarization\". \n",
-"\n",
-"![](images/generative-AI-models-reduced.jpg) \n",
+"This short exercise will provide a basic introduction for submitting prompts to an OpenAI model for a simple task \"summarization\".\n",
"\n",
"\n",
"**Steps**: \n",
@@ -209,8 +207,7 @@
"### 3. Finding the right model \n",
"The GPT-3.5-turbo or GPT-4 models can understand and generate natural language. The service offers four model capabilities, each with different levels of power and speed suitable for different tasks. \n",
"\n",
-"[Azure OpenAI models](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models?WT.mc_id=academic-105485-koreyst) \n",
-"![](images/a-b-c-d-models-reduced.jpg) \n"
+"[Azure OpenAI models](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models?WT.mc_id=academic-105485-koreyst)\n"
]
},
{
@@ -296,7 +293,6 @@
}
}
"source": [
-"![](images/prompt_design.jpg)\n",
"image is creating your first text prompt!"
]
},
4 changes: 1 addition & 3 deletions 07-building-chat-applications/python/oai-assignment.ipynb
@@ -74,9 +74,7 @@
},
"source": [
"### Build your first prompt \n",
-"This short exercise will provide a basic introduction for submitting prompts to an OpenAI model for a simple task \"summarization\". \n",
-"\n",
-"![](images/generative-AI-models-reduced.jpg) \n",
+"This short exercise will provide a basic introduction for submitting prompts to an OpenAI model for a simple task \"summarization\".\n",
"\n",
"\n",
"**Steps**: \n",
2 changes: 1 addition & 1 deletion 08-building-search-applications/README.md
@@ -65,7 +65,7 @@ The Embedding index for this lesson was created with a series of Python scripts.

The scripts perform the following operations:

-1. The transcript for each YouTube video in the [AI Show](https://www.youtube.com/playlist?list=PLlrxD0HtieHi0mwteKBOfEeOYf0LJU4O1?WT.mc_id=academic-105485-koreyst) playlist is downloaded.
+1. The transcript for each YouTube video in the [AI Show](https://www.youtube.com/playlist?list=PLlrxD0HtieHi0mwteKBOfEeOYf0LJU4O1) playlist is downloaded.
2. Using [OpenAI Functions](https://learn.microsoft.com/azure/ai-services/openai/how-to/function-calling?WT.mc_id=academic-105485-koreyst), an attempt is made to extract the speaker name from the first 3 minutes of the YouTube transcript. The speaker name for each video is stored in the Embedding Index named `embedding_index_3m.json`.
3. The transcript text is then chunked into **3 minute text segments**. The segment includes about 20 words overlapping from the next segment to ensure that the Embedding for the segment is not cut off and to provide better search context.
4. Each text segment is then passed to the OpenAI Chat API to summarize the text into 60 words. The summary is also stored in the Embedding Index `embedding_index_3m.json`.
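
Step 3 of the search-application README describes chunking each transcript into 3-minute segments, each extended with about 20 words from the next segment. What follows is a minimal Python sketch of that step, not the lesson's actual script. It assumes each transcript is a list of dicts with a `start` time in seconds and a `text` string, already sorted by `start`. This input format is an assumption, and `chunk_transcript` is a hypothetical helper that does not come from the repository.

```python
from typing import Dict, List


def chunk_transcript(
    entries: List[Dict],
    segment_seconds: float = 180.0,
    overlap_words: int = 20,
) -> List[str]:
    """Hypothetical helper: split a transcript into fixed-length time segments
    and append the first `overlap_words` words of the next segment to each one.

    Assumes each entry is a dict with a `start` time in seconds and a `text`
    string, and that entries are already sorted by `start`.
    """
    # Bucket each entry's text by the time window its start time falls into.
    windows: Dict[int, List[str]] = {}
    for entry in entries:
        index = int(entry["start"] // segment_seconds)
        windows.setdefault(index, []).append(entry["text"])

    # Join each window's text, ordering windows by time (empty windows are skipped).
    segments = [" ".join(windows[i]) for i in sorted(windows)]

    # Extend each segment with the opening words of the segment that follows it.
    chunks = []
    for i, segment in enumerate(segments):
        if i + 1 < len(segments):
            lookahead = segments[i + 1].split()[:overlap_words]
            segment = " ".join([segment] + lookahead)
        chunks.append(segment)
    return chunks


# Example with a small, made-up transcript and a 2-word overlap for readability.
entries = [
    {"start": 0.0, "text": "welcome to the AI Show"},
    {"start": 95.0, "text": "today we talk about embeddings"},
    {"start": 200.0, "text": "and how search works"},
]
for chunk in chunk_transcript(entries, segment_seconds=180.0, overlap_words=2):
    print(chunk)
# welcome to the AI Show today we talk about embeddings and how
# and how search works
```

In this sketch each transcript entry is assigned to a window by its start time, so segment boundaries always fall between whole entries, and a segment can run slightly past the 3-minute mark. The appended lookahead words give each segment's embedding some context from the segment that follows.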