diff --git a/example/llm/lemonade/README.md b/example/llm/lemonade/README.md
index 244ea34b..b5bbcd78 100644
--- a/example/llm/lemonade/README.md
+++ b/example/llm/lemonade/README.md
@@ -1,6 +1,6 @@
 # Ryzen AI LLM Lemonade Examples
 
-The following table contains a curated list of LLMs that have been validated with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) on Ryzen AI hybrid execution mode, along with CPU implementations of those same checkpoints.
+The following table contains a curated list of LLMs that have been validated with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) on Ryzen AI hybrid execution mode, along with CPU implementations of those same checkpoints.
 
 The hybrid examples are built on top of OnnxRuntime GenAI (OGA), while the CPU baseline is built on top of Hugging Face (HF) ``transformers``.
 
 Validation is defined as running all commands in the example page successfully.
diff --git a/example/llm/lemonade/cpu/CodeLlama_7b_Instruct_hf.md b/example/llm/lemonade/cpu/CodeLlama_7b_Instruct_hf.md
index ebd9882b..4372ebf6 100644
--- a/example/llm/lemonade/cpu/CodeLlama_7b_Instruct_hf.md
+++ b/example/llm/lemonade/cpu/CodeLlama_7b_Instruct_hf.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/CodeLlama_7b_Instruct_hf.md).
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/CodeLlama-7b-Instruct-hf`](h
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/CodeLlama-7b-Instruct-hf huggingface-load --device cpu --
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/CodeLlama-7b-Instruct-hf huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Llama_8B.md b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Llama_8B.md
index e4a2490a..90214c66 100644
--- a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Llama_8B.md
+++ b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Llama_8B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/DeepSeek_R1_Distill_Llama_8B.md).
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i deepseek-ai/DeepSeek-R1-Distill-Llama-8B huggingface-load --device c
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i deepseek-ai/DeepSeek-R1-Distill-Llama-8B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_1_5B.md b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_1_5B.md
index 99e622b3..1529bf99 100644
--- a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_1_5B.md
+++ b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_1_5B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/DeepSeek_R1_Distill_Qwen_1_5B.md).
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B huggingface-load --device
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_7B.md b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_7B.md
index 3d3e24b7..f6eb7181 100644
--- a/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_7B.md
+++ b/example/llm/lemonade/cpu/DeepSeek_R1_Distill_Qwen_7B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/DeepSeek_R1_Distill_Qwen_7B.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i deepseek-ai/DeepSeek-R1-Distill-Qwen-7B huggingface-load --device cp
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i deepseek-ai/DeepSeek-R1-Distill-Qwen-7B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_2_7b_chat_hf.md b/example/llm/lemonade/cpu/Llama_2_7b_chat_hf.md
index 0d2482e8..6fb0c9a3 100644
--- a/example/llm/lemonade/cpu/Llama_2_7b_chat_hf.md
+++ b/example/llm/lemonade/cpu/Llama_2_7b_chat_hf.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_2_7b_chat_hf.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-2-7b-chat-hf`](https:/
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-2-7b-chat-hf huggingface-load --device cpu --dtype
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/Llama-2-7b-chat-hf huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_2_7b_hf.md b/example/llm/lemonade/cpu/Llama_2_7b_hf.md
index cdf0ad6e..25fd80a3 100644
--- a/example/llm/lemonade/cpu/Llama_2_7b_hf.md
+++ b/example/llm/lemonade/cpu/Llama_2_7b_hf.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_2_7b_hf.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-2-7b-hf`](https://hugg
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-2-7b-hf huggingface-load --device cpu --dtype bfloa
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/Llama-2-7b-hf huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_3_1_8B.md b/example/llm/lemonade/cpu/Llama_3_1_8B.md
index 72d65130..5dd4aa48 100644
--- a/example/llm/lemonade/cpu/Llama_3_1_8B.md
+++ b/example/llm/lemonade/cpu/Llama_3_1_8B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_3_1_8B.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-3.1-8B`](https://huggi
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-3.1-8B huggingface-load --device cpu --dtype bfloat
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/Llama-3.1-8B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_3_1_8B_Instruct.md b/example/llm/lemonade/cpu/Llama_3_1_8B_Instruct.md
index 9f5a1cef..16c930a2 100644
--- a/example/llm/lemonade/cpu/Llama_3_1_8B_Instruct.md
+++ b/example/llm/lemonade/cpu/Llama_3_1_8B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_3_1_8B_Instruct.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-3.1-8B-Instruct`](http
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-3.1-8B-Instruct huggingface-load --device cpu --dty
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/Llama-3.1-8B-Instruct huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_3_2_1B_Instruct.md b/example/llm/lemonade/cpu/Llama_3_2_1B_Instruct.md
index 9c38a66d..eb583199 100644
--- a/example/llm/lemonade/cpu/Llama_3_2_1B_Instruct.md
+++ b/example/llm/lemonade/cpu/Llama_3_2_1B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_3_2_1B_Instruct.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-3.2-1B-Instruct`](http
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
    ```bash
-   pip install turnkeyml[llm]
+   pip install lemonade-sdk[llm]
    ```
 
 # Validation Tools
 
@@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-3.2-1B-Instruct huggingface-load --device cpu --dty
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i meta-llama/Llama-3.2-1B-Instruct huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management
@@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/cpu/Llama_3_2_3B_Instruct.md b/example/llm/lemonade/cpu/Llama_3_2_3B_Instruct.md
index 66eda265..5859e58a 100644
--- a/example/llm/lemonade/cpu/Llama_3_2_3B_Instruct.md
+++ b/example/llm/lemonade/cpu/Llama_3_2_3B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Llama_3_2_3B_Instruct.md).
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Llama-3.2-3B-Instruct`](http
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal:
    conda activate ryzenai-llm
    ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i meta-llama/Llama-3.2-3B-Instruct huggingface-load --device cpu --dty ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i meta-llama/Llama-3.2-3B-Instruct huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Meta_Llama_3_8B.md b/example/llm/lemonade/cpu/Meta_Llama_3_8B.md index eaf08474..32f79589 100644 --- a/example/llm/lemonade/cpu/Meta_Llama_3_8B.md +++ b/example/llm/lemonade/cpu/Meta_Llama_3_8B.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Meta_Llama_3_8B.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`meta-llama/Meta-Llama-3-8B`](https://hu # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i meta-llama/Meta-Llama-3-8B huggingface-load --device cpu --dtype bfl ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i meta-llama/Meta-Llama-3-8B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Mistral_7B_Instruct_v0_3.md b/example/llm/lemonade/cpu/Mistral_7B_Instruct_v0_3.md index c374f9ee..ad4db9cc 100644 --- a/example/llm/lemonade/cpu/Mistral_7B_Instruct_v0_3.md +++ b/example/llm/lemonade/cpu/Mistral_7B_Instruct_v0_3.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Mistral_7B_Instruct_v0_3.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`mistralai/Mistral-7B-Instruct-v0.3`](ht # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i mistralai/Mistral-7B-Instruct-v0.3 huggingface-load --device cpu --d ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i mistralai/Mistral-7B-Instruct-v0.3 huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Phi_3_5_mini_instruct.md b/example/llm/lemonade/cpu/Phi_3_5_mini_instruct.md index 5bc65d8a..56285fcb 100644 --- a/example/llm/lemonade/cpu/Phi_3_5_mini_instruct.md +++ b/example/llm/lemonade/cpu/Phi_3_5_mini_instruct.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Phi_3_5_mini_instruct.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`microsoft/Phi-3.5-mini-instruct`](https # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i microsoft/Phi-3.5-mini-instruct huggingface-load --device cpu --dtyp ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i microsoft/Phi-3.5-mini-instruct huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Phi_3_mini_4k_instruct.md b/example/llm/lemonade/cpu/Phi_3_mini_4k_instruct.md index 3d17f789..4c57236c 100644 --- a/example/llm/lemonade/cpu/Phi_3_mini_4k_instruct.md +++ b/example/llm/lemonade/cpu/Phi_3_mini_4k_instruct.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Phi_3_mini_4k_instruct.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`microsoft/Phi-3-mini-4k-instruct`](http # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i microsoft/Phi-3-mini-4k-instruct huggingface-load --device cpu --dty ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i microsoft/Phi-3-mini-4k-instruct huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Qwen1_5_7B_Chat.md b/example/llm/lemonade/cpu/Qwen1_5_7B_Chat.md index f30688b2..8d4d91b0 100644 --- a/example/llm/lemonade/cpu/Qwen1_5_7B_Chat.md +++ b/example/llm/lemonade/cpu/Qwen1_5_7B_Chat.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Qwen1_5_7B_Chat.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`Qwen/Qwen1.5-7B-Chat`](https://huggingf # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i Qwen/Qwen1.5-7B-Chat huggingface-load --device cpu --dtype bfloat16 ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i Qwen/Qwen1.5-7B-Chat huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Qwen2_1_5B.md b/example/llm/lemonade/cpu/Qwen2_1_5B.md index 5a06c02b..22c25730 100644 --- a/example/llm/lemonade/cpu/Qwen2_1_5B.md +++ b/example/llm/lemonade/cpu/Qwen2_1_5B.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Qwen2_1_5B.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`Qwen/Qwen2-1.5B`](https://huggingface.c # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i Qwen/Qwen2-1.5B huggingface-load --device cpu --dtype bfloat16 huggi ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i Qwen/Qwen2-1.5B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/Qwen2_7B.md b/example/llm/lemonade/cpu/Qwen2_7B.md index 5a065ab9..a392557c 100644 --- a/example/llm/lemonade/cpu/Qwen2_7B.md +++ b/example/llm/lemonade/cpu/Qwen2_7B.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/Qwen2_7B.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`Qwen/Qwen2-7B`](https://huggingface.co/ # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i Qwen/Qwen2-7B huggingface-load --device cpu --dtype bfloat16 hugging ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i Qwen/Qwen2-7B huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/cpu/gemma_2_2b.md b/example/llm/lemonade/cpu/gemma_2_2b.md index d631bb81..5f2b2fe7 100644 --- a/example/llm/lemonade/cpu/gemma_2_2b.md +++ b/example/llm/lemonade/cpu/gemma_2_2b.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo The CPU implementation in this guide is designed to run on most PCs. However, for optimal performance on Ryzen AI 300-series PCs, try the [hybrid execution mode](../hybrid/gemma_2_2b.md). 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework, as well as the support for Hugging Face `transformers` baselines leveraged in this guide. # Checkpoint @@ -12,7 +12,7 @@ The Hugging Face CPU implementation of [`google/gemma-2-2b`](https://huggingface # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -35,9 +35,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. 
```bash - pip install turnkeyml[llm] + pip install lemonade-sdk[llm] ``` # Validation Tools @@ -62,7 +62,7 @@ lemonade -i google/gemma-2-2b huggingface-load --device cpu --dtype bfloat16 hug ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i google/gemma-2-2b huggingface-load --device cpu --dtype bfloat16 accuracy-mmlu --tests management @@ -138,7 +138,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. # Copyright diff --git a/example/llm/lemonade/hybrid/CodeLlama_7b_Instruct_hf.md b/example/llm/lemonade/hybrid/CodeLlama_7b_Instruct_hf.md index be77f63d..8bd1ac02 100644 --- a/example/llm/lemonade/hybrid/CodeLlama_7b_Instruct_hf.md +++ b/example/llm/lemonade/hybrid/CodeLlama_7b_Instruct_hf.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase. 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework. # Checkpoint @@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/CodeLlama-7b-Instruct-hf`](ht # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. ```bash - pip install turnkeyml[llm-oga-hybrid] + pip install lemonade-sdk[llm-oga-hybrid] ``` 4. Install support for Ryzen AI Hybrid LLMs. 
@@ -68,7 +68,7 @@ lemonade -i amd/CodeLlama-7b-instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid oga ## Task Performance -To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run: +To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run: ```bash lemonade -i amd/CodeLlama-7b-instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management @@ -126,7 +126,7 @@ thread.join() ## Application Example -See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption. +See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption. # Server Interface (REST API) @@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device. - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries. -- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features. +- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features. 
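The hybrid guides touched above reference a streaming pattern that runs generation on a worker thread (note the `thread.join()` fragment) and point to the Chat Demo that builds on it. As a framework-agnostic sketch of that pattern — using a stub token generator in place of the real OGA model, so all names here are illustrative, not the Lemonade API — it looks roughly like this:

```python
import threading
import queue

def fake_generate(prompt, out_q):
    # Stand-in for a streaming model.generate() call: emits tokens one at a time.
    for token in ["Hello", ",", " world", "!"]:
        out_q.put(token)
    out_q.put(None)  # sentinel: generation finished

def stream_response(prompt):
    out_q = queue.Queue()
    # Run generation on a worker thread so the caller can consume tokens as they arrive.
    thread = threading.Thread(target=fake_generate, args=(prompt, out_q))
    thread.start()
    pieces = []
    while True:
        token = out_q.get()
        if token is None:
            break
        pieces.append(token)  # a real app would print/flush each token here
    thread.join()  # same join-on-completion step the guides show
    return "".join(pieces)

print(stream_response("Say hello"))  # -> Hello, world!
```

The same producer/consumer shape is what allows the Chat Demo to interrupt a response: the consumer simply stops reading and signals the worker to exit.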
# Copyright diff --git a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Llama_8B.md b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Llama_8B.md index 80576ac8..57e6ea66 100644 --- a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Llama_8B.md +++ b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Llama_8B.md @@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase. -The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework. +The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework. # Checkpoint @@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B # Setup -To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions. +To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions. ### System-level pre-requisites @@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal: conda activate ryzenai-llm ``` -3. Install ONNX TurnkeyML to get access to the LLM tools and APIs. +3. Install the Lemonade SDK to get access to the LLM tools and APIs. ```bash - pip install turnkeyml[llm-oga-hybrid] + pip install lemonade-sdk[llm-oga-hybrid] ``` 4. Install support for Ryzen AI Hybrid LLMs. 
@@ -68,7 +68,7 @@ lemonade -i amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hyb
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_1_5B.md b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_1_5B.md
index 2b093359..be18db57 100644
--- a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_1_5B.md
+++ b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_1_5B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-asym-uint4-g128-lmhead-onnx-hy
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-asym-uint4-g128-lmhead-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_7B.md b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_7B.md
index 91819401..7520400b 100644
--- a/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_7B.md
+++ b/example/llm/lemonade/hybrid/DeepSeek_R1_Distill_Qwen_7B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead-onnx-hybr
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_2_7b_chat_hf.md b/example/llm/lemonade/hybrid/Llama_2_7b_chat_hf.md
index ab0c1cb8..36d153df 100644
--- a/example/llm/lemonade/hybrid/Llama_2_7b_chat_hf.md
+++ b/example/llm/lemonade/hybrid/Llama_2_7b_chat_hf.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-2-7b-chat-hf`](https://
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid oga-load
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_2_7b_hf.md b/example/llm/lemonade/hybrid/Llama_2_7b_hf.md
index 51aa621b..f5d418db 100644
--- a/example/llm/lemonade/hybrid/Llama_2_7b_hf.md
+++ b/example/llm/lemonade/hybrid/Llama_2_7b_hf.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-2-7b-hf`](https://huggi
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-2-7b-hf-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --dev
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-2-7b-hf-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_3_1_8B.md b/example/llm/lemonade/hybrid/Llama_3_1_8B.md
index aaca732c..c6365234 100644
--- a/example/llm/lemonade/hybrid/Llama_3_1_8B.md
+++ b/example/llm/lemonade/hybrid/Llama_3_1_8B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-3.1-8B`](https://huggin
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-3.1-8B-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --devi
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-3.1-8B-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_3_1_8B_Instruct.md b/example/llm/lemonade/hybrid/Llama_3_1_8B_Instruct.md
index 62d7402c..f753ae8e 100644
--- a/example/llm/lemonade/hybrid/Llama_3_1_8B_Instruct.md
+++ b/example/llm/lemonade/hybrid/Llama_3_1_8B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-3.1-8B-Instruct`](https
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid oga
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_3_2_1B_Instruct.md b/example/llm/lemonade/hybrid/Llama_3_2_1B_Instruct.md
index f5870bf9..25254d24 100644
--- a/example/llm/lemonade/hybrid/Llama_3_2_1B_Instruct.md
+++ b/example/llm/lemonade/hybrid/Llama_3_2_1B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-3.2-1B-Instruct`](https
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-lo
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Llama_3_2_3B_Instruct.md b/example/llm/lemonade/hybrid/Llama_3_2_3B_Instruct.md
index bc6ed4ce..1a712a8c 100644
--- a/example/llm/lemonade/hybrid/Llama_3_2_3B_Instruct.md
+++ b/example/llm/lemonade/hybrid/Llama_3_2_3B_Instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Llama-3.2-3B-Instruct`](https
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-lo
 
 ## Task Performance
 
-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:
 
 ```bash
 lemonade -i amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()
 
 ## Application Example
 
-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
 
 # Server Interface (REST API)
 
@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de
 
 - Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
 - Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
 
 # Copyright
diff --git a/example/llm/lemonade/hybrid/Meta_Llama_3_8B.md b/example/llm/lemonade/hybrid/Meta_Llama_3_8B.md
index 69933763..0b4581ba 100644
--- a/example/llm/lemonade/hybrid/Meta_Llama_3_8B.md
+++ b/example/llm/lemonade/hybrid/Meta_Llama_3_8B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo
 
 Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
 
-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
 
 # Checkpoint
 
@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`meta-llama/Meta-Llama-3-8B`](https://hug
 
 # Setup
 
-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.
 
 ### System-level pre-requisites
 
@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:
 conda activate ryzenai-llm
 ```
 
-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
 ```bash
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
 4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Mistral_7B_Instruct_v0_3.md b/example/llm/lemonade/hybrid/Mistral_7B_Instruct_v0_3.md
index 194ca54a..770bce7f 100644
--- a/example/llm/lemonade/hybrid/Mistral_7B_Instruct_v0_3.md
+++ b/example/llm/lemonade/hybrid/Mistral_7B_Instruct_v0_3.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`mistralai/Mistral-7B-Instruct-v0.3`](htt

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid oga

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Phi_3_5_mini_instruct.md b/example/llm/lemonade/hybrid/Phi_3_5_mini_instruct.md
index cc84e115..16a4a0e2 100644
--- a/example/llm/lemonade/hybrid/Phi_3_5_mini_instruct.md
+++ b/example/llm/lemonade/hybrid/Phi_3_5_mini_instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`microsoft/Phi-3.5-mini-instruct`](https:

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-lo

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Phi_3_mini_4k_instruct.md b/example/llm/lemonade/hybrid/Phi_3_mini_4k_instruct.md
index 3c50d58d..0fe6321a 100644
--- a/example/llm/lemonade/hybrid/Phi_3_mini_4k_instruct.md
+++ b/example/llm/lemonade/hybrid/Phi_3_mini_4k_instruct.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`microsoft/Phi-3-mini-4k-instruct`](https

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-l

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Qwen1_5_7B_Chat.md b/example/llm/lemonade/hybrid/Qwen1_5_7B_Chat.md
index 9a9064da..cc93ae84 100644
--- a/example/llm/lemonade/hybrid/Qwen1_5_7B_Chat.md
+++ b/example/llm/lemonade/hybrid/Qwen1_5_7B_Chat.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`Qwen/Qwen1.5-7B-Chat`](https://huggingfa

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --d

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Qwen2_1_5B.md b/example/llm/lemonade/hybrid/Qwen2_1_5B.md
index d443db65..251c525e 100644
--- a/example/llm/lemonade/hybrid/Qwen2_1_5B.md
+++ b/example/llm/lemonade/hybrid/Qwen2_1_5B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`Qwen/Qwen2-1.5B`](https://huggingface.co

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Qwen2-1.5B-awq-uint4-asym-global-g128-lmhead-g32-fp16-onnx-hybri

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Qwen2-1.5B-awq-uint4-asym-global-g128-lmhead-g32-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/Qwen2_7B.md b/example/llm/lemonade/hybrid/Qwen2_7B.md
index fc0c2530..701a0cac 100644
--- a/example/llm/lemonade/hybrid/Qwen2_7B.md
+++ b/example/llm/lemonade/hybrid/Qwen2_7B.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`Qwen/Qwen2-7B`](https://huggingface.co/Q

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/Qwen2-7B-awq-uint4-asym-g128-lmhead-fp16-onnx-hybrid oga-load --

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/Qwen2-7B-awq-uint4-asym-g128-lmhead-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.
# Copyright

diff --git a/example/llm/lemonade/hybrid/gemma_2_2b.md b/example/llm/lemonade/hybrid/gemma_2_2b.md
index c5e6138d..2f04c45a 100644
--- a/example/llm/lemonade/hybrid/gemma_2_2b.md
+++ b/example/llm/lemonade/hybrid/gemma_2_2b.md
@@ -4,7 +4,7 @@ This guide contains all of the instructions necessary to get started with the mo

Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.

-The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.
+The commands and scripts in this guide leverage the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade), which provides everything you need to get up and running with LLMs on the OnnxRuntime GenAI (OGA) framework.

# Checkpoint

@@ -12,7 +12,7 @@ The Ryzen AI Hybrid implementation of [`google/gemma-2-2b`](https://huggingface.

# Setup

-To get started with the [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) in a Python environment, follow these instructions.
+To get started with the [Lemonade SDK](https://github.com/lemonade-sdk/lemonade) in a Python environment, follow these instructions.

### System-level pre-requisites

@@ -36,9 +36,9 @@ To create and set up an environment, run these commands in your terminal:

   conda activate ryzenai-llm
   ```

-3. Install ONNX TurnkeyML to get access to the LLM tools and APIs.
+3. Install the Lemonade SDK to get access to the LLM tools and APIs.
   ```bash
-   pip install turnkeyml[llm-oga-hybrid]
+   pip install lemonade-sdk[llm-oga-hybrid]
   ```

4. Install support for Ryzen AI Hybrid LLMs.
@@ -68,7 +68,7 @@ lemonade -i amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid oga-l

## Task Performance

-To measure the model's accuracy on the [MMLU test](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md) `management` subject, run:
+To measure the model's accuracy on the [MMLU test](https://github.com/lemonade-sdk/lemonade/blob/main/docs/mmlu_accuracy.md) `management` subject, run:

```bash
lemonade -i amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 accuracy-mmlu --tests management
@@ -126,7 +126,7 @@ thread.join()

## Application Example

-See the [Chat Demo](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.
+See the [Chat Demo](https://github.com/lemonade-sdk/lemonade/blob/main/examples/demos/chat/chat_hybrid.py) for an example application that demonstrates streaming, multi-threading, and response interruption.

# Server Interface (REST API)

@@ -144,7 +144,7 @@ This guide provided instructions for testing and deploying an LLM on a target de

- Visit the [Lemonade LLM examples table](../README.md) to learn how to do this for any of the supported combinations of LLM and device.
- Visit the [overall Ryzen AI LLM documentation](https://ryzenai.docs.amd.com/en/latest/llm/overview.html#) to learn about other deployment options, such as native C++ libraries.
-- Visit the [Lemonade SDK repository](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) to learn about more tools and features.
+- Visit the [Lemonade SDK repository](https://github.com/lemonade-sdk/lemonade) to learn about more tools and features.

# Copyright