The popularity of llama.cpp and the optimized GGUF model format is growing. This post outlines the steps to run "Phi-3-Small-128K-Instruct" in GGUF format with llama.cpp on an IBM Cloud VSI with GPUs and Ubuntu 22.04. It covers VSI setup, the CUDA toolkit, compilation, the Python environment, model usage, and additional resources.
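As a minimal sketch of what model usage can look like once llama.cpp and its Python environment are in place, the snippet below loads a GGUF build of Phi-3 through the llama-cpp-python bindings rather than the compiled CLI; the model path, context size, and GPU layer count are placeholders to adapt to your VSI.

```python
# Minimal sketch: loading a GGUF Phi-3 model via the llama-cpp-python bindings.
# The model path, context window, and n_gpu_layers values are placeholders;
# adjust them to the file downloaded on your VSI and the GPU memory available.
from llama_cpp import Llama

llm = Llama(
    model_path="./Phi-3-Small-128K-Instruct.gguf",  # assumed local GGUF file
    n_ctx=8192,          # context window; the model itself supports up to 128K
    n_gpu_layers=-1,     # offload all layers to the GPU (requires a CUDA build)
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```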
Getting started with Text Generation Inference (TGI) using a container to serve your LLM model
This blog post outlines a bash automation for setting up and testing Text Generation Inference (TGI) using a container. It provides instructions for creating a Python test client, starting the TGI server, and troubleshooting common issues. The post emphasizes the benefits of using containers and references Hugging Face and NVIDIA technologies.
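As an illustration of the kind of Python test client described in the post, the sketch below sends a prompt to TGI's /generate endpoint; the host and port are assumptions that depend on how the container's port was mapped when the server was started.

```python
# Minimal sketch of a Python test client for a running TGI container.
# The URL assumes the container's port is mapped to localhost:8080; adjust as needed.
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed host/port mapping

payload = {
    "inputs": "What is Text Generation Inference?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```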
