Using Open-Source Tools to Reduce Language Model Costs

This module teaches you how to set up and run large language models (LLMs) locally using tools like LM Studio, Jan, GPT4All, and Ollama. You'll learn to run models offline, ensuring privacy and cost savings, while enabling advanced features like document-based querying and custom AI personas. The lessons cover setting up local servers that mimic OpenAI's API, running models like Llama 3, and creating personalized chat systems for personal or business use.

By the end, you'll be able to build, customize, and optimize local LLMs for tasks such as chat interactions, document retrieval, and AI assistants, all while maintaining control over your data.

Lesson 1

Setting Up LM Studio with Llama 3 for Offline Document-Based AI Chat

In this video, you'll learn how to set up LM Studio, a user-friendly platform for running large language models (LLMs) entirely offline. LM Studio supports downloading models from Hugging Face and includes an OpenAI-compatible local server, but it lacks native support for chatting with documents. To fill that gap, the video demonstrates how to pair LM Studio with AnythingLLM to enable document-based interactions.

You'll explore:

  • Installing LM Studio and setting it up with the Llama 3 model.
  • Using AnythingLLM to run local document-based chats.
  • Uploading documents, embedding them, and querying specific information with citations.

By the end, you'll have a system for interacting with LLMs using your own documents, creating a powerful offline solution.

Full Video & Source Code
 

Lesson 2

Exploring Jan: The Open-Source Alternative to LM Studio for Local LLMs

In this video, we explore Jan, an open-source alternative to LM Studio for running large language models locally. While LM Studio is a popular tool, its proprietary license makes it less ideal for business use. Jan offers a clean interface, active community support, and the ability to switch seamlessly between models, from remote ones like GPT-4 to local ones like Llama 3.

You'll learn how to:

  • Set up Jan by cloning the GitHub repository
  • Switch between models like GPT-4 and Llama 3
  • Enable experimental features such as document retrieval
  • Compare Jan's performance and UI with LM Studio

By the end, you'll have a better understanding of Jan's capabilities as a local LLM solution.

Full Video & Source Code
 

Lesson 3

Exploring GPT4All: A Seamless Local LLM App for Tasks and Document Retrieval

In this video, we explore GPT4All, an all-in-one application designed to run local language models for everyday tasks and retrieval-augmented generation (RAG). Unlike Jan, which struggled with RAG in testing, GPT4All integrates smoothly with local documents and cites references for its responses. The setup process is simple and intuitive, with a user-friendly interface.

You'll learn how to:

  • Install GPT4All and download models like Llama 3
  • Test the model by generating responses and completing tasks
  • Upload and query local documents, retrieving accurate, reference-backed answers
  • Compare GPT4All's performance with other local LLM tools

By the end, you'll see how GPT4All efficiently handles both basic queries and document-based retrieval.
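The lesson focuses on the desktop app, but GPT4All also ships a Python binding if you'd rather script it. Here's a minimal sketch, assuming the package is installed (pip install gpt4all); the model file name is illustrative, and GPT4All downloads it automatically on first use:

```python
# Minimal sketch: driving GPT4All from Python instead of the desktop app.
# Requires: pip install gpt4all. The model file name below is illustrative;
# any chat model from the GPT4All catalog works and is fetched on first use.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

# chat_session() keeps conversation history, so follow-up prompts have context.
with model.chat_session():
    print(model.generate("Summarize retrieval-augmented generation in two sentences.",
                         max_tokens=200))
```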

Full Video & Source Code
 

Lesson 4

Using Ollama to Run Local LLMs and Replace the OpenAI API

In this tutorial, we explore how to use CLI tools like Ollama to run open-source models locally and integrate them with custom applications. Ollama provides OpenAI-compatible endpoints, allowing seamless integration with front-end interfaces and Python scripts. While commonly used for chatbots, these tools can also power more complex systems, such as agents built with frameworks like CrewAI and Microsoft AutoGen.

You'll learn how to:

  • Install and set up Ollama on macOS, Linux, or Windows
  • Run the Llama 3 model from the command line
  • Replace OpenAI's API with local models in Python scripts
  • Use a virtual environment to keep dependencies clean and separate

By the end, you'll have a locally hosted, privacy-preserving alternative to GPT that can be used across various projects.
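Because Ollama's endpoints are OpenAI-compatible, swapping a script from the OpenAI API to a local model mostly comes down to changing the client's base URL. Here's a minimal sketch, assuming Ollama is running on its default port (11434) and the model has been pulled with ollama pull llama3:

```python
# Minimal sketch: pointing the official OpenAI Python client at a local
# Ollama server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # the client requires a key, but Ollama ignores it
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response.choices[0].message.content)
```

The only OpenAI-specific parts of a typical script are the base URL, the key, and the model name, which is what makes the swap so painless.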

Full Video & Source Code
 

Lesson 5

Setting Up LM Studio as a Local API Server for Private LLM Use

In this video, we demonstrate how to use LM Studio to run a local language model server that mimics the OpenAI API. Running models like Llama 3 directly on your machine keeps your data private and removes external dependencies. LM Studio provides an intuitive interface for starting the server and includes sample code for easy integration with Python.

You'll learn how to:

  • Set up a local inference server using LM Studio
  • Install and configure the OpenAI Python package for local model usage
  • Run chat completions with Llama 3 on your local machine
  • Customize responses and extract specific message content from the API response

By the end, you'll have a fully operational local server with LM Studio, allowing you to interact with LLMs privately and efficiently.
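The pattern mirrors the Ollama example from the previous lesson, with the client pointed at LM Studio's default port (1234). Here's a minimal sketch, assuming the server has been started from LM Studio's interface with a Llama 3 model loaded; the model identifier is illustrative, since LM Studio serves whichever model you load:

```python
# Minimal sketch against LM Studio's local inference server (default port 1234).
# The api_key is a placeholder: the client requires one, but the server ignores it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3",  # illustrative; LM Studio uses whichever model is loaded
    messages=[
        {"role": "system", "content": "Answer in one short paragraph."},
        {"role": "user", "content": "Why run an LLM locally?"},
    ],
    temperature=0.7,  # tweak to customize responses
)

# Extract just the assistant's message text from the full response object.
print(response.choices[0].message.content)
```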

Full Video & Source Code
 

Lesson 6

Getting Started with Open WebUI

In this video, you'll learn how to install and set up Open WebUI, a powerful alternative to the ChatGPT interface that supports both remote and local models. You'll explore features like running models in parallel for comparison, creating prompt templates, and chatting with your own documents.

You'll learn how to:

  • Install Open WebUI and set up the interface without Docker
  • Connect and run local models using the Ollama API
  • Compare results from multiple models in parallel
  • Create and save prompt templates for repeated use
  • Chat with your own documents by uploading them and querying based on specific content

By the end, you'll have a fully functional Open WebUI setup, offering advanced features like document chat and model comparison.

Full Video & Source Code
 

Lesson 7

Building a Dark Persona AI with Local LLMs and TTS

In this video, you'll learn how to create an AI character with a dark persona using local large language models (LLMs) and text-to-speech (TTS). By pairing a TTS engine like Coqui TTS with a local LLM server provided by LM Studio, you can build a conversational AI that responds with wit and a sharp edge. The character even speaks its responses through the TTS tool, making for an immersive AI experience.

You'll learn how to:

  • Set up a virtual environment and install Coqui TTS for speech generation
  • Build a conversational AI that adopts a dark persona with a cold, witty tone
  • Integrate LM Studio’s local LLM server to power AI responses
  • Use Python and Pygame to automate playing the AI’s spoken replies
  • Iterate through multiple conversational interactions to simulate real-time dialogue

By the end, you'll have a fully functional AI character capable of engaging in sharp, sarcastic conversations with realistic speech output.
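Here's a minimal sketch of how the pieces fit together, assuming LM Studio's server is running on its default port and Coqui TTS is installed (pip install TTS); the system prompt, voice model, and model name are all illustrative:

```python
# Minimal sketch: local LLM (via LM Studio's server) + Coqui TTS + pygame playback.
import time

import pygame
from openai import OpenAI
from TTS.api import TTS

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")  # illustrative voice
pygame.mixer.init()

# The persona lives in the system prompt; the wording here is illustrative.
history = [{
    "role": "system",
    "content": "You are a cold, sharp-witted AI. Keep replies short and biting.",
}]

def speak(text: str) -> None:
    """Synthesize the reply to a WAV file and play it with pygame."""
    tts.tts_to_file(text=text, file_path="reply.wav")
    pygame.mixer.music.load("reply.wav")
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)

# Loop over turns so the conversation keeps its history between replies.
while True:
    history.append({"role": "user", "content": input("You: ")})
    reply = client.chat.completions.create(
        model="local-model",  # LM Studio serves whichever model is loaded
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print("AI:", reply)
    speak(reply)
```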

Full Video & Source Code