Leveraging Pinecone Vector Database with Groq and Whisper for Advanced Q&A Systems

groq pinecone Jul 23, 2024
 

 

Today, we're diving deep into building a cutting-edge Q&A system that combines the power of Groq's API, OpenAI's Whisper model, and the game-changing Pinecone vector database.

If you've been wondering how to supercharge your AI applications with efficient data storage and retrieval, you're in for a treat! 🚀


The Power Trio: Groq, Whisper, and Pinecone

Before we jump into the code, let's break down our star players:

  1. Groq API: Provides lightning-fast inference capabilities.
  2. Whisper (Large V3): OpenAI's state-of-the-art speech recognition model.
  3. Pinecone Vector Database: The secret sauce for efficient storage and retrieval of high-dimensional data.


What is Pinecone, and Why Should You Care?

Pinecone is not just another database - it's a vector database specifically designed for machine learning applications. Here's why it's a game-changer:

  • Efficient Similarity Search: Pinecone makes finding the most relevant data based on similarity incredibly fast (the toy sketch after this list shows the idea behind a similarity score).
  • Scalability: It can handle billions of vectors, perfect for growing AI applications.
  • Real-time Updates: Add, update, or delete vectors on the fly without rebuilding the index.
  • Cloud-native: Designed to work seamlessly in cloud environments.
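What does "similarity search" actually mean? Under the hood, every piece of text becomes a vector of numbers, and "similar" means two vectors point in roughly the same direction. Here's a toy sketch of that idea in plain NumPy - this isn't Pinecone itself, and the 4-dimensional vectors are made up purely for illustration (real embeddings have hundreds of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means "points the same way", close to 0.0 means unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-dimensional "embeddings"
query = np.array([0.9, 0.1, 0.0, 0.2])
doc_a = np.array([0.8, 0.2, 0.1, 0.3])   # similar to the query
doc_b = np.array([0.0, 0.9, 0.8, 0.1])   # unrelated to the query

print(cosine_similarity(query, doc_a))  # higher score -> more relevant
print(cosine_similarity(query, doc_b))  # lower score -> less relevant

Pinecone's job is to run exactly this kind of comparison across millions or billions of stored vectors, and to do it fast.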


Building Our Q&A System: A Step-by-Step Guide


Step 1: Setting Up the Environment

First things first, let's get our environment ready:

pip install groq langchain pinecone-client langchain-community langchain-pinecone sentence-transformers

Let's break down what each library does:

  • groq: This is the official Groq API client, allowing us to interact with Groq's powerful language models.
  • langchain: A framework for developing applications powered by language models, providing tools for chaining together different AI components.
  • pinecone-client: The official Python client for Pinecone, enabling us to interact with the Pinecone vector database.
  • langchain-community: Additional community-contributed components for LangChain, expanding its capabilities.
  • langchain-pinecone: The LangChain integration for Pinecone, which provides the PineconeVectorStore class we use below.
  • sentence-transformers: The library behind our embedding model, used to turn text into the 384-dimensional vectors we'll store.
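One thing the install command doesn't cover: the snippets in the next steps read credentials from environment variables. Here's a minimal setup sketch, assuming the variable names GROQ_API_KEY and PINECONE_API_KEY (what the Groq client and LangChain's Pinecone integration look for by default). For anything beyond a quick experiment, export these from your shell or a .env file instead of hard-coding them:

import os

# Quick-experiment setup only - prefer exporting these from your shell or a .env file
os.environ["GROQ_API_KEY"] = "your-groq-api-key"          # from the Groq console
os.environ["PINECONE_API_KEY"] = "your-pinecone-api-key"  # from the Pinecone dashboard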

 

Step 2: Transcription Magic with Whisper

Let's start by transcribing audio to text using Groq's Whisper Large V3 model:

from groq import Groq
import os

client = Groq(api_key=os.environ['GROQ_API_KEY'])

def audio_to_text(filepath):
    with open(filepath, "rb") as file:
        translation = client.audio.translations.create(
            file=(filepath, file.read()),
            model="whisper-large-v3",
        )
    return translation.text

translation_text = audio_to_text('ghandi.mp3')

Let's break this down:

  • We import Groq from the groq library to interact with Groq's API.
  • We create a Groq client using an API key stored in an environment variable for security.
  • The audio_to_text function takes a filepath as input:
    • It opens the file in binary mode ("rb")
    • It uses client.audio.translations.create(), which transcribes the audio and returns the text in English (see the aside after this list if you want the transcript kept in the original language):
      • file: A tuple containing the filepath and the file's binary content
      • model: Specifies the Whisper model version to use (in this case, "whisper-large-v3")
  • The function returns the transcribed text
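One small aside: the translations endpoint always returns English text, which is exactly what we want for an English-language speech. If your audio is in another language and you want the transcript in that language, Groq also exposes a transcriptions endpoint - a hedged sketch, assuming it mirrors the call above:

def audio_to_text_original_language(filepath):
    # Sketch: transcription keeps the original language instead of translating to English
    with open(filepath, "rb") as file:
        transcription = client.audio.transcriptions.create(
            file=(filepath, file.read()),
            model="whisper-large-v3",
        )
    return transcription.text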

 

Step 3: Embedding and Storing with Pinecone

Before we can start embedding and storing our transcripts, we need to create an index in Pinecone. This is a crucial step that sets up the structure for our vector database.

  1. Log into your Pinecone account and navigate to the dashboard.
  2. Click on "Create Index" or a similar option to start the index creation process.
  3. Give your index a name. In our case, we'll use "transcripts".
  4. Set the dimension to 384. This corresponds to the output dimension of our chosen embedding model ("all-MiniLM-L6-v2").
  5. Choose "cosine" as the metric. This is commonly used for text similarity searches.
  6. For the Capacity mode, select "Serverless" if you're just starting out. This option charges based on usage and is suitable for most projects.
  7. Review your settings and click "Create Index". (Prefer to do this in code? See the sketch right after this list.)
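If you'd rather skip the dashboard, the same index can be created from code with the Pinecone client. A minimal sketch - the serverless cloud and region values here are assumptions, so pick whatever your account actually uses:

import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Only create the index if it doesn't exist yet
if "transcripts" not in pc.list_indexes().names():
    pc.create_index(
        name="transcripts",
        dimension=384,   # matches the output size of all-MiniLM-L6-v2
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed cloud/region
    )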


Now that we have our index set up, we can proceed with embedding and storing our documents. We'll create embeddings of our transcribed text and store them efficiently:

from groq import Groq
from langchain.docstore.document import Document
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
import os

client = Groq(api_key=os.environ['GROQ_API_KEY'])

def audio_to_text(filepath):
    with open(filepath, "rb") as file:
        translation = client.audio.translations.create(
            file=(filepath, file.read()),
            model="whisper-large-v3",
        )
    return translation.text

translation_text = audio_to_text('ghandi.mp3')

documents = []

documents.append(Document(page_content=f"Title: Ghandi Speech \n\n {translation_text}"))
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
PineconeVectorStore.from_documents(documents, embedding_function, index_name="transcripts")

Here's what's happening:

  • We import the necessary classes from LangChain and Pinecone.
  • We create a Document object with our transcribed text.
  • We set up an embedding function using SentenceTransformerEmbeddings:
    • model_name="all-MiniLM-L6-v2": This specifies the pre-trained model to use for creating embeddings.
  • LangChain's Pinecone integration picks up our Pinecone API key from the PINECONE_API_KEY environment variable, so there's no explicit initialization in the code.
  • We rely on the "transcripts" index we created in the dashboard, whose dimension of 384 matches the output dimension of our chosen embedding model.
  • Finally, we use PineconeVectorStore.from_documents() to create embeddings of our documents and store them in Pinecone:
    • documents: Our list of Document objects
    • embedding_function: The function we defined to create embeddings
    • index_name: The name of our Pinecone index

 


Step 4: Retrieval and Question Answering

Now for the exciting part - retrieving relevant information and answering questions:

from groq import Groq
from langchain.docstore.document import Document
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
import os

client = Groq(api_key=os.environ['GROQ_API_KEY'])

def transcript_chat_completion(client, transcript, user_question):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": f"""
                  Use this transcript or transcripts to answer any user questions, citing specific quotes:
                  {transcript}
              """

            },
            {
                "role": "user",
                "content": user_question
            }
        ],
        model="llama3-8b-8192"
    )

    return chat_completion.choices[0].message.content

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = PineconeVectorStore(
    index_name='transcripts',
    embedding=embedding_function
)

user_question = "Why is groq so fast?"
docs = vectorstore.similarity_search(user_question)
# Pass the text of the best match, not the Document object itself
print(transcript_chat_completion(client, docs[0].page_content, user_question))

This final code snippet ties everything we've learned into a functional Q&A system:

  1. Setup: We import necessary libraries and initialize our Groq client and Pinecone vector store.
  2. Chat Completion Function: We define a function that uses Groq's API to generate answers based on a given transcript and user question.
  3. Embedding: We create an embedding function to convert text into vector representations.
  4. Question Processing:
    • We take a user question ("Why is groq so fast?")
    • Perform a similarity search in our Pinecone vector store to find relevant transcripts
    • Use the most relevant transcript to generate an answer using our chat completion function


Why This System Rocks

  1. Efficiency: Pinecone's vector database allows for lightning-fast retrieval of relevant information.
  2. Scalability: As your dataset grows, Pinecone can handle billions of vectors without breaking a sweat.
  3. Accuracy: By using RAG with Pinecone, we're ensuring that our AI has the most relevant context to answer questions.
  4. Flexibility: This system can be easily adapted for various use cases - from podcast analysis to customer support automation.


Wrapping Up

We've just scratched the surface of what's possible when you combine Groq's API, Whisper, and Pinecone's vector database. The possibilities are endless - from creating smart content recommendation systems to building advanced chatbots with deep knowledge bases.

Remember, the key to building powerful AI systems isn't just about having a great language model - it's also about how efficiently you can store and retrieve relevant information. That's where Pinecone truly shines!

Have you tried integrating Pinecone into your AI projects?
