Building a Source-Backed Nutrition Chatbot: RAG with FastAPI & Pinecone

Discover how to build a reliable AI nutrition assistant. This guide covers embedding scientific data, setting up Pinecone vector search, and deploying a FastAPI backend for accurate, source-cited answers.

2025-12-12
8 min read

"Is red meat bad for you?" "What's the best source of vegan protein?" In the age of information overload, getting a straight, reliable answer to nutrition questions is surprisingly difficult. You're often met with conflicting blog posts, out-of-date advice, or algorithm-driven content that lacks scientific backing. For developers, this chaos presents a fascinating technical challenge: can we build a tool that provides accurate, evidence-based nutritional guidance?

That's exactly what we're going to do in this project showcase. We will build a "NutriRAG" chatbot, a smart AI assistant that answers nutritional questions based only on a trusted knowledge base of scientific articles.

We'll be using a powerful, modern AI stack:

  • FastAPI for our high-performance, easy-to-use API backend.
  • Pinecone as our managed vector database for lightning-fast similarity searches.
  • Sentence-Transformers to convert our scientific texts into meaningful vector embeddings.
  • A Large Language Model (LLM) like one from OpenAI or a local model to generate human-like answers.

This article will guide you through the entire process, from preparing the data to deploying a live API endpoint. You'll learn the core concepts behind Retrieval-Augmented Generation (RAG) and gain the practical skills to build your own specialized chatbots.

Understanding the Problem

Standard LLMs are trained on vast amounts of internet data. While incredibly knowledgeable, they have two key limitations for our use case:

  1. Hallucination: They can invent facts or present outdated information with high confidence, which is dangerous for health-related topics.
  2. Lack of Specificity: Their knowledge is general. They haven't been specifically trained on a curated set of recent, peer-reviewed nutritional science papers.

A RAG architecture solves this. Instead of just asking an LLM a question, we first retrieve relevant information from our trusted knowledge base and then pass that information to the LLM as context. This forces the model to base its answer on our provided sources, dramatically increasing accuracy and allowing us to cite the exact documents used to generate the response.
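Conceptually, the flow is just "retrieve, then generate." Here is a minimal, framework-free sketch of the idea; the retrieve and generate functions are stand-in stubs for the Pinecone search and LLM call we build later in this article:

```python
from typing import Dict, List

def retrieve(question: str, top_k: int = 3) -> List[Dict]:
    """Stub for the Pinecone similarity search we set up in Step 1."""
    return [{"text": "Example chunk about omega-3s.", "source": "pubmed_abstract_1.txt"}]

def generate(prompt: str) -> str:
    """Stub for the LLM call we wire up in Step 2."""
    return "An answer grounded in the retrieved chunks."

def answer_with_rag(question: str) -> str:
    # 1. Retrieve the most relevant chunks from the trusted knowledge base
    chunks = retrieve(question)
    # 2. Augment the prompt with those chunks as context
    context = "\n\n".join(chunk["text"] for chunk in chunks)
    prompt = (
        "Answer the question using ONLY the context below and cite your sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # 3. Generate an answer constrained to the provided sources
    return generate(prompt)
```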

This approach is verifiable (every answer can point back to its source documents), specialized (we control exactly what goes into the knowledge base), and therefore far more trustworthy for a health-related domain.

Prerequisites

To follow along, you'll need:

  • Python 3.8+: Ensure you have a recent version of Python installed.
  • Pinecone Account: Sign up for a free tier account at Pinecone.io to get your API key.
  • LLM API Key: An API key from a provider like OpenAI, or a locally running LLM that you can access.
  • Familiarity with Python and APIs: Basic knowledge of Python programming and REST API concepts will be helpful.
  • Curated Documents: A collection of scientific articles or trusted texts about nutrition (e.g., in .txt or .pdf format). For this project, you can start by saving abstracts from sources like PubMed.

First, let's set up our project environment. Create a new project directory and a virtual environment:

```bash
mkdir nutritrag-project
cd nutritrag-project
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Next, create a requirements.txt file with the necessary libraries.

```text
# requirements.txt
fastapi
uvicorn[standard]
pinecone-client
sentence-transformers
python-dotenv
langchain
langchain-community
langchain-pinecone
langchain-openai
```

Install them using pip:

```bash
pip install -r requirements.txt
```

Finally, create a .env file to securely store your API keys.

```text
# .env
PINECONE_API_KEY="YOUR_PINECONE_API_KEY"
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
```

Step 1: Data Preparation and Embedding

What we're doing

The first step in any RAG system is to prepare the knowledge base. This involves loading our nutritional articles, splitting them into manageable chunks, and converting those chunks into vector embeddings. Embeddings are numerical representations of text that capture its semantic meaning, allowing us to find similar pieces of text by comparing their vectors.
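To get an intuition for what "semantic meaning" looks like in practice, here is a quick, standalone experiment with the same embedding model we use below (the sentences are made-up examples):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Omega-3 fatty acids may support heart health.",
    "Fish oil is a rich source of omega-3s.",
    "Resistance training builds muscle mass.",
]
embeddings = model.encode(sentences)  # each sentence becomes a 384-dimensional vector

# Related sentences score much higher than unrelated ones
print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # noticeably lower
```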

Implementation

For this step, we'll create a script called ingest.py. Let's assume you have your scientific articles in a directory named data/.

```python
# ingest.py
import os
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from pinecone import Pinecone, ServerlessSpec

# Load environment variables
load_dotenv()

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = "nutritrag"

def ingest_data():
    """Load, chunk, embed, and store documents in Pinecone."""
    # 1. Load documents from the data directory
    # TextLoader keeps plain-text ingestion dependency-free (no extra parsers needed)
    loader = DirectoryLoader('data/', glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
    documents = loader.load()
    print(f"Loaded {len(documents)} documents.")

    # 2. Chunk documents into smaller pieces
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(documents)
    print(f"Split into {len(docs)} chunks.")

    # 3. Create embeddings
    embeddings_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    print("Embedding model loaded.")

    # 4. Initialize Pinecone and create index
    pc = Pinecone(api_key=PINECONE_API_KEY)

    if PINECONE_INDEX_NAME not in pc.list_indexes().names():
        pc.create_index(
            name=PINECONE_INDEX_NAME,
            dimension=384,  # Dimension for all-MiniLM-L6-v2
            metric='cosine',
            spec=ServerlessSpec(
                cloud='aws',
                region='us-east-1'
            )
        )
        print(f"Pinecone index '{PINECONE_INDEX_NAME}' created.")
    
    index = pc.Index(PINECONE_INDEX_NAME)

    # 5. Upsert embeddings to Pinecone in batches
    batch_size = 100
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i+batch_size]
        ids = [f"doc_{i+j}" for j in range(len(batch))]
        texts = [doc.page_content for doc in batch]
        embeds = embeddings_model.embed_documents(texts)
        metadata = [doc.metadata for doc in batch]
        
        # Prepare vectors for upsert, storing the original chunk text in metadata
        vectors_to_upsert = [
            {"id": doc_id, "values": embed, "metadata": {"text": text, **meta}}
            for doc_id, text, embed, meta in zip(ids, texts, embeds, metadata)
        ]
        
        index.upsert(vectors=vectors_to_upsert)
        print(f"Upserted batch {i//batch_size + 1}")

    print("Data ingestion complete.")

if __name__ == "__main__":
    ingest_data()
```

How it works

  1. Loading: We use DirectoryLoader from LangChain (with TextLoader for plain-text files) to load all .txt files from our data folder.
  2. Chunking: Large documents are broken down into smaller, semantically coherent chunks using RecursiveCharacterTextSplitter. This ensures that our embeddings are focused and relevant.
  3. Embedding: We use the all-MiniLM-L6-v2 model from Sentence-Transformers, a popular choice for its balance of performance and size. It converts each text chunk into a 384-dimensional vector.
  4. Storing: We initialize the Pinecone client, create a new index if it doesn't exist, and then upsert (upload/update) our chunks' embeddings and metadata. Storing the original text in the metadata is crucial for the generation step.

To run this, simply execute the script from your terminal:

```bash
python ingest.py
```
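Before moving on, it's worth sanity-checking the index with an ad-hoc query. Here is an optional sketch that reuses the same client and embedding model as ingest.py; the question text is only an example:

```python
# verify_ingest.py (optional sanity check)
import os
from dotenv import load_dotenv
from langchain_community.embeddings import HuggingFaceEmbeddings
from pinecone import Pinecone

load_dotenv()

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("nutritrag")
print(index.describe_index_stats())  # should report the vectors you just upserted

embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
query_vector = embeddings_model.embed_query("What are good dietary sources of omega-3?")

results = index.query(vector=query_vector, top_k=3, include_metadata=True)
for match in results.matches:
    print(round(match.score, 3), match.metadata["text"][:80])
```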

Step 2: Building the FastAPI Backend

What we're doing

Now that our knowledge is in Pinecone, we need an API to query it. We'll create a FastAPI application with a single endpoint, /ask, that accepts a user's question, retrieves relevant context from Pinecone, and uses an LLM to generate an answer.

Implementation

Create a file named main.py.

```python
# main.py
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from dotenv import load_dotenv
from pinecone import Pinecone
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain_pinecone import PineconeVectorStore

# Load environment variables
load_dotenv()

# Initialize API keys and settings
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_INDEX_NAME = "nutritrag"

# Initialize FastAPI app
app = FastAPI(
    title="NutriRAG API",
    description="An API for asking nutritional questions backed by scientific articles.",
    version="1.0.0",
)

# Pydantic model for request body
class QueryRequest(BaseModel):
    question: str

# Initialize Embeddings and LLM
embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
llm = ChatOpenAI(api_key=OPENAI_API_KEY, model_name='gpt-3.5-turbo', temperature=0.0)

# Initialize Pinecone Vector Store
vectorstore = PineconeVectorStore.from_existing_index(
    index_name=PINECONE_INDEX_NAME, 
    embedding=embeddings_model
)

# Create the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

@app.post("/ask")
async def ask_question(request: QueryRequest):
    """
    Receives a nutritional question, retrieves relevant context from Pinecone,
    and generates an answer using an LLM.
    """
    if not request.question:
        raise HTTPException(status_code=400, detail="Question cannot be empty.")
    
    try:
        response = qa_chain.invoke({"query": request.question})
        
        answer = response.get("result", "No answer found.")
        source_documents = response.get("source_documents", [])
        
        sources = []
        if source_documents:
            sources = [
                {"source": doc.metadata.get("source", "Unknown"), "text": doc.page_content}
                for doc in source_documents
            ]

        return {"answer": answer, "sources": sources}

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/")
def read_root():
    return {"message": "Welcome to the NutriRAG API. Use the /docs endpoint to test."}

```

How it works

  1. FastAPI Setup: We create a standard FastAPI application.
  2. Pydantic Model: QueryRequest defines the expected JSON structure for incoming POST requests, ensuring data validation.
  3. Initialization: We load our embedding model and LLM. We also initialize the PineconeVectorStore from LangChain, which acts as a bridge to our Pinecone index.
  4. RetrievalQA Chain: This is the core of our RAG logic. LangChain's RetrievalQA chain automates the process:
    • It takes the user's query.
    • Uses the retriever (our Pinecone vector store) to find the most relevant documents.
    • "Stuffs" these documents into the LLM's context along with the query.
    • Generates the final answer.
  5. Endpoint Logic: The /ask endpoint invokes the qa_chain, formats the response to include the answer and the source documents, and returns it as JSON.
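One knob worth knowing about: by default the retriever returns a small, fixed number of chunks. If you want tighter or broader context for the LLM, you can tune this when creating the retriever, for example:

```python
# Retrieve the top 3 chunks per question instead of the default
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)
```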

Putting It All Together

To run your API server, use uvicorn:

```bash
uvicorn main:app --reload
```

Uvicorn will start a local server, usually on http://127.0.0.1:8000.

Testing the API

FastAPI provides automatic interactive documentation. Open your browser and navigate to http://127.0.0.1:8000/docs. You'll see a Swagger UI where you can test the /ask endpoint directly.

  1. Click on the /ask endpoint to expand it.
  2. Click "Try it out".
  3. In the request body, enter a question related to your documents, like:

```json
{
  "question": "What is the impact of omega-3 fatty acids on cognitive function?"
}
```

  4. Click "Execute".

You should receive a JSON response containing the LLM-generated answer and a list of the source text chunks that were used to create it.
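You can also call the endpoint from a script instead of the Swagger UI. Here is a minimal example using only the Python standard library (the question is just a sample):

```python
import json
import urllib.request

payload = json.dumps(
    {"question": "What is the impact of omega-3 fatty acids on cognitive function?"}
).encode("utf-8")

request = urllib.request.Request(
    "http://127.0.0.1:8000/ask",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    data = json.loads(response.read())

print(data["answer"])
for source in data["sources"]:
    print("-", source["source"])
```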

Performance Considerations

  • Embedding Model: The choice of embedding model is a trade-off between speed, cost, and accuracy. all-MiniLM-L6-v2 is great for getting started, but larger models may provide better retrieval quality. If you switch models, remember that your Pinecone index dimension must match the new model's output size.
  • Batching: When ingesting large amounts of data, always process and upsert in batches to avoid overwhelming your connection or hitting API rate limits. Our ingest.py script does this.
  • LLM Latency: The generation step is often the slowest. Consider using smaller, faster LLMs or streaming the response for a better user experience.

Security Best Practices

  • Environment Variables: Never hardcode API keys in your source code. Use a .env file for local development and environment variables in production.
  • Input Validation: FastAPI's Pydantic integration enforces the request schema, rejecting malformed payloads before they reach your application logic.
  • Authentication: For a production application, protect your API endpoint with an authentication mechanism like OAuth2 or API keys to control access.
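As a rough sketch of that last point, a simple header-based API key check in FastAPI could look like the following; the header name, environment variable, and module layout are placeholders rather than part of the project above:

```python
# auth.py (illustrative sketch, not wired into main.py)
import os

from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: str = Depends(api_key_header)) -> None:
    """Reject requests whose X-API-Key header doesn't match the configured key."""
    if not api_key or api_key != os.getenv("NUTRIRAG_API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid or missing API key.")

# In main.py, the endpoint could then be protected with:
# @app.post("/ask", dependencies=[Depends(require_api_key)])
```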

Conclusion

You have successfully built a complete Retrieval-Augmented Generation system! This project demonstrates how to create a specialized, trustworthy chatbot that grounds its answers in a curated knowledge base. By combining the strengths of FastAPI's performance, Pinecone's efficient vector search, and the generative power of LLMs, you've created a powerful tool for combating misinformation in a critical domain like nutrition.

From here, you can expand the project by:

  • Building a frontend interface with a framework like React or Streamlit.
  • Experimenting with different chunking strategies and embedding models.
  • Adding conversation history to create a more stateful chatbot experience.
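For the last idea, one naive but workable approach is to fold recent turns into the query string before it reaches the chain; LangChain offers dedicated conversational chains, but a minimal sketch might look like this:

```python
# Naive conversation memory: prepend recent turns to the query
history = []  # list of (question, answer) pairs

def ask_with_history(question: str) -> str:
    context_lines = [f"Q: {q}\nA: {a}" for q, a in history[-3:]]  # keep the last 3 turns
    contextual_query = "\n".join(context_lines + [f"Q: {question}"])

    response = qa_chain.invoke({"query": contextual_query})
    answer = response["result"]

    history.append((question, answer))
    return answer
```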
