intermediate · claude

Build a RAG pipeline with Pinecone and Claude

llm-rag

~60 min hands-on · 4 min read · June 16, 2026

Prerequisites

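  • Node.js 20+
  • Anthropic API key
  • OpenAI API key (used for embeddings only)
  • Pinecone account (free tier works)
  • A folder of Markdown or plain-text docs to index (PDFs need text extraction first, for example with pdf-parse)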

What you'll build

A small RAG (retrieval-augmented generation) service that:

  1. Chunks and embeds your documents into a Pinecone index
  2. Retrieves top-k chunks for a user question
  3. Sends grounded context to Claude with a strict "answer from context only" prompt
  4. Exposes a CLI or HTTP endpoint for queries

This pattern powers support bots, internal wikis, and sales enablement without fine-tuning a model.

Architecture overview

Documents → chunk (500 tokens) → embed → Pinecone upsert
User question → embed query → Pinecone query → top 5 chunks → Claude completion

Overlap adjacent chunks by ~50 tokens so that a sentence split at a chunk boundary still appears whole in at least one chunk. Store metadata (source, page, title) on each vector so the UI can show citations.
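
To make the overlap concrete, here is a toy sketch with a 5-word window and a 2-word overlap. `chunkWords` is a hypothetical helper, not part of the tutorial code, and it counts whitespace-separated words rather than real tokens.

```typescript
// Hypothetical helper: sliding-window chunker over whitespace-separated words.
// Sizes are in words here; real token counts depend on the tokenizer.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size')
  const words = text.split(/\s+/).filter(Boolean)
  const stride = size - overlap // each window starts `stride` words after the previous one
  const chunks: string[] = []
  for (let start = 0; start < words.length; start += stride) {
    chunks.push(words.slice(start, start + size).join(' '))
    if (start + size >= words.length) break // this window already reached the end
  }
  return chunks
}

const demo = chunkWords('a b c d e f g h', 5, 2)
console.log(demo) // [ 'a b c d e', 'd e f g h' ]
```

The words `d e` land in both chunks, which is the point of the overlap: text near a boundary is retrievable from either side.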

Before you start

Create accounts:

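  • Anthropic console: create an API key
  • Pinecone: create an index with dimension 1536, which matches the OpenAI text-embedding-3-small model used in this tutorial. If you swap in another embedder, use that model's output dimension (1024 for many open embedders).

Then set the environment variables the scripts read:
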
export ANTHROPIC_API_KEY=sk-ant-...
export PINECONE_API_KEY=...
export PINECONE_INDEX=rag-demo
export OPENAI_API_KEY=sk-...   # for embeddings only
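
A missing variable tends to surface later as a confusing authentication error, so it can help to check up front. The sketch below uses a hypothetical `missingEnv` helper and assumes the four variable names exported above.

```typescript
// Names taken from the export lines above.
const REQUIRED_VARS = ['ANTHROPIC_API_KEY', 'PINECONE_API_KEY', 'PINECONE_INDEX', 'OPENAI_API_KEY']

// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '')
}

// Usage at the top of a script (assumes a Node.js runtime):
//   const missing = missingEnv(process.env)
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`)
```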

Step 1 — Project setup

mkdir rag-pinecone-claude && cd rag-pinecone-claude
npm init -y
npm install @anthropic-ai/sdk @pinecone-database/pinecone openai pdf-parse
npm install -D tsx @types/node typescript

Step 2 — Chunk and embed documents

Create src/ingest.ts:

import fs from 'fs/promises'
import path from 'path'
import OpenAI from 'openai'
import { Pinecone } from '@pinecone-database/pinecone'

const openai = new OpenAI()
const pinecone = new Pinecone()
const index = pinecone.index(process.env.PINECONE_INDEX!)

// Sizes are counted in whitespace-separated words, a rough stand-in for tokens.
const CHUNK_SIZE = 500
const OVERLAP = 50

function chunkText(text: string): string[] {
  const words = text.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += CHUNK_SIZE - OVERLAP) {
    chunks.push(words.slice(i, i + CHUNK_SIZE).join(' '))
  }
  return chunks.filter((c) => c.length > 80)
}

async function embed(texts: string[]): Promise<number[][]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  })
  return res.data.map((d) => d.embedding)
}

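export async function ingestDir(dir: string) {
  // Only Markdown and plain-text files are read; extract text from PDFs before ingesting.
  const files = (await fs.readdir(dir)).filter((f) => f.endsWith('.md') || f.endsWith('.txt'))
  for (const file of files) {
    const raw = await fs.readFile(path.join(dir, file), 'utf-8')
    const chunks = chunkText(raw)
    // Skip files with no usable chunks rather than sending an empty batch.
    if (chunks.length === 0) continue
    const vectors = await embed(chunks)
    await index.upsert(
      vectors.map((values, i) => ({
        // A per-file chunk index keeps IDs stable, so re-ingesting a file overwrites its vectors.
        id: `${file}-${i}`,
        values,
        metadata: { source: file, text: chunks[i] },
      })),
    )
    console.log(`Indexed ${chunks.length} chunks from ${file}`)
  }
}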

Run: npx tsx -e "import { ingestDir } from './src/ingest.ts'; ingestDir('./docs')"
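
`ingestDir` sends all of a file's chunks in a single embedding request and a single upsert. Both services limit request size, so large files may need batching. A minimal sketch, where `toBatches` is a hypothetical helper and the batch size of 100 is an assumption to check against the current limits:

```typescript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Possible use inside ingestDir (batch size is an assumption, not a documented limit):
//   for (const batch of toBatches(records, 100)) await index.upsert(batch)

console.log(toBatches([1, 2, 3, 4, 5], 2)) // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

The same helper could wrap the `embed` call, so that chunk text and vectors stay aligned batch by batch.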

Step 3 — Query with retrieval + Claude

Create src/ask.ts:

import Anthropic from '@anthropic-ai/sdk'
import OpenAI from 'openai'
import { Pinecone } from '@pinecone-database/pinecone'

const anthropic = new Anthropic()
const openai = new OpenAI()
const index = new Pinecone().index(process.env.PINECONE_INDEX!)

async function retrieve(question: string, topK = 5) {
  const emb = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  })
  const res = await index.query({
    vector: emb.data[0].embedding,
    topK,
    includeMetadata: true,
  })
  return (res.matches ?? [])
    .map((m) => String(m.metadata?.text ?? ''))
    .filter(Boolean)
}

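export async function ask(question: string): Promise<string> {
  const chunks = await retrieve(question)
  // Nothing retrieved: return the fallback directly instead of calling Claude with empty context.
  if (chunks.length === 0) return "I don't have that in the indexed documents."
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join('\n\n')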

  const msg = await anthropic.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    system: `Answer using ONLY the numbered context below. If the answer is not in context, say "I don't have that in the indexed documents." Cite chunk numbers like [2] when relevant.`,
    messages: [
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${question}`,
      },
    ],
  })

  const block = msg.content[0]
  return block.type === 'text' ? block.text : ''
}

Test: npx tsx -e "import { ask } from './src/ask.ts'; ask('What is our refund policy?').then(console.log)"
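
The ingest step stores `source` in each vector's metadata, but `ask` passes only the chunk text to Claude. If you want citations that point at files, one approach is to carry the source alongside the text. In this sketch `RetrievedChunk`, `formatContext`, and `citedSources` are hypothetical names, the sample data is made up, and `retrieve` would need to return `{ text, source }` objects for it to plug in.

```typescript
interface RetrievedChunk {
  text: string
  source: string
}

// Number each chunk and label it with its source file.
function formatContext(chunks: RetrievedChunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] (source: ${c.source})\n${c.text}`).join('\n\n')
}

// Map citation markers such as [2] in an answer back to distinct source files.
function citedSources(answer: string, chunks: RetrievedChunk[]): string[] {
  const sources: string[] = []
  const marker = /\[(\d+)\]/g
  let match: RegExpExecArray | null
  while ((match = marker.exec(answer)) !== null) {
    const chunk = chunks[Number(match[1]) - 1] // markers are 1-based
    if (chunk && sources.indexOf(chunk.source) === -1) sources.push(chunk.source)
  }
  return sources
}

// Made-up sample data for illustration only.
const sample: RetrievedChunk[] = [
  { text: 'Refunds are issued within 30 days.', source: 'refunds.md' },
  { text: 'Shipping takes 3 to 5 days.', source: 'shipping.md' },
]
console.log(citedSources('Refunds take up to 30 days [1].', sample)) // [ 'refunds.md' ]
```

Markers that point outside the retrieved set, such as `[9]` with only two chunks, are ignored rather than treated as errors.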

Step 4 — Evaluation before production

Build a CSV of 20 question/answer pairs from your docs. For each question:

  1. Run retrieval-only — do the right chunks appear in top 5?
  2. Run full RAG — is the answer faithful to context?
  3. Log failures and tune chunk size or metadata filters
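
The retrieval-only check in step 1 can be sketched as a hit-rate calculation. Everything here is an assumption for illustration: `EvalCase` stands in for one row of your CSV, and `Retriever` is a hypothetical function that returns the source filenames of the top-k matches (for example, a thin wrapper around the Pinecone query in src/ask.ts).

```typescript
interface EvalCase {
  question: string
  expectedSource: string // file that should contain the answer
}

// Hypothetical retriever signature: returns source filenames for the top-k matches.
type Retriever = (question: string, topK: number) => Promise<string[]>

// Fraction of questions whose expected source shows up in the top-k results.
async function hitRate(
  cases: EvalCase[],
  retrieve: Retriever,
  topK = 5,
): Promise<{ rate: number; misses: string[] }> {
  const misses: string[] = []
  for (const c of cases) {
    const sources = await retrieve(c.question, topK)
    if (sources.indexOf(c.expectedSource) === -1) misses.push(c.question)
  }
  const rate = cases.length === 0 ? 0 : (cases.length - misses.length) / cases.length
  return { rate, misses }
}
```

The `misses` list gives you the questions to inspect by hand before touching the prompt or the model.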

Poor RAG answers usually trace back to retrieval, not the model, so fix retrieval first. Compare Pinecone vs Weaviate if you need hybrid search later.

Production hardening

  • Namespaces per customer in Pinecone for multi-tenant SaaS
  • Rate limits on the ask endpoint
  • Citation UI — show source filenames from metadata
  • Re-ingest webhook when docs change in Notion or Google Drive
  • Cost caps on embedding batch jobs
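
For the rate-limit item, a token bucket is one common approach. The sketch below is an in-memory, single-process illustration with a hypothetical `TokenBucket` class; a multi-instance deployment would need shared state, which is out of scope here.

```typescript
// Hypothetical in-memory token bucket: allows bursts up to `capacity`,
// then refills at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number
  private last: number
  private capacity: number
  private refillPerSec: number
  private now: () => number

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.now = now // injectable clock (milliseconds), useful for tests
    this.tokens = capacity
    this.last = now()
  }

  // Returns true and consumes a token if the request is allowed.
  tryTake(): boolean {
    const t = this.now()
    const elapsedSec = (t - this.last) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec)
    this.last = t
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// Possible use in a request handler (one bucket per API key or customer):
//   if (!bucket.tryTake()) return respond(429, 'Too many requests')
```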

Common errors

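Empty retrieval: Check that PINECONE_INDEX names the index you ingested into and that ingestion actually completed. An embedding dimension that differs from the index config typically surfaces as an API error on upsert or query rather than as an empty result.

Hallucinations despite context: Strengthen the system prompt; reduce topK if irrelevant chunks confuse the model.

Slow queries: Cache embeddings for frequent questions; use Claude Haiku for draft answers.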
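
Caching query embeddings can be sketched as a wrapper around whatever embedding call you use. `withCache` and `EmbedFn` are hypothetical names, and the eviction policy (drop the oldest entry once `maxEntries` is reached) is an assumption chosen for simplicity.

```typescript
type EmbedFn = (text: string) => Promise<number[]>

// Hypothetical wrapper: memoizes embeddings by normalized question text.
function withCache(embed: EmbedFn, maxEntries = 1000): EmbedFn {
  const cache = new Map<string, number[]>()
  return async (text: string) => {
    const key = text.trim().toLowerCase()
    const hit = cache.get(key)
    if (hit !== undefined) return hit
    const vector = await embed(text)
    if (cache.size >= maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = cache.keys().next().value
      if (oldest !== undefined) cache.delete(oldest)
    }
    cache.set(key, vector)
    return vector
  }
}

// Possible use in src/ask.ts, wrapping the OpenAI call:
//   const embedQuestion = withCache(async (q) => {
//     const res = await openai.embeddings.create({ model: 'text-embedding-3-small', input: q })
//     return res.data[0].embedding
//   })
```

Normalizing the key means questions that differ only in case or surrounding whitespace share one cached vector, computed from whichever variant arrived first.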

Next steps

  • Add reranking with Cohere for better precision on long corpora
  • Wire the ask endpoint into your support widget
  • Read How to Choose an LLM API in 2026 for provider failover patterns
