🔰 Tutorial: Getting Started with LangChain + LLMs + Embeddings (OpenAI, Anthropic, Gemini, Hugging Face)
1. What You’ll Learn
By the end of this tutorial, you will know how to:
- Use LangChain to call:
  - OpenAI (GPT)
  - Anthropic (Claude)
  - Google Gemini
  - Hugging Face models (cloud + local/pipeline)
- Generate embeddings (numerical vectors) from text
- Use embeddings for a simple semantic search with cosine similarity
This tutorial assumes:
- You know basic Python (print, variables, functions)
- You have access to at least one API key (OpenAI, Anthropic, Google, or Hugging Face)
What is this section?
A quick overview of the goals and prerequisites of the tutorial.
When should you care?
Before you start coding—so you know whether this tutorial matches your skills and setup.
Why is it important?
It sets expectations and helps a beginner understand what they’ll be able to do after finishing.
2. Project Setup
2.1. Create a Project Folder
mkdir llm_langchain_demo
cd llm_langchain_demo
2.2. Create a Virtual Environment (Recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
2.3. Add requirements.txt
Create a file named requirements.txt and paste:
# LangChain Core
langchain
langchain-core
# OpenAI Integration
langchain-openai
openai
# Anthropic Integration
langchain-anthropic
# Google Gemini Integration
langchain-google-genai
google-generativeai
# Hugging Face Integration
langchain-huggingface
transformers
huggingface-hub
# Environment Variable Management
python-dotenv
# Machine Learning Utilities
numpy
scikit-learn
Install dependencies:
pip install -r requirements.txt
What is this section?
This is the environment setup: folder, virtualenv, and dependencies for all providers and tools.
When do you do this?
At the very beginning, before running any of the example code.
Why is it important?
- Keeps your project isolated from other Python projects.
- Ensures all required packages are available.
- Prevents version conflicts later.
3. Setting Up API Keys with .env
We will store API keys in a .env file so we don’t hard-code them in the script.
3.1. Create .env
In your project folder, create a file called .env:
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
GOOGLE_API_KEY=your_google_api_key_here
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here
You don’t need all of them to start.
If you only have OpenAI now, just set OPENAI_API_KEY.
3.2. Load .env in Python
In each script, we’ll do:
from dotenv import load_dotenv
load_dotenv()
This makes your keys available in the environment.
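To see what that means in practice, here is a minimal sketch of how code then reads a key via os.getenv. We inject a dummy value by hand so the snippet runs without a real key or a .env file:

```python
import os

# load_dotenv() reads KEY=value lines from .env and puts them into
# os.environ. Here we inject a dummy key by hand to simulate that.
os.environ["OPENAI_API_KEY"] = "sk-dummy-key-for-demo"

# Libraries like langchain-openai look the key up the same way:
key = os.getenv("OPENAI_API_KEY")
if key is None:
    raise RuntimeError("OPENAI_API_KEY is not set - check your .env file")

print("Key found, starts with:", key[:7])
```

A check like the `if key is None` guard gives you a clear error message up front instead of a confusing authentication failure later.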
What is this section?
It shows how to configure secrets (API keys) using a .env file and python-dotenv.
When should you use this pattern?
- Whenever you use API keys or other secrets (tokens, passwords).
- When you plan to share code (GitHub, teammates) without exposing keys.
Why is it important?
- Security: you don’t push keys to Git.
- Flexibility: you can easily switch keys or environments (dev/stage/prod).
- Clean code: no hard-coded secrets inside scripts.
4. File Structure
For learning, you can keep everything in one file named 1_llm_demo.py.
We will break it into sections so you can comment/uncomment and run part by part.
What is this section?
A suggestion on how to organize this tutorial’s code practically.
When to follow this?
When you’re just exploring and don’t want to manage too many files.
Why does it help?
- Easier for beginners: one file, multiple small sections.
- You can test each section independently by commenting out the others.
5. Using OpenAI with LangChain
5.1. OpenAI – Simple Text Completion (OpenAI)
This is like the classic “completion” API.
# --- Section 1: OpenAI Text Completion ---
from langchain_openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
llm = OpenAI(model='gpt-3.5-turbo-instruct')
result = llm.invoke("What is the capital of India?")
print("OpenAI Text Completion Output:")
print(result)
How It Works
- OpenAI is a LangChain wrapper for completion-style models.
- model='gpt-3.5-turbo-instruct' tells LangChain which model to use.
- llm.invoke(prompt) sends the prompt to the model and returns a string.
Example Output (will vary):
OpenAI Text Completion Output:
The capital of India is New Delhi.
What is this feature?
A simple one-shot text generation using an OpenAI model—classic “prompt → text” behavior.
When should you use it?
- When you need direct text completion, e.g.:
  - Short answers
  - Summaries
  - Simple transformations
Why is it useful?
- Easiest way to start with LLMs.
- Great for scripts and background tasks where you don’t need chat-style roles.
5.2. OpenAI – Chat Model (ChatOpenAI)
Now let’s use a chat-style model like GPT-4.
# --- Section 2: OpenAI Chat Model ---
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI(
    model='gpt-4',
    temperature=1.5,           # more creative
    max_completion_tokens=100  # cap response length
)
result = model.invoke("Write a 5 line poem on cricket")
print("\nOpenAI Chat (Poem) Output:")
print(result.content)
Important Parameters
- temperature
  - 0.0 → more deterministic
  - 1.5 → more random/creative
- max_completion_tokens
  - Limits response length.
Example Output (will vary):
OpenAI Chat (Poem) Output:
Willow sings under stadium light,
Leather kisses sky in flight,
Crowd erupts with pure delight,
Fields of green in painted white,
Cricket dreams soar through the night.
What is this feature?
Using chat-based models (like GPT-4) via LangChain’s ChatOpenAI.
When to use it?
- For conversational tasks:
  - Chatbots
  - Assistants
  - Multi-turn conversations
- When you may later pass message lists (system/human/AI).
Why is it important?
- Modern LLMs like GPT-4 are primarily chat models.
- Better control over style via roles (system/human).
- More aligned with real-world apps and tools.
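As a preview of those message lists: LangChain chat models also accept a list of (role, content) pairs instead of a single string. A minimal sketch of the structure (the system prompt here is just an example; no API is called):

```python
# A chat "prompt" is really a conversation: a list of (role, content) pairs.
messages = [
    ("system", "You are a helpful assistant that answers in one sentence."),
    ("human", "What is the capital of India?"),
]

# With a configured chat model you would then call:
#   result = model.invoke(messages)
#   print(result.content)

# Inspect the structure without calling any API:
for role, content in messages:
    print(f"{role}: {content}")
```

The system message sets the overall behavior, while human messages carry the user's actual questions.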
6. Using Anthropic Claude with LangChain
# --- Section 3: Anthropic Claude ---
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv
load_dotenv()
model = ChatAnthropic(model='claude-3-5-sonnet-20241022')
result = model.invoke("What is the capital of India?")
print("\nAnthropic Claude Output:")
print(result.content)
How It Works
- ChatAnthropic is LangChain’s wrapper for Anthropic’s Claude models.
- It uses ANTHROPIC_API_KEY from .env.
Example Output:
Anthropic Claude Output:
The capital of India is New Delhi.
What is this feature?
Integration of Anthropic Claude models via LangChain.
When to use it?
- When you want to:
  - Compare model outputs across providers.
  - Use Claude’s strengths (reasoning, safer outputs in some contexts).
Why is it important?
- Real projects rarely rely on just one provider.
- Having a common LangChain interface makes switching providers easy.
7. Using Google Gemini with LangChain
# --- Section 4: Google Gemini ---
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
load_dotenv()
model = ChatGoogleGenerativeAI(model='gemini-1.5-pro')
result = model.invoke("What is the capital of India?")
print("\nGoogle Gemini Output:")
print(result.content)
Notes
- Requires GOOGLE_API_KEY in .env.
- Works similarly to other chat models.
Example Output:
Google Gemini Output:
The capital of India is New Delhi.
What is this feature?
Access to Google Gemini via LangChain.
When to use it?
- If you’re already in the Google Cloud ecosystem.
- If you want multimodal or certain Gemini-specific capabilities.
Why is it important?
- Shows how the same LangChain pattern works across providers.
- Gives flexibility in choosing the best model for your use case.
8. Using Hugging Face Models with LangChain
We’ll see two ways:
- Using HuggingFaceEndpoint (hosted by Hugging Face)
- Using HuggingFacePipeline (runs via a local transformers pipeline)
8.1. Hugging Face Hosted Model (HuggingFaceEndpoint)
# --- Section 5: Hugging Face Endpoint ---
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv
load_dotenv()
llm = HuggingFaceEndpoint(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation"
)
model = ChatHuggingFace(llm=llm)
result = model.invoke("What is the capital of India?")
print("\nHugging Face Endpoint Output:")
print(result.content)
Notes
- Uses HUGGINGFACEHUB_API_TOKEN for authentication.
- repo_id is the model name on Hugging Face.
- Output might be less accurate than GPT/Gemini because this is a tiny model.
Example Output (might be approximate):
Hugging Face Endpoint Output:
The capital of India is New Delhi.
What is this feature?
Calling hosted models on Hugging Face via a remote endpoint.
When to use it?
- If you don’t want to run models locally.
- If you want to use open-source models with cloud hosting.
Why is it useful?
- No need for a GPU locally.
- Access to lots of OSS models (TinyLlama, Mistral, etc.) with the same LangChain interface.
8.2. Hugging Face Local/Pipeline (HuggingFacePipeline)
# --- Section 6: Hugging Face Pipeline (Local/Transformers) ---
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
import os
# Optional: choose where to cache models
os.environ['HF_HOME'] = 'D:/huggingface_cache' # change path for Linux/Mac if needed
llm = HuggingFacePipeline.from_model_id(
    model_id='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    task='text-generation',
    pipeline_kwargs=dict(
        temperature=0.5,
        max_new_tokens=100,
    )
)
model = ChatHuggingFace(llm=llm)
result = model.invoke("What is the capital of India?")
print("\nHugging Face Pipeline Output:")
print(result.content)
What’s Happening?
- HuggingFacePipeline downloads the model and uses transformers to run it locally.
What is this feature?
Running open-source models locally via transformers and LangChain.
When to use it?
- When you:
  - Want to avoid external APIs.
  - Have local resources (CPU/GPU).
  - Need full control over model behavior.
Why is it important?
- You are not locked into paid APIs.
- Good for privacy-sensitive or offline use cases.
9. Working with Embeddings
9.1. What Are Embeddings (In Simple Terms)?
- They convert text → a list of numbers (a vector).
- Texts with similar meaning → vectors that are close to each other.
- Used in:
  - Search
  - Recommendation
  - RAG (Retrieval-Augmented Generation)
  - Clustering, etc.
What is this concept?
Embeddings are the numeric representation of text that encode semantic meaning.
When do you use them?
- Whenever you want to compare the meanings of texts, not just exact words.
- For searches like: “find me text similar to this idea”.
Why are they crucial?
They are the foundation of:
- Semantic search
- Document retrieval in RAG
- Many “smart” features built on top of LLMs.
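To make “close vectors” concrete, here is a toy illustration with hand-made 3-dimensional vectors. Real embeddings have hundreds of dimensions; these numbers are invented purely for the example:

```python
import math

# Hypothetical "embeddings" - the two cat sentences get similar vectors,
# the finance sentence does not. Values are made up for illustration.
vec_cat1 = [0.9, 0.1, 0.0]  # "The cat sat on the mat"
vec_cat2 = [0.8, 0.2, 0.1]  # "A kitten rested on the rug"
vec_bank = [0.0, 0.1, 0.9]  # "Interest rates rose last quarter"

def cosine(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print("cat1 vs cat2:", round(cosine(vec_cat1, vec_cat2), 3))  # high: similar meaning
print("cat1 vs bank:", round(cosine(vec_cat1, vec_bank), 3))  # low: different meaning
```

A real embedding model does the hard part: mapping each sentence to a vector so that this kind of closeness actually tracks meaning.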
9.2. Single Text Embedding with OpenAI
# --- Section 7: OpenAI Embedding - Single Query ---
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
load_dotenv()
embedding = OpenAIEmbeddings(
    model='text-embedding-3-large',
    dimensions=32  # for demo, use a smaller vector
)
result = embedding.embed_query("Delhi is the capital of India")
print("\nOpenAI Single Query Embedding:")
print(result)
print("Length of vector:", len(result))
Example Output (shape only):
OpenAI Single Query Embedding:
[0.0123, -0.0045, 0.0234, ...]
Length of vector: 32
Actual numbers will differ and are long floating-point values.
What is this doing?
It converts one sentence into a numeric vector.
When to use it?
- When you want to:
  - Compare this sentence with other embeddings.
  - Use it as input to similarity search or clustering.
Why is it important?
This is the simplest building block for semantic operations on text.
9.3. Multiple Text Embeddings with OpenAI
# --- Section 8: OpenAI Embedding - Multiple Documents ---
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
load_dotenv()
embedding = OpenAIEmbeddings(
    model='text-embedding-3-large',
    dimensions=32
)
documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]
result = embedding.embed_documents(documents)
print("\nOpenAI Document Embeddings:")
print("Number of vectors:", len(result))
print("Length of each vector:", len(result[0]))
Example Output:
OpenAI Document Embeddings:
Number of vectors: 3
Length of each vector: 32
What is this doing?
Embedding a list of documents, one vector per document.
When should you do this?
- When building:
  - A simple document search system.
  - An index for a knowledge base or FAQ list.
Why is it useful?
You can store these vectors and later compare a query embedding to decide which document is most relevant.
9.4. Embeddings with Hugging Face (Sentence Transformers)
# --- Section 9: Hugging Face Embeddings ---
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2'
)
documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]
vectors = embedding.embed_documents(documents)
print("\nHugging Face Embeddings:")
print("Number of vectors:", len(vectors))
print("Length of each vector:", len(vectors[0]))
Example Output:
Hugging Face Embeddings:
Number of vectors: 3
Length of each vector: 384
384 is a common dimension for this model.
What is this feature?
Using an open-source embedding model (Sentence Transformers) via Hugging Face.
When to use it?
- When you:
  - Want a free / open-source alternative to OpenAI embeddings.
  - Prefer to run things locally or without paid APIs.
Why is it important?
- Reduces cost for large-scale usage.
- Open-source models can be fine-tuned or hosted on your own infrastructure.
10. Semantic Search Using Embeddings + Cosine Similarity
Now let’s connect the dots:
- We have a list of cricket player descriptions.
- User query: "tell me about bumrah".
- We find which description is most similar to the query using cosine similarity.
# --- Section 10: Semantic Search with OpenAI Embeddings ---
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
from sklearn.metrics.pairwise import cosine_similarity
load_dotenv()
embedding = OpenAIEmbeddings(
    model='text-embedding-3-large',
    dimensions=300  # larger dimension for better accuracy
)
documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]
query = "tell me about bumrah"
# 1. Embed all documents
doc_embeddings = embedding.embed_documents(documents)
# 2. Embed the query
query_embedding = embedding.embed_query(query)
# 3. Compute cosine similarity
scores = cosine_similarity([query_embedding], doc_embeddings)[0]
# 4. Find best match
index, score = max(enumerate(scores), key=lambda x: x[1])
print("\nSemantic Search Demo")
print("Query:", query)
print("Best Match:", documents[index])
print("Similarity Score:", score)
Explanation Step-by-Step
1. Convert each document to a vector → doc_embeddings.
2. Convert the query to a vector → query_embedding.
3. Use cosine_similarity to compare the query against each document.
4. Pick the document with the highest similarity score.
Example Output:
Semantic Search Demo
Query: tell me about bumrah
Best Match: Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers.
Similarity Score: 0.87
The score ranges roughly from -1 to 1, where:
- 1 → almost identical direction (very similar)
- 0 → unrelated
- Negative → opposite meaning (rare in practice here)
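You can check these extremes yourself with toy vectors; this pure-Python snippet is independent of any provider:

```python
import math

def cosine(a, b):
    # cosine(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]
print(cosine(v, v))                    # same direction  -> ~1.0
print(cosine([1.0, 0.0], [0.0, 1.0])) # orthogonal      -> 0.0
print(cosine(v, [-1.0, -2.0, -3.0]))  # opposite        -> ~-1.0
```

With real text embeddings, scores usually land between 0 and 1, which is why the best match above sits around 0.87.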
What is this doing?
A mini semantic search engine:
- Turning documents + query into vectors.
- Using similarity to find the most relevant document.
When to use this pattern?
- FAQ search
- Support answer suggestion
- Document retrieval before passing context to an LLM (RAG).
Why is it powerful?
This is the core idea of many modern AI systems:
- You don’t search by keywords; you search by meaning.
11. Extra Tips & Best Practices
- ✅ Start with one provider (OpenAI or Gemini), then add others.
- ✅ Use a low temperature (0.2–0.7) for factual tasks.
- ✅ Use a higher temperature (>1) for creative tasks like poems/stories.
- ✅ For embeddings, use OpenAI or Sentence Transformers (all-MiniLM-L6-v2) for most projects.
- ✅ For real apps, you will:
  - Store embeddings in a vector database (Chroma, Pinecone, Qdrant, etc.).
  - Use them in RAG (Retrieval-Augmented Generation) flows.
What is this section?
A set of high-level practical rules.
When to apply them?
- While designing or tuning real applications.
- When you move from “demo” to “production-ish” setups.
Why do they matter?
They help you:
- Avoid common mistakes (wrong temperature, over-complicated setup early).
- Choose stable, proven tools and patterns.