# Vector Embeddings

Generate high-quality embeddings for semantic search, clustering, and RAG with a unified API.
## What you get

- **Multi-provider**: use HuggingFace (local), Ollama, or LMStudio with the same code.
- **Simple API**: `embed()`, `embed_batch()`, and similarity helpers.
- **RAG-ready**: build semantic search and context-injection patterns in minutes.
## Quick Start

Install embeddings support:

```bash
pip install "abstractcore[embeddings]"
```

Generate your first embeddings:

```python
from abstractcore.embeddings import EmbeddingManager

# Option 1: HuggingFace (default) - local models
embedder = EmbeddingManager()  # all-MiniLM-L6-v2 by default

# Option 2: Ollama - local models via the Ollama API
# embedder = EmbeddingManager(provider="ollama", model="granite-embedding:278m")

# Option 3: LMStudio - local models via the LMStudio API
# embedder = EmbeddingManager(provider="lmstudio", model="text-embedding-all-minilm-l6-v2")

embedding = embedder.embed("Machine learning transforms how we process information")
print(f"Embedding dimension: {len(embedding)}")

similarity = embedder.compute_similarity("artificial intelligence", "machine learning")
print(f"Similarity: {similarity:.3f}")
```
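The similarity score here is cosine similarity between the two embedding vectors. For intuition, a minimal pure-Python sketch of the same computation (independent of AbstractCore):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Real embedding vectors have hundreds of dimensions, but the score still lands in the same range: close to 1.0 for semantically similar texts, near 0.0 for unrelated ones.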
## Providers & Models

AbstractCore supports multiple embedding providers. Pick what matches your environment:

- **HuggingFace** (default): local sentence-transformers models. A great default for development and production.
- **Ollama**: local embeddings via the Ollama API (requires Ollama running).
- **LMStudio**: local embeddings via LM Studio (GUI-friendly model management).

```python
from abstractcore.embeddings import EmbeddingManager

# HuggingFace (default)
embedder = EmbeddingManager()

# Ollama
embedder = EmbeddingManager(provider="ollama", model="granite-embedding:278m")

# LMStudio
embedder = EmbeddingManager(provider="lmstudio", model="text-embedding-all-minilm-l6-v2")
```
## Core API

Key methods you'll use most often:

- `embed(text)` → a single embedding vector
- `embed_batch(texts)` → batch embeddings (faster)
- `compute_similarity(a, b)` → cosine similarity for two strings
- `compute_similarities(query, docs)` → one-to-many similarity
- `compute_similarities_matrix(a, b=None)` → similarity matrix

```python
from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()

texts = [
    "Python programming language",
    "JavaScript for web development",
    "Machine learning with Python",
]

embeddings = embedder.embed_batch(texts)
print(f"Generated {len(embeddings)} embeddings")

query = "learn python"
scores = embedder.compute_similarities(query, texts)
print([round(s, 3) for s in scores])
```
## Semantic Search

A minimal semantic search loop:

```python
from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()

documents = [
    "Python is strong for data science and machine learning applications",
    "JavaScript enables interactive web pages and modern frontend development",
    "SQL databases store and query structured data efficiently",
]

def semantic_search(query: str, docs: list[str], top_k: int = 2):
    scored = [(i, embedder.compute_similarity(query, doc), doc) for i, doc in enumerate(docs)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]

for i, score, doc in semantic_search("web development frameworks", documents):
    print(f"{score:.3f} → {doc}")
```
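The loop above recomputes a document embedding on every query. For a larger corpus you would embed the documents once and rank cached vectors instead. A sketch of that pattern, with toy 3-dimensional vectors standing in for real embeddings (in practice you would fill the dictionary from `embedder.embed_batch(docs)`):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy stand-ins for real embeddings, computed once and reused across queries.
doc_vectors = {
    "python data science": [0.9, 0.1, 0.2],
    "javascript frontend": [0.1, 0.9, 0.1],
    "sql databases":       [0.2, 0.1, 0.9],
}

def search(query_vec: list[float], top_k: int = 2) -> list[str]:
    # Rank cached document vectors against the (single) query embedding.
    scored = sorted(doc_vectors.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in scored[:top_k]]

print(search([0.8, 0.2, 0.1]))  # ranks the "python" vector first
```

Only the query needs to be embedded at search time, so per-query cost no longer scales with corpus size on the embedding side.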
## Simple RAG Pipeline

Retrieve relevant context with embeddings, then pass it to any LLM provider:

```python
from abstractcore import create_llm
from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()
llm = create_llm("openai", model="gpt-4o-mini")

knowledge_base = [
    "The Eiffel Tower is 330 meters tall and was completed in 1889.",
    "Paris is the capital city of France with over 2 million inhabitants.",
    "The Louvre Museum in Paris houses the Mona Lisa painting.",
]

def rag_answer(question: str) -> str:
    # Rank documents by similarity to the question, keep the top two as context.
    scored = sorted(
        ((embedder.compute_similarity(question, doc), doc) for doc in knowledge_base),
        reverse=True,
    )
    context = "\n".join(doc for _, doc in scored[:2])
    prompt = f"""Context:
{context}

Question: {question}
Answer:"""
    return llm.generate(prompt).content

print(rag_answer("How tall is the Eiffel Tower?"))
```
## Server: OpenAI-compatible /v1/embeddings

If you run the AbstractCore HTTP server, you can also generate embeddings over REST.

```bash
pip install "abstractcore[server]"
python -m abstractcore.server.app --port 8000
```

```bash
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["text 1", "text 2", "text 3"],
    "model": "ollama/granite-embedding:278m"
  }'
```

Tip: See the Server guide for model discovery (`/v1/models`) and provider status (`/providers`).
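From Python, the same endpoint can be called with just the standard library. A sketch assuming the server is running locally; the response parsing assumes the OpenAI-style shape (`{"data": [{"embedding": [...]}, ...]}`) that OpenAI-compatible endpoints return:

```python
import json
import urllib.request

def build_payload(texts: list[str], model: str) -> bytes:
    """JSON body in the OpenAI-compatible embeddings format."""
    return json.dumps({"input": texts, "model": model}).encode("utf-8")

def fetch_embeddings(texts: list[str],
                     model: str = "ollama/granite-embedding:278m",
                     base_url: str = "http://localhost:8000") -> list[list[float]]:
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=build_payload(texts, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # One embedding per input text, in order.
    return [item["embedding"] for item in body["data"]]
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries pointed at `base_url` should also work.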
## Performance Tips

- Prefer `embed_batch()` when embedding many texts.
- Cache embeddings for your knowledge base; only re-embed changed documents.
- For local providers, keep the embedding model "warm" (avoid frequent reloads).