Vector Embeddings

Generate high-quality embeddings for semantic search, clustering, and RAG with a unified API.

What you get

  • Multi-provider: use HuggingFace (local), Ollama, or LMStudio with the same code.
  • Simple API: embed(), embed_batch(), and similarity helpers.
  • RAG-ready: build semantic search and context-injection patterns in minutes.

Table of Contents

  • Quick Start
  • Providers & Models
  • Core API
  • Simple RAG Pipeline
  • Server: OpenAI-compatible /v1/embeddings
  • Performance Tips

Quick Start

Install embeddings support:

pip install "abstractcore[embeddings]"

Generate your first embeddings:

from abstractcore.embeddings import EmbeddingManager

# Option 1: HuggingFace (default) - local models
embedder = EmbeddingManager()  # all-MiniLM-L6-v2 by default

# Option 2: Ollama - local models via Ollama API
# embedder = EmbeddingManager(provider="ollama", model="granite-embedding:278m")

# Option 3: LMStudio - local models via LMStudio API
# embedder = EmbeddingManager(provider="lmstudio", model="text-embedding-all-minilm-l6-v2")

embedding = embedder.embed("Machine learning transforms how we process information")
print(f"Embedding dimension: {len(embedding)}")

similarity = embedder.compute_similarity("artificial intelligence", "machine learning")
print(f"Similarity: {similarity:.3f}")

Providers & Models

AbstractCore supports multiple embedding providers. Pick what matches your environment:

HuggingFace (default)

Local sentence-transformers models. Great default for development and production.

Ollama

Local embeddings via the Ollama API (requires Ollama running).

LMStudio

Local embeddings via LM Studio (GUI-friendly model management).

from abstractcore.embeddings import EmbeddingManager

# HuggingFace (default)
embedder = EmbeddingManager()

# Ollama
embedder = EmbeddingManager(provider="ollama", model="granite-embedding:278m")

# LMStudio
embedder = EmbeddingManager(provider="lmstudio", model="text-embedding-all-minilm-l6-v2")

Core API

Key methods you’ll use most often:

  • embed(text) → a single embedding vector
  • embed_batch(texts) → batch embeddings (faster)
  • compute_similarity(a, b) → cosine similarity for two strings
  • compute_similarities(query, docs) → one-to-many similarity
  • compute_similarities_matrix(a, b=None) → similarity matrix

from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()

texts = [
    "Python programming language",
    "JavaScript for web development",
    "Machine learning with Python",
]

embeddings = embedder.embed_batch(texts)
print(f"Generated {len(embeddings)} embeddings")

query = "learn python"
scores = embedder.compute_similarities(query, texts)
print([round(s, 3) for s in scores])
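
Under the hood, the similarity helpers score text pairs with cosine similarity (as noted above for compute_similarity). For intuition, this is the formula being computed, shown as a self-contained sketch that does not depend on AbstractCore:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Scores are scale-invariant, which is why embeddings of different magnitudes can still be compared directly.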

Simple RAG Pipeline

Retrieve relevant context with embeddings, then pass it to any LLM provider:

from abstractcore import create_llm
from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()
llm = create_llm("openai", model="gpt-4o-mini")

knowledge_base = [
    "The Eiffel Tower is 330 meters tall and was completed in 1889.",
    "Paris is the capital city of France with over 2 million inhabitants.",
    "The Louvre Museum in Paris houses the Mona Lisa painting.",
]

def rag_answer(question: str) -> str:
    scored = sorted(
        ((embedder.compute_similarity(question, doc), doc) for doc in knowledge_base),
        reverse=True,
    )
    context = "\n".join([doc for _, doc in scored[:2]])
    prompt = f"""Context:
{context}

Question: {question}
Answer:"""
    return llm.generate(prompt).content

print(rag_answer("How tall is the Eiffel Tower?"))
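
The loop above re-embeds every document on every question, which gets expensive as the knowledge base grows. A common refinement is to embed the corpus once up front (e.g. with embed_batch()) and rank cached vectors per query. Here is that retrieval step as a self-contained sketch, with plain lists standing in for real embedding vectors:

```python
import math

def rank_documents(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose cached vectors are most similar to the query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    scored = sorted(zip((cos(query_vec, v) for v in doc_vecs), docs), reverse=True)
    return [doc for _, doc in scored[:k]]

# Stand-in vectors; in practice these come from embed_batch() over the corpus.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
docs = ["eiffel tower height", "paris facts", "cooking recipes"]
print(rank_documents([1.0, 0.0], doc_vecs, docs))
```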

Server: OpenAI-compatible /v1/embeddings

If you run the AbstractCore HTTP server, you can also generate embeddings over REST.

pip install "abstractcore[server]"
python -m abstractcore.server.app --port 8000

# In a second terminal, request embeddings over REST:
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["text 1", "text 2", "text 3"],
    "model": "ollama/granite-embedding:278m"
  }'

Tip: See the Server guide for model discovery (/v1/models) and provider status (/providers).
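
If you prefer Python over curl, the same request can be made with the standard library alone. A minimal client sketch; it assumes the endpoint returns the usual OpenAI embeddings response shape, {"data": [{"embedding": [...]}, ...]}:

```python
import json
import urllib.request

def parse_embeddings(body):
    """Extract one vector per input from an OpenAI-style embeddings response body."""
    return [item["embedding"] for item in body["data"]]

def fetch_embeddings(texts, model, base_url="http://localhost:8000"):
    """POST texts to the server's /v1/embeddings endpoint and return their vectors."""
    payload = json.dumps({"input": texts, "model": model}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Usage (with the server from above running): fetch_embeddings(["text 1", "text 2"], "ollama/granite-embedding:278m").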

Performance Tips

  • Prefer embed_batch() when embedding many texts.
  • Cache embeddings for your knowledge base; only re-embed changed documents.
  • For local providers, keep the embedding model “warm” (avoid frequent reloads).
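
The caching tip above can be as simple as a dict keyed by a content hash, so unchanged documents are never re-embedded. A sketch with a stand-in embed function; in practice you would pass something like EmbeddingManager().embed:

```python
import hashlib

class CachedEmbedder:
    """Wrap any embedding callable with a content-hash cache."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._cache = {}

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]

# Stand-in embed function that records each real embedding call.
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

cached = CachedEmbedder(fake_embed)
cached.embed("hello")
cached.embed("hello")   # served from cache; fake_embed runs only once
print(len(calls))  # 1
```

For persistence across runs, the same keys work with an on-disk store (e.g. sqlite or a JSON file) in place of the in-memory dict.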

Related Documentation

  • Getting Started: first LLM call + core concepts
  • API Reference: EmbeddingManager API
  • Examples: RAG example and recipes
  • HTTP Server: expose embeddings via OpenAI-compatible endpoints