Unified LLM Interface
Write once, run everywhere

Open-source first, provider-agnostic LLM infrastructure for Python: run local models to control the full stack end-to-end (software + models), or switch to cloud providers when you need them. Code once, run everywhere across Ollama, LM Studio, MLX, HuggingFace, vLLM, OpenAI, Anthropic, and more with consistent APIs for tools, structured output, streaming, and media.

🤖 For AI Agents: Use our AI-optimized documentation at llms.txt (concise) or llms-full.txt (complete) for seamless integration.

9+ Providers · Version 2.11.2 · MIT License
example.py
from abstractcore import create_llm

# Works with any provider - just change the name
llm = create_llm("anthropic", model="claude-haiku-4-5", temperature=0.0)  # Use temp=0 for consistency
response = llm.generate("What is the capital of France?")
print(response.content)

# Switch providers with zero code changes
llm = create_llm("openai", model="gpt-4o-mini")
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")

# Same interface across providers

Why Choose AbstractCore?

Production-ready LLM infrastructure with everything you need to build reliable AI applications.

Centralized Configuration

One-time setup with ~/.abstractcore/config/abstractcore.json. Set global defaults, app-specific preferences, and API keys once, so you never have to repeat the provider and model on every call.

Universal Media Handling

Attach files with media=[...] (or @file in the CLI). Images and documents are processed automatically; audio/video inputs are policy-driven (audio_policy/video_policy) to avoid silent semantic changes.

Vision Capabilities

Image and video understanding across providers with automatic optimization. Vision fallback for text-only models through smart captioning, plus explicit frame sampling for video when native input isn’t available.

Provider Discovery

Centralized registry for providers, defaults, and capabilities, from open-source local stacks to cloud APIs. Discover what's installed and what each provider supports.

Production Ready

Built-in retry logic, circuit breakers, comprehensive error handling, and event-driven observability.

Universal Tools + Syntax Rewriting

Tool calling across ALL providers with real-time format conversion for agent CLI compatibility. Architecture-aware detection and custom tag support.

Type Safe

Full Pydantic integration for structured outputs with automatic validation and retry on failures (see structured_output.py below).

Local & Cloud

Run open-source models locally to control the full stack end-to-end (software + models), or use cloud APIs for maximum performance.

Token Management + Streaming

Unified token parameters across all providers with budget validation. Real-time streaming with proper tool call handling and cost estimation.

Session Management + Analytics

Persistent conversations with SOTA 2025 auto-compaction algorithm. Built-in analytics: summaries, assessments, fact extraction with complete serialization.

Production Resilience + OpenAI Server

Production-grade retry logic, circuit breakers, and event-driven monitoring. OpenAI-compatible HTTP server for multi-language access.

Get Started in Minutes

Install AbstractCore and make your first LLM call in under 5 minutes.

1. Install

Install AbstractCore with your preferred providers

# Core (lightweight default)
pip install abstractcore

# Providers (install only what you use; zsh: keep quotes)
pip install "abstractcore[openai]"
pip install "abstractcore[anthropic]"

# Optional features
pip install "abstractcore[media]"   # images, PDFs, Office docs
pip install "abstractcore[server]"  # OpenAI-compatible HTTP gateway

2. Configure

Set up your API keys or local providers

# For cloud providers
export OPENAI_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"

# For local providers (no keys needed)
# Install Ollama, LMStudio, or MLX

3. Code

Start building with the unified interface

from abstractcore import create_llm

# Create LLM instance
llm = create_llm("openai", model="gpt-4o-mini")

# Generate response
response = llm.generate("Hello, world!")
print(response.content)
centralized_configuration.sh
# Check current configuration
abstractcore --status

# Set global fallback model (used when no app-specific default)
abstractcore --set-global-default ollama/qwen3:4b-instruct

# Set app-specific defaults (examples)
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default cli ollama qwen3:4b-instruct

# Configure vision fallback for text-only models
abstractcore --download-vision-model  # Download local caption model
# OR use an existing vision model:
# abstractcore --set-vision-provider ollama qwen2.5vl:7b

# Set API keys
abstractcore --set-api-key openai sk-your-key-here

# Configure logging
abstractcore --set-console-log-level WARNING

# Now use without specifying provider/model every time!
abstractcore-chat --prompt "Hello!"  # Uses configured defaults
media_handling.py
from abstractcore import create_llm

# Works with any provider - same API everywhere
llm = create_llm("openai", model="gpt-4o")

# Attach any file type with media parameter
response = llm.generate(
    "What's in this image and document?",
    media=["photo.jpg", "report.pdf"]
)

# Or use CLI with @filename syntax
# abstractcore-chat --prompt "Analyze @report.pdf"

# Supported file types:
# - Images: PNG, JPEG, GIF, WEBP, BMP, TIFF
# - Documents: PDF, DOCX, XLSX, PPTX
# - Data: CSV, TSV, TXT, MD, JSON

# Same code works with any provider
llm = create_llm("anthropic", model="claude-haiku-4-5")
response = llm.generate(
    "Summarize these materials",
    media=["chart.png", "data.csv", "presentation.pptx"]
)
vision_capabilities.py
from abstractcore import create_llm

# Vision works across all providers with same interface
openai_llm = create_llm("openai", model="gpt-4o")
response = openai_llm.generate(
    "Describe this image in detail",
    media=["photo.jpg"]
)

# Same code with local provider
ollama_llm = create_llm("ollama", model="qwen2.5vl:7b")
response = ollama_llm.generate(
    "What objects do you see?",
    media=["scene.jpg"]
)

# Vision fallback for text-only models
# Configure once: abstractcore --download-vision-model
text_llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507")  # No native vision
response = text_llm.generate(
    "What's in this image?",
    media=["complex_scene.jpg"]
)
# Works transparently: vision model analyzes → text model processes description

# Multi-image analysis
response = openai_llm.generate(
    "Compare these architectural styles",
    media=["building1.jpg", "building2.jpg", "building3.jpg"]
)
audio_and_voice.py
from abstractcore import create_llm

llm = create_llm('openai', model='gpt-4o-mini')

# Speech audio as input (policy-driven)
resp = llm.generate(
    'Summarize this call.',
    media=['call.wav'],
    audio_policy='speech_to_text',  # requires: pip install abstractvoice
)
print(resp.content)

# Deterministic STT/TTS surfaces (capability plugin)
text = llm.audio.transcribe('speech.wav')
wav_bytes = llm.voice.tts('Hello', format='wav')
print(len(wav_bytes))
tool_syntax_rewriting.py
from abstractcore import create_llm, tool

@tool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")

# Convert tool-call markup in `content` for downstream parsers.
# Tool calls are always available as structured data in `chunk.tool_calls`.
for chunk in llm.generate(
    "Compute 15 * 23 using the tool.",
    tools=[multiply],
    stream=True,
    tool_call_tags="llama3",
):
    if chunk.tool_calls:
        print(f"Tool detected: {chunk.tool_calls[0].name}")
    print(chunk.content or "", end="", flush=True)
session_analytics.py
from abstractcore import BasicSession, create_llm

llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(llm, system_prompt="You are a helpful assistant.")

response1 = session.generate("My name is Alice")
response2 = session.generate("What's my name?")  # Remembers context

# Compact chat history (in-place)
session.force_compact(preserve_recent=6, focus="key details")

# Advanced analytics
summary = session.generate_summary(focus="key decisions")
assessment = session.generate_assessment(criteria={"clarity": True, "helpfulness": True})
facts = session.extract_facts(output_format="triples")

# Save (includes analytics if generated)
session.save("conversation.json")
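
The Type Safe card has no dedicated example on this page, so here is a minimal structured-output sketch. It assumes generate() accepts a Pydantic model through a response_model parameter and returns a validated instance; the parameter name and return shape are assumptions, so check the structured-output docs for the exact API.

structured_output.py
from pydantic import BaseModel
from abstractcore import create_llm

class CityFacts(BaseModel):
    name: str
    country: str
    population: int

llm = create_llm("openai", model="gpt-4o-mini")

# Assumed API: pass the Pydantic model via `response_model` and receive a
# validated CityFacts instance (retried on validation failure per the Type Safe card)
city = llm.generate(
    "Give me basic facts about Paris as structured data.",
    response_model=CityFacts,
)
print(city.name, city.country, city.population)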
token_management.py
from abstractcore import create_llm
from abstractcore.utils.token_utils import estimate_tokens

# Unified token parameters work across ALL providers
llm = create_llm(
    "anthropic",
    model="claude-haiku-4-5",
    max_tokens=32000,           # Context window (input + output)
    max_output_tokens=8000,     # Maximum output tokens
    max_input_tokens=24000      # Maximum input tokens (auto-calculated if not set)
)

# Token estimation and validation
text = "Your input text here..."
estimated = estimate_tokens(text, model="claude-haiku-4-5")
print(f"Estimated tokens: {estimated}")

# Budget validation with warnings
response = llm.generate("Write a detailed analysis...")
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Cost estimate: ${response.usage.cost_usd:.4f}")
deterministic_generation.py
from abstractcore import create_llm

# Deterministic outputs with seed + temperature=0
llm = create_llm("openai", model="gpt-4o-mini", seed=42, temperature=0.0)

# Best-effort determinism depends on provider/model
response1 = llm.generate("Write exactly 3 words about coding")
response2 = llm.generate("Write exactly 3 words about coding")
print(f"Response 1: {response1.content}")  # "Innovative, challenging, rewarding."
print(f"Response 2: {response2.content}")  # "Innovative, challenging, rewarding."

# Notes:
# - Many local providers support `seed` (Ollama/MLX/HF/LMStudio best-effort).
# - Anthropic issues a warning when `seed` is provided; use temperature=0.0 for consistency.

# Works across all providers with same interface
ollama_llm = create_llm("ollama", model="qwen3:4b-instruct", seed=42, temperature=0.0)
mlx_llm = create_llm("mlx", model="mlx-community/Qwen3-4B-4bit", seed=42, temperature=0.0)
http_server.py
# Start OpenAI-compatible server
# uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

import requests

# NEW: OpenAI Responses API with native file support
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Analyze this document"},
                    {"type": "input_file", "file_url": "https://example.com/report.pdf"}
                ]
            }
        ],
        "stream": False  # Optional streaming
    }
)

# Or use standard chat completions with @filename syntax
import openai
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="ollama/qwen3:4b-instruct",
    messages=[{"role": "user", "content": "Analyze @report.pdf"}]
)

Supported Providers

One interface for all major LLM providers. Switch between them with a single line change.

*SEED Support Note: Anthropic doesn't support the seed parameter; a warning is issued when one is provided. Use temperature=0.0 for more consistent outputs with Claude models.

Comprehensive Documentation

Everything you need to build production-ready LLM applications. AI agents: Use our optimized llms.txt or llms-full.txt for direct integration.

Real-World Examples

Learn from practical examples and use cases.

Universal API Gateway

Server

Deploy a single OpenAI-compatible /v1 gateway for chat + tools, and optionally add /v1/images/* and /v1/audio/* via capability plugins.

# Start server
python -m abstractcore.server.app --port 8000

# Route by changing model:
#   ollama/qwen3:4b-instruct
#   openai/gpt-4o-mini
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"ollama/qwen3:4b-instruct","messages":[...]}'
View Full Example →

Vision Fallback (Images β†’ Any LLM)

Vision

Attach images to text-only models (AbstractCore-exclusive fallback). A configured vision model captions and injects short observations into your request.

from abstractcore import create_llm

# Text-only local model (no native vision)
llm = create_llm("ollama", model="qwen3:4b-instruct")

# Configure vision fallback once, then attach images anyway
resp = llm.generate(
    "What are the key numbers in this chart?",
    media=["chart.png"],
)
print(resp.content)
View Full Example →

Audio & Voice Agents (STT/TTS)

Audio

Speech-to-text for audio inputs is policy-driven (no silent semantic changes) and works across providers via an optional capability plugin (AbstractVoice).

from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o-mini")

resp = llm.generate(
    "Summarize this call and list action items.",
    media=["call.wav"],
    audio_policy="speech_to_text",  # requires: pip install abstractvoice
)
print(resp.content)
View Full Example →

Provider Flexibility

Core Feature

Switch between providers with identical code. Perfect for development vs production environments.

from abstractcore import create_llm

# Development (free, local)
llm_dev = create_llm("ollama", model="qwen3:4b-instruct")

# Production (high quality, cloud)
llm_prod = create_llm("openai", model="gpt-4o-mini")

# Same interface, different capabilities
View Full Example →

RAG with Embeddings

Advanced

Build retrieval-augmented generation systems with built-in embedding support.

from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()
docs_embeddings = embedder.embed_batch(documents)

# Find most similar document
query = "user query"
query_embedding = embedder.embed(query)
similarity = embedder.compute_similarity(query, documents[0])
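
To close the retrieval loop, rank the documents with the similarity call above and feed the best match to any provider; only embed_batch/compute_similarity and create_llm come from AbstractCore, the ranking itself is plain Python.

from abstractcore import create_llm

# Rank all documents against the query and keep the best match
scores = [embedder.compute_similarity(query, doc) for doc in documents]
best_doc = documents[scores.index(max(scores))]

# Ground the answer in the retrieved document
llm = create_llm("openai", model="gpt-4o-mini")
answer = llm.generate(f"Context:\n{best_doc}\n\nQuestion: {query}")
print(answer.content)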
View Full Example →

CLI Apps & Debug Mode

New

Ready-to-use terminal tools with debug capabilities and focus areas for targeted processing.

# Extract knowledge with debug mode
extractor document.pdf --format json-ld --debug --iterate 3

# Evaluate with focus areas
judge README.md --focus "examples, completeness" --debug

# Self-healing JSON handles truncated responses automatically
View CLI Docs →