# AbstractCore - llms-full (Self-Contained Agent Handbook)
> This is the "full context" companion to `llms.txt`: a single, self-contained guide for agents and developers using AbstractCore. It intentionally avoids GitHub link hubs; only a few external references are included (API-key signup pages).
Last updated: 2026-02-09
Package version: 2.11.8
## How to use this file (agents)
- Treat this as the primary context for answering questions or making changes.
- Prefer the public docs in this repo as the source of truth; keep behavior claims consistent with `docs/`.
- Keep the default install lightweight: heavy/optional dependencies must stay behind extras and be imported lazily.
- Default tool execution is **pass-through**: AbstractCore returns tool calls; the host/runtime executes them.
- In the AbstractFramework ecosystem, **AbstractRuntime** is the recommended runtime for executing tool calls durably (policy, retries, persistence): https://github.com/lpalbou/abstractruntime
- Media handling is **policy-driven** by design (no silent semantic changes). If audio/video/images "don't work", check policy + configured fallbacks.
- `llms.txt` is the short index; this file is intentionally large. Prefer pulling only the sections you need.
## Table of contents
1. Quick start (2 minutes)
2. What AbstractCore is (and isn't)
3. Installation + extras (what to `pip install`)
4. Providers (IDs, env vars, and examples)
5. Core Python API patterns (generate/stream/async/sessions)
6. Tool calling (agentic workflows)
7. Structured output (`response_model=...`)
8. Media handling (images/audio/video + documents)
9. Embeddings
10. CLI + centralized config (`abstractcore --config`)
11. Server + endpoint (OpenAI-compatible `/v1`)
12. Repo map + contribution workflow
13. Troubleshooting checklist
---
## 1) Quick start (2 minutes)
### Install
```bash
pip install abstractcore
```
If you're using a cloud provider SDK, install only what you need:
```bash
pip install "abstractcore[openai]" # OpenAI Python SDK
pip install "abstractcore[anthropic]" # Anthropic Python SDK
```
### Minimal first call
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini") # requires: abstractcore[openai]
resp = llm.generate("Say hello in French.")
print(resp.content)
```
### If you're using a local OpenAI-compatible server
```python
from abstractcore import create_llm
llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")
print(llm.generate("Hello!").content)
```
Important: most OpenAI-compatible servers expect the base URL to include `/v1`.
---
## 2) What AbstractCore is (and isn't)
### What it is
AbstractCore is a unified Python interface over multiple LLM backends (cloud + local), with consistent support for:
- **Streaming** (`stream=True`)
- **Tool calling** (`@tool`), with a universal tool representation across providers
- **Structured output** via Pydantic (`response_model=...`)
- **Media handling** for images/audio/video/documents via an explicit, policy-driven system
- **Embeddings** (optional)
- **An optional OpenAI-compatible HTTP server** (`/v1/chat/completions`, plus optional images/audio endpoints via plugins)
### What it is not
- It is not "one giant dependency": the default install is intentionally small.
- It is not "magic multimodal": attaching audio/video/images has explicit policies and optional fallbacks.
- It is not a hosted service: local providers and servers are your responsibility to run and secure.
### Core design invariant: lightweight default install
`pip install abstractcore` should:
- install quickly,
- import cleanly,
- not pull heavyweight deps (torch/transformers/PDF pipelines/server deps).
Anything heavy must be behind install extras and imported lazily inside the code paths that need it.
---
## 3) Installation + extras (what to `pip install`)
### Core
```bash
pip install abstractcore
```
### Provider SDK extras (install only what you use)
```bash
pip install "abstractcore[openai]"
pip install "abstractcore[anthropic]"
pip install "abstractcore[huggingface]" # transformers/torch (heavy)
pip install "abstractcore[mlx]" # Apple Silicon local inference (heavy)
pip install "abstractcore[vllm]" # NVIDIA GPU inference server integration (heavy)
```
Notes:
- `ollama`, `lmstudio`, `openrouter`, `portkey`, and `openai-compatible` use only core deps (they speak HTTP).
### Optional feature extras
```bash
pip install "abstractcore[tools]" # built-in web + filesystem helper tools
pip install "abstractcore[media]" # images + PDF/Office extraction
pip install "abstractcore[compression]" # glyph visual-text compression
pip install "abstractcore[embeddings]" # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]" # precise token counting (tiktoken)
pip install "abstractcore[server]" # OpenAI-compatible /v1 HTTP gateway (FastAPI)
```
Compatibility note:
- `abstractcore[tool]` is accepted as an alias of `abstractcore[tools]`.
### Turnkey installs (pick one)
```bash
pip install "abstractcore[all-apple]" # macOS/Apple Silicon (includes MLX, excludes vLLM)
pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac (excludes MLX and vLLM)
pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU (includes vLLM, excludes MLX)
```
Shell tip:
- In zsh, always quote extras: `pip install "abstractcore[media]"`.
---
## 4) Providers (IDs, env vars, and examples)
AbstractCore uses a **provider ID** plus a **model name**:
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
```
### Provider ID list (common)
- Cloud: `openai`, `anthropic`
- Gateways (OpenAI-compatible routing): `openrouter`, `portkey`
- Local/self-hosted (OpenAI-compatible HTTP): `ollama`, `lmstudio`, `vllm`, `openai-compatible`
- Local in-process: `mlx`, `huggingface` (require heavy extras)
### Environment variables (quick map)
Cloud / gateways:
- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `OPENROUTER_API_KEY` (optional: `OPENROUTER_BASE_URL`, `OPENROUTER_SITE_URL`, `OPENROUTER_APP_NAME`)
- `PORTKEY_API_KEY` (routing: `PORTKEY_CONFIG` or `PORTKEY_VIRTUAL_KEY`; provider-direct: `PORTKEY_PROVIDER` + `PORTKEY_PROVIDER_API_KEY`; optional: `PORTKEY_BASE_URL`)
OpenAI-compatible base URLs (local/self-hosted):
- `OLLAMA_BASE_URL` (legacy: `OLLAMA_HOST`)
- `LMSTUDIO_BASE_URL`
- `VLLM_BASE_URL`
- `OPENAI_COMPATIBLE_BASE_URL` (generic provider)
OpenAI-compatible optional auth:
- `OPENAI_COMPATIBLE_API_KEY`
HuggingFace caching (optional):
- `HF_HOME` (or rely on defaults)
### 4.1 OpenAI (`openai`)
Install:
```bash
pip install "abstractcore[openai]"
export OPENAI_API_KEY="sk-..."
```
Use:
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
print(llm.generate("Give me 3 bullet points about HTTP caching.").content)
```
### 4.2 Anthropic (`anthropic`)
Install:
```bash
pip install "abstractcore[anthropic]"
export ANTHROPIC_API_KEY="sk-ant-..."
```
Use:
```python
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
print(llm.generate("Write a haiku about distributed systems.").content)
```
### 4.3 OpenRouter gateway (`openrouter`)
OpenRouter is an OpenAI-compatible gateway/aggregator.
Setup:
```bash
export OPENROUTER_API_KEY="sk-or-..."
# Optional override (default: https://openrouter.ai/api/v1)
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
```
Optional analytics headers:
```bash
export OPENROUTER_SITE_URL="https://your-site.example"
export OPENROUTER_APP_NAME="YourAppName"
```
Use:
```python
from abstractcore import create_llm
llm = create_llm("openrouter", model="openai/gpt-4o-mini")
print(llm.generate("Say hello in Japanese.").content)
```
### 4.4 Portkey gateway (`portkey`)
Portkey is an OpenAI-compatible AI gateway that routes requests via headers.
Setup (most common: config routing):
```bash
export PORTKEY_API_KEY="pk_..."
export PORTKEY_CONFIG="pcfg_..." # config id
# Optional override (default: https://api.portkey.ai/v1)
export PORTKEY_BASE_URL="https://api.portkey.ai/v1"
```
Use (config mode):
```python
from abstractcore import create_llm
llm = create_llm("portkey", model="gpt-4o-mini", config_id="pcfg_...")
print(llm.generate("Say hello in French.").content)
```
Portkey routing modes (pick one; don't mix):
- **Config mode**: `PORTKEY_CONFIG` or `config_id=...` -> sends `x-portkey-config`
- **Virtual-key mode**: `PORTKEY_VIRTUAL_KEY` or `virtual_key=...` -> sends `x-portkey-virtual-key`
- **Provider-direct mode**: `PORTKEY_PROVIDER` / `portkey_provider=...` + `PORTKEY_PROVIDER_API_KEY` / `provider_api_key=...` -> sends `x-portkey-provider` + backend auth
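For illustration, here is a minimal sketch of the other two modes using the keyword arguments listed above; the key values are placeholders, and the exact accepted kwargs should be checked against your AbstractCore version:
```python
from abstractcore import create_llm

# Virtual-key mode (sends x-portkey-virtual-key); "vk_..." is a placeholder.
llm_vk = create_llm("portkey", model="gpt-4o-mini", virtual_key="vk_...")

# Provider-direct mode (sends x-portkey-provider plus the backend API key).
llm_direct = create_llm(
    "portkey",
    model="gpt-4o-mini",
    portkey_provider="openai",
    provider_api_key="sk-...",
)
```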
Gateway parameter safety:
- Gateways forward your payload to a routed backend model.
- To avoid sending defaults that strict models reject, AbstractCore's gateway providers forward optional generation parameters (like `temperature`, `top_p`, `max_output_tokens`) **only when you explicitly set them**.
### 4.5 Generic OpenAI-compatible (`openai-compatible`)
Best for: any OpenAI-compatible `/v1` endpoint (llama.cpp servers, LocalAI, text-generation-webui, custom proxies).
Setup:
```bash
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:1234/v1"
# Optional (if your endpoint requires auth)
export OPENAI_COMPATIBLE_API_KEY="your-api-key"
```
Use:
```python
from abstractcore import create_llm
llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")
print(llm.generate('Give me 3 synonyms for "fast".').content)
```
### 4.6 Ollama (`ollama`)
Ollama runs a local HTTP server. Typical base URL: `http://localhost:11434`.
```python
from abstractcore import create_llm
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M", base_url="http://localhost:11434")
print(llm.generate("Explain what a mutex is.").content)
```
### 4.7 LM Studio (`lmstudio`)
LM Studio's OpenAI-compatible base URL commonly ends with `/v1` (example: `http://localhost:1234/v1`).
```python
from abstractcore import create_llm
llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
print(llm.generate("Write a one-line joke about compilers.").content)
```
### 4.8 vLLM (`vllm`)
vLLM is a GPU inference server (NVIDIA CUDA only). Typical base URL: `http://localhost:8000/v1`.
```python
from abstractcore import create_llm
llm = create_llm("vllm", model="Qwen/Qwen3-Coder-30B-A3B-Instruct", base_url="http://localhost:8000/v1")
print(llm.generate("Write a Python function that reverses a list.").content)
```
### 4.9 MLX (`mlx`)
MLX runs in-process on Apple Silicon and requires a heavy extra:
```bash
pip install "abstractcore[mlx]"
```
```python
from abstractcore import create_llm
llm = create_llm("mlx", model="mlx-community/Qwen3-4B")
print(llm.generate("Summarize the CAP theorem.").content)
```
### 4.10 HuggingFace (`huggingface`)
HuggingFace runs in-process and requires transformers/torch:
```bash
pip install "abstractcore[huggingface]"
```
```python
from abstractcore import create_llm
llm = create_llm("huggingface", model="unsloth/Qwen3-4B-Instruct-2507-GGUF")
print(llm.generate("Explain what RAG is in 3 bullets.").content)
```
---
## 5) Core Python API patterns (generate/stream/async/sessions)
### `create_llm(provider, model=..., **kwargs)`
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini", temperature=0.2)
```
Common kwargs (best-effort across providers):
- `temperature`
- `seed`
- `thinking` (`None|"auto"|"on"|"off"|True|False|"low"|"medium"|"high"`)
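A minimal sketch combining these kwargs; remember they are forwarded on a best-effort, provider-dependent basis, so not every backend will honor all of them:
```python
from abstractcore import create_llm

# temperature/seed/thinking are optional hints; unsupported ones may be ignored.
llm = create_llm(
    "openai",
    model="gpt-4o-mini",
    temperature=0.2,
    seed=42,
    thinking="low",
)
print(llm.generate("Explain idempotency in one paragraph.").content)
```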
### `generate(prompt_or_messages, ...)`
```python
resp = llm.generate("Hello!")
print(resp.content)
print(resp.usage) # provider-dependent
print(resp.tool_calls) # pass-through by default
print(resp.metadata) # provider/model specific (e.g., normalized reasoning channel)
```
### Streaming
```python
for chunk in llm.generate("Write a short poem.", stream=True):
    print(chunk.content or "", end="", flush=True)
```
### Async
```python
import asyncio
from abstractcore import create_llm
async def main():
    llm = create_llm("openai", model="gpt-4o-mini")
    resp = await llm.agenerate("Give me 3 bullet points about HTTP/3.")
    print(resp.content)
asyncio.run(main())
```
### Sessions (`BasicSession`)
Use sessions to keep conversation state and shared defaults:
```python
from abstractcore import BasicSession, create_llm
session = BasicSession(create_llm("anthropic", model="claude-haiku-4-5"), temperature=0.3)
print(session.generate("Give me 3 startup name ideas.").content)
print(session.generate("Pick the best one and explain why.").content)
```
---
## 6) Tool calling (agentic workflows)
### Define tools with `@tool`
```python
from abstractcore import tool
@tool
def get_weather(city: str) -> str:
"""Return a short weather string for a city."""
return f"{city}: 22C and sunny"
```
### Pass-through is the default
By default, AbstractCore does **not** execute tools. Instead:
- the model emits tool calls,
- AbstractCore parses/normalizes them,
- your host/runtime executes them (or ignores them),
- you feed the tool results back in if you want.
```python
from abstractcore import create_llm, tool
@tool
def add(a: int, b: int) -> int:
    return a + b
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Use the add tool to compute 2+3.", tools=[add])
print(resp.tool_calls) # structured calls for the host to run
```
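If you want to execute the returned calls yourself and feed the results back, a minimal host-side sketch is shown below. The `name`/`arguments` fields on each tool call, their dict-like shape, and the decorated function staying directly callable are assumptions here; inspect `resp.tool_calls` in your environment for the exact structure.
```python
from abstractcore import create_llm, tool

@tool
def add(a: int, b: int) -> int:
    return a + b

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Use the add tool to compute 2+3.", tools=[add])

# Hypothetical host-side execution loop (field names on tool calls are assumed).
available = {"add": add}
results = []
for call in resp.tool_calls or []:
    fn = available.get(call.name)
    if fn is not None:
        results.append(f"{call.name} -> {fn(**call.arguments)}")

# Feed the results back as plain text for a follow-up answer.
followup = llm.generate(
    "Tool results:\n" + "\n".join(results) + "\n\nNow answer: what is 2 + 3?"
)
print(followup.content)
```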
### Built-in tools (optional)
Install:
```bash
pip install "abstractcore[tools]"
```
Then import from `abstractcore.tools.common_tools` (examples):
- `skim_websearch` vs `web_search` (compact vs full search results)
- `skim_url` vs `fetch_url` (fast triage vs full fetch + parsing)
Recommended agent workflow to keep outputs small:
1. `skim_websearch` to get a shortlist
2. `skim_url` to validate which links are worth opening
3. `fetch_url` only for the final few sources (use `include_full_content=False` when you need a smaller result)
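As a sketch, the built-in tools can be handed to `generate()` like any other `@tool` function so the model can drive the skim-then-fetch progression; the tool names are the ones listed above, and their exact parameters live in `abstractcore.tools.common_tools`:
```python
# Requires: pip install "abstractcore[tools]"
from abstractcore import create_llm
from abstractcore.tools.common_tools import skim_websearch, skim_url, fetch_url

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate(
    "Find two good primers on HTTP/3 and report the best one.",
    tools=[skim_websearch, skim_url, fetch_url],
)
print(resp.tool_calls)  # pass-through: the host decides which calls to run
```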
### Tool-call syntax rewriting (preserve markup)
Some agent runtimes want tool calls preserved in `response.content` using custom tags.
- Python: pass `tool_call_tags=...` to `generate()` / `agenerate()`
- Server: set `agent_format` in requests
This is documented as `tool syntax rewriting` and is designed to keep tool-call markup stable across providers.
---
## 7) Structured output (`response_model=...`)
Structured output turns `return JSON` prompts into typed objects.
```python
from pydantic import BaseModel
from abstractcore import create_llm
class Answer(BaseModel):
    title: str
    bullets: list[str]
llm = create_llm("openai", model="gpt-4o-mini")
result = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
print(result.title)
print(result.bullets)
```
How it works (high level):
- When the provider supports native structured output, AbstractCore uses it.
- Otherwise, it uses prompted strategies plus validation/retry to produce a valid object.
Practical tips:
- Keep schemas small and unambiguous.
- If validation fails, check the error and simplify the schema or give the model a clearer extraction instruction.
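One illustrative way to handle a failed extraction is a single retry with a more constrained instruction. This is a sketch only: the exact exception raised on validation failure depends on your setup, so it catches broadly.
```python
from pydantic import BaseModel
from abstractcore import create_llm

class Person(BaseModel):
    name: str
    age: int

llm = create_llm("openai", model="gpt-4o-mini")
prompt = "Extract the person: 'Ada Lovelace, died aged 36.'"
try:
    person = llm.generate(prompt, response_model=Person)
except Exception:
    # Retry once with a clearer, more constrained extraction instruction.
    person = llm.generate(
        prompt + " Return only the fields 'name' (string) and 'age' (integer).",
        response_model=Person,
    )
print(person)
```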
---
## 8) Media handling (images/audio/video + documents)
Media is opt-in and policy-driven to avoid silent semantic changes.
### Installation
```bash
pip install "abstractcore[media]"
```
### Attach media
```python
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
resp = llm.generate("Describe the image.", media=["./image.png"])
print(resp.content)
```
### Policies (important)
Audio and video input are controlled by explicit policies:
- `audio_policy`: `native_only|speech_to_text|auto|caption`
- `video_policy`: `native_only|frames_caption|auto`
Defaults are strict (`native_only`) unless you configure a fallback.
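An illustrative sketch of relaxing the strict default for an audio attachment. Treating `audio_policy` as a per-call parameter is an assumption here; it can also be configured globally via the config CLI shown below.
```python
from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o-mini")
# Assumption: audio_policy is accepted per call; "auto" allows speech-to-text
# fallback (requires the abstractvoice plugin) instead of strict native_only.
resp = llm.generate(
    "Summarize this recording.",
    media=["./meeting.wav"],
    audio_policy="auto",
)
print(resp.content)
```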
### Vision fallback (for text-only main models)
If your main model is text-only, you can configure a vision fallback pipeline:
- caption the image (or sampled video frames),
- inject short observations into the main request.
Config CLI examples:
```bash
abstractcore --set-vision-provider huggingface Salesforce/blip-image-captioning-base
abstractcore --add-vision-fallback lmstudio qwen/qwen3-vl-4b
abstractcore --disable-vision
```
### Video fallback requirements
Frame sampling requires `ffmpeg` and `ffprobe` available on `PATH`.
Helpful defaults:
```bash
abstractcore --set-video-strategy auto
abstractcore --set-video-max-frames 6
abstractcore --set-video-sampling-strategy keyframes
abstractcore --set-video-max-frame-side 1024
```
### Audio fallback requirements
Speech-to-text fallback typically uses a capability plugin backend:
```bash
pip install abstractvoice
abstractcore --set-audio-strategy auto
abstractcore --set-stt-language en
```
### Capability plugins (voice/audio/vision)
AbstractCore supports optional capability plugins discovered via Python entry points:
- install `abstractvoice` -> enables `llm.voice` and `llm.audio` (TTS/STT)
- install `abstractvision` -> enables `llm.vision` (generative vision; often via an OpenAI-compatible images endpoint)
Server note:
- the server can optionally expose `/v1/images/*` (requires `abstractvision`)
- the server can optionally expose `/v1/audio/*` (requires an audio/voice plugin; commonly `abstractvoice`)
### Glyph visual-text compression (optional, experimental)
If you need to squeeze long text into smaller vision-friendly inputs, AbstractCore supports glyph compression.
```bash
pip install "abstractcore[compression]"
```
---
## 9) Embeddings
Install:
```bash
pip install "abstractcore[embeddings]"
```
Use:
```python
from abstractcore.embeddings import EmbeddingManager
em = EmbeddingManager()
vec = em.embed_text("hello world")
print(len(vec))
```
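A small follow-up showing one common use of the vectors, cosine similarity between two texts; this is plain Python on top of the `embed_text` call from above:
```python
import math

from abstractcore.embeddings import EmbeddingManager

em = EmbeddingManager()
a = em.embed_text("How do I cache HTTP responses?")
b = em.embed_text("What are best practices for HTTP caching?")

# Cosine similarity: closer to 1.0 means the two texts are semantically closer.
dot = sum(x * y for x, y in zip(a, b))
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
print(round(cos, 3))
```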
---
## 10) CLI + centralized config (`abstractcore --config`)
AbstractCore stores persistent configuration at:
`~/.abstractcore/config/abstractcore.json`
### Most-used commands
```bash
abstractcore --config
abstractcore --status
abstractcore --set-api-key openai sk-...
abstractcore --set-api-key anthropic sk-ant-...
abstractcore --set-api-key openrouter sk-or-...
abstractcore --set-api-key portkey pk-...
abstractcore --set-chat-model openai/gpt-4o-mini
abstractcore --set-code-model anthropic/claude-haiku-4-5
```
### Priority order (high -> low)
1. Explicit parameters (per call / per request)
2. App-specific configuration (CLI apps)
3. Global configuration
4. Hardcoded defaults
### Logging controls
```bash
abstractcore --set-console-log-level DEBUG
abstractcore --enable-file-logging
abstractcore --set-log-base-dir ~/.abstractcore/logs
abstractcore --status
```
### Installed console scripts (quick map)
These entrypoints are defined in `pyproject.toml` and are available after install:
- `abstractcore` / `abstractcore-config`: configuration CLI (`--config`, `--status`, `--set-api-key`, defaults)
- `abstractcore-chat`: interactive REPL for chatting with a provider/model
- `abstractcore-endpoint`: single-model OpenAI-compatible endpoint server (local inference hosting)
- Built-in apps: `summarizer`, `extractor`, `judge`, `intent`, `deepsearch` (run `--help` for each)
Examples:
```bash
abstractcore-chat --provider openai --model gpt-4o-mini
summarizer ./document.txt --provider ollama --model gemma3:1b-it-qat
```
---
## 11) Server + endpoint (OpenAI-compatible `/v1`)
The server turns AbstractCore into an OpenAI-compatible API gateway.
### Install + run
```bash
pip install "abstractcore[server]"
python -m abstractcore.server.app
```
Health check:
```bash
curl http://localhost:8000/health
```
Interactive docs:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
### Model naming
Requests use `provider/model`:
- `openai/gpt-4o-mini`
- `anthropic/claude-haiku-4-5`
- `ollama/qwen3:4b-instruct-2507-q4_K_M`
### Chat completions request
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
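Because the gateway is OpenAI-compatible, any OpenAI-style client can point at it. A minimal sketch with the official `openai` Python package; the `api_key` value is a placeholder unless your deployment enforces authentication:
```python
# Requires: pip install openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # AbstractCore routing: provider/model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```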
### AbstractCore server extensions (important)
The server supports a few non-OpenAI fields to make multi-provider routing practical:
- `api_key`: per-request provider key (useful for multi-tenant scenarios)
- `base_url`: per-request provider base URL override (include `/v1` for OpenAI-compatible servers)
- `thinking`: unified thinking/reasoning control (`null|"auto"|"on"|"off"|...`)
- `unload_after`: best-effort model unload after request (dangerous in multi-tenant environments)
- `agent_format`: tool-call syntax rewriting (preserve tool markup)
### Provider base_url override example
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio/qwen/qwen3-4b-2507",
    "base_url": "http://localhost:1234/v1",
    "messages": [{"role": "user", "content": "Hello from LM Studio"}]
  }'
```
### Optional images/audio endpoints
- Images: `POST /v1/images/generations`, `POST /v1/images/edits` (requires `abstractvision`)
- Audio: `POST /v1/audio/transcriptions`, `POST /v1/audio/speech` (requires an audio/voice plugin; commonly `abstractvoice`)
If a required plugin/backend is missing, the server returns `501` with actionable messaging.
### Single-model endpoint (one provider/model per worker)
If you want to host **one** provider+model as a dedicated OpenAI-compatible `/v1` endpoint (no `provider/model` routing), use `abstractcore-endpoint`:
```bash
pip install "abstractcore[server]"
abstractcore-endpoint --provider mlx --model mlx-community/Qwen3-4B --host 0.0.0.0 --port 8001
```
Config via env vars (alternative):
- `ABSTRACTENDPOINT_PROVIDER`
- `ABSTRACTENDPOINT_MODEL`
- `ABSTRACTENDPOINT_HOST`
- `ABSTRACTENDPOINT_PORT`
See `docs/endpoint.md` and `abstractcore/endpoint/app.py`.
---
## 12) Repo map + contribution workflow
### Where things live
Core API:
- `abstractcore/core/` (interfaces, sessions, types, factory)
- `abstractcore/core/factory.py` (`create_llm(...)`)
Providers:
- `abstractcore/providers/` (implementations)
- `abstractcore/providers/base.py` (shared provider logic: tools/media/structured output)
- `abstractcore/providers/registry.py` (single source of truth for provider IDs/metadata)
Tools:
- `abstractcore/tools/` (tool system: decorator, registry, parsing, handler)
- `abstractcore/tools/common_tools.py` (built-in tools; requires `abstractcore[tools]`)
Media + capabilities:
- `abstractcore/media/` (media pipeline; requires `abstractcore[media]`)
- `abstractcore/capabilities/` (capability registry + proxies)
Server:
- `abstractcore/server/app.py` (multi-provider `/v1` gateway)
- `abstractcore/endpoint/app.py` (single-model endpoint)
Docs:
- `docs/` (canonical docs)
Tests:
- `tests/` (pytest)
### Dev commands
```bash
pip install -e ".[dev,test]"
pytest -q
black .
ruff check .
```
### Rules for optional dependencies (must-follow)
1. Don't import optional deps in default import paths (especially `abstractcore/__init__.py`).
2. Use lazy imports inside functions/methods where optional deps are used.
3. When a dep is missing, raise a clear error telling users which extra to install, e.g. `pip install "abstractcore[media]"`.
4. Keep features modular (tools/media/embeddings/compression/server) so core stays small.
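An illustrative sketch of the lazy-import pattern these rules describe; the module and helper names below are examples, not AbstractCore internals:
```python
def extract_pdf_text(path: str) -> str:
    """Example of the lazy-import pattern: the heavy dep loads only when used."""
    try:
        import pypdf  # hypothetical optional dependency, imported lazily
    except ImportError as exc:
        raise ImportError(
            'PDF extraction requires the media extra: pip install "abstractcore[media]"'
        ) from exc
    reader = pypdf.PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```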
### Adding a provider (checklist)
1. Implement provider in `abstractcore/providers/`.
2. Register it in `abstractcore/providers/registry.py` (ID, defaults, supported features, install hints).
3. Ensure default install still imports cleanly.
4. Add tests under `tests/` (unit tests that don't require real API keys when possible).
5. Update docs (at minimum: prerequisites + API + FAQ) and add a `CHANGELOG.md` entry.
---
## 13) Troubleshooting checklist
### Unsupported parameter errors (temperature/max_tokens/etc.)
Common with gateways and strict model families:
- Gateways forward your payload to a routed backend model.
- AbstractCore gateway providers only send optional generation parameters when explicitly set.
- If you still see errors, check that your gateway config isn't injecting forbidden parameters.
### Local OpenAI-compatible servers not working
- Ensure your base URL includes `/v1` (LM Studio, vLLM, many proxies).
- Confirm the server is reachable from your process (Docker networking is a common gotcha).
### Missing dependency errors
Typical fixes:
- Tools: `pip install "abstractcore[tools]"`
- Media/doc extraction: `pip install "abstractcore[media]"`
- Embeddings: `pip install "abstractcore[embeddings]"`
- Server: `pip install "abstractcore[server]"`
### Tools aren't executed
Expected default: pass-through. Execute tool calls in your host/runtime, or explicitly opt into an execution path if your app requires it.
### Media attachments fail
- Images/docs: install `abstractcore[media]`.
- Video frames fallback: install `abstractcore[media]` and have `ffmpeg`/`ffprobe` available.
- Audio STT fallback: install `abstractvoice` and set `audio_policy="auto"` (or configure via `abstractcore --set-audio-strategy auto`).
---
---
---
## Appendix A) Inlined canonical docs (de-hyperlinked)
This appendix inlines key docs so `llms-full.txt` is self-contained and can be used offline.
For LLM/agent consumption, markdown link targets are removed (link text is preserved).
If you're token-limited, prefer sections 1-13 above and only pull the specific inlined doc(s) you need.
---
### Inlined: `docs/getting-started.md`
# Getting Started
AbstractCore is a unified Python interface for cloud + local LLM providers. The default install is lightweight; add features via extras.
## Prerequisites
- Python 3.9+
- `pip`
## Installation
```bash
# Core (small, lightweight default)
pip install abstractcore
# Providers (install only what you use)
pip install "abstractcore[openai]" # OpenAI SDK
pip install "abstractcore[anthropic]" # Anthropic SDK
pip install "abstractcore[huggingface]" # Transformers / torch (heavy)
pip install "abstractcore[mlx]" # Apple Silicon local inference (heavy)
pip install "abstractcore[vllm]" # GPU inference server integrations (heavy)
# Optional features
pip install "abstractcore[tools]" # built-in tools (web/file/command helpers)
pip install "abstractcore[media]" # images, PDFs, Office docs
pip install "abstractcore[compression]" # glyph visual-text compression (Pillow renderer)
pip install "abstractcore[embeddings]" # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]" # precise token counting (tiktoken)
pip install "abstractcore[server]" # OpenAI-compatible HTTP gateway
# Combine extras (zsh: keep quotes)
pip install "abstractcore[openai,media,tools]"
```
Local OpenAI-compatible servers (Ollama, LMStudio, vLLM, llama.cpp, LocalAI, etc.) work with the core install; you just point AbstractCore at the server base URL. See Prerequisites for provider setup.
Optional capability plugins (deterministic multimodal outputs):
```bash
pip install abstractvoice # enables llm.voice / llm.audio (TTS/STT)
pip install abstractvision # enables llm.vision (generative vision; typically via an OpenAI-compatible images endpoint)
```
See: Capabilities and Server.
## Providers and models
AbstractCore uses a provider ID plus a model name:
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
# llm = create_llm("anthropic", model="claude-haiku-4-5")
# llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
# llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507")
# llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")
```
Tip: you can omit `model=...`, but it’s usually better to pass an explicit model to avoid surprises when defaults change.
Open-source-first: start with local providers (Ollama, LMStudio, MLX, HuggingFace), then add cloud or gateway providers as needed.
Gateway providers (OpenRouter, Portkey) examples:
```python
from abstractcore import create_llm
llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini")
llm_portkey = create_llm("portkey", model="gpt-5-mini", api_key="PORTKEY_API_KEY", config_id="pcfg_...")
```
Note: gateway providers only forward optional generation params (e.g. `temperature`, `top_p`, `max_output_tokens`) when you explicitly set them.
## Your first call
OpenAI example (requires `pip install "abstractcore[openai]"`):
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What is the capital of France?")
print(resp.content)
```
## Streaming
```python
from abstractcore import create_llm
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
for chunk in llm.generate("Write a short poem about distributed systems.", stream=True):
    print(chunk.content or "", end="", flush=True)
```
## Tool calling
AbstractCore supports native tool calling (when the provider supports it) and prompted tool syntax (when it doesn’t).
By default, tool execution is pass-through (`execute_tools=False`): you get tool calls in `resp.tool_calls`, and your host/runtime decides how to execute them.
In the AbstractFramework ecosystem, **AbstractRuntime** is the recommended runtime for executing tool calls durably (policy, retries, persistence). See Architecture and Tool Calling.
```python
from abstractcore import create_llm, tool
@tool
def get_weather(city: str) -> str:
    return f"{city}: 22°C and sunny"
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What's the weather in Paris? Use the tool.", tools=[get_weather])
print(resp.content)
print(resp.tool_calls)
```
See Tool Calling and Tool Syntax Rewriting (`tool_call_tags`, server `agent_format`).
### Built-in tools (optional)
If you want a ready-made toolset for agentic scripts, install:
```bash
pip install "abstractcore[tools]"
```
Then import from `abstractcore.tools.common_tools`:
- `skim_websearch` vs `web_search`: compact/filtered links vs full results
- `skim_url` vs `fetch_url`: fast URL triage (small output) vs full fetch + parsing for text-first types (HTML/JSON/text)
See Tool Calling for a recommended workflow and the full built-in tool list.
## Structured output
Pass a Pydantic model via `response_model=...` to get a typed result back (instead of parsing JSON yourself):
```python
from pydantic import BaseModel
from abstractcore import create_llm
class Answer(BaseModel):
    title: str
    bullets: list[str]
llm = create_llm("openai", model="gpt-4o-mini")
answer = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
print(answer.bullets)
```
See Structured Output for strategy details and limitations.
## Media input (images/audio/video + documents)
Images and document extraction require `pip install "abstractcore[media]"` (Pillow + PDF/Office deps).
```python
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
resp = llm.generate("Describe the image.", media=["./image.png"])
print(resp.content)
```
Audio and video attachments are also supported, but they are **policy-driven** (no silent semantic changes):
- audio: `audio_policy` (`native_only|speech_to_text|auto|caption`)
- video: `video_policy` (`native_only|frames_caption|auto`)
Speech-to-text fallback (`audio_policy="speech_to_text"`) typically requires installing `abstractvoice` (capability plugin).
What you need (quick checklist):
- **Images**: `abstractcore[media]` + either a vision-capable model (VLM/VL) **or** configured vision fallback (`abstractcore --set-vision-provider PROVIDER MODEL`).
- **Video**: `ffmpeg`/`ffprobe` on `PATH` + either a vision-capable model **or** configured vision fallback (for frame sampling). Native video input is model/provider dependent.
- **Audio**: either an audio-capable model **or** speech-to-text fallback via `abstractvoice` + `audio_policy="auto"`/`"speech_to_text"`.
Defaults can be configured via the config CLI (`abstractcore --config`, `abstractcore --status`). See Centralized Config.
If your main model is text-only, you can configure vision fallback (two-stage captioning) so images are automatically described and injected as short observations. See Media Handling, Vision Capabilities, and Centralized Config.
For long documents, AbstractCore can optionally apply Glyph visual-text compression. Install `pip install "abstractcore[compression]"` (and `pip install "abstractcore[media]"` for PDFs) and see Glyph Visual-Text Compression.
## Async
```python
import asyncio
from abstractcore import create_llm
async def main():
    llm = create_llm("openai", model="gpt-4o-mini")
    resp = await llm.agenerate("Give me 3 bullet points about HTTP caching.")
    print(resp.content)
asyncio.run(main())
```
## CLI (optional)
```bash
# Configure defaults and API keys
abstractcore --config
abstractcore --status
# Interactive chat
abstractcore-chat --provider openai --model gpt-4o-mini
```
## Next steps
- Prerequisites — provider setup (keys, base URLs, hardware notes)
- FAQ — common questions and setup gotchas
- Examples — end-to-end patterns and recipes
- API (Python) — public API map and common patterns
- API Reference — complete function/class listing
- Troubleshooting — common errors and fixes
- Server — OpenAI-compatible HTTP gateway
- Endpoint — single-model OpenAI-compatible endpoint (one provider/model per worker)
---
### Inlined: `docs/prerequisites.md`
# Prerequisites & Setup Guide
This guide walks you through setting up AbstractCore with different LLM providers. Choose the provider(s) that are suitable for your needs — you can use multiple providers in the same application.
## Quick Decision Guide
**Want to get started immediately?** → OpenAI Setup (requires API key)
**Want free local models?** → Ollama Setup (free, runs on your machine)
**Have Apple Silicon Mac?** → MLX Setup (optimized for M1/M2/M3/M4 chips)
**Have NVIDIA GPU?** → vLLM Setup (production GPU inference; NVIDIA CUDA only)
**Want a GUI for local models?** → LMStudio Setup (easiest local setup)
**Want a gateway/proxy?** → Gateway Provider Setup (OpenRouter/Portkey routing + governance)
**Using a custom OpenAI-compatible `/v1` endpoint?** → OpenAI-Compatible Setup
## Core Installation
Install AbstractCore, then add the extras you need:
```bash
# Core (small default)
pip install abstractcore
# Providers (only if you use them)
pip install "abstractcore[openai]" # OpenAI SDK
pip install "abstractcore[anthropic]" # Anthropic SDK
pip install "abstractcore[huggingface]" # Transformers / torch (heavy)
pip install "abstractcore[mlx]" # Apple Silicon only (heavy)
pip install "abstractcore[vllm]" # NVIDIA CUDA/ROCm only (heavy)
# Optional features
pip install "abstractcore[tools]" # built-in web tools (web_search, skim_websearch, skim_url, fetch_url)
pip install "abstractcore[media]" # images, PDFs, Office docs
pip install "abstractcore[embeddings]" # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]" # precise token counting (tiktoken)
pip install "abstractcore[server]" # OpenAI-compatible HTTP gateway
pip install "abstractcore[compression]" # Glyph visual-text compression (Pillow renderer)
# Turnkey "everything" installs (pick one)
pip install "abstractcore[all-apple]" # macOS/Apple Silicon (includes MLX, excludes vLLM)
pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac (excludes MLX and vLLM)
pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU (includes vLLM, excludes MLX)
```
**Hardware Notes:**
- `[mlx]` - Only works on Apple Silicon (M1/M2/M3/M4)
- `[vllm]` - Only works with NVIDIA CUDA GPUs
- `[all-apple]` - Best for Apple Silicon (includes MLX, excludes vLLM)
- `[all-non-mlx]` - Best for Linux/Windows/Intel Mac (excludes MLX and vLLM)
- `[all-gpu]` - Best for Linux NVIDIA GPU (includes vLLM, excludes MLX)
## Cloud Provider Setup
### OpenAI Setup
**Best for**: Production applications and OpenAI’s hosted models
#### 1. Get API Key
1. Go to OpenAI API Dashboard
2. Create account or sign in
3. Click "Create new secret key"
4. Copy the key (starts with `sk-`)
#### 2. Set Environment Variable
```bash
# Option 1: Export in terminal (temporary)
export OPENAI_API_KEY="sk-your-actual-api-key-here"
# Option 2: Add to ~/.bashrc or ~/.zshrc (permanent)
echo 'export OPENAI_API_KEY="sk-your-actual-api-key-here"' >> ~/.bashrc
source ~/.bashrc
# Option 3: Create .env file in your project
echo 'OPENAI_API_KEY=sk-your-actual-api-key-here' > .env
```
#### 3. Test Setup
```python
from abstractcore import create_llm
# Test with an example model (use any model available on your account)
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("Say hello in French")
print(response.content) # Should output: "Bonjour!"
```
**Model names**: Use any model supported by your account (examples: `gpt-4o-mini`, `gpt-4o`).
### Anthropic Setup
**Best for**: Claude models via Anthropic’s API
#### 1. Get API Key
1. Go to Anthropic Console
2. Create account or sign in
3. Go to "API Keys" section
4. Click "Create Key"
5. Copy the key (starts with `sk-ant-`)
#### 2. Set Environment Variable
```bash
# Option 1: Export in terminal (temporary)
export ANTHROPIC_API_KEY="sk-ant-your-actual-api-key-here"
# Option 2: Add to shell profile (permanent)
echo 'export ANTHROPIC_API_KEY="sk-ant-your-actual-api-key-here"' >> ~/.bashrc
source ~/.bashrc
# Option 3: Create .env file
echo 'ANTHROPIC_API_KEY=sk-ant-your-actual-api-key-here' > .env
```
#### 3. Test Setup
```python
from abstractcore import create_llm
# Test with an example model (use any model available on your account)
llm = create_llm("anthropic", model="claude-haiku-4-5")
response = llm.generate("Explain Python in one sentence")
print(response.content)
```
**Model names**: Use any model supported by your account (examples: `claude-haiku-4-5`, `claude-sonnet-4-5`).
### Gateway Provider Setup (OpenRouter, Portkey)
**Best for**: routing, observability/governance, and unified billing across multiple backends.
Gateways expose an OpenAI-compatible `/v1` endpoint and forward your payload to the routed backend model. Because some backends are strict (for example OpenAI reasoning families like gpt-5/o1 reject unsupported parameters), AbstractCore’s gateway providers forward optional generation parameters (like `temperature`, `top_p`, `max_output_tokens`) **only when explicitly set**.
#### OpenRouter Setup
1. Create an API key: https://openrouter.ai/keys
2. Set the environment variable:
```bash
export OPENROUTER_API_KEY="sk-or-..."
# Optional override (default: https://openrouter.ai/api/v1)
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
```
3. Test:
```python
from abstractcore import create_llm
llm = create_llm("openrouter", model="openai/gpt-4o-mini")
resp = llm.generate("Say hello in French")
print(resp.content)
```
#### Portkey Setup
Portkey routes requests using a **config id** (commonly `pcfg_...`).
1. Create an API key and config in Portkey, then copy:
- `PORTKEY_API_KEY`
- `PORTKEY_CONFIG` (config id)
2. Set environment variables:
```bash
export PORTKEY_API_KEY="pk_..."
export PORTKEY_CONFIG="pcfg_..."
# Optional override (default: https://api.portkey.ai/v1)
export PORTKEY_BASE_URL="https://api.portkey.ai/v1"
```
3. Test:
```python
from abstractcore import create_llm
llm = create_llm("portkey", model="gpt-4o-mini", config_id="pcfg_...")
resp = llm.generate("Say hello in French")
print(resp.content)
```
## Local Provider Setup
### Ollama Setup
**Best for**: Privacy, no API keys, offline usage, customization
**Requirements**: 8GB+ RAM, works on Mac/Linux/Windows
#### 1. Install Ollama
**macOS:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
# OR download from https://ollama.com/download
```
**Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
**Windows:**
1. Download installer from ollama.com/download
2. Run the installer
3. Restart terminal
#### 2. Start Ollama Service
```bash
# Start Ollama server (runs in background)
ollama serve
```
#### 3. Download Models
```bash
# Pull any model you want to use, then verify it's installed.
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama list
```
#### 4. Test Setup
```python
from abstractcore import create_llm
# Test with any model you installed via `ollama pull ...`
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
response = llm.generate("What is Python?")
print(response.content)
```
### MLX Setup
**Best for**: M1/M2/M3/M4 Macs, optimized inference, good speed
**Requirements**: Apple Silicon Mac (M1/M2/M3/M4)
#### 1. Install MLX Dependencies
```bash
# MLX is automatically installed with AbstractCore
pip install "abstractcore[mlx]"
```
#### 2. Download Models
MLX models are automatically downloaded when first used. Popular options:
```python
from abstractcore import create_llm
# Models are auto-downloaded on first use
llm = create_llm("mlx", model="mlx-community/Qwen2.5-Coder-7B-Instruct-4bit") # 4.2GB
# OR
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit") # 1.8GB
```
#### 3. Test Setup
```python
from abstractcore import create_llm
# Test with a good balance model
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
response = llm.generate("Explain machine learning briefly")
print(response.content)
```
**Popular MLX Models**:
- `mlx-community/Llama-3.2-3B-Instruct-4bit` - 1.8GB, fast
- `mlx-community/Qwen2.5-Coder-7B-Instruct-4bit` - 4.2GB, suitable for code
- `mlx-community/Llama-3.1-8B-Instruct-4bit` - 4.7GB, high quality
### LMStudio Setup
**Best for**: Easy GUI management, Windows users, non-technical users
**Requirements**: 8GB+ RAM, works on Mac/Linux/Windows
#### 1. Install LMStudio
1. Download from lmstudio.ai
2. Install the application
3. Launch LMStudio
#### 2. Download Models
1. Open LMStudio
2. Go to "Discover" tab
3. Search for recommended models:
- `microsoft/Phi-3-mini-4k-instruct-gguf` (small, fast)
- `microsoft/Phi-3-medium-4k-instruct-gguf` (medium quality)
- `meta-llama/Llama-2-7b-chat-gguf` (good general purpose)
4. Click download for your preferred model
#### 3. Start Local Server
1. Go to "Local Server" tab in LMStudio
2. Select your downloaded model
3. Click "Start Server"
4. Note the port (usually 1234)
#### 4. Test Setup
```python
from abstractcore import create_llm
# LM Studio exposes an OpenAI-compatible server (default: http://localhost:1234/v1).
# Use the model ID shown in LM Studio (or try "local-model" if unsure).
llm = create_llm("lmstudio", model="local-model", base_url="http://localhost:1234/v1")
resp = llm.generate("Hello, how are you?")
print(resp.content)
```
### HuggingFace Setup
**Best for**: Latest research models, custom models, GGUF files
**Requirements**: 8GB+ RAM, Python environment
#### 1. Install Dependencies
```bash
pip install "abstractcore[huggingface]"
```
#### 2. Optional: Get HuggingFace Token
For private models or higher rate limits:
1. Go to huggingface.co/settings/tokens
2. Create a "Read" token
3. Set environment variable:
```bash
export HUGGINGFACE_TOKEN="hf_your-token-here"
```
#### 3. Test Setup
```python
from abstractcore import create_llm
# Use a small model for testing (auto-downloads)
llm = create_llm("huggingface", model="microsoft/DialoGPT-medium")
response = llm.generate("Hello there!")
print(response.content)
```
**Popular HuggingFace Models**:
- `microsoft/DialoGPT-medium` - Good for conversation
- `facebook/blenderbot-400M-distill` - Conversational AI
- `microsoft/CodeBERT-base` - Code understanding
### vLLM Setup
**Best for**: Production GPU deployments, high-throughput inference, tensor parallelism
**Requirements**:
- **NVIDIA GPU with CUDA support** (A100, H100, RTX 4090, etc.)
- Linux operating system
- CUDA 12.1+ installed
- 16GB+ VRAM recommended
- **NOT compatible with**: Apple Silicon, AMD GPUs, CPU-only systems
**NVIDIA CUDA only.** If you’re on Apple Silicon, use MLX. If you’re on CPU-only, use Ollama/HuggingFace.
#### ⚠️ Hardware Compatibility Warning
**vLLM ONLY works with NVIDIA CUDA GPUs.** It will NOT work on:
- ❌ Apple Silicon (M1/M2/M3/M4) - Use MLX provider instead
- ❌ AMD GPUs - Use HuggingFace or Ollama instead
- ❌ Intel integrated graphics
- ❌ CPU-only systems
#### 1. Install vLLM
```bash
# Install AbstractCore with vLLM support
pip install "abstractcore[vllm]"
# This installs vLLM which requires NVIDIA CUDA
# If you get CUDA errors, ensure CUDA 12.1+ is installed:
# https://developer.nvidia.com/cuda-downloads
```
#### 2. Start vLLM Server
**IMPORTANT**: Check your GPU setup first to avoid Out Of Memory (OOM) errors:
```bash
# Check available GPUs
nvidia-smi
# Shows: GPU name, VRAM capacity, and current usage
# Example: 4x NVIDIA L4 (23GB each) = 92GB total
```
**Choose the right startup command based on your hardware:**
```bash
# Single GPU (24GB+) - Works for 7B-14B models
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8000
# Single GPU (24GB+) - For 30B models, reduce memory
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--port 8000 \
--gpu-memory-utilization 0.85 \
--max-model-len 4096
# Multiple GPUs (RECOMMENDED for 30B models) - Use tensor parallelism
# Example: 4x NVIDIA L4 (23GB each)
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--host 0.0.0.0 --port 8000 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--max-model-len 8192 \
--max-num-seqs 128
# Multiple GPUs + LoRA support (Production setup)
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--host 0.0.0.0 --port 8000 \
--tensor-parallel-size 4 \
--enable-lora --max-loras 4 \
--gpu-memory-utilization 0.9 \
--max-model-len 8192 \
--max-num-seqs 128
```
**Key Parameters:**
- `--tensor-parallel-size N` - Split model across N GPUs (REQUIRED for 30B+ models on <40GB GPUs)
- `--gpu-memory-utilization 0.9` - Use 90% of GPU memory (leave 10% for CUDA overhead)
- `--max-model-len` - Maximum context length (reduce if OOM)
- `--max-num-seqs` - Maximum concurrent sequences (128 recommended for 30B models, default 256 may cause OOM)
- `--enable-lora` - Enable dynamic LoRA adapter loading
- `--max-loras` - Maximum number of LoRA adapters to keep in memory
**Troubleshooting OOM Errors:**
If you see `CUDA out of memory` errors:
1. **Reduce concurrent sequences**: `--max-num-seqs 128` (or 64, 32 for tighter memory)
2. **Enable tensor parallelism**: `--tensor-parallel-size 2` (or 4, 8 depending on GPU count)
3. **Reduce memory usage**: `--gpu-memory-utilization 0.85 --max-model-len 4096`
4. **Use smaller model**: `Qwen/Qwen2.5-Coder-7B-Instruct` instead of 30B
5. **Use quantized model**: `Qwen/Qwen2.5-Coder-30B-Instruct-AWQ` (4-bit quantization)
**Test server is running:**
```bash
# Check server health
curl http://localhost:8000/health
# List available models
curl http://localhost:8000/v1/models
# Test generation
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Say hello"}],
"max_tokens": 50
}'
```
#### 3. Test Setup
```python
from abstractcore import create_llm
# Basic generation
llm = create_llm("vllm", model="Qwen/Qwen3-Coder-30B-A3B-Instruct")
response = llm.generate("Write a Python function to sort a list")
print(response.content)
# With guided JSON (vLLM-specific feature)
response = llm.generate(
    "List 3 programming languages",
    guided_json={
        "type": "object",
        "properties": {
            "languages": {"type": "array", "items": {"type": "string"}}
        }
    }
)
print(response.content)
```
#### 4. vLLM-Specific Features
**Guided Decoding** (syntax-constrained generation):
```python
# Regex-constrained generation
response = llm.generate(
    "Write a Python function",
    guided_regex=r"def \w+\([^)]*\):\n(?:\s{4}.*\n)+"
)
# JSON schema enforcement
response = llm.generate(
    "Extract person info",
    guided_json={"type": "object", "properties": {...}}
)
```
**Multi-LoRA** (1 base model → many specialized agents):
```python
# Load specialized adapters
llm.load_adapter("sql-expert", "/models/adapters/sql-lora")
llm.load_adapter("react-dev", "/models/adapters/react-lora")
# Route to specialized adapter
response = llm.generate("Write SQL query", model="sql-expert")
```
**Beam Search** (higher accuracy for complex tasks):
```python
response = llm.generate(
    "Solve this complex algorithm problem...",
    use_beam_search=True,
    best_of=5  # Generate 5 candidates, return best
)
```
#### Environment Variables
```bash
# vLLM server URL (default: http://localhost:8000/v1)
export VLLM_BASE_URL="http://192.168.1.100:8000/v1"
# Optional API key (if server started with --api-key)
export VLLM_API_KEY="your-api-key"
# HuggingFace cache (shared with HF/MLX providers)
export HF_HOME="~/.cache/huggingface"
```
**Available Models**:
- `Qwen/Qwen3-Coder-30B-A3B-Instruct` (default) - Excellent for code
- `meta-llama/Llama-3.1-8B-Instruct` - Good general purpose
- `mistralai/Mistral-7B-Instruct-v0.3` - Fast and efficient
- Any HuggingFace model compatible with vLLM
**Performance notes**: Throughput depends on model size, context length, concurrency, quantization, and GPU. See vLLM docs for tuning knobs (`--tensor-parallel-size`, `--max-model-len`, `--max-num-seqs`, …).
### OpenAI-Compatible Setup
**Best for**: any OpenAI-compatible `/v1` endpoint (llama.cpp servers, LocalAI, text-generation-webui, custom proxies, etc.)
AbstractCore supports a generic OpenAI-compatible provider plus specific convenience providers (LM Studio, vLLM, OpenRouter, Portkey).
#### 1. Get the endpoint base URL
You must include `/v1` for OpenAI-compatible servers:
```bash
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:1234/v1"
# Optional (if your endpoint requires auth)
export OPENAI_COMPATIBLE_API_KEY="your-api-key"
```
#### 2. Test Setup
```python
from abstractcore import create_llm
llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")
resp = llm.generate("Say hello in French")
print(resp.content)
```
## Troubleshooting
### Common Issues
#### "No module named .abstractcore."
```bash
# Make sure you installed AbstractCore
pip install abstractcore
```
#### "OpenAI API key not found"
```bash
# Check if environment variable is set
echo $OPENAI_API_KEY
# If empty, set it:
export OPENAI_API_KEY="sk-your-key-here"
```
#### "Connection error to Ollama"
```bash
# Make sure Ollama is running
ollama serve
# Check if models are available
ollama list
# Pull a model if none available
ollama pull gemma3:1b
```
#### "Model not found in MLX"
```python
# Use exact model names from HuggingFace MLX community
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
```
#### "LMStudio connection refused"
```bash
# Make sure LMStudio server is running on correct port
# Check LMStudio logs for the exact port and URL
```
### Memory Issues
#### "Out of memory" with local models
```bash
# Try smaller models
ollama pull gemma3:1b # Only 1.3GB
ollama pull tinyllama # Only 637MB
# Or increase swap space on Linux
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
#### MLX models too slow
```python
# Use 4-bit quantized models for faster inference
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
```
### API Key Issues
#### OpenAI billing issues
1. Check your billing dashboard
2. Add payment method if needed
3. Check usage limits
#### Anthropic rate limits
1. Check your console
2. Upgrade to higher tier if needed
3. Implement retry logic in your code
## Testing Your Setup
### Universal Test Script
Save this as `test_setup.py` and run it to test all your providers:
```python
#!/usr/bin/env python3
"""Test script for AbstractCore providers"""
import os
from abstractcore import create_llm
def test_provider(provider_name, model, **kwargs):
    """Test a specific provider"""
    try:
        print(f"\n🧪 Testing {provider_name} with {model}...")
        llm = create_llm(provider_name, model=model, **kwargs)
        response = llm.generate("Say 'Hello from AbstractCore!'")
        print(f"[OK] {provider_name}: {response.content}")
        return True
    except Exception as e:
        print(f"[FAIL] {provider_name}: {e}")
        return False

def main():
    print("AbstractCore Provider Test Suite")
    print("=" * 40)
    results = {}
    # Test cloud providers (if API keys available)
    if os.getenv("OPENAI_API_KEY"):
        results["OpenAI"] = test_provider("openai", "gpt-4o-mini")
    else:
        print("\n⚠️ Skipping OpenAI (no OPENAI_API_KEY)")
    if os.getenv("ANTHROPIC_API_KEY"):
        results["Anthropic"] = test_provider("anthropic", "claude-haiku-4-5")
    else:
        print("\n⚠️ Skipping Anthropic (no ANTHROPIC_API_KEY)")
    if os.getenv("OPENROUTER_API_KEY"):
        results["OpenRouter"] = test_provider("openrouter", "openai/gpt-4o-mini")
    else:
        print("\n⚠️ Skipping OpenRouter (no OPENROUTER_API_KEY)")
    # Test local providers
    results["Ollama"] = test_provider("ollama", "gemma3:1b")
    try:
        results["MLX"] = test_provider("mlx", "mlx-community/Llama-3.2-3B-Instruct-4bit")
    except:
        print("\n⚠️ Skipping MLX (not on Apple Silicon or model not available)")
    try:
        # Note: OpenAI-compatible servers expect `/v1` in the base URL (LM Studio default is http://localhost:1234/v1)
        results["LMStudio"] = test_provider("lmstudio", "qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
    except:
        print("\n⚠️ Skipping LMStudio (server not running on localhost:1234)")
    # Summary
    print("\n" + "=" * 40)
    print("Test Results:")
    working = [name for name, success in results.items() if success]
    if working:
        print(f"[OK] Working providers: {', '.join(working)}")
    else:
        print("[FAIL] No providers working")
        print("\n[INFO] Next steps:")
        print("- Add API keys for cloud providers")
        print("- Install Ollama and download models")
        print("- Start LMStudio local server")
        print("- See docs/prerequisites.md for detailed setup")

if __name__ == "__main__":
    main()
```
Run the test:
```bash
python test_setup.py
```
### Live API smoke tests (opt-in)
Some tests are intentionally **real network calls** and are disabled by default. To enable them, set:
- `ABSTRACTCORE_RUN_LIVE_API_TESTS=1`
Example (OpenRouter):
```bash
ABSTRACTCORE_RUN_LIVE_API_TESTS=1 OPENROUTER_API_KEY="$OPENROUTER_API_KEY" \
python -m pytest -q tests/test_graceful_fallback.py::test_openrouter_generation_smoke
```
Local provider smoke tests use `ABSTRACTCORE_RUN_LOCAL_PROVIDER_TESTS=1` (and `ABSTRACTCORE_RUN_MLX_TESTS=1` for MLX).
## Security Notes
### API Keys
- Never commit API keys to version control
- Use environment variables or `.env` files
- Rotate keys periodically
- Monitor usage for unexpected spikes
### Local Models
- Local models keep data on your machine
- No internet required after initial download
- Models can be large (1GB-20GB+)
- Some models may have usage restrictions
### Network Security
- LMStudio and Ollama servers run locally by default
- Be careful exposing servers to network (use authentication)
- Consider firewall rules for production deployments
This setup guide should get you running with any AbstractCore provider. Choose what works well for your use case - you can always add more providers later!
---
### Inlined: `docs/api.md`
# API (Python)
This page is a user-facing map of the **public Python API** exposed from `abstractcore` (see `abstractcore/__init__.py`). For a complete listing of functions/classes (including events), see **API Reference**.
New to AbstractCore? Start with **Getting Started**.
Implementation pointers (source of truth):
- `create_llm`: `abstractcore/core/factory.py` → `abstractcore/providers/registry.py`
- `BasicSession`: `abstractcore/core/session.py`
- Response/types: `abstractcore/core/types.py`
- Tool decorator: `abstractcore/tools/core.py`
## Core entrypoints
### `create_llm(...)`
Create a provider instance:
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini") # requires: pip install "abstractcore[openai]"
resp = llm.generate("Hello!")
print(resp.content)
```
Provider IDs (common): `openai`, `anthropic`, `openrouter`, `portkey`, `ollama`, `lmstudio`, `vllm`, `openai-compatible`, `huggingface`, `mlx`.
### Gateway providers (OpenRouter, Portkey)
```python
from abstractcore import create_llm
llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini")
llm_portkey = create_llm("portkey", model="gpt-5-mini", api_key="PORTKEY_API_KEY", config_id="pcfg_...")
```
Gateway notes:
- OpenRouter uses `OPENROUTER_API_KEY` (model names like `openai/...`).
- Portkey uses `PORTKEY_API_KEY` plus a config id (`PORTKEY_CONFIG`).
- Optional generation parameters (`temperature`, `top_p`, `max_output_tokens`, etc.) are only forwarded when explicitly set.
### `BasicSession`
Keep conversation state:
```python
from abstractcore import BasicSession, create_llm
session = BasicSession(create_llm("anthropic", model="claude-haiku-4-5")) # requires: abstractcore[anthropic]
print(session.generate("Give me 3 name ideas.").content)
print(session.generate("Pick the best one.").content)
```
### `tool` (decorator)
Define tools in Python with a decorator, then pass them to `generate()` / `agenerate()`:
```python
from abstractcore import create_llm, tool
@tool
def get_weather(city: str) -> str:
    return f"{city}: 22°C and sunny"
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Use the tool.", tools=[get_weather])
print(resp.tool_calls)
```
## Responses (`GenerateResponse`)
Most calls return a `GenerateResponse` object (or an iterator of them for streaming). Common fields:
- `content`: cleaned assistant text
- `tool_calls`: structured tool calls (pass-through by default)
- `usage`: token usage (provider-dependent)
- `metadata`: provider/model specific fields (for example extracted reasoning text when configured)
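A quick way to inspect these fields on a non-streaming call (exact values and availability vary by provider):
```python
from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Say hello in French.")

print(resp.content)     # cleaned assistant text
print(resp.tool_calls)  # None here (no tools passed); otherwise a list of structured calls
print(resp.usage)       # token usage (provider-dependent, may be absent)
print(resp.metadata)    # provider/model specific extras (e.g. extracted reasoning when configured)
```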
## Model downloads (`download_model`, optional)
`download_model(...)` is an **async generator** that yields `DownloadProgress` updates while a model is being fetched.
Supported providers:
- `ollama`: pulls via the Ollama HTTP API (`/api/pull`)
- `huggingface` / `mlx`: downloads from HuggingFace Hub (requires `pip install "abstractcore[huggingface]"`; pass `token=` for gated models)
Example:
```python
import asyncio
from abstractcore import download_model
async def main():
async for p in download_model("ollama", "qwen3:4b-instruct-2507-q4_K_M"):
print(p.status.value, p.message)
asyncio.run(main())
```
Implementation: `abstractcore/download.py`. For provider setup and base URLs, see Prerequisites.
## Tool calling
Tools are passed explicitly to `generate()` / `agenerate()`:
```python
from abstractcore import create_llm, tool
@tool
def get_weather(city: str) -> str:
return f"{city}: 22°C and sunny"
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Use the tool.", tools=[get_weather])
print(resp.tool_calls)
```
See **Tool Calling** and **Tool Syntax Rewriting**.
### Built-in tools (optional)
If you want a ready-made toolset (web + filesystem helpers), install:
```bash
pip install "abstractcore[tools]"
```
Then import from `abstractcore.tools.common_tools` (for example `web_search`, `skim_websearch`, `skim_url`, `fetch_url`). See **Tool Calling** for usage patterns and when to use `skim_*` vs `fetch_*`.
## Structured output
Pass a Pydantic model via `response_model=...` to receive a typed result:
```python
from pydantic import BaseModel
from abstractcore import create_llm
class Answer(BaseModel):
title: str
bullets: list[str]
llm = create_llm("openai", model="gpt-4o-mini")
result = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
print(result.bullets)
```
See **Structured Output**.
## Media input
Media handling is opt-in:
```bash
pip install "abstractcore[media]"
```
Then pass `media=[...]` to `generate()` / `agenerate()` (or use the media pipeline). Media behavior is **policy-driven**:
- Images: use a vision-capable model, or configure vision fallback (caption → inject short observations).
- Video: controlled by `video_policy` (native when supported; otherwise frame sampling via `ffmpeg` + vision handling).
- Audio: controlled by `audio_policy` (native when supported; otherwise optional speech-to-text via `abstractvoice`).
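A minimal sketch (this assumes `media` accepts local file paths and that the model is vision-capable or a vision fallback is configured):
```python
from abstractcore import create_llm

# Assumption: "photo.jpg" is a local image file on disk.
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("Describe this image in one sentence.", media=["photo.jpg"])
print(resp.content)
```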
See **Media Handling**, **Vision Capabilities**, and **Centralized Config**.
## HTTP API (optional)
If you want an OpenAI-compatible `/v1` gateway, install and run the server:
```bash
pip install "abstractcore[server]"
python -m abstractcore.server.app
```
See **Server**.
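Once the server is running, a minimal smoke test (default port shown; swap in a model that is actually available on your machine):
```bash
curl -sS http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/qwen3:4b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```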
---
### Inlined: `docs/session.md`
# Session Management and Serialization
AbstractCore provides comprehensive session management with complete serialization capabilities, preserving every aspect of your conversations including metadata, tool executions, and optional analytics.
## Overview
A **BasicSession** represents a complete conversation with an LLM, including:
- All messages with timestamps and metadata
- Tool calls and their results (inline with conversation flow)
- Session configuration and settings
- Optional analytics: summary, assessment, and extracted facts
## API Design: Two Methods for Different Purposes
The `BasicSession` provides two main methods for managing conversation history:
### `generate()` - For Normal Conversations (Recommended)
Use this for typical chat interactions where you want the LLM to respond:
```python
# Normal conversation flow
response = session.generate("What is Python?", name="alice")
# This automatically:
# 1. Adds your message to history
# 2. Calls the LLM provider
# 3. Adds the assistant's response to history
# 4. Returns a GenerateResponse object with full metadata
# Access the response data
print(f"Response: {response.content}") # Generated text
print(f"Tokens used: {response.total_tokens}") # Token count
print(f"Generation time: {response.gen_time}ms") # Performance metrics
```
### `add_message()` - For Manual History Management
Use this when you need fine-grained control over conversation history:
```python
# Add system messages
session.add_message('system', 'You are a helpful assistant.')
# Add messages without triggering LLM generation
session.add_message('user', 'Hello!', name='alice')
session.add_message('assistant', 'Hi there!')
# Add tool messages
session.add_message('tool', '{"result": "success"}')
```
**Key Difference**: `generate()` triggers LLM response generation, `add_message()` only adds to history.
**Parameter Consistency**: Both methods accept a `name` parameter, which aligns with the `metadata.name` field in the serialization schema.
## Session Serialization
### Why Serialize Sessions?
Session serialization enables:
- **Persistence**: Save and restore conversations across application restarts
- **Portability**: Share conversations between different environments
- **Analytics**: Generate summaries, assessments, and fact extractions
- **Auditing**: Complete conversation history with tool executions
- **Memory Management**: Load partial conversation windows while preserving full history
### Serialization Format
Sessions are serialized as JSON with a versioned schema for future compatibility:
```json
{
"schema_version": "session-archive/v1",
"session": {
"id": "sess_01J8...",
"created_at": "2025-10-13T14:52:46Z",
"provider": "openai",
"model": "gpt-4o-mini",
"system_prompt": "You are a helpful AI assistant.",
"settings": { "auto_compact": true },
"summary": { /* optional */ },
"assessment": { /* optional */ },
"facts": { /* optional */ }
},
"messages": [ /* complete conversation history */ ]
}
```
### Field Descriptions
#### Session Fields
- **`id`**: Unique session identifier for tracking and correlation
- **`created_at`**: ISO timestamp of session creation
- **`provider`**: LLM provider used (openai, anthropic, ollama, etc.)
- **`model`**: Specific model name (gpt-4o-mini, claude-haiku-4-5, etc.)
- **`model_params`**: Model parameters used (temperature, max_tokens, etc.)
- **`system_prompt`**: The system prompt that guides the assistant's behavior
- **`tool_registry`**: Available tools with their schemas (declarative, no executable code)
- **`settings`**: Session configuration (auto_compact, thresholds, etc.)
#### Optional Analytics Fields
- **`summary`** *(optional)*: Compressed representation of the entire conversation
- `created_at`: When the summary was generated
- `preserve_recent`: Number of recent messages preserved during compaction
- `focus`: Summary focus (e.g., "technical decisions", "key outcomes")
- `text`: The actual summary content
- `metrics`: Compression statistics (tokens before/after, ratio)
- **`assessment`** *(optional)*: Quality evaluation of the entire conversation
- `created_at`: When the assessment was generated
- `criteria`: Evaluation criteria used (clarity, coherence, relevance, etc.)
- `overall_score`: Numeric score (typically 1-5)
- `judge_summary`: Brief assessment summary
- `strengths`: List of conversation strengths
- `actionable_feedback`: Suggestions for improvement
- **`facts`** *(optional)*: Extracted facts and knowledge from the conversation
- `extracted_at`: When facts were extracted
- `simple_triples`: Array of [subject, predicate, object] fact triples
- `jsonld`: Optional JSON-LD structured data
- `statistics`: Extraction statistics (entity count, relationship count)
#### Message Structure
Each message preserves the complete conversational flow:
```json
{
"id": "msg_01J8...",
"role": "user|assistant|system|tool",
"timestamp": "2025-10-13T14:55:20.123Z",
"content": "Message content",
"metadata": {
"name": "alice",
"location": "London, UK",
"custom_field": "value"
}
}
```
**Message Fields:**
- **`id`**: Unique message identifier
- **`role`**: Message role (user, assistant, system, tool)
- **`timestamp`**: When the message was created (auto-generated)
- **`content`**: The actual message content
- **`metadata`**: Flexible container for additional context
- `name`: Username (defaults to "user" for user messages)
- `location`: Geographic or contextual location
- Any additional custom fields
#### Tool Execution Flow
Tool calls are preserved inline with the conversation to maintain sequence:
```json
[
{
"role": "assistant",
"content": "Let me read that file for you.",
"metadata": {
"requested_tool_calls": [
{
"call_id": "tc_01K",
"name": "read_file",
"arguments": { "path": "README.md" }
}
]
}
},
{
"role": "tool",
"content": "File contents...",
"metadata": {
"call_id": "tc_01K",
"name": "read_file",
"arguments": { "path": "README.md" },
"status": "ok",
"duration_ms": 120,
"stderr": null
}
}
]
```
This approach:
- Preserves exact execution order
- Links tool calls to results via `call_id`
- Captures execution metadata (duration, status, errors)
- Maintains human-readable conversation flow
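A host-side sketch of recording such a round-trip (this assumes extra keyword arguments to `add_message()` are stored under the message's `metadata`, as with `name`/`location` above; `run_read_file` is a hypothetical host-side executor):
```python
# `session` is a BasicSession (see Usage Examples below).
# Record the assistant's tool request, execute it in the host, then record the result.
session.add_message(
    'assistant',
    'Let me read that file for you.',
    requested_tool_calls=[{"call_id": "tc_01K", "name": "read_file",
                           "arguments": {"path": "README.md"}}],
)
result = run_read_file("README.md")  # hypothetical: the host actually executes the tool
session.add_message(
    'tool',
    result,
    call_id="tc_01K",
    name="read_file",
    arguments={"path": "README.md"},
    status="ok",
)
```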
## Usage Examples
### Basic Session Persistence
```python
from abstractcore import BasicSession, create_llm
# Create and use session
provider = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(provider, system_prompt="You are a helpful assistant.")
session.add_message('user', 'Hello!', name='alice', location='Paris')
response = session.generate('What is Python?')
# Save complete session
session.save('conversation.json')
# Load session later
loaded_session = BasicSession.load('conversation.json', provider=provider)
```
### Session with Analytics
```python
# Generate optional analytics
session.generate_summary(focus="technical discussion")
session.generate_assessment(criteria=["clarity", "completeness"])
session.extract_facts()
# Save with all analytics
session.save('analyzed_conversation.json')
```
### Memory Window Management
```python
# Get recent messages for LLM context
recent_messages = session.get_window(last_n=10)
# Get messages within time range
today_messages = session.get_window(
since="2025-10-13T00:00:00Z",
until="2025-10-13T23:59:59Z"
)
# Get messages with token budget
windowed = session.get_window(
token_budget=4000,
include_summary=True # Prepend summary if context trimmed
)
```
## Schema Reference
The complete JSON schema is available at: `abstractcore/assets/session_schema.json`
This schema can be used for:
- Validation of serialized sessions
- Integration with other tools and systems
- Documentation generation
- API contract definition
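For example, a saved archive can be checked against the shipped schema (a sketch; uses the third-party `jsonschema` package and reads the schema from the path above, relative to the repo or installed package):
```python
import json
from jsonschema import validate  # third-party: pip install jsonschema

with open("abstractcore/assets/session_schema.json") as f:  # schema path as documented above
    schema = json.load(f)
with open("conversation.json") as f:  # an archive produced by session.save(...)
    archive = json.load(f)

validate(instance=archive, schema=schema)  # raises jsonschema.ValidationError on mismatch
```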
## CLI Integration
The CLI provides convenient commands for session management:
```bash
# Save session with basic serialization
/session save my_conversation
# Save with optional analytics
/session save analyzed_session --summary --assessment --facts
# Load session
/session load my_conversation
# Generate analytics on demand
/facts
/judge
# Optional: persist local prompt/KV cache (MLX only)
/cache save chat_cache
/cache load chat_cache
```
## Best Practices
### When to Use Analytics
- **Summary**: For long conversations that need compaction
- **Assessment**: For evaluating conversation quality and outcomes
- **Facts**: For knowledge extraction and structured data needs
### Performance Considerations
- Analytics are optional and computed on-demand
- Large conversations benefit from windowing for active memory
- JSON format balances human readability with performance
- Consider compression (gzip/zstd) for very large sessions
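For example, gzip-compressing an archive after saving it (standard library only; `session` is a `BasicSession` as in the examples above):
```python
import gzip
import shutil

session.save("conversation.json")
with open("conversation.json", "rb") as src, gzip.open("conversation.json.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)  # conversation.json.gz is typically much smaller
```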
### Security and Privacy
- Sessions contain complete conversation history
- Metadata may include sensitive information (usernames, locations)
- If your host/runtime appends tool results into the conversation, those results will be preserved (and may contain file contents, etc.)
- Store sessions securely and consider data retention policies
## Migration and Compatibility
The versioned schema (`session-archive/v1`) ensures:
- Backward compatibility with older session formats
- Forward compatibility through graceful field handling
- Safe evolution of the serialization format
Legacy sessions are automatically migrated on load, preserving all available data while upgrading to the current schema.
---
### Inlined: `docs/async-guide.md`
# Async/Await Guide
Complete guide to using async/await with AbstractCore for concurrent LLM operations.
## Overview
AbstractCore exposes `agenerate()` for async generation across providers.
- **HTTP-based providers** (OpenAI-compatible endpoints, OpenRouter, Ollama, LMStudio, vLLM, etc.) implement native async I/O.
- **In-process local inference** providers (MLX, HuggingFace) use an `asyncio.to_thread()` fallback to avoid blocking the event loop.
Concurrency can improve throughput when requests are **I/O-bound** (network calls). For local inference, throughput is limited by your hardware and the model runtime.
## Provider support
| Provider | Async implementation |
|----------|----------------------|
| `openai`, `anthropic` | Native async SDK clients (when installed) |
| HTTP-based providers (`ollama`, `lmstudio`, `openrouter`, `vllm`, `openai-compatible`, …) | `httpx.AsyncClient` (native async HTTP) |
| `mlx`, `huggingface` | `asyncio.to_thread()` fallback (keeps the event loop responsive) |
## Basic Usage
### Single Async Request
```python
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
# Single async request
response = await llm.agenerate("What is Python?")
print(response.content)
asyncio.run(main())
```
### Concurrent Requests
```python
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("ollama", model="qwen3:4b")
# Execute 3 requests concurrently
tasks = [
llm.agenerate(f"Summarize {topic}")
for topic in ["Python", "JavaScript", "Rust"]
]
# Gather runs all tasks concurrently
responses = await asyncio.gather(*tasks)
for i, response in enumerate(responses):
print(f"\n{['Python', 'JavaScript', 'Rust'][i]}:")
print(response.content)
asyncio.run(main())
```
## Async Streaming
### Basic Streaming
```python
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("anthropic", model="claude-haiku-4-5")
# Step 1: await the generator
stream_gen = await llm.agenerate(
"Write a haiku about coding",
stream=True
)
# Step 2: async for over the chunks
async for chunk in stream_gen:
if chunk.content:
print(chunk.content, end="", flush=True)
print()
asyncio.run(main())
```
### Concurrent Streaming
```python
import asyncio
from abstractcore import create_llm
async def stream_response(llm, topic, label):
"""Stream a single response with label."""
print(f"\n{label}:")
stream_gen = await llm.agenerate(f"Explain {topic} in one sentence", stream=True)
async for chunk in stream_gen:
if chunk.content:
print(chunk.content, end="", flush=True)
print()
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
# Stream 3 responses concurrently
await asyncio.gather(
stream_response(llm, "Python", "Python"),
stream_response(llm, "JavaScript", "JavaScript"),
stream_response(llm, "Rust", "Rust")
)
asyncio.run(main())
```
## Session Async
### Async Conversation Management
```python
import asyncio
from abstractcore import create_llm
from abstractcore.core.session import BasicSession
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(provider=llm)
# Maintain conversation history with async
response1 = await session.agenerate("What is Python?")
print(response1.content)
response2 = await session.agenerate("What are its main use cases?")
print(response2.content)
# Session tracks full conversation history
print(f"\nConversation length: {len(session.conversation_history)} messages")
asyncio.run(main())
```
### Concurrent Sessions
```python
import asyncio
from abstractcore import create_llm
from abstractcore.core.session import BasicSession
async def chat_session(llm, topic, name):
"""Run independent chat session."""
session = BasicSession(provider=llm)
response1 = await session.agenerate(f"What is {topic}?")
response2 = await session.agenerate("Give me a simple example")
print(f"\n{name}:")
print(f" Question 1: {response1.content[:50]}...")
print(f" Question 2: {response2.content[:50]}...")
async def main():
llm = create_llm("anthropic", model="claude-haiku-4-5")
# Run 3 independent conversations concurrently
await asyncio.gather(
chat_session(llm, "Python", "Session 1"),
chat_session(llm, "JavaScript", "Session 2"),
chat_session(llm, "Rust", "Session 3")
)
asyncio.run(main())
```
## Multi-Provider Comparisons
### Concurrent Provider Queries
```python
import asyncio
from abstractcore import create_llm
async def query_provider(provider_name, model, prompt):
"""Query a single provider."""
llm = create_llm(provider_name, model=model)
response = await llm.agenerate(prompt)
return {
"provider": provider_name,
"model": model,
"response": response.content
}
async def main():
prompt = "What is the capital of France?"
# Query multiple providers simultaneously
results = await asyncio.gather(
query_provider("openai", "gpt-4o-mini", prompt),
query_provider("anthropic", "claude-haiku-4-5", prompt),
query_provider("ollama", "qwen3:4b", prompt)
)
for result in results:
print(f"\n{result['provider']} ({result['model']}):")
print(result['response'])
asyncio.run(main())
```
### Provider Consensus
```python
import asyncio
from abstractcore import create_llm
async def main():
prompt = "Is the Earth flat? Answer yes or no."
# Get consensus from 3 providers
llm_openai = create_llm("openai", model="gpt-4o-mini")
llm_anthropic = create_llm("anthropic", model="claude-haiku-4-5")
llm_ollama = create_llm("ollama", model="qwen3:4b")
responses = await asyncio.gather(
llm_openai.agenerate(prompt),
llm_anthropic.agenerate(prompt),
llm_ollama.agenerate(prompt)
)
    answers = [r.content.strip().lower() for r in responses]
    print(f"Answers: {answers}")
    # Exact string matching is brittle; count answers that start with "no".
    no_votes = sum(1 for a in answers if a.startswith("no"))
    print(f"Consensus (Earth is not flat): {'Yes' if no_votes >= 2 else 'No'}")
asyncio.run(main())
```
## FastAPI Integration
### Async HTTP Endpoints
```python
import asyncio
from fastapi import FastAPI
from abstractcore import create_llm
app = FastAPI()
llm = create_llm("openai", model="gpt-4o-mini")
@app.post("/generate")
async def generate(prompt: str):
"""Non-blocking LLM generation endpoint."""
response = await llm.agenerate(prompt)
return {"response": response.content}
@app.post("/batch")
async def batch_generate(prompts: list[str]):
"""Process multiple prompts concurrently."""
tasks = [llm.agenerate(p) for p in prompts]
responses = await asyncio.gather(*tasks)
return {
"responses": [r.content for r in responses]
}
# Run with: uvicorn your_app:app --reload
```
### Streaming Endpoint
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from abstractcore import create_llm
import asyncio
app = FastAPI()
llm = create_llm("anthropic", model="claude-haiku-4-5")
async def stream_response(prompt: str):
"""Generate streaming response."""
stream_gen = await llm.agenerate(prompt, stream=True)
async for chunk in stream_gen:
if chunk.content:
yield f"data: {chunk.content}\n\n"
@app.post("/stream")
async def stream_generate(prompt: str):
"""Streaming LLM generation endpoint."""
return StreamingResponse(
stream_response(prompt),
media_type="text/event-stream"
)
```
## Batch Document Processing
### Concurrent Document Summaries
```python
import asyncio
from abstractcore import create_llm
from abstractcore.processing import Summarizer
async def summarize_document(summarizer, doc_path):
    """Summarize a single document without blocking the event loop."""
    # Summarizer.summarize() is synchronous, so run it in a worker thread
    # so that several documents can actually be processed concurrently.
    result = await asyncio.to_thread(
        summarizer.summarize,
        input_source=doc_path,
        style="executive",
        length="brief",
    )
    return {
        "path": doc_path,
        "summary": result.summary
    }
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
summarizer = Summarizer(llm)
documents = [
"report1.pdf",
"report2.pdf",
"report3.pdf"
]
# Summarize all documents concurrently
tasks = [summarize_document(summarizer, doc) for doc in documents]
results = await asyncio.gather(*tasks)
for result in results:
print(f"\n{result['path']}:")
print(result['summary'])
asyncio.run(main())
```
## Error Handling
### Graceful Error Recovery
```python
import asyncio
from abstractcore import create_llm
from abstractcore.exceptions import ProviderAPIError
async def safe_generate(llm, prompt, label):
"""Generate with error handling."""
try:
response = await llm.agenerate(prompt)
return {"label": label, "content": response.content, "error": None}
except ProviderAPIError as e:
return {"label": label, "content": None, "error": str(e)}
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
# Some requests may fail - continue processing others
results = await asyncio.gather(
safe_generate(llm, "Valid prompt 1", "Task 1"),
safe_generate(llm, "Valid prompt 2", "Task 2"),
safe_generate(llm, "Valid prompt 3", "Task 3")
)
for result in results:
if result["error"]:
print(f"{result['label']}: ERROR - {result['error']}")
else:
print(f"{result['label']}: {result['content']}")
asyncio.run(main())
```
## Practical tips
### 1. Prefer native-async providers when possible
```python
# ✅ Native async HTTP (I/O-bound)
llm = create_llm("ollama", model="qwen3:4b")
# ✅ Native async SDK (cloud APIs)
llm = create_llm("openai", model="gpt-4o-mini")
# ⚠️ Fallback: runs sync generation in a thread (keeps the event loop responsive)
llm = create_llm("mlx", model="mlx-community/Qwen3-4B-4bit")
```
### 2. Batch Similar Operations
```python
# ✅ GOOD: Single gather for all tasks
tasks = [llm.agenerate(f"Task {i}") for i in range(10)]
results = await asyncio.gather(*tasks)
# ❌ BAD: Sequential awaits lose concurrency benefit
results = []
for i in range(10):
result = await llm.agenerate(f"Task {i}")
results.append(result)
```
### 3. Mix Async with Sync I/O
```python
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("anthropic", model="claude-haiku-4-5")
# Concurrent: LLM generation + file I/O
llm_task = llm.agenerate("Explain async")
file_task = asyncio.to_thread(read_large_file, "data.txt")
response, data = await asyncio.gather(llm_task, file_task)
# Both completed concurrently!
```
## Common Patterns
### Retry with Exponential Backoff
```python
import asyncio
from abstractcore import create_llm
async def generate_with_retry(llm, prompt, max_retries=3):
"""Generate with exponential backoff retry."""
for attempt in range(max_retries):
try:
return await llm.agenerate(prompt)
except Exception as e:
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt
print(f"Retry {attempt + 1} after {wait_time}s...")
await asyncio.sleep(wait_time)
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
response = await generate_with_retry(llm, "What is Python?")
print(response.content)
asyncio.run(main())
```
### Rate Limiting
```python
import asyncio
from abstractcore import create_llm
class RateLimiter:
    """Allow at most `max_per_second` requests to start in any 1-second window."""
    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.semaphore = asyncio.Semaphore(max_per_second)

    async def acquire(self):
        await self.semaphore.acquire()
        # Return the permit one second later, so at most `max_per_second`
        # acquisitions can happen per rolling second.
        asyncio.get_running_loop().call_later(1.0, self.semaphore.release)
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
limiter = RateLimiter(max_per_second=5)
# Process 20 requests with 5 requests/second limit
async def limited_generate(prompt):
await limiter.acquire()
return await llm.agenerate(prompt)
tasks = [limited_generate(f"Task {i}") for i in range(20)]
results = await asyncio.gather(*tasks)
asyncio.run(main())
```
### Progress Tracking
```python
import asyncio
from abstractcore import create_llm
async def generate_with_progress(llm, prompts):
"""Generate with real-time progress tracking."""
completed = 0
total = len(prompts)
async def track_task(prompt):
nonlocal completed
response = await llm.agenerate(prompt)
completed += 1
print(f"Progress: {completed}/{total} ({completed/total*100:.1f}%)")
return response
tasks = [track_task(p) for p in prompts]
return await asyncio.gather(*tasks)
async def main():
llm = create_llm("ollama", model="qwen3:4b")
prompts = [f"Task {i}" for i in range(10)]
results = await generate_with_progress(llm, prompts)
print(f"\nCompleted {len(results)} tasks!")
asyncio.run(main())
```
## Why MLX/HuggingFace Use Fallback
MLX and HuggingFace providers use `asyncio.to_thread()` fallback because:
1. **No Async Library APIs**: Neither `mlx_lm` nor `transformers` expose async Python APIs
2. **Direct Function Calls**: No HTTP layer to enable concurrent I/O
3. **Industry Standard**: Same pattern used by LangChain and Pydantic-AI for CPU-bound operations
4. **Event Loop Responsive**: The fallback keeps the event loop responsive, so inference can be mixed with async I/O
```python
# MLX/HF async example (fallback keeps event loop responsive)
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("mlx", model="mlx-community/Qwen3-4B-4bit")
# Can mix MLX inference with async I/O
inference_task = llm.agenerate("What is Python?")
io_task = fetch_data_from_api() # Async I/O
# Both run concurrently - event loop not blocked!
response, data = await asyncio.gather(inference_task, io_task)
asyncio.run(main())
```
If you run local inference behind an OpenAI-compatible HTTP server (for example, via LM Studio), you can use the `lmstudio` (or `openai-compatible`) provider for native async I/O to the server:
```python
llm = create_llm("lmstudio", model="local-model", base_url="http://localhost:1234/v1")
```
## Best Practices
### 1. Always Use asyncio.gather() for Concurrent Tasks
```python
# ✅ CORRECT: All tasks run concurrently
results = await asyncio.gather(*[llm.agenerate(p) for p in prompts])
# ❌ WRONG: Sequential execution (no concurrency)
results = [await llm.agenerate(p) for p in prompts]
```
### 2. Await Stream Generator First
```python
# ✅ CORRECT: Two-step pattern
stream_gen = await llm.agenerate(prompt, stream=True)
async for chunk in stream_gen:
print(chunk.content, end="")
# ❌ WRONG: Missing await before async for
async for chunk in llm.agenerate(prompt, stream=True): # Error!
print(chunk.content, end="")
```
### 3. Close Resources Properly
```python
# ✅ GOOD: Clean shutdown
llm = create_llm("openai", model="gpt-4o-mini")
try:
response = await llm.agenerate("Test")
finally:
llm.unload_model(llm.model) # Closes async client
```
### 4. Handle Errors in Concurrent Operations
```python
# ✅ GOOD: Catch errors per-task
async def safe_task(prompt):
try:
return await llm.agenerate(prompt)
except Exception as e:
return f"Error: {e}"
results = await asyncio.gather(*[safe_task(p) for p in prompts])
```
## Learning Resources
- **Educational Demo**: `examples/async_cli_demo.py` - 8 core async/await patterns
- **Test Suite**: `tests/async/test_async_providers.py` - real implementation examples
- **Concurrency & Throughput**: `concurrency.md` - practical guidance for local inference
## Summary
- ✅ `agenerate()` works across providers
- ✅ Use `asyncio.gather()` for concurrent (I/O-bound) requests
- ✅ HTTP-based providers use native async; MLX/HuggingFace use a thread fallback to keep the event loop responsive
- ✅ Async streaming uses a 2-step pattern: `stream_gen = await llm.agenerate(..., stream=True)` then `async for ...`
- ✅ Works well in FastAPI and other async frameworks
**Get Started**:
```bash
pip install abstractcore
# Try the educational async demo
python examples/async_cli_demo.py --provider ollama --model qwen3:4b
```
---
### Inlined: `docs/tool-calling.md`
# Tool Calling System
AbstractCore provides a universal tool calling system that works across all LLM providers, even those without native tool support.
## Table of Contents
- Quick Start
- The @tool Decorator
- Universal Tool Support
- Tool Definition
- Tool Execution
- Advanced Patterns
- Error Handling
- Performance Optimization
- Tool Syntax Rewriting
- Event System Integration
## Quick Start
The simplest way to create and use tools is with the `@tool` decorator:
```python
from abstractcore import create_llm, tool
@tool
def get_weather(city: str) -> str:
"""Get current weather for a specified city."""
# In a real scenario, you'd call an actual weather API
return f"The weather in {city} is sunny, 72°F"
@tool
def calculate(expression: str) -> float:
"""Perform a mathematical calculation."""
try:
result = eval(expression) # Simplified for demo - don't use eval in production!
return result
except Exception:
return float('nan')
# Works with ANY provider
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate(
"What's the weather in Tokyo and what's 15 * 23?",
tools=[get_weather, calculate] # Pass functions directly
)
print(response.content)
# Note: content holds the assistant's text; tool results are not appended automatically.
# By default (`execute_tools=False`), AbstractCore does not execute tools.
# Instead, it returns structured tool calls (if the model chose to call tools).
print(f"Tool calls requested: {len(response.tool_calls) if response.tool_calls else 0}")
print(f"Generation time: {response.gen_time}ms")
print(f"Summary: {response.get_summary()}") # Includes tool count
# Inspect tool calls (host/runtime executes them)
if response.tool_calls:
for call in response.tool_calls:
print(f"Tool: {call.get('name')} args={call.get('arguments')}")
```
## The @tool Decorator
The `@tool` decorator is the primary way to create tools in AbstractCore. It automatically extracts function metadata and creates proper tool definitions.
### Basic Usage
```python
from abstractcore import tool
@tool
def list_files(directory: str = ".", pattern: str = "*") -> str:
"""List files in a directory matching a pattern."""
import os
import fnmatch
try:
files = [f for f in os.listdir(directory)
if fnmatch.fnmatch(f, pattern)]
return "\n".join(files) if files else "No files found"
except Exception as e:
return f"Error: {str(e)}"
```
### Type Annotations
The decorator automatically infers parameter types from type annotations:
```python
@tool
def create_user(name: str, age: int, is_admin: bool = False) -> str:
"""Create a new user with the specified details."""
user_data = {
"name": name,
"age": age,
"is_admin": is_admin,
"created_at": "2025-01-14"
}
return f"Created user: {user_data}"
```
### Enhanced Metadata
The `@tool` decorator supports rich metadata that gets automatically injected into system prompts for prompted models and passed to native APIs:
```python
@tool(
description="Search the database for records matching the query",
tags=["database", "search", "query"],
when_to_use="When the user asks for specific data from the database or wants to find records",
examples=[
{
"description": "Find all users named John",
"arguments": {
"query": "name=John",
"table": "users"
}
},
{
"description": "Search for products under $50",
"arguments": {
"query": "price<50",
"table": "products"
}
},
{
"description": "Find recent orders",
"arguments": {
"query": "date>2025-01-01",
"table": "orders"
}
}
]
)
def search_database(query: str, table: str = "users") -> str:
"""Search the database for records matching the query."""
# Implementation here
return f"Searching {table} for: {query}"
```
**How This Metadata is Used:**
- **Prompted tool calling**: the tool formatter injects tool name/description/args into the system prompt. To keep prompts small, `when_to_use` is included only for small tool sets and a few high-impact tools (edit/write/execute + web triage tools), and tool examples are globally capped.
- **Native tool calling**: only standard fields (`name`, `description`, `parameters`) are sent to provider APIs (unknown/custom fields are intentionally omitted for compatibility).
### Built-in Tools
AbstractCore includes a comprehensive set of ready-to-use tools in `abstractcore.tools.common_tools` (requires `pip install "abstractcore[tools]"`):
```python
from abstractcore.tools.common_tools import skim_url, fetch_url, search_files, read_file, list_files
# Quick URL preview (fast, small)
preview = skim_url("https://example.com/article")
# Full web content fetching and parsing (HTML→Markdown, JSON/XML/text)
result = fetch_url("https://api.github.com/repos/python/cpython")
# For PDFs/images/other binaries, fetch_url returns metadata (and optional previews), not full extraction.
# File system operations
files = search_files("def.*fetch", ".", file_pattern="*.py")
content = read_file("config.json")
directory_listing = list_files(".", pattern="*.py", recursive=True)
```
**Available Built-in Tools:**
- `skim_url` - Fast URL skim (title/description/headings + short preview)
- `fetch_url` - Fetch + parse common text-first types (HTML→Markdown, JSON/XML/text); binaries return metadata + optional previews
- `search_files` - Search for text patterns inside files using regex
- `list_files` - Find and list files by names/paths using glob patterns
- `read_file` - Read file contents with optional line range selection
- `write_file` - Write content to files with directory creation
- `edit_file` - Edit files using pattern matching and replacement
- `web_search` - Search the web using DuckDuckGo
- `skim_websearch` - Smaller/filtered web search (compact result list)
- `execute_command` - Execute shell commands safely with security controls
**Suggested web workflow (agent-friendly):**
1. `skim_websearch(...)` → get a small set of candidate URLs
2. `skim_url(...)` → quickly decide what’s worth fetching
3. `fetch_url(...)` → parse the selected URL(s); set `include_full_content=False` when you want a smaller output
Tip: you can measure output footprint/latency locally with `python examples/skim_tools_benchmark.py --help`.
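A sketch of that workflow in code (argument names other than `include_full_content` are assumptions; check each tool's docstring for the exact signature):
```python
from abstractcore.tools.common_tools import skim_websearch, skim_url, fetch_url

hits = skim_websearch("abstractcore python library")  # 1) compact list of candidate URLs
print(hits)                                           #    pick one or two URLs from the output

preview = skim_url("https://example.com/article")     # 2) cheap preview to decide what's worth fetching

page = fetch_url(                                     # 3) parse the selected URL
    "https://example.com/article",
    include_full_content=False,                       #    keep the output footprint small
)
print(page)
```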
### Real-World Example
Here's an example from AbstractCore's codebase showing the enhanced `@tool` decorator:
```python
@tool(
description="Find and list files and directories by their names/paths using glob patterns (case-insensitive, supports multiple patterns)",
tags=["file", "directory", "listing", "filesystem"],
when_to_use="When you need to find files by their names, paths, or file extensions (NOT for searching file contents)",
examples=[
{
"description": "List all files in current directory",
"arguments": {
"directory_path": ".",
"pattern": "*"
}
},
{
"description": "Find all Python files recursively",
"arguments": {
"directory_path": ".",
"pattern": "*.py",
"recursive": True
}
},
{
"description": "Find all files with 'test' in filename (case-insensitive)",
"arguments": {
"directory_path": ".",
"pattern": "*test*",
"recursive": True
}
},
{
"description": "Find multiple file types using | separator",
"arguments": {
"directory_path": ".",
"pattern": "*.py|*.js|*.md",
"recursive": True
}
},
{
"description": "Complex multiple patterns - documentation, tests, and config files",
"arguments": {
"directory_path": ".",
"pattern": "README*|*test*|config.*|*.yml",
"recursive": True
}
}
]
)
def list_files(directory_path: str = ".", pattern: str = "*", recursive: bool = False, include_hidden: bool = False, head_limit: Optional[int] = 50) -> str:
"""
List files and directories in a specified directory with pattern matching (case-insensitive).
IMPORTANT: Use 'directory_path' parameter (not 'file_path') to specify the directory to list.
Args:
directory_path: Path to the directory to list files from (default: "." for current directory)
pattern: Glob pattern(s) to match files. Use "|" to separate multiple patterns (default: "*")
recursive: Whether to search recursively in subdirectories (default: False)
include_hidden: Whether to include hidden files/directories starting with '.' (default: False)
head_limit: Maximum number of files to return (default: 50, None for unlimited)
Returns:
Formatted string with file and directory listings or error message.
When head_limit is applied, shows "showing X of Y files" in the header.
"""
# Implementation here...
```
**How This Gets Transformed**
When you use this tool with a prompted model (like Ollama), AbstractCore automatically generates a system prompt like this:
```
You are a helpful AI assistant with access to the following tools:
**list_files**: Find and list files and directories by their names/paths using glob patterns (case-insensitive, supports multiple patterns)
• When to use: When you need to find files by their names, paths, or file extensions (NOT for searching file contents)
• Tags: file, directory, listing, filesystem
• Parameters: {"directory_path": {"type": "string", "default": "."}, "pattern": {"type": "string", "default": "*"}, ...}
To use a tool, respond with this EXACT format:
<|tool_call|>
{"name": "tool_name", "arguments": {"param1": "value1", "param2": "value2"}}
|tool_call|>
**EXAMPLES:**
**list_files Examples:**
1. List all files in current directory:
<|tool_call|>
{"name": "list_files", "arguments": {"directory_path": ".", "pattern": "*"}}
|tool_call|>
2. Find all Python files recursively:
<|tool_call|>
{"name": "list_files", "arguments": {"directory_path": ".", "pattern": "*.py", "recursive": true}}
|tool_call|>
... and 3 more examples with proper formatting ...
```
## Universal Tool Support
AbstractCore's tool system works across all providers through two mechanisms:
### Control Tokens vs Tool Transcript Tags (Important)
It’s easy to conflate two separate layers:
1) **Chat-template control tokens** (provider responsibility)
- These are the hidden/model-specific role separators that turn `{role:"system"}` vs `{role:"user"}` into the model’s expected prompt template.
- Examples (model-dependent): Llama role headers, Qwen `im_start` blocks, etc.
- When you use a messages API (OpenAI-compatible, Anthropic, Ollama, LMStudio), the server usually applies these automatically.
2) **Tool-call transcript tags** (prompted strategy)
- These are literal strings the model emits in *assistant content* that we parse, such as:
- Qwen-style: `<|tool_call|>…|tool_call|>`
- LLaMA-style and XML-ish variants (architecture-specific tags wrapping the same JSON payload)
- They may correspond to special tokens in some tokenizers, but in prompted mode we still treat them as transcript text and parse them from the output.
Native tool calling uses structured request/response fields (`tools` / `tool_calls` / Anthropic `tool_use`) and relies on the provider/server to apply the correct chat template; prompted tool calling describes tools in the system prompt and expects transcript tags in assistant text.
### 1. Native Tool Support
For providers with native tool APIs (OpenAI, Anthropic):
```python
# OpenAI with native tool support
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("What's the weather?", tools=[get_weather])
```
### 2. Intelligent Prompting
For providers without native tool support (Ollama, MLX, LMStudio):
```python
# Ollama without native tool support - AbstractCore handles this automatically
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("What's the weather?", tools=[get_weather])
# AbstractCore automatically:
# 1. Detects the model architecture (Qwen3)
# 2. Formats tools with examples into system prompt
# 3. Parses tool calls from response using <|tool_call|> format
# 4. Returns structured tool call requests in response.tool_calls
```
## Tool Definition
Tools are defined using the `ToolDefinition` class, but the `@tool` decorator handles this automatically:
```python
from abstractcore.tools import ToolDefinition
# Manual tool definition (rarely needed)
tool_def = ToolDefinition(
name="get_weather",
description="Get current weather for a city",
parameters={
"city": {
"type": "string",
"description": "The city name"
}
},
function=get_weather_function
)
```
### Parameter Types
Supported parameter types:
- `string` - Text values
- `integer` - Whole numbers
- `number` - Floating-point numbers
- `boolean` - True/false values
- `array` - Lists of values
- `object` - Complex nested structures
```python
@tool
def complex_tool(
text: str,
count: int,
threshold: float,
enabled: bool,
tags: list,
config: dict
) -> str:
"""Tool with various parameter types."""
return f"Processed: {text} with {count} items"
```
## Tool Execution
### Execution Modes
- **Passthrough mode (default)**: `execute_tools=False`
- AbstractCore returns structured tool calls in `GenerateResponse.tool_calls`.
- By default (`tool_call_tags is None`), tool-call markup is stripped from `GenerateResponse.content`.
- A host/runtime executes tools (recommended for servers and agent loops).
- **Direct execution mode (deprecated)**: `execute_tools=True`
- AbstractCore parses and executes tools locally via the tool registry and appends results to `content`.
- Intended for simple scripts only; avoid in server/untrusted environments.
### Architecture-Aware Tool Call Detection
AbstractCore automatically detects model architecture and uses the appropriate tool call format:
| Architecture | Format | Example |
|-------------|--------|---------|
| **Qwen3** | `<|tool_call|>...JSON...|tool_call|>` | `<|tool_call|>{"name": "get_weather", "arguments": {"city": "Paris"}}|tool_call|>` |
| **LLaMA3** | `...JSON...` | `{"name": "get_weather", "arguments": {"city": "Paris"}}` |
| **OpenAI/Anthropic** | Native API tool calls | Structured JSON in API response |
| **XML-based** | `...JSON...` | `{"name": "get_weather", "arguments": {"city": "Paris"}}` |
**Note:** AbstractCore handles architecture detection, prompt formatting, and response parsing automatically. Your tools work the same way across all providers.
### Execution Responsibility (Recommended)
In passthrough mode, `response.tool_calls` are tool call *requests*. Execute them in your host/runtime (and apply your own safety policy) before sending tool results back to the model in a follow-up turn.
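A minimal host-side loop sketch (the `@tool` function remains directly callable, as shown under Testing Tools below; how you feed results back depends on your host/runtime, and here a `BasicSession` 'tool' message is used):
```python
from abstractcore import BasicSession, create_llm, tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"{city}: 22°C and sunny"

llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(llm)

resp = session.generate("What's the weather in Paris?", tools=[get_weather])
for call in resp.tool_calls or []:
    if call.get("name") == "get_weather":                      # apply your own safety policy here
        result = get_weather(**(call.get("arguments") or {}))  # host executes the tool
        session.add_message("tool", result)                    # feed the result back into history

final = session.generate("Answer the original question using the tool result.", tools=[get_weather])
print(final.content)
```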
## Advanced Patterns
### Tool Chaining
Tools can call other tools or return data that triggers additional tool calls:
```python
@tool
def get_user_location(user_id: str) -> str:
"""Get the location of a user."""
# Simulated implementation
locations = {"user123": "Paris", "user456": "Tokyo"}
return locations.get(user_id, "Unknown")
@tool
def get_weather(city: str) -> str:
"""Get weather for a city."""
return f"Weather in {city}: 72°F, sunny"
# LLM can chain these tools:
response = llm.generate(
"What's the weather like for user123?",
tools=[get_user_location, get_weather]
)
# In an agent loop, your host/runtime can execute tool calls and feed tool results back into the model for multi-step chaining.
```
### Conditional Tool Execution (Recommended)
In passthrough mode, your host/runtime decides which tool calls to execute:
```python
from abstractcore.tools import ToolCall, ToolRegistry
dangerous_tools = {"delete_file", "system_command", "send_email"}
registry = ToolRegistry()
registry.register(get_user_location)
registry.register(get_weather)
response = llm.generate("What's the weather like for user123?", tools=[get_user_location, get_weather])
for call in response.tool_calls or []:
name = call.get("name")
if name in dangerous_tools:
continue
result = registry.execute_tool(
ToolCall(
name=name,
arguments=call.get("arguments") or {},
call_id=call.get("call_id") or call.get("id"),
)
)
print(result)
```
### Async Tool Support
For tools that need to perform async operations:
```python
import asyncio
@tool
def fetch_data(url: str) -> str:
"""Fetch data from a URL."""
async def async_fetch():
# Simulate async HTTP request
await asyncio.sleep(0.1)
return f"Data from {url}"
# Run async function in sync context
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
result = loop.run_until_complete(async_fetch())
return result
finally:
loop.close()
```
## Error Handling
### Tool-Level Error Handling
Handle errors within tools:
```python
@tool
def safe_division(a: float, b: float) -> str:
"""Safely divide two numbers."""
try:
if b == 0:
return "Error: Division by zero is not allowed"
result = a / b
return f"{a} ÷ {b} = {result}"
except Exception as e:
return f"Error: {str(e)}"
```
### System-Level Error Handling
AbstractCore provides comprehensive error handling:
```python
from abstractcore.exceptions import ToolExecutionError
try:
response = llm.generate("Use the broken tool", tools=[broken_tool])
except ToolExecutionError as e:
print(f"Tool execution failed: {e}")
print(f"Failed tool: {e.tool_name}")
print(f"Error details: {e.error_details}")
```
### Validation and Sanitization
Validate tool inputs:
```python
@tool
def create_file(filename: str, content: str) -> str:
"""Create a file with the given content."""
import os
import re
# Validate filename
if not re.match(r'^[a-zA-Z0-9_.-]+$', filename):
return "Error: Invalid filename. Use only letters, numbers, dots, dashes, and underscores."
# Prevent directory traversal
if '..' in filename or filename.startswith('/'):
return "Error: Invalid filename. No directory traversal allowed."
try:
with open(filename, 'w') as f:
f.write(content)
return f"File '{filename}' created successfully"
except Exception as e:
return f"Error creating file: {str(e)}"
```
## Performance Optimization
### Tool Registry
Use the tool registry for better performance with many tools:
```python
from abstractcore.tools import ToolRegistry, register_tool
# Register tools globally
register_tool(get_weather)
register_tool(calculate)
register_tool(list_files)
# Use registered tools
registry = ToolRegistry.get_instance()
available_tools = registry.get_all_tools()
response = llm.generate(
"Help me with weather and calculations",
tools=available_tools
)
```
### Lazy Loading
Load expensive resources only when needed:
```python
class DatabaseTool:
def __init__(self):
self._connection = None
@property
def connection(self):
if self._connection is None:
# Expensive database connection
self._connection = create_database_connection()
return self._connection
db_tool = DatabaseTool()
@tool
def query_database(sql: str) -> str:
"""Execute a SQL query."""
try:
result = db_tool.connection.execute(sql)
return str(result)
except Exception as e:
return f"Database error: {str(e)}"
```
### Caching Results
Cache expensive tool results:
```python
from functools import lru_cache
@tool
@lru_cache(maxsize=100)
def expensive_calculation(input_data: str) -> str:
"""Perform an expensive calculation with caching."""
import time
time.sleep(1) # Simulate expensive operation
return f"Result for {input_data}"
```
## Tool Syntax Rewriting
AbstractCore can rewrite tool-call syntax for downstream agents/clients:
- **Python API**: pass `tool_call_tags=...` to `generate()` / `agenerate()` / `BasicSession.generate()` to preserve and rewrite tool-call markup in `content`.
- **HTTP server**: set the `agent_format` request field (or rely on auto-detection based on `User-Agent` + model name).
See: Tool Call Syntax Rewriting
## Event System Integration
Observe tool calling and (optional) tool execution through events:
### Cost Monitoring
```python
from abstractcore.events import EventType, on_global
def monitor_tool_costs(event):
"""Monitor costs of tool executions."""
for call in event.data.get("tool_calls", []) or []:
if call.get("name") in {"expensive_api_call", "premium_service"}:
print(f"Warning: Using expensive tool {call.get('name')}")
on_global(EventType.TOOL_STARTED, monitor_tool_costs)
```
### Performance Tracking
```python
def track_tool_performance(event):
"""Track tool execution outcomes (shape varies by emitter)."""
for result in event.data.get("tool_results", []) or []:
if result.get("success") is False:
print(f"Tool failed: {result.get('name')} error={result.get('error')}")
on_global(EventType.TOOL_COMPLETED, track_tool_performance)
```
### Security Auditing
```python
def audit_tool_usage(event):
"""Audit all tool usage for security."""
for call in event.data.get("tool_calls", []) or []:
print(f"Tool requested: {call.get('name')} args={call.get('arguments')}")
# Log to security audit system
security_log(call.get("name"), call.get("arguments"))
on_global(EventType.TOOL_STARTED, audit_tool_usage)
```
## Best Practices
### 1. Clear Documentation
Always provide clear docstrings for your tools:
```python
@tool
def send_email(to: str, subject: str, body: str) -> str:
"""Send an email to the specified recipient.
Args:
to: Email address of the recipient
subject: Subject line of the email
body: Main content of the email
Returns:
Success message or error description
Note:
This tool requires email configuration to be set up.
Use with caution as it sends actual emails.
"""
# Implementation here
```
### 2. Input Validation
Always validate and sanitize inputs:
```python
@tool
def process_user_input(user_data: str) -> str:
"""Process user input safely."""
# Validate input length
if len(user_data) > 1000:
return "Error: Input too long (max 1000 characters)"
# Sanitize input
import html
safe_data = html.escape(user_data)
# Process safely
return f"Processed: {safe_data}"
```
### 3. Error Recovery
Provide meaningful error messages and recovery suggestions:
```python
@tool
def connect_to_service(endpoint: str) -> str:
"""Connect to an external service."""
try:
# Attempt connection
result = make_connection(endpoint)
return f"Connected successfully: {result}"
except ConnectionError:
return "Error: Could not connect to service. Please check the endpoint URL and try again."
except TimeoutError:
return "Error: Connection timed out. The service may be temporarily unavailable."
except Exception as e:
return f"Error: Unexpected error occurred: {str(e)}"
```
### 4. Resource Management
Clean up resources properly:
```python
@tool
def process_large_file(filename: str) -> str:
"""Process a large file efficiently."""
try:
with open(filename, 'r') as file:
# Process file in chunks
result = ""
while True:
chunk = file.read(1024)
if not chunk:
break
result += process_chunk(chunk)
return f"Processed file: {filename}"
except FileNotFoundError:
return f"Error: File '{filename}' not found"
except MemoryError:
return "Error: File too large to process"
```
## Troubleshooting
### Common Issues
1. **Tool not being called**: Check tool description and parameter names
2. **Invalid JSON in tool calls**: Ensure proper error handling in tools
3. **Tools timing out**: Implement proper timeout handling
4. **Memory issues with large tools**: Use streaming or chunking
### Debug Mode
Enable debug mode to see tool execution details:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
# Tool execution details will be logged
response = llm.generate("Use tools", tools=[debug_tool])
```
### Testing Tools
Test tools independently:
```python
# Test tool directly
result = get_weather("Paris")
print(f"Direct call result: {result}")
# Test with LLM
response = llm.generate("What's the weather in Paris?", tools=[get_weather])
print(f"LLM result: {response.content}")
```
## Examples
See the examples directory for comprehensive tool usage examples:
- Basic Tool Usage
- Advanced Tool Patterns
- Tool Chaining Examples
## Related Documentation
- API Reference - Complete API documentation
- Event System - Event-driven tool control
- Architecture - System design and tool execution flow
- Server Guide - HTTP server and REST API
- Getting Started - Quick start guide
---
### Inlined: `docs/tool-syntax-rewriting.md`
# Tool Call Syntax Rewriting
AbstractCore can **convert tool-call syntax** to help different runtimes/clients consume tool calls consistently.
There are two related but distinct features:
1. **Python API (`tool_call_tags`)**: preserve and rewrite *tool-call markup inside assistant content* (mostly for prompted-tool models).
2. **HTTP Server (`agent_format`)**: convert/synthesize tool-call syntax for HTTP clients (Codex, other agentic CLIs), while keeping `tool_calls` structured.
## 1) Python API: `tool_call_tags` (per-call)
`tool_call_tags` is passed to `generate()` / `agenerate()` / `BasicSession.generate()` as a **per-call kwarg**.
### Default behavior (recommended)
- When `tool_call_tags is None` (default):
- `response.tool_calls` is populated when tool calls are detected (native tools or prompted tags).
- Tool-call markup is stripped from `response.content` for clean UX/history.
### When to set `tool_call_tags`
Set `tool_call_tags` when you want **tool-call markup kept in `content`** so a downstream consumer can parse it from text.
This is most useful for **prompted-tool** providers (tool calls are emitted in assistant content), e.g.:
- `ollama`
- `lmstudio`
- `mlx`
- `huggingface`
- `openai-compatible` (and compatible endpoints like vLLM / LM Studio)
For **native tool** providers (OpenAI/Anthropic), tool calls are primarily consumed from `response.tool_calls` (structured), not from tags embedded in `content`.
### Supported values
- Predefined formats:
- `qwen3` → `<|tool_call|>...JSON...|tool_call|>`
- `llama3` → `...JSON...`
- `xml` → `...JSON...`
- `gemma` → ```tool_code\n...JSON...\n```
- Custom tags:
- Comma-separated start/end: `"START,END"` or `"[TOOL],[/TOOL]"`
- Single tag name: `"MYTAG"` → `<MYTAG>...JSON...</MYTAG>`
### Example (non-streaming)
```python
from abstractcore import create_llm
tool = {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
}
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate(
"Weather in Paris?",
tools=[tool],
tool_call_tags="llama3",
)
print(response.content) # contains the tool call wrapped in llama3-style tags
print(response.tool_calls) # always structured dicts for host/runtime execution
```
### Example (streaming)
```python
tool_calls = []
for chunk in llm.generate(
"Weather in Paris?",
tools=[tool],
stream=True,
tool_call_tags="llama3",
):
print(chunk.content, end="", flush=True)
if chunk.tool_calls:
tool_calls.extend(chunk.tool_calls)
```
## 2) HTTP Server: `agent_format`
When using the AbstractCore server (`/v1/chat/completions`), you can request a target tool-call syntax via `agent_format`.
- `agent_format` affects how tool calls are represented in the response for a given client.
- The server always runs in passthrough mode (`execute_tools=False`): it returns tool calls; it does not execute them.
### Supported values
- `auto` (default): auto-detect based on `User-Agent` + model name patterns
- `openai`
- `codex`
- `qwen3`
- `llama3`
- `xml`
- `gemma`
- `passthrough`
### Example
```bash
curl -sS http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3:4b-instruct",
"messages": [{"role": "user", "content": "Weather in Paris?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}
}
}],
"agent_format": "codex"
}'
```
## Notes
- `tool_call_tags` is **formatting**, not execution: it only changes how tool calls are represented in `content`.
- The canonical machine-readable representation remains `GenerateResponse.tool_calls` (Python) or `message.tool_calls` (server/OpenAI format).
---
### Inlined: `docs/structured-output.md`
# Structured Output
AbstractCore implements structured output generation using Pydantic models with automatic schema validation and provider-specific optimizations. The system employs a dual-strategy architecture that adapts to provider capabilities, delivering reliable schema compliance across all supported LLM providers.
---
## Table of Contents
1. Overview
2. Architecture
3. Provider Implementation
4. Usage Guide
5. Schema Design
6. Performance Characteristics
7. Error Handling
8. Production Deployment
9. API Reference
---
## Overview
### What is Structured Output?
Structured output constrains LLM responses to conform to predefined schemas, enabling direct deserialization into typed objects. AbstractCore uses Pydantic BaseModel classes to define schemas and validate responses.
### Basic Example
```python
from abstractcore import create_llm
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
email: str
llm = create_llm("openai", model="gpt-4o-mini")
person = llm.generate(
"Extract: John Doe, 35 years old, john@example.com",
response_model=Person
)
# person is a validated Person instance
assert isinstance(person, Person)
assert person.name == "John Doe"
assert person.age == 35
```
### Key Benefits
- **Type Safety**: Responses are validated Pydantic instances with full IDE support
- **Schema Compliance**: Automatic validation ensures data conforms to defined structure
- **Provider Agnostic**: Identical API across OpenAI, Anthropic, Ollama, LMStudio, HuggingFace, MLX
- **Automatic Strategy Selection**: Framework selects optimal implementation based on provider capabilities
- **Test Coverage**: Supported strategies are exercised by the repository test suite (see `tests/structured/`)
---
## Architecture
### Dual-Strategy Design
AbstractCore implements two distinct strategies for structured output generation:
#### Strategy 1: Native Structured Output (Server-Side Enforcement)
**Mechanism**: Provider API accepts JSON schema and enforces compliance before returning response.
**Providers**:
- OpenAI (via `response_format` parameter)
- Anthropic (via tool-calling mechanism)
- Ollama (via `format` parameter)
- LMStudio (via `response_format` parameter)
- HuggingFace GGUF models (via `response_format` parameter with llama-cpp-python)
**Characteristics**:
- Server-side schema validation
- Zero client-side validation retries required
- Deterministic schema compliance
- Optimal performance for production workloads
**Validation**:
- Structured output behavior is covered by automated tests in this repo (see `tests/structured/`).
- Exact success rates and latency depend on provider/model/schema complexity.
#### Strategy 2: Prompted with Validation (Client-Side Enforcement)
**Mechanism**: Schema embedded in system prompt; response extracted, validated, and retried if necessary.
**Providers**:
- HuggingFace (Transformers models)
- MLX
- Any provider without native support
**Characteristics**:
- Schema injected into enhanced prompt
- Client-side Pydantic validation
- Automatic retry with error feedback (up to 3 attempts)
- Fallback for providers without native support
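Conceptually, the prompted strategy behaves like this simplified sketch (illustrative only, not the actual AbstractCore code; the real logic lives in `StructuredOutputHandler`):
```python
import json
from typing import Type
from pydantic import BaseModel, ValidationError

def prompted_structured(llm, prompt: str, response_model: Type[BaseModel], max_attempts: int = 3):
    """Embed the schema in the prompt, validate the reply, retry with error feedback."""
    schema = json.dumps(response_model.model_json_schema(), indent=2)
    enhanced = f"{prompt}\n\nRespond with JSON matching this schema:\n{schema}"
    last_error = None
    for _ in range(max_attempts):
        raw = llm.generate(enhanced).content
        try:
            return response_model.model_validate_json(raw)
        except ValidationError as error:
            last_error = error
            # Feed validation errors back so the model can self-correct on the next attempt.
            enhanced = (
                f"{prompt}\n\nYour previous JSON had validation errors:\n{error}\n\n"
                f"Respond again with JSON matching this schema:\n{schema}"
            )
    raise last_error
```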
### Automatic Strategy Selection
The `StructuredOutputHandler` selects the appropriate strategy automatically:
```python
def _has_native_support(self, provider) -> bool:
"""Detect native structured output capability"""
provider_name = provider.__class__.__name__
# Ollama and LMStudio always have native support
if provider_name in ['OllamaProvider', 'LMStudioProvider']:
return True
# HuggingFace GGUF models (via llama-cpp-python)
if provider_name == 'HuggingFaceProvider':
if hasattr(provider, 'model_type') and provider.model_type == 'gguf':
return True
# Check model capabilities for other providers
capabilities = getattr(provider, 'model_capabilities', {})
return capabilities.get("structured_output") == "native"
```
No configuration required—the framework handles strategy selection transparently.
---
## Provider Implementation
### OpenAI
**Implementation**: Native support via `response_format` parameter
```python
# AbstractCore implementation (simplified)
payload["response_format"] = {
"type": "json_schema",
"json_schema": {
"name": response_model.__name__,
"schema": response_model.model_json_schema()
}
}
```
**Models with Native Support**:
- gpt-4o, gpt-4o-mini
- gpt-4-turbo
- gpt-3.5-turbo
**Reference**: OpenAI Structured Outputs Documentation
---
### Anthropic
**Implementation**: Native support via tool-calling mechanism
The provider forces execution of a tool whose input schema matches the desired output structure.
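A simplified sketch of the resulting request shape (illustrative; field names follow the public Anthropic Messages API, not AbstractCore's internal code):
```python
# Simplified: force a tool whose input schema is the Pydantic JSON schema.
tool_name = response_model.__name__
payload["tools"] = [{
    "name": tool_name,
    "description": f"Return data matching the {tool_name} schema",
    "input_schema": response_model.model_json_schema(),
}]
payload["tool_choice"] = {"type": "tool", "name": tool_name}
# The tool call's `input` is then validated into a `response_model` instance.
```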
**Models with Native Support**:
- claude-haiku-4-5
- claude-sonnet-4-5
- claude-opus-4-5
**Reference**: Anthropic API Documentation
---
### Ollama
**Implementation**: Native support via `format` parameter
```python
# AbstractCore implementation (abstractcore/providers/ollama_provider.py:147-152)
if response_model and PYDANTIC_AVAILABLE:
json_schema = response_model.model_json_schema()
payload["format"] = json_schema # Full schema, server-side validation
```
**Mechanism**:
1. Full JSON schema passed to Ollama API
2. Server-side constrained sampling enforces schema compliance
3. Response is expected to follow the schema (provider/model dependent)
**Notes**:
- Native structured output depends on the Ollama server/build and the selected model.
- For example coverage, see `tests/structured/`.
**Supported Models**: Many models, including:
- Llama 3.1, 3.2, 3.3 family
- Qwen 2.5, 3, 3-coder family
- Gemma 2b, 7b, gemma2, gemma3
- Mistral, Phi-3, Phi-4, GLM-4, DeepSeek-R1
**Reference**: Ollama API Documentation
---
### LMStudio
**Implementation**: Native support via OpenAI-compatible `response_format` parameter
```python
# AbstractCore implementation (abstractcore/providers/lmstudio_provider.py:211-222)
if response_model and PYDANTIC_AVAILABLE:
json_schema = response_model.model_json_schema()
payload["response_format"] = {
"type": "json_schema",
"json_schema": {
"name": response_model.__name__,
"schema": json_schema
}
}
```
**Mechanism**:
1. OpenAI-compatible format passed to LMStudio server
2. Server-side schema enforcement via underlying inference engine
3. Response is expected to follow the schema (server/model dependent)
**Notes**:
- Behavior depends on the LMStudio server version and underlying model/runtime.
- For example coverage, see `tests/structured/`.
**Reference**: LMStudio Documentation
---
### HuggingFace
**Implementation**: Dual strategy based on model type
#### GGUF Models (Native Support)
**Backend**: llama-cpp-python with native structured output
```python
# AbstractCore implementation (abstractcore/providers/huggingface_provider.py:669-680)
if response_model and PYDANTIC_AVAILABLE:
json_schema = response_model.model_json_schema()
generation_kwargs["response_format"] = {
"type": "json_schema",
"json_schema": {
"name": response_model.__name__,
"schema": json_schema
}
}
```
**Notes**:
- GGUF structured output support depends on the llama-cpp-python backend and model.
- For example coverage, see `tests/structured/`.
#### Transformers Models (Native via Outlines)
**Backend**: Hugging Face Transformers library with Outlines
**Implementation**: Native support via Outlines constrained generation
```python
# AbstractCore implementation (abstractcore/providers/huggingface_provider.py:514-548)
if response_model and PYDANTIC_AVAILABLE and OUTLINES_AVAILABLE:
# Cache Outlines model wrapper
if not hasattr(self, '_outlines_model'):
self._outlines_model = outlines.from_transformers(
self.model_instance,
self.tokenizer
)
# Generate with constrained decoding
generator = self._outlines_model(
input_text,
outlines.json_schema(response_model),
max_tokens=max_tokens
)
# Return validated instance
validated_obj = response_model.model_validate(generator)
```
**Mechanism**:
1. Outlines wraps transformers model and tokenizer
2. JSON schema passed to constrained generator
3. Logit filtering during constrained decoding ensures only valid tokens are sampled
4. Schema compliance is enforced via constrained decoding (provider/model dependent)
5. Automatic fallback to prompted approach if Outlines unavailable
**Installation**:
```bash
pip install "abstractcore[huggingface]" # Includes Outlines automatically
```
**Characteristics**:
- Schema compliance via constrained decoding (still validated client-side)
- Zero or minimal validation retries when supported
- Works with many transformers-compatible models
- Automatic detection and activation when Outlines is installed
- Graceful fallback to prompted approach if Outlines is missing
**Fallback behavior**:
- If Outlines isn't available (or a backend doesn't support constrained decoding), AbstractCore falls back to prompted structured output with validation and retries.
- Exact success rates and latency depend on provider/model/hardware/schema complexity.
---
### MLX (Apple Silicon)
**Implementation**: Native via Outlines
**Backend**: MLX with Outlines constrained generation
```python
# AbstractCore implementation (abstractcore/providers/mlx_provider.py:165-197)
if response_model and PYDANTIC_AVAILABLE and OUTLINES_AVAILABLE:
# Cache Outlines MLX model wrapper
if not hasattr(self, '_outlines_model'):
self._outlines_model = outlines_models.mlxlm(self.model)
# Generate with constrained decoding
generator = self._outlines_model(
full_prompt,
outlines.json_schema(response_model),
max_tokens=max_tokens
)
# Return validated instance
validated_obj = response_model.model_validate(generator)
```
**Mechanism**:
1. Outlines MLX backend wraps mlx-lm model
2. JSON schema converted to token constraints
3. Constrained sampling on Apple Silicon hardware
4. Schema enforcement via constrained decoding
5. Automatic fallback to prompted approach if Outlines unavailable
**Installation**:
```bash
pip install "abstractcore[mlx]" # Includes Outlines automatically
```
**Models**:
- mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
- mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
- All MLX-compatible models
**Characteristics**:
- Schema compliance via constrained decoding (still validated client-side)
- Zero or minimal validation retries when supported
- Optimized for Apple M-series processors
- Automatic detection and activation when Outlines installed
- Graceful fallback to prompted approach if Outlines missing
**Performance notes**:
- Prompted structured output (validation + retry) is the default fallback and is often the simplest to run.
- Constrained decoding can be slower or faster depending on backend/model/schema; benchmark on your hardware if it matters.
---
## Usage Guide
### Basic Usage
```python
from abstractcore import create_llm
from pydantic import BaseModel
class ExtractedData(BaseModel):
name: str
age: int
email: str
llm = create_llm("ollama", model="qwen3:4b")
result = llm.generate(
"Extract: Alice Johnson, 28, alice@example.com",
response_model=ExtractedData,
temperature=0 # Recommended for deterministic output
)
print(f"{result.name} ({result.age}): {result.email}")
```
### Using Enums
Enums provide type-safe categorical values:
```python
from enum import Enum
from pydantic import BaseModel
class Priority(str, Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
class Task(BaseModel):
title: str
priority: Priority
estimated_hours: float
llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507")
task = llm.generate(
"Create task: Fix authentication bug, critical priority, 8 hours estimated",
response_model=Task
)
assert isinstance(task.priority, Priority)
print(f"Priority: {task.priority.value}") # "critical"
```
**Notes**: Enums are supported and exercised by tests; exact behavior depends on provider/model.
### Nested Objects
```python
from typing import List
from pydantic import BaseModel
class Address(BaseModel):
street: str
city: str
postal_code: str
class Person(BaseModel):
name: str
email: str
address: Address
llm = create_llm("openai", model="gpt-4o-mini")
person = llm.generate(
"""Extract: John Smith, john@example.com
Address: 123 Main St, Boston, MA 02101""",
response_model=Person
)
assert isinstance(person.address, Address)
```
### Complex Hierarchies
Complex schemas with multiple nesting levels are supported:
```python
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel
class Department(str, Enum):
ENGINEERING = "engineering"
SALES = "sales"
MARKETING = "marketing"
class EmployeeLevel(str, Enum):
JUNIOR = "junior"
MID = "mid"
SENIOR = "senior"
class Skill(BaseModel):
name: str
proficiency: int # 1-10 scale
years_experience: float
class Employee(BaseModel):
name: str
email: str
department: Department
level: EmployeeLevel
skills: List[Skill]
manager_email: Optional[str] = None
class Team(BaseModel):
name: str
department: Department
lead: Employee
members: List[Employee]
class Organization(BaseModel):
company_name: str
founded_year: int
teams: List[Team]
total_employees: int
llm = create_llm("anthropic", model="claude-haiku-4-5")
org = llm.generate(
"""Create organization: TechCorp, founded 2020
Team: Platform (engineering)
Lead: Sarah Chen (sarah@tech.com, senior, Python-9/10-5yrs, AWS-8/10-4yrs)
Member: Bob Lee (bob@tech.com, mid, JavaScript-7/10-3yrs, manager: sarah@tech.com)
Total employees: 2""",
response_model=Organization
)
```
**Notes**: Deeply nested schemas are supported; validate against your target provider/model and see `tests/structured/` for examples.
### Direct Handler Usage
For advanced use cases requiring custom retry configuration:
```python
from abstractcore.structured import StructuredOutputHandler, FeedbackRetry
# Configure custom retry strategy
handler = StructuredOutputHandler(
retry_strategy=FeedbackRetry(max_attempts=5)
)
result = handler.generate_structured(
provider=llm,
prompt="Extract complex data from document...",
response_model=ComplexSchema,
temperature=0
)
```
---
## Schema Design
### Design Principles
Well-designed schemas improve validation success rates and reduce response times.
#### 1. Clear Field Naming
Use descriptive, unambiguous field names:
```python
# Recommended
class Employee(BaseModel):
employee_id: str
hire_date: str
department: str
annual_salary: float
# Avoid
class Employee(BaseModel):
id: str # Ambiguous
date: str # What date?
dept: str # Abbreviation unclear
salary: float # Currency? Period?
```
#### 2. Leverage Enums for Categorical Data
Enums provide validation and type safety:
```python
# Recommended
class Status(str, Enum):
ACTIVE = "active"
INACTIVE = "inactive"
PENDING = "pending"
class User(BaseModel):
status: Status # Only valid enum values accepted
# Avoid
class User(BaseModel):
status: str # Any string accepted, no validation
```
#### 3. Use Optional Fields Appropriately
Distinguish required from optional fields:
```python
from typing import Optional, List
class Task(BaseModel):
# Required fields
title: str
created_at: str
# Optional with defaults
description: str = ""
tags: List[str] = []
# Truly optional (may be None)
assigned_to: Optional[str] = None
due_date: Optional[str] = None
```
#### 4. Logical Hierarchy
Group related fields into nested objects:
```python
# Recommended
class ContactInfo(BaseModel):
email: str
phone: str
address: str
class Person(BaseModel):
name: str
contact: ContactInfo # Logical grouping
# Avoid flat structure
class Person(BaseModel):
name: str
email: str
phone: str
address: str
```
### Complexity Guidelines
Schema complexity affects latency and cost; keep schemas as small as practical.
#### Simple Schemas (< 10 fields, 1 level)
**Example**:
```python
class PersonInfo(BaseModel):
name: str
age: int
email: str
occupation: str
```
**Recommended for**: User profiles, data extraction, form processing
#### Medium Schemas (10-30 fields, 1-2 levels)
**Example**:
```python
class Project(BaseModel):
name: str
description: str
start_date: str
tasks: List[Task] # Nested objects
total_hours: float
```
**Recommended for**: Project management, task tracking, structured data extraction
#### Complex Schemas (30+ fields, 3+ levels)
**Example**:
```python
class Organization(BaseModel):
company_name: str
teams: List[Team] # Level 2
# Team contains:
# lead: Employee # Level 3
# members: List[Employee] # Level 3
# # Employee contains:
# # skills: List[Skill] # Level 4
```
**Recommended for**: Organizational hierarchies, knowledge graphs, complex data models
### Anti-Patterns
Avoid these patterns that can degrade performance or reliability:
#### 1. Excessive Nesting Depth (>4 levels)
```python
# Avoid
class Level1(BaseModel):
level2: Level2
# Level2 -> Level3 -> Level4 -> Level5 (too deep)
```
**Impact**: Increased token usage, longer response times
#### 2. Ambiguous Enum Values
```python
# Avoid
class Status(str, Enum):
ONE = "1"
TWO = "2"
THREE = "3"
# Recommended
class Status(str, Enum):
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"
```
#### 3. Overly Long Field Names
```python
# Avoid
class Data(BaseModel):
very_long_and_descriptive_field_name_that_uses_many_tokens: str
# Recommended
class Data(BaseModel):
user_email: str # Clear but concise
```
**Impact**: Increases token count, affecting cost and context window
---
## Performance Characteristics
Structured output performance is highly dependent on:
- Provider/backend strategy (native constrained decoding vs prompted validation/retry)
- Schema complexity (field count + nesting depth)
- Model choice, server configuration, and hardware
- Sampling settings (use `temperature=0` when you care about schema fidelity)
If performance matters, benchmark on your target provider/model/hardware.
Historical benchmark notes (non-authoritative) may exist under `docs/reports/`.
### Temperature Settings
**Recommendation**: Use `temperature=0` for structured outputs
**Rationale**:
- Deterministic responses
- Consistent schema compliance
- Fewer validation retries when the prompted strategy is used
**When to increase temperature**:
- Creative content generation within schema constraints
- Diverse response generation for the same prompt
- Exploratory data generation
---
## Error Handling
### Error Categories
#### 1. Infrastructure Errors (Retriable)
Network failures, timeouts, server unavailability—retry with exponential backoff:
```python
import time
from requests.exceptions import ConnectionError, Timeout
def generate_with_retry(llm, prompt, response_model, max_retries=3):
"""Retry infrastructure errors with exponential backoff"""
for attempt in range(max_retries):
try:
return llm.generate(
prompt,
response_model=response_model,
temperature=0
)
except (ConnectionError, Timeout) as e:
if attempt < max_retries - 1:
wait_time = 2 ** attempt # 1s, 2s, 4s
time.sleep(wait_time)
continue
raise
result = generate_with_retry(llm, "Extract data...", DataModel)
```
**Retriable errors**:
- `ConnectionError`: Network connectivity issues
- `TimeoutError`: Request timeout
- HTTP 5xx: Server errors
- Token limit exceeded (retry with simplified schema or chunking)
#### 2. Validation Errors (Non-Retriable)
Schema validation failures indicate schema or prompt issues—do not retry:
```python
from pydantic import ValidationError
try:
result = llm.generate(
"Extract user data...",
response_model=UserModel
)
except ValidationError as e:
# Log validation errors
print("Schema validation failed:")
for error in e.errors():
field = " -> ".join(str(loc) for loc in error['loc'])
print(f" {field}: {error['msg']}")
# Fix schema or prompt—do not retry
raise
```
**Common validation errors**:
- Missing required fields: Schema too strict or prompt unclear
- Type mismatches: Field type incompatible with LLM output
- Enum validation failures: LLM returned invalid enum value
**Resolution**: Revise schema or improve prompt clarity
#### 3. Token Limit Errors
Context window exceeded—simplify schema or split request:
```python
try:
result = llm.generate(prompt, response_model=ComplexModel)
except Exception as e:
if "token" in str(e).lower() or "context" in str(e).lower():
print("Token limit exceeded. Options:")
print("1. Simplify schema (reduce fields or nesting)")
print("2. Split into multiple requests")
print("3. Use model with larger context window")
raise
```
### Retry Strategy Details
The default `FeedbackRetry` strategy:
1. **Maximum attempts**: 3 (configurable)
2. **Retry condition**: Only `ValidationError` exceptions
3. **Feedback mechanism**: Provides detailed error descriptions to LLM
**Example error feedback**:
```
Your previous response had validation errors:
• Missing required field: 'department'
• Field 'employee_level': Expected one of: junior, mid, senior
• Field 'age': Expected integer, received string
```
The LLM uses this feedback to self-correct on subsequent attempts.
**Configuration**:
```python
from abstractcore.structured import StructuredOutputHandler, FeedbackRetry
handler = StructuredOutputHandler(
retry_strategy=FeedbackRetry(max_attempts=5)
)
```
---
## Production Deployment
### Pre-Deployment Checklist
Before deploying structured outputs to production:
- [ ] Schema validated locally with Pydantic: `Model.model_validate(test_data)` (see the sketch after this checklist)
- [ ] Success rate measured with target model (target: >95%)
- [ ] Response time benchmarked under expected load
- [ ] Error handling implemented for infrastructure failures
- [ ] Logging configured for validation errors and retries
- [ ] Monitoring configured for success rates and latencies
- [ ] Fallback strategy defined for structured output failures
- [ ] Token limits verified: `len(prompt) + len(schema) + len(response) < context_window`
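For the first checklist item, local validation needs no LLM call; a minimal sketch:
```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int
    email: str

# Validate representative fixtures against the schema before involving any provider.
test_data = {"name": "John Doe", "age": 35, "email": "john@example.com"}
try:
    Person.model_validate(test_data)
    print("Schema accepts the fixture")
except ValidationError as error:
    print("Schema/fixture mismatch:", error)
```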
### Monitoring Metrics
Track these metrics in production (a minimal tracking sketch follows these lists):
**Success Metrics**:
- Validation success rate (target: >95%)
- First-attempt success rate
- Average retry count
**Performance Metrics**:
- p50, p95, p99 response times
- Response time by schema complexity
- Token usage statistics
**Error Metrics**:
- Validation error rate by field
- Infrastructure error rate
- Token limit exceeded rate
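A minimal, framework-agnostic sketch for recording the success and latency metrics above (it only assumes that `llm.generate` raises `ValidationError` after retries are exhausted; wire the counters into your own metrics backend):
```python
import time
from pydantic import ValidationError

metrics = {"attempts": 0, "successes": 0, "validation_errors": 0, "latencies": []}

def generate_tracked(llm, prompt, response_model):
    """Wrap llm.generate() and record success/failure counts plus latency."""
    metrics["attempts"] += 1
    start = time.time()
    try:
        result = llm.generate(prompt, response_model=response_model, temperature=0)
        metrics["successes"] += 1
        return result
    except ValidationError:
        metrics["validation_errors"] += 1
        raise
    finally:
        metrics["latencies"].append(time.time() - start)
```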
### Configuration Best Practices
**Temperature**: Set to 0 for deterministic structured outputs
```python
llm.generate(prompt, response_model=Model, temperature=0)
```
**Timeout**: Configure appropriate timeouts based on schema complexity
```python
# Simple schemas: 30s
# Medium schemas: 60s
# Complex schemas: 120s
```
**Provider Selection**:
- Development: Use local providers (Ollama, LMStudio) for cost efficiency
- Production: Select based on performance requirements and budget
### Schema Versioning
Maintain schema version compatibility:
```python
from pydantic import BaseModel, Field
class UserV1(BaseModel):
name: str
email: str
class UserV2(BaseModel):
name: str
email: str
department: str = Field(default="unassigned") # Backward compatible
```
Use optional fields with defaults for backward-compatible schema evolution.
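Because `department` has a default, data produced for `UserV1` still validates against `UserV2`:
```python
v1_payload = {"name": "Ada Lovelace", "email": "ada@example.com"}  # shaped like UserV1
user = UserV2.model_validate(v1_payload)
print(user.department)  # "unassigned"
```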
---
## API Reference
### Core Function
```python
llm.generate(
prompt: str,
response_model: Type[BaseModel],
temperature: float = 0.0,
**kwargs
) -> BaseModel
```
**Parameters**:
- `prompt` (str): Input prompt describing extraction/generation task
- `response_model` (Type[BaseModel]): Pydantic model class defining output schema
- `temperature` (float): Sampling temperature (0.0 = deterministic, 1.0 = creative)
- `**kwargs`: Additional provider-specific parameters
**Returns**:
- Instance of `response_model`, validated and type-safe
**Raises**:
- `ValidationError`: Schema validation failed after all retry attempts
- `ConnectionError`: Network/infrastructure error
- `TimeoutError`: Request timeout
**Example**:
```python
person = llm.generate(
"Extract: John Doe, age 35",
response_model=Person,
temperature=0
)
```
### StructuredOutputHandler
Advanced handler for custom retry strategies:
```python
from abstractcore.structured import StructuredOutputHandler
handler = StructuredOutputHandler(retry_strategy=None)
```
**Methods**:
```python
handler.generate_structured(
provider: LLMProvider,
prompt: str,
response_model: Type[BaseModel],
**kwargs
) -> BaseModel
```
Generates structured output with automatic strategy selection (native or prompted).
### Retry Strategies
```python
from abstractcore.structured import FeedbackRetry
retry = FeedbackRetry(max_attempts=3)
```
**Parameters**:
- `max_attempts` (int): Maximum retry attempts including initial attempt
**Methods**:
- `should_retry(attempt, error)`: Returns True if retry should occur
- `prepare_retry_prompt(prompt, error, attempt)`: Creates retry prompt with validation feedback
---
## Related Documentation
- Getting Started - Quick introduction
- API Reference - Complete API documentation
- Examples - Real-world usage patterns
- Response Model Parameter Analysis - Why `response_model`
- Native Implementation Test Results - Detailed test data
---
## References
- OpenAI Structured Outputs
- Anthropic API Documentation
- Ollama API Documentation
- Pydantic Documentation
---
## Testing and Validation
Structured output behavior is exercised by automated tests under `tests/structured/`.
### Running tests
From this repository:
```bash
pip install -e ".[test]"
pytest tests/structured -q
```
Some provider-specific tests require additional extras:
- HuggingFace / Outlines: `pip install -e ".[huggingface]"`
- MLX: `pip install -e ".[mlx]"` (macOS + Apple Silicon only)
If you're installing from PyPI and just want the test dependencies:
```bash
pip install "abstractcore[test]"
pytest -q
```
### Notes
- Performance and success rates vary widely by provider/model/schema complexity and are not guaranteed.
- If performance matters, benchmark on your target hardware/provider setup.
---
### Inlined: `docs/media-handling-system.md`
# Media Handling System
AbstractCore provides a **production-ready unified media handling system** that enables seamless file attachment and processing across all LLM providers and models. The system automatically processes images, documents, and other media files using the same simple API, with intelligent provider-specific formatting and graceful fallback handling.
## Key Benefits
- **Universal API**: Same `media=[]` parameter works across all providers (OpenAI, Anthropic, Ollama, LMStudio, etc.)
- **Intelligent Processing**: Automatic file type detection with specialized processors for each format
- **Provider Adaptation**: Automatic formatting for each provider's API requirements (OpenAI content parts, Anthropic Messages blocks, text injection for local models, etc.)
- **Robust Fallback**: Graceful degradation when advanced processing fails, always provides meaningful results
- **CLI Integration**: Simple `@filename` syntax in CLI for instant file attachment
- **Production Quality**: Comprehensive error handling, logging, and performance optimization
- **Cross-Format Support**: Images, PDFs, Office documents, CSV/TSV, text files all work seamlessly
## Quick Start
```python
from abstractcore import create_llm
# Works with any provider - just change the provider name
llm = create_llm("openai", model="gpt-4o", api_key="your-key")
response = llm.generate(
"What's in this image and document?",
media=["photo.jpg", "report.pdf"]
)
print(response.content)
# Same code works with Anthropic
llm = create_llm("anthropic", model="claude-3.5-sonnet", api_key="your-key")
response = llm.generate(
"Analyze these materials",
media=["chart.png", "data.csv", "presentation.ppt"]
)
# Or with local models
llm = create_llm("ollama", model="qwen2.5vl:7b")
response = llm.generate(
"Describe this image",
media=["screenshot.png"]
)
```
## How It Works Behind the Scenes
AbstractCore's media system uses a sophisticated multi-layer architecture that seamlessly processes any file type and formats it correctly for each LLM provider:
### 1. File Attachment Processing
**CLI Integration (`@filename` syntax):**
```python
# User types: "Analyze this @report.pdf and @chart.png"
# MessagePreprocessor extracts files and cleans text:
clean_text = "Analyze this and" # File references removed
media_files = ["report.pdf", "chart.png"] # Extracted file paths
```
**Python API:**
```python
# Direct media parameter usage
llm.generate("Analyze these files", media=["report.pdf", "chart.png"])
```
### 2. Intelligent File Processing Pipeline
**AutoMediaHandler Coordination:**
```python
# 1. Detect file types automatically
MediaType.IMAGE -> ImageProcessor (PIL-based)
MediaType.DOCUMENT -> PDFProcessor (PyMuPDF4LLM) or OfficeProcessor (Unstructured)
MediaType.TEXT -> TextProcessor (pandas for CSV/TSV)
# 2. Process each file with specialized processor
pdf_content = PDFProcessor.process("report.pdf") # → Markdown text
image_content = ImageProcessor.process("chart.png") # → Base64 + metadata
```
**Graceful Fallback System:**
```python
try:
# Advanced processing (PyMuPDF4LLM, Unstructured)
content = advanced_processor.process(file)
except Exception:
# Always falls back to basic processing
content = basic_text_extraction(file) # Never fails
```
### 3. Provider-Specific Formatting
**The same processed content gets formatted differently for each provider:**
**OpenAI Format (JSON):**
```json
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze these files"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}},
{"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
]
}
```
**Anthropic Format (Messages API):**
```json
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze these files"},
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0..."}},
{"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
]
}
```
**Local Models (Text Embedding):**
```python
# For local models without native multimodal support
combined_prompt = """
Analyze these files:
Image Analysis: [A business chart showing quarterly revenue trends...]
PDF Content: # Report Title
Executive Summary...
"""
```
### 4. Cross-Provider Workflow
```mermaid
graph TD
A[User Input with @files] --> B[MessagePreprocessor]
B --> C[Extract Files + Clean Text]
C --> D[AutoMediaHandler]
D --> E{File Type?}
E -->|Image| F[ImageProcessor]
E -->|PDF| G[PDFProcessor]
E -->|Office| H[OfficeProcessor]
E -->|Text| I[TextProcessor]
F --> J[MediaContent Objects]
G --> J
H --> J
I --> J
J --> K{Provider Type?}
K -->|OpenAI| L[OpenAIMediaHandler]
K -->|Anthropic| M[AnthropicMediaHandler]
K -->|Local| N[LocalMediaHandler]
L --> O[Provider-Specific API Format]
M --> O
N --> O
O --> P[LLM API Call]
P --> Q[Response to User]
```
### 5. Error Handling & Resilience
**Multi-Level Fallback Strategy:**
1. **Advanced Processing**: Try specialized libraries (PyMuPDF4LLM, Unstructured)
2. **Basic Processing**: Fall back to simple text extraction
3. **Metadata Only**: If all else fails, provide file metadata
4. **Graceful Degradation**: Best-effort results with clear errors (no silent semantic changes)
**Example of Robust Error Handling:**
```python
try:
# Try advanced PDF processing with PyMuPDF4LLM
content = pdf_processor.extract_with_formatting(file)
except PDFProcessingError:
try:
# Fall back to basic text extraction
content = pdf_processor.extract_basic_text(file)
except Exception:
# Ultimate fallback - provide metadata
content = f"PDF file: {file.name} ({file.size} bytes)"
# Result: Callers get a best-effort output or a clear error message (no silent truncation).
```
## Supported File Types
### Images (Vision Models)
- **Formats**: PNG, JPEG, GIF, WEBP, BMP, TIFF
- **Automatic**: Optimization, resizing, format conversion
- **Features**: EXIF handling, quality optimization for vision models
### Documents
- **Text Files**: TXT, MD, CSV, TSV, JSON with intelligent parsing and data analysis
- **PDF**: Text extraction with PyMuPDF4LLM (when installed), with best-effort structure preservation
- **Office**: DOCX, XLSX, PPTX via Unstructured (when installed), with best-effort extraction
- **Word**: section/paragraph extraction
- **Excel**: sheet-by-sheet extraction
- **PowerPoint**: slide-by-slide extraction
### Audio (policy-driven; optional STT fallback)
- **Formats**: common `audio/*` types (WAV, MP3, M4A, …) as attachments via `media=[...]`
- **Default behavior**: `audio_policy="native_only"` (fails loudly unless the model supports native audio input)
- **Speech-to-text**: `audio_policy="speech_to_text"` runs STT via the capability plugin layer (`llm.audio.transcribe(...)`; typically install `abstractvoice`) and injects a transcript into the main request
- **Auto**: `audio_policy="auto"` uses native audio when supported, otherwise STT when configured, otherwise errors
- **Reserved**: `audio_policy="caption"` is not configured in v0 (must error; non-speech audio analysis needs an explicit capability)
Transparency:
- When STT fallback is used, `GenerateResponse.metadata.media_enrichment[]` records what was injected and which backend was used.
Requirements:
- **Native audio** requires an audio-capable model.
- **STT fallback** requires installing an STT capability plugin (typically `pip install abstractvoice`) and using `audio_policy="auto"`/`"speech_to_text"` (or setting a default via `abstractcore --set-audio-strategy ...`).
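A minimal usage sketch (assumptions: `audio_policy` is accepted as a per-call keyword in line with the policy names above, an STT plugin such as `abstractvoice` is installed, and `meeting.m4a` is a local file):
```python
from abstractcore import create_llm

llm = create_llm("ollama", model="qwen3:4b")  # text-only model

response = llm.generate(
    "Summarize the key decisions in this recording.",
    media=["meeting.m4a"],
    audio_policy="speech_to_text",  # assumption: per-call policy override; a transcript is injected
)
print(response.content)
# What was injected (and by which STT backend) is recorded in the response metadata
# under media_enrichment, as described above.
```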
### Video (policy-driven; native or frames fallback)
- **Formats**: common `video/*` types as attachments via `media=[...]`
- **Default behavior**: `video_policy="auto"` (native video when supported; otherwise sample frames and route through the vision pipeline)
- **Budgets**: frame count and downscale are explicit and logged (see `abstractcore/providers/base.py`)
Requirements:
- Frame sampling fallback requires **`ffmpeg`/`ffprobe`** available on `PATH`.
- For the sampled-frame path, you also need **image/vision handling**: either a vision-capable main model or configured vision fallback, and (for local frame attachments) `pip install "abstractcore[media]"` so Pillow-based image processing is available.
### Processing Features
- **Intelligent Detection**: Automatic file type recognition and processor selection
- **Content Optimization**: Format-specific processing optimized for LLM consumption
- **Robust Fallback**: Graceful degradation ensures users always get meaningful results
- **Performance Optimized**: Lazy loading and efficient memory usage
- **Testing status**: Coverage varies by provider and modality; see the test suite under `tests/media_handling/`
### Token Estimation & No Truncation Policy
AbstractCore processors **do not silently truncate content**. This design decision ensures:
1. **No data loss**: Full file content is always preserved
2. **User control**: Callers decide how to handle large files (summarize, chunk, error)
3. **Model flexibility**: Works correctly across models with different context limits (8K to 200K+)
**Token estimation** is automatically added to `MediaContent.metadata`:
```python
result = processor.process_file("data.csv")
print(result.media_content.metadata['estimated_tokens']) # e.g., 1500
print(result.media_content.metadata['content_length']) # e.g., 6000 chars
```
**Handlers use this for validation**:
```python
handler = OpenAIMediaHandler()
tokens = handler.estimate_tokens_for_media(media_content)
# Uses metadata['estimated_tokens'] if available, falls back to heuristic
```
For large files that exceed model context limits, use `BasicSummarizer` or implement custom chunking at the application layer.
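A minimal application-layer chunking sketch (plain Python around the `process_file` helper and the `estimated_tokens` metadata shown above; the chunk size and file name are illustrative):
```python
from abstractcore import create_llm
from abstractcore.media import process_file

llm = create_llm("ollama", model="qwen3:4b")

result = process_file("very_large_report.txt")
content = result.media_content.content
estimated_tokens = result.media_content.metadata.get("estimated_tokens", len(content) // 4)

max_tokens_per_chunk = 6000             # keep well under the model's context window
chunk_chars = max_tokens_per_chunk * 4  # rough heuristic: ~4 characters per token

if estimated_tokens <= max_tokens_per_chunk:
    summary = llm.generate(f"Summarize this document:\n\n{content}").content
else:
    chunks = [content[i:i + chunk_chars] for i in range(0, len(content), chunk_chars)]
    partial = [llm.generate(f"Summarize this section:\n\n{c}").content for c in chunks]
    summary = llm.generate("Combine these section summaries:\n\n" + "\n\n".join(partial)).content

print(summary)
```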
## Provider Compatibility
### Vision-Enabled Providers
| Provider | Vision Models | Image Support | Document Support |
|----------|---------------|---------------|------------------|
| **OpenAI** | GPT-4o, GPT-4 Turbo with Vision | Supported: Multi-image | Supported: All formats |
| **Anthropic** | Claude 3.5 Sonnet, Claude 4 series | Supported: Up to 20 images | Supported: All formats |
| **Ollama** | qwen2.5vl:7b, gemma3:4b, llama3.2-vision:11b | Supported: Single image | Supported: All formats |
| **LMStudio** | qwen2.5-vl-7b, gemma-3n-e4b, magistral-small-2509 | Supported: Multiple images | Supported: All formats |
### Text-Only Providers
All providers support document processing even without vision capabilities:
| Provider | Document Processing | Text Extraction |
|----------|-------------------|-----------------|
| **HuggingFace** | Supported: All formats | Supported: Embedded in prompt |
| **MLX** | Supported: All formats | Supported: Embedded in prompt |
| **Any Provider** | Supported: Automatic fallback | Supported: Text extraction |
### ⚠️ Model Compatibility Notes (Updated: 2025-10-17)
Some newer vision models may not be immediately available due to rapid development:
**LMStudio Limitations:**
- `qwen3-vl` models (8B, 30B) - Not yet supported in LMStudio
- Use `qwen2.5-vl-7b` as a proven alternative
**HuggingFace Limitations:**
- `Qwen3-VL` models - Require newer transformers architecture
- Install latest transformers: `pip install --upgrade transformers`
- Or use bleeding edge: `pip install git+https://github.com/huggingface/transformers.git`
**Recommended Stable Models (2025-10-17):**
- **LMStudio**: `qwen/qwen2.5-vl-7b`, `google/gemma-3n-e4b`, `mistralai/magistral-small-2509`
- **Ollama**: `qwen2.5vl:7b`, `gemma3:4b`, `llama3.2-vision:11b`
- **OpenAI**: `gpt-4o`, `gpt-4-turbo-with-vision`
- **Anthropic**: `claude-3.5-sonnet`, `claude-4-series`
## Usage Examples
### Vision Analysis
```python
from abstractcore import create_llm
# Analyze images with any vision model
llm = create_llm("openai", model="gpt-4o")
# Single image analysis
response = llm.generate(
"What's happening in this image?",
media=["photo.jpg"]
)
# Multiple images comparison
response = llm.generate(
"Compare these two charts and explain the trends",
media=["chart1.png", "chart2.png"]
)
# Mixed media analysis
response = llm.generate(
"Summarize the report and relate it to what you see in the image",
media=["financial_report.pdf", "stock_chart.png"]
)
```
### Document Processing
```python
# PDF analysis
response = llm.generate(
"Summarize the key findings from this research paper",
media=["research_paper.pdf"]
)
# Office document processing
response = llm.generate(
"Create a summary of this presentation and spreadsheet",
media=["quarterly_results.ppt", "financial_data.xlsx"]
)
# CSV data analysis
response = llm.generate(
"What patterns do you see in this sales data?",
media=["sales_data.csv"]
)
```
### CLI Usage
These examples work in AbstractCore CLI when `abstractcore[media]` is installed and your selected provider/model supports the requested media (or you configured fallbacks):
```bash
# PDF Analysis - Working
python -m abstractcore.utils.cli --prompt "What is this document about? @report.pdf"
# Office Documents - Working
python -m abstractcore.utils.cli --prompt "Summarize this presentation @slides.pptx"
python -m abstractcore.utils.cli --prompt "What data is in @spreadsheet.xlsx"
python -m abstractcore.utils.cli --prompt "Analyze this document @contract.docx"
# Data Files - Working
python -m abstractcore.utils.cli --prompt "What patterns are in @sales_data.csv"
python -m abstractcore.utils.cli --prompt "Analyze this data @metrics.tsv"
# Images - Working
python -m abstractcore.utils.cli --prompt "What's in this image? @screenshot.png"
# Mixed Media - Working
python -m abstractcore.utils.cli --prompt "Compare @chart.png and @data.csv and explain trends"
```
### Cross-provider semantics (what’s consistent)
```python
# AbstractCore exposes a single `media=[...]` parameter across providers, but behavior
# depends on provider/model capabilities and your media policies.
# Documents (PDF/Office/text/CSV/TSV/...) are extracted to text/metadata and injected into the request.
# This generally works across providers because the final payload is text.
media_files = ["report.pdf", "data.xlsx"]
prompt = "Analyze these documents and provide insights"
# OpenAI
openai_llm = create_llm("openai", model="gpt-4o")
openai_response = openai_llm.generate(prompt, media=media_files)
# Anthropic
anthropic_llm = create_llm("anthropic", model="claude-haiku-4-5")
anthropic_response = anthropic_llm.generate(prompt, media=media_files)
# Image/audio/video inputs are policy-driven and require native support or explicit fallbacks.
# See: docs/vision-capabilities.md and docs/media-handling-system.md (policies + fallbacks).
```
### Streaming with Media
```python
# Real-time streaming responses with media
llm = create_llm("openai", model="gpt-4o") # requires: pip install "abstractcore[openai]"
for chunk in llm.generate(
"Describe this image in detail",
media=["complex_diagram.png"],
stream=True
):
print(chunk.content or "", end="", flush=True)
```
## Advanced Features
### Maximum Resolution Optimization (NEW)
AbstractCore automatically optimizes image resolution for each model's maximum capability, ensuring optimal vision results:
```python
from abstractcore import create_llm
# Images are automatically optimized for each model's maximum resolution
llm = create_llm("openai", model="gpt-4o")
response = llm.generate(
"Analyze this image in detail",
media=["photo.jpg"] # Auto-resized to 4096x4096 for GPT-4o
)
# Different model, different optimization
llm = create_llm("ollama", model="qwen2.5vl:7b")
response = llm.generate(
"What's in this image?",
media=["photo.jpg"] # Auto-resized to 3584x3584 for qwen2.5vl
)
```
**Model-Specific Resolution Limits:**
- **GPT-4o**: Up to 4096x4096 pixels
- **Claude 3.5 Sonnet**: Up to 1568x1568 pixels
- **qwen2.5vl:7b**: Up to 3584x3584 pixels
- **gemma3:4b**: Up to 896x896 pixels
- **llama3.2-vision:11b**: Up to 560x560 pixels
**Benefits:**
- **Better Accuracy**: Higher resolution means more detail for the model to analyze
- **Automatic**: No manual configuration required
- **Provider-Aware**: Adapts to each provider's optimal settings
- **Quality Optimization**: Increased JPEG quality (90%) for better compression
### Capability Detection
The system automatically detects model capabilities and adapts accordingly:
```python
from abstractcore.media.capabilities import is_vision_model, supports_images
# Check if a model supports vision
if is_vision_model("gpt-4o"):
print("This model can process images")
if supports_images("claude-3.5-sonnet"):
print("This model supports image analysis")
# Text-only model + image input is policy-driven
llm = create_llm("openai", model="gpt-4") # text-only example
response = llm.generate(
"Analyze this image",
media=["photo.jpg"], # Errors unless vision fallback is configured; see below.
)
```
### Vision fallback (optional; config-driven)
AbstractCore includes an optional **vision fallback** that enables text-only models to process images using a transparent two-stage pipeline (caption → inject short observations).
#### How Vision Fallback Works
When vision fallback is configured and you use a text-only model with images, AbstractCore:
1. **Detects Model Limitations**: Identifies when a text-only model receives an image
2. **Uses Vision Fallback**: Employs a configured vision model to analyze the image
3. **Provides Description**: Passes the image description to the text-only model
4. **Returns Results**: Your text model answers using the injected observations (recorded in `metadata.media_enrichment[]`)
#### Example
Configure a vision captioner once:
```bash
abstractcore --set-vision-provider lmstudio qwen/qwen3-vl-4b
```
Then use any text model with images:
```python
from abstractcore import create_llm
llm = create_llm("lmstudio", model="qwen/qwen3-next-80b") # text-only
resp = llm.generate("What's in this image?", media=["whale_photo.jpg"])
print(resp.content)
```
#### Behind the Scenes
What actually happens (transparent to user):
1. **Stage 1**: the configured vision captioner (`qwen/qwen3-vl-4b` in this example) analyzes `whale_photo.jpg` → detailed description
2. **Stage 2**: `qwen/qwen3-next-80b` (text-only) processes description + user question → final analysis
#### Configuration Commands
```bash
# Check current status
abstractcore --status
# Download local caption models (optional)
abstractcore --download-vision-model # BLIP base (990MB)
abstractcore --download-vision-model vit-gpt2 # ViT-GPT2 (500MB, CPU-friendly)
abstractcore --download-vision-model git-base # GIT base (400MB, smallest)
# Use an existing vision-capable model as the fallback captioner
abstractcore --set-vision-provider ollama qwen2.5vl:7b
abstractcore --set-vision-provider lmstudio qwen/qwen3-vl-4b
abstractcore --set-vision-provider openai gpt-4o
abstractcore --set-vision-provider anthropic claude-sonnet-4-5
# Interactive setup
abstractcore --config
# Advanced: Fallback chains
abstractcore --add-vision-fallback ollama qwen2.5vl:7b
abstractcore --add-vision-fallback openai gpt-4o
```
#### Benefits of Vision Fallback
- **Universal Compatibility**: Any text-only model can now process images
- **Cost Optimization**: Use cheaper text models for reasoning, vision models only for description
- **Transparent Operation**: Users don't need to change their code
- **Flexible Configuration**: Local models, cloud APIs, or hybrid setups
- **Offline-First**: Works without internet after downloading local models
- **Automatic Fallback**: Graceful degradation when vision not configured
#### Supported Vision Models
**Local Models (Downloaded):**
- **BLIP Base**: 990MB, high quality, CPU/GPU compatible
- **ViT-GPT2**: 500MB, CPU-friendly, good performance
- **GIT Base**: 400MB, smallest size, basic quality
**Provider Models:**
- **Ollama**: `qwen2.5vl:7b`, `llama3.2-vision:11b`, `gemma3:4b`
- **LMStudio**: `qwen/qwen2.5-vl-7b`, `google/gemma-3n-e4b`
- **OpenAI**: `gpt-4o`, `gpt-4-turbo-with-vision`
- **Anthropic**: `claude-3.5-sonnet`, `claude-4-series`
### Custom Processing Options
```python
# Advanced image processing
from abstractcore.media.processors import ImageProcessor
processor = ImageProcessor(
optimize_for_vision=True,
max_dimension=1024,
quality=85
)
# Advanced PDF processing
from abstractcore.media.processors import PDFProcessor
pdf_processor = PDFProcessor(
extract_images=True,
markdown_output=True,
preserve_tables=True
)
```
### Direct Media Processing
```python
# Process files directly (without LLM)
from abstractcore.media import process_file
# Process any supported file
result = process_file("document.pdf")
if result.success:
print(f"Content: {result.media_content.content}")
print(f"Type: {result.media_content.media_type}")
print(f"Metadata: {result.media_content.metadata}")
```
## Recommended Practices
### File Size and Limits
```python
# Check model-specific limits
from abstractcore.media.capabilities import get_media_capabilities
caps = get_media_capabilities("gpt-4o")
print(f"Max images per message: {caps.max_images}")
print(f"Supported formats: {caps.supported_formats}")
```
### Error Handling
```python
try:
response = llm.generate(
"Analyze this file",
media=["large_document.pdf"]
)
except Exception as e:
print(f"Media processing error: {e}")
# Fallback to text-only processing
response = llm.generate("Analyze the uploaded document content")
```
### Performance Tips
```python
# For large documents, consider chunking
from abstractcore.media.processors import PDFProcessor
processor = PDFProcessor(chunk_size=8000) # Process in chunks
# For multiple images, process in batches
image_files = ["img1.jpg", "img2.jpg", "img3.jpg"]
for batch in [image_files[i:i+3] for i in range(0, len(image_files), 3)]:
response = llm.generate("Analyze these images", media=batch)
```
## Model-Specific Examples
### OpenAI GPT-4o
```python
# Multi-image analysis with high detail
llm = create_llm("openai", model="gpt-4o")
response = llm.generate(
"Compare these architectural photos and identify the styles",
media=["building1.jpg", "building2.jpg", "building3.jpg"]
)
```
### Anthropic Claude 3.5 Sonnet
```python
# Document analysis with specialized prompts
llm = create_llm("anthropic", model="claude-3.5-sonnet")
response = llm.generate(
"Provide a comprehensive analysis of this research paper",
media=["academic_paper.pdf"]
)
```
### Local Vision Models
```python
# Ollama with qwen2.5-vl
ollama_llm = create_llm("ollama", model="qwen2.5vl:7b")
response = ollama_llm.generate(
"What objects do you see in this image?",
media=["scene.jpg"]
)
# LMStudio with qwen2.5-vl
lmstudio_llm = create_llm("lmstudio", model="qwen/qwen2.5-vl-7b")
response = lmstudio_llm.generate(
"Describe this chart and its trends",
media=["business_chart.png"]
)
# Ollama with Llama 3.2 Vision
llama_llm = create_llm("ollama", model="llama3.2-vision:11b")
response = llama_llm.generate(
"Analyze this document layout",
media=["document.jpg"]
)
```
## Installation
### Basic Installation
```bash
# Core media handling (images, text, basic documents)
pip install "abstractcore[media]"
```
### Full Installation
```bash
# Media features (PDF + Office docs) are covered by `abstractcore[media]`.
# If you want the full framework install (providers + tools + server + docs), pick one:
pip install "abstractcore[all-apple]" # macOS/Apple Silicon (includes MLX, excludes vLLM)
pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac (excludes MLX and vLLM)
pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU (includes vLLM, excludes MLX)
```
Advanced: If you prefer to install only the pieces you need (instead of `abstractcore[media]`),
these are the main libraries AbstractCore uses:
- `Pillow` (images)
- `pymupdf4llm` + `pymupdf-layout` (PDF extraction)
- `unstructured[docx,pptx,xlsx,odt,rtf]` (Office docs)
- `pandas` (tabular helpers)
## Troubleshooting
### Common Issues
**Media not processed:**
```python
# Check if media dependencies are installed
try:
response = llm.generate("Test", media=["test.jpg"])
except ImportError as e:
print(f"Missing dependency: {e}")
print('Install with: pip install "abstractcore[media]"')
```
**Vision model not detecting images:**
```python
# Verify model capabilities
from abstractcore.media.capabilities import is_vision_model
if not is_vision_model("your-model"):
print("This model doesn't support vision")
print("Try: gpt-4o, claude-3.5-sonnet, qwen2.5vl:7b, or llama3.2-vision:11b")
```
**Large file processing:**
```python
# For large files, check size limits
import os
file_size = os.path.getsize("large_file.pdf")
if file_size > 10 * 1024 * 1024: # 10MB
print("File may be too large for some providers")
```
### Validation
```bash
# Test your installation
python validate_media_system.py
# Run comprehensive tests
python -m pytest tests/media_handling/ -v
```
## API Reference
### Core Functions
```python
# Main generation with media
llm.generate(prompt, media=files, **kwargs)
# Direct file processing
from abstractcore.media import process_file
result = process_file(file_path)
# Capability detection
from abstractcore.media.capabilities import (
is_vision_model,
supports_images,
get_media_capabilities
)
```
### Media Types
```python
from abstractcore.media.types import MediaType, ContentFormat
# MediaType.IMAGE, MediaType.DOCUMENT, MediaType.TEXT
# ContentFormat.BASE64, ContentFormat.TEXT, ContentFormat.BINARY
```
### Processors
```python
from abstractcore.media.processors import (
ImageProcessor, # Images with PIL
TextProcessor, # Text, CSV, JSON with pandas
PDFProcessor, # PDFs with PyMuPDF4LLM
OfficeProcessor # DOCX, XLSX, PPT with unstructured
)
```
## Next Steps
- **Getting Started Guide** - Complete AbstractCore tutorial
- **API Reference** - Full Python API documentation
- **Glyph + Vision Example** - End-to-end document analysis with a vision model
- **Supported Formats Utility** - Inspect available processors and supported formats
---
The media handling system makes AbstractCore multimodal while maintaining the same "write once, run everywhere" philosophy. Focus on your application logic while AbstractCore handles the complexity of different provider APIs and media formats.
---
### Inlined: `docs/embeddings.md`
# Vector Embeddings Guide
AbstractCore includes built-in support for vector embeddings with **multiple providers** (HuggingFace, Ollama, LMStudio). This guide shows you how to use embeddings for semantic search, RAG applications, and similarity analysis.
**Two ways to use embeddings:**
1. **Python Library** (this guide) - Direct programmatic usage via `EmbeddingManager`
2. **REST API** - HTTP endpoints via AbstractCore server (see Server API Reference)
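For the REST route, a sketch of a request (assumptions: the AbstractCore server is running on `localhost:8000` and exposes an OpenAI-compatible `/v1/embeddings` endpoint, the model identifier is illustrative, and the `requests` package is installed; see the Server API Reference for the authoritative request shape):
```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
        "input": ["Machine learning transforms how we process information"],
    },
    timeout=60,
)
print(resp.json())
```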
## Quick Start
### Installation
```bash
# Install with embeddings support
pip install "abstractcore[embeddings]"
```
### First Embeddings
```python
from abstractcore.embeddings import EmbeddingManager
# Option 1: HuggingFace (default) - Local models with optional ONNX acceleration
embedder = EmbeddingManager() # Uses all-MiniLM-L6-v2 by default
# Option 2: Ollama - Local models via Ollama API
embedder = EmbeddingManager(
provider="ollama",
model="granite-embedding:278m"
)
# Option 3: LMStudio - Local models via LMStudio API
embedder = EmbeddingManager(
provider="lmstudio",
model="text-embedding-all-minilm-l6-v2"
)
# Generate embedding for a single text (works with all providers)
embedding = embedder.embed("Machine learning transforms how we process information")
print(f"Embedding dimension: {len(embedding)}") # 384 for MiniLM
# Compute similarity between texts (works with all providers)
similarity = embedder.compute_similarity(
"artificial intelligence",
"machine learning"
)
print(f"Similarity: {similarity:.3f}") # 0.847
```
## Available Providers & Models
AbstractCore supports multiple embedding providers:
### HuggingFace Provider (Default)
Local sentence-transformers models with optional ONNX acceleration (when available).
| Model | Size | Dimensions | Languages | Primary Use Cases |
|-------|------|------------|-----------|----------|
| **all-minilm** (default) | 90M | 384 | English | Fast local development, testing |
| **qwen3-embedding** | 1.5B | 1536 | 100+ | Qwen-based multilingual, instruction-tuned |
| **embeddinggemma** | 300M | 768 | 100+ | General purpose, multilingual |
| **granite** | 278M | 768 | 100+ | Enterprise applications |
```python
# Default: all-MiniLM-L6-v2 (fast and lightweight)
embedder = EmbeddingManager()
# Qwen-based embedding model for multilingual support
embedder = EmbeddingManager(model="qwen3-embedding")
# Google's EmbeddingGemma for multilingual support
embedder = EmbeddingManager(model="embeddinggemma")
# Direct HuggingFace model ID
embedder = EmbeddingManager(model="sentence-transformers/all-MiniLM-L6-v2")
```
### Ollama Provider
Local embedding models via Ollama API. Requires Ollama running locally.
```python
# Setup: Install Ollama and pull an embedding model
# ollama pull granite-embedding:278m
# Use Ollama embeddings
embedder = EmbeddingManager(
provider="ollama",
model="granite-embedding:278m"
)
# Other popular Ollama embedding models:
# - nomic-embed-text (274MB)
# - granite-embedding:107m (smaller, faster)
```
### LMStudio Provider
Local embedding models via LMStudio API. Requires LMStudio running with a loaded model.
```python
# Setup: Start LMStudio and load an embedding model
# Use LMStudio embeddings
embedder = EmbeddingManager(
provider="lmstudio",
model="text-embedding-all-minilm-l6-v2"
)
```
### Provider Comparison
| Provider | Speed | Setup | Privacy | Cost | Primary Use Cases |
|----------|-------|-------|---------|------|----------|
| **HuggingFace** | Fast | Easy | Full | Free | Development, production |
| **Ollama** | Medium | Medium | Full | Free | Privacy, custom models |
| **LMStudio** | Medium | Easy (GUI) | Full | Free | GUI management, testing |
## Core Features
### Single Text Embeddings
```python
embedder = EmbeddingManager()
text = "Python is a versatile programming language"
embedding = embedder.embed(text)
print(f"Text: {text}")
print(f"Embedding: {len(embedding)} dimensions")
print(f"First 5 values: {embedding[:5]}")
```
### Batch Processing (More Efficient)
```python
texts = [
"Python programming language",
"JavaScript for web development",
"Machine learning with Python",
"Data science and analytics"
]
# Process multiple texts at once (much faster)
embeddings = embedder.embed_batch(texts)
print(f"Generated {len(embeddings)} embeddings")
for i, embedding in enumerate(embeddings):
print(f"Text {i+1}: {len(embedding)} dimensions")
```
### Similarity Analysis
```python
# Basic similarity between two texts
similarity = embedder.compute_similarity("cat", "kitten")
print(f"Similarity: {similarity:.3f}") # 0.804
# NEW: Batch similarity - compare one text against many
query = "Python programming"
docs = ["Learn Python basics", "JavaScript guide", "Cooking recipes", "Data science with Python"]
similarities = embedder.compute_similarities(query, docs)
print(f"Batch similarities: {[f'{s:.3f}' for s in similarities]}")
# Output: ['0.785', '0.155', '0.145', '0.580']
# NEW: Similarity matrix - compare all texts against all texts
texts = ["Python programming", "JavaScript development", "Python data science", "Web frameworks"]
matrix = embedder.compute_similarities_matrix(texts)
print(f"Matrix shape: {matrix.shape}") # (4, 4) symmetric matrix
# NEW: Asymmetric matrix for query-document matching
queries = ["Learn Python", "Web development guide"]
knowledge_base = ["Python tutorial", "JavaScript guide", "React framework", "Python for beginners"]
search_matrix = embedder.compute_similarities_matrix(queries, knowledge_base)
print(f"Search matrix: {search_matrix.shape}") # (2, 4) - 2 queries × 4 documents
```
## Practical Applications
### Semantic Search
```python
from abstractcore.embeddings import EmbeddingManager
embedder = EmbeddingManager()
# Document collection
documents = [
"Python is strong for data science and machine learning applications",
"JavaScript enables interactive web pages and modern frontend development",
"React is a popular library for building user interfaces with JavaScript",
"SQL databases store and query structured data efficiently",
"Machine learning algorithms can predict patterns from historical data"
]
def semantic_search(query, documents, top_k=3):
"""Find most relevant documents for a query."""
similarities = []
for i, doc in enumerate(documents):
similarity = embedder.compute_similarity(query, doc)
similarities.append((i, similarity, doc))
# Sort by similarity (highest first)
similarities.sort(key=lambda x: x[1], reverse=True)
return similarities[:top_k]
# Search for relevant documents
query = "web development frameworks"
results = semantic_search(query, documents)
print(f"Query: {query}\n")
for rank, (idx, similarity, doc) in enumerate(results, 1):
print(f"{rank}. Score: {similarity:.3f}")
print(f" {doc}\n")
```
### Simple RAG Pipeline
```python
from abstractcore import create_llm
from abstractcore.embeddings import EmbeddingManager
# Setup
embedder = EmbeddingManager()
llm = create_llm("openai", model="gpt-4o-mini")
# Knowledge base
knowledge_base = [
"The Eiffel Tower is 330 meters tall and was completed in 1889.",
"Paris is the capital city of France with over 2 million inhabitants.",
"The Louvre Museum in Paris houses the famous Mona Lisa painting.",
"French cuisine is known for its wine, cheese, and pastries.",
"The Seine River flows through central Paris."
]
def rag_query(question, knowledge_base, llm, embedder):
"""Answer question using relevant context from knowledge base."""
# Step 1: Find most relevant context
similarities = []
for doc in knowledge_base:
similarity = embedder.compute_similarity(question, doc)
similarities.append((similarity, doc))
# Get top 2 most relevant documents
similarities.sort(reverse=True)
top_contexts = [doc for _, doc in similarities[:2]]
context = "\n".join(top_contexts)
# Step 2: Generate answer using context
prompt = f"""Context:
{context}
Question: {question}
Based on the context above, please answer the question:"""
response = llm.generate(prompt)
return response.content, top_contexts
# Usage
question = "How tall is the Eiffel Tower?"
answer, contexts = rag_query(question, knowledge_base, llm, embedder)
print(f"Question: {question}")
print(f"Answer: {answer}")
print(f"\nUsed context:")
for ctx in contexts:
print(f"- {ctx}")
```
### Document Clustering (NEW)
```python
from abstractcore.embeddings import EmbeddingManager
embedder = EmbeddingManager()
# Documents to cluster
documents = [
"Python programming tutorial for beginners",
"Introduction to machine learning concepts",
"JavaScript web development guide",
"Advanced Python data structures",
"Machine learning with neural networks",
"Building web apps with JavaScript",
"Python for data analysis",
"Deep learning fundamentals",
"React.js frontend development",
"Statistical analysis with Python"
]
# NEW: Automatic semantic clustering
clusters = embedder.find_similar_clusters(
documents,
threshold=0.6, # 60% similarity required
min_cluster_size=2 # At least 2 documents per cluster
)
print(f"Found {len(clusters)} clusters:")
for i, cluster in enumerate(clusters):
print(f"\nCluster {i+1} ({len(cluster)} documents):")
for idx in cluster:
print(f" - {documents[idx]}")
# Example output:
# Cluster 1 (4 documents): Python-related content
# Cluster 2 (2 documents): JavaScript-related content
# Cluster 3 (2 documents): Machine learning content
```
## Performance Optimization
### ONNX Backend (optional)
```python
# Enable ONNX for faster inference
embedder = EmbeddingManager(
model="embeddinggemma",
backend="onnx" # optional
)
# Performance comparison
import time
texts = ["Sample text for performance testing"] * 100
# Time the embedding generation
start_time = time.time()
embeddings = embedder.embed_batch(texts)
duration = time.time() - start_time
print(f"Generated {len(embeddings)} embeddings in {duration:.2f} seconds")
print(f"Speed: {len(embeddings)/duration:.1f} embeddings/second")
```
### Dimension Truncation (Memory/Speed Trade-off)
```python
# Truncate embeddings for faster processing
embedder = EmbeddingManager(
model="embeddinggemma",
output_dims=256 # Reduce from 768 to 256 dimensions
)
embedding = embedder.embed("Test text")
print(f"Truncated embedding dimension: {len(embedding)}") # 256
```
### Advanced Caching (NEW)
```python
# Configure dual-layer caching system
embedder = EmbeddingManager(
cache_size=5000, # Larger memory cache
cache_dir="./embeddings_cache" # Persistent disk cache
)
# Regular embedding with standard caching
embedding1 = embedder.embed("Machine learning text")
# NEW: Normalized embedding with dedicated cache (unit-length vectors for cosine similarity)
normalized = embedder.embed_normalized("Machine learning text")
print(f"Normalized embedding length: {sum(x*x for x in normalized)**0.5:.3f}") # 1.0 (unit length)
# Check comprehensive cache stats
stats = embedder.get_cache_stats()
print(f"Regular cache: {stats['persistent_cache_size']} embeddings")
print(f"Normalized cache: {stats['normalized_cache_size']} embeddings")
print(f"Memory cache hits: {stats['memory_cache_info']['hits']}")
```
## Integration with LLM Providers
### Enhanced Context Selection
```python
from abstractcore import create_llm
from abstractcore.embeddings import EmbeddingManager
def smart_context_selection(query, documents, max_context_length=2000):
"""Select most relevant context that fits within token limits."""
embedder = EmbeddingManager()
# Score all documents
scored_docs = []
for doc in documents:
similarity = embedder.compute_similarity(query, doc)
scored_docs.append((similarity, doc))
# Sort by relevance
scored_docs.sort(reverse=True)
# Select documents that fit within context limit
selected_context = ""
for similarity, doc in scored_docs:
test_context = selected_context + "\n" + doc
if len(test_context) <= max_context_length:
selected_context = test_context
else:
break
return selected_context.strip()
# Usage with LLM
llm = create_llm("anthropic", model="claude-haiku-4-5")
documents = [
"Long document about machine learning...",
"Another document about data science...",
# ... many more documents
]
query = "What is supervised learning?"
context = smart_context_selection(query, documents)
response = llm.generate(f"Context: {context}\n\nQuestion: {query}")
print(response.content)
```
### Multi-language Support
```python
# EmbeddingGemma supports 100+ languages
embedder = EmbeddingManager(model="embeddinggemma")
# Cross-language similarity
similarity = embedder.compute_similarity(
"Hello world", # English
"Bonjour le monde" # French
)
print(f"Cross-language similarity: {similarity:.3f}")
# Multilingual semantic search
documents_multilingual = [
"Machine learning is transforming technology", # English
"L'intelligence artificielle change le monde", # French
"人工智能正在改变世界", # Chinese
"Künstliche Intelligenz verändert die Welt" # German
]
query = "artificial intelligence"
for doc in documents_multilingual:
similarity = embedder.compute_similarity(query, doc)
print(f"{similarity:.3f}: {doc}")
```
## Production Considerations
### Error Handling
```python
from abstractcore.embeddings import EmbeddingManager
def safe_embedding(text, embedder, fallback_value=None):
"""Generate embedding with error handling."""
try:
return embedder.embed(text)
except Exception as e:
print(f"Embedding failed for text: {text[:50]}...")
print(f"Error: {e}")
return fallback_value or [0.0] * 768 # Return zero vector as fallback
embedder = EmbeddingManager()
# Safe embedding generation
text = "Some text that might cause issues"
embedding = safe_embedding(text, embedder)
if embedding:
print(f"Successfully generated embedding: {len(embedding)} dimensions")
else:
print("Using fallback embedding")
```
### Monitoring and Metrics
```python
import time
from abstractcore.embeddings import EmbeddingManager
class MonitoredEmbeddingManager:
def __init__(self, *args, **kwargs):
self.embedder = EmbeddingManager(*args, **kwargs)
self.stats = {
'total_calls': 0,
'total_time': 0,
'cache_hits': 0,
'cache_misses': 0
}
def embed(self, text):
start_time = time.time()
result = self.embedder.embed(text)
duration = time.time() - start_time
self.stats['total_calls'] += 1
self.stats['total_time'] += duration
return result
def get_stats(self):
avg_time = self.stats['total_time'] / max(self.stats['total_calls'], 1)
return {
**self.stats,
'average_time': avg_time,
'calls_per_second': 1 / avg_time if avg_time > 0 else 0
}
# Usage
monitored_embedder = MonitoredEmbeddingManager()
# Generate some embeddings
for i in range(10):
monitored_embedder.embed(f"Test text number {i}")
# Check performance
stats = monitored_embedder.get_stats()
print(f"Total calls: {stats['total_calls']}")
print(f"Average time per call: {stats['average_time']:.3f}s")
print(f"Calls per second: {stats['calls_per_second']:.1f}")
```
## When to Use Embeddings
### Good Use Cases
- **Semantic Search**: Find relevant documents based on meaning, not keywords
- **RAG Applications**: Select relevant context for language model queries
- **Content Recommendation**: Find similar articles, products, or content
- **Clustering**: Group similar documents or texts together
- **Duplicate Detection**: Find near-duplicate content (see the sketch after this list)
- **Multi-language Search**: Search across different languages
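For instance, near-duplicate detection can be built directly on the similarity matrix shown earlier. A minimal sketch (the 0.9 threshold and the sample texts are illustrative and should be tuned for your data):
```python
from abstractcore.embeddings import EmbeddingManager

embedder = EmbeddingManager()

texts = [
    "How do I reset my password?",
    "Steps to reset a forgotten password",
    "Pricing for the enterprise plan",
]

# Pairwise similarity matrix (N x N); values close to 1.0 suggest near-duplicates
matrix = embedder.compute_similarities_matrix(texts)

threshold = 0.9  # illustrative; tune per dataset
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        if matrix[i][j] >= threshold:
            print(f"Possible duplicates (score {matrix[i][j]:.3f}):")
            print(f"  - {texts[i]}")
            print(f"  - {texts[j]}")
```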
### Not Ideal For
- **Exact Matching**: Use traditional text search for exact matches
- **Structured Data**: Use SQL databases for structured queries
- **Real-time Critical Applications**: Embedding computation has latency
- **Very Short Texts**: Embeddings work better with meaningful content
- **High-frequency Operations**: Consider caching for repeated queries
## Using Embeddings via REST API
If you prefer HTTP endpoints over Python code, use the AbstractCore server:
```bash
# Start the server
pip install "abstractcore[server]"
python -m abstractcore.server.app
```
**HTTP Request:**
```bash
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "Machine learning is fascinating",
"model": "huggingface/sentence-transformers/all-MiniLM-L6-v2"
}'
```
**Model IDs via REST API (examples):**
- `huggingface/model-name`
- `ollama/model-name`
- `lmstudio/model-name`
**Complete REST API documentation:** Server API Reference
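Because the server's `/v1/embeddings` endpoint is OpenAI-compatible, you can also call it with the OpenAI Python client. A minimal sketch, assuming the response follows the standard OpenAI embeddings shape (model ID illustrative):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.embeddings.create(
    model="huggingface/sentence-transformers/all-MiniLM-L6-v2",
    input=["Machine learning is fascinating", "Embeddings capture meaning"],
)
print(f"{len(resp.data)} embeddings, {len(resp.data[0].embedding)} dimensions each")
```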
## Provider-Specific Features
### HuggingFace Features
- **ONNX Acceleration** (when available)
- **Matryoshka Truncation**: Reduce dimensions for efficiency
- **Persistent Caching**: Automatic disk caching of embeddings
### Ollama Features
- **Simple Setup**: Just `ollama pull <model-name>`
- **Full Privacy**: No data leaves your machine
- **Custom Models**: Use any Ollama-compatible model
### LMStudio Features
- **GUI Management**: Easy model loading via GUI
- **Testing Friendly**: Suitable for experimentation
- **OpenAI Compatible**: Standard API format
## Next Steps
- **Start Simple**: Try the semantic search example with your own data
- **Experiment with Providers**: Compare HuggingFace, Ollama, and LMStudio
- **Optimize Performance**: Use batch processing and caching for production
- **Build RAG**: Combine embeddings with AbstractCore LLMs for RAG applications
- **Use REST API**: Deploy embeddings as HTTP service with the server
## Related Documentation
**Core Library:**
- **Python API Reference** - Complete EmbeddingManager API
- **Getting Started** - Basic AbstractCore setup
- **Examples** - More practical examples
**Server (REST API):**
- **Server Guide** - Server setup and deployment
- **Server API Reference** - REST API endpoints including embeddings
- **Troubleshooting** - Common embedding issues
---
**Remember**: Embeddings are the foundation for semantic understanding. Combined with AbstractCore's multi-provider LLM capabilities, you can build sophisticated AI applications that understand meaning, not just keywords.
---
### Inlined: `docs/centralized-config.md`
# AbstractCore Centralized Configuration
AbstractCore provides a unified configuration system that manages default models, cache directories, logging settings, and other package-wide preferences from a single location.
## Configuration File Location
Configuration is stored in: `~/.abstractcore/config/abstractcore.json`
## Configuration Sections
### Application Defaults
Set default providers and models for specific AbstractCore applications:
```bash
# Set defaults for individual apps
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default cli anthropic claude-haiku-4-5
abstractcore --set-app-default extractor ollama qwen3:4b-instruct
abstractcore --set-app-default intent lmstudio qwen/qwen3-4b-2507
# View current app defaults
abstractcore --status
```
### Global Defaults
Set fallback defaults when app-specific configurations are not available:
```bash
# Set global fallback model
abstractcore --set-global-default ollama/llama3:8b
# Set specialized defaults
abstractcore --set-chat-model openai/gpt-4o-mini
abstractcore --set-code-model anthropic/claude-haiku-4-5
```
### Cache Directories
Configure cache locations for different components:
```bash
# Set cache directories
abstractcore --set-default-cache-dir ~/.cache/abstractcore
abstractcore --set-huggingface-cache-dir ~/.cache/huggingface
abstractcore --set-local-models-cache-dir ~/.abstractcore/models
```
**Default cache locations:**
- Default cache: `~/.cache/abstractcore`
- HuggingFace cache: `~/.cache/huggingface`
- Local models: `~/.abstractcore/models`
### Logging Configuration
Control logging behavior across all AbstractCore components:
#### Setting Log Levels
```bash
# Change console logging level (what you see in terminal)
abstractcore --set-console-log-level DEBUG # Show all messages
abstractcore --set-console-log-level INFO # Show info and above
abstractcore --set-console-log-level WARNING # Show warnings and errors
abstractcore --set-console-log-level ERROR # Show only errors (default)
abstractcore --set-console-log-level CRITICAL # Show only critical errors
abstractcore --set-console-log-level NONE # Disable all console logging
# Change file logging level (when file logging is enabled)
abstractcore --set-file-log-level DEBUG
abstractcore --set-file-log-level INFO
abstractcore --set-file-log-level NONE # Disable all file logging
```
#### File Logging Controls
```bash
# Enable/disable file logging
abstractcore --enable-file-logging # Start saving logs to files
abstractcore --disable-file-logging # Stop saving logs to files
# Set log file location
abstractcore --set-log-base-dir ~/.abstractcore/logs
abstractcore --set-log-base-dir /var/log/abstractcore
```
#### Quick Logging Commands
```bash
# Enable debug mode (sets both console and file to DEBUG)
abstractcore --enable-debug-logging
# Disable console output (keeps file logging if enabled)
abstractcore --disable-console-logging
# Check current logging settings
abstractcore --status # Shows current levels with change commands
```
**Available log levels:** DEBUG, INFO, WARNING, ERROR, CRITICAL, NONE
**Log level descriptions:**
- **DEBUG**: Show all messages including detailed diagnostics
- **INFO**: Show informational messages and above
- **WARNING**: Show warnings, errors, and critical messages
- **ERROR**: Show only errors and critical messages
- **CRITICAL**: Show only critical errors
- **NONE**: Disable all logging completely
**Default logging settings:**
- Console level: ERROR
- File level: DEBUG
- File logging: Disabled by default
- Log base directory: `~/.abstractcore/logs`
### Vision (image fallback for text-only models)
Configure **vision fallback** (two-stage caption → inject observations) for text-only models:
```bash
# Set vision fallback provider/model
abstractcore --set-vision-provider huggingface Salesforce/blip-image-captioning-base
# Optional: add backups (used if the first vision backend fails)
abstractcore --add-vision-fallback lmstudio qwen/qwen3-vl-4b
# Disable vision fallback
abstractcore --disable-vision
```
Notes:
- `abstractcore --set-vision-caption ...` is deprecated but kept for compatibility.
- Vision fallback is only used for **image/video inputs** when the *main* model is text-only.
### Audio (default policy + optional speech-to-text fallback)
Audio attachments are controlled by `audio_policy` and are **strict by default** to avoid silent semantic changes:
```bash
# Enable speech-to-text fallback when audio is attached (requires an STT plugin backend)
pip install abstractvoice
abstractcore --set-audio-strategy auto
# Optional: set a language hint (e.g. en, fr)
abstractcore --set-stt-language fr
```
Notes:
- `audio_policy="native_only"` errors on text-only models (default).
- `audio_policy="speech_to_text"` forces STT and injects a transcript into the request.
- `audio_policy="auto"` uses native audio when supported, otherwise STT when available.
### Video (native vs frames fallback)
Video attachments are controlled by `video_policy`. By default (`auto`), AbstractCore uses native video input when supported, otherwise it samples frames via `ffmpeg` and routes them through image/vision handling.
```bash
abstractcore --set-video-strategy auto
abstractcore --set-video-max-frames 6
abstractcore --set-video-sampling-strategy keyframes
abstractcore --set-video-max-frame-side 1024
```
Notes:
- Frame sampling requires `ffmpeg`/`ffprobe` available on `PATH`.
- If your main model is text-only, frame fallback still requires **vision fallback** to be configured (see above).
### API Keys
Manage API keys for different providers:
```bash
# Set API keys
abstractcore --set-api-key openai sk-your-key-here
abstractcore --set-api-key anthropic your-anthropic-key
abstractcore --set-api-key openrouter your-openrouter-key
# List API key status
abstractcore --list-api-keys
```
### Streaming Configuration
Configure default streaming behavior for CLI:
```bash
# Set streaming behavior
abstractcore --stream on # Enable streaming by default
abstractcore --stream off # Disable streaming by default
# Alternative commands
abstractcore --enable-streaming # Enable streaming by default
abstractcore --disable-streaming # Disable streaming by default
```
**Note**: Streaming only affects CLI behavior. Apps (summarizer, extractor, judge, intent) don't support streaming because they need complete structured outputs.
## Priority System
AbstractCore uses a clear priority hierarchy for configuration:
1. **Explicit Parameters** (highest priority)
```bash
summarizer document.txt --provider openai --model gpt-4o-mini
```
2. **App-Specific Configuration**
```bash
abstractcore --set-app-default summarizer openai gpt-4o-mini
```
3. **Global Configuration**
```bash
abstractcore --set-global-default openai/gpt-4o-mini
```
4. **Hardcoded Defaults** (lowest priority)
- Used when no configuration is available
- Current default: `huggingface/unsloth/Qwen3-4B-Instruct-2507-GGUF`
## Debug Mode
The `--debug` parameter overrides configured logging levels and shows detailed diagnostics:
```bash
# Enable debug mode in apps
summarizer document.txt --debug
extractor data.txt --debug
# Debug output shows:
# 🐛 Debug - Configuration details:
# Provider: huggingface
# Model: unsloth/Qwen3-4B-Instruct-2507-GGUF
# Config source: configured defaults
# Max tokens: 32000
# ...
```
## Configuration Status
View complete configuration status:
```bash
abstractcore --status
```
This displays:
- Application defaults for each app
- Global fallback settings
- Vision configuration
- Embeddings settings
- API key status
- Cache directories
- Logging configuration
- Configuration file location
## Interactive Configuration
Set up configuration interactively:
```bash
abstractcore --config
```
This guides you through:
- Default model selection
- Vision fallback setup
- API key configuration
## Example Workflows
### Initial Setup
```bash
# 1. Check current status
abstractcore --status
# 2. Set global fallback
abstractcore --set-global-default ollama/llama3:8b
# 3. Configure specific apps for optimal performance
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default extractor ollama qwen3:4b-instruct
abstractcore --set-app-default judge anthropic claude-haiku-4-5
# 4. Set API keys as needed
abstractcore --set-api-key openai sk-your-key-here
abstractcore --set-api-key anthropic your-anthropic-key
# Optional (only if you plan to use the OpenRouter provider):
abstractcore --set-api-key openrouter your-openrouter-key
# 5. Configure logging for development
abstractcore --enable-debug-logging
abstractcore --enable-file-logging
# 6. Enable streaming for interactive CLI
abstractcore --stream on
# 7. Verify configuration
abstractcore --status
```
### Development Environment
```bash
# Enable verbose logging for development
abstractcore --set-console-log-level DEBUG
abstractcore --enable-file-logging
abstractcore --set-log-base-dir ./logs
# Use local models to avoid API costs
abstractcore --set-global-default ollama/llama3:8b
abstractcore --set-app-default summarizer ollama qwen3:4b-instruct
```
### Production Environment
```bash
# Use production API services
abstractcore --set-global-default openai/gpt-4o-mini
abstractcore --set-api-key openai $OPENAI_API_KEY
# Set production logging
abstractcore --set-console-log-level WARNING
abstractcore --set-file-log-level INFO
abstractcore --enable-file-logging
abstractcore --set-log-base-dir /var/log/abstractcore
```
## Configuration File Format
The configuration is stored as JSON in `~/.abstractcore/config/abstractcore.json`:
```json
{
"vision": {
"strategy": "two_stage",
"caption_provider": "huggingface",
"caption_model": "Salesforce/blip-image-captioning-base",
"fallback_chain": [
{
"provider": "huggingface",
"model": "Salesforce/blip-image-captioning-base"
}
],
"local_models_path": "~/.abstractcore/models/"
},
"audio": {
"strategy": "native_only",
"stt_backend_id": null,
"stt_language": null,
"caption_provider": null,
"caption_model": null,
"fallback_chain": []
},
"video": {
"strategy": "auto",
"max_frames": 3,
"max_frames_native": 8,
"frame_format": "jpg",
"sampling_strategy": "uniform",
"max_frame_side": 1024,
"max_video_size_bytes": null
},
"embeddings": {
"provider": "huggingface",
"model": "all-minilm-l6-v2"
},
"app_defaults": {
"cli_provider": "huggingface",
"cli_model": "unsloth/Qwen3-4B-Instruct-2507-GGUF",
"summarizer_provider": "openai",
"summarizer_model": "gpt-4o-mini",
"extractor_provider": "ollama",
"extractor_model": "qwen3:4b-instruct",
"judge_provider": "anthropic",
"judge_model": "claude-haiku-4-5",
"intent_provider": "lmstudio",
"intent_model": "qwen/qwen3-4b-2507"
},
"default_models": {
"global_provider": "ollama",
"global_model": "llama3:8b",
"chat_model": null,
"code_model": null
},
"api_keys": {
"openai": null,
"anthropic": null,
"openrouter": null,
"google": null
},
"cache": {
"default_cache_dir": "~/.cache/abstractcore",
"huggingface_cache_dir": "~/.cache/huggingface",
"local_models_cache_dir": "~/.abstractcore/models",
"glyph_cache_dir": "~/.abstractcore/glyph_cache"
},
"logging": {
"console_level": "ERROR",
"file_level": "DEBUG",
"file_logging_enabled": false,
"log_base_dir": null,
"verbatim_enabled": true,
"console_json": false,
"file_json": true
},
"timeouts": {
"default_timeout": 7200.0,
"tool_timeout": 600.0
},
"offline": {
"offline_first": true,
"allow_network": false,
"force_local_files_only": true
},
"streaming": {
"cli_stream_default": false
}
}
```
## Configuration Parameter Reference
### Vision Section
- **strategy**: Vision fallback strategy (`"two_stage"`, `"disabled"`, `"basic_metadata"`)
- **caption_provider**: Provider for vision model (e.g., `"huggingface"`, `"ollama"`)
- **caption_model**: Vision model name (e.g., `"Salesforce/blip-image-captioning-base"`)
- **fallback_chain**: Array of backup vision models to try if primary fails
- **local_models_path**: Directory for local vision model storage
### Audio Section
- **strategy**: Audio input strategy (`"native_only"`, `"speech_to_text"`, `"auto"`)
- **stt_backend_id**: Optional preferred STT backend id (plugin-specific)
- **stt_language**: Optional language hint for STT (e.g. `"en"`, `"fr"`)
### Video Section
- **strategy**: Video input strategy (`"native_only"`, `"frames_caption"`, `"auto"`)
- **max_frames**: Frame budget for frames-based fallback
- **max_frames_native**: Frame budget for native video-capable models
- **sampling_strategy**: `"uniform"` or `"keyframes"`
- **frame_format**: `"jpg"` or `"png"`
- **max_frame_side**: Downscale extracted frames to this max side length (preserves aspect ratio)
- **max_video_size_bytes**: Optional maximum video size allowed for processing (bytes)
### Default Models Section (Global Fallbacks)
- **global_provider** / **global_model**: Default provider/model when app-specific not set (e.g., `"ollama"` / `"llama3:8b"`)
- **chat_model**: Specialized model for chat applications (optional, `provider/model`)
- **code_model**: Specialized model for code generation (optional, `provider/model`)
### App Defaults Section (Per-Application)
- **cli_provider** / **cli_model**: Default for CLI utility
- **summarizer_provider** / **summarizer_model**: Default for document summarization
- **extractor_provider** / **extractor_model**: Default for entity extraction
- **judge_provider** / **judge_model**: Default for text evaluation
- **intent_provider** / **intent_model**: Default for intent analysis
### Embeddings Section
- **provider**: Embeddings provider (`"huggingface"`, `"openai"`, etc.)
- **model**: Embeddings model name (e.g., `"all-minilm-l6-v2"`)
### API Keys Section
- **openai**: OpenAI API key
- **anthropic**: Anthropic API key
- **openrouter**: OpenRouter API key
- **portkey**: Portkey API key
- **google**: Google API key (reserved for future integrations; not required for current built-in providers)
### Cache Section
- **default_cache_dir**: General cache directory for AbstractCore (`~/.cache/abstractcore`)
- **huggingface_cache_dir**: HuggingFace models cache (`~/.cache/huggingface`)
- **local_models_cache_dir**: Local models storage (`~/.abstractcore/models`)
- **glyph_cache_dir**: Glyph cache directory (`~/.abstractcore/glyph_cache`)
### Logging Section
- **console_level**: Console log level (`"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`, `"NONE"`)
- **file_level**: File log level (same options as console_level)
- **log_base_dir**: Directory for log files (`~/.abstractcore/logs`)
- **file_logging_enabled**: Whether to save logs to files (`true`/`false`)
- **verbatim_enabled**: Whether to capture full prompts/responses (`true`/`false`)
- **console_json**: Use JSON format for console output (`true`/`false`)
- **file_json**: Use JSON format for file output (`true`/`false`)
### Streaming Section
- **cli_stream_default**: Default streaming mode for CLI (`true`/`false`)
### Timeouts Section
- **default_timeout**: Default HTTP timeout for provider calls (seconds)
- **tool_timeout**: Default tool execution timeout (seconds)
### Offline Section
- **offline_first**: Default to offline-first behavior
- **allow_network**: Allow network access when offline-first is enabled (for API providers)
- **force_local_files_only**: Force HuggingFace `local_files_only` mode
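Since the configuration is plain JSON at the path shown above, you can also inspect it programmatically. A minimal read-only sketch (for changes, prefer `abstractcore --status` and the CLI setters):
```python
import json
from pathlib import Path

config_path = Path.home() / ".abstractcore" / "config" / "abstractcore.json"
if config_path.exists():
    config = json.loads(config_path.read_text())
    logging_cfg = config.get("logging", {})
    defaults = config.get("default_models", {})
    print("Console log level:", logging_cfg.get("console_level"))
    print("Global default:", defaults.get("global_provider"), "/", defaults.get("global_model"))
else:
    print("No config file yet - run `abstractcore --config` to create one")
```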
## Common Configuration Tasks
### How to Change Console Log Level
If you see "Console Level: DEBUG" in the status and want to change it:
```bash
# To reduce console output (recommended for normal use)
abstractcore --set-console-log-level WARNING
# To see more information during development
abstractcore --set-console-log-level INFO
# To see all debug information
abstractcore --set-console-log-level DEBUG
# To completely disable console logging
abstractcore --set-console-log-level NONE
# Verify the change
abstractcore --status
```
### How to Enable File Logging
To start saving logs to files:
```bash
# Enable file logging (saves to ~/.abstractcore/logs by default)
abstractcore --enable-file-logging
# Optional: change log directory first
abstractcore --set-log-base-dir /path/to/your/logs
abstractcore --enable-file-logging
# Verify file logging is enabled
abstractcore --status
```
### How to Set Up Debug Mode
For troubleshooting, enable debug mode:
```bash
# Enable debug for both console and file logging
abstractcore --enable-debug-logging
# This is equivalent to:
# abstractcore --set-console-log-level DEBUG
# abstractcore --set-file-log-level DEBUG
# abstractcore --enable-file-logging
```
### How to Completely Disable Logging
To turn off all logging output:
```bash
# Disable console logging completely
abstractcore --set-console-log-level NONE
# Disable file logging completely (if enabled)
abstractcore --set-file-log-level NONE
abstractcore --disable-file-logging
# Note: --debug parameter in apps will still override NONE
# This maintains the priority system: explicit parameters > config defaults
```
## Troubleshooting
### Configuration Not Loading
If apps don't use configured defaults:
1. Check configuration file exists:
```bash
ls -la ~/.abstractcore/config/abstractcore.json
```
2. Verify configuration content:
```bash
abstractcore --status
```
3. Reset configuration if corrupted:
```bash
rm ~/.abstractcore/config/abstractcore.json
abstractcore --config
```
### Model Initialization Failures
When models fail to initialize, apps show configuration guidance:
```
[ERROR] Failed to initialize LLM 'openai/gpt-4o-mini': API key not configured
[INFO] Solutions:
- Set API key: abstractcore --set-api-key openai sk-...
- Use different provider: summarizer document.txt --provider ollama --model llama3:8b
🔧 Or configure a different default:
- abstractcore --set-app-default summarizer ollama llama3:8b
- abstractcore --status
```
### Debug Information
Use `--debug` to see detailed configuration information:
```bash
summarizer document.txt --debug
```
This shows:
- Which configuration source is being used
- Exact provider and model values
- All parameter values
- Configuration file location
---
### Inlined: `docs/server.md`
# AbstractCore Server
Transform AbstractCore into an OpenAI-compatible API server. One server, all models, any client.
If you want a dedicated **single-model** `/v1` server (one provider/model per worker), see Endpoint.
## Interactive API docs (start here)
Visit while the server is running:
- **Swagger UI**: `http://localhost:8000/docs`
- **ReDoc**: `http://localhost:8000/redoc`
## Quick Start
### Install and Run (2 minutes)
```bash
# Install
pip install "abstractcore[server]"
# Start server
python -m abstractcore.server.app
# Or with uvicorn directly
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Test
curl http://localhost:8000/health
# Response: {"status":"healthy"}
```
### First Request
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
Or with Python:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
model="anthropic/claude-haiku-4-5",
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)
```
---
## Configuration
### Environment Variables
```bash
# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
export PORTKEY_API_KEY="pk_..." # optional (Portkey)
export PORTKEY_CONFIG="pcfg_..." # required for Portkey routing
# Local providers
export OLLAMA_BASE_URL="http://localhost:11434" # (or legacy: OLLAMA_HOST)
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export VLLM_BASE_URL="http://localhost:8000/v1"
# Server bind (only used by `python -m abstractcore.server.app`)
export HOST="0.0.0.0"
export PORT="8000"
# Debug mode
export ABSTRACTCORE_DEBUG=true
# Dangerous (multi-tenant hazard): allow unload_after for providers that can unload shared server state (e.g. Ollama)
export ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1
```
### Startup Options
```bash
# Using AbstractCore's built-in CLI
python -m abstractcore.server.app --help # View all options
python -m abstractcore.server.app --debug # Debug mode
python -m abstractcore.server.app --host 127.0.0.1 --port 8080 # Custom host/port
python -m abstractcore.server.app --debug --port 8001 # Debug on custom port
# Using uvicorn directly
uvicorn abstractcore.server.app:app --reload # Development with auto-reload
uvicorn abstractcore.server.app:app --workers 4 # Production with multiple workers
uvicorn abstractcore.server.app:app --port 3000 # Custom port
```
---
## API Endpoints
### Chat Completions
**Endpoint:** `POST /v1/chat/completions`
Standard OpenAI-compatible endpoint. Works with all providers.
**Request:**
```json
{
"model": "provider/model-name",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
```
**Key Parameters:**
- `model` (required): Prefer `"provider/model-name"` (e.g., `"openai/gpt-4o-mini"`). If you pass a bare model name (no `/`), the server will best-effort auto-detect a provider.
- `messages` (required): Array of message objects
- `stream` (optional): Enable streaming responses
- `tools` (optional): Tools for function calling
- `agent_format` (optional, AbstractCore extension): Tool-call syntax output format for agentic clients (`"auto"|"openai"|"codex"|"qwen3"|"llama3"|"gemma"|"xml"|"passthrough"`). When omitted, the server auto-detects from user-agent + model heuristics.
- `api_key` (optional, AbstractCore extension): Provider API key for per-request authentication. Falls back to environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, `PORTKEY_API_KEY`)
- `base_url` (optional, AbstractCore extension): Override the provider endpoint (include `/v1` for OpenAI-compatible servers like LM Studio / vLLM / OpenRouter)
- `unload_after` (optional, AbstractCore extension): If `true`, calls `llm.unload_model(model)` after the request completes. Disabled for `ollama/*` unless `ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1`.
- `thinking` (optional, AbstractCore extension): Unified thinking/reasoning control (`null|"auto"|"on"|"off"` or `"low"|"medium"|"high"` when supported)
- `temperature`, `max_tokens`, `top_p`: Standard LLM parameters
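As a sketch of the AbstractCore extensions above (model name and parameter values are illustrative), a request can combine `thinking` and `agent_format` alongside the standard parameters:
```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ollama/qwen3:4b-instruct",
        "messages": [{"role": "user", "content": "Summarize the benefits of unit tests."}],
        "max_tokens": 300,
        "thinking": "off",         # AbstractCore extension: unified thinking/reasoning control
        "agent_format": "openai",  # AbstractCore extension: force OpenAI-style tool-call syntax
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```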
**Example with streaming:**
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
stream = client.chat.completions.create(
model="ollama/qwen3-coder:30b",
messages=[{"role": "user", "content": "Write a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
```
#### Provider `base_url` override (AbstractCore extension)
Route a provider to a specific endpoint (useful for remote OpenAI-compatible servers):
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "lmstudio/qwen/qwen3-4b-2507",
"base_url": "http://localhost:1234/v1",
"messages": [{"role": "user", "content": "Hello from a remote LM Studio endpoint"}]
}'
```
#### Per-request `api_key` (AbstractCore extension)
Pass API keys directly in requests (useful for multi-tenant scenarios or OpenRouter):
```bash
# OpenRouter with per-request API key
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openrouter/anthropic/claude-3.5-sonnet",
"messages": [{"role": "user", "content": "Hello!"}],
"api_key": "sk-or-v1-your-openrouter-key"
}'
# OpenAI-compatible endpoint with custom auth
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai-compatible/my-model",
"messages": [{"role": "user", "content": "Hello!"}],
"api_key": "your-api-key",
"base_url": "https://my-custom-endpoint.com/v1"
}'
```
If `api_key` is not provided, AbstractCore falls back to environment variables.
### Media generation endpoints (optional)
AbstractCore Server can optionally expose OpenAI-compatible **image generation** and **audio** endpoints.
Important notes:
- These are **interoperability-first** endpoints (return `b64_json` or raw bytes), not an artifact-first durability contract.
- If the required plugin/backend is not available, the server returns `501` with actionable messaging.
#### Images (generate/edit) — requires `abstractvision`
Endpoints:
- `POST /v1/images/generations`
- `POST /v1/images/edits`
Install:
```bash
pip install "abstractcore[server]"
pip install abstractvision
```
#### Audio (STT/TTS) — requires an audio/voice capability plugin (typically `abstractvoice`)
Endpoints:
- `POST /v1/audio/transcriptions` (multipart; `file=...`)
- `POST /v1/audio/speech` (json; `input=...`, optional `voice`, optional `format`)
Install:
```bash
pip install "abstractcore[server]"
pip install abstractvoice
```
Notes:
- `/v1/audio/transcriptions` requires `python-multipart` for form parsing (included in the server extra).
Examples:
```bash
# Speech-to-text (STT)
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F "file=@speech.wav" \
-F "language=en"
# Text-to-speech (TTS)
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input":"Hello!","format":"wav"}' \
--output hello.wav
```
If you want to “ask a model about an audio file”, prefer one of:
- Run STT first (`/v1/audio/transcriptions`) then send the transcript to `POST /v1/chat/completions`, or
- Configure the server’s default audio strategy (`config.audio.strategy`) to enable STT fallback for audio attachments, then attach audio in chat requests.
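A minimal sketch of the first option (transcribe, then ask), assuming the server is running locally with an STT backend installed and that the transcription response follows the OpenAI-style `{"text": ...}` shape:
```python
import requests

BASE = "http://localhost:8000"

# Step 1: speech-to-text
with open("speech.wav", "rb") as f:
    stt = requests.post(
        f"{BASE}/v1/audio/transcriptions",
        files={"file": f},
        data={"language": "en"},
        timeout=120,
    )
transcript = stt.json().get("text", "")

# Step 2: ask a chat model about the transcript
chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": f"Summarize this audio transcript:\n\n{transcript}"}],
    },
    timeout=120,
)
print(chat.json()["choices"][0]["message"]["content"])
```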
### Multimodal Requests (Images, Documents, Files)
AbstractCore server supports comprehensive file attachments using OpenAI-compatible multimodal message format, plus AbstractCore's convenient `@filename` syntax.
#### Supported File Types
- **Images**: PNG, JPEG, GIF, WEBP, BMP, TIFF
- **Documents**: PDF, DOCX, XLSX, PPTX
- **Data/Text**: CSV, TSV, TXT, MD, JSON, XML
- **Size Limits**: 10MB per file, 32MB total per request
#### Method 1: @filename Syntax (AbstractCore Extension)
Simple syntax that works with all providers:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"messages": [
{"role": "user", "content": "What is in this document? @/path/to/report.pdf"}
]
}'
```
#### Method 2: OpenAI Vision API Format (Image URLs)
Standard OpenAI format for images:
```json
{
"model": "anthropic/claude-haiku-4-5",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}
```
**Base64 Images:**
```json
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
}
}
```
#### Method 3: OpenAI File Format (Forward-Compatible)
AbstractCore supports OpenAI's planned file format with simplified structure (consistent with image_url):
**File URL Format (Recommended - Same Pattern as image_url):**
```json
{
"model": "ollama/qwen3:4b",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this document"},
{
"type": "file",
"file_url": {
"url": "https://example.com/documents/report.pdf"
}
}
]
}
]
}
```
**Local File Path:**
```json
{
"type": "file",
"file_url": {
"url": "/Users/username/documents/data.csv"
}
}
```
**Base64 Data URL:**
```json
{
"type": "file",
"file_url": {
"url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago model", "message": "Field required", "type": "missing"},
{"field": "body -> messages", "message": "Field required", "type": "missing"}
] | client=127.0.0.1
📋 Request Body (Validation Error) | body={"invalid": "data"}
```
**Request/Response Tracking:**
- Full HTTP request details (method, URL, headers, client IP)
- Response status codes and processing times
- Structured JSON logging for machine processing
**Log Files:**
- `logs/abstractcore_TIMESTAMP.log` - Structured events
- `logs/YYYYMMDD-payloads.jsonl` - Full request bodies
- `logs/verbatim_TIMESTAMP.jsonl` - Complete I/O
**Useful Commands:**
```bash
# Find errors
grep '"level": "error"' logs/abstractcore_*.log
# Track token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
awk '{sum+=$1} END {print "Total:", sum}'
# Monitor specific model
grep '"model": "qwen3-coder:30b"' logs/verbatim_*.jsonl
```
## Common Patterns
### Multi-Provider Fallback
```python
import requests
providers = [
"ollama/qwen3-coder:30b",
"openai/gpt-4o-mini",
"anthropic/claude-haiku-4-5"
]
def generate_with_fallback(prompt):
for model in providers:
try:
response = requests.post(
"http://localhost:8000/v1/chat/completions",
json={"model": model, "messages": [{"role": "user", "content": prompt}]},
timeout=30
)
if response.status_code == 200:
return response.json()
except Exception:
continue
raise Exception("All providers failed")
```
### Local Model Gateway
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b
# Use via AbstractCore server
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3-coder:30b",
"messages": [{"role": "user", "content": "Write a Python function"}]
}'
```
---
## Troubleshooting
### Server Won't Start
```bash
# Check port availability
lsof -i :8000
# Use different port
uvicorn abstractcore.server.app:app --port 3000
```
### No Models Available
```bash
# Check providers
curl http://localhost:8000/providers
# Check API keys
echo $OPENAI_API_KEY
# Start Ollama
ollama serve
ollama list
```
### Authentication Errors
```bash
# Set API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Restart server after setting keys
```
---
## Why AbstractCore Server?
- **Universal**: One API for all providers
- **OpenAI Compatible**: Drop-in replacement
- **Simple**: Clean, focused endpoints
- **Fast**: Lightweight, high-performance
- **Debuggable**: Comprehensive logging
- **CLI Ready**: Codex, Gemini CLI, Crush support
- **Production Ready**: Docker, multi-worker, health checks
---
## Related Documentation
- **Getting Started** - Core library quick start
- **Architecture** - System architecture including server
- **Python API Reference** - Core library API
- **Embeddings Guide** - Embeddings deep dive
- **Troubleshooting** - Common issues and solutions
---
**AbstractCore Server** - One server, all models, any client.
---
### Inlined: `docs/endpoint.md`
# Endpoint (single-model `/v1` server)
`abstractcore-endpoint` runs a **single-model** OpenAI-compatible server.
Unlike the multi-provider gateway (Server), this endpoint loads **one** `provider+model` once per worker process and reuses it across requests. It’s useful when you want to host a local backend (for example HF GGUF or MLX) as a stable `/v1` endpoint.
Source: `abstractcore/endpoint/app.py` (entrypoint: `abstractcore-endpoint`).
## When to use this vs the gateway
- Use **Server** when you want `model="provider/model"` routing across many providers/models from one gateway process.
- Use **Endpoint** when you want a dedicated “one worker = one model” process (simpler performance characteristics; fewer per-request initialization costs).
## Install
```bash
pip install "abstractcore[server]"
```
Then install the provider extra you need:
```bash
pip install "abstractcore[mlx]" # Apple Silicon local inference
pip install "abstractcore[huggingface]" # Transformers / torch / llama-cpp-python (heavy)
```
## Run
```bash
# CLI flags
abstractcore-endpoint --provider mlx --model mlx-community/Qwen3-4B --host 0.0.0.0 --port 8001
# Or via env vars
export ABSTRACTENDPOINT_PROVIDER=mlx
export ABSTRACTENDPOINT_MODEL=mlx-community/Qwen3-4B
export ABSTRACTENDPOINT_HOST=0.0.0.0
export ABSTRACTENDPOINT_PORT=8001
abstractcore-endpoint
```
Health check:
```bash
curl http://localhost:8001/health
```
## Use with an OpenAI-compatible client
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")
resp = client.chat.completions.create(
model="anything", # ignored/validated in single-model mode
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```
## Prompt cache control plane (optional)
If the underlying provider exposes prompt-cache controls, the endpoint also exposes a small control plane under `/acore/prompt_cache/*` (see `abstractcore/endpoint/app.py`):
- `GET /acore/prompt_cache/stats`
- `POST /acore/prompt_cache/set`
- `POST /acore/prompt_cache/update`
- `POST /acore/prompt_cache/fork`
- `POST /acore/prompt_cache/clear`
- `POST /acore/prompt_cache/prepare_modules`
For caching concepts, see Session Management and Architecture.
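For example, to check whether the loaded backend exposes prompt-cache stats, a minimal sketch (the response shape depends on the underlying provider, so it is simply printed here):
```python
import requests

resp = requests.get("http://localhost:8001/acore/prompt_cache/stats", timeout=10)
if resp.ok:
    print(resp.json())
else:
    print("Prompt cache control plane not available:", resp.status_code)
```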
---
### Inlined: `docs/troubleshooting.md`
# AbstractCore Troubleshooting Guide
Complete troubleshooting guide for AbstractCore core library and server, including common mistakes and how to avoid them.
## Table of Contents
- Common Mistakes to Avoid
- Quick Diagnosis
- Installation Issues
- Core Library Issues
- Server Issues
- Provider-Specific Issues
- Performance Issues
- Best Practices
- Debug Techniques
---
## Common Mistakes to Avoid
Understanding common pitfalls helps prevent issues before they occur.
### Top mistakes (fast fixes)
1. **Incorrect provider configuration**
- *Symptom*: Authentication failures, no model response
- *Quick Fix*: Set API keys via environment variables (or persist them with `abstractcore --set-api-key ...`)
- See: Authentication Errors
2. **Not handling tool calls**
- *Symptom*: Tools not executing, streaming interruptions
- *Quick Fix*: Use `@tool` decorator and handle tool calls properly
- See: Tool Calls Not Working
3. **Missing provider extras**
- *Symptom*: `ModuleNotFoundError` for providers
- *Quick Fix*: Install provider-specific packages with `pip install "abstractcore[provider]"`
- See: ModuleNotFoundError
4. **LM Studio server not enabled**
- *Symptom*: Connection refused, no response from LM Studio
- *Quick Fix*: Enable "Status: Running" toggle in LM Studio GUI
- See: LM Studio Server Not Enabled
5. **Context length too small (LM Studio/Ollama)**
- *Symptom*: 400 Bad Request, truncated responses, errors with long inputs
- *Quick Fix*: Set "Default Context Length" to "Model Maximum" in LM Studio
- See: Context Length Too Small
### Common Mistake Patterns
#### Mistake: Missing or Incorrect API Keys
**You'll See:**
- `ProviderAPIError: Authentication failed`
- No response from the model
- Cryptic error messages about credentials
**Why This Happens:**
- API keys not set as environment variables
- Whitespace or copying errors in key
- Incorrect key permissions or expired credentials
**Solution:** See Authentication Errors for complete fix.
**Prevention:**
- Use environment variables for sensitive credentials
- Store keys in `.env` files (add to `.gitignore`)
- Regularly rotate and update API keys
- Use secret management tools for production
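If you keep keys in a `.env` file, a minimal sketch using `python-dotenv` (a separate dependency, not part of AbstractCore):
```python
import os

from dotenv import load_dotenv  # pip install python-dotenv
from abstractcore import create_llm

load_dotenv()  # reads OPENAI_API_KEY (and friends) from a local .env file

llm = create_llm("openai", model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
print(llm.generate("ping").content)
```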
#### Mistake: Incorrect Tool Call Handling
**You'll See:**
- Tools not executing during generation
- Partial or missing tool call results
- Streaming interruptions
**Why This Happens:**
- Not using `@tool` decorator
- Incorrect tool definition format
- Not handling tool responses
**Solution:**
```python
from abstractcore import create_llm, tool
# Use @tool decorator for automatic tool definition
@tool
def get_weather(city: str) -> str:
"""Get current weather for a city."""
return f"Weather in {city}: sunny, 72°F"
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate(
"What's the weather in Tokyo?",
tools=[get_weather] # Pass decorated function directly
)
```
**Prevention:**
- Always use `@tool` decorator for automatic tool definitions
- Use type hints for all parameters
- Add clear docstrings for tool descriptions
- Handle tool execution errors gracefully
- See: Tool Calls Not Working
#### Mistake: Overlooking Error Handling
**You'll See:**
- Unhandled exceptions
- Silent failures in tool or generation calls
- Unexpected application crashes
**Why This Happens:**
- Not catching provider-specific exceptions
- Assuming 100% reliability of LLM responses
- No retry or fallback mechanisms
**Solution:**
```python
from abstractcore import create_llm
from abstractcore.exceptions import ProviderAPIError, RateLimitError
providers = [
("openai", "gpt-4o-mini"),
("anthropic", "claude-haiku-4-5"),
("ollama", "qwen3-coder:30b")
]
def generate_with_fallback(prompt):
for provider, model in providers:
try:
llm = create_llm(provider, model=model)
return llm.generate(prompt)
except (ProviderAPIError, RateLimitError) as e:
print(f"Failed with {provider}: {e}")
continue
raise Exception("All providers failed")
```
**Prevention:**
- Always use try/except blocks
- Implement provider fallback strategies
- Log and monitor errors systematically
- Design for graceful degradation
#### Mistake: Memory and Performance Bottlenecks
**You'll See:**
- High memory consumption
- Slow response times
- Out-of-memory errors during long generations
**Why This Happens:**
- Not managing token limits
- Generating overly long responses
- Inefficient streaming configurations
**Solution:**
```python
# Optimize memory and performance
response = llm.generate(
"Complex task",
max_tokens=1000, # Limit response length
timeout=30, # Set reasonable timeout
temperature=0.7 # Control creativity/randomness
)
```
**Prevention:**
- Always set `max_tokens`
- Use streaming for long responses
- Monitor memory usage in production
- See: Performance Issues
#### Mistake: Hardcoding Credentials
**You'll See:**
- Exposed API keys in code
- Inflexible configuration management
- Security vulnerabilities
**Why This Happens:**
- Copying example code directly
- Not understanding configuration best practices
- Lack of environment-based configuration
**Solution:**
```python
import os
from abstractcore import create_llm
# Best practice: Load from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
DEFAULT_MODEL = os.getenv('DEFAULT_LLM_MODEL', 'gpt-4o-mini')
llm = create_llm(
"openai",
model=DEFAULT_MODEL,
api_key=OPENAI_API_KEY
)
```
**Prevention:**
- Never hardcode API keys or sensitive data
- Use environment variables
- Implement configuration management libraries
- Follow 12-factor app configuration principles
---
## Quick Diagnosis
Run these checks first:
```bash
# Check Python version
python --version # Should be 3.9+
# Check AbstractCore installation
pip show abstractcore
# Test core library
python -c "from abstractcore import create_llm; print('✓ Core library OK')"
# Test server (if installed)
curl http://localhost:8000/health # Should return {"status":"healthy"}
```
---
## Installation Issues
### Issue: ModuleNotFoundError
**Symptoms:**
```
ModuleNotFoundError: No module named 'abstractcore'
ModuleNotFoundError: No module named 'openai'
```
**Solutions:**
```bash
# Install AbstractCore
pip install abstractcore
# Install with specific provider
pip install "abstractcore[openai]"
pip install "abstractcore[anthropic]"
# Local OpenAI-compatible servers (Ollama, LMStudio, vLLM, llama.cpp, ...) work with the core install.
# Install the full feature set (pick one)
pip install "abstractcore[all-apple]" # macOS/Apple Silicon (includes MLX, excludes vLLM)
pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac (excludes MLX and vLLM)
pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU (includes vLLM, excludes MLX)
# Verify installation
pip list | grep abstract
```
### Issue: Dependency Conflicts
**Symptoms:**
```
ERROR: pip's dependency resolver does not currently take into account all the packages...
```
**Solutions:**
```bash
# Create clean environment
python3 -m venv .venv
source .venv/bin/activate # Linux/Mac
# OR
.venv\Scripts\activate # Windows
# Fresh install
pip install --upgrade pip
pip install "abstractcore[all-apple]" # macOS/Apple Silicon
# or: pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac
# or: pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU
# If still failing, try one provider at a time
pip install "abstractcore[openai]"
```
---
## Core Library Issues
### Issue: Authentication Errors
**Symptoms:**
```
Error: OpenAI API key not found
Error: Authentication failed
Error: Invalid API key
```
**Solutions:**
```bash
# Check if API key is set
echo $OPENAI_API_KEY # Should show your key
echo $ANTHROPIC_API_KEY
# Set API key
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Add to shell profile for persistence
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.bashrc
source ~/.bashrc
# Verify key format
# OpenAI: starts with "sk-"
# Anthropic: starts with "sk-ant-"
# Test authentication
python -c "from abstractcore import create_llm; llm = create_llm('openai', model='gpt-4o-mini'); print(llm.generate('test').content)"
```
### Issue: Model Not Found
**Symptoms:**
```
Error: Model 'qwen3-coder:30b' not found
Error: Unsupported model
```
**Solutions:**
**For Ollama:**
```bash
# Check available models
ollama list
# Pull missing model
ollama pull qwen3-coder:30b
# Verify Ollama is running
ollama serve
```
**For LMStudio:**
```bash
# Check LMStudio server
curl http://localhost:1234/v1/models
# In LMStudio GUI:
# 1. Go to "Local Server" tab
# 2. Select model from dropdown
# 3. Click "Start Server"
```
**For OpenAI/Anthropic:**
```python
# Use correct model names
llm = create_llm("openai", model="gpt-4o-mini") # ✓ Correct
llm = create_llm("openai", model="gpt4") # ✗ Wrong
llm = create_llm("anthropic", model="claude-haiku-4-5") # ✓ Correct
llm = create_llm("anthropic", model="claude-3") # ✗ Wrong
```
### Issue: Connection Errors
**Symptoms:**
```
Connection refused
Timeout error
Network error
```
**Solutions:**
**For Ollama:**
```bash
# Start Ollama service
ollama serve
# Check if running
curl http://localhost:11434/api/tags
# If using custom host
export OLLAMA_HOST="http://localhost:11434"
```
**For LMStudio:**
```bash
# Verify server is running
curl http://localhost:1234/v1/models
# Check port in LMStudio GUI (usually 1234)
```
**For Cloud Providers:**
```bash
# Test network connection
ping api.openai.com
ping api.anthropic.com
# Check proxy settings
echo $HTTP_PROXY
echo $HTTPS_PROXY
# Disable proxy if needed
unset HTTP_PROXY
unset HTTPS_PROXY
```
### Issue: Tool Calls Not Working
**Symptoms:**
- Tools not being called
- Empty tool responses
- Tool format errors
**Solutions:**
```python
from abstractcore import create_llm, tool
# Ensure @tool decorator is used
@tool
def get_weather(city: str) -> str:
"""Get weather for a city."""
return f"Weather in {city}: sunny, 72°F"
# Use tool correctly
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate(
"What's the weather in Paris?",
tools=[get_weather] # Pass as list
)
# Check if tool was called
if hasattr(response, 'tool_calls') and response.tool_calls:
print("Tools were called")
```
---
## Server Issues
### Issue: Server Won't Start
**Symptoms:**
```
Address already in use
Port 8000 is already allocated
```
**Solutions:**
```bash
# Check what's using port 8000
lsof -i :8000 # Linux/Mac
netstat -ano | findstr :8000 # Windows
# Kill process on port
kill -9 $(lsof -t -i:8000) # Linux/Mac
# Use different port
uvicorn abstractcore.server.app:app --port 3000
```
### Issue: Client complains about missing API key
**Symptoms:**
- Your OpenAI-compatible client/CLI refuses to run without an API key (even though your server is local).
**Solutions:**
```bash
# Most OpenAI-compatible clients accept a dummy key for local servers.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
# Verify they're set
echo "$OPENAI_BASE_URL"
echo "$OPENAI_API_KEY"
```
### Issue: Server Running but No Response
**Symptoms:**
- curl hangs
- No response from endpoints
- Timeout errors
**Solutions:**
```bash
# Check server is actually running
curl http://localhost:8000/health
# Check server logs
tail -f logs/abstractcore_*.log
# Enable debug mode
export ABSTRACTCORE_DEBUG=true
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Test with simple request
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "test"}]}'
```
### Issue: Models Not Showing
**Symptoms:**
```
curl http://localhost:8000/v1/models returns empty list
```
**Solutions:**
```bash
# Check if providers are configured
curl http://localhost:8000/providers
# Verify provider setup:
# For Ollama
ollama list # Should show models
ollama serve # Make sure it's running
# For OpenAI
echo $OPENAI_API_KEY # Should be set
# For Anthropic
echo $ANTHROPIC_API_KEY # Should be set
# For LMStudio
curl http://localhost:1234/v1/models # Should return models
```
### Issue: Tool Calls Not Working with CLI
**Symptoms:**
- Codex/Crush/Gemini CLI not detecting tools
- Tool format errors in streaming
**Solutions:**
```bash
# AbstractCore Server controls tool-call syntax via `agent_format` (request field) or auto-detection.
# - OpenAI/Codex style: structured tool calls are returned in `tool_calls` fields.
# - Tag-based formats: tool calls are emitted as tagged content for clients that parse from assistant text.
# If you control requests (curl/custom client), force a format with `agent_format`:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3:4b-instruct-2507-q4_K_M",
"messages": [{"role": "user", "content": "Use the tool."}],
"tools": [{"type":"function","function":{"name":"get_weather","description":"...","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
"agent_format": "llama3"
}'
```
See Server (agentic CLI integration) for details and supported formats.
---
## Provider-Specific Issues
### Ollama
**Issue: Ollama not responding**
```bash
# Restart Ollama
pkill ollama
ollama serve
# Check status
curl http://localhost:11434/api/tags
# List models
ollama list
# Pull model if missing
ollama pull qwen3-coder:30b
```
**Issue: Out of memory**
```bash
# Use smaller models
ollama pull gemma3:1b # Only 1GB
ollama pull qwen3:4b-instruct-2507-q4_K_M # 4GB
# Check system memory
free -h # Linux
vm_stat # macOS
# Close other applications
```
### OpenAI
**Issue: Rate limits**
```python
# Check your rate limits: https://platform.openai.com/account/rate-limits
# Implement backoff in code
import time

from abstractcore import create_llm
from abstractcore.exceptions import RateLimitError

llm = create_llm("openai", model="gpt-4o-mini")
try:
    response = llm.generate("prompt")
except RateLimitError:
    time.sleep(20)  # Wait before retry
```
**Issue: Billing**
```bash
# Check billing dashboard
# https://platform.openai.com/account/billing
# Verify payment method is added
# Check usage limits aren't exceeded
```
### Anthropic
**Issue: API key format**
```bash
# Anthropic keys start with "sk-ant-"
echo $ANTHROPIC_API_KEY # Should start with sk-ant-
# Get key from console
# https://console.anthropic.com/
```
### LMStudio
#### Issue: Connection refused
```bash
# Verify LMStudio server is running
# Check LMStudio GUI shows "Server running"
# Test connection
curl http://localhost:1234/v1/models
# Check port number in LMStudio (usually 1234)
```
#### Issue: LM Studio Server Not Enabled
```bash
# CRITICAL: Ensure LM Studio server is enabled in the GUI
# 1. Open LM Studio application
# 2. Look for "Status: Running" toggle switch in the interface
# 3. Make sure the toggle is switched to "ON" (green background, white handle on right)
# 4. If the toggle shows "OFF", click it to enable the server
# 5. Verify the server is running by checking the status indicator
# Test server availability
curl http://localhost:1234/v1/models
# If still failing, check LM Studio logs for any error messages
```
#### Issue: Context Length Too Small (400 Bad Request, Truncated Responses)
```bash
# Problem: LLM returns 400 Bad Request, truncated output, or errors with long inputs
# Root Cause: Insufficient context length configured for the model or server
# Solution 1: Increase Default Context Length (RECOMMENDED)
# This is the most robust way to ensure all models use maximum available context
# 1. Open LM Studio application
# 2. Go to "App Settings" → "General" tab
# 3. Find "Model Defaults" → "Default Context Length"
# 4. Set dropdown to "Model Maximum" (or highest available value like 131072)
# 5. Restart LM Studio server for changes to take effect
# Solution 2: Increase Context Length per Model (Alternative)
# This method applies context length setting to a specific model
# 1. Open LM Studio application
# 2. Go to "My Models" tab
# 3. Select the specific model you are using
# 4. Look for "Context Length" slider/input (usually under "Load" or "Context" tab)
# 5. Adjust slider to maximum value (e.g., 131072 tokens)
# 6. Reload the model for changes to take effect
# Solution 3: Increase Context Length via API Request (Advanced)
# For Ollama, or if you need to override settings for LM Studio via API
# For Ollama: set "num_ctx" in the request options (as shown below), or add
# "PARAMETER num_ctx 4096" to the model's Modelfile.
# For LM Studio via API (often handled automatically by AbstractCore):
# Include in request payload:
# {
# "model": "your-model-name",
# "prompt": "Your long prompt here...",
# "options": {
# "num_ctx": 4096 # Or your desired context length
# }
# }
# Verification:
# After adjusting, test with a long prompt that previously failed
# Check server logs for any warnings or errors related to context
```
---
## Performance Issues
### Issue: Slow Responses
**Diagnosis:**
```bash
# Time a request
time python -c "from abstractcore import create_llm; llm = create_llm('ollama', model='qwen3:4b-instruct-2507-q4_K_M'); print(llm.generate('test').content)"
```
**Solutions:**
**Use Faster Models:**
```python
# Faster cloud models
llm = create_llm("openai", model="gpt-4o-mini") # Fast
llm = create_llm("anthropic", model="claude-haiku-4-5") # Fast
# Faster local models
llm = create_llm("ollama", model="gemma3:1b") # Very fast
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M") # Balanced
```
**Enable Streaming:**
```python
# Improves perceived speed
for chunk in llm.generate("Long response", stream=True):
print(chunk.content, end="", flush=True)
```
**Optimize Parameters:**
```python
response = llm.generate(
    "prompt",
    max_tokens=500,   # Limit output length (the main speed lever)
    temperature=0.3,  # Lower = more focused/deterministic output (not faster)
)
```
### Issue: High Memory Usage
**Solutions:**
```bash
# Use smaller models
ollama pull gemma3:1b # 1GB instead of 30GB
# Close other applications
# For MLX on Mac
# Use 4-bit quantized models
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
```
---
## Best Practices
Follow these best practices to avoid issues:
### Configuration Management
- Use environment variables for API keys
- Never commit credentials to version control
- Use `.env` files (add to `.gitignore`)
- Implement configuration validation
- Use secret management in production
### Tool Development
- Always use the `@tool` decorator (see the sketch below)
- Add type hints to all parameters
- Write clear docstrings
- Handle edge cases and errors
- Test tools independently first
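A minimal sketch that follows these practices (the `word_count` tool below is a hypothetical example):
```python
from abstractcore import tool

@tool
def word_count(text: str) -> str:
    """Count whitespace-separated words in the given text."""
    # Handle empty/whitespace-only input explicitly (edge case).
    if not text or not text.strip():
        return "0 words"
    return f"{len(text.split())} words"

# Test the tool logic independently first (assuming the decorator keeps it callable),
# then pass it to an LLM via tools=[word_count].
print(word_count("hello world"))
```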
### Error Handling
- Always use try/except blocks
- Implement provider fallback strategies
- Log errors systematically
- Design for graceful degradation
- Monitor error rates in production
### Performance
- Always set `max_tokens`
- Use streaming for long responses
- Batch similar requests when possible
- Monitor memory usage
- Profile slow operations
### Security
- Validate all user inputs
- Sanitize file paths and commands (see the sketch below)
- Use least privilege principle
- Regular security audits
- Keep dependencies updated
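For example, path sanitization for a file-reading tool might look like this (a minimal sketch; `BASE_DIR` is a hypothetical workspace root):
```python
from pathlib import Path

BASE_DIR = Path("/srv/agent-workspace").resolve()  # hypothetical allowed root

def safe_path(user_supplied: str) -> Path:
    """Resolve a user-supplied path and reject anything outside BASE_DIR."""
    candidate = (BASE_DIR / user_supplied).resolve()
    if not candidate.is_relative_to(BASE_DIR):  # Python 3.9+
        raise ValueError(f"Path escapes the allowed workspace: {user_supplied}")
    return candidate
```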
---
## Debug Techniques
### Enable Debug Logging
**Core Library:**
```python
import logging
logging.basicConfig(level=logging.DEBUG)
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
```
**Server:**
```bash
# Enable debug mode
export ABSTRACTCORE_DEBUG=true
# Start with debug logging
uvicorn abstractcore.server.app:app --log-level debug
# Monitor logs
tail -f logs/abstractcore_*.log
```
### Analyze Logs
```bash
# Find errors
grep '"level": "error"' logs/abstractcore_*.log
# Track specific request
grep "req_abc123" logs/abstractcore_*.log
# Monitor latency
cat logs/verbatim_*.jsonl | jq '.metadata.latency_ms'
# Token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
awk '{sum+=$1} END {print "Total:", sum}'
```
### Test in Isolation
```python
# Test provider directly
from abstractcore import create_llm
try:
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("Hello")
print(f"✓ Success: {response.content}")
except Exception as e:
print(f"✗ Error: {e}")
```
### Collect Debug Information
```bash
# Create debug report
echo "=== System ===" > debug_report.txt
uname -a >> debug_report.txt
python --version >> debug_report.txt
echo "=== Packages ===" >> debug_report.txt
pip freeze | grep -E "abstract|openai|anthropic" >> debug_report.txt
echo "=== Environment ===" >> debug_report.txt
env | grep -E "ABSTRACT|OPENAI|ANTHROPIC|OLLAMA" >> debug_report.txt
echo "=== Tests ===" >> debug_report.txt
python -c "from abstractcore import create_llm; print('Core library: OK')" >> debug_report.txt 2>&1
curl http://localhost:8000/health >> debug_report.txt 2>&1
cat debug_report.txt
```
---
## Common Error Messages
| Error | Meaning | Solution |
|-------|---------|----------|
| `ModuleNotFoundError` | Package not installed | `pip install abstractcore` (then add provider extras as needed) |
| `Authentication Error` | Invalid API key | Check API key environment variable |
| `Connection refused` | Service not running | Start Ollama/LMStudio/server |
| `LM Studio connection failed` | LM Studio server not enabled | Enable "Status: Running" toggle in LM Studio GUI |
| `400 Bad Request` (LM Studio) | Context length too small | Increase Default Context Length to "Model Maximum" in LM Studio |
| `Model not found` | Model unavailable | Pull model or check name |
| `Rate limit exceeded` | Too many requests | Wait or upgrade plan |
| `Timeout` | Request took too long | Use smaller model or increase timeout |
| `Out of memory` | Insufficient RAM | Use smaller model |
| `Port already in use` | Another process using port | Kill process or use different port |
---
## Getting Help
If you're still stuck:
1. **Check Documentation:**
- Getting Started - Core library quick start
- Prerequisites - Provider setup
- Python API Reference - Core library API
- Server Guide - Server setup
- Server API Reference - REST API endpoints
2. **Enable Debug Mode:**
```bash
export ABSTRACTCORE_DEBUG=true
```
3. **Collect Information:**
- Error messages
- Debug logs
- System information
- Steps to reproduce
4. **Community Support:**
- GitHub Issues: github.com/lpalbou/AbstractCore/issues
- GitHub Discussions: github.com/lpalbou/AbstractCore/discussions
---
**Remember**: Most issues are configuration-related. Double-check environment variables, API keys, and that services are running before diving deep into debugging.
---
### Inlined: `docs/faq.md`
# FAQ
## What do I get with `pip install abstractcore`?
The default install is intentionally lightweight. It includes the core API (`create_llm`, `BasicSession`, tool definitions, structured output plumbing) and uses only small dependencies (`pydantic`, `httpx`).
Anything heavy (provider SDKs, torch/transformers, PDF parsing, embeddings models, web scraping deps, the HTTP server) is behind install extras. See Getting Started and Prerequisites.
## Which extra do I need for my provider?
- OpenAI: `pip install "abstractcore[openai]"`
- Anthropic: `pip install "abstractcore[anthropic]"`
- HuggingFace (transformers/torch; heavy): `pip install "abstractcore[huggingface]"`
- MLX (Apple Silicon; heavy): `pip install "abstractcore[mlx]"`
- vLLM integration (GPU; heavy): `pip install "abstractcore[vllm]"`
These providers work with the core install (no provider extra): `ollama`, `lmstudio`, `openrouter`, `openai-compatible`.
## How do I combine extras?
```bash
# zsh: keep quotes
pip install "abstractcore[openai,media,tools]"
```
For “turnkey” installs, see `README.md` (`all-apple`, `all-non-mlx`, `all-gpu`).
## Why did my install pull `torch` / take a long time?
You probably installed a heavy extra (most commonly `abstractcore[huggingface]`, `abstractcore[mlx]`, or `abstractcore[all-*]`). The core install (`pip install abstractcore`) does not include torch/transformers.
## What’s the difference between “provider” and “model”?
- **Provider**: a backend adapter (`openai`, `anthropic`, `ollama`, `lmstudio`, …)
- **Model**: a provider-specific model name (for example `gpt-4o-mini` or `qwen3:4b-instruct-2507-q4_K_M`)
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
```
## How does AbstractCore relate to AbstractFramework / AbstractRuntime?
AbstractCore is one of the core packages in the **AbstractFramework** ecosystem:
- AbstractFramework (umbrella): https://github.com/lpalbou/AbstractFramework
- AbstractCore (this package): unified LLM interface + cross-provider infrastructure
- AbstractRuntime: durable tool/effect execution, workflows, and state persistence — https://github.com/lpalbou/abstractruntime
AbstractCore is usable standalone. In the ecosystem, the common pattern is:
- AbstractCore produces `resp.content` + `resp.tool_calls`
- a runtime (for example AbstractRuntime) decides whether/how to execute tools (policy, sandboxing, retries, persistence)
See Architecture and Tool Calling.
## How do I connect to a local server (Ollama / LMStudio / vLLM / llama.cpp / LocalAI)?
Use the matching provider and set `base_url` (or the provider’s base-url env var).
We recommend open-source/local providers first; cloud and gateway providers are optional.
Examples:
```python
from abstractcore import create_llm
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M", base_url="http://localhost:11434")
llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
llm = create_llm("vllm", model="Qwen/Qwen3-Coder-30B-A3B-Instruct", base_url="http://localhost:8000/v1")
```
For a generic OpenAI-compatible endpoint, use `openai-compatible`:
```python
llm = create_llm("openai-compatible", model="my-model", base_url="http://localhost:1234/v1")
```
See Prerequisites for setup details and env var names.
## Why do gateway providers return “unsupported parameter” errors (temperature/max_tokens)?
Gateways like Portkey and OpenRouter forward your payload to the routed backend model, and strict families (for example OpenAI reasoning models like gpt-5/o1) reject unsupported parameters.
In AbstractCore’s gateway providers:
- Portkey uses `PORTKEY_API_KEY` and `PORTKEY_CONFIG` (config id) for routing.
- Optional params (`temperature`, `top_p`, `max_output_tokens`) are only sent when you explicitly set them.
- Reasoning families (gpt-5/o1) drop `temperature`/`top_p` and use `max_completion_tokens` instead of `max_tokens`.
If you still see errors, confirm:
- You aren’t mixing routing modes (config vs virtual key vs provider-direct).
- You’re not injecting parameters via Portkey config overrides that the backend rejects.
## How do I set API keys and defaults?
You can use environment variables, or persist settings via the config CLI:
```bash
abstractcore --config
abstractcore --set-api-key openai sk-...
abstractcore --set-api-key anthropic sk-ant-...
abstractcore --status
```
Config is stored in `~/.abstractcore/config/abstractcore.json`. See Centralized Config.
## Why aren’t tools executed automatically?
By default, AbstractCore runs in **pass-through** mode (`execute_tools=False`): it returns tool calls in `resp.tool_calls`, and your host/runtime decides whether/how to execute them.
Automatic execution (`execute_tools=True`) exists but is deprecated for most use cases. See Tool Calling.
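A minimal sketch of the pass-through pattern (the `get_time` tool is illustrative):
```python
from abstractcore import create_llm, tool

@tool
def get_time(timezone: str = "UTC") -> str:
    """Return the current time in the given timezone (stub for illustration)."""
    return f"12:00 in {timezone}"

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What time is it in Paris?", tools=[get_time])

# Pass-through: AbstractCore does not run the tool; your host/runtime decides.
for call in resp.tool_calls or []:
    print(call)  # inspect the requested call, execute it, then send the result back
```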
## What’s the difference between `web_search`, `skim_websearch`, `skim_url`, and `fetch_url`?
These built-in web tools live in `abstractcore.tools.common_tools` and require:
```bash
pip install "abstractcore[tools]"
```
- `web_search`: fuller DuckDuckGo result set (good when you want breadth or more options).
- `skim_websearch`: compact/filtered search results (good default for agents to keep prompts smaller). Defaults to 5 results and truncates long snippets.
- `skim_url`: fast URL triage (fetches only a prefix and extracts lightweight metadata + a short preview). Defaults: `max_bytes=200_000`, `max_preview_chars=1200`, `max_headings=8`.
- `fetch_url`: full fetch + parsing for text-first types (HTML→Markdown, JSON/XML/text). For PDFs/images/other binaries it returns metadata and optional previews; it does **not** do full PDF text extraction. It downloads up to 10MB by default; use `include_full_content=False` for smaller outputs.
Recommended workflow: `skim_websearch` → `skim_url` → `fetch_url` (use `include_full_content=False` when you want a smaller `fetch_url` output).
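A rough sketch of that workflow, calling the tools directly as plain functions (exact signatures may differ slightly; check the tool docstrings):
```python
# Requires: pip install "abstractcore[tools]"
from abstractcore.tools.common_tools import skim_websearch, skim_url, fetch_url

results = skim_websearch("AbstractCore python library")   # compact, agent-friendly results
print(results)

url = "https://example.com/some-article"                  # pick a URL from the results
print(skim_url(url))                                      # fast triage: metadata + short preview
print(fetch_url(url, include_full_content=False))         # fuller fetch, smaller output
```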
## How do I preserve tool-call markup in `response.content` for agentic CLIs?
Use tool-call syntax rewriting:
- Python: pass `tool_call_tags=...` to `generate()` / `agenerate()`
- Server: set `agent_format` in requests
See Tool Syntax Rewriting.
## How do I get structured output (typed objects) instead of parsing JSON?
Pass a Pydantic model via `response_model=...`:
```python
from pydantic import BaseModel
from abstractcore import create_llm
class Answer(BaseModel):
title: str
bullets: list[str]
llm = create_llm("openai", model="gpt-4o-mini")
result = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
```
See Structured Output.
## Why does structured output retry or fail validation?
Structured output is validated against your schema. If validation fails, AbstractCore retries with feedback (up to the configured retry limit). Common fixes:
- simplify schemas (fewer nested structures; fewer strict constraints)
- tighten prompts (be explicit about allowed values and ranges)
- increase timeouts for slow backends
See Structured Output and Troubleshooting.
## Why do PDFs / Office docs / images not work?
Those require the media extra:
```bash
pip install "abstractcore[media]"
```
Then pass `media=[...]` to `generate()` or use the media pipeline. See Media Handling.
## How do I attach audio or video?
Audio and video attachments are supported via `media=[...]`, but they are **policy-driven** by design:
- **Audio** defaults to `audio_policy="native_only"` (fails loudly unless the model supports native audio input).
- **Video** defaults to `video_policy="auto"` (native video when supported; otherwise sample frames and route through image/vision handling). Frame sampling requires `ffmpeg`/`ffprobe`.
Speech-to-text fallback for audio (`audio_policy="speech_to_text"` or `"auto"`) typically requires installing `abstractvoice` (capability plugin).
You can set defaults via the config CLI:
```bash
abstractcore --set-audio-strategy auto
abstractcore --set-video-strategy auto
abstractcore --set-video-max-frames 6
```
See:
- Media Handling (policies + fallbacks)
- Vision Capabilities (image/video input + fallback behavior)
## How do I do speech-to-text (STT) or text-to-speech (TTS)?
Install the optional capability plugin package:
```bash
pip install abstractvoice
```
Then use the deterministic capability surfaces:
```python
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini") # provider/model is only for LLM calls; STT/TTS are deterministic
print(llm.capabilities.status()) # shows which capability backends are available/selected
wav_bytes = llm.voice.tts("Hello", format="wav")
text = llm.audio.transcribe("speech.wav")
```
If you run the optional HTTP server, you can also use OpenAI-compatible endpoints:
- `POST /v1/audio/transcriptions`
- `POST /v1/audio/speech`
See: Server and Capabilities.
## How do I generate or edit images?
Generative vision is intentionally not part of AbstractCore’s default install. Use `abstractvision`:
```bash
pip install abstractvision
```
You can use it through AbstractCore’s `llm.vision.*` capability plugin surface (typically configured via an OpenAI-compatible images endpoint), or through AbstractCore Server’s optional endpoints:
- `POST /v1/images/generations`
- `POST /v1/images/edits`
See: Server, Capabilities, and `abstractvision/docs/reference/abstractcore-integration.md` (in the AbstractVision repo).
## What are “glyphs” and what do they require?
Glyph visual-text compression is an optional feature for long documents. Install:
- `pip install "abstractcore[compression]"` (renderer)
- plus `pip install "abstractcore[media]"` if you want PDF extraction support
See Glyph Visual-Text Compression.
## How do I use embeddings?
Embeddings are opt-in:
```bash
pip install "abstractcore[embeddings]"
```
Then import from the embeddings module:
```python
from abstractcore.embeddings import EmbeddingManager
```
See Embeddings.
## Do I need the HTTP server?
No. The server is optional and is mainly for:
- exposing one OpenAI-compatible `/v1` endpoint that can route to multiple providers/models
- integrating with OpenAI-compatible clients and agentic CLIs
Install and run:
```bash
pip install "abstractcore[server]"
python -m abstractcore.server.app
```
See Server.
## Where are logs and traces?
- Logging (console/file) is configured via the config CLI and config file. See Structured Logging.
- Interaction tracing is opt-in (`enable_tracing=True`). See Interaction Tracing.
## I’m getting HTTP timeouts. What should I change?
- Per-provider: pass `timeout=...` to `create_llm(...)` (`timeout=None` means unlimited).
- Process-wide default: run `abstractcore --set-default-timeout 0` (0 = unlimited), or pass a larger value.
- Some CLI apps have their own `--timeout` flags; run `--help` for the exact behavior.
See Troubleshooting and Centralized Config.
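For example, a per-provider override (minimal sketch):
```python
from abstractcore import create_llm

# timeout=None disables the client-side timeout for this provider instance.
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M", timeout=None)
```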
## HuggingFace won’t download models — why?
The HuggingFace provider respects AbstractCore’s offline-first settings. If you want HuggingFace to fetch from the Hub, update `~/.abstractcore/config/abstractcore.json`:
- set `"offline_first": false`
- set `"force_local_files_only": false`
Restart your Python process after changing this (the provider reads these settings at import time).
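If you prefer to script the change, a minimal sketch (assuming the two keys sit at the top level of the config file, as described above):
```python
import json
from pathlib import Path

cfg_path = Path.home() / ".abstractcore" / "config" / "abstractcore.json"
cfg = json.loads(cfg_path.read_text())
cfg["offline_first"] = False
cfg["force_local_files_only"] = False
cfg_path.write_text(json.dumps(cfg, indent=2))
```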
## Is AbstractCore a full agent/RAG framework?
AbstractCore focuses on provider abstraction + infrastructure (tools, structured output, media handling, tracing). It does not ship a full RAG pipeline or multi-step agent orchestration. See Capabilities.
---
### Inlined: `docs/architecture.md`
# AbstractCore Architecture
AbstractCore provides a unified interface to major LLM providers with production-oriented reliability features. This document explains how it works internally and why it's designed this way.
If you're new to AbstractCore and want to start building quickly, read:
- `docs/getting-started.md`
- `docs/api.md`
Related docs (user-facing):
- Media inputs (images/audio/video + documents): `docs/media-handling-system.md`
- Vision input + fallback: `docs/vision-capabilities.md`
- Capability plugins (voice/audio/vision): `docs/capabilities.md`
- OpenAI-compatible gateway server: `docs/server.md`
- Single-model OpenAI-compatible endpoint: `docs/endpoint.md`
- Tool calling semantics (passthrough vs execution): `docs/tool-calling.md`
## System Overview
AbstractCore operates as a Python library and can also be exposed via **optional OpenAI-compatible HTTP servers**:
- **Gateway server (multi-provider)**: `abstractcore.server.app` (docs: `docs/server.md`)
- **Endpoint server (single-model)**: `abstractcore.endpoint.app` (docs: `docs/endpoint.md`)
```mermaid
graph TD
A[Your Application] --> B[AbstractCore API]
AA[HTTP Clients] --> BB[AbstractCore Server]
BB --> B
B --> C[Provider Interface]
C --> D[Event System]
C --> E[Tool System]
C --> F[Retry System]
C --> G[Provider Implementations]
G --> H[OpenAI Provider]
G --> HH[OpenAI-Compatible Provider]
G --> I[Anthropic Provider]
G --> J[Ollama Provider]
G --> K[MLX Provider]
G --> L[LMStudio Provider]
G --> M[HuggingFace Provider]
G --> MM[vLLM Provider]
G --> MN[OpenRouter Provider]
G --> MP[Portkey Provider]
H --> N[OpenAI API]
HH --> NN[OpenAI-Compatible /v1 Endpoint]
I --> O[Anthropic API]
J --> P[Ollama Server]
K --> Q[MLX Models]
L --> R[LMStudio Server]
M --> S[HuggingFace Models]
MM --> RR[vLLM Server]
MN --> RO[OpenRouter API]
MP --> RP[Portkey API Gateway]
style B fill:#e1f5fe
style BB fill:#4caf50
style C fill:#f3e5f5
style G fill:#fff3e0
```
## Design Principles
### 1. Provider Abstraction
**Goal**: Same interface for all providers
**Implementation**: Common interface with provider-specific implementations
### 2. Production Reliability
**Goal**: Handle real-world failures gracefully
**Implementation**: Built-in retry logic, circuit breakers, comprehensive error handling
### 3. Universal Tool Support
**Goal**: Tools work everywhere, even with providers that don't support them natively
**Implementation**: Native support where available, intelligent prompting as fallback
### 4. Simplicity Over Features
**Goal**: Clean, focused API that's easy to understand
**Implementation**: Minimal core with clear extension points
### 5. Optional HTTP Access
**Goal**: Flexible deployment as library or server
**Implementation**: OpenAI-compatible REST API built on core library
## Core Components
### 1. Factory Pattern (`create_llm`)
The main entry point uses the factory pattern for clean provider instantiation:
```mermaid
graph LR
A[create_llm] --> B{Provider Type}
B --> C[OpenAI Provider]
B --> D[Anthropic Provider]
B --> E[Ollama Provider]
B --> F[Other Providers...]
C --> G[Configured Instance]
D --> G
E --> G
F --> G
style A fill:#4caf50
style G fill:#2196f3
```
```python
from abstractcore import create_llm
# Factory creates the right provider with proper configuration
llm = create_llm("openai", model="gpt-4o-mini", temperature=0.7)
# OpenAI-compatible /v1 endpoints (LMStudio, vLLM, custom proxies)
llm_local = create_llm("lmstudio", model="qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini") # requires OPENROUTER_API_KEY
llm_portkey = create_llm("portkey", model="gpt-4o-mini", config_id="pcfg_...") # requires PORTKEY_API_KEY + PORTKEY_CONFIG
```
Gateway providers (OpenRouter/Portkey) route to external backends; AbstractCore forwards only **explicit** generation parameters to avoid sending defaults that strict backends reject.
### 2. Provider Interface
All providers implement `AbstractCoreInterface` (see `abstractcore/core/interface.py`):
```python
class AbstractCoreInterface(ABC):
@abstractmethod
def generate(
self,
prompt: str,
messages: Optional[List[Dict[str, str]]] = None,
system_prompt: Optional[str] = None,
tools: Optional[List[Dict[str, Any]]] = None,
media: Optional[List[Union[str, Dict[str, Any], "MediaContent"]]] = None,
stream: bool = False,
thinking: Optional[Union[bool, str]] = None,
**kwargs,
) -> Union[GenerateResponse, Iterator[GenerateResponse]]:
"""Generate a response (or a stream of chunks)."""
@abstractmethod
def get_capabilities(self) -> List[str]:
"""Get provider capabilities"""
@abstractmethod
def unload_model(self, model_name: str) -> None:
"""Unload/cleanup resources for a specific model (best-effort)."""
```
This ensures:
- **Consistency**: Same methods across all providers
- **Reliability**: Standardized error handling
- **Extensibility**: Easy to add new providers
- **Memory Management**: Explicit control over model lifecycle
#### Response Normalization (Model Output Cleanup)
`BaseProvider` also applies **asset-driven response normalization** so downstream code sees clean, consistent output across providers:
- **Output wrappers**: Strip configured leading/trailing wrapper tokens (e.g., GLM `<|begin_of_box|>…<|end_of_box|>`)
- **Harmony transcripts (GPT-OSS)**: Extract `<|channel|>final` into `GenerateResponse.content` and capture `<|channel|>analysis` as `GenerateResponse.metadata["reasoning"]` (non-streaming)
- **Thinking tags**: Extract inline thinking blocks (for example `<think>...</think>`) into `GenerateResponse.metadata["reasoning"]` (when configured)
**Why this belongs in `BaseProvider` (even for streaming):**
- These artifacts are **model/template-specific**, not provider-specific (the same model can be served via Ollama, vLLM, LMStudio, HF, or MLX)
- In streaming mode, wrappers often appear in the first/last chunks; stripping them incrementally avoids leaking markup into UIs and tool parsers without buffering the full response
Configuration comes from `abstractcore/assets/architecture_formats.json` and `abstractcore/assets/model_capabilities.json`; implementation lives in `abstractcore/architectures/response_postprocessing.py`.
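A minimal sketch of what this looks like from the caller's side (substitute a reasoning-capable model you actually have):
```python
from abstractcore import create_llm

llm = create_llm("ollama", model="qwen3:4b")         # any reasoning-capable model
resp = llm.generate("What is 17 * 24? Think it through.", thinking=True)
print(resp.content)                                  # clean final answer (wrappers stripped)
print((resp.metadata or {}).get("reasoning"))        # extracted analysis/thinking, if configured
```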
#### Memory Management
The `unload_model(model_name)` method is a **best-effort resource cleanup hook**.
- **API providers** (OpenAI, Anthropic): typically a no-op (safe to call).
- **Local / self-hosted providers**: behavior is provider-specific:
- some can actively release memory (or request server-side eviction),
- others can only close client connections and rely on server-side TTL/auto-eviction.
- Example: **LMStudio** does not expose an explicit “unload model” API; `unload_model()` closes HTTP clients and relies on LMStudio TTL/auto-evict.
In the OpenAI-compatible AbstractCore server (`abstractcore.server.app`), requests can set `unload_after` (default `false`)
to call `llm.unload_model(model)` after the request completes. For providers that can unload shared server state (e.g. Ollama),
this is disabled by default and must be explicitly enabled by the server operator.
```python
# Load model, use it, then free memory
llm = create_llm("ollama", model="large-model")
response = llm.generate("Hello")
llm.unload_model(llm.model) # Explicitly free memory
del llm
```
This is critical for:
- Test suites that load multiple models sequentially
- Memory-constrained environments (<32GB RAM)
- Production systems serving different models sequentially
### 3. Media Handling System
AbstractCore includes a policy-driven media handling system that enables file attachments across all providers:
```mermaid
graph TD
A[User Input: @file.pdf] --> B[MessagePreprocessor]
B --> C[Extract Files + Clean Text]
C --> D[AutoMediaHandler]
D --> E{File Type Detection}
E -->|Images| F[ImageProcessor]
E -->|PDFs| G[PDFProcessor]
E -->|Office| H[OfficeProcessor]
E -->|Text/CSV| I[TextProcessor]
F --> J[MediaContent Objects]
G --> J
H --> J
I --> J
J --> K{Provider Type}
K -->|OpenAI| L[OpenAI Format]
K -->|Anthropic| M[Anthropic Format]
K -->|Local| N[Text Embedding]
L --> O[Provider API Call]
M --> O
N --> O
style D fill:#4caf50
style J fill:#2196f3
style O fill:#ff9800
```
#### Media System Architecture
**Core Components:**
- **MessagePreprocessor**: Parses `@filename` syntax in CLI and extracts file references
- **AutoMediaHandler**: Intelligent coordinator that selects appropriate processors
- **Specialized Processors**:
- `ImageProcessor` (PIL-based for images)
- `PDFProcessor` (PyMuPDF4LLM for documents)
- `OfficeProcessor` (Unstructured for DOCX/XLSX/PPTX)
- `TextProcessor` (pandas for CSV/TSV data analysis)
- **Provider Handlers**: Format media content for each provider's API requirements
**Provider-Specific Formatting:**
```python
# Same MediaContent gets formatted differently:
# OpenAI (JSON with image_url):
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
]
}
# Anthropic (Messages API with source):
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this"},
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
]
}
# Local (Text embedding):
"Analyze this\n\nImage description: A chart showing quarterly trends..."
```
**Graceful Fallback Strategy:**
1. **Advanced Processing**: PyMuPDF4LLM, Unstructured libraries
2. **Basic Processing**: Simple text extraction
3. **Metadata Fallback**: File information and properties
4. **Degrades gracefully for documents**: PDFs/Office/text aim to return best-effort extracted text/metadata rather than crashing.
5. **Policy-driven for true multimodal inputs**: for image/audio/video message parts, behavior is policy-driven; unsupported requests fail loudly unless an explicit enrichment fallback is configured (see `docs/media-handling-system.md` and `docs/vision-capabilities.md`).
#### Unified Media API
The same `media=[]` parameter works across all providers:
```python
# Universal API - works with any provider
llm = create_llm("openai", model="gpt-4o") # or "anthropic", "ollama", etc.
response = llm.generate(
"Analyze these files",
media=["report.pdf", "chart.png", "data.xlsx"]
)
```
**CLI Integration:**
```bash
# Simple @filename syntax works everywhere
python -m abstractcore.utils.cli --prompt "What's in @document.pdf and @image.jpg"
```
#### Capability plugins (voice/audio/vision)
To keep the default `abstractcore` install dependency-light while still enabling deterministic modality APIs, AbstractCore supports optional **capability plugins**:
- `abstractvoice` provides `core.voice` + `core.audio` (TTS/STT).
- `abstractvision` provides `core.vision` (T2I/I2I/T2V/I2V; backend-pluggable).
Discovery:
- `llm.capabilities.status()` returns a JSON-safe snapshot (which backends are available/selected, plus install hints).
- Convenience facades exist as properties: `llm.voice`, `llm.audio`, `llm.vision` (lazy; missing plugins raise actionable errors).
### 4. Request Lifecycle
```mermaid
sequenceDiagram
participant App as Your App
participant Core as AbstractCore
participant Events as Event System
participant Retry as Retry Logic
participant Provider as LLM Provider
participant Tools as Tool System
App->>Core: generate("prompt", tools=tools)
Core->>Events: emit(GENERATION_STARTED)
Core->>Retry: wrap_with_retry()
alt Provider Call Success
Retry->>Provider: API call
Provider->>Retry: response
Retry->>Core: successful response
else Provider Call Fails
Retry->>Provider: API call (attempt 1)
Provider->>Retry: rate limit error
Retry->>Retry: wait with backoff
Retry->>Provider: API call (attempt 2)
Provider->>Retry: success
Retry->>Core: successful response
end
alt Has Tool Calls
Core->>Events: emit(TOOL_STARTED)
Core->>Tools: execute_tools()
Tools->>Core: tool results
Core->>Events: emit(TOOL_COMPLETED)
end
Core->>Events: emit(GENERATION_COMPLETED)
Core->>App: GenerateResponse
```
Note: in the Python API, `execute_tools` defaults to `False` (**pass-through**). Tool calls are returned in `GenerateResponse.tool_calls` for your host/runtime to execute. `execute_tools=True` exists for simple demos but is deprecated for most production use cases. The optional HTTP gateway server runs in pass-through mode.
### 5. Tool System Architecture
The tool system provides universal tool-call detection (and optional local execution) across all providers:
```mermaid
graph TD
A[LLM Response] --> B{Has Tool Calls?}
B -->|No| C[Return Response]
B -->|Yes| D[Parse Tool Calls]
D --> E[Event: TOOL_STARTED]
E --> F{Event Prevented?}
F -->|Yes| G[Skip Tool Execution]
F -->|No| H[Execute Tools]
H --> I[Collect Results]
I --> J[Event: TOOL_COMPLETED]
J --> K[Append Results to Response]
K --> C
style D fill:#ffeb3b
style H fill:#4caf50
style E fill:#ff9800
```
#### Tool Execution Flow
1. **Tool Detection**: Parse tool calls from LLM response
2. **Event Emission**: Emit `TOOL_STARTED` (preventable)
3. **Optional local execution (deprecated)**: execute tools inside AbstractCore when `execute_tools=True` (providers never execute arbitrary local tools)
4. **Result Collection**: Gather results and error information
5. **Event Emission**: Emit `TOOL_COMPLETED` with results
6. **Response Integration**: Append tool results to original response
#### Provider-Specific Tool Handling with Tag Rewriting
```mermaid
graph LR
A[Tool Definition] --> B{Provider Type}
B --> C[OpenAI: Native JSON]
B --> D[Anthropic: Native XML]
B --> E[Ollama: Architecture-specific]
B --> F[Others: Prompted Format]
C --> G[LLM Generation]
D --> G
E --> G
F --> G
G --> H[Tool Call Tag Rewriter]
H --> I[Target Format Conversion]
I --> J[Universal Tool Parser]
J --> K[Local Tool Execution]
style A fill:#e1f5fe
style H fill:#ff9800
style I fill:#9c27b0
style K fill:#4caf50
```
#### Tool Call Tag Rewriting System
AbstractCore includes a sophisticated tag rewriting system that enables compatibility with any agentic CLI:
**Rewriting Pipeline**:
```mermaid
graph TD
A[Raw LLM Response] --> B[Pattern Detection]
B --> C{Tag Format Needed?}
C -->|No| D[Default Qwen3 Format]
C -->|Yes| E[Target Format Conversion]
E --> F{Format Type}
F -->|Predefined| G[llama3, xml, gemma, etc.]
F -->|Custom| H[User-defined Tags]
G --> I[Rewritten Tool Call]
H --> I
D --> I
I --> J[Tool Execution]
style B fill:#2196f3
style E fill:#ff9800
style I fill:#4caf50
```
**Supported Formats**:
- **Default (Qwen3)**: `<|tool_call|>...JSON...|tool_call|>` - Compatible with Codex CLI
- **LLaMA3**: `...JSON...` - Compatible with Crush CLI
- **XML**: `...JSON...` - Compatible with Gemini CLI
- **Gemma**: ````tool_code...JSON...```` - Compatible with Gemma models
- **Custom**: Any user-defined format (e.g., `[TOOL]...JSON...[/TOOL]`)
**Real-Time Integration**:
- **Streaming Compatible**: Works seamlessly with unified streaming architecture
- **Zero Latency**: No additional processing delays
- **Universal Detection**: Automatically detects source format from any model
- **Graceful Fallback**: Returns original content if rewriting fails
### 6. Retry and Reliability System
Production-grade error handling with multiple layers:
```mermaid
graph TD
A[LLM Request] --> B[Retry Manager]
B --> C{Error Type}
C -->|Rate Limit| D[Exponential Backoff]
C -->|Network Error| D
C -->|Timeout| D
C -->|Auth Error| E[Fail Fast]
C -->|Invalid Request| E
D --> F{Max Attempts?}
F -->|No| G[Wait + Jitter]
G --> H[Retry Request]
H --> B
F -->|Yes| I[Circuit Breaker]
I --> J{Failure Threshold?}
J -->|No| K[Return Error]
J -->|Yes| L[Open Circuit]
L --> M[Fail Fast for Duration]
style D fill:#ff9800
style I fill:#f44336
style L fill:#d32f2f
```
#### Retry Configuration
```python
from abstractcore import create_llm
from abstractcore.core.retry import RetryConfig
config = RetryConfig(
max_attempts=3, # Try up to 3 times
initial_delay=1.0, # Start with 1 second delay
max_delay=60.0, # Cap at 1 minute
use_jitter=True, # Add randomness
failure_threshold=5, # Circuit breaker after 5 failures
recovery_timeout=60.0 # Test recovery after 1 minute
)
llm = create_llm("openai", model="gpt-4o-mini", retry_config=config)
```
### 7. Event System
Observability hooks through events:
```mermaid
graph TD
A[LLM Operation] --> B[Event Emission]
B --> C[Global Event Bus]
C --> D[Event Listeners]
D --> E[Monitoring]
D --> F[Logging]
D --> G[Cost Tracking]
D --> H[Tool Control]
D --> I[Custom Logic]
E --> J[Metrics Dashboard]
F --> K[Log Files]
G --> L[Cost Alerts]
H --> M[Security Gates]
I --> N[Business Logic]
style B fill:#9c27b0
style C fill:#673ab7
style H fill:#f44336
```
#### Event Types and Use Cases
```python
from abstractcore.events import EventType, on_global
# Cost monitoring (best-effort estimate; based on token usage)
def monitor_costs(event):
if event.type != EventType.GENERATION_COMPLETED:
return
cost = event.data.get("cost_usd")
if isinstance(cost, (int, float)) and cost > 0.10:
alert(f"High estimated cost: ${cost:.2f}")
# Tool monitoring
def log_tools(event):
if event.type == EventType.TOOL_COMPLETED:
log(f"Tool completed: {event.data.get('tool_name')}")
# Performance tracking
def track_performance(event):
if event.type != EventType.GENERATION_COMPLETED:
return
duration_ms = event.data.get("duration_ms")
if isinstance(duration_ms, (int, float)) and duration_ms > 10_000:
log(f"Slow request: {float(duration_ms):.0f}ms")
on_global(EventType.GENERATION_COMPLETED, monitor_costs)
on_global(EventType.TOOL_COMPLETED, log_tools)
on_global(EventType.GENERATION_COMPLETED, track_performance)
```
### 8. Structured Output System with Streaming Integration
Type-safe responses with automatic validation, retry, and unified streaming:
```mermaid
graph TD
A[LLM Generate] --> B{Streaming Mode?}
B -->|Yes| C[Unified Streaming Processor]
B -->|No| D[Standard JSON Parsing]
C --> E[Incremental Tool Detector]
E --> F[Real-time Chunk Processing]
F --> G[Tool Call Detection]
G --> H[Mid-Stream Tool Execution]
D --> I[Parse JSON]
I --> J{Valid JSON?}
J -->|No| K[Retry with Error Feedback]
J -->|Yes| L[Pydantic Validation]
L --> M{Valid Model?}
M -->|No| K
M -->|Yes| N[Return Typed Object]
K --> O{Max Retries?}
O -->|No| A
O -->|Yes| P[Raise ValidationError]
style C fill:#4caf50
style E fill:#2196f3
style F fill:#ff9800
style G fill:#9c27b0
style K fill:#f44336
```
#### Unified Streaming Architecture
AbstractCore’s streaming system provides character-by-character streaming with incremental tool detection and optional tool-call syntax rewriting.
**Architecture Components**:
```mermaid
graph TD
A[Stream Input] --> B[UnifiedStreamProcessor]
B --> C[IncrementalToolDetector]
C --> D[Tag Rewriter]
D --> E[Tool Execution (optional)]
E --> F[Stream Output]
B --> G[Character-by-Character Handling]
G --> H[Intelligent Buffering]
H --> C
style B fill:#4caf50
style C fill:#2196f3
style D fill:#ff9800
style E fill:#9c27b0
```
**Key Features**:
1. **Unified Streaming Strategy**
- Single consistent approach across all providers
- Best-effort time-to-first-token (TTFT) telemetry for debugging
- Minimal buffering (incremental parsing)
2. **Incremental Tool Detection**
- Real-time tool call detection during streaming
- Emits `chunk.tool_calls` as soon as a full tool call is detected
- Handles partial tool calls across chunk boundaries
3. **Character-by-Character Streaming**
- Handles micro-chunking from providers (very small deltas)
- Intelligent buffering for partial tool calls
- Robust parsing with auto-repair for malformed JSON
4. **Tool Call Tag Rewriting Integration**
- Real-time format conversion during streaming
- Support for multiple formats (Qwen3, LLaMA3, Gemma, XML, custom)
- Designed to avoid large buffering while keeping tool calls structured
**Streaming with Tag Rewriting Example**:
```python
from abstractcore import create_llm, tool
@tool
def analyze_code(code: str) -> str:
"""Return a small, deterministic analysis."""
return f"chars={len(code)}"
llm = create_llm("ollama", model="qwen3:4b-instruct") # requires Ollama running (default: http://localhost:11434)
for chunk in llm.generate(
"Write a Python function, then call analyze_code on it.",
stream=True,
tools=[analyze_code],
tool_call_tags="llama3", # Emit ... style tags
):
print(chunk.content or "", end="", flush=True)
if chunk.tool_calls:
print(f"\nTool calls: {chunk.tool_calls}")
# Output format: {"name": "analyze_code"}...
```
Implementation pointers (source of truth):
- Unified streaming + tool detection: `abstractcore/providers/streaming.py`
- Streaming wrapper + TTFT metadata: `abstractcore/providers/base.py`
#### Automatic Error Feedback
When validation fails, AbstractCore provides detailed feedback to the LLM:
```python
# If LLM returns invalid data, AbstractCore automatically retries with:
"""
IMPORTANT: Your previous response had validation errors:
• Field 'age': Age must be positive (got -25)
• Field 'email': Invalid email format
Please correct these errors and provide valid JSON.
"""
```
### 9. Session Management
Simple conversation memory without complexity:
```mermaid
graph LR
A[BasicSession] --> B[Message History]
A --> C[System Prompt]
A --> D[Provider Reference]
B --> E[generate()]
C --> E
D --> E
E --> F[Add to History]
F --> G[Return Response]
A --> H[save()/load()]
H --> I[JSON Persistence]
style A fill:#2196f3
style B fill:#4caf50
```
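A minimal sketch of session usage (constructor and `save`/`load` signatures are assumed here; see the Sessions docs for the exact API):
```python
from abstractcore import create_llm, BasicSession

llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(llm, system_prompt="You are a concise assistant.")

print(session.generate("My name is Ada.").content)
print(session.generate("What is my name?").content)   # history is carried automatically

session.save("conversation.json")                      # JSON persistence (per the diagram above)
```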
### 10. Server Architecture (Optional Component)
The AbstractCore server provides OpenAI-compatible HTTP endpoints built on top of the core library:
```mermaid
graph TD
A[HTTP Client] --> B[FastAPI Server]
B --> C{Endpoint Router}
C --> D[/v1/chat/completions]
C --> E[/v1/embeddings]
C --> F[/v1/models]
C --> G[/providers]
C --> Img[/v1/images/* (optional)]
C --> Aud[/v1/audio/* (optional)]
C --> Cache[/acore/prompt_cache/*]
D --> H[Request Validation]
E --> H
F --> I[Provider Discovery]
G --> I
H --> J[AbstractCore Library]
I --> J
J --> K[Provider Interface]
K --> L[LLM Providers]
style B fill:#4caf50
style J fill:#e1f5fe
style K fill:#f3e5f5
```
**Architecture Layers**:
1. **HTTP Layer**: FastAPI-based REST API with request validation
2. **Translation Layer**: Converts HTTP requests to AbstractCore library calls
3. **Core Layer**: Uses the full AbstractCore provider system
4. **Response Layer**: Transforms responses to OpenAI-compatible format
**Key Capabilities**:
- **OpenAI Compatibility**: Drop-in replacement for OpenAI API clients (see the sketch after this list)
- **Universal Provider Access**: Single API for all providers (OpenAI, Anthropic, Ollama, etc.)
- **Format Conversion**: Automatic tool call format conversion for agentic CLIs
- **Streaming Support**: Server-sent events for real-time responses
- **Model Discovery**: Dynamic model listing across all providers
- **Embedding Support**: Multi-provider embedding generation (HuggingFace, Ollama, LMStudio)
- **Optional Vision Endpoints**: OpenAI-compatible `/v1/images/generations` and `/v1/images/edits` (plus `/v1/vision/*` control plane) delegated to `abstractvision` (safe-by-default; requires explicit config).
- **Optional Audio Endpoints**: OpenAI-compatible `/v1/audio/transcriptions` and `/v1/audio/speech` delegated to capability plugins (typically `abstractvoice`).
- **Prompt Cache Control Plane**: `/acore/prompt_cache/*` proxy endpoints for cache stats/set/update/fork/clear (best-effort; typically targets an `abstractcore.endpoint` upstream).
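For example, calling the gateway's chat endpoint directly with `httpx` (a sketch; the response follows the standard OpenAI chat-completions shape):
```python
import httpx

payload = {
    "model": "ollama/qwen3:4b-instruct-2507-q4_K_M",   # "provider/model" routing
    "messages": [{"role": "user", "content": "Say hello in French."}],
}
resp = httpx.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```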
**Request Flow Example**:
```mermaid
sequenceDiagram
participant Client
participant Server as FastAPI Server
participant Core as AbstractCore
participant Provider as LLM Provider
Client->>Server: POST /v1/chat/completions
Server->>Server: Validate Request
Server->>Core: create_llm(provider, model)
Server->>Core: llm.generate(messages, tools)
Core->>Provider: API call with retry logic
Provider->>Core: Response
Core->>Core: Execute tools if needed
Core->>Server: GenerateResponse
Server->>Server: Convert to OpenAI format
Server->>Client: HTTP Response (streaming or complete)
```
**Server Features**:
- **Automatic Retry**: Built-in retry logic from core library
- **Event System**: Full observability through events
- **Debug Logging**: Comprehensive request/response logging
- **Health Checks**: `/health` endpoint for monitoring
- **Interactive Docs**: Auto-generated Swagger UI at `/docs`
- **Multi-Worker Support**: Production deployment with multiple workers
## Architecture Benefits
### 1. Provider Agnostic
- **Same code works everywhere**: Switch providers by changing one line
- **No vendor lock-in**: Easy migration between cloud and local providers
- **Consistent semantics**: tools, streaming, and structured output follow the same API surface (provider/model differences still apply)
### 2. Production Ready
- **Automatic reliability**: Built-in retry logic and circuit breakers
- **Comprehensive observability**: Events for every operation
- **Error handling**: Proper error classification and handling
### 3. Extensible
- **Event system**: Hook into any operation
- **Tool system**: Add new tools easily
- **Provider system**: Add new providers with minimal code
### 4. Performance Optimized
- **Lazy loading**: Providers loaded only when needed
- **Connection pooling**: Reuse HTTP connections
- **Efficient parsing**: Optimized JSON and tool parsing
## Extension Points
AbstractCore is designed to be extended:
### Adding a New Provider
```python
from abstractcore.providers.base import BaseProvider
class MyProvider(BaseProvider):
def generate(self, prompt: str, **kwargs) -> GenerateResponse:
# Implement provider-specific logic
return GenerateResponse(content="...")
def get_capabilities(self) -> List[str]:
return ["text_generation", "streaming"]
```
### Adding Tools
```python
from abstractcore import tool
@tool
def my_custom_tool(param: str) -> str:
"""Custom tool that does something useful."""
return f"Processed: {param}"
```
## Performance Characteristics
AbstractCore’s overhead is usually small compared to model inference and network latency. If performance matters, benchmark on your target provider/model/hardware.
Common levers:
- Provider choice and base URL latency
- Concurrency (async + connection pooling; see the sketch below)
- Streaming vs non-streaming
- Structured output (schema size, retry behavior)
- Tool execution strategy (pass-through vs host execution)
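For example, the concurrency lever (assuming `agenerate()` mirrors `generate()`'s signature):
```python
import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("openai", model="gpt-4o-mini")
    prompts = ["Summarize HTTP/2.", "Summarize HTTP/3.", "Summarize QUIC."]
    # Fan out independent prompts concurrently instead of awaiting them one by one.
    responses = await asyncio.gather(*(llm.agenerate(p) for p in prompts))
    for r in responses:
        print(r.content[:80])

asyncio.run(main())
```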
## Security Considerations
### 1. Tool Execution Safety
- **Local execution (optional)**: tool execution is local (never executed by the provider); by default tool calls are returned for your host/runtime to execute
- **Event prevention**: Stop dangerous tools before execution
- **Input validation**: Validate tool parameters
### 2. API Key Management
- **Environment variables**: Secure key storage
- **Avoid logging**: treat logs as sensitive; do not log secrets (AbstractCore tries to avoid printing keys in logs)
- **Provider isolation**: Keys scoped to specific providers
### 3. Data Privacy
- **Local options**: Support for local providers (Ollama, MLX)
- **No persistent storage by default**: conversation state lives in memory (for example `BasicSession`) unless you explicitly save it or enable tracing/logging
- **Transparent processing**: All operations are observable through events
## Testing Strategy
The repo uses a mix of unit tests and integration tests. Some tests are provider-/network-/hardware-dependent and are opt-in.
Quick pointers:
- Run: `pytest -q`
- Vision tests: `tests/README_VISION_TESTING.md`
- Seed tests: `tests/README_SEED_TESTING.md`
- Streaming/tool parsing tests: `tests/streaming/` and `tests/test_agentic_cli_compatibility.py`
- Server/endpoint tests: `tests/server/` and `tests/test_abstractendpoint_singleton_provider.py`
## Integration with AbstractFramework
AbstractCore is a core package in the **AbstractFramework** ecosystem:
- AbstractFramework (umbrella): https://github.com/lpalbou/AbstractFramework
- AbstractCore (this repo): https://github.com/lpalbou/AbstractCore
- AbstractRuntime: https://github.com/lpalbou/abstractruntime
In this ecosystem, AbstractCore focuses on **LLM I/O + provider abstraction**, while AbstractRuntime focuses on **durable execution** (effects/tools/workflows/state). AbstractCore remains usable standalone; when you need durability/policy/sandboxing around tools, plug it into a runtime (for example AbstractRuntime).
```mermaid
graph TD
subgraph "UI Layer (peers)"
A[AbstractCode Terminal CLI]
B[AbstractFlow Visual Editor React + ReactFlow]
end
A -.->|optional| F[AbstractFlow Engine]
B --> F
F --> C[AbstractAgent]
A --> C
C --> D[AbstractRuntime]
D --> E[AbstractCore]
E --> G[LLM Providers]
style E fill:#e1f5fe
style A fill:#fff3e0
style B fill:#fff3e0
style F fill:#f3e5f5
style C fill:#f3e5f5
style D fill:#f3e5f5
```
### Framework Layers
- **UI Layer** (peers):
- AbstractCode: Terminal CLI for interactive sessions
- AbstractFlow Visual Editor: Web-based diagram editor (React + ReactFlow + FastAPI)
- **AbstractFlow**: Multi-agent orchestration engine + visual editor
- **AbstractAgent**: Agent patterns (ReactAgent, CodeActAgent) with durable execution
- **AbstractRuntime**: Effect system, workflows, state persistence
AbstractCode can optionally use AbstractFlow for running flows. AbstractFlow includes its own visual editor for designing workflows.
## Summary
AbstractCore's architecture prioritizes:
1. **Reliability** - Production-grade error handling and retry logic
2. **Simplicity** - Clean APIs that are easy to understand and use
3. **Universality** - Same interface and features across all providers
4. **Extensibility** - Clear extension points for advanced features
5. **Observability** - Comprehensive events for monitoring and control
6. **Flexibility** - Deploy as Python library or OpenAI-compatible HTTP server
The result is a foundation that works reliably in production while remaining simple enough to learn quickly and flexible enough to build advanced applications on top of.
---
### Inlined: `docs/examples.md`
# Practical Examples
This guide shows real-world use cases for AbstractCore with complete, copy-paste examples. All examples work across any provider - just change the provider name.
## Table of Contents
- Basic Usage
- Glyph Visual-Text Compression
- Tool Calling Examples
- Tool Call Syntax Rewriting Examples
- Structured Output Examples
- Streaming Examples
- Session Management
- Interaction Tracing (Observability)
- Production Patterns
- Integration Examples
## Basic Usage
### Simple Q&A
```python
from abstractcore import create_llm
# Works with any provider
llm = create_llm("openai", model="gpt-4o-mini") # or "anthropic", "ollama"...
response = llm.generate("What is the difference between Python and JavaScript?")
print(response.content)
```
### Multiple Providers Comparison
```python
from abstractcore import create_llm
providers = [
("openai", "gpt-4o-mini"),
("anthropic", "claude-haiku-4-5"),
("ollama", "qwen3:4b-instruct")
]
question = "Explain Python list comprehensions with examples"
for provider_name, model in providers:
try:
llm = create_llm(provider_name, model=model)
response = llm.generate(question)
print(f"\n--- {provider_name.upper()} ---")
print(response.content[:200] + "...")
except Exception as e:
print(f"{provider_name} failed: {e}")
```
### Provider Fallback
```python
from abstractcore import create_llm
def generate_with_fallback(prompt, **kwargs):
"""Try multiple providers until one works."""
providers = [
("openai", "gpt-4o-mini"),
("anthropic", "claude-haiku-4-5"),
("ollama", "qwen3:4b-instruct")
]
for provider_name, model in providers:
try:
llm = create_llm(provider_name, model=model)
return llm.generate(prompt, **kwargs)
except Exception as e:
print(f"{provider_name} failed: {e}")
continue
raise Exception("All providers failed")
# Usage
response = generate_with_fallback("What is machine learning?")
print(response.content)
```
## Glyph Visual-Text Compression
Glyph compression renders long text into images for vision-capable models to reduce effective token usage (often 3–4x on long text; depends on content/model).
Requires `pip install "abstractcore[compression]"` (and `pip install "abstractcore[media]"` if you want PDF/Office text extraction).
### Automatic Compression with Ollama
```python
from abstractcore import create_llm
# Use a vision-capable model - Glyph works automatically
llm = create_llm("ollama", model="llama3.2-vision:11b")
# Large documents are automatically compressed when beneficial
response = llm.generate(
"What are the key findings and methodology in this research paper?",
media=["research_paper.pdf"] # Automatically compressed if size > threshold
)
print(f"Analysis: {response.content}")
print(f"Processing time: {response.gen_time}ms")
# Check if compression was used
if response.metadata and response.metadata.get('compression_used'):
stats = response.metadata.get('compression_stats', {})
print(f"✅ Glyph compression used!")
print(f"Compression ratio: {stats.get('compression_ratio', 'N/A')}x")
print(f"Original tokens: {stats.get('original_tokens', 'N/A')}")
print(f"Compressed tokens: {stats.get('compressed_tokens', 'N/A')}")
```
### Explicit Compression Control
```python
from abstractcore import create_llm
# Force compression for testing
llm = create_llm("ollama", model="qwen2.5vl:7b")
# Always compress
response = llm.generate(
"Summarize the main conclusions of this document",
media=["long_document.pdf"],
glyph_compression="always" # Force compression
)
# Never compress (for comparison)
response_no_compression = llm.generate(
"Summarize the main conclusions of this document",
media=["long_document.pdf"],
glyph_compression="never" # Disable compression
)
print(f"With compression: {response.gen_time}ms")
print(f"Without compression: {response_no_compression.gen_time}ms")
```
### Custom Configuration
```python
from abstractcore import create_llm
from abstractcore.compression import GlyphConfig
# Configure compression behavior
glyph_config = GlyphConfig(
enabled=True,
global_default="auto", # "auto", "always", "never"
quality_threshold=0.95, # Minimum quality score (0-1)
target_compression_ratio=3.0, # Target compression ratio
provider_optimization=True, # Enable provider-specific optimization
cache_enabled=True, # Enable compression caching
provider_profiles={
"ollama": {
"dpi": 150, # Higher DPI for better quality
"font_size": 9, # Smaller font for more content
"quality_threshold": 0.95
}
}
)
llm = create_llm("ollama", model="granite3.2-vision:latest", glyph_config=glyph_config)
response = llm.generate(
"Analyze the figures and tables in this academic paper",
media=["academic_paper.pdf"]
)
```
### Performance Benchmarking
```python
import time
from abstractcore import create_llm
def benchmark_glyph_compression(document_path, model_name="llama3.2-vision:11b"):
"""Compare processing with and without Glyph compression"""
llm = create_llm("ollama", model=model_name)
# Test without compression
start = time.time()
response_no_glyph = llm.generate(
"Provide a detailed analysis of this document",
media=[document_path],
glyph_compression="never"
)
time_no_glyph = time.time() - start
# Test with compression
start = time.time()
response_glyph = llm.generate(
"Provide a detailed analysis of this document",
media=[document_path],
glyph_compression="always"
)
time_glyph = time.time() - start
# Compare results
print(f"📊 Glyph Compression Benchmark")
print(f"Document: {document_path}")
print(f"Model: {model_name}")
print(f"")
print(f"Without Glyph: {time_no_glyph:.2f}s")
print(f"With Glyph: {time_glyph:.2f}s")
print(f"Speedup: {time_no_glyph/time_glyph:.2f}x")
print(f"")
print(f"Response quality comparison:")
print(f"No Glyph length: {len(response_no_glyph.content)} chars")
print(f"Glyph length: {len(response_glyph.content)} chars")
return response_glyph, response_no_glyph
# Run benchmark
glyph_response, normal_response = benchmark_glyph_compression("large_document.pdf")
```
### Multi-Provider Testing
```python
from abstractcore import create_llm
# Test Glyph across different providers and models
models_to_test = [
("ollama", "llama3.2-vision:11b"),
("ollama", "qwen2.5vl:7b"),
("ollama", "granite3.2-vision:latest"),
# Add LMStudio if running
# ("lmstudio", "your-vision-model"),
]
document = "research_paper.pdf"
question = "What are the key innovations presented in this paper?"
for provider, model in models_to_test:
try:
print(f"\n🧪 Testing {provider} - {model}")
llm = create_llm(provider, model=model)
response = llm.generate(
question,
media=[document],
glyph_compression="auto"
)
print(f"✅ Success - {response.gen_time}ms")
print(f"Response: {response.content[:100]}...")
# Check compression usage
if response.metadata and response.metadata.get('compression_used'):
print(f"🎨 Glyph compression was used")
else:
print(f"📝 Standard processing was used")
except Exception as e:
print(f"❌ Failed: {e}")
```
**Key Benefits Demonstrated:**
- **Automatic optimization**: Glyph decides when compression is beneficial
- **Transparent integration**: Works with existing media handling code
- **Quality preservation**: No loss of analytical accuracy
- **Provider flexibility**: Works across Ollama, LMStudio, and other vision providers
Learn more about Glyph configuration and advanced features
## Tool Calling Examples
### Weather Tool
```python
from abstractcore import create_llm
import requests
def get_weather(city: str, units: str = "metric") -> str:
"""Get current weather for a city."""
# In production, use a real weather API
# This is a simulated implementation
temperatures = {
"paris": "22°C, sunny",
"london": "15°C, cloudy",
"tokyo": "28°C, humid",
"new york": "18°C, windy"
}
return temperatures.get(city.lower(), f"Weather data not available for {city}")
# Tool definition
weather_tool = {
"name": "get_weather",
"description": "Get current weather information for a city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "Name of the city"
},
"units": {
"type": "string",
"enum": ["metric", "imperial"],
"description": "Temperature units"
}
},
"required": ["city"]
}
}
# Works with any provider that supports tools
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate(
"What's the weather like in Paris and London?",
tools=[weather_tool]
)
print(response.content)
print(response.tool_calls) # Structured tool call requests (host/runtime executes them)
```
### Calculator Tool
```python
from abstractcore import create_llm
import math
def calculate(expression: str) -> str:
"""Safely evaluate mathematical expressions."""
try:
# In production, use a proper expression parser
# This is simplified for demo purposes
allowed_chars = set('0123456789+-*/.() ')
if not all(c in allowed_chars for c in expression):
return "Error: Invalid characters in expression"
result = eval(expression)
return f"{expression} = {result}"
except Exception as e:
return f"Error calculating {expression}: {str(e)}"
def sqrt(number: float) -> str:
"""Calculate square root."""
try:
result = math.sqrt(number)
return f"√{number} = {result}"
except Exception as e:
return f"Error: {str(e)}"
# Tool definitions
tools = [
{
"name": "calculate",
"description": "Perform basic mathematical calculations",
"parameters": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Mathematical expression"}
},
"required": ["expression"]
}
},
{
"name": "sqrt",
"description": "Calculate square root of a number",
"parameters": {
"type": "object",
"properties": {
"number": {"type": "number", "description": "Number to calculate square root of"}
},
"required": ["number"]
}
}
]
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate(
"What is 25 * 4 + 12, and what's the square root of 144?",
tools=tools
)
print(response.content)
print(response.tool_calls) # Structured tool call requests (host/runtime executes them)
```
### File Operations Tool
```python
from abstractcore import create_llm
from pathlib import Path
import os
def list_files(directory: str = ".") -> str:
"""List files in a directory."""
try:
path = Path(directory)
if not path.exists():
return f"Directory {directory} does not exist"
files = []
for item in path.iterdir():
if item.is_file():
files.append(f"FILE: {item.name}")
elif item.is_dir():
files.append(f"DIR: {item.name}/")
return f"Contents of {directory}:\n" + "\n".join(sorted(files))
except Exception as e:
return f"Error listing files: {str(e)}"
def read_file(filename: str) -> str:
"""Read contents of a text file."""
try:
path = Path(filename)
if not path.exists():
return f"File {filename} does not exist"
content = path.read_text(encoding='utf-8')
return f"Contents of {filename}:\n{content}"
except Exception as e:
return f"Error reading file: {str(e)}"
# Tool definitions
file_tools = [
{
"name": "list_files",
"description": "List files and directories in a given path",
"parameters": {
"type": "object",
"properties": {
"directory": {"type": "string", "description": "Directory path to list"}
}
}
},
{
"name": "read_file",
"description": "Read the contents of a text file",
"parameters": {
"type": "object",
"properties": {
"filename": {"type": "string", "description": "Path to the file to read"}
},
"required": ["filename"]
}
}
]
llm = create_llm("anthropic", model="claude-haiku-4-5")
response = llm.generate(
"List the files in the current directory and read the README.md file if it exists",
tools=file_tools
)
print(response.content)
print(response.tool_calls) # Structured tool call requests (host/runtime executes them)
```
## Tool Call Syntax Rewriting Examples
> **Real-time tool call format conversion for agentic CLI compatibility**
Tool call syntax rewriting enables AbstractCore to work seamlessly with any agentic CLI by converting tool calls to the expected format in real-time. This happens automatically during generation, including streaming.
> **Related**: Tool Call Syntax Rewriting Guide
### Codex CLI Integration (Qwen3 Tags)
```python
from abstractcore import create_llm
# Define tools (standard JSON format)
weather_tool = {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
# Codex CLI expects qwen3-style tool-call tags in assistant content.
# By default, AbstractCore strips tool-call markup from `response.content`;
# pass `tool_call_tags` to preserve/emit the tags for downstream parsers.
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("What's the weather in Tokyo?", tools=[weather_tool], tool_call_tags="qwen3")
print(response.content)
print(response.tool_calls)
# Content includes: <|tool_call|>{"name": "get_weather", "arguments": {"city": "Tokyo"}}|tool_call|>
```
### Crush CLI Integration
```python
# Crush CLI expects LLaMA3 format - just specify the format
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("Get weather for London", tools=[weather_tool], tool_call_tags="llama3")
print(response.content)
# Output includes: {"name": "get_weather", "arguments": {"city": "London"}}
```
### Custom CLI Format
```python
# Your custom CLI expects: [TOOL]...JSON...[/TOOL]
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("Check weather in Paris", tools=[weather_tool], tool_call_tags="[TOOL],[/TOOL]")
print(response.content)
# Output includes: [TOOL]{"name": "get_weather", "arguments": {"city": "Paris"}}[/TOOL]
```
### Real-Time Streaming with Tag Rewriting
```python
# Streaming works seamlessly with any format
calculator_tool = {
"name": "calculate",
"description": "Perform mathematical calculations",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
llm = create_llm("ollama", model="qwen3-coder:30b")
print("AI: ", end="", flush=True)
for chunk in llm.generate(
"Calculate 15 * 23 and explain the result",
tools=[calculator_tool],
stream=True,
tool_call_tags="llama3",
):
print(chunk.content, end="", flush=True)
# Tool calls are surfaced in real-time (execution is host/runtime-owned)
if chunk.tool_calls:
for tool_call in chunk.tool_calls:
print(f"\n[TOOL CALL] {tool_call}")
print("\n")
# Shows: {"name": "calculate", "arguments": {"expression": "15 * 23"}}
# Tool execution is owned by the host/runtime.
```
### Multiple Tools with Different Formats
```python
# Define multiple tools
tools = [
{
"name": "get_weather",
"description": "Get weather information",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
},
{
"name": "calculate",
"description": "Perform calculations",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
},
{
"name": "list_files",
"description": "List files in a directory",
"parameters": {
"type": "object",
"properties": {"directory": {"type": "string"}},
"required": ["directory"]
}
}
]
# Test with XML format for Gemini CLI
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate(
"What's 2+2, weather in NYC, and files in current directory?",
tools=tools,
tool_call_tags="xml",
)
print(response.content)
print(response.tool_calls)
# All tool calls converted to: {"name": "...", "arguments": {...}}
```
### Session-Based Format Configuration
```python
from abstractcore import BasicSession
# Apply a consistent tool-call tag format across a session by reusing a variable
tool_call_tags = "llama3"
llm = create_llm("ollama", model="qwen3:4b-instruct")
session = BasicSession(provider=llm)
session.generate("Calculate 10 * 5", tools=[calculator_tool], tool_call_tags=tool_call_tags)
session.generate("What's the weather like?", tools=[weather_tool], tool_call_tags=tool_call_tags)
session.generate("List files in documents", tools=[{
"name": "list_files",
"description": "List directory contents",
"parameters": {
"type": "object",
"properties": {"path": {"type": "string"}},
"required": ["path"]
}
}], tool_call_tags=tool_call_tags)
# All responses contain llama3-formatted tool-call JSON
```
### Production Monitoring with Events
```python
from abstractcore.events import EventType, on_global
# Monitor tool usage across different formats
def log_tool_calls(event):
# Tool execution events are emitted when tools are executed (e.g., via ToolRegistry
# or when using `execute_tools=True` (deprecated)).
print(f"[TOOL EVENT] {event.type}: {event.data}")
on_global(EventType.TOOL_COMPLETED, log_tool_calls)
# Test with different formats
for format_name in ["qwen3", "llama3", "xml"]:
llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("Calculate 5 * 5", tools=[calculator_tool], tool_call_tags=format_name)
print(f"{format_name} format result: {response.content[:100]}...")
```
**Key Benefits**:
- Per-call configuration: pass `tool_call_tags=...` when you need tool-call markup preserved/rewritten in `response.content`
- Real-time processing: No post-processing delays
- Streaming compatible: Works with streaming mode
- Format flexibility: Predefined formats plus custom tags
> **Related**: Tool Call Syntax Rewriting Guide | Unified Streaming Architecture
## Structured Output Examples
> **Complete Guide**: Structured Output Documentation - Native vs prompted strategies, provider support, schema design best practices
### User Profile Extraction
```python
from abstractcore import create_llm
from pydantic import BaseModel, field_validator
from typing import Optional
class UserProfile(BaseModel):
name: str
age: int
email: str
occupation: Optional[str] = None
interests: list[str] = []
@field_validator('age')
@classmethod
def validate_age(cls, v):
if v < 0 or v > 150:
raise ValueError('Age must be between 0 and 150')
return v
@field_validator('email')
@classmethod
def validate_email(cls, v):
if '@' not in v:
raise ValueError('Invalid email format')
return v
llm = create_llm("openai", model="gpt-4o-mini")
# Text with user information
user_text = """
Hi, I'm Sarah Johnson, I'm 28 years old and work as a software engineer.
My email is sarah.johnson@techcorp.com. I love hiking, photography, and cooking.
"""
# Extract structured data with automatic validation
user = llm.generate(
f"Extract user profile from: {user_text}",
response_model=UserProfile
)
print(f"Name: {user.name}")
print(f"Age: {user.age}")
print(f"Email: {user.email}")
print(f"Occupation: {user.occupation}")
print(f"Interests: {', '.join(user.interests)}")
```
### Product Catalog Extraction
```python
from abstractcore import create_llm
from pydantic import BaseModel, field_validator
from typing import List
from enum import Enum
class ProductCategory(str, Enum):
ELECTRONICS = "electronics"
CLOTHING = "clothing"
BOOKS = "books"
HOME = "home"
SPORTS = "sports"
class Product(BaseModel):
name: str
price: float
category: ProductCategory
description: str
in_stock: bool = True
@field_validator('price')
@classmethod
def validate_price(cls, v):
if v <= 0:
raise ValueError('Price must be positive')
return v
class ProductCatalog(BaseModel):
products: List[Product]
total_count: int
@field_validator('total_count')
@classmethod
def validate_count(cls, v, info):
products = info.data.get('products', [])
if v != len(products):
raise ValueError(f'Total count {v} does not match products length {len(products)}')
return v
llm = create_llm("anthropic", model="claude-haiku-4-5")
catalog_text = """
Our store has these items:
1. Gaming Laptop - $1299.99 - High-performance laptop for gaming and work
2. Wireless Headphones - $199.99 - Noise-cancelling bluetooth headphones
3. Python Programming Book - $49.99 - Complete guide to Python programming
4. Coffee Maker - $89.99 - Automatic drip coffee maker, currently out of stock
"""
catalog = llm.generate(
f"Extract product catalog from: {catalog_text}",
response_model=ProductCatalog
)
print(f"Total products: {catalog.total_count}")
for product in catalog.products:
status = "In Stock" if product.in_stock else "Out of Stock"
print(f"- {product.name}: ${product.price} ({product.category}) - {status}")
```
### Code Review Analysis
```python
from abstractcore import create_llm
from pydantic import BaseModel
from typing import List
from enum import Enum
class Severity(str, Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
class CodeIssue(BaseModel):
line_number: int
severity: Severity
issue_type: str
description: str
suggestion: str
class CodeReview(BaseModel):
language: str
overall_quality: str
issues: List[CodeIssue]
recommendations: List[str]
llm = create_llm("ollama", model="qwen3:4b-instruct")
code_to_review = '''
def calculate_average(numbers):
total = 0
for num in numbers:
total += num
return total / len(numbers)
def process_data(data):
if data == None:
return []
result = []
for item in data:
result.append(item * 2)
return result
'''
review = llm.generate(
f"Review this Python code for issues:\n{code_to_review}",
response_model=CodeReview
)
print(f"Language: {review.language}")
print(f"Overall Quality: {review.overall_quality}")
print(f"\nIssues Found ({len(review.issues)}):")
for issue in review.issues:
print(f" Line {issue.line_number}: [{issue.severity.upper()}] {issue.issue_type}")
print(f" Problem: {issue.description}")
print(f" Fix: {issue.suggestion}\n")
print("Recommendations:")
for rec in review.recommendations:
print(f" - {rec}")
```
## Streaming Examples
### Basic Streaming (Unified 2025)
```python
# Streaming uses a unified processor across providers (exact chunking depends on the backend)
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
print("AI Story Generator: ", end="", flush=True)
for chunk in llm.generate(
"Write a short story about a programmer who discovers their code is alive",
stream=True
):
print(chunk.content or "", end="", flush=True)
print("\n")
```
### Advanced Streaming with Progress and Performance Tracking
```python
from abstractcore import create_llm
import time
def streaming_with_insights(prompt):
# Supports any provider: OpenAI, Anthropic, Ollama, MLX
llm = create_llm("openai", model="gpt-4o-mini")
print("Generating response...")
start_time = time.time()
chunks = []
print("Response: ", end="", flush=True)
for chunk in llm.generate(prompt, stream=True):
chunks.append(chunk)
print(chunk.content or "", end="", flush=True)
# Optional real-time performance insights
if len(chunks) % 10 == 0:
current_time = time.time() - start_time
chars_generated = sum(len(c.content or "") for c in chunks)
print(f"\n[PROGRESS] {len(chunks)} chunks, {chars_generated} chars, {current_time:.1f}s")
# Final performance summary
total_time = time.time() - start_time
total_chars = sum(len(chunk.content or "") for chunk in chunks)
print(f"\n\n[STATS] Streaming Performance:")
print(f"- Total Chunks: {len(chunks)}")
print(f"- Total Characters: {total_chars}")
print(f"- Duration: {total_time:.2f}s")
print(f"- Speed: {total_chars/total_time:.0f} chars/sec")
# Usage with various prompts
streaming_with_insights("Explain quantum computing in simple terms")
```
### Real-Time Streaming with Tools (Unified Implementation)
```python
from abstractcore import create_llm
from datetime import datetime
def get_current_time() -> str:
"""Get the current time."""
return datetime.now().strftime("%H:%M:%S")
def get_weather(city: str) -> str:
"""Get current weather for a city."""
weather_data = {
"New York": "Sunny, 22°C",
"London": "Cloudy, 15°C",
"Tokyo": "Partly cloudy, 25°C"
}
return weather_data.get(city, f"Weather data unavailable for {city}")
time_tool = {
"name": "get_current_time",
"description": "Get the current time",
"parameters": {"type": "object", "properties": {}}
}
weather_tool = {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "Name of the city"}
}
}
}
# Works similarly across providers (exact chunking depends on the backend)
llm = create_llm("ollama", model="qwen3:4b-instruct")
print("AI Assistant: ", end="", flush=True)
for chunk in llm.generate(
"What time is it right now? And can you tell me the weather in New York?",
tools=[time_tool, weather_tool],
stream=True
):
# Real-time chunk processing and tool call detection
print(chunk.content or "", end="", flush=True)
# Tool calls are surfaced as structured dicts; execute them in your host/runtime.
if chunk.tool_calls:
print(f"\n[TOOL] Tool calls: {chunk.tool_calls}")
print("\n") # Newline after streaming
# Notes:
# - Real-time tool call detection
# - Streams chunks as they arrive (minimal buffering)
# - Works with OpenAI, Anthropic, Ollama, MLX (provider-dependent details)
```
### Performance-Optimized Streaming
```python
from abstractcore import create_llm
import time
def compare_providers(prompt):
"""Compare streaming performance across providers."""
providers = [
("openai", "gpt-4o-mini"),
("anthropic", "claude-haiku-4-5"),
("ollama", "qwen3:4b-instruct")
]
for provider, model in providers:
try:
llm = create_llm(provider, model=model)
print(f"\n[TEST] {provider.upper()} - {model}")
start_time = time.time()
chunks = []
for chunk in llm.generate(prompt, stream=True):
chunks.append(chunk)
print(chunk.content or "", end="", flush=True)
total_time = time.time() - start_time
total_chars = sum(len(chunk.content or "") for chunk in chunks)
print(f"\n\n[PERF] {provider.upper()} Performance:")
print(f"- Chunks: {len(chunks)}")
print(f"- Characters: {total_chars}")
print(f"- Duration: {total_time:.2f}s")
print(f"- Speed: {total_chars/total_time:.0f} chars/sec")
except Exception as e:
print(f"[ERROR] {provider} failed: {e}")
# Compare streaming performance
compare_providers("Write a creative short story about artificial intelligence")
```
**Streaming Features**:
- Time-to-first-token depends on provider/model/network
- Unified strategy across all providers
- Real-time tool call detection
- Streams chunks as they arrive (minimal buffering)
- Supports: OpenAI, Anthropic, Ollama, MLX, LMStudio, HuggingFace
- Robust error handling for malformed responses
## Session Management
### Basic Conversation
```python
from abstractcore import create_llm, BasicSession
llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(
provider=llm,
system_prompt="You are a helpful coding tutor. Always provide examples."
)
# Multi-turn conversation
print("=== Conversation Start ===")
response1 = session.generate("Hi, I'm learning Python. What are decorators?")
print("User: Hi, I'm learning Python. What are decorators?")
print(f"AI: {response1.content}\n")
response2 = session.generate("Can you show me a practical example?")
print("User: Can you show me a practical example?")
print(f"AI: {response2.content}\n")
response3 = session.generate("What was my first question?")
print("User: What was my first question?")
print(f"AI: {response3.content}\n")
print(f"Total messages in conversation: {len(session.messages)}")
```
### Session Persistence
```python
from abstractcore import create_llm, BasicSession
from pathlib import Path
# Create and use session
llm = create_llm("anthropic", model="claude-haiku-4-5")
session = BasicSession(
provider=llm,
system_prompt="You are a travel advisor. Help plan trips."
)
# Have a conversation
session.generate("I want to plan a trip to Japan")
session.generate("I'm interested in both modern cities and traditional culture")
session.generate("My budget is around $3000 for 10 days")
# Save session
session_file = Path("travel_planning_session.json")
session.save(session_file)
print(f"Session saved to {session_file}")
# Later: Load session and continue
new_session = BasicSession.load(session_file, provider=llm)
response = new_session.generate("What were we discussing?")
print(f"AI remembers: {response.content}")
# Clean up
session_file.unlink() # Delete the file
```
### Context Management
```python
from abstractcore import create_llm, BasicSession
def create_coding_assistant():
"""Create a specialized coding assistant session."""
llm = create_llm("ollama", model="qwen3:4b-instruct")
system_prompt = """
You are an expert Python coding assistant. For each request:
1. Provide working code examples
2. Explain the code clearly
3. Mention potential issues or improvements
4. Keep responses concise but complete
"""
return BasicSession(provider=llm, system_prompt=system_prompt)
# Usage
assistant = create_coding_assistant()
# The assistant will remember the context throughout the conversation
assistant.generate("I need a function to validate email addresses")
assistant.generate("Now add logging to that function")
assistant.generate("How would I test this function?")
print(f"Conversation history: {len(assistant.messages)} messages")
# Clear history but keep system prompt
assistant.clear_history()
print(f"After clearing: {len(assistant.messages)} messages") # Just system prompt remains
```
## Interaction Tracing (Observability)
### Basic Tracing
Enable tracing to capture complete LLM interaction history for debugging and transparency:
```python
from abstractcore import create_llm
# Enable tracing on provider
llm = create_llm(
'openai',
model='gpt-4o-mini',
enable_tracing=True,
max_traces=100 # Keep last 100 interactions (ring buffer)
)
# Generate with custom metadata
response = llm.generate(
"Explain quantum computing",
temperature=0.7,
trace_metadata={
'user_id': 'user_123',
'session_type': 'educational',
'topic': 'quantum_physics'
}
)
# Access trace by ID
trace_id = response.metadata['trace_id']
trace = llm.get_traces(trace_id=trace_id)
print(f"Trace ID: {trace['trace_id']}")
print(f"Timestamp: {trace['timestamp']}")
print(f"Prompt: {trace['prompt']}")
print(f"Response: {trace['response']['content'][:100]}...")
print(f"Tokens: {trace['response']['usage']['total_tokens']}")
print(f"Time: {trace['response']['generation_time_ms']:.2f}ms")
print(f"Custom metadata: {trace['metadata']}")
```
### Session-Level Tracing
Automatically track all interactions in a session with correlation:
```python
from abstractcore import create_llm
from abstractcore.core.session import BasicSession
llm = create_llm('openai', model='gpt-4o-mini', enable_tracing=True)
session = BasicSession(provider=llm, enable_tracing=True)
# All interactions automatically traced
session.generate("What is Python?")
session.generate("Give me an example")
session.generate("Explain list comprehensions")
# Get all session traces
traces = session.get_interaction_history()
print(f"\nSession ID: {session.id}")
print(f"Total interactions: {len(traces)}")
for i, trace in enumerate(traces, 1):
print(f"\nInteraction {i}:")
print(f" Prompt: {trace['prompt']}")
print(f" Tokens: {trace['response']['usage']['total_tokens']}")
print(f" Time: {trace['response']['generation_time_ms']:.0f}ms")
print(f" Session ID: {trace['metadata']['session_id']}")
```
### Multi-Step Workflow with Retries
Track code generation workflows with retry attempts:
```python
from abstractcore import create_llm
from abstractcore.core.session import BasicSession
llm = create_llm('openai', model='gpt-4o-mini', enable_tracing=True)
session = BasicSession(provider=llm, enable_tracing=True)
# Step 1: Generate code
response = session.generate(
"Write a Python function to calculate fibonacci numbers",
system_prompt="You are a Python code generator. Only output code.",
step_type='code_generation',
attempt_number=1,
temperature=0
)
code = response.content
success = False
# Step 2-4: Execute with retry logic
for attempt in range(1, 4):
try:
exec(code) # Simulate execution
success = True
break
except Exception as e:
# Retry with error context
response = session.generate(
f"Previous code failed: {e}. Fix it.",
step_type='code_generation',
attempt_number=attempt + 1,
temperature=0
)
code = response.content
# Get workflow summary
traces = session.get_interaction_history()
print(f"\nWorkflow Summary:")
print(f"Total attempts: {len(traces)}")
print(f"Final status: {'Success' if success else 'Failed'}")
for trace in traces:
step = trace['metadata']['step_type']
attempt = trace['metadata']['attempt_number']
tokens = trace['response']['usage']['total_tokens']
print(f" {step} (Attempt {attempt}): {tokens} tokens")
```
### Export Traces
Export traces to different formats for analysis:
```python
from abstractcore import create_llm
from abstractcore.utils import export_traces, summarize_traces
llm = create_llm('openai', model='gpt-4o-mini', enable_tracing=True)
# Generate some interactions
for i in range(5):
llm.generate(f"Question {i+1}", temperature=0)
traces = llm.get_traces()
# Export to JSONL (one JSON per line)
export_traces(traces, format='jsonl', file_path='traces.jsonl')
# Export to pretty JSON
export_traces(traces, format='json', file_path='traces.json')
# Export to Markdown report
export_traces(traces, format='markdown', file_path='trace_report.md')
# Get summary statistics
summary = summarize_traces(traces)
print(f"\nSummary:")
print(f" Total interactions: {summary['total_interactions']}")
print(f" Total tokens: {summary['total_tokens']}")
print(f" Average tokens: {summary['avg_tokens_per_interaction']:.0f}")
print(f" Total time: {summary['total_time_ms']:.2f}ms")
print(f" Average time: {summary['avg_time_ms']:.2f}ms")
print(f" Providers: {summary['providers']}")
print(f" Models: {summary['models']}")
```
### Retrieve Specific Traces
Different ways to retrieve traces:
```python
from abstractcore import create_llm
llm = create_llm('openai', model='gpt-4o-mini', enable_tracing=True)
# Generate some interactions
for i in range(10):
llm.generate(f"Test {i}", temperature=0)
# Get all traces
all_traces = llm.get_traces()
print(f"Total traces: {len(all_traces)}")
# Get last 5 traces
recent = llm.get_traces(last_n=5)
print(f"Last 5 prompts: {[t['prompt'] for t in recent]}")
# Get specific trace by ID
response = llm.generate("Specific query", temperature=0)
trace_id = response.metadata['trace_id']
trace = llm.get_traces(trace_id=trace_id)
print(f"Specific trace: {trace['prompt']}")
```
Learn more about Interaction Tracing
## Production Patterns
### Retry and Error Handling
```python
from abstractcore import create_llm
from abstractcore.core.retry import RetryConfig
from abstractcore.exceptions import ProviderAPIError, RateLimitError
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def create_production_llm():
"""Create LLM with production-grade retry configuration."""
retry_config = RetryConfig(
max_attempts=3,
initial_delay=1.0,
max_delay=30.0,
use_jitter=True,
failure_threshold=5
)
return create_llm(
"openai",
model="gpt-4o-mini",
retry_config=retry_config,
timeout=30
)
def safe_generate(prompt, **kwargs):
"""Generate with comprehensive error handling."""
llm = create_production_llm()
try:
logger.info(f"Generating response for prompt: {prompt[:50]}...")
response = llm.generate(prompt, **kwargs)
logger.info(f"Response generated successfully: {len(response.content)} chars")
return response
except RateLimitError as e:
logger.warning(f"Rate limited: {e}")
raise
except ProviderAPIError as e:
logger.error(f"API error: {e}")
raise
except Exception as e:
logger.error(f"Unexpected error: {e}")
raise
# Usage
try:
response = safe_generate("What is machine learning?")
print(response.content)
except Exception as e:
print(f"Generation failed: {e}")
```
### Cost Monitoring
```python
from abstractcore import create_llm
from abstractcore.events import EventType, on_global
from datetime import datetime
import json
class CostMonitor:
def __init__(self, budget_limit=10.0):
self.total_cost = 0.0
self.budget_limit = budget_limit
self.requests = []
# Register event handlers
on_global(EventType.GENERATION_COMPLETED, self.track_cost)
def track_cost(self, event):
"""Track costs from generation events."""
cost = event.data.get("cost_usd")
if cost:
# NOTE: `cost_usd` is a best-effort estimate based on token usage.
cost_f = float(cost)
self.total_cost += cost_f
self.requests.append({
'timestamp': event.timestamp.isoformat(),
'provider': event.data.get('provider'),
'model': event.data.get('model'),
'cost_usd': cost_f,
'tokens_input': event.data.get('tokens_input'),
'tokens_output': event.data.get('tokens_output')
})
print(f"[COST] ${cost_f:.4f} | Total: ${self.total_cost:.4f}")
if self.total_cost > self.budget_limit:
print(f"[WARN] BUDGET EXCEEDED: ${self.total_cost:.4f} > ${self.budget_limit}")
def get_report(self):
"""Get cost report."""
return {
'total_cost': self.total_cost,
'budget_limit': self.budget_limit,
'total_requests': len(self.requests),
'average_cost': self.total_cost / len(self.requests) if self.requests else 0,
'requests': self.requests
}
# Usage
monitor = CostMonitor(budget_limit=1.0) # $1 budget
llm = create_llm("openai", model="gpt-4o-mini")
# Make some requests
for i in range(3):
response = llm.generate(f"Tell me a fact about number {i+1}")
print(f"Fact {i+1}: {response.content[:100]}...\n")
# Get report
report = monitor.get_report()
print(f"\n[REPORT] Final Cost Summary:")
print(f"Total cost: ${report['total_cost']:.4f}")
print(f"Requests: {report['total_requests']}")
print(f"Average per request: ${report['average_cost']:.4f}")
```
### Load Balancing
```python
from abstractcore import create_llm
import random
import time
from typing import List, Tuple
class LoadBalancer:
def __init__(self, providers: List[Tuple[str, str]]):
"""Initialize with list of (provider, model) tuples."""
self.providers = []
self.weights = []
for provider_name, model in providers:
try:
llm = create_llm(provider_name, model=model)
self.providers.append((llm, provider_name, model))
self.weights.append(1.0) # Equal weight initially
print(f"[OK] {provider_name} ({model}) ready")
except Exception as e:
print(f"[FAIL] {provider_name} ({model}) failed: {e}")
def generate(self, prompt, **kwargs):
"""Generate using weighted random selection."""
if not self.providers:
raise Exception("No providers available")
# Weighted random selection
provider_data = random.choices(
self.providers,
weights=self.weights,
k=1
)[0]
llm, provider_name, model = provider_data
try:
start_time = time.time()
response = llm.generate(prompt, **kwargs)
duration = time.time() - start_time
print(f"[OK] {provider_name} responded in {duration:.2f}s")
return response
except Exception as e:
print(f"[FAIL] {provider_name} failed: {e}")
            # Penalize the failed provider so it is rarely selected again
idx = self.providers.index(provider_data)
self.weights[idx] *= 0.1 # Reduce weight dramatically
raise
# Usage
balancer = LoadBalancer([
("openai", "gpt-4o-mini"),
("anthropic", "claude-haiku-4-5"),
("ollama", "qwen3:4b-instruct")
])
# Make requests - they'll be distributed across available providers
for i in range(5):
try:
response = balancer.generate(f"Tell me about topic number {i+1}")
print(f"Response {i+1}: {response.content[:50]}...\n")
except Exception as e:
print(f"Request {i+1} failed: {e}\n")
```
## Integration Examples
### FastAPI Integration
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from abstractcore import create_llm, BasicSession
from typing import Optional
import uuid
app = FastAPI(title="AbstractCore API")
# Global LLM instance
llm = create_llm("openai", model="gpt-4o-mini")
# Store sessions in memory (use Redis in production)
sessions = {}
class ChatRequest(BaseModel):
message: str
session_id: Optional[str] = None
system_prompt: Optional[str] = None
class ChatResponse(BaseModel):
response: str
session_id: str
@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
try:
        # Get or create session
        if request.session_id and request.session_id in sessions:
            session_id = request.session_id
            session = sessions[session_id]
        else:
            session_id = request.session_id or str(uuid.uuid4())
            session = BasicSession(
                provider=llm,
                system_prompt=request.system_prompt or "You are a helpful assistant."
            )
            sessions[session_id] = session
# Generate response
response = session.generate(request.message)
return ChatResponse(
response=response.content,
session_id=session_id
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.delete("/sessions/{session_id}")
async def clear_session(session_id: str):
if session_id in sessions:
del sessions[session_id]
return {"message": "Session cleared"}
raise HTTPException(status_code=404, detail="Session not found")
# Run with: uvicorn main:app --reload
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
```
### Gradio Web Interface
```python
import gradio as gr
from abstractcore import create_llm, BasicSession
from typing import List, Tuple
class ChatInterface:
def __init__(self):
self.llm = create_llm("anthropic", model="claude-haiku-4-5")
self.session = BasicSession(
provider=self.llm,
system_prompt="You are a helpful AI assistant."
)
def chat(self, message: str, history: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, str]]]:
"""Handle chat interaction."""
try:
response = self.session.generate(message)
history.append((message, response.content))
return "", history
except Exception as e:
history.append((message, f"Error: {str(e)}"))
return "", history
def clear(self) -> Tuple[str, List]:
"""Clear conversation history."""
self.session.clear_history()
return "", []
# Create interface
chat_interface = ChatInterface()
with gr.Blocks(title="AbstractCore Chat") as demo:
gr.Markdown("# AbstractCore Chat Interface")
chatbot = gr.Chatbot(label="Conversation", height=400)
msg = gr.Textbox(
label="Message",
placeholder="Type your message here...",
lines=2
)
with gr.Row():
submit = gr.Button("Send", variant="primary")
clear = gr.Button("Clear", variant="secondary")
# Event handlers
msg.submit(
chat_interface.chat,
inputs=[msg, chatbot],
outputs=[msg, chatbot]
)
submit.click(
chat_interface.chat,
inputs=[msg, chatbot],
outputs=[msg, chatbot]
)
clear.click(
chat_interface.clear,
outputs=[msg, chatbot]
)
if __name__ == "__main__":
demo.launch(share=True)
```
### Jupyter Notebook Integration
```python
# Cell 1: Setup
from abstractcore import create_llm
from IPython.display import display, Markdown, HTML
import json
# Create LLM instance
llm = create_llm("openai", model="gpt-4o-mini")
def display_response(response, title="AI Response"):
"""Pretty display for Jupyter notebooks."""
html = f"""
{title}
{response.content}
"""
display(HTML(html))
print("AbstractCore setup complete!")
# Cell 2: Basic Usage
response = llm.generate("Explain quantum computing in simple terms")
display_response(response, "Quantum Computing Explanation")
# Cell 3: Structured Output
from pydantic import BaseModel
from typing import List
class LearningPlan(BaseModel):
topic: str
difficulty: str
estimated_hours: int
prerequisites: List[str]
learning_steps: List[str]
plan = llm.generate(
"Create a learning plan for someone who wants to learn machine learning",
response_model=LearningPlan
)
# Display as nice table
display(HTML(f"""
Topic:
{plan.topic}
Difficulty:
{plan.difficulty}
Estimated Hours:
{plan.estimated_hours}
Prerequisites:
{', '.join(plan.prerequisites)}
"""))
display(Markdown("### Learning Steps:"))
for i, step in enumerate(plan.learning_steps, 1):
display(Markdown(f"{i}. {step}"))
```
### Discord Bot Integration
```python
import discord
from discord.ext import commands
from abstractcore import create_llm, BasicSession
import asyncio
# Bot setup
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='!', intents=intents)
# LLM setup
llm = create_llm("anthropic", model="claude-haiku-4-5")
sessions = {} # Store user sessions
@bot.event
async def on_ready():
print(f'{bot.user} has connected to Discord!')
@bot.command(name='ask')
async def ask(ctx, *, question):
"""Ask the AI a question."""
user_id = ctx.author.id
# Get or create session for user
if user_id not in sessions:
sessions[user_id] = BasicSession(
provider=llm,
system_prompt="You are a helpful Discord bot assistant. Keep responses concise."
)
try:
# Show typing indicator
async with ctx.typing():
response = sessions[user_id].generate(question)
# Discord has a 2000 character limit
content = response.content
if len(content) > 2000:
content = content[:1997] + "..."
await ctx.reply(content)
except Exception as e:
await ctx.reply(f"Sorry, I encountered an error: {str(e)}")
@bot.command(name='clear')
async def clear_session(ctx):
"""Clear your conversation history."""
user_id = ctx.author.id
if user_id in sessions:
sessions[user_id].clear_history()
await ctx.reply("Your conversation history has been cleared!")
else:
await ctx.reply("You don't have an active session to clear.")
@bot.command(name='stats')
async def stats(ctx):
"""Show session statistics."""
user_id = ctx.author.id
if user_id in sessions:
session = sessions[user_id]
message_count = len(session.messages)
await ctx.reply(f"Your session has {message_count} messages.")
else:
await ctx.reply("You don't have an active session.")
# Run bot (add your Discord bot token)
# bot.run('YOUR_DISCORD_BOT_TOKEN')
```
## Next Steps
These examples show AbstractCore's versatility across different use cases. To continue learning:
1. **Start with basics** - Try the simple Q&A examples
2. **Add tools** - Experiment with the tool calling examples
3. **Structure output** - Use Pydantic models for type-safe responses
4. **Go production** - Implement error handling and monitoring
5. **Build apps** - Use the integration examples as starting points
For more information:
- Getting Started - Basic setup and usage
- Capabilities - What AbstractCore can do
- Prerequisites - Provider setup and configuration
- API Reference - Complete API documentation
---
**Remember**: All these examples work with any provider - just change the `create_llm()` call to switch between OpenAI, Anthropic, Ollama, MLX, and others!
---
### Inlined: `docs/mcp.md`
# MCP (Model Context Protocol)
AbstractCore treats MCP as a **tool-server protocol** (not an LLM provider).
The `abstractcore.mcp` module provides:
- a minimal MCP JSON-RPC client (Streamable HTTP) → `abstractcore.mcp.McpClient`
- a minimal MCP stdio client (spawn a subprocess) → `abstractcore.mcp.McpStdioClient`
- tool discovery (`tools/list`) and conversion into AbstractCore tool specs → `abstractcore.mcp.McpToolSource`
## What you can do today
### 1) Discover tools from an MCP server
```python
from abstractcore.mcp import McpClient, McpToolSource
client = McpClient(url="http://localhost:3000") # MCP streamable HTTP endpoint
source = McpToolSource(server_id="local", client=client)
tools = source.list_tool_specs()
```
Each returned tool spec is an AbstractCore-compatible dict you can pass to `tools=[...]` in
`generate()`/`agenerate()`. Tool names are namespaced as:
`mcp::<server_id>::<tool_name>`
See `abstractcore/abstractcore/mcp/naming.py`.
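For example (a sketch; the endpoint, provider, and model names are illustrative), the discovered specs can be passed straight into a pass-through `generate()` call:
```python
from abstractcore import create_llm
from abstractcore.mcp import McpClient, McpToolSource

# Illustrative MCP endpoint and local model
client = McpClient(url="http://localhost:3000")
tools = McpToolSource(server_id="local", client=client).list_tool_specs()

llm = create_llm("ollama", model="qwen3:4b-instruct")
response = llm.generate("Use the available tools to answer my question.", tools=tools)
print(response.tool_calls)  # Names carry the "mcp::" prefix; execute them in your host/runtime
```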
### 2) Execute MCP tools in your host/runtime
AbstractCore’s default execution path is **passthrough** (`execute_tools=False`): the model can
request tool calls and you execute them in your host/runtime.
The built-in `abstractcore.tools.registry.execute_tools()` executes Python callables registered in
the (deprecated) global registry; it does **not** automatically route MCP tool calls. For MCP, your
host/runtime should detect names starting with `mcp::` and dispatch them to an MCP client.
```python
from abstractcore.mcp import McpClient, parse_namespaced_tool_name
client = McpClient(url="http://localhost:3000")
def execute_mcp_tool_call(call: dict) -> dict:
parsed = parse_namespaced_tool_name(call.get("name", ""))
if not parsed:
raise ValueError("Not an MCP tool call")
server_id, tool_name = parsed
return client.call_tool(name=tool_name, arguments=call.get("arguments") or {})
```
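For example, after a pass-through `generate()` call like the sketch above, the host can route only the MCP-namespaced calls through this helper:
```python
# Sketch: `response` comes from llm.generate(..., tools=...) as shown above
for call in (response.tool_calls or []):
    if call.get("name", "").startswith("mcp::"):
        result = execute_mcp_tool_call(call)
        print(f"{call['name']} -> {result}")
```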
## Transports supported
### Streamable HTTP
`McpClient` posts JSON-RPC to the server URL. It automatically sets an `Accept` header compatible
with streamable HTTP (`application/json, text/event-stream`) and will capture `MCP-Session-Id`
responses when provided.
See `abstractcore/abstractcore/mcp/client.py`.
### stdio
`McpStdioClient` spawns an MCP server subprocess and communicates over stdin/stdout with JSON-RPC,
including a best-effort initialization handshake.
See `abstractcore/abstractcore/mcp/stdio_client.py`.
## Configuration helpers
`create_mcp_client(config=...)` supports both HTTP and stdio config shapes:
```python
from abstractcore.mcp import create_mcp_client
client = create_mcp_client(config={"url": "http://localhost:3000"})
client = create_mcp_client(config={"transport": "stdio", "command": ["my-mcp-server", "--stdio"]})
```
See `abstractcore/abstractcore/mcp/factory.py`.
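A small sketch combining the factory with tool discovery (the server command is illustrative, and it assumes `McpToolSource` accepts either client type):
```python
from abstractcore.mcp import create_mcp_client, McpToolSource

# Illustrative stdio server command
client = create_mcp_client(config={"transport": "stdio", "command": ["my-mcp-server", "--stdio"]})
tools = McpToolSource(server_id="local-stdio", client=client).list_tool_specs()
print([t.get("name") for t in tools])  # e.g. "mcp::local-stdio::<tool_name>"
```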
## Current limitations
- MCP is currently a **library-level** integration (tool discovery + clients). AbstractCore’s HTTP
server does not expose MCP management endpoints.
- Tool execution routing for `mcp::...` names is host/runtime responsibility.
---
### Inlined: `docs/structured-logging.md`
# Structured Logging
AbstractCore uses Python logging throughout the library. You can control console verbosity and optional file logging via the centralized config CLI.
Default behavior (no overrides): **console shows only ERROR and above**.
## Configure with the CLI
```bash
# Show current config (including logging)
abstractcore --status
# Console verbosity
abstractcore --set-console-log-level DEBUG
abstractcore --set-console-log-level INFO
abstractcore --set-console-log-level WARNING
abstractcore --set-console-log-level ERROR
abstractcore --set-console-log-level NONE
# File logging (disabled by default)
abstractcore --enable-file-logging
abstractcore --disable-file-logging
abstractcore --set-log-base-dir ~/.abstractcore/logs
# Convenience
abstractcore --enable-debug-logging
abstractcore --disable-console-logging
```
Logging defaults live in `~/.abstractcore/config/abstractcore.json`. See Centralized Config for the schema.
## Verbatim capture (prompts/responses)
Some components can capture full prompts and responses in logs/traces. This is controlled by `verbatim_enabled` in the centralized config file (`~/.abstractcore/config/abstractcore.json`). Disable it if you may handle sensitive data.
## In-code usage
```python
from abstractcore.utils.structured_logging import get_logger
logger = get_logger(__name__)
logger.info("startup", component="my_app", version="1.0.0")
```
---
### Inlined: `docs/api-reference.md`
# API Reference
Complete reference for the AbstractCore API. All examples work across any provider.
## Table of Contents
- Core Functions
- Classes
- AbstractCoreInterface
- generate()
- agenerate()
- BasicSession
- generate()
- agenerate()
- Event System
- Retry Configuration
- Embeddings
- Exceptions
## Core Functions
### create_llm()
Creates an LLM provider instance.
```python
def create_llm(
provider: str,
model: Optional[str] = None,
retry_config: Optional[RetryConfig] = None,
**kwargs
) -> AbstractCoreInterface
```
**Parameters:**
- `provider` (str): Provider name ("openai", "anthropic", "ollama", "mlx", "lmstudio", "huggingface")
- `model` (str, optional): Model name. If not provided, uses provider default
- `retry_config` (RetryConfig, optional): Custom retry configuration
- `**kwargs`: Provider-specific parameters
**Provider-specific parameters:**
- `api_key` (str): API key for cloud providers
- `base_url` (str): Custom endpoint URL
- `temperature` (float): Sampling temperature (0.0-1.0, controls creativity)
- `seed` (int): Random seed for deterministic outputs (✅ OpenAI, Ollama, MLX, HuggingFace, LMStudio; ⚠️ Anthropic issues warning)
- `max_tokens` (int): Maximum output tokens
- `timeout` (int): Request timeout in seconds
- `top_p` (float): Nucleus sampling parameter
**Returns:** AbstractCoreInterface instance
**Example:**
```python
from abstractcore import create_llm
# Basic usage
llm = create_llm("openai", model="gpt-4o-mini")
# With configuration
llm = create_llm(
"anthropic",
model="claude-haiku-4-5",
temperature=0.7,
max_tokens=1000,
timeout=30
)
# Local provider
llm = create_llm("ollama", model="qwen2.5-coder:7b", base_url="http://localhost:11434")
```
## Classes
### AbstractCoreInterface
Base interface for all LLM providers. All providers implement this interface.
#### generate()
Generate text response from the LLM.
```python
def generate(
self,
prompt: str,
messages: Optional[List[Dict]] = None,
system_prompt: Optional[str] = None,
tools: Optional[List[Dict]] = None,
response_model: Optional[BaseModel] = None,
retry_strategy: Optional[Retry] = None,
stream: bool = False,
thinking: Optional[bool | str] = None,
**kwargs
) -> Union[GenerateResponse, Iterator[GenerateResponse]]
```
**Parameters:**
- `prompt` (str): Text prompt to generate from
- `messages` (List[Dict], optional): Conversation messages in OpenAI format
- `system_prompt` (str, optional): System prompt to set context
- `tools` (List[Dict], optional): Tools the LLM can call
- `response_model` (BaseModel, optional): Pydantic model for structured output
- `retry_strategy` (Retry, optional): Custom retry strategy for structured output
- `stream` (bool): Enable streaming response
- `thinking` (bool | str, optional): Unified thinking/reasoning control (`"auto"|"on"|"off"` or `"low"|"medium"|"high"` when supported)
- `**kwargs`: Additional generation parameters
**Returns:**
- If `stream=False`: GenerateResponse
- If `stream=True`: Iterator[GenerateResponse]
**Examples:**
**Basic Generation:**
```python
response = llm.generate("What is machine learning?")
print(response.content)
```
**With System Prompt:**
```python
response = llm.generate(
"Explain Python decorators",
system_prompt="You are a Python expert. Always provide code examples."
)
```
**Structured Output:**
```python
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
person = llm.generate(
"Extract: John Doe is 25 years old",
response_model=Person
)
print(f"{person.name}, age {person.age}")
```
> **See**: Structured Output Guide for comprehensive documentation
**Tool Calling:**
```python
def get_weather(city: str) -> str:
return f"Weather in {city}: sunny, 22°C"
tools = [{
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}]
response = llm.generate("What's the weather in Paris?", tools=tools)
```
**Streaming:**
```python
print("AI: ", end="")
for chunk in llm.generate(
"Create a Python function with a tool",
stream=True,
tools=tools
):
# Real-time chunk processing
print(chunk.content or "", end="", flush=True)
# Tool calls are surfaced as structured dicts; execute them in your host/runtime.
if chunk.tool_calls:
print(f"\nTool calls: {chunk.tool_calls}")
```
**Streaming notes**:
- Streaming uses a unified processor across providers; exact chunking behavior depends on the backend.
- Tool calls are surfaced as structured dicts in `chunk.tool_calls`; execute them in your host/runtime (pass-through by default).
- If you need tool-call markup preserved/re-written in `chunk.content`, pass `tool_call_tags=...` (see Tool Call Syntax Rewriting).
- In streaming mode, AbstractCore records a best-effort TTFT metric in `chunk.metadata["_timing"]["ttft_ms"]` when available (for debugging/observability).
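As a small illustration of the last note (a sketch; the timing key may be absent depending on the backend):
```python
first_ttft = None
for chunk in llm.generate("Write a haiku about rivers", stream=True):
    timing = (chunk.metadata or {}).get("_timing") or {}
    if first_ttft is None and "ttft_ms" in timing:
        first_ttft = timing["ttft_ms"]
    print(chunk.content or "", end="", flush=True)
print(f"\nTTFT: {first_ttft} ms" if first_ttft is not None else "\nTTFT not reported")
```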
#### agenerate()
Async version of `generate()` for concurrent request execution.
```python
async def agenerate(
self,
prompt: str,
messages: Optional[List[Dict]] = None,
system_prompt: Optional[str] = None,
tools: Optional[List[Dict]] = None,
response_model: Optional[BaseModel] = None,
stream: bool = False,
**kwargs
) -> Union[GenerateResponse, AsyncIterator[GenerateResponse]]
```
**Parameters:** Same as `generate()`
**Returns:**
- If `stream=False`: GenerateResponse
- If `stream=True`: AsyncIterator[GenerateResponse]
**Examples:**
**Basic Async:**
```python
import asyncio
async def main():
response = await llm.agenerate("What is quantum computing?")
print(response.content)
asyncio.run(main())
```
**Concurrent Requests:**
```python
async def batch_process():
tasks = [
llm.agenerate("Summarize Python"),
llm.agenerate("Summarize JavaScript"),
llm.agenerate("Summarize Rust")
]
responses = await asyncio.gather(*tasks)
for response in responses:
print(response.content)
asyncio.run(batch_process())
```
**Async Streaming:**
```python
async def stream_response():
async for chunk in llm.agenerate("Tell me a story", stream=True):
print(chunk.content, end='', flush=True)
asyncio.run(stream_response())
```
**Multi-Provider Comparison:**
```python
async def compare_providers():
openai = create_llm("openai", model="gpt-4o-mini")
claude = create_llm("anthropic", model="claude-haiku-4-5")
responses = await asyncio.gather(
openai.agenerate("What is 2+2?"),
claude.agenerate("What is 2+2?")
)
print(f"OpenAI: {responses[0].content}")
print(f"Claude: {responses[1].content}")
asyncio.run(compare_providers())
```
**Features:**
- Works across AbstractCore providers (cloud + local); some use native async, others fall back to `asyncio.to_thread()`
- Faster batch operations via concurrent execution (depends on provider, network, and hardware)
- Full streaming support with AsyncIterator
- Compatible with FastAPI and async web frameworks
- Zero breaking changes to sync API
#### get_capabilities()
Get provider capabilities.
```python
def get_capabilities(self) -> List[str]
```
**Returns:** List of capability strings
**Example:**
```python
capabilities = llm.get_capabilities()
print(capabilities) # ['text_generation', 'tool_calling', 'streaming', 'vision']
```
#### unload_model(model_name)
Unload/cleanup resources for a specific model (best-effort).
```python
def unload_model(self, model_name: str) -> None
```
For local providers (Ollama, MLX, HuggingFace, LMStudio), this explicitly frees model memory or releases client resources. For API providers (OpenAI, Anthropic), this is typically a no-op but safe to call.
**Provider-specific behavior:**
- **Ollama**: Sends `keep_alive=0` to immediately unload from server
- **MLX**: Clears model/tokenizer references and forces garbage collection
- **HuggingFace**: Closes llama.cpp resources (GGUF) or clears model references
- **LMStudio**: Closes HTTP connection (server auto-manages via TTL)
- **OpenAI/Anthropic**: No-op (safe to call)
**Example:**
```python
# Load and use a large model
llm = create_llm("ollama", model="qwen3-coder:30b")
response = llm.generate("Hello world")
# Explicitly free memory when done
llm.unload_model(llm.model)
del llm
# Now safe to load another large model
llm2 = create_llm("mlx", model="mlx-community/Qwen3-30B-4bit")
```
**Use cases:**
- Test suites testing multiple models sequentially
- Memory-constrained environments (<32GB RAM)
- Sequential model loading in production systems
### GenerateResponse
Response object from LLM generation with **consistent token terminology** and **generation time tracking**.
```python
@dataclass
class GenerateResponse:
content: Optional[str]
raw_response: Any
model: Optional[str]
finish_reason: Optional[str]
usage: Optional[Dict[str, int]]
tool_calls: Optional[List[Dict]]
metadata: Optional[Dict]
gen_time: Optional[float] # Generation time in milliseconds
# Consistent token access properties
@property
def input_tokens(self) -> Optional[int]:
"""Get input tokens with consistent terminology."""
@property
def output_tokens(self) -> Optional[int]:
"""Get output tokens with consistent terminology."""
@property
def total_tokens(self) -> Optional[int]:
"""Get total tokens."""
```
**Attributes:**
- `content` (str): Generated text content
- `raw_response` (Any): Raw provider response
- `model` (str): Model used for generation
- `finish_reason` (str): Why generation stopped ("stop", "length", "tool_calls")
- `usage` (Dict): Token usage information
- `tool_calls` (List[Dict]): Tools called by the LLM
- `metadata` (Dict): Additional metadata (notably `metadata["reasoning"]` when a provider/model exposes thinking/reasoning)
- `gen_time` (float): Generation time in milliseconds, rounded to 1 decimal place
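For the `metadata["reasoning"]` note above, a minimal sketch (thinking support varies by provider/model, and the key may be absent):
```python
response = llm.generate("Solve 17 * 23 step by step", thinking="on")
reasoning = (response.metadata or {}).get("reasoning")
if reasoning:
    print("Reasoning:", reasoning[:200])
print("Answer:", response.content)
```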
**Token and Timing Access Examples:**
```python
response = llm.generate("Explain quantum computing")
# Best-effort access across supported providers (may be None depending on backend/config)
print(f"Input tokens: {response.input_tokens}") # None if usage isn't reported/estimated
print(f"Output tokens: {response.output_tokens}") # None if usage isn't reported/estimated
print(f"Total tokens: {response.total_tokens}") # None if usage isn't reported/estimated
print(f"Generation time: {response.gen_time}ms") # None if timing wasn't captured
# Comprehensive summary
print(f"Summary: {response.get_summary()}") # Model | Tokens | Time | Tools
# Raw usage dictionary (provider-specific format)
print(f"Usage details: {response.usage}")
```
**Token Count Sources:**
- **Provider APIs**: OpenAI, Anthropic, LMStudio (native API token counts)
- **AbstractCore Calculation**: MLX, HuggingFace (using `token_utils.py`)
- **Mixed Sources**: Ollama (combination of provider and calculated tokens)
**Backward Compatibility**: Legacy `prompt_tokens` and `completion_tokens` keys remain available in `response.usage` dictionary.
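A short illustration of both access styles (values may be None or missing depending on the provider):
```python
response = llm.generate("Hello")
# Consistent properties
print(response.input_tokens, response.output_tokens, response.total_tokens)
# Legacy keys remain available in the raw usage dict
if response.usage:
    print(response.usage.get("prompt_tokens"), response.usage.get("completion_tokens"))
```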
**Methods:**
#### has_tool_calls()
```python
def has_tool_calls(self) -> bool
```
Returns True if the response contains tool calls.
#### get_tools_executed()
```python
def get_tools_executed(self) -> List[str]
```
Returns list of tool names that were executed.
**Example:**
```python
response = llm.generate("What's 2+2?", tools=[calculator_tool])
print(f"Content: {response.content}")
print(f"Model: {response.model}")
print(f"Tokens: {response.usage}")
if response.has_tool_calls():
print(f"Tools used: {response.get_tools_executed()}")
```
### BasicSession
Manages conversation context and history.
```python
class BasicSession:
def __init__(
self,
provider: AbstractCoreInterface,
system_prompt: Optional[str] = None,
temperature: Optional[float] = None,
seed: Optional[int] = None,
**kwargs
):
```
**Parameters:**
- `provider` (AbstractCoreInterface): LLM provider instance
- `system_prompt` (str, optional): System prompt for the conversation
- `temperature` (float, optional): Default temperature for all generations (0.0-1.0)
- `seed` (int, optional): Default seed for deterministic outputs (provider support varies)
- `**kwargs`: Additional session parameters (tools, timeouts, etc.)
**Attributes:**
- `messages` (List[Message]): Conversation history
- `provider` (AbstractCoreInterface): LLM provider
- `system_prompt` (str): System prompt
**Methods:**
#### generate()
```python
def generate(self, prompt: str, **kwargs) -> GenerateResponse
```
Generate response and add to conversation history.
#### agenerate()
```python
async def agenerate(
self,
prompt: str,
name: Optional[str] = None,
location: Optional[str] = None,
**kwargs
) -> Union[GenerateResponse, AsyncIterator[GenerateResponse]]
```
Async version of `generate()`. Maintains conversation history with async execution.
**Example:**
```python
import asyncio
async def chat():
session = BasicSession(provider=llm)
# Async conversation
response1 = await session.agenerate("My name is Alice")
response2 = await session.agenerate("What's my name?")
print(response2.content) # References Alice
asyncio.run(chat())
```
#### add_message()
```python
def add_message(self, role: str, content: str, **metadata) -> Message
```
Add message to conversation history.
#### clear_history()
```python
def clear_history(self, keep_system: bool = True) -> None
```
Clear conversation history, optionally keeping system prompt.
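**Example** (a short sketch using both methods; provider construction as documented above):
```python
session = BasicSession(provider=llm, system_prompt="You are concise.")
session.add_message("user", "Remember that my project is called Atlas.")
session.add_message("assistant", "Noted: the project is called Atlas.")
print(len(session.messages))           # history now includes the two added messages
session.clear_history(keep_system=True)
print(len(session.messages))           # only the system prompt remains
```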
#### save()
```python
def save(self, filepath: Path) -> None
```
Save session to JSON file.
#### load()
```python
@classmethod
def load(cls, filepath: Path, provider: AbstractCoreInterface) -> "BasicSession"
```
Load session from JSON file.
**Example:**
```python
from abstractcore import create_llm, BasicSession
from pathlib import Path
llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(
provider=llm,
system_prompt="You are a helpful coding tutor.",
temperature=0.3, # Focused responses
seed=42 # Consistent outputs
)
# Multi-turn conversation
response1 = session.generate("What are Python decorators?")
response2 = session.generate("Show me an example", temperature=0.7) # Override for this call
print(f"Conversation has {len(session.messages)} messages")
# Save session
session.save(Path("conversation.json"))
# Load later
loaded_session = BasicSession.load(Path("conversation.json"), llm)
```
### Message
Represents a conversation message.
```python
@dataclass
class Message:
role: str
content: str
timestamp: Optional[datetime] = None
name: Optional[str] = None
metadata: Optional[Dict] = None
```
**Methods:**
#### to_dict()
```python
def to_dict(self) -> Dict
```
Convert message to dictionary.
#### from_dict()
```python
@classmethod
def from_dict(cls, data: Dict) -> "Message"
```
Create message from dictionary.
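A minimal round-trip sketch (field names come from the dataclass above; the top-level `Message` import path is an assumption):
```python
from abstractcore import Message  # assumption: Message is exported at package level

msg = Message(role="user", content="Hello!", name="alice")

payload = msg.to_dict()               # plain dict, safe to store as JSON
restored = Message.from_dict(payload)

assert restored.role == "user"
assert restored.content == "Hello!"
```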
## Event System
### EventType
Available event types for monitoring.
```python
class EventType(Enum):
# Generation events
GENERATION_STARTED = "generation_started"
GENERATION_COMPLETED = "generation_completed"
# Tool events
TOOL_STARTED = "tool_started"
TOOL_PROGRESS = "tool_progress"
TOOL_COMPLETED = "tool_completed"
# Error handling
ERROR = "error"
# Retry and resilience events
RETRY_ATTEMPTED = "retry_attempted"
RETRY_EXHAUSTED = "retry_exhausted"
# Validation, session, and compaction events
VALIDATION_FAILED = "validation_failed"
SESSION_CREATED = "session_created"
SESSION_CLEARED = "session_cleared"
COMPACTION_STARTED = "compaction_started"
COMPACTION_COMPLETED = "compaction_completed"
# Runtime/workflow events
WORKFLOW_STEP_STARTED = "workflow_step_started"
WORKFLOW_STEP_COMPLETED = "workflow_step_completed"
WORKFLOW_STEP_WAITING = "workflow_step_waiting"
WORKFLOW_STEP_FAILED = "workflow_step_failed"
```
### on_global()
Register global event handler.
```python
def on_global(event_type: EventType, handler: Callable[[Event], None]) -> None
```
**Parameters:**
- `event_type` (EventType): Event type to listen for
- `handler` (Callable): Function to call when event occurs
**Example:**
```python
from abstractcore import create_llm
from abstractcore.events import EventType, on_global
def cost_monitor(event):
cost = event.data.get("cost_usd")
if cost:
# NOTE: `cost_usd` is a best-effort estimate based on token usage.
print(f"Estimated cost: ${cost:.4f}")
def tool_monitor(event):
# Tool event payload shape varies by emitter.
# - Single-tool execution: {"tool_name": ..., "success": ..., ...}
# - Batch execution: {"tool_results": [{"name": ..., "success": ...}, ...], ...}
tool_name = event.data.get("tool_name")
if tool_name:
print(f"Tool completed: {tool_name} success={event.data.get('success')}")
return
for r in event.data.get("tool_results", []) or []:
print(f"Tool completed: {r.get('name')} success={r.get('success')} error={r.get('error')}")
# Register handlers
on_global(EventType.GENERATION_COMPLETED, cost_monitor)
on_global(EventType.TOOL_COMPLETED, tool_monitor)
# Now all LLM operations will trigger these handlers
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("Hello world")
```
### Event
Event object passed to handlers.
```python
@dataclass
class Event:
type: EventType
timestamp: datetime
data: Dict[str, Any]
source: Optional[str] = None
```
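For example, a handler can filter and log directly on these fields:
```python
from abstractcore.events import EventType, on_global

def audit(event):
    # type, timestamp, data, and source come from the Event dataclass above.
    print(f"[{event.timestamp.isoformat()}] {event.type.value} from {event.source or 'unknown'}")

on_global(EventType.GENERATION_STARTED, audit)
on_global(EventType.GENERATION_COMPLETED, audit)
```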
## Retry Configuration
### RetryConfig
Configuration for provider-level retry behavior.
```python
@dataclass
class RetryConfig:
max_attempts: int = 3
initial_delay: float = 1.0
max_delay: float = 60.0
exponential_base: float = 2.0
use_jitter: bool = True
failure_threshold: int = 5
recovery_timeout: float = 60.0
half_open_max_calls: int = 2
```
**Parameters:**
- `max_attempts` (int): Maximum retry attempts
- `initial_delay` (float): Initial delay in seconds
- `max_delay` (float): Maximum delay in seconds
- `exponential_base` (float): Base for exponential backoff
- `use_jitter` (bool): Add randomness to delays
- `failure_threshold` (int): Circuit breaker failure threshold
- `recovery_timeout` (float): Circuit breaker recovery timeout
- `half_open_max_calls` (int): Max calls in half-open state
**Example:**
```python
from abstractcore import create_llm
from abstractcore.core.retry import RetryConfig
config = RetryConfig(
max_attempts=5,
initial_delay=2.0,
use_jitter=True,
failure_threshold=3
)
llm = create_llm("openai", model="gpt-4o-mini", retry_config=config)
```
### FeedbackRetry
Retry strategy for structured output validation failures.
```python
class FeedbackRetry:
def __init__(self, max_attempts: int = 3):
self.max_attempts = max_attempts
```
**Example:**
```python
from abstractcore.structured import FeedbackRetry
from pydantic import BaseModel
class User(BaseModel):
name: str
age: int
custom_retry = FeedbackRetry(max_attempts=5)
user = llm.generate(
"Extract user: John Doe, 25",
response_model=User,
retry_strategy=custom_retry
)
```
## Embeddings
### EmbeddingManager
Manages text embeddings using state-of-the-art embedding models.
```python
class EmbeddingManager:
def __init__(
self,
model: str = "embeddinggemma",
backend: str = "auto",
output_dims: Optional[int] = None,
cache_size: int = 1000,
cache_dir: Optional[str] = None
):
```
**Parameters:**
- `model` (str): Model name ("embeddinggemma", "granite", "stella-400m")
- `backend` (str): Backend ("auto", "pytorch", "onnx")
- `output_dims` (int, optional): Truncate output dimensions
- `cache_size` (int): Memory cache size
- `cache_dir` (str, optional): Disk cache directory
**Methods:**
#### embed()
```python
def embed(self, text: str) -> List[float]
```
Generate embedding for single text.
#### embed_batch()
```python
def embed_batch(self, texts: List[str]) -> List[List[float]]
```
Generate embeddings for multiple texts (more efficient).
#### compute_similarity()
```python
def compute_similarity(self, text1: str, text2: str) -> float
```
Compute cosine similarity between two texts.
**Example:**
```python
from abstractcore.embeddings import EmbeddingManager
embedder = EmbeddingManager(model="embeddinggemma")
# Single embedding
embedding = embedder.embed("Hello world")
print(f"Embedding dimension: {len(embedding)}")
# Batch embeddings
embeddings = embedder.embed_batch(["Hello", "World", "AI"])
# Similarity
similarity = embedder.compute_similarity("cat", "kitten")
print(f"Similarity: {similarity:.3f}")
```
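A small semantic-search sketch built on `embed()` and `embed_batch()`; the cosine helper and the document list are illustrative, not part of the API:
```python
import math
from abstractcore.embeddings import EmbeddingManager

def cosine(a, b):
    # Plain cosine similarity over two vectors (lists of floats).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

embedder = EmbeddingManager(model="embeddinggemma")

docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed at the start of each month.",
    "Contact support to request a refund.",
]
doc_vectors = embedder.embed_batch(docs)  # one embedding per document

query_vector = embedder.embed("How do I get my money back?")
best = max(range(len(docs)), key=lambda i: cosine(query_vector, doc_vectors[i]))
print(docs[best])  # expected: the refund document
```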
## Exceptions
### Base Exceptions
#### AbstractCoreError
```python
class AbstractCoreError(Exception):
"""Base exception for AbstractCore."""
```
#### ProviderAPIError
```python
class ProviderAPIError(AbstractCoreError):
"""Provider API error."""
```
#### ModelNotFoundError
```python
class ModelNotFoundError(AbstractCoreError):
"""Model not found error."""
```
#### AuthenticationError
```python
class AuthenticationError(ProviderAPIError):
"""Authentication error."""
```
#### RateLimitError
```python
class RateLimitError(ProviderAPIError):
"""Rate limit error."""
```
### Usage
```python
from abstractcore.exceptions import ProviderAPIError, RateLimitError
try:
response = llm.generate("Hello world")
except RateLimitError:
print("Rate limited, wait and retry")
except ProviderAPIError as e:
print(f"API error: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
```
## Advanced Usage Patterns
### Custom Provider Configuration
```python
from abstractcore import create_llm
from abstractcore.core.retry import RetryConfig

# Provider with all options
llm = create_llm(
provider="openai",
model="gpt-4o-mini",
api_key="your-key",
temperature=0.7,
max_tokens=1000,
top_p=0.9,
timeout=30,
retry_config=RetryConfig(max_attempts=5)
)
```
### Multi-Provider Setup
```python
from abstractcore import create_llm

providers = {
"fast": create_llm("openai", model="gpt-4o-mini"),
"smart": create_llm("openai", model="gpt-4o"),
"long_context": create_llm("anthropic", model="claude-haiku-4-5"),
"local": create_llm("ollama", model="qwen2.5-coder:7b")
}
def route_request(prompt, task_type="general"):
if task_type == "simple":
return providers["fast"].generate(prompt)
elif task_type == "complex":
return providers["smart"].generate(prompt)
elif len(prompt) > 50000:
return providers["long_context"].generate(prompt)
else:
return providers["local"].generate(prompt)
```
### Production Monitoring
```python
from abstractcore.events import EventType, on_global
import logging
# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Cost tracking
total_cost = 0.0
def production_monitor(event):
global total_cost
if event.type == EventType.GENERATION_COMPLETED:
cost = event.data.get("cost_usd")
if cost:
# NOTE: `cost_usd` is a best-effort estimate based on token usage.
total_cost += float(cost)
logger.info(f"Estimated cost: ${float(cost):.4f}, Total: ${total_cost:.4f}")
duration_ms = event.data.get("duration_ms")
if isinstance(duration_ms, (int, float)) and duration_ms > 10_000:
logger.warning(f"Slow request: {float(duration_ms):.0f}ms")
elif event.type == EventType.ERROR:
logger.error(f"Error: {event.data.get('error')}")
elif event.type == EventType.RETRY_ATTEMPTED:
logger.info(f"Retrying due to: {event.data.get('error_type')}")
on_global(EventType.GENERATION_COMPLETED, production_monitor)
on_global(EventType.ERROR, production_monitor)
on_global(EventType.RETRY_ATTEMPTED, production_monitor)
```
---
For more examples and use cases, see the companion guides in this repo's `docs/` directory:
- Getting Started - Basic setup and usage
- Examples - Practical use cases
- Prerequisites - Provider setup and configuration
- Capabilities - What AbstractCore can do