Getting Started

AbstractCore is a unified Python interface for cloud, gateway, and local LLM providers. The default install is lightweight; add only the extras your application needs.

Table of Contents

  • Prerequisites
  • Installation
  • Providers and models
  • Your first call
  • Sessions (multi-turn)
  • Thinking / reasoning (best-effort)
  • Streaming
  • Tool calling
  • Structured output
  • Media input (images/audio/video + documents)
  • Async
  • CLI (optional)
  • Next steps

Prerequisites

  • Python 3.9+
  • pip

Installation

Extras compose. For example, abstractcore[remote,media,tools] installs hosted API SDKs plus document/media handling and built-in tools in one command.

# Core: local HTTP servers and gateways that need no SDK
# Includes Ollama, LM Studio, OpenRouter, Portkey, and OpenAI-compatible /v1 endpoints
pip install abstractcore

# Hosted API SDKs (OpenAI + Anthropic). OpenRouter/Portkey still work from core.
pip install "abstractcore[remote]"

# Individual provider SDKs / local runtimes
pip install "abstractcore[openai]"       # OpenAI SDK
pip install "abstractcore[anthropic]"    # Anthropic SDK
pip install "abstractcore[huggingface]"  # Transformers / torch (heavy)
pip install "abstractcore[mlx]"          # Apple Silicon local inference (heavy)
pip install "abstractcore[vllm]"         # NVIDIA CUDA / ROCm inference (heavy)

# Optional features
pip install "abstractcore[tools]"        # built-in tools (web/file/command helpers)
pip install "abstractcore[media]"        # images, PDFs, Office docs
pip install "abstractcore[compression]"  # glyph visual-text compression (Pillow renderer)
pip install "abstractcore[embeddings]"   # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]"       # precise token counting (tiktoken)
pip install "abstractcore[server]"       # OpenAI-compatible HTTP gateway

# Combine extras (zsh: keep quotes)
pip install "abstractcore[remote,media,tools]"

# Turnkey local-runtime installs
pip install "abstractcore[all-apple]"    # Apple Silicon: remote SDKs + HF/GGUF + MLX + features + server
pip install "abstractcore[all-gpu]"      # NVIDIA GPU: remote SDKs + HF/GGUF + vLLM + features + server

Local OpenAI-compatible servers (Ollama, LM Studio, vLLM, llama.cpp, LocalAI, etc.) work with the core install; just point AbstractCore at the server's base URL. See Prerequisites for provider setup.

Optional capability plugins (deterministic multimodal outputs):

pip install abstractvoice   # enables llm.voice / llm.audio (TTS/STT)
pip install abstractvision  # enables llm.vision (generative vision; typically via an OpenAI-compatible images endpoint)

See: Capabilities and Server.

Providers and models

AbstractCore uses a provider ID plus a model name:

from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o-mini")
# llm = create_llm("anthropic", model="claude-haiku-4-5")
# llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
# llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507")
# llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")

Tip: you can omit model=..., but it’s usually better to pass an explicit model to avoid surprises when defaults change.

Open-source-first: start with local providers (Ollama, LM Studio, MLX, HuggingFace), then add cloud or gateway providers as needed.

Gateway provider examples (OpenRouter, Portkey):

from abstractcore import create_llm

llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini")
llm_portkey = create_llm("portkey", model="gpt-5-mini", api_key="PORTKEY_API_KEY", config_id="pcfg_...")

Note: gateway providers only forward optional generation params (e.g. temperature, top_p, max_output_tokens) when you explicitly set them.
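In other words, parameters you leave unset are omitted from the request entirely rather than sent with library defaults, so the gateway's own defaults apply. A rough sketch of that behavior (build_gateway_payload is a hypothetical helper for illustration, not AbstractCore's implementation):

```python
def build_gateway_payload(model: str, prompt: str, **overrides):
    """Sketch of explicit-only parameter forwarding (illustrative)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Forward only the generation params the caller explicitly set;
    # anything left unset stays at the gateway/provider default.
    payload.update({k: v for k, v in overrides.items() if v is not None})
    return payload

# temperature is forwarded only because it was passed explicitly
payload = build_gateway_payload("openai/gpt-4o-mini", "Hi", temperature=0.2)
print("temperature" in payload)  # True
print("top_p" in payload)        # False
```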

Your first call

OpenAI example (requires pip install "abstractcore[openai]"):

from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What is the capital of France?")
print(resp.content)

Sessions (multi-turn)

Use a session to keep conversation state (system prompt + message history) across turns:

from abstractcore import BasicSession, create_llm

llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(provider=llm, system_prompt="You are a helpful assistant.")

print(session.generate("Hello!").content)
print(session.generate("Now continue.").content)
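Conceptually, a session just threads a growing message list (system prompt plus alternating user/assistant turns) through each call. A minimal stdlib sketch (MiniSession is illustrative only, not AbstractCore's BasicSession):

```python
class MiniSession:
    """Toy stand-in for a chat session: system prompt + accumulated turns."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def record_turn(self, user_text: str, assistant_text: str):
        # Each turn appends a user message and the assistant's reply,
        # so later calls see the full conversation history.
        self.messages.append({"role": "user", "content": user_text})
        self.messages.append({"role": "assistant", "content": assistant_text})

session = MiniSession("You are a helpful assistant.")
session.record_turn("Hello!", "Hi there!")
print(len(session.messages))  # 3: system + user + assistant
```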

For prompt-cache-aware long chats (reusing stable prefixes such as the system prompt, tools, and files), use CachedSession; see Prompt Caching.

Thinking / reasoning (best-effort)

Many modern models can optionally emit a reasoning/thinking trace (sometimes in a separate channel, sometimes inline). AbstractCore exposes a single unified control:

from abstractcore import create_llm

llm = create_llm("lmstudio", model="qwen3.5-27b@q4_k_m", base_url="http://localhost:1234/v1")

# Disable thinking (tries to suppress any reasoning trace)
resp = llm.generate("Compute 17*23 - 19*11. Reply with the integer only.", thinking="none")
print(resp.content)

# Enable thinking (levels are best-effort; not all backends support budgets)
resp = llm.generate("Solve a hard logic puzzle.", thinking="high")
print(resp.content)
print(resp.metadata.get("reasoning"))  # when the backend exposes it

Notes:

  • For Qwen3 / Qwen3.5 on LM Studio, AbstractCore uses LM Studio's model template variables (enable_thinking / enableThinking) and a Qwen template "hard switch" for thinking="none" (an empty <think></think> block), rather than injecting "Reasoning effort …" text into the system prompt.
  • For Qwen3 / Qwen3.5 GGUF via HuggingFaceProvider (llama-cpp-python), llama-cpp-python does not currently expose a template-kwargs knob, so thinking="none" also uses the Qwen hard-switch marker. If GGUF loading fails due to a huge advertised context window, AbstractCore retries with smaller n_ctx values (best-effort); you can also pass max_tokens=... when constructing HuggingFaceProvider() to control llama.cpp's n_ctx explicitly.
  • For Ollama, enabling thinking can consume many output tokens in the thinking channel; consider a larger max_output_tokens when thinking is enabled.

For server usage (OpenAI-compatible HTTP), see Server and Generation Parameters.

Streaming

from abstractcore import create_llm

llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
for chunk in llm.generate("Write a short poem about distributed systems.", stream=True):
    print(chunk.content or "", end="", flush=True)
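If you also want the full text after streaming, accumulate the deltas as they arrive instead of only printing them. A minimal sketch, with plain strings standing in for chunk.content values:

```python
def collect_stream(deltas):
    """Join streamed text deltas, skipping empty/None chunks (sketch)."""
    parts = []
    for delta in deltas:
        parts.append(delta or "")  # mirrors `chunk.content or ""` above
    return "".join(parts)

# Plain strings stand in for chunk.content values
print(collect_stream(["Dist", "ributed ", None, "systems"]))  # Distributed systems
```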

Tool calling

AbstractCore supports native tool calling (when the provider supports it) and prompted tool syntax (when it doesn’t).

By default, tool execution is pass-through (execute_tools=False): you get tool calls in resp.tool_calls, and your host/runtime decides how to execute them.

In the AbstractFramework ecosystem, AbstractRuntime is the recommended runtime for executing tool calls durably (policy, retries, persistence). See Architecture and Tool Calling.

from abstractcore import create_llm, tool

@tool
def get_weather(city: str) -> str:
    return f"{city}: 22°C and sunny"

llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What's the weather in Paris? Use the tool.", tools=[get_weather])

print(resp.content)
print(resp.tool_calls)
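Since execution is pass-through by default, the host decides how to run the requested calls. A minimal dispatch sketch, assuming each tool call can be read as a dict with name and arguments keys (the real tool-call objects in resp.tool_calls may expose these differently):

```python
def dispatch_tool_calls(tool_calls, registry):
    """Look up each requested tool by name and run it (illustrative sketch)."""
    results = []
    for call in tool_calls:
        fn = registry[call["name"]]          # apply your own policy on unknown tools
        results.append(fn(**call["arguments"]))
    return results

registry = {"get_weather": lambda city: f"{city}: 22°C and sunny"}
calls = [{"name": "get_weather", "arguments": {"city": "Paris"}}]
print(dispatch_tool_calls(calls, registry))  # ['Paris: 22°C and sunny']
```

A real host would add error handling, timeouts, and auditing around each call; that is the role AbstractRuntime fills in the AbstractFramework ecosystem.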

See Tool Calling and Tool Syntax Rewriting (tool_call_tags, server agent_format).

Note: if you pass both tools=[...] and response_model=... to generate(), AbstractCore uses a 2-pass hybrid flow (a tool-capable call followed by a structured-output call). Streaming is not supported in this hybrid mode.

Built-in tools (optional)

If you want a ready-made toolset for agentic scripts, install:

pip install "abstractcore[tools]"

Then import from abstractcore.tools.common_tools:

  • skim_websearch vs web_search: compact/filtered links vs full results
  • skim_url vs fetch_url: fast URL triage (small output) vs full fetch + parsing for text-first types (HTML/JSON/text)

See Tool Calling for a recommended workflow and the full built-in tool list.

Structured output

Pass a Pydantic model via response_model=... to get a typed result back (instead of parsing JSON yourself):

from pydantic import BaseModel
from abstractcore import create_llm

class Answer(BaseModel):
    title: str
    bullets: list[str]

llm = create_llm("openai", model="gpt-4o-mini")
answer = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
print(answer.bullets)
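For comparison, here is the manual parsing that response_model spares you, sketched with a stdlib dataclass standing in for the Pydantic model (parse_structured is illustrative, not an AbstractCore API):

```python
import json
from dataclasses import dataclass

@dataclass
class Answer:
    title: str
    bullets: list

def parse_structured(raw: str, model_cls):
    """Hand-rolled version of typed output (sketch):
    parse the model's JSON, then construct the typed object."""
    return model_cls(**json.loads(raw))

raw = '{"title": "HTTP/3", "bullets": ["Runs over QUIC", "Uses UDP", "Faster handshakes"]}'
ans = parse_structured(raw, Answer)
print(ans.bullets)
```

With response_model, AbstractCore handles this parsing (and validation against the schema) for you.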

See Structured Output for strategy details and limitations.

Media input (images/audio/video + documents)

Images and document extraction require pip install "abstractcore[media]" (Pillow + PDF/Office deps).

from abstractcore import create_llm

llm = create_llm("anthropic", model="claude-haiku-4-5")
resp = llm.generate("Describe the image.", media=["./image.png"])
print(resp.content)

Audio and video attachments are also supported, but they are policy-driven (no silent semantic changes):

  • audio: audio_policy (native_only|speech_to_text|auto|caption)
  • video: video_policy (native_only|frames_caption|auto)

Speech-to-text fallback (audio_policy="speech_to_text") typically requires installing abstractvoice (capability plugin).

What you need (quick checklist):

  • Images: abstractcore[media] + either a vision-capable model (VLM/VL) or a configured vision fallback (abstractcore --set-vision-provider PROVIDER MODEL).
  • Video: ffmpeg/ffprobe on PATH + either a vision-capable model or a configured vision fallback (for frame sampling). Native video input is model/provider dependent.
  • Audio: either an audio-capable model or speech-to-text fallback via abstractvoice + audio_policy="auto"/"speech_to_text".

Defaults can be configured via the config CLI (abstractcore --config, abstractcore --status). See Centralized Config.

If your main model is text-only, you can configure vision fallback (two-stage captioning) so images are automatically described and injected as short observations. See Media Handling, Vision Capabilities, and Centralized Config.

For long documents, AbstractCore can optionally apply Glyph visual-text compression. Install pip install "abstractcore[compression]" (and pip install "abstractcore[media]" for PDFs) and see Glyph Visual-Text Compression.

Async

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("openai", model="gpt-4o-mini")
    resp = await llm.agenerate("Give me 3 bullet points about HTTP caching.")
    print(resp.content)

asyncio.run(main())
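agenerate composes with standard asyncio patterns, so several prompts can run concurrently with asyncio.gather. A self-contained sketch, with fake_generate as a hypothetical stand-in for llm.agenerate(...):

```python
import asyncio

async def fake_generate(prompt: str) -> str:
    # Stand-in for `await llm.agenerate(prompt)`; swap in the real call in your app.
    await asyncio.sleep(0)
    return f"answer to: {prompt}"

async def main():
    # Fan out several prompts concurrently; results come back in order.
    answers = await asyncio.gather(
        fake_generate("What is HTTP/3?"),
        fake_generate("What is QUIC?"),
    )
    return answers

print(asyncio.run(main()))
```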

CLI (optional)

# Configure defaults and API keys
abstractcore --config
abstractcore --status

# Interactive chat
abstractcore-chat --provider openai --model gpt-4o-mini

Next steps

  • Prerequisites — provider setup (keys, base URLs, hardware notes)
  • FAQ — common questions and setup gotchas
  • Examples — end-to-end patterns and recipes
  • API (Python) — public API map and common patterns
  • API Reference — complete function/class listing
  • Troubleshooting — common errors and fixes
  • Server — OpenAI-compatible HTTP gateway
  • Endpoint — single-model OpenAI-compatible endpoint (one provider/model per worker)