HTTP Server Guide

Run AbstractCore as an OpenAI-compatible API gateway: one server, any provider, any model, any client.

Why use the server?

  • OpenAI-compatible — use standard clients and route via model="provider/model".
  • One endpoint — switch providers without changing your application code.
  • Tools + media — tools, images, PDFs, Office docs, and @filename attachments.

Table of Contents

  • Quick Start
  • Configuration
  • API Endpoints
  • Model Routing
  • Multimodal (Files)
  • Tools + agent_format
  • Agentic CLI Integration
  • Deployment Notes

Quick Start

Install and run the server:

pip install "abstractcore[server]"

# Start server
python -m abstractcore.server.app

# Or with uvicorn directly
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

# Health check
curl http://localhost:8000/health

First request (cURL)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

First request (Python)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Explain quantum computing in 3 bullets."}],
)
print(resp.choices[0].message.content)

Configuration

Environment variables

# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."

# Local providers
export OLLAMA_BASE_URL="http://localhost:11434"          # (or legacy: OLLAMA_HOST)
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export VLLM_BASE_URL="http://localhost:8000/v1"

# Defaults (optional)
export ABSTRACTCORE_DEFAULT_PROVIDER=openai
export ABSTRACTCORE_DEFAULT_MODEL=gpt-4o-mini

# Debug mode
export ABSTRACTCORE_DEBUG=true

# Multi-tenant hazard: opt in to unload_after for providers that can unload shared server state
export ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1

Startup options

python -m abstractcore.server.app --help
python -m abstractcore.server.app --debug
python -m abstractcore.server.app --host 127.0.0.1 --port 8080

# uvicorn examples
uvicorn abstractcore.server.app:app --reload
uvicorn abstractcore.server.app:app --workers 4

API Endpoints

The server is OpenAI-compatible by default, with a few useful AbstractCore extensions.

  • Chat: POST /v1/chat/completions
  • Responses API: POST /v1/responses
  • Embeddings: POST /v1/embeddings
  • Images (generate): POST /v1/images/generations
  • Images (edit): POST /v1/images/edits
  • Audio (STT): POST /v1/audio/transcriptions
  • Audio (TTS): POST /v1/audio/speech
  • Model discovery: GET /v1/models
  • Provider status: GET /providers
  • Health: GET /health

Model discovery

curl http://localhost:8000/v1/models
curl "http://localhost:8000/v1/models?provider=ollama"
curl "http://localhost:8000/v1/models?type=text-embedding"
curl "http://localhost:8000/v1/models?provider=ollama&type=text-embedding"
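The response follows the standard OpenAI model-list shape ({"object": "list", "data": [...]}), with model ids in the routed provider/model form. As a sketch of client-side filtering, using an illustrative sample payload rather than real server output:

```python
# Illustrative /v1/models response; real output depends on configured providers.
sample = {
    "object": "list",
    "data": [
        {"id": "openai/gpt-4o-mini", "object": "model"},
        {"id": "ollama/qwen3:4b-instruct", "object": "model"},
        {"id": "ollama/nomic-embed-text", "object": "model"},
    ],
}

# Keep only models routed through the ollama provider.
ollama_ids = [m["id"] for m in sample["data"] if m["id"].startswith("ollama/")]
print(ollama_ids)  # ['ollama/qwen3:4b-instruct', 'ollama/nomic-embed-text']
```

In practice you would fetch the payload from GET /v1/models (or use the ?provider= query parameter and let the server filter for you).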

Media generation endpoints (optional)

AbstractCore Server can optionally expose OpenAI-compatible image generation and audio endpoints. These endpoints are interoperability-first; if the required plugin/backend is not available, the server returns HTTP 501 with an actionable error message.

Images (generate/edit) — requires abstractvision

  • POST /v1/images/generations
  • POST /v1/images/edits

pip install "abstractcore[server]" abstractvision
python -m abstractcore.server.app

abstractvision typically needs an OpenAI-compatible images backend. Set ABSTRACTVISION_BASE_URL, ABSTRACTVISION_API_KEY, and ABSTRACTVISION_MODEL_ID (or the equivalent config settings) to point at that backend.

Audio (STT/TTS) — requires abstractvoice

These endpoints do not require an LLM provider configuration.

  • /v1/audio/transcriptions requires pip install abstractvoice (multipart parsing via python-multipart is included in abstractcore[server]).
  • /v1/audio/speech requires pip install abstractvoice on the server.

# Speech-to-text (multipart)
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@./call.wav \
  -F language=en

# Text-to-speech (JSON)
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from AbstractCore","format":"wav"}' \
  --output out.wav

See Audio & Voice (STT/TTS) for library-side usage and configuration defaults.

Model Routing

Route to any provider/model using model="provider/model-name", for example:

  • openai/gpt-4o-mini
  • anthropic/claude-haiku-4-5
  • ollama/qwen3:4b-instruct
  • lmstudio/qwen/qwen3-4b-2507
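As a sketch of how these routing strings decompose (this helper is illustrative, not the server's actual parser): only the first "/" separates the provider, so the model name itself may contain slashes, as in the LM Studio example above.

```python
def parse_model_route(model: str) -> tuple[str, str]:
    """Split an AbstractCore routing string into (provider, model name)."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name

print(parse_model_route("openai/gpt-4o-mini"))           # ('openai', 'gpt-4o-mini')
print(parse_model_route("lmstudio/qwen/qwen3-4b-2507"))  # ('lmstudio', 'qwen/qwen3-4b-2507')
```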

Per-request extensions

AbstractCore also supports a few optional request fields:

  • api_key: per-request provider key (falls back to env vars if omitted)
  • base_url: override provider endpoint (include /v1 for OpenAI-compatible servers)
  • unload_after: unload model after request (restricted for some providers unless explicitly enabled)
  • thinking: unified thinking/reasoning control (best-effort)
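A minimal sketch of a request body using these extension fields, assuming they sit alongside the standard OpenAI fields at the top level of the JSON payload (with the OpenAI Python client, non-standard fields like these can be passed via extra_body):

```python
import json

# Illustrative payload; the base_url value here is a made-up host.
payload = {
    "model": "ollama/qwen3:4b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "base_url": "http://gpu-box:11434",  # override the provider endpoint
    "unload_after": True,                # unload the model after this request
    "thinking": False,                   # best-effort reasoning control
}
print(json.dumps(payload, indent=2))
```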

Multimodal Requests (Images, Documents, Audio/Video)

There are two common ways to attach files. Images and documents are handled automatically; audio/video inputs are policy-driven (audio_policy, video_policy) to avoid silent semantic changes.

1) @filename syntax (AbstractCore extension)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Summarize this report. @/path/to/report.pdf"}
    ]
  }'

2) OpenAI vision-style messages

For full examples (image URLs, base64, and policy-driven audio/video fallbacks), see Media Handling, Vision Capabilities, and Audio & Voice.
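As a minimal sketch of the message shape, using the standard OpenAI content-part format (the PNG bytes below are a placeholder; in practice you would read a real file, or pass a public URL instead of a data URL):

```python
import base64

image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder; use open("photo.png", "rb").read()
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

# A vision-style user message: one text part plus one image part.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
```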

Tools + agent_format

Tool calls are supported on /v1/chat/completions and /v1/responses.

  • Use tools as you would with OpenAI-compatible clients.
  • Set agent_format to request a specific tool-call syntax (useful for agentic CLIs/parsers).
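A sketch of a chat request combining a standard OpenAI-style function tool with the agent_format extension (the "llama3" value here is illustrative; the accepted values are listed in Tool Syntax Rewriting):

```python
# Standard OpenAI function-tool schema.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body: standard fields plus the agent_format extension.
request_body = {
    "model": "ollama/qwen3:4b-instruct",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [get_weather],
    "agent_format": "llama3",  # request LLaMA3-style tool-call markup
}
```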

See Tool Syntax Rewriting for details and supported formats.

Agentic CLI Integration

Point agentic CLIs at the server by setting OpenAI-compatible environment variables.

Codex CLI

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
export ABSTRACTCORE_API_KEY="unused"

codex --model "ollama/qwen3-coder:30b" "Write a factorial function"

Tool-call format defaults

# Example: prefer LLaMA3-style tool markup by default
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=llama3
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false

Deployment Notes

  • Use uvicorn ... --workers N for multiple workers in production.
  • Use /health for monitoring and readiness checks.
  • Be cautious with unload_after in multi-tenant environments.

Related Documentation

  • Tool Calling — universal tools across providers
  • Tool Syntax Rewriting — agent_format and tool_call_tags
  • Media Handling — files, documents, and vision fallback
  • Embeddings — semantic search + RAG building blocks