HTTP Server Guide
Transform AbstractCore into an OpenAI-compatible API server. One server, all models, any client.
🚀 Quick Start (2 minutes)
Install and Run
# Install
pip install abstractcore[server]
# Start server
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Test
curl http://localhost:8000/health
# Response: {"status":"healthy"}
First Request (cURL)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
First Request (Python)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="unused"
)
response = client.chat.completions.create(
model="anthropic/claude-3-5-haiku-latest",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
⚙️ Configuration
Environment Variables
# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Local providers
export OLLAMA_HOST="http://localhost:11434"
export LMSTUDIO_HOST="http://localhost:1234"
# Default settings
export ABSTRACTCORE_DEFAULT_PROVIDER=openai
export ABSTRACTCORE_DEFAULT_MODEL=gpt-4o-mini
# Debug mode
export ABSTRACTCORE_DEBUG=true
Startup Options
# Development with auto-reload
uvicorn abstractcore.server.app:app --reload
# Production with multiple workers
uvicorn abstractcore.server.app:app --workers 4
# Custom port
uvicorn abstractcore.server.app:app --port 3000
🌐 API Endpoints
POST /v1/chat/completions
Standard OpenAI-compatible endpoint. Works with all providers.
{
"model": "provider/model-name",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
Key Parameters:
- model (required): Format "provider/model-name" (e.g., "openai/gpt-4o-mini")
- messages (required): Array of message objects
- stream (optional): Enable streaming responses
- tools (optional): Tools for function calling (see the example below)
- temperature, max_tokens, top_p: Standard LLM parameters
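Tools Example
Tool definitions use the standard OpenAI function-calling schema. A minimal sketch via the OpenAI client; the get_weather tool below is purely illustrative, not something AbstractCore ships:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Illustrative tool definition in the standard OpenAI function-calling schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    temperature=0.7,
    max_tokens=500
)
# The model may answer directly or request a tool call
message = response.choices[0].message
print(message.tool_calls or message.content)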
Streaming Example
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
stream = client.chat.completions.create(
model="ollama/qwen3-coder:30b",
messages=[{"role": "user", "content": "Write a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
POST /v1/responses (NEW)
100% OpenAI-compatible Responses API with native file support
{
"model": "gpt-4o",
"input": [
{
"role": "user",
"content": [
{"type": "input_text", "text": "Analyze this document"},
{"type": "input_file", "file_url": "https://example.com/report.pdf"}
]
}
],
"stream": false
}
Key Features:
- Native File Support: input_file type designed for document attachments
- Cleaner API: Explicit separation between text and files
- Optional Streaming: Streaming is opt-in with "stream": true (see the sketch below)
- Backward Compatible: Legacy messages format still works
- All Providers: Works with OpenAI, Anthropic, Ollama, LMStudio, etc.
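Streaming Example
A minimal sketch of opting into streaming, assuming that with "stream": true the endpoint emits server-sent events over HTTP like the chat completions endpoint; the exact event payloads may vary, so the example only prints the raw lines:
import requests
# Stream a /v1/responses request and print the raw server-sent event lines
with requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "ollama/qwen3-coder:30b",
        "input": [{
            "role": "user",
            "content": [{"type": "input_text", "text": "Write a short haiku"}]
        }],
        "stream": True
    },
    stream=True
) as response:
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)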
Supported Media Types
import requests
# PDF from URL
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "anthropic/claude-3.5-sonnet",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Summarize this report"},
{"type": "input_file", "file_url": "https://example.com/report.pdf"}
]
}]
}
)
# Excel from local path
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "openai/gpt-4o",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Analyze this data"},
{"type": "input_file", "file_url": "/path/to/spreadsheet.xlsx"}
]
}]
}
)
# Base64-encoded CSV
import base64
with open("data.csv", "rb") as f:
csv_data = base64.b64encode(f.read()).decode()
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "lmstudio/qwen/qwen3-next-80b",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "What trends do you see?"},
{"type": "input_file", "file_url": f"data:text/csv;base64,{csv_data}"}
]
}]
}
)
Multiple Files
# Analyze multiple files together
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "openai/gpt-4o",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Compare these documents"},
{"type": "input_file", "file_url": "https://example.com/report1.pdf"},
{"type": "input_file", "file_url": "https://example.com/report2.pdf"},
{"type": "input_file", "file_url": "https://example.com/chart.png"}
]
}],
"max_tokens": 4000
}
)
File Types Supported: Images (PNG, JPEG, GIF, WEBP, BMP, TIFF), Documents (PDF, DOCX, XLSX, PPTX), Data (CSV, TSV, TXT, MD, JSON)
Learn more: Media Handling System Guide
Media Handling with @filename Syntax
AbstractCore server supports a simple @filename syntax in standard chat completions:
import openai
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Simple @filename syntax
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Analyze @report.pdf and @chart.png"}]
)
# Works with any provider
response = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet",
messages=[{"role": "user", "content": "Summarize @document.docx"}]
)
# Multiple files
response = client.chat.completions.create(
model="ollama/qwen3-coder:30b",
messages=[{"role": "user", "content": "Compare @file1.pdf, @file2.pdf, and @data.csv"}]
)
Universal Support: Same syntax works across all providers with automatic media processing and provider-specific formatting.
POST /v1/embeddings
Generate embedding vectors for semantic search, RAG, and similarity analysis.
{
"input": "Text to embed",
"model": "huggingface/sentence-transformers/all-MiniLM-L6-v2"
}
Supported Providers:
- HuggingFace: Local models with ONNX acceleration
- Ollama: ollama/granite-embedding:278m, etc.
- LMStudio: Any loaded embedding model
Batch Embedding
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": ["text 1", "text 2", "text 3"],
"model": "ollama/granite-embedding:278m"
}'
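Similarity Example
Assuming the response follows the OpenAI embeddings schema (a data array with one embedding per input), a short sketch that embeds two sentences in one batch and compares them with cosine similarity:
import math
import requests
# Embed two sentences in a single batch request
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "input": ["The cat sat on the mat", "A feline rested on the rug"],
        "model": "ollama/granite-embedding:278m"
    }
)
vectors = [item["embedding"] for item in resp.json()["data"]]
# Cosine similarity between the two vectors
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
print(f"Similarity: {cosine(vectors[0], vectors[1]):.3f}")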
GET /v1/models
List all available models from configured providers.
# All models
curl http://localhost:8000/v1/models
# Ollama models only
curl http://localhost:8000/v1/models?provider=ollama
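The endpoint is also reachable through the OpenAI client. A small sketch that filters the returned IDs by their provider prefix, assuming IDs use the provider/model-name format shown above:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# List every model the server exposes, then keep only the Ollama ones
models = client.models.list()
ollama_models = [m.id for m in models.data if m.id.startswith("ollama/")]
print(ollama_models)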
GET /providers
Complete provider metadata with model counts and capabilities.
curl http://localhost:8000/providers
# Returns: Complete provider metadata including:
# - Model counts (137+ models on our test instance; yours will vary depending on the providers and models you install)
# - Supported features (tools, streaming, etc.)
# - Authentication requirements
# - Local vs cloud provider status
GET /providers/{provider}/models
Models for a specific provider, with detailed information.
curl http://localhost:8000/providers/ollama/models
GET /health
Server health check for monitoring.
curl http://localhost:8000/health
# Response: {"status":"healthy"}
🤖 Agentic CLI Integration
Use AbstractCore server with agentic CLI tools like Codex, Crush, and Gemini CLI.
Codex CLI (Qwen3 Format)
# Setup with tool syntax rewriting
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
export ABSTRACTCORE_TOOL_FORMAT="qwen3" # <|tool_call|>...JSON...</|tool_call|>
# Use with any model
codex --model "ollama/qwen3-coder:30b" "Write a factorial function"
Crush CLI (LLaMA3 format)
# Configure server
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=llama3
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Configure CLI
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
# Use
crush --model "anthropic/claude-3-5-haiku-latest" "Explain this code"
Gemini CLI (XML format)
# Configure server
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=xml
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Configure CLI
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
# Use
gemini-cli --model "ollama/qwen3-coder:30b" "Review this project"
🔧 Tool Call Format Configuration
# Set format for your CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=qwen3 # Codex CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=llama3 # Crush CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=xml # Gemini CLI
# Control tool execution
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=true # Server executes
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false # Return to client
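With ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false, tool calls are returned to the client. A minimal sketch of the client-side loop, assuming the server returns them in the standard OpenAI chat-completions format (the list_files tool is purely illustrative):
import json
import os
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Illustrative tool: list files in a directory
tools = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }
}]
messages = [{"role": "user", "content": "What files are in the current directory?"}]
response = client.chat.completions.create(
    model="openai/gpt-4o-mini", messages=messages, tools=tools
)
message = response.choices[0].message
if message.tool_calls:
    # Execute each requested tool locally, then send the results back
    messages.append(message)
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = os.listdir(args.get("path", "."))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result)
        })
    final = client.chat.completions.create(
        model="openai/gpt-4o-mini", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)
else:
    print(message.content)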
🚀 Deployment
Docker
FROM python:3.9-slim
RUN pip install abstractcore[server]
ENV ABSTRACTCORE_DEFAULT_PROVIDER=openai
ENV ABSTRACTCORE_DEFAULT_MODEL=gpt-4o-mini
EXPOSE 8000
CMD ["uvicorn", "abstractcore.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
docker build -t abstractcore-server .
docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY abstractcore-server
Production with Gunicorn
pip install gunicorn
gunicorn abstractcore.server.app:app \
--worker-class uvicorn.workers.UvicornWorker \
--workers 4 \
--bind 0.0.0.0:8000
Docker Compose
version: '3.8'
services:
abstractcore:
image: abstractcore-server:latest
ports:
- "8000:8000"
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- OPENAI_API_KEY=${OPENAI_API_KEY}
restart: unless-stopped
🔍 Debug and Monitoring
Enable Debug Logging
export ABSTRACTCORE_DEBUG=true
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
Log Files:
- logs/abstractcore_TIMESTAMP.log - Structured events
- logs/YYYYMMDD-payloads.jsonl - Full request bodies
- logs/verbatim_TIMESTAMP.jsonl - Complete I/O
Useful Commands
# Find errors
grep '"level": "error"' logs/abstractcore_*.log
# Track token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
awk '{sum+=$1} END {print "Total:", sum}'
# Monitor specific model
grep '"model": "qwen3-coder:30b"' logs/verbatim_*.jsonl
📊 Interactive Documentation
Visit while the server is running:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
💡 Common Patterns
Multi-Provider Fallback
import requests
providers = [
"ollama/qwen3-coder:30b",
"openai/gpt-4o-mini",
"anthropic/claude-3-5-haiku-latest"
]
def generate_with_fallback(prompt):
for model in providers:
try:
response = requests.post(
"http://localhost:8000/v1/chat/completions",
json={"model": model, "messages": [{"role": "user", "content": prompt}]},
timeout=30
)
if response.status_code == 200:
return response.json()
except Exception:
continue
raise Exception("All providers failed")
Local Model Gateway
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b
# Use via AbstractCore server
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3-coder:30b",
"messages": [{"role": "user", "content": "Write a Python function"}]
}'
🌟 Why AbstractCore Server?
- ✅ Universal: One API for all providers
- ✅ OpenAI Compatible: Drop-in replacement
- ✅ Simple: Clean, focused endpoints
- ✅ Fast: Lightweight, high-performance
- ✅ Debuggable: Comprehensive logging
- ✅ CLI Ready: Codex, Gemini CLI, Crush support