HTTP Server Guide
Transform AbstractCore into an OpenAI-compatible API server. One server, all models, any client.
🚀 Quick Start (2 minutes)
Install and Run
# Install
pip install abstractcore[server]
# Start server
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Test
curl http://localhost:8000/health
# Response: {"status":"healthy"}
First Request (cURL)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
First Request (Python)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="unused"
)
response = client.chat.completions.create(
model="anthropic/claude-3-5-haiku-latest",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
⚙️ Configuration
Environment Variables
# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Local providers
export OLLAMA_HOST="http://localhost:11434"
export LMSTUDIO_HOST="http://localhost:1234"
# Default settings
export ABSTRACTCORE_DEFAULT_PROVIDER=openai
export ABSTRACTCORE_DEFAULT_MODEL=gpt-4o-mini
# Debug mode
export ABSTRACTCORE_DEBUG=true
Startup Options
# Development with auto-reload
uvicorn abstractcore.server.app:app --reload
# Production with multiple workers
uvicorn abstractcore.server.app:app --workers 4
# Custom port
uvicorn abstractcore.server.app:app --port 3000
🌐 API Endpoints
POST /v1/chat/completions
Standard OpenAI-compatible endpoint. Works with all providers.
{
"model": "provider/model-name",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
Key Parameters:
- model (required): Format "provider/model-name" (e.g., "openai/gpt-4o-mini")
- messages (required): Array of message objects
- stream (optional): Enable streaming responses
- tools (optional): Tools for function calling (see the example below)
- temperature, max_tokens, top_p: Standard LLM parameters
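Tools Example
Tool definitions use the standard OpenAI function-calling schema. A minimal sketch via the OpenAI client; the get_weather tool below is purely illustrative, not something AbstractCore ships:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Illustrative tool definition in the standard OpenAI function-calling schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    temperature=0.7,
    max_tokens=500
)
# The model may answer directly or request a tool call
message = response.choices[0].message
print(message.tool_calls or message.content)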
Streaming Example
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
stream = client.chat.completions.create(
model="ollama/qwen3-coder:30b",
messages=[{"role": "user", "content": "Write a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
POST /v1/responses (NEW)
100% OpenAI-compatible Responses API with native file support
{
"model": "gpt-4o",
"input": [
{
"role": "user",
"content": [
{"type": "input_text", "text": "Analyze this document"},
{"type": "input_file", "file_url": "https://example.com/report.pdf"}
]
}
],
"stream": false
}
Key Features:
- Native File Support: input_file type designed for document attachments
- Cleaner API: Explicit separation between text and files
- Optional Streaming: Streaming is opt-in with "stream": true (see the sketch below)
- Backward Compatible: Legacy messages format still works
- All Providers: Works with OpenAI, Anthropic, Ollama, LMStudio, etc.
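Streaming Example
A minimal sketch of opting into streaming, assuming that with "stream": true the endpoint emits server-sent events over HTTP like the chat completions endpoint; the exact event payloads may vary, so the example only prints the raw lines:
import requests
# Stream a /v1/responses request and print the raw server-sent event lines
with requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "ollama/qwen3-coder:30b",
        "input": [{
            "role": "user",
            "content": [{"type": "input_text", "text": "Write a short haiku"}]
        }],
        "stream": True
    },
    stream=True
) as response:
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)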
Supported Media Types
import requests
# PDF from URL
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "anthropic/claude-3.5-sonnet",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Summarize this report"},
{"type": "input_file", "file_url": "https://example.com/report.pdf"}
]
}]
}
)
# Excel from local path
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "openai/gpt-4o",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Analyze this data"},
{"type": "input_file", "file_url": "/path/to/spreadsheet.xlsx"}
]
}]
}
)
# Base64-encoded CSV
import base64
with open("data.csv", "rb") as f:
csv_data = base64.b64encode(f.read()).decode()
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "lmstudio/qwen/qwen3-next-80b",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "What trends do you see?"},
{"type": "input_file", "file_url": f"data:text/csv;base64,{csv_data}"}
]
}]
}
)
Multiple Files
# Analyze multiple files together
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "openai/gpt-4o",
"input": [{
"role": "user",
"content": [
{"type": "input_text", "text": "Compare these documents"},
{"type": "input_file", "file_url": "https://example.com/report1.pdf"},
{"type": "input_file", "file_url": "https://example.com/report2.pdf"},
{"type": "input_file", "file_url": "https://example.com/chart.png"}
]
}],
"max_tokens": 4000
}
)
File Types Supported: Images (PNG, JPEG, GIF, WEBP, BMP, TIFF), Documents (PDF, DOCX, XLSX, PPTX), Data (CSV, TSV, TXT, MD, JSON)
Learn more: Media Handling System Guide
Media Handling with @filename Syntax
AbstractCore server supports a simple @filename syntax in standard chat completions:
import openai
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Simple @filename syntax
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Analyze @report.pdf and @chart.png"}]
)
# Works with any provider
response = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet",
messages=[{"role": "user", "content": "Summarize @document.docx"}]
)
# Multiple files
response = client.chat.completions.create(
model="ollama/qwen3-coder:30b",
messages=[{"role": "user", "content": "Compare @file1.pdf, @file2.pdf, and @data.csv"}]
)
Universal Support: Same syntax works across all providers with automatic media processing and provider-specific formatting.
POST /v1/embeddings
Generate embedding vectors for semantic search, RAG, and similarity analysis.
{
"input": "Text to embed",
"model": "huggingface/sentence-transformers/all-MiniLM-L6-v2"
}
Supported Providers:
- HuggingFace: Local models with ONNX acceleration
- Ollama: ollama/granite-embedding:278m, etc.
- LMStudio: Any loaded embedding model
Batch Embedding
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": ["text 1", "text 2", "text 3"],
"model": "ollama/granite-embedding:278m"
}'
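Similarity Example
Assuming the response follows the OpenAI embeddings schema (a data array with one embedding per input), a short sketch that embeds two sentences in one batch and compares them with cosine similarity:
import math
import requests
# Embed two sentences in a single batch request
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "input": ["The cat sat on the mat", "A feline rested on the rug"],
        "model": "ollama/granite-embedding:278m"
    }
)
vectors = [item["embedding"] for item in resp.json()["data"]]
# Cosine similarity between the two vectors
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
print(f"Similarity: {cosine(vectors[0], vectors[1]):.3f}")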
GET /v1/models
List all available models from configured providers.
# All models
curl http://localhost:8000/v1/models
# Ollama models only
curl http://localhost:8000/v1/models?provider=ollama
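The endpoint is also reachable through the OpenAI client. A small sketch that filters the returned IDs by their provider prefix, assuming IDs use the provider/model-name format shown above:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# List every model the server exposes, then keep only the Ollama ones
models = client.models.list()
ollama_models = [m.id for m in models.data if m.id.startswith("ollama/")]
print(ollama_models)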
GET /providers
Complete provider metadata with model counts and capabilities.
curl http://localhost:8000/providers
# Returns: Complete provider metadata including:
# - Model counts (137+ models on our test instance; yours will vary depending on the providers and models you install)
# - Supported features (tools, streaming, etc.)
# - Authentication requirements
# - Local vs cloud provider status
GET /providers/{provider}/models
Models for a specific provider, with detailed information.
curl http://localhost:8000/providers/ollama/models
GET /health
Server health check for monitoring.
curl http://localhost:8000/health
# Response: {"status":"healthy"}
🤖 Agentic CLI Integration
Use AbstractCore server with agentic CLI tools like Codex, Crush, and Gemini CLI.
Codex CLI (Qwen3 Format)
# Setup with tool syntax rewriting
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
export ABSTRACTCORE_TOOL_FORMAT="qwen3" # <|tool_call|>...JSON...</|tool_call|>
# Use with any model
codex --model "ollama/qwen3-coder:30b" "Write a factorial function"
Crush CLI (LLaMA3 format)
# Configure server
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=llama3
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Configure CLI
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
# Use
crush --model "anthropic/claude-3-5-haiku-latest" "Explain this code"
Gemini CLI (XML format)
# Configure server
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=xml
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
# Configure CLI
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"
# Use
gemini-cli --model "ollama/qwen3-coder:30b" "Review this project"
🔧 Tool Call Format Configuration
# Set format for your CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=qwen3 # Codex CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=llama3 # Crush CLI
export ABSTRACTCORE_DEFAULT_TOOL_CALL_TAGS=xml # Gemini CLI
# Control tool execution
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=true # Server executes
export ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false # Return to client
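With ABSTRACTCORE_DEFAULT_EXECUTE_TOOLS=false, tool calls are returned to the client. A minimal sketch of the client-side loop, assuming the server returns them in the standard OpenAI chat-completions format (the list_files tool is purely illustrative):
import json
import os
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# Illustrative tool: list files in a directory
tools = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }
}]
messages = [{"role": "user", "content": "What files are in the current directory?"}]
response = client.chat.completions.create(
    model="openai/gpt-4o-mini", messages=messages, tools=tools
)
message = response.choices[0].message
if message.tool_calls:
    # Execute each requested tool locally, then send the results back
    messages.append(message)
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = os.listdir(args.get("path", "."))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result)
        })
    final = client.chat.completions.create(
        model="openai/gpt-4o-mini", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)
else:
    print(message.content)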
🚀 Deployment
Docker
FROM python:3.9-slim
RUN pip install abstractcore[server]
ENV ABSTRACTCORE_DEFAULT_PROVIDER=openai
ENV ABSTRACTCORE_DEFAULT_MODEL=gpt-4o-mini
EXPOSE 8000
CMD ["uvicorn", "abstractcore.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
docker build -t abstractcore-server .
docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY abstractcore-server
Production with Gunicorn
pip install gunicorn
gunicorn abstractcore.server.app:app \
--worker-class uvicorn.workers.UvicornWorker \
--workers 4 \
--bind 0.0.0.0:8000
Docker Compose
version: '3.8'
services:
abstractcore:
image: abstractcore-server:latest
ports:
- "8000:8000"
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- OPENAI_API_KEY=${OPENAI_API_KEY}
restart: unless-stopped
🔍 Debug and Monitoring
Enable Debug Logging
export ABSTRACTCORE_DEBUG=true
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000
Log Files:
- logs/abstractcore_TIMESTAMP.log - Structured events
- logs/YYYYMMDD-payloads.jsonl - Full request bodies
- logs/verbatim_TIMESTAMP.jsonl - Complete I/O
Useful Commands
# Find errors
grep '"level": "error"' logs/abstractcore_*.log
# Track token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
awk '{sum+=$1} END {print "Total:", sum}'
# Monitor specific model
grep '"model": "qwen3-coder:30b"' logs/verbatim_*.jsonl
📊 Interactive Documentation
Visit while the server is running:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
💡 Common Patterns
Multi-Provider Fallback
import requests
providers = [
"ollama/qwen3-coder:30b",
"openai/gpt-4o-mini",
"anthropic/claude-3-5-haiku-latest"
]
def generate_with_fallback(prompt):
for model in providers:
try:
response = requests.post(
"http://localhost:8000/v1/chat/completions",
json={"model": model, "messages": [{"role": "user", "content": prompt}]},
timeout=30
)
if response.status_code == 200:
return response.json()
except Exception:
continue
raise Exception("All providers failed")
Local Model Gateway
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b
# Use via AbstractCore server
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen3-coder:30b",
"messages": [{"role": "user", "content": "Write a Python function"}]
}'
🌟 Why AbstractCore Server?
- ✅ Universal: One API for all providers
- ✅ OpenAI Compatible: Drop-in replacement
- ✅ Simple: Clean, focused endpoints
- ✅ Fast: Lightweight, high-performance
- ✅ Debuggable: Comprehensive logging
- ✅ CLI Ready: Codex, Gemini CLI, Crush support