Prerequisites & Setup Guide
This guide walks you through setting up AbstractCore with different LLM providers. Choose the provider(s) that are suitable for your needs — you can use multiple providers in the same application.
Table of Contents
- Quick Decision Guide
- Core Installation
- Cloud Provider Setup
- Local Provider Setup
- Troubleshooting
- Testing Your Setup
- Security Notes

Quick Decision Guide
Want to get started immediately? → OpenAI Setup (requires API key)
Want free local models? → Ollama Setup (free, runs on your machine)
Have Apple Silicon Mac? → MLX Setup (optimized for M1/M2/M3/M4 chips)
Have NVIDIA GPU? → vLLM Setup (production GPU inference; NVIDIA CUDA only)
Want a GUI for local models? → LMStudio Setup (easiest local setup)
Want a gateway/proxy? → Gateway Provider Setup (OpenRouter/Portkey routing + governance)
Using a custom OpenAI-compatible /v1 endpoint? → OpenAI-Compatible Setup
Core Installation
Install AbstractCore, then add the extras you need:
# Core (small default)
pip install abstractcore
# Providers (only if you use them)
pip install "abstractcore[openai]" # OpenAI SDK
pip install "abstractcore[anthropic]" # Anthropic SDK
pip install "abstractcore[huggingface]" # Transformers / torch (heavy)
pip install "abstractcore[mlx]" # Apple Silicon only (heavy)
pip install "abstractcore[vllm]" # NVIDIA CUDA/ROCm only (heavy)
# Optional features
pip install "abstractcore[tools]" # built-in web tools (web_search, skim_websearch, skim_url, fetch_url)
pip install "abstractcore[media]" # images, PDFs, Office docs
pip install "abstractcore[embeddings]" # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]" # precise token counting (tiktoken)
pip install "abstractcore[server]" # OpenAI-compatible HTTP gateway
pip install "abstractcore[compression]" # Glyph visual-text compression (Pillow renderer)
# Turnkey "everything" installs (pick one)
pip install "abstractcore[all-apple]" # macOS/Apple Silicon (includes MLX, excludes vLLM)
pip install "abstractcore[all-non-mlx]" # Linux/Windows/Intel Mac (excludes MLX and vLLM)
pip install "abstractcore[all-gpu]" # Linux NVIDIA GPU (includes vLLM, excludes MLX)
Hardware Notes:
- [mlx] - Only works on Apple Silicon (M1/M2/M3/M4)
- [vllm] - Only works with NVIDIA CUDA GPUs
- [all-apple] - Best for Apple Silicon (includes MLX, excludes vLLM)
- [all-non-mlx] - Best for Linux/Windows/Intel Mac (excludes MLX and vLLM)
- [all-gpu] - Best for Linux NVIDIA GPU (includes vLLM, excludes MLX)
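To confirm the base install works before adding any provider extras, a quick import check from Python:
# Verify the core install (no provider extras required)
from abstractcore import create_llm
print("AbstractCore import OK:", callable(create_llm))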
Cloud Provider Setup
OpenAI Setup
Best for: Production applications and OpenAI’s hosted models
1. Get API Key
- Go to OpenAI API Dashboard
- Create account or sign in
- Click "Create new secret key"
- Copy the key (starts with sk-)
2. Set Environment Variable
# Option 1: Export in terminal (temporary)
export OPENAI_API_KEY="sk-your-actual-api-key-here"
# Option 2: Add to ~/.bashrc or ~/.zshrc (permanent)
echo 'export OPENAI_API_KEY="sk-your-actual-api-key-here"' >> ~/.bashrc
source ~/.bashrc
# Option 3: Create .env file in your project
echo 'OPENAI_API_KEY=sk-your-actual-api-key-here' > .env
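Note that a .env file is not necessarily loaded automatically; a common pattern is to load it yourself with python-dotenv (a separate package, not an AbstractCore requirement; shown purely as a sketch):
# pip install python-dotenv
from dotenv import load_dotenv
from abstractcore import create_llm
load_dotenv()  # copies OPENAI_API_KEY from .env into the process environment
llm = create_llm("openai", model="gpt-4o-mini")
print(llm.generate("Say hello in French").content)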
3. Test Setup
from abstractcore import create_llm
# Test with an example model (use any model available on your account)
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("Say hello in French")
print(response.content) # Should output: "Bonjour!"
Model names: Use any model supported by your account (examples: gpt-4o-mini, gpt-4o).
Anthropic Setup
Best for: Claude models via Anthropic’s API
1. Get API Key
- Go to Anthropic Console
- Create account or sign in
- Go to "API Keys" section
- Click "Create Key"
- Copy the key (starts with sk-ant-)
2. Set Environment Variable
# Option 1: Export in terminal (temporary)
export ANTHROPIC_API_KEY="sk-ant-your-actual-api-key-here"
# Option 2: Add to shell profile (permanent)
echo 'export ANTHROPIC_API_KEY="sk-ant-your-actual-api-key-here"' >> ~/.bashrc
source ~/.bashrc
# Option 3: Create .env file
echo 'ANTHROPIC_API_KEY=sk-ant-your-actual-api-key-here' > .env
3. Test Setup
from abstractcore import create_llm
# Test with an example model (use any model available on your account)
llm = create_llm("anthropic", model="claude-haiku-4-5")
response = llm.generate("Explain Python in one sentence")
print(response.content)
Model names: Use any model supported by your account (examples: claude-haiku-4-5, claude-sonnet-4-5).
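With both keys set, you can mix providers in the same application (as noted at the top of this guide); the same generate() call works for either backend:
from abstractcore import create_llm
# Two cloud providers side by side in one script
openai_llm = create_llm("openai", model="gpt-4o-mini")
claude_llm = create_llm("anthropic", model="claude-haiku-4-5")
prompt = "Summarize what an LLM gateway does in one sentence."
for name, llm in [("openai", openai_llm), ("anthropic", claude_llm)]:
    print(name, "->", llm.generate(prompt).content)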
Gateway Provider Setup (OpenRouter, Portkey)
Best for: routing, observability/governance, and unified billing across multiple backends.
Gateways expose an OpenAI-compatible /v1 endpoint and forward your payload to the routed backend model. Because some backends are strict (for example OpenAI reasoning families like gpt-5/o1 reject unsupported parameters), AbstractCore’s gateway providers forward optional generation parameters (like temperature, top_p, max_output_tokens) only when explicitly set.
OpenRouter Setup
- Create an API key: https://openrouter.ai/keys
- Set the environment variable:
export OPENROUTER_API_KEY="sk-or-..."
# Optional override (default: https://openrouter.ai/api/v1)
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
- Test:
from abstractcore import create_llm
llm = create_llm("openrouter", model="openai/gpt-4o-mini")
resp = llm.generate("Say hello in French")
print(resp.content)
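Because optional parameters are only forwarded when you set them explicitly (see above), pass them yourself when the routed backend supports them. The sketch below assumes generate() accepts these keyword arguments; check the API reference for where your version expects them:
from abstractcore import create_llm
llm = create_llm("openrouter", model="openai/gpt-4o-mini")
# temperature / max_output_tokens are forwarded only because they are set explicitly;
# omit them and they are left out of the payload (important for strict backends).
resp = llm.generate(
    "Say hello in French",
    temperature=0.2,
    max_output_tokens=100,
)
print(resp.content)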
Portkey Setup
Portkey routes requests using a config id (commonly pcfg_...).
- Create an API key and config in Portkey, then copy:
  - PORTKEY_API_KEY
  - PORTKEY_CONFIG (config id)
- Set environment variables:
export PORTKEY_API_KEY="pk_..."
export PORTKEY_CONFIG="pcfg_..."
# Optional override (default: https://api.portkey.ai/v1)
export PORTKEY_BASE_URL="https://api.portkey.ai/v1"
- Test:
from abstractcore import create_llm
llm = create_llm("portkey", model="gpt-4o-mini", config_id="pcfg_...")
resp = llm.generate("Say hello in French")
print(resp.content)
Local Provider Setup
Ollama Setup
Best for: Privacy, no API keys, offline usage, customization
Requirements: 8GB+ RAM, works on Mac/Linux/Windows
1. Install Ollama
macOS:
curl -fsSL https://ollama.com/install.sh | sh
# OR download from https://ollama.com/download
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
1. Download installer from ollama.com/download
2. Run the installer
3. Restart terminal
2. Start Ollama Service
# Start Ollama server (runs in background)
ollama serve
3. Download Models
# Pull any model you want to use, then verify it's installed.
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama list
4. Test Setup
from abstractcore import create_llm
# Test with any model you installed via `ollama pull ...`
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
response = llm.generate("What is Python?")
print(response.content)
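If Ollama runs on a different machine, you can point the provider at it. The base_url parameter below mirrors the pattern used for the other local providers in this guide; treat it as an assumption for Ollama and confirm against the API reference (Ollama's default API port is 11434):
from abstractcore import create_llm
# Hypothetical remote Ollama host
llm = create_llm(
    "ollama",
    model="qwen3:4b-instruct-2507-q4_K_M",
    base_url="http://192.168.1.50:11434",
)
print(llm.generate("What is Python?").content)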
MLX Setup
Best for: M1/M2/M3/M4 Macs, optimized inference, good speed
Requirements: Apple Silicon Mac (M1/M2/M3/M4)
1. Install MLX Dependencies
# Install the MLX extra (Apple Silicon only)
pip install "abstractcore[mlx]"
2. Download Models
MLX models are automatically downloaded when first used. Popular options:
from abstractcore import create_llm
# Models are auto-downloaded on first use
llm = create_llm("mlx", model="mlx-community/Qwen2.5-Coder-7B-Instruct-4bit") # 4.2GB
# OR
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit") # 1.8GB
3. Test Setup
from abstractcore import create_llm
# Test with a good balance model
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
response = llm.generate("Explain machine learning briefly")
print(response.content)
Popular MLX Models:
- mlx-community/Llama-3.2-3B-Instruct-4bit - 1.8GB, fast
- mlx-community/Qwen2.5-Coder-7B-Instruct-4bit - 4.2GB, suitable for code
- mlx-community/Llama-3.1-8B-Instruct-4bit - 4.7GB, high quality
LMStudio Setup
Best for: Easy GUI management, Windows users, non-technical users
Requirements: 8GB+ RAM, works on Mac/Linux/Windows
1. Install LMStudio
- Download from lmstudio.ai
- Install the application
- Launch LMStudio
2. Download Models
- Open LMStudio
- Go to "Discover" tab
- Search for recommended models:
  - microsoft/Phi-3-mini-4k-instruct-gguf (small, fast)
  - microsoft/Phi-3-medium-4k-instruct-gguf (medium quality)
  - meta-llama/Llama-2-7b-chat-gguf (good general purpose)
- Click download for your preferred model
3. Start Local Server
- Go to "Local Server" tab in LMStudio
- Select your downloaded model
- Click "Start Server"
- Note the port (usually 1234)
4. Test Setup
from abstractcore import create_llm
# LM Studio exposes an OpenAI-compatible server (default: http://localhost:1234/v1).
# Use the model ID shown in LM Studio (or try "local-model" if unsure).
llm = create_llm("lmstudio", model="local-model", base_url="http://localhost:1234/v1")
resp = llm.generate("Hello, how are you?")
print(resp.content)
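If you are unsure which model ID the server exposes, the OpenAI-compatible /v1/models endpoint lists it. A small sketch using the requests library (assumed to be installed separately):
import requests
# LM Studio's default server address; adjust the port if you changed it
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
for entry in resp.json().get("data", []):
    print(entry.get("id"))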
HuggingFace Setup
Best for: Latest research models, custom models, GGUF files
Requirements: 8GB+ RAM, Python environment
1. Install Dependencies
pip install "abstractcore[huggingface]"
2. Optional: Get HuggingFace Token
For private models or higher rate limits:
- Go to huggingface.co/settings/tokens
- Create a "Read" token
- Set environment variable:
export HUGGINGFACE_TOKEN="hf_your-token-here"
3. Test Setup
from abstractcore import create_llm
# Use a small model for testing (auto-downloads)
llm = create_llm("huggingface", model="microsoft/DialoGPT-medium")
response = llm.generate("Hello there!")
print(response.content)
Popular HuggingFace Models:
- microsoft/DialoGPT-medium - Good for conversation
- facebook/blenderbot-400M-distill - Conversational AI
- microsoft/CodeBERT-base - Code understanding
vLLM Setup
Best for: Production GPU deployments, high-throughput inference, tensor parallelism
Requirements:
- NVIDIA GPU with CUDA support (A100, H100, RTX 4090, etc.)
- Linux operating system
- CUDA 12.1+ installed
- 16GB+ VRAM recommended
- NOT compatible with: Apple Silicon, AMD GPUs, CPU-only systems
NVIDIA CUDA only. If you’re on Apple Silicon, use MLX. If you’re on CPU-only, use Ollama/HuggingFace.
⚠️ Hardware Compatibility Warning
vLLM ONLY works with NVIDIA CUDA GPUs. It will NOT work on:
- ❌ Apple Silicon (M1/M2/M3/M4) - Use MLX provider instead
- ❌ AMD GPUs - Use HuggingFace or Ollama instead
- ❌ Intel integrated graphics
- ❌ CPU-only systems
1. Install vLLM
# Install AbstractCore with vLLM support
pip install "abstractcore[vllm]"
# This installs vLLM which requires NVIDIA CUDA
# If you get CUDA errors, ensure CUDA 12.1+ is installed:
# https://developer.nvidia.com/cuda-downloads
2. Start vLLM Server
IMPORTANT: Check your GPU setup first to avoid Out Of Memory (OOM) errors:
# Check available GPUs
nvidia-smi
# Shows: GPU name, VRAM capacity, and current usage
# Example: 4x NVIDIA L4 (23GB each) = 92GB total
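The same check from Python, using torch (installed alongside vLLM), is handy when picking a --tensor-parallel-size:
import torch
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA GPUs visible - vLLM will not run on this machine")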
Choose the right startup command based on your hardware:
# Single GPU (24GB+) - Works for 7B-14B models
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8000
# Single GPU (24GB+) - For 30B models, reduce memory
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--port 8000 \
--gpu-memory-utilization 0.85 \
--max-model-len 4096
# Multiple GPUs (RECOMMENDED for 30B models) - Use tensor parallelism
# Example: 4x NVIDIA L4 (23GB each)
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--host 0.0.0.0 --port 8000 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--max-model-len 8192 \
--max-num-seqs 128
# Multiple GPUs + LoRA support (Production setup)
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--host 0.0.0.0 --port 8000 \
--tensor-parallel-size 4 \
--enable-lora --max-loras 4 \
--gpu-memory-utilization 0.9 \
--max-model-len 8192 \
--max-num-seqs 128
Key Parameters:
- --tensor-parallel-size N - Split model across N GPUs (REQUIRED for 30B+ models on <40GB GPUs)
- --gpu-memory-utilization 0.9 - Use 90% of GPU memory (leave 10% for CUDA overhead)
- --max-model-len - Maximum context length (reduce if OOM)
- --max-num-seqs - Maximum concurrent sequences (128 recommended for 30B models, default 256 may cause OOM)
- --enable-lora - Enable dynamic LoRA adapter loading
- --max-loras - Maximum number of LoRA adapters to keep in memory
Troubleshooting OOM Errors:
If you see CUDA out of memory errors:
- Reduce concurrent sequences: --max-num-seqs 128 (or 64, 32 for tighter memory)
- Enable tensor parallelism: --tensor-parallel-size 2 (or 4, 8 depending on GPU count)
- Reduce memory usage: --gpu-memory-utilization 0.85 --max-model-len 4096
- Use smaller model: Qwen/Qwen2.5-Coder-7B-Instruct instead of 30B
- Use quantized model: Qwen/Qwen2.5-Coder-30B-Instruct-AWQ (4-bit quantization)
Test server is running:
# Check server health
curl http://localhost:8000/health
# List available models
curl http://localhost:8000/v1/models
# Test generation
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Say hello"}],
"max_tokens": 50
}'
3. Test Setup
from abstractcore import create_llm
# Basic generation
llm = create_llm("vllm", model="Qwen/Qwen3-Coder-30B-A3B-Instruct")
response = llm.generate("Write a Python function to sort a list")
print(response.content)
# With guided JSON (vLLM-specific feature)
response = llm.generate(
    "List 3 programming languages",
    guided_json={
        "type": "object",
        "properties": {
            "languages": {"type": "array", "items": {"type": "string"}}
        }
    }
)
print(response.content)
4. vLLM-Specific Features
Guided Decoding (syntax-constrained generation):
# Regex-constrained generation
response = llm.generate(
    "Write a Python function",
    guided_regex=r"def \w+\([^)]*\):\n(?:\s{4}.*\n)+"
)
# JSON schema enforcement
response = llm.generate(
    "Extract person info",
    guided_json={"type": "object", "properties": {...}}
)
Multi-LoRA (1 base model → many specialized agents):
# Load specialized adapters
llm.load_adapter("sql-expert", "/models/adapters/sql-lora")
llm.load_adapter("react-dev", "/models/adapters/react-lora")
# Route to specialized adapter
response = llm.generate("Write SQL query", model="sql-expert")
Beam Search (higher accuracy for complex tasks):
response = llm.generate(
    "Solve this complex algorithm problem...",
    use_beam_search=True,
    best_of=5  # Generate 5 candidates, return best
)
Environment Variables
# vLLM server URL (default: http://localhost:8000/v1)
export VLLM_BASE_URL="http://192.168.1.100:8000/v1"
# Optional API key (if server started with --api-key)
export VLLM_API_KEY="your-api-key"
# HuggingFace cache (shared with HF/MLX providers)
export HF_HOME="~/.cache/huggingface"
Available Models:
- Qwen/Qwen3-Coder-30B-A3B-Instruct (default) - Excellent for code
- meta-llama/Llama-3.1-8B-Instruct - Good general purpose
- mistralai/Mistral-7B-Instruct-v0.3 - Fast and efficient
- Any HuggingFace model compatible with vLLM
Performance notes: Throughput depends on model size, context length, concurrency, quantization, and GPU. See vLLM docs for tuning knobs (--tensor-parallel-size, --max-model-len, --max-num-seqs, …).
OpenAI-Compatible Setup
Best for: any OpenAI-compatible /v1 endpoint (llama.cpp servers, LocalAI, text-generation-webui, custom proxies, etc.)
AbstractCore supports a generic OpenAI-compatible provider plus specific convenience providers (LM Studio, vLLM, OpenRouter, Portkey).
1. Get the endpoint base URL
You must include /v1 for OpenAI-compatible servers:
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:1234/v1"
# Optional (if your endpoint requires auth)
export OPENAI_COMPATIBLE_API_KEY="your-api-key"
2. Test Setup
from abstractcore import create_llm
llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")
resp = llm.generate("Say hello in French")
print(resp.content)
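If you exported the environment variables in step 1, you can read them in Python rather than hard-coding the endpoint; the fallback URL below is purely illustrative:
import os
from abstractcore import create_llm
base_url = os.getenv("OPENAI_COMPATIBLE_BASE_URL", "http://localhost:1234/v1")
llm = create_llm("openai-compatible", model="default", base_url=base_url)
print(llm.generate("Say hello in French").content)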
Troubleshooting
Common Issues
"No module named .abstractcore."
# Make sure you installed AbstractCore
pip install abstractcore
"OpenAI API key not found"
# Check if environment variable is set
echo $OPENAI_API_KEY
# If empty, set it:
export OPENAI_API_KEY="sk-your-key-here"
"Connection error to Ollama"
# Make sure Ollama is running
ollama serve
# Check if models are available
ollama list
# Pull a model if none available
ollama pull gemma3:1b
"Model not found in MLX"
# Use exact model names from HuggingFace MLX community
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
"LMStudio connection refused"
# Make sure LMStudio server is running on correct port
# Check LMStudio logs for the exact port and URL
Memory Issues
"Out of memory" with local models
# Try smaller models
ollama pull gemma3:1b # Only 1.3GB
ollama pull tinyllama # Only 637MB
# Or increase swap space on Linux
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
MLX models too slow
# Use 4-bit quantized models for faster inference
llm = create_llm("mlx", model="mlx-community/Llama-3.2-3B-Instruct-4bit")
API Key Issues
OpenAI billing issues
- Check your billing dashboard
- Add payment method if needed
- Check usage limits
Anthropic rate limits
- Check your console
- Upgrade to higher tier if needed
- Implement retry logic in your code (a sketch follows below)
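A minimal retry-with-backoff sketch around generate(); the blanket Exception is for illustration only, narrow it to your provider's rate-limit error in real code:
import time
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
def generate_with_retry(prompt, attempts=4, base_delay=2.0):
    """Retry transient failures (e.g. rate limits) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return llm.generate(prompt)
        except Exception as exc:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
print(generate_with_retry("Explain Python in one sentence").content)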
Testing Your Setup
Universal Test Script
Save this as test_setup.py and run it to test all your providers:
#!/usr/bin/env python3
"""Test script for AbstractCore providers"""
import os
from abstractcore import create_llm

def test_provider(provider_name, model, **kwargs):
    """Test a specific provider"""
    try:
        print(f"\n🧪 Testing {provider_name} with {model}...")
        llm = create_llm(provider_name, model=model, **kwargs)
        response = llm.generate("Say 'Hello from AbstractCore!'")
        print(f"[OK] {provider_name}: {response.content}")
        return True
    except Exception as e:
        print(f"[FAIL] {provider_name}: {e}")
        return False

def main():
    print("AbstractCore Provider Test Suite")
    print("=" * 40)

    results = {}

    # Test cloud providers (if API keys available)
    if os.getenv("OPENAI_API_KEY"):
        results["OpenAI"] = test_provider("openai", "gpt-4o-mini")
    else:
        print("\n⚠️ Skipping OpenAI (no OPENAI_API_KEY)")

    if os.getenv("ANTHROPIC_API_KEY"):
        results["Anthropic"] = test_provider("anthropic", "claude-haiku-4-5")
    else:
        print("\n⚠️ Skipping Anthropic (no ANTHROPIC_API_KEY)")

    if os.getenv("OPENROUTER_API_KEY"):
        results["OpenRouter"] = test_provider("openrouter", "openai/gpt-4o-mini")
    else:
        print("\n⚠️ Skipping OpenRouter (no OPENROUTER_API_KEY)")

    # Test local providers
    results["Ollama"] = test_provider("ollama", "gemma3:1b")

    try:
        results["MLX"] = test_provider("mlx", "mlx-community/Llama-3.2-3B-Instruct-4bit")
    except Exception:
        print("\n⚠️ Skipping MLX (not on Apple Silicon or model not available)")

    try:
        # Note: OpenAI-compatible servers expect `/v1` in the base URL (LM Studio default is http://localhost:1234/v1)
        results["LMStudio"] = test_provider("lmstudio", "qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
    except Exception:
        print("\n⚠️ Skipping LMStudio (server not running on localhost:1234)")

    # Summary
    print("\n" + "=" * 40)
    print("Test Results:")
    working = [name for name, success in results.items() if success]
    if working:
        print(f"[OK] Working providers: {', '.join(working)}")
    else:
        print("[FAIL] No providers working")
        print("\n[INFO] Next steps:")
        print("- Add API keys for cloud providers")
        print("- Install Ollama and download models")
        print("- Start LMStudio local server")
        print("- See docs/prerequisites.md for detailed setup")

if __name__ == "__main__":
    main()
Run the test:
python test_setup.py
Live API smoke tests (opt-in)
Some tests are intentionally real network calls and are disabled by default. To enable them, set:
- ABSTRACTCORE_RUN_LIVE_API_TESTS=1
Example (OpenRouter):
ABSTRACTCORE_RUN_LIVE_API_TESTS=1 OPENROUTER_API_KEY="$OPENROUTER_API_KEY" \
python -m pytest -q tests/test_graceful_fallback.py::test_openrouter_generation_smoke
Local provider smoke tests use ABSTRACTCORE_RUN_LOCAL_PROVIDER_TESTS=1 (and ABSTRACTCORE_RUN_MLX_TESTS=1 for MLX).
Security Notes
API Keys
- Never commit API keys to version control
- Use environment variables or .env files
- Rotate keys periodically
- Monitor usage for unexpected spikes
Local Models
- Local models keep data on your machine
- No internet required after initial download
- Models can be large (1GB-20GB+)
- Some models may have usage restrictions
Network Security
- LMStudio and Ollama servers run locally by default
- Be careful exposing servers to network (use authentication)
- Consider firewall rules for production deployments
This setup guide should get you running with any AbstractCore provider. Choose what works well for your use case - you can always add more providers later!