# AbstractCore - Complete Documentation

This file contains the complete documentation for AbstractCore, a unified Python library providing a consistent interface to all major LLM providers.

## Table of Contents

1. [Project Overview](#project-overview) - What AbstractCore does and supported providers
2. [Installation](#installation) - Quick setup with provider-specific options
3. [Centralized Configuration](#centralized-configuration) - Global defaults and preferences management
4. [Media Handling System](#media-handling-system) - Universal file attachment and processing
5. [Vision Capabilities](#vision-capabilities) - Image analysis across all providers
6. [Getting Started](#getting-started) - Essential code examples and patterns
7. [Provider Discovery](#provider-discovery) - Programmatic provider and model discovery
8. [Built-in CLI Applications](#built-in-cli-applications) - Ready-to-use terminal tools
9. [Debug & Advanced Features](#debug-capabilities-and-self-healing-json) - Troubleshooting and performance
10. [Complete CLI Parameters](#complete-cli-parameters-reference) - All available options
11. [Documentation Index](#documentation-index) - Links to detailed guides

---

## Project Overview

AbstractCore is a comprehensive Python library providing a unified interface to all major LLM providers (OpenAI, Anthropic, Ollama, MLX, LMStudio, HuggingFace) with production-grade features including universal media handling, vision capabilities, tool calling, streaming, centralized configuration, and session management.

### Key Features

- **Universal Provider Support**: OpenAI, Anthropic, Ollama, LMStudio, MLX, HuggingFace with identical syntax
- **Centralized Configuration**: Global defaults, app-specific preferences, API key management with `~/.abstractcore/config/abstractcore.json`
- **Media Handling System**: Universal file attachment with `@filename` CLI syntax and `media=[]` API parameter - supports images, PDFs, Office docs, CSV/TSV
- **Vision Capabilities**: Image analysis across all providers with automatic resolution optimization and vision fallback for text-only models
- **Tool Calling**: Universal @tool decorator works across ALL providers with intelligent format conversion
- **Streaming**: Real-time responses for interactive applications with tool call detection
- **Session Management**: Persistent conversations with metadata and automatic context management
- **Structured Output**: Pydantic model validation with automatic retries and error feedback
- **Embeddings**: SOTA embedding models for semantic search and RAG applications
- **Event System**: Monitor costs, performance, tool executions, and media processing
- **Built-in CLI Apps**: Ready-to-use terminal applications (`summarizer`, `extractor`, `judge`) for document processing
- **HTTP Server**: OpenAI-compatible REST API with provider discovery and unified access
- **Production Ready**: Robust error handling, retries, circuit breakers, and graceful degradation

### Supported Providers

| Provider | Features |
|----------|----------|
| **OpenAI** | Native tool calls, streaming, structured output, vision (GPT-4o), media processing |
| **Anthropic** | Native tool calls, streaming, structured output, vision (Claude 3.5), media processing |
| **Ollama** | Prompted tool calls, streaming, local models, vision (qwen2.5vl:7b), media processing |
| **LMStudio** | Prompted tool calls, streaming, local models, vision (qwen2.5-vl), media processing |
| **MLX** | Prompted tool calls, streaming, Apple Silicon, vision models, media processing |
| **HuggingFace** | Prompted tool calls, streaming, open models, vision models, media processing |

---

## Installation

### Basic Installation

```bash
pip install abstractcore
```

### Provider-Specific Installation

```bash
# OpenAI
pip install abstractcore[openai]

# Anthropic
pip install abstractcore[anthropic]

# Ollama (local models)
pip install abstractcore[ollama]

# LMStudio (local models)
pip install abstractcore[lmstudio]

# MLX (Apple Silicon)
pip install abstractcore[mlx]

# HuggingFace
pip install abstractcore[huggingface]

# Media handling (images, basic documents)
pip install abstractcore[media]

# Server support
pip install abstractcore[server]

# Embeddings
pip install abstractcore[embeddings]

# Everything (includes all features)
pip install abstractcore[all]
```

---

## Centralized Configuration

AbstractCore provides a unified configuration system that manages default models, cache directories, logging settings, and other package-wide preferences from a single location. This eliminates the need to specify providers and models repeatedly and provides consistent behavior across all applications.

### Configuration File Location

Configuration is stored in: `~/.abstractcore/config/abstractcore.json`

### Quick Configuration Setup

```bash
# Check current configuration status
abstractcore --status

# Set global fallback model (used when no app-specific default is configured)
abstractcore --set-global-default ollama/llama3:8b

# Set app-specific defaults for optimal performance
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default extractor ollama qwen3:4b-instruct
abstractcore --set-app-default judge anthropic claude-3-5-haiku
abstractcore --set-app-default cli huggingface unsloth/Qwen3-4B-Instruct-2507-GGUF

# Set API keys
abstractcore --set-api-key openai sk-your-key-here
abstractcore --set-api-key anthropic your-anthropic-key

# Configure logging
abstractcore --set-console-log-level WARNING  # Reduce console output
abstractcore --enable-file-logging            # Save logs to files

# Interactive setup
abstractcore --configure
```

### Configuration Priority System

AbstractCore uses a clear priority hierarchy:

1. **Explicit Parameters** (highest priority)
   ```bash
   summarizer document.txt --provider openai --model gpt-4o-mini
   ```

2. **App-Specific Configuration**
   ```bash
   abstractcore --set-app-default summarizer openai gpt-4o-mini
   ```

3. **Global Configuration**
   ```bash
   abstractcore --set-global-default openai/gpt-4o-mini
   ```

4. **Hardcoded Defaults** (lowest priority)
   - Current default: `huggingface/unsloth/Qwen3-4B-Instruct-2507-GGUF`

### Application Defaults

Set default providers and models for specific AbstractCore applications:

```bash
# Set defaults for individual apps
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default cli anthropic claude-3-5-haiku
abstractcore --set-app-default extractor ollama qwen3:4b-instruct
abstractcore --set-app-default judge anthropic claude-3-5-haiku

# View current app defaults
abstractcore --status
```

### Logging Configuration

Control logging behavior across all AbstractCore components:

```bash
# Change console logging level
abstractcore --set-console-log-level DEBUG    # Show all messages
abstractcore --set-console-log-level INFO     # Show info and above
abstractcore --set-console-log-level WARNING  # Show warnings and errors only (default)
abstractcore --set-console-log-level ERROR    # Show only errors
abstractcore --set-console-log-level NONE     # Disable all console logging

# File logging controls
abstractcore --enable-file-logging      # Start saving logs to files
abstractcore --disable-file-logging     # Stop saving logs to files
abstractcore --set-log-base-dir ~/.abstractcore/logs

# Quick commands
abstractcore --enable-debug-logging     # Sets both console and file to DEBUG
abstractcore --disable-console-logging  # Keeps file logging if enabled
```

### Cache and Storage Configuration

```bash
# Set cache directories
abstractcore --set-default-cache-dir ~/.cache/abstractcore
abstractcore --set-huggingface-cache-dir ~/.cache/huggingface
abstractcore --set-local-models-cache-dir ~/.abstractcore/models
```

### Vision Fallback Configuration

Configure vision processing for text-only models:

```bash
# Download local vision model (recommended)
abstractcore --download-vision-model

# Use existing Ollama model
abstractcore --set-vision-caption qwen2.5vl:7b

# Use cloud API
abstractcore --set-vision-provider openai --model gpt-4o

# Disable vision fallback
abstractcore --disable-vision
```

### Streaming Defaults

```bash
# Set default streaming behavior for CLI
abstractcore --stream on           # Enable streaming by default
abstractcore --stream off          # Disable streaming by default
```

### Configuration Status

View complete configuration with helpful change commands:

```bash
abstractcore --status
```

This displays a hierarchical dashboard showing:
- 🎯 Application Defaults (CLI, Summarizer, Extractor, Judge)
- 🌐 Global Fallback settings
- 👁️ Media Processing configuration
- 🔑 Provider Access (API key status)
- 📝 Logging configuration
- 💾 Storage locations

### Common Configuration Workflows

**First-Time Setup:**
```bash
# Check what's available
abstractcore --status

# Configure for development (free local models)
abstractcore --set-global-default ollama/llama3:8b
abstractcore --set-console-log-level WARNING

# Add API keys when ready for cloud providers
abstractcore --set-api-key openai sk-your-key-here
abstractcore --set-api-key anthropic your-anthropic-key

# Verify everything works
abstractcore --status
```

**Development Environment:**
```bash
# Optimize for local development
abstractcore --set-global-default ollama/llama3:8b  # Free local models
abstractcore --enable-debug-logging                 # Detailed logs for debugging
abstractcore --set-app-default cli ollama qwen3:4b  # Fast model for CLI testing
```

**Production Environment:**
```bash
# Configure for production reliability and performance
abstractcore --set-global-default openai/gpt-4o-mini  # Reliable cloud provider
abstractcore --set-console-log-level WARNING          # Reduce noise
abstractcore --enable-file-logging                    # Persistent logs
abstractcore --set-app-default summarizer openai gpt-4o-mini  # Optimize for quality
```

**Multi-Environment Approach:**
```bash
# Use different providers for different applications
abstractcore --set-app-default cli ollama qwen3:4b           # Fast for development
abstractcore --set-app-default summarizer openai gpt-4o-mini # Quality for documents
abstractcore --set-app-default judge anthropic claude-3-5-haiku # Detailed analysis
```

For complete configuration reference, see: [Centralized Configuration Guide](docs/centralized-config.md)

---

## Media Handling System

AbstractCore provides a **production-ready unified media handling system** that enables seamless file attachment and processing across all LLM providers and models. The system automatically processes images, documents, and other media files using the same simple API, with intelligent provider-specific formatting and graceful fallback handling.

### Key Features

- **Universal API**: Same `media=[]` parameter works across all providers (OpenAI, Anthropic, Ollama, LMStudio, etc.)
- **CLI Integration**: Simple `@filename` syntax in CLI for instant file attachment
- **Intelligent Processing**: Automatic file type detection with specialized processors for each format
- **Provider Adaptation**: Automatic formatting for each provider's API requirements
- **Robust Fallback**: Graceful degradation when advanced processing fails
- **Cross-Format Support**: Images, PDFs, Office documents, CSV/TSV, text files all work seamlessly

### Quick Start

```python
from abstractcore import create_llm

# Works with any provider - just change the provider name
llm = create_llm("openai", model="gpt-4o", api_key="your-key")
response = llm.generate(
    "What's in this image and document?",
    media=["photo.jpg", "report.pdf"]
)

# Same code works with any provider
llm = create_llm("anthropic", model="claude-3.5-sonnet")
response = llm.generate(
    "Analyze these materials",
    media=["chart.png", "data.csv", "presentation.pptx"]
)
```

### CLI Integration

Use the simple `@filename` syntax to attach any file type:

```bash
# PDF Analysis
python -m abstractcore.utils.cli --prompt "What is this document about? @report.pdf"

# Office Documents
python -m abstractcore.utils.cli --prompt "Summarize this presentation @slides.pptx"
python -m abstractcore.utils.cli --prompt "What data is in @spreadsheet.xlsx"
python -m abstractcore.utils.cli --prompt "Analyze this document @contract.docx"

# Data Files
python -m abstractcore.utils.cli --prompt "What patterns are in @sales_data.csv"

# Images
python -m abstractcore.utils.cli --prompt "What's in this image? @screenshot.png"

# Mixed Media
python -m abstractcore.utils.cli --prompt "Compare @chart.png and @data.csv and explain trends"
```

### Supported File Types

**Images (Vision Models):**
- **Formats**: PNG, JPEG, GIF, WEBP, BMP, TIFF
- **Features**: Automatic optimization, resizing, format conversion, EXIF handling

**Documents:**
- **Text Files**: TXT, MD, CSV, TSV, JSON with intelligent parsing and data analysis
- **PDF**: Full text extraction with PyMuPDF4LLM, preserves formatting and structure
- **Office**: DOCX, XLSX, PPTX with complete content extraction using Unstructured library
  - **Word**: Full document analysis with structure preservation
  - **Excel**: Sheet-by-sheet extraction with data analysis
  - **PowerPoint**: Slide content extraction with comprehensive analysis

### How It Works Behind the Scenes

The media system uses a sophisticated multi-layer architecture:

1. **File Attachment Processing**: CLI `@filename` syntax and Python `media=[]` parameter
2. **Intelligent Processing**: AutoMediaHandler selects appropriate processors (Image, PDF, Office, Text)
3. **Provider Formatting**: Same content formatted differently for each provider's API
4. **Graceful Fallback**: Multi-level fallback ensures users always get meaningful results

**Provider-Specific Formatting Example:**
```python
# OpenAI Format (JSON)
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Analyze these files"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}},
    {"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
  ]
}

# Anthropic Format (Messages API)
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Analyze these files"},
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0..."}},
    {"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
  ]
}
```

For complete media handling documentation, see: [Media Handling System Guide](docs/media-handling-system.md)

---

## Vision Capabilities

AbstractCore provides comprehensive **vision capabilities** that enable seamless image analysis across multiple AI providers and models. The system automatically handles image optimization, provider-specific formatting, and intelligent fallback mechanisms.

### Overview

AbstractCore provides comprehensive vision capabilities across all major AI providers with automatic image optimization and intelligent fallback mechanisms. The same code works identically whether you're using cloud APIs (OpenAI, Anthropic) or local models (Ollama, LMStudio).

### Supported Providers and Models

**Cloud Providers:**
- **OpenAI**: GPT-4o, GPT-4 Turbo Vision (multiple images, up to 4096×4096)
- **Anthropic**: Claude 3.5 Sonnet, Claude 3 Haiku (up to 20 images, 1568×1568)

**Local Providers:**
- **Ollama**: qwen2.5vl:7b, llama3.2-vision:11b, gemma3:4b
- **LMStudio**: qwen/qwen2.5-vl-7b, google/gemma-3n-e4b
- **HuggingFace**: Qwen2.5-VL variants, LLaVA models
- **MLX**: Vision models via MLX framework

**Image Formats**: PNG, JPEG, GIF, WEBP, BMP, TIFF with automatic optimization

### Basic Vision Analysis

```python
from abstractcore import create_llm

# Works with any vision-capable provider
llm = create_llm("openai", model="gpt-4o")

# Single image analysis
response = llm.generate(
    "What objects do you see in this image?",
    media=["photo.jpg"]
)

# Multiple images comparison
response = llm.generate(
    "Compare these architectural styles and identify differences",
    media=["building1.jpg", "building2.jpg", "building3.jpg"]
)
```

### Cross-Provider Consistency

```python
# Same code works across all providers
image_files = ["chart.png", "document.pdf"]
prompt = "Analyze the data in these files"

# All work identically
openai_response = create_llm("openai", model="gpt-4o").generate(prompt, media=image_files)
anthropic_response = create_llm("anthropic", model="claude-3-5-sonnet").generate(prompt, media=image_files)
ollama_response = create_llm("ollama", model="qwen2.5vl:7b").generate(prompt, media=image_files)
```

### Vision Fallback System

The **Vision Fallback System** enables text-only models to process images through a transparent two-stage pipeline:

```bash
# Configure vision fallback (one-time setup)
abstractcore --download-vision-model              # Download local model (recommended)
# OR
abstractcore --set-vision-caption qwen2.5vl:7b    # Use existing Ollama model
# OR
abstractcore --set-vision-provider openai --model gpt-4o  # Use cloud API
```

```python
# After configuration, text-only models can process images seamlessly
text_llm = create_llm("lmstudio", model="qwen/qwen3-next-80b")  # No native vision

response = text_llm.generate(
    "What's happening in this image?",
    media=["complex_scene.jpg"]
)
# Works transparently: vision model analyzes image → text model processes description
```

### Automatic Resolution Optimization

AbstractCore automatically optimizes images for each model's maximum capability:

```python
# Images automatically optimized per model
llm = create_llm("openai", model="gpt-4o")
response = llm.generate("Analyze this", media=["photo.jpg"])  # Auto-resized to 4096×4096

llm = create_llm("ollama", model="qwen2.5vl:7b")
response = llm.generate("Analyze this", media=["photo.jpg"])  # Auto-resized to 3584×3584
```

### Structured Vision Analysis

```python
# Get structured responses with specific requirements
llm = create_llm("openai", model="gpt-4o")

response = llm.generate("""
Analyze this image and provide:
- objects: list of objects detected
- colors: dominant colors
- setting: location/environment
- activities: what's happening

Format as JSON.
""", media=["scene.jpg"])

import json
analysis = json.loads(response.content)
```

For complete vision capabilities documentation, see: [Vision Capabilities Guide](docs/vision-capabilities.md)

---

## Getting Started

### Create LLM with Any Provider

AbstractCore supports 6 major providers with **identical syntax**. Choose your provider:

```python
from abstractcore import create_llm

# OpenAI (requires OPENAI_API_KEY)
llm = create_llm("openai", model="gpt-4o-mini")

# Anthropic (requires ANTHROPIC_API_KEY)
llm = create_llm("anthropic", model="claude-3-5-haiku-latest")

# Ollama (local - ensure Ollama is running)
llm = create_llm("ollama", model="qwen3-coder:30b")

# LMStudio (local - ensure LMStudio server is running)
llm = create_llm("lmstudio", model="llama-3.2-8b-instruct")

# MLX (Apple Silicon only)
llm = create_llm("mlx", model="qwen3-air-4bit")

# HuggingFace (requires HUGGINGFACE_API_TOKEN for gated models)
llm = create_llm("huggingface", model="microsoft/DialoGPT-large")

# List available models for any provider
available_models = llm.list_available_models()
print(f"Available models: {available_models}")

# Same interface for all providers
response = llm.generate("What is the capital of France?")
print(response.content)
```

### Essential Patterns

```python
from abstractcore import create_llm
from abstractcore.tools import tool
from pydantic import BaseModel

# 1. Basic Usage - Works with any provider
llm = create_llm("openai", model="gpt-4o-mini")  # or any provider above
response = llm.generate("What is the capital of France?")
print(response.content)

# 2. Universal Tool Calling
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 72°F, Sunny"

response = llm.generate("What's the weather in Paris?", tools=[get_weather])

# 3. Structured Output
class Person(BaseModel):
    name: str
    age: int

person = llm.generate("Extract: John Doe is 25", response_model=Person)
print(f"{person.name}, age {person.age}")
```

### Common Development Patterns

```python
# Streaming responses for user interfaces
for chunk in llm.generate("Tell me a story", stream=True):
    print(chunk.content, end="", flush=True)

# Session management for chatbots
from abstractcore import BasicSession
session = BasicSession(llm, system_prompt="You are a helpful assistant.")
session.add_message('user', 'My name is Alice')
response = session.generate('What is my name?')  # Remembers context

# Error handling for production apps
try:
    response = llm.generate("Complex request")
except Exception as e:
    print(f"Error: {e}")
    # Handle gracefully

# Configuration for different environments
from abstractcore import create_llm

# Development (local models)
dev_llm = create_llm("ollama", model="qwen3:4b")

# Production (cloud APIs)
prod_llm = create_llm("openai", model="gpt-4o-mini", api_key="your-key")

# Cost-sensitive applications (smaller models)
cost_llm = create_llm("anthropic", model="claude-3-5-haiku-latest")
```

---

## Provider Discovery

Discover available providers and their capabilities programmatically. This is particularly useful for building applications that need to adapt to available providers or for administrative tooling.

### Basic Provider Discovery

```python
from abstractcore.providers import get_all_providers_with_models

# Get information about all available providers
providers = get_all_providers_with_models()

for provider in providers:
    print(f"Provider: {provider['display_name']}")
    print(f"Models available: {provider['model_count']}")
    print(f"Local provider: {provider['local_provider']}")
    print(f"Auth required: {provider['authentication_required']}")
    print(f"Features: {', '.join(provider['supported_features'])}")
    if provider.get('installation_extras'):
        print(f"Install: pip install abstractcore[{provider['installation_extras']}]")
    print("---")
```

### Provider Status and Model Discovery

```python
from abstractcore.providers import (
    list_available_providers,        # Get provider names
    get_provider_info,              # Detailed provider info
    is_provider_available,          # Check availability
    get_available_models_for_provider  # Get models for provider
)

# Check available providers
available = list_available_providers()
print(f"Available providers: {available}")

# Get models for a specific provider
if is_provider_available("ollama"):
    models = get_available_models_for_provider("ollama")
    print(f"Ollama models: {models[:5]}...")  # First 5 models

# Get detailed provider information
if is_provider_available("openai"):
    info = get_provider_info("openai")
    print(f"Default model: {info.default_model}")
    print(f"Features: {info.supported_features}")
```

### HTTP API Discovery

Access provider information through the HTTP API:

```bash
# Get all providers
curl http://localhost:8000/providers

# Get specific provider
curl http://localhost:8000/providers/openai

# Get models for provider
curl http://localhost:8000/providers/ollama/models
```

The response includes comprehensive metadata:
```json
{
  "providers": [
    {
      "name": "openai",
      "display_name": "OpenAI",
      "model_count": 15,
      "status": "available",
      "local_provider": false,
      "authentication_required": true,
      "supported_features": ["chat", "completion", "embeddings", "native_tools"],
      "models": ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]
    }
  ]
}
```

---

## Universal Tool Calling

Enable LLMs to call Python functions across **ALL** providers, including those without native tool support. This allows you to build interactive applications where the LLM can execute code, query databases, make API calls, or perform any Python operation.

```python
from abstractcore import create_llm, BasicSession
from abstractcore.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a specified city."""
    return f"Weather in {city}: 72°F, Sunny"

@tool
def calculate(expression: str) -> float:
    """Perform mathematical calculations."""
    return eval(expression)  # Simplified for demo

# Method 1: Direct provider usage
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
response = llm.generate(
    "What's the weather in Tokyo and what's 15 * 23?",
    tools=[get_weather, calculate]  # Pass tools directly
)

# Method 2: Session with registered tools (recommended for conversations)
session = BasicSession(llm, tools=[get_weather, calculate])
response = session.generate("What's the weather in Tokyo?")  # Uses registered tools

# Method 3: Session with per-call tools (overrides registered tools)
response = session.generate("Calculate 15 * 23", tools=[calculate])  # Only use calculate
```

### Enhanced Metadata

The `@tool` decorator supports rich metadata that gets automatically injected into system prompts:

```python
@tool(
    description="Search the database for records matching the query",
    tags=["database", "search", "query"],
    when_to_use="When the user asks for specific data from the database",
    examples=[
        {
            "description": "Find all users named John",
            "arguments": {
                "query": "name=John",
                "table": "users"
            }
        }
    ]
)
def search_database(query: str, table: str = "users") -> str:
    """Search the database for records matching the query."""
    return f"Searching {table} for: {query}"
```

### Universal Tool Support

AbstractCore's tool system works across all providers through two mechanisms:

1. **Native Tool Support**: For providers with native tool APIs (OpenAI, Anthropic)
2. **Intelligent Prompting**: For providers without native tool support (Ollama, MLX, LMStudio)

AbstractCore automatically:
- Detects the model architecture (Qwen3, LLaMA3, etc.)
- Formats tools with examples into system prompt
- Parses tool calls from response using appropriate format
- Executes tools locally and returns results

### Architecture-Aware Tool Call Detection

| Architecture | Format | Example |
|-------------|--------|---------|
| **Qwen3** | `<|tool_call|>...JSON...</|tool_call|>` | `<|tool_call|>{"name": "get_weather", "arguments": {"city": "Paris"}}</|tool_call|>` |
| **LLaMA3** | `<function_call>...JSON...</function_call>` | `<function_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</function_call>` |
| **OpenAI/Anthropic** | Native API tool calls | Structured JSON in API response |
| **XML-based** | `<tool_call>...JSON...</tool_call>` | `<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>` |

### Tool Chaining

Tools can call other tools or return data that triggers additional tool calls:

```python
@tool
def get_user_location(user_id: str) -> str:
    """Get the location of a user."""
    locations = {"user123": "Paris", "user456": "Tokyo"}
    return locations.get(user_id, "Unknown")

@tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 72°F, sunny"

# LLM can chain these tools:
response = llm.generate(
    "What's the weather like for user123?",
    tools=[get_user_location, get_weather]
)
# LLM will first call get_user_location, then get_weather with the result
```

### Tool Syntax Rewriting

**Cutting-edge solution to 2024-2025 tool call format fragmentation**

AbstractCore provides real-time tool call format conversion, solving the current ecosystem fragmentation where different agent frameworks expect incompatible tool call formats. This addresses a critical compatibility challenge in the modern LLM agent landscape.

#### The Format Fragmentation Problem (2024-2025)

```python
# Different models/frameworks expect different formats:
# Qwen3: <|tool_call|>{"name": "func", "args": {...}}</|tool_call|>
# LLaMA3: <function_call>{"name": "func", "args": {...}}</function_call>
# XML-based: <tool_call>{"name": "func", "args": {...}}</tool_call>
# Claude Code: [[tool_use]]{"name": "func", "args": {...}}[[/tool_use]]
# OpenAI/Anthropic: Native JSON API calls

# AbstractCore automatically handles ALL formats
```

#### Dynamic Format Conversion

```python
from abstractcore import create_llm
from abstractcore.tools.tag_rewriter import ToolCallTagRewriter

# Custom tag configuration for any framework
llm = create_llm(
    "ollama",
    model="qwen3-coder:30b",
    tool_call_tags="function_call"  # Converts to: <function_call>...JSON...</function_call>
)

# Multiple custom formats
llm = create_llm(
    "mlx",
    model="qwen3-air-4bit",
    tool_call_tags=("[[tool_use]]", "[[/tool_use]]")  # Start and end tags
)

# Real-time format detection and conversion
llm = create_llm(
    "lmstudio",
    model="llama-3.2-8b-instruct",
    auto_detect_format=True,  # Automatically detect model's preferred format
    fallback_format="xml"     # Fallback if detection fails
)
```

#### Predefined Agent Framework Configurations

```python
from abstractcore.tools.tag_rewriter import create_tag_rewriter

# Pre-configured for popular agent frameworks
formats = {
    "codex": "qwen3",          # <|tool_call|>...JSON...</|tool_call|>
    "crush": "llama3",         # <function_call>...JSON...</function_call>
    "gemini": "xml",           # <tool_call>...JSON...</tool_call>
    "claude_code": "claude",   # [[tool_use]]...JSON...[[/tool_use]]
    "openai": "native",        # Native OpenAI API format
    "anthropic": "native",     # Native Anthropic API format
}

# Apply framework-specific format
rewriter = create_tag_rewriter("codex")  # For Codex CLI compatibility
llm = create_llm("ollama", model="qwen3-coder:30b", tag_rewriter=rewriter)

# Custom framework format
custom_rewriter = create_tag_rewriter(
    start_tag="<custom_tool>",
    end_tag="</custom_tool>",
    json_wrapper=True
)
```

#### Streaming-Aware Rewriting

Advanced streaming support with tool call detection across chunks:

```python
# Streaming with incremental tool detection
llm = create_llm(
    "ollama",
    model="qwen3-coder:30b",
    tool_call_tags="function_call",
    streaming_buffer_size=512,    # Buffer size for partial tool calls
    detect_partial_tools=True     # Detect tools split across chunks
)

# Stream with real-time rewriting
for chunk in llm.generate("Use the calculator tool", tools=[calculate], stream=True):
    # Tool calls are automatically detected and rewritten in real-time
    if chunk.tool_calls:
        print(f"Tool detected: {chunk.tool_calls[0].name}")
    print(chunk.content, end="", flush=True)
```

#### Advanced Format Management

```python
from abstractcore.tools.tag_rewriter import ToolCallTagRewriter

# Create sophisticated rewriter
rewriter = ToolCallTagRewriter(
    input_format="qwen3",         # Expected input format
    output_format="llama3",       # Desired output format
    preserve_native=True,         # Keep native tool calls unchanged
    validate_json=True,           # Validate JSON structure
    repair_malformed=True,        # Attempt to repair broken JSON
    streaming_compatible=True     # Enable streaming support
)

# Multi-format support in single session
session = BasicSession(llm)
session.set_tool_format("qwen3")    # Use Qwen3 format for this conversation

# Format switching mid-conversation
session.switch_tool_format("xml")   # Switch to XML format

# Format compatibility checking
compatible = rewriter.is_compatible("llama3", "qwen3")
supported_formats = rewriter.supported_formats()
```

#### Production Integration Examples

```python
# Integration with different agent CLIs

# For Codex CLI users
codex_llm = create_llm(
    "ollama",
    model="qwen3-coder:30b",
    tool_call_tags="tool_call",     # Codex expects <|tool_call|>
    preserve_whitespace=False       # Codex prefers compact JSON
)

# For Crush CLI users
crush_llm = create_llm(
    "lmstudio",
    model="llama-3.2-8b-instruct",
    tool_call_tags="function_call", # Crush expects <function_call>
    json_indent=2                   # Crush prefers formatted JSON
)

# For Claude Code users
claude_code_llm = create_llm(
    "anthropic",
    model="claude-3-5-haiku-latest",
    tool_call_tags=("[[tool_use]]", "[[/tool_use]]"),  # Claude Code format
    include_reasoning=True          # Include reasoning in tool calls
)
```

#### Benefits of Tool Syntax Rewriting

- **Universal Compatibility**: Works with any agent framework without code changes
- **Ecosystem Bridge**: Connects incompatible tool calling standards
- **Future-Proof**: Easily add support for new formats as they emerge
- **Performance Optimized**: Minimal overhead, streaming-compatible
- **Zero Configuration**: Automatic format detection in most cases
- **Developer Friendly**: Simple API for complex format transformations

---

## Core Features

### Session Management

Comprehensive conversation management with memory, analytics, and auto-compaction:

```python
from abstractcore import BasicSession, create_llm

llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(llm, system_prompt="You are a helpful assistant.")

# Basic conversation with memory
response1 = session.generate("My name is Alice")
response2 = session.generate("What's my name?")  # Remembers context

# Advanced analytics and insights
summary = session.generate_summary()
assessment = session.generate_assessment()
facts = session.extract_facts()

# Auto-compaction with SOTA chat history compression
session.compact(target_tokens=8000)  # Compresses while preserving context

# Timeout management
session.set_timeout(30)  # 30 second default timeout
session.set_tool_timeout(60)  # 60 second tool-specific timeout
recovery_timeout = session.get_recovery_timeout()

# Session serialization with metadata
session.save("conversation.json", summary=True, assessment=True, facts=True)
session.save("conversation_archive.json", format="session-archive/v1")

# Load with full context restoration
loaded_session = BasicSession.load("conversation.json")
loaded_session = BasicSession.from_archive("conversation_archive.json")

# Session metadata and statistics
print(f"Messages: {len(session.messages)}")
print(f"Total tokens: {session.total_tokens}")
print(f"Session age: {session.age_minutes} minutes")
```

#### Session Analytics

Extract insights from conversation history:

```python
# Generate conversation summary
summary = session.generate_summary(
    style="executive",  # executive, technical, conversational
    length="brief"      # brief, standard, detailed
)

# Assess conversation quality and outcomes
assessment = session.generate_assessment(
    criteria=["clarity", "helpfulness", "accuracy"],
    include_score=True
)

# Extract structured facts and entities
facts = session.extract_facts(
    entity_types=["person", "organization", "date", "location"],
    confidence_threshold=0.8
)

# Custom analytics with focus areas
insights = session.generate_insights(
    focus="technical_decisions",
    format="structured"
)
```

#### Auto-Compaction System

Intelligent conversation history compression that preserves context:

```python
# Automatic compaction when approaching token limits
session = BasicSession(llm, auto_compact=True, max_tokens=32000)

# Manual compaction with strategies
session.compact(
    target_tokens=8000,
    strategy="semantic",       # semantic, chronological, importance
    preserve_recent=10,        # Keep last 10 messages uncompacted
    preserve_system=True       # Always preserve system prompt
)

# Compaction with custom criteria
session.compact(
    target_tokens=6000,
    preserve_patterns=["@tool", "important:"],  # Preserve messages with these patterns
    compression_ratio=0.7      # Target 70% size reduction
)
```

### Token Management

Unified token parameter vocabulary and budget management across all providers:

```python
from abstractcore import create_llm
from abstractcore.utils.token_utils import estimate_tokens, calculate_token_budget

# Unified token parameters
llm = create_llm(
    "openai",
    model="gpt-4o-mini",
    max_tokens=32000,           # Context window (input + output)
    max_output_tokens=8000,     # Maximum output tokens
    max_input_tokens=24000      # Maximum input tokens (auto-calculated if not set)
)

# Token estimation and validation
text = "Your input text here..."
estimated = estimate_tokens(text, model="gpt-4o-mini")
print(f"Estimated tokens: {estimated}")

# Budget-based token management
budget = calculate_token_budget(
    context_window=32000,
    output_tokens=8000,
    system_prompt="You are a helpful assistant.",
    reserve_ratio=0.1  # Reserve 10% for safety
)

print(f"Available input tokens: {budget['available_input']}")
print(f"Reserved tokens: {budget['reserved']}")
print(f"System prompt tokens: {budget['system_tokens']}")
```

#### Provider-Specific Parameter Mapping

AbstractCore automatically maps unified parameters to provider-specific formats:

```python
# Unified parameters work across all providers
providers = ["openai", "anthropic", "ollama", "lmstudio", "mlx", "huggingface"]

for provider in providers:
    llm = create_llm(
        provider,
        model="default",
        max_tokens=16000,        # Maps to provider-specific parameter
        max_output_tokens=4000,  # Maps to max_tokens, num_predict, etc.
        temperature=0.7
    )

    # Same interface, different internal mappings:
    # OpenAI: max_tokens, max_completion_tokens
    # Anthropic: max_tokens
    # Ollama: num_ctx, num_predict
    # LMStudio: max_tokens, max_tokens_to_sample
    # MLX: max_tokens
    # HuggingFace: max_length, max_new_tokens
```

#### Token Budget Validation

Automatic validation and warnings for token limits:

```python
# Enable token validation
llm = create_llm(
    "anthropic",
    model="claude-3-5-haiku-latest",
    max_tokens=200000,
    max_output_tokens=8000,
    validate_tokens=True,  # Enable validation
    warn_threshold=0.9     # Warn at 90% capacity
)

# Token usage monitoring
response = llm.generate("Write a detailed analysis...")

print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Cost estimate: ${response.usage.cost_usd:.4f}")

# Budget warnings
if response.usage.input_ratio > 0.9:
    print("Warning: Approaching input token limit")
if response.usage.output_ratio > 0.9:
    print("Warning: Approaching output token limit")
```

#### Advanced Token Management

```python
from abstractcore.utils.token_utils import TokenManager

# Create token manager for complex scenarios
manager = TokenManager(
    context_window=128000,
    target_output=8000,
    safety_margin=1000
)

# Chunk management for large inputs
large_text = "..." # Very large text
chunks = manager.chunk_text(
    large_text,
    chunk_size=manager.max_input_tokens,
    overlap=200,
    preserve_sentences=True
)

# Process chunks with budget tracking
results = []
for i, chunk in enumerate(chunks):
    print(f"Processing chunk {i+1}/{len(chunks)}")

    # Validate chunk fits in budget
    if manager.validate_input(chunk):
        response = llm.generate(chunk)
        results.append(response.content)

        # Update budget tracking
        manager.update_usage(response.usage)
    else:
        print(f"Chunk {i+1} exceeds token budget, skipping")

# Combine results
final_result = manager.combine_results(results)
```

### Embeddings

SOTA embedding models for semantic search and RAG applications:

```python
from abstractcore.embeddings import EmbeddingManager

# HuggingFace (default)
embedder = EmbeddingManager(model="sentence-transformers/all-MiniLM-L6-v2")

# Ollama
embedder = EmbeddingManager(model="granite-embedding:278m", provider="ollama")

# LMStudio
embedder = EmbeddingManager(model="text-embedding-all-minilm-l6-v2-embedding", provider="lmstudio")

# Generate embeddings
embedding = embedder.embed("Hello world")
embeddings = embedder.embed_batch(["Hello", "World", "AI"])

# Similarity computation
similarity = embedder.compute_similarity("Hello", "Hi there")
```

### Event System

Comprehensive event-driven architecture for monitoring, control, and observability across all operations:

#### Event Types

```python
from abstractcore.events import EventType

# Core generation events
EventType.GENERATION_STARTED      # LLM generation begins
EventType.GENERATION_COMPLETED    # LLM generation finishes

# Tool execution events
EventType.TOOL_STARTED            # Tool execution begins
EventType.TOOL_COMPLETED          # Tool execution finishes

# Error and retry events
EventType.ERROR                   # Any error occurred
EventType.RETRY_ATTEMPTED         # Retry attempt made
EventType.RETRY_EXHAUSTED         # All retries failed

# Validation events
EventType.VALIDATION_FAILED       # Input/output validation failed

# Session events
EventType.SESSION_CREATED         # New session created
EventType.SESSION_CLEARED         # Session history cleared

# Compaction events
EventType.COMPACTION_STARTED      # Session compaction begins
EventType.COMPACTION_COMPLETED    # Session compaction finishes
```

#### Global Event Bus

```python
from abstractcore.events import GlobalEventBus, on_global, emit_global

# Register global event handlers
def cost_monitor(event):
    if event.cost_usd and event.cost_usd > 0.10:
        print(f"High cost request: ${event.cost_usd}")

def performance_monitor(event):
    if event.duration_ms and event.duration_ms > 10000:
        print(f"Slow request: {event.duration_ms}ms")

# Register handlers
on_global(EventType.GENERATION_COMPLETED, cost_monitor)
on_global(EventType.GENERATION_COMPLETED, performance_monitor)

# Access global event bus directly
bus = GlobalEventBus()
bus.on(EventType.TOOL_STARTED, tool_security_check)
bus.off(EventType.TOOL_STARTED, tool_security_check)  # Unregister
bus.clear()  # Clear all handlers

# Emit custom events
emit_global(EventType.ERROR, {
    "error_type": "custom_error",
    "message": "Custom error occurred",
    "metadata": {"component": "custom"}
})
```

#### Event Prevention and Control

```python
def prevent_dangerous_tools(event):
    """Prevent execution of dangerous tools."""
    for call in event.data.get('tool_calls', []):
        if call.name in ['delete_file', 'system_command', 'execute_shell']:
            event.prevent()  # Stop execution immediately
            print(f"Blocked dangerous tool: {call.name}")

def limit_tool_execution_time(event):
    """Prevent tools that might run too long."""
    tool_name = event.data.get('tool_name')
    if tool_name in ['database_scan', 'file_search'] and not event.data.get('timeout'):
        event.prevent()
        print(f"Tool {tool_name} requires timeout parameter")

# Register prevention handlers (execute before tool runs)
on_global(EventType.TOOL_STARTED, prevent_dangerous_tools)
on_global(EventType.TOOL_STARTED, limit_tool_execution_time)
```

#### Production Observability

```python
# Comprehensive monitoring setup
def setup_production_monitoring():
    """Setup production-grade event monitoring."""

    # Cost tracking
    def track_costs(event):
        if event.cost_usd:
            log_cost_metric(event.cost_usd, event.provider, event.model)

    # Performance monitoring
    def track_performance(event):
        metrics = {
            'duration_ms': event.duration_ms,
            'tokens_used': event.tokens_used,
            'provider': event.provider,
            'model': event.model
        }
        send_to_metrics_system(metrics)

    # Error tracking
    def track_errors(event):
        error_data = {
            'error_type': event.error_type,
            'message': event.message,
            'provider': event.provider,
            'stack_trace': event.stack_trace
        }
        send_to_error_tracking(error_data)

    # Tool usage analytics
    def track_tool_usage(event):
        tool_metrics = {
            'tool_name': event.data.get('tool_name'),
            'execution_time': event.duration_ms,
            'success': event.data.get('success', True)
        }
        log_tool_analytics(tool_metrics)

    # Register all monitors
    on_global(EventType.GENERATION_COMPLETED, track_costs)
    on_global(EventType.GENERATION_COMPLETED, track_performance)
    on_global(EventType.ERROR, track_errors)
    on_global(EventType.TOOL_COMPLETED, track_tool_usage)

setup_production_monitoring()
```

### Memory Management

For local providers, explicit memory management:

```python
# Explicit memory management for local models
llm = create_llm("ollama", model="large-model")
response = llm.generate("Hello")
llm.unload()  # Free memory
del llm
```

## Built-in CLI Applications

Three production-ready terminal tools - no Python code required:

### Common Use Cases

```bash
# Generate executive summaries for business documents
summarizer quarterly_report.pdf --style executive --length brief --output executive_summary.txt

# Extract entities and relationships for research
extractor research_paper.pdf --format json-ld --entity-types person,organization,technology --output entities.json

# Evaluate document quality for different contexts
judge technical_spec.md --criteria clarity,accuracy,completeness --context "technical documentation"

# Process large documents with chunking
summarizer large_document.pdf --chunk-size 15000 --style structured --verbose

# Fast entity extraction for content processing
extractor articles/ --mode fast --format json --entity-types person,organization --output results/

# Code review and quality assessment
judge src/main.py --context "code review" --focus "error handling,documentation" --format plain
```

### Alternative Methods

```bash
# Method 1: Direct commands (recommended)
summarizer document.txt --style executive
extractor report.pdf --format triples
judge essay.md --criteria soundness

# Method 2: Via Python module
python -m abstractcore.apps.summarizer document.txt --style executive
python -m abstractcore.apps.extractor report.pdf --format triples
python -m abstractcore.apps.judge essay.md --criteria soundness
```

### Python API

```python
from abstractcore.processing import BasicSummarizer, BasicExtractor, BasicJudge

# Use programmatically
summarizer = BasicSummarizer()
summary = summarizer.summarize(text, style="executive", length="brief")

extractor = BasicExtractor()
kg = extractor.extract(text, output_format="jsonld")

judge = BasicJudge()
assessment = judge.evaluate(text, context="code review", focus="error handling")
```

---

## HTTP Server

Comprehensive OpenAI-compatible REST API with advanced routing and agent CLI integration:

### Quick Start

```bash
# Start server with default configuration
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

# Start with custom configuration
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000 --workers 4 --reload

# Production deployment
gunicorn abstractcore.server.app:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
```

### Complete Endpoint Reference

#### Core Endpoints

```bash
# Chat completions (OpenAI-compatible)
POST /v1/chat/completions

# Text completions
POST /v1/completions

# Embeddings
POST /v1/embeddings

# OpenAI Responses API (100% Compatible)
POST /v1/responses

# Provider management
GET /providers
GET /providers/{provider_name}
GET /providers/{provider_name}/models

# Health and status
GET /health
GET /status
GET /metrics
```

#### Provider Discovery API

```bash
# Get all available providers with comprehensive metadata
curl http://localhost:8000/providers

# Get specific provider information
curl http://localhost:8000/providers/openai

# Get models for specific provider
curl http://localhost:8000/providers/ollama/models

# Provider status with error details
curl http://localhost:8000/providers/status
```

### OpenAI-Compatible Usage

Drop-in replacement for OpenAI API with any provider:

```python
import openai

# Standard OpenAI client configuration
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused"  # API key not required for local providers
)

# Route to any provider using model format: provider/model
response = client.chat.completions.create(
    model="ollama/qwen3-coder:30b",      # Ollama provider
    messages=[{"role": "user", "content": "Hello!"}]
)

# Alternative: Use anthropic provider
response = client.chat.completions.create(
    model="anthropic/claude-3-5-haiku-latest",    # Anthropic provider
    messages=[{"role": "user", "content": "Write code"}],
    max_tokens=8000,
    temperature=0.7
)

# Embeddings with any provider
embeddings = client.embeddings.create(
    model="ollama/granite-embedding:278m",
    input=["Hello", "World"]
)
```

### Multimodal Requests (Files, Images, Documents)

AbstractCore server provides comprehensive support for multimodal requests through multiple compatible formats:

#### 1. AbstractCore @filename Syntax

Convenient syntax that works across all providers:

```python
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Simple file attachment
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Analyze @report.pdf and @chart.png"}]
)

# Works with any provider
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Summarize @document.docx"}]
)
```

#### 2. OpenAI Vision API Format (with NEW type="file" Support)

Standard OpenAI multimodal format with enhanced file support:

```python
# Image with URL
response = client.chat.completions.create(
    model="ollama/qwen2.5vl:7b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

# NEW: Explicit file type for documents (PDF, DOCX, XLSX, CSV, etc.)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this document"},
            {
                "type": "file",
                "file_url": {"url": "https://example.com/report.pdf"}
            }
        ]
    }]
)

# Mixed media (images, documents, data files)
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare this chart with the data"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "file", "file_url": {"url": "https://example.com/data.csv"}}
        ]
    }]
)

# Document with base64 data (PDF, DOCX, CSV, etc.)
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="lmstudio/qwen/qwen3-next-80b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this document"},
            {"type": "file", "file_url": {"url": f"data:application/pdf;base64,{pdf_data}"}}
        ]
    }]
)

# Image with base64 data
import base64

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}
            }
        ]
    }]
)
```

#### 3. OpenAI File Format (Forward-Compatible)

AbstractCore supports OpenAI's planned file format with simplified structure (consistent with image_url):

```python
# HTTP/HTTPS URLs
response = client.chat.completions.create(
    model="anthropic/claude-3-5-haiku-latest",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the key findings in this report?"},
            {
                "type": "file",
                "file_url": {
                    "url": "https://example.com/documents/report.pdf"
                }
            }
        ]
    }]
)

# Local file paths
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this local spreadsheet"},
            {
                "type": "file",
                "file_url": {
                    "url": "/Users/username/documents/spreadsheet.xlsx"
                }
            }
        ]
    }]
)

# Base64 data URLs
import base64

with open("report.pdf", "rb") as f:
    file_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="ollama/qwen3:4b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this document?"},
            {
                "type": "file",
                "file_url": {
                    "url": f"data:application/pdf;base64,{file_data}"
                }
            }
        ]
    }]
)
```

**Perfect Consistency with image_url:**
```python
# Images and files use identical URL pattern
content = [
    {"type": "text", "text": "Compare these"},
    {
        "type": "image_url",
        "image_url": {"url": "https://example.com/chart.png"}
    },
    {
        "type": "file",
        "file_url": {"url": "https://example.com/data.xlsx"}
    }
]
```

#### Supported File Types

- **Images**: PNG, JPEG, GIF, WEBP, BMP, TIFF
- **Documents**: PDF, DOCX, XLSX, PPTX
- **Data/Text**: CSV, TSV, TXT, MD, JSON, XML
- **Size Limits**: 10MB per file, 32MB total per request

### OpenAI Responses API (/v1/responses)

AbstractCore 2.5.0 introduces 100% OpenAI-compatible `/v1/responses` endpoint with native `input_file` support:

#### Why Use /v1/responses?

- **OpenAI Compatible**: Drop-in replacement for OpenAI's Responses API
- **Native File Support**: `input_file` type designed specifically for document attachments
- **Cleaner API**: Explicit separation between text (`input_text`) and files (`input_file`)
- **Backward Compatible**: Existing `messages` format still works alongside new `input` format
- **Optional Streaming**: Streaming opt-in with `"stream": true` (defaults to `false`)

#### OpenAI Responses API Format

```python
import requests

# Standard OpenAI Responses API format
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Analyze this document"},
                    {"type": "input_file", "file_url": "https://example.com/report.pdf"}
                ]
            }
        ],
        "stream": False  # Optional streaming (defaults to False)
    }
)

# Works with any provider
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "lmstudio/qwen/qwen3-next-80b",
        "input": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": "Analyze the letter and provide a summary of the key points."
                    },
                    {
                        "type": "input_file",
                        "file_url": "https://www.berkshirehathaway.com/letters/2024ltr.pdf"
                    }
                ]
            }
        ]
    }
)

# Streaming mode (opt-in)
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "anthropic/claude-3.5-sonnet",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Summarize this report"},
                    {"type": "input_file", "file_url": "https://example.com/report.pdf"}
                ]
            }
        ],
        "stream": True  # Enable real-time streaming
    },
    stream=True  # Important for streaming responses
)

for line in response.iter_lines():
    if line:
        print(line.decode())
```

#### Legacy Format (Still Supported)

The endpoint automatically detects and supports the legacy `messages` format:

```python
# Legacy format (backward compatible)
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "openai/gpt-4",
        "messages": [
            {"role": "user", "content": "Tell me a story"}
        ],
        "stream": False
    }
)
```

#### Automatic Format Detection

The server automatically detects which format you're using:
- **OpenAI Format**: Presence of `input` field → converts to internal format
- **Legacy Format**: Presence of `messages` field → processes directly
- **Error**: Missing both `input` and `messages` → returns 400 error with clear message

#### Supported Media Types in input_file

All file types supported via URL, local path, or base64:

```python
# PDF from URL
{"type": "input_file", "file_url": "https://example.com/report.pdf"}

# Excel from local path
{"type": "input_file", "file_url": "/path/to/spreadsheet.xlsx"}

# CSV from base64
{"type": "input_file", "file_url": "data:text/csv;base64,RGF0ZSxQcm9kdW..."}

# PowerPoint from URL
{"type": "input_file", "file_url": "https://example.com/presentation.pptx"}
```

#### Complete Example with Multiple Files

```python
response = requests.post(
    "http://localhost:8000/v1/responses",
    json={
        "model": "openai/gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": "Compare the financial data in the spreadsheet with the PDF report and the chart image"
                    },
                    {"type": "input_file", "file_url": "https://example.com/financial_data.xlsx"},
                    {"type": "input_file", "file_url": "https://example.com/annual_report.pdf"},
                    {"type": "input_file", "file_url": "https://example.com/quarterly_chart.png"}
                ]
            }
        ],
        "max_tokens": 2000,
        "temperature": 0.7,
        "stream": False
    }
)

print(response.json()["choices"][0]["message"]["content"])

#### Mixed Content Example

Combine multiple file types in a single request:

```python
# Multiple files with different formats
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare the chart with the spreadsheet data and summarize the document"},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/png;base64,iVBORw0KGgoA..."}
            },
            {
                "type": "file",
                "file_url": {
                    "url": "https://example.com/data/spreadsheet.xlsx"
                }
            },
            {
                "type": "file",
                "file_url": {
                    "url": "/Users/username/documents/summary.pdf"
                }
            }
        ]
    }]
)
```

### Advanced Server Features

#### Request ID Tracking and Structured Logging

```python
import requests
import json

# All requests include tracking
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ollama/qwen3-coder:30b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False
    },
    headers={"X-Request-ID": "my-custom-id"}  # Optional custom ID
)

# Response includes request tracking
data = response.json()
print(f"Request ID: {data.get('request_id')}")
print(f"Provider: {data.get('provider')}")
print(f"Model: {data.get('model')}")
print(f"Duration: {data.get('duration_ms')}ms")
```

#### Agent CLI Integration Through Syntax Conversion

The server automatically handles tool call format conversion for different agent CLIs:

```python
# Request with tool calls for Codex CLI
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ollama/qwen3-coder:30b",
        "messages": [{"role": "user", "content": "Use the calculator"}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "calculate",
                    "description": "Perform calculations",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "expression": {"type": "string"}
                        }
                    }
                }
            }
        ],
        "tool_format": "codex"  # Server converts to Qwen3 format
    }
)

# Tool calls are automatically converted to requested format
```

#### Streaming Support

```python
import requests

# Streaming chat completions
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ollama/qwen3-coder:30b",
        "messages": [{"role": "user", "content": "Write a story"}],
        "stream": True
    },
    stream=True
)

# Process streaming response
for line in response.iter_lines():
    if line:
        data = json.loads(line.decode('utf-8').replace('data: ', ''))
        if data.get('choices'):
            content = data['choices'][0]['delta'].get('content', '')
            print(content, end='', flush=True)
```

### Server Configuration

#### Environment Variables

```bash
# Server configuration
export ABSTRACTCORE_HOST=0.0.0.0
export ABSTRACTCORE_PORT=8000
export ABSTRACTCORE_WORKERS=4

# Provider configuration
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export HUGGINGFACE_API_TOKEN=your_hf_token

# Server features
export ABSTRACTCORE_ENABLE_CORS=true
export ABSTRACTCORE_MAX_REQUEST_SIZE=10485760  # 10MB
export ABSTRACTCORE_REQUEST_TIMEOUT=300        # 5 minutes
export ABSTRACTCORE_ENABLE_METRICS=true
```

#### Custom Configuration

```python
# custom_server.py
from abstractcore.server.app import app
from abstractcore.server.config import ServerConfig

# Custom server configuration
config = ServerConfig(
    host="0.0.0.0",
    port=8000,
    enable_cors=True,
    max_request_size=10 * 1024 * 1024,  # 10MB
    request_timeout=300,                 # 5 minutes
    enable_metrics=True,
    log_level="INFO",
    provider_timeout=60,                 # Provider timeout
    default_provider="ollama",           # Default provider
    enable_tool_conversion=True          # Enable tool format conversion
)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=config.host, port=config.port)
```

### Production Deployment

#### Docker Deployment

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "abstractcore.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```

#### Kubernetes Deployment

```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: abstractcore-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: abstractcore-server
  template:
    metadata:
      labels:
        app: abstractcore-server
    spec:
      containers:
      - name: abstractcore-server
        image: abstractcore:latest
        ports:
        - containerPort: 8000
        env:
        - name: ABSTRACTCORE_WORKERS
          value: "4"
        - name: ABSTRACTCORE_ENABLE_METRICS
          value: "true"
---
apiVersion: v1
kind: Service
metadata:
  name: abstractcore-service
spec:
  selector:
    app: abstractcore-server
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
```

### Monitoring and Observability

```python
# Access server metrics
response = requests.get("http://localhost:8000/metrics")
metrics = response.json()

print(f"Total requests: {metrics['total_requests']}")
print(f"Active connections: {metrics['active_connections']}")
print(f"Average response time: {metrics['avg_response_time_ms']}ms")
print(f"Provider status: {metrics['providers']}")
```

### Benefits

- **Drop-in OpenAI Replacement**: Use any provider through familiar OpenAI API
- **Agent CLI Compatible**: Automatic tool call format conversion
- **Production Ready**: Request tracking, metrics, health checks
- **Multi-Provider Routing**: Route to any provider using model prefix
- **Comprehensive Logging**: Structured logs with request tracing
- **Scalable**: Supports horizontal scaling with load balancers

## Production Resilience

Enterprise-grade error handling, retries, and circuit breaker patterns for robust applications:

### RetryManager and Circuit Breaker

```python
from abstractcore import create_llm
from abstractcore.resilience import RetryManager, CircuitBreaker

# Configure retry strategy
retry_manager = RetryManager(
    max_attempts=3,              # Maximum retry attempts
    backoff_strategy="exponential",  # exponential, linear, fixed
    base_delay=1.0,             # Base delay in seconds
    max_delay=30.0,             # Maximum delay between retries
    jitter=True,                # Add randomness to prevent thundering herd
    retry_exceptions=[           # Which exceptions to retry
        "ConnectionError",
        "TimeoutError",
        "RateLimitError"
    ]
)

# Configure circuit breaker
circuit_breaker = CircuitBreaker(
    failure_threshold=5,         # Failures before opening circuit
    success_threshold=3,         # Successes needed to close circuit
    timeout=60,                  # Seconds to wait before trying again
    monitor_window=300          # Time window for failure counting
)

# Create LLM with resilience features
llm = create_llm(
    "openai",
    model="gpt-4o-mini",
    retry_manager=retry_manager,
    circuit_breaker=circuit_breaker
)

# Resilient generation with automatic retries
response = llm.generate("What is the capital of France?")
# Automatically retries on transient failures
```

### Recovery Timeout Management

```python
from abstractcore.resilience import TimeoutManager

# Configure comprehensive timeout strategy
timeout_manager = TimeoutManager(
    connection_timeout=10,       # Connection establishment timeout
    read_timeout=60,            # Response read timeout
    total_timeout=120,          # Total request timeout
    recovery_timeout=30,        # Time to wait before retry after failure
    adaptive_timeout=True,      # Adjust timeouts based on response patterns
    timeout_multiplier=1.5      # Multiplier for progressive timeout increases
)

# Apply to LLM
llm = create_llm(
    "anthropic",
    model="claude-3-5-haiku-latest",
    timeout_manager=timeout_manager
)

# Session-level timeout management
from abstractcore import BasicSession

session = BasicSession(llm)
session.set_timeout(45)              # Default timeout for all operations
session.set_tool_timeout(120)        # Specific timeout for tool execution
session.set_recovery_timeout(60)     # Recovery timeout after failures

# Get current timeout settings
timeouts = session.get_timeout_config()
print(f"Generation timeout: {timeouts['generation']}")
print(f"Tool timeout: {timeouts['tool']}")
print(f"Recovery timeout: {timeouts['recovery']}")
```

### Automatic Backoff Strategies

```python
from abstractcore.resilience import BackoffStrategy

# Exponential backoff with jitter
exponential_backoff = BackoffStrategy(
    strategy="exponential",
    base_delay=1.0,
    max_delay=60.0,
    multiplier=2.0,
    jitter=True,                # Adds ±25% randomness
    jitter_range=0.25
)

# Linear backoff
linear_backoff = BackoffStrategy(
    strategy="linear",
    base_delay=2.0,
    increment=1.5,
    max_delay=30.0
)

# Custom backoff function
def custom_backoff(attempt, context):
    """Custom backoff based on error type and provider load."""
    if context.error_type == "RateLimitError":
        return min(60, 2 ** attempt)  # Exponential for rate limits
    elif context.provider_load > 0.8:
        return 5 + attempt * 2        # Linear increase when provider busy
    else:
        return 1                      # Fast retry for other errors

custom_backoff_strategy = BackoffStrategy(
    strategy="custom",
    custom_function=custom_backoff
)

# Apply backoff strategy
llm = create_llm(
    "ollama",
    model="qwen3-coder:30b",
    backoff_strategy=exponential_backoff
)
```

### Exception Hierarchy and Error Handling

```python
from abstractcore.exceptions import (
    AbstractCoreError,           # Base exception
    ProviderAPIError,           # Provider-specific API errors
    ModelNotFoundError,         # Model not available
    AuthenticationError,        # Invalid credentials
    RateLimitError,            # Rate limit exceeded
    TimeoutError,              # Request timeout
    ValidationError,           # Input validation failed
    TokenLimitError,           # Token limit exceeded
    CircuitBreakerError        # Circuit breaker open
)

try:
    response = llm.generate("Complex request")
except RateLimitError as e:
    print(f"Rate limited: {e.retry_after} seconds")
    print(f"Provider: {e.provider}")
    print(f"Current usage: {e.current_usage}")

except TokenLimitError as e:
    print(f"Token limit exceeded: {e.token_count}/{e.token_limit}")
    print(f"Suggestion: {e.suggestion}")

except CircuitBreakerError as e:
    print(f"Service unavailable: {e.estimated_recovery_time}")
    print(f"Failure count: {e.failure_count}")

except ProviderAPIError as e:
    print(f"Provider error: {e.provider_message}")
    print(f"Error code: {e.error_code}")
    print(f"Retryable: {e.is_retryable}")

except AbstractCoreError as e:
    print(f"AbstractCore error: {e}")
    print(f"Context: {e.context}")
```

### Health Checks and Monitoring

```python
from abstractcore.resilience import HealthChecker

# Configure health monitoring
health_checker = HealthChecker(
    check_interval=30,          # Health check every 30 seconds
    failure_threshold=3,        # Mark unhealthy after 3 failures
    success_threshold=2,        # Mark healthy after 2 successes
    check_timeout=10,          # Timeout for health checks
    checks=[
        "provider_connectivity",
        "model_availability",
        "token_limits",
        "response_latency"
    ]
)

# Monitor provider health
llm = create_llm(
    "openai",
    model="gpt-4o-mini",
    health_checker=health_checker
)

# Check current health status
health_status = llm.get_health_status()
print(f"Provider healthy: {health_status['healthy']}")
print(f"Last check: {health_status['last_check']}")
print(f"Response time: {health_status['avg_response_time']}ms")
print(f"Success rate: {health_status['success_rate']}%")

# Automatic failover based on health
if not health_status['healthy']:
    # Automatically switch to backup provider
    backup_llm = create_llm("anthropic", model="claude-3-5-haiku-latest")
    response = backup_llm.generate("Fallback request")
```

### Production Configuration Patterns

```python
# Production-ready configuration
from abstractcore import create_llm
from abstractcore.resilience import ProductionConfig

# Pre-configured production settings
production_config = ProductionConfig(
    # Retry configuration
    max_retries=3,
    retry_delay=1.0,
    retry_backoff=2.0,

    # Circuit breaker
    circuit_failure_threshold=5,
    circuit_timeout=60,

    # Timeouts
    connection_timeout=10,
    read_timeout=60,
    total_timeout=120,

    # Health monitoring
    health_check_interval=30,
    health_failure_threshold=3,

    # Error handling
    log_all_errors=True,
    alert_on_circuit_open=True,
    metrics_collection=True
)

# Apply production configuration
llm = create_llm(
    "openai",
    model="gpt-4o-mini",
    production_config=production_config
)

# Session with production resilience
session = BasicSession(
    llm,
    production_config=production_config,
    auto_recovery=True,          # Automatically recover from failures
    persistence_enabled=True,    # Persist conversation through failures
    checkpoint_interval=10       # Save checkpoint every 10 messages
)
```

### Graceful Degradation

```python
from abstractcore.resilience import GracefulDegradation

# Configure fallback strategies
degradation = GracefulDegradation(
    fallback_providers=["anthropic", "ollama"],  # Fallback order
    reduce_quality=True,         # Use faster/smaller models if needed
    cache_responses=True,        # Cache for offline scenarios
    offline_mode=True           # Enable offline capabilities
)

# LLM with graceful degradation
llm = create_llm(
    "openai",
    model="gpt-4o",
    graceful_degradation=degradation
)

# Automatically handles provider failures
try:
    response = llm.generate("Complex analysis task")
    # Will automatically:
    # 1. Retry with exponential backoff
    # 2. Switch to Anthropic if OpenAI fails
    # 3. Switch to Ollama if both cloud providers fail
    # 4. Use cached response if all providers fail
except Exception as e:
    print(f"All fallback strategies exhausted: {e}")
```

### Benefits

- **Enterprise Reliability**: Circuit breakers and retry mechanisms prevent cascading failures
- **Intelligent Backoff**: Prevents overwhelming providers with failed requests
- **Automatic Recovery**: Self-healing systems with minimal manual intervention
- **Comprehensive Monitoring**: Health checks and metrics for proactive issue detection
- **Graceful Degradation**: Maintains functionality even during partial system failures
- **Production Ready**: Battle-tested patterns for high-availability applications

## Debug Capabilities and Self-Healing JSON

Robust debugging and error recovery features:

```bash
# Debug raw LLM responses for troubleshooting
judge document.txt --debug --provider lmstudio --model qwen/qwen3-next-80b

# Automatic JSON self-repair handles truncated/malformed responses
# Uses self_fixes.py for intelligent error recovery
# Increased token limits prevent truncation: max_tokens=32k, max_output_tokens=8k

# Focus areas for targeted evaluation (new --focus parameter)
judge README.md --focus "architectural diagrams, technical comparisons" --debug
```

**Key Features:**
- **Self-Healing JSON**: Automatically repairs truncated or malformed JSON responses
- **Debug Mode**: `--debug` flag shows raw LLM responses for troubleshooting
- **Focus Areas**: `--focus` parameter for targeted evaluation (replaces `--custom-criteria`)
- **Increased Token Limits**: Default `max_tokens=32000`, `max_output_tokens=8000` prevent truncation
- **Consistent CLI Syntax**: All apps use space syntax (`--param value`) for consistency

### Complete CLI Parameters Reference

#### Extractor Parameters
```bash
# Core parameters
--focus FOCUS                    # Specific focus area (e.g., "technology", "business")
--format {json-ld,triples,json,yaml}  # Output format
--entity-types TYPES             # Comma-separated types (person,organization,location,etc.)
--output OUTPUT                  # Output file path

# Performance & Quality
--mode {fast,balanced,thorough}  # Extraction mode (balanced=default)
--iterate N                      # Refinement iterations (default: 1)
--similarity-threshold 0.0-1.0   # Entity deduplication threshold (default: 0.85)
--no-embeddings                  # Disable semantic deduplication
--minified                       # Compact JSON output

# LLM Configuration
--provider PROVIDER --model MODEL  # Custom LLM provider/model
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--timeout 300                    # HTTP timeout seconds (default: 300)
--chunk-size 8000                # Chunk size for large files
```

#### Judge Parameters
```bash
# Evaluation Configuration
--criteria CRITERIA              # Standard criteria (clarity,soundness,etc.)
--focus FOCUS                    # Primary focus areas for evaluation
--context CONTEXT                # Evaluation context description
--reference FILE_OR_TEXT         # Reference content for comparison

# Output & Debug
--format {json,plain,yaml}       # Output format (default: json)
--debug                          # Show raw LLM responses
--include-criteria               # Include detailed criteria explanations
--exclude-global                 # Skip global assessment for multiple files

# LLM Configuration  
--temperature 0.1                # Evaluation consistency (default: 0.1)
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--timeout 300                    # HTTP timeout seconds
```

#### Summarizer Parameters
```bash
# Content Configuration
--style {structured,narrative,objective,analytical,executive,conversational}
--length {brief,standard,detailed,comprehensive}
--focus FOCUS                    # Specific focus area
--chunk-size 8000                # Chunk size for large files (max: 32000)

# Output & Performance
--output OUTPUT                  # Output file path
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--verbose                        # Show detailed progress
```

---

## Documentation Index

### Essential Guides
- **[Getting Started](docs/getting-started.md)**: 5-minute setup and first LLM call
- **[API Reference](docs/api-reference.md)**: Complete Python API with examples
- **[Centralized Configuration](docs/centralized-config.md)**: Global defaults, app preferences, and configuration management
- **[Media Handling System](docs/media-handling-system.md)**: Universal file attachment and processing across all providers
- **[Vision Capabilities](docs/vision-capabilities.md)**: Image analysis across 7 providers with automatic optimization
- **[Tool Calling](docs/tool-calling.md)**: Universal @tool decorator system
- **[Session Management](docs/session.md)**: Persistent conversations with analytics

### CLI Applications
- **[Summarizer](docs/apps/basic-summarizer.md)**: Document summarization (`summarizer doc.pdf`)
- **[Extractor](docs/apps/basic-extractor.md)**: Knowledge graph extraction (`extractor report.txt`)
- **[Judge](docs/apps/basic-judge.md)**: Text evaluation (`judge essay.txt`)

### Advanced Topics
- **[Server Guide](docs/server.md)**: OpenAI-compatible HTTP API
- **[Embeddings](docs/embeddings.md)**: Semantic search and RAG
- **[Architecture](docs/architecture.md)**: System design overview with media processing
- **Error Handling**: Comprehensive exception hierarchy for robust applications
- **Token Management**: Centralized token counting, cost estimation, and management utilities (`abstractcore.utils.token_utils`)

### Examples & Support
- **[Examples](examples/)**: Progressive tutorials and real-world patterns
- **[Troubleshooting](docs/troubleshooting.md)**: Common issues and solutions
- **[GitHub Issues](https://github.com/lpalbou/AbstractCore/issues)**: Report bugs and get help

---

## Quick Links

- **GitHub**: https://github.com/lpalbou/AbstractCore
- **PyPI**: `pip install abstractcore[all]`
- **License**: MIT | **Python**: 3.9+ | **Status**: Production Ready

**For Developers and Architects:**
- **Provider Agnostic**: Switch between OpenAI, Anthropic, Ollama, and others without code changes
- **Production Ready**: Enterprise-grade error handling, retries, and resilience patterns
- **Universal Tool Calling**: Enable function calling on any provider, even those without native support
- **Media Processing**: Attach any file type (PDFs, images, Office docs) with simple `@filename` syntax
- **Built-in Apps**: Skip boilerplate with ready-to-use CLI tools for common document processing tasks