# AbstractCore - Complete Documentation This file contains the complete documentation for AbstractCore, a unified Python library providing a consistent interface to all major LLM providers. ## Table of Contents 1. [Project Overview](#project-overview) - What AbstractCore does and supported providers 2. [Installation](#installation) - Quick setup with provider-specific options 3. [Centralized Configuration](#centralized-configuration) - Global defaults and preferences management 4. [Media Handling System](#media-handling-system) - Universal file attachment and processing 5. [Vision Capabilities](#vision-capabilities) - Image analysis across all providers 6. [Getting Started](#getting-started) - Essential code examples and patterns 7. [Provider Discovery](#provider-discovery) - Programmatic provider and model discovery 8. [Built-in CLI Applications](#built-in-cli-applications) - Ready-to-use terminal tools 9. [Debug & Advanced Features](#debug-capabilities-and-self-healing-json) - Troubleshooting and performance 10. [Complete CLI Parameters](#complete-cli-parameters-reference) - All available options 11. [Documentation Index](#documentation-index) - Links to detailed guides --- ## Project Overview AbstractCore is a comprehensive Python library providing a unified interface to all major LLM providers (OpenAI, Anthropic, Ollama, MLX, LMStudio, HuggingFace) with production-grade features including universal media handling, vision capabilities, tool calling, streaming, centralized configuration, and session management. ### Key Features - **Universal Provider Support**: OpenAI, Anthropic, Ollama, LMStudio, MLX, HuggingFace with identical syntax - **Centralized Configuration**: Global defaults, app-specific preferences, API key management with `~/.abstractcore/config/abstractcore.json` - **Media Handling System**: Universal file attachment with `@filename` CLI syntax and `media=[]` API parameter - supports images, PDFs, Office docs, CSV/TSV - **Vision Capabilities**: Image analysis across all providers with automatic resolution optimization and vision fallback for text-only models - **Tool Calling**: Universal @tool decorator works across ALL providers with intelligent format conversion - **Streaming**: Real-time responses for interactive applications with tool call detection - **Session Management**: Persistent conversations with metadata and automatic context management - **Structured Output**: Pydantic model validation with automatic retries and error feedback - **Embeddings**: SOTA embedding models for semantic search and RAG applications - **Event System**: Monitor costs, performance, tool executions, and media processing - **Built-in CLI Apps**: Ready-to-use terminal applications (`summarizer`, `extractor`, `judge`) for document processing - **HTTP Server**: OpenAI-compatible REST API with provider discovery and unified access - **Production Ready**: Robust error handling, retries, circuit breakers, and graceful degradation ### Supported Providers | Provider | Features | |----------|----------| | **OpenAI** | Native tool calls, streaming, structured output, vision (GPT-4o), media processing | | **Anthropic** | Native tool calls, streaming, structured output, vision (Claude 3.5), media processing | | **Ollama** | Prompted tool calls, streaming, local models, vision (qwen2.5vl:7b), media processing | | **LMStudio** | Prompted tool calls, streaming, local models, vision (qwen2.5-vl), media processing | | **MLX** | Prompted tool calls, streaming, Apple Silicon, vision models, media processing | 
| **HuggingFace** | Prompted tool calls, streaming, open models, vision models, media processing | --- ## Installation ### Basic Installation ```bash pip install abstractcore ``` ### Provider-Specific Installation ```bash # OpenAI pip install abstractcore[openai] # Anthropic pip install abstractcore[anthropic] # Ollama (local models) pip install abstractcore[ollama] # LMStudio (local models) pip install abstractcore[lmstudio] # MLX (Apple Silicon) pip install abstractcore[mlx] # HuggingFace pip install abstractcore[huggingface] # Media handling (images, basic documents) pip install abstractcore[media] # Server support pip install abstractcore[server] # Embeddings pip install abstractcore[embeddings] # Everything (includes all features) pip install abstractcore[all] ``` --- ## Centralized Configuration AbstractCore provides a unified configuration system that manages default models, cache directories, logging settings, and other package-wide preferences from a single location. This eliminates the need to specify providers and models repeatedly and provides consistent behavior across all applications. ### Configuration File Location Configuration is stored in: `~/.abstractcore/config/abstractcore.json` ### Quick Configuration Setup ```bash # Check current configuration status abstractcore --status # Set global fallback model (used when no app-specific default is configured) abstractcore --set-global-default ollama/llama3:8b # Set app-specific defaults for optimal performance abstractcore --set-app-default summarizer openai gpt-4o-mini abstractcore --set-app-default extractor ollama qwen3:4b-instruct abstractcore --set-app-default judge anthropic claude-3-5-haiku abstractcore --set-app-default cli huggingface unsloth/Qwen3-4B-Instruct-2507-GGUF # Set API keys abstractcore --set-api-key openai sk-your-key-here abstractcore --set-api-key anthropic your-anthropic-key # Configure logging abstractcore --set-console-log-level WARNING # Reduce console output abstractcore --enable-file-logging # Save logs to files # Interactive setup abstractcore --configure ``` ### Configuration Priority System AbstractCore uses a clear priority hierarchy: 1. **Explicit Parameters** (highest priority) ```bash summarizer document.txt --provider openai --model gpt-4o-mini ``` 2. **App-Specific Configuration** ```bash abstractcore --set-app-default summarizer openai gpt-4o-mini ``` 3. **Global Configuration** ```bash abstractcore --set-global-default openai/gpt-4o-mini ``` 4. 
**Hardcoded Defaults** (lowest priority) - Current default: `huggingface/unsloth/Qwen3-4B-Instruct-2507-GGUF` ### Application Defaults Set default providers and models for specific AbstractCore applications: ```bash # Set defaults for individual apps abstractcore --set-app-default summarizer openai gpt-4o-mini abstractcore --set-app-default cli anthropic claude-3-5-haiku abstractcore --set-app-default extractor ollama qwen3:4b-instruct abstractcore --set-app-default judge anthropic claude-3-5-haiku # View current app defaults abstractcore --status ``` ### Logging Configuration Control logging behavior across all AbstractCore components: ```bash # Change console logging level abstractcore --set-console-log-level DEBUG # Show all messages abstractcore --set-console-log-level INFO # Show info and above abstractcore --set-console-log-level WARNING # Show warnings and errors only (default) abstractcore --set-console-log-level ERROR # Show only errors abstractcore --set-console-log-level NONE # Disable all console logging # File logging controls abstractcore --enable-file-logging # Start saving logs to files abstractcore --disable-file-logging # Stop saving logs to files abstractcore --set-log-base-dir ~/.abstractcore/logs # Quick commands abstractcore --enable-debug-logging # Sets both console and file to DEBUG abstractcore --disable-console-logging # Keeps file logging if enabled ``` ### Cache and Storage Configuration ```bash # Set cache directories abstractcore --set-default-cache-dir ~/.cache/abstractcore abstractcore --set-huggingface-cache-dir ~/.cache/huggingface abstractcore --set-local-models-cache-dir ~/.abstractcore/models ``` ### Vision Fallback Configuration Configure vision processing for text-only models: ```bash # Download local vision model (recommended) abstractcore --download-vision-model # Use existing Ollama model abstractcore --set-vision-caption qwen2.5vl:7b # Use cloud API abstractcore --set-vision-provider openai --model gpt-4o # Disable vision fallback abstractcore --disable-vision ``` ### Streaming Defaults ```bash # Set default streaming behavior for CLI abstractcore --stream on # Enable streaming by default abstractcore --stream off # Disable streaming by default ``` ### Configuration Status View complete configuration with helpful change commands: ```bash abstractcore --status ``` This displays a hierarchical dashboard showing: - 🎯 Application Defaults (CLI, Summarizer, Extractor, Judge) - 🌐 Global Fallback settings - 👁️ Media Processing configuration - 🔑 Provider Access (API key status) - 📝 Logging configuration - 💾 Storage locations ### Common Configuration Workflows **First-Time Setup:** ```bash # Check what's available abstractcore --status # Configure for development (free local models) abstractcore --set-global-default ollama/llama3:8b abstractcore --set-console-log-level WARNING # Add API keys when ready for cloud providers abstractcore --set-api-key openai sk-your-key-here abstractcore --set-api-key anthropic your-anthropic-key # Verify everything works abstractcore --status ``` **Development Environment:** ```bash # Optimize for local development abstractcore --set-global-default ollama/llama3:8b # Free local models abstractcore --enable-debug-logging # Detailed logs for debugging abstractcore --set-app-default cli ollama qwen3:4b # Fast model for CLI testing ``` **Production Environment:** ```bash # Configure for production reliability and performance abstractcore --set-global-default openai/gpt-4o-mini # Reliable cloud provider abstractcore 
--set-console-log-level WARNING # Reduce noise abstractcore --enable-file-logging # Persistent logs abstractcore --set-app-default summarizer openai gpt-4o-mini # Optimize for quality ``` **Multi-Environment Approach:** ```bash # Use different providers for different applications abstractcore --set-app-default cli ollama qwen3:4b # Fast for development abstractcore --set-app-default summarizer openai gpt-4o-mini # Quality for documents abstractcore --set-app-default judge anthropic claude-3-5-haiku # Detailed analysis ``` For complete configuration reference, see: [Centralized Configuration Guide](docs/centralized-config.md) --- ## Media Handling System AbstractCore provides a **production-ready unified media handling system** that enables seamless file attachment and processing across all LLM providers and models. The system automatically processes images, documents, and other media files using the same simple API, with intelligent provider-specific formatting and graceful fallback handling. ### Key Features - **Universal API**: Same `media=[]` parameter works across all providers (OpenAI, Anthropic, Ollama, LMStudio, etc.) - **CLI Integration**: Simple `@filename` syntax in CLI for instant file attachment - **Intelligent Processing**: Automatic file type detection with specialized processors for each format - **Provider Adaptation**: Automatic formatting for each provider's API requirements - **Robust Fallback**: Graceful degradation when advanced processing fails - **Cross-Format Support**: Images, PDFs, Office documents, CSV/TSV, text files all work seamlessly ### Quick Start ```python from abstractcore import create_llm # Works with any provider - just change the provider name llm = create_llm("openai", model="gpt-4o", api_key="your-key") response = llm.generate( "What's in this image and document?", media=["photo.jpg", "report.pdf"] ) # Same code works with any provider llm = create_llm("anthropic", model="claude-3.5-sonnet") response = llm.generate( "Analyze these materials", media=["chart.png", "data.csv", "presentation.pptx"] ) ``` ### CLI Integration Use the simple `@filename` syntax to attach any file type: ```bash # PDF Analysis python -m abstractcore.utils.cli --prompt "What is this document about? @report.pdf" # Office Documents python -m abstractcore.utils.cli --prompt "Summarize this presentation @slides.pptx" python -m abstractcore.utils.cli --prompt "What data is in @spreadsheet.xlsx" python -m abstractcore.utils.cli --prompt "Analyze this document @contract.docx" # Data Files python -m abstractcore.utils.cli --prompt "What patterns are in @sales_data.csv" # Images python -m abstractcore.utils.cli --prompt "What's in this image? 
@screenshot.png" # Mixed Media python -m abstractcore.utils.cli --prompt "Compare @chart.png and @data.csv and explain trends" ``` ### Supported File Types **Images (Vision Models):** - **Formats**: PNG, JPEG, GIF, WEBP, BMP, TIFF - **Features**: Automatic optimization, resizing, format conversion, EXIF handling **Documents:** - **Text Files**: TXT, MD, CSV, TSV, JSON with intelligent parsing and data analysis - **PDF**: Full text extraction with PyMuPDF4LLM, preserves formatting and structure - **Office**: DOCX, XLSX, PPTX with complete content extraction using Unstructured library - **Word**: Full document analysis with structure preservation - **Excel**: Sheet-by-sheet extraction with data analysis - **PowerPoint**: Slide content extraction with comprehensive analysis ### How It Works Behind the Scenes The media system uses a sophisticated multi-layer architecture: 1. **File Attachment Processing**: CLI `@filename` syntax and Python `media=[]` parameter 2. **Intelligent Processing**: AutoMediaHandler selects appropriate processors (Image, PDF, Office, Text) 3. **Provider Formatting**: Same content formatted differently for each provider's API 4. **Graceful Fallback**: Multi-level fallback ensures users always get meaningful results **Provider-Specific Formatting Example:** ```python # OpenAI Format (JSON) { "role": "user", "content": [ {"type": "text", "text": "Analyze these files"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}}, {"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."} ] } # Anthropic Format (Messages API) { "role": "user", "content": [ {"type": "text", "text": "Analyze these files"}, {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0..."}}, {"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."} ] } ``` For complete media handling documentation, see: [Media Handling System Guide](docs/media-handling-system.md) --- ## Vision Capabilities AbstractCore provides comprehensive **vision capabilities** that enable seamless image analysis across multiple AI providers and models. The system automatically handles image optimization, provider-specific formatting, and intelligent fallback mechanisms. ### Overview AbstractCore provides comprehensive vision capabilities across all major AI providers with automatic image optimization and intelligent fallback mechanisms. The same code works identically whether you're using cloud APIs (OpenAI, Anthropic) or local models (Ollama, LMStudio). 
### Supported Providers and Models **Cloud Providers:** - **OpenAI**: GPT-4o, GPT-4 Turbo Vision (multiple images, up to 4096×4096) - **Anthropic**: Claude 3.5 Sonnet, Claude 3 Haiku (up to 20 images, 1568×1568) **Local Providers:** - **Ollama**: qwen2.5vl:7b, llama3.2-vision:11b, gemma3:4b - **LMStudio**: qwen/qwen2.5-vl-7b, google/gemma-3n-e4b - **HuggingFace**: Qwen2.5-VL variants, LLaVA models - **MLX**: Vision models via MLX framework **Image Formats**: PNG, JPEG, GIF, WEBP, BMP, TIFF with automatic optimization ### Basic Vision Analysis ```python from abstractcore import create_llm # Works with any vision-capable provider llm = create_llm("openai", model="gpt-4o") # Single image analysis response = llm.generate( "What objects do you see in this image?", media=["photo.jpg"] ) # Multiple images comparison response = llm.generate( "Compare these architectural styles and identify differences", media=["building1.jpg", "building2.jpg", "building3.jpg"] ) ``` ### Cross-Provider Consistency ```python # Same code works across all providers image_files = ["chart.png", "document.pdf"] prompt = "Analyze the data in these files" # All work identically openai_response = create_llm("openai", model="gpt-4o").generate(prompt, media=image_files) anthropic_response = create_llm("anthropic", model="claude-3-5-sonnet").generate(prompt, media=image_files) ollama_response = create_llm("ollama", model="qwen2.5vl:7b").generate(prompt, media=image_files) ``` ### Vision Fallback System The **Vision Fallback System** enables text-only models to process images through a transparent two-stage pipeline: ```bash # Configure vision fallback (one-time setup) abstractcore --download-vision-model # Download local model (recommended) # OR abstractcore --set-vision-caption qwen2.5vl:7b # Use existing Ollama model # OR abstractcore --set-vision-provider openai --model gpt-4o # Use cloud API ``` ```python # After configuration, text-only models can process images seamlessly text_llm = create_llm("lmstudio", model="qwen/qwen3-next-80b") # No native vision response = text_llm.generate( "What's happening in this image?", media=["complex_scene.jpg"] ) # Works transparently: vision model analyzes image → text model processes description ``` ### Automatic Resolution Optimization AbstractCore automatically optimizes images for each model's maximum capability: ```python # Images automatically optimized per model llm = create_llm("openai", model="gpt-4o") response = llm.generate("Analyze this", media=["photo.jpg"]) # Auto-resized to 4096×4096 llm = create_llm("ollama", model="qwen2.5vl:7b") response = llm.generate("Analyze this", media=["photo.jpg"]) # Auto-resized to 3584×3584 ``` ### Structured Vision Analysis ```python # Get structured responses with specific requirements llm = create_llm("openai", model="gpt-4o") response = llm.generate(""" Analyze this image and provide: - objects: list of objects detected - colors: dominant colors - setting: location/environment - activities: what's happening Format as JSON. """, media=["scene.jpg"]) import json analysis = json.loads(response.content) ``` For complete vision capabilities documentation, see: [Vision Capabilities Guide](docs/vision-capabilities.md) --- ## Getting Started ### Create LLM with Any Provider AbstractCore supports 6 major providers with **identical syntax**. 
Choose your provider: ```python from abstractcore import create_llm # OpenAI (requires OPENAI_API_KEY) llm = create_llm("openai", model="gpt-4o-mini") # Anthropic (requires ANTHROPIC_API_KEY) llm = create_llm("anthropic", model="claude-3-5-haiku-latest") # Ollama (local - ensure Ollama is running) llm = create_llm("ollama", model="qwen3-coder:30b") # LMStudio (local - ensure LMStudio server is running) llm = create_llm("lmstudio", model="llama-3.2-8b-instruct") # MLX (Apple Silicon only) llm = create_llm("mlx", model="qwen3-air-4bit") # HuggingFace (requires HUGGINGFACE_API_TOKEN for gated models) llm = create_llm("huggingface", model="microsoft/DialoGPT-large") # List available models for any provider available_models = llm.list_available_models() print(f"Available models: {available_models}") # Same interface for all providers response = llm.generate("What is the capital of France?") print(response.content) ``` ### Essential Patterns ```python from abstractcore import create_llm from abstractcore.tools import tool from pydantic import BaseModel # 1. Basic Usage - Works with any provider llm = create_llm("openai", model="gpt-4o-mini") # or any provider above response = llm.generate("What is the capital of France?") print(response.content) # 2. Universal Tool Calling @tool def get_weather(city: str) -> str: """Get current weather for a city.""" return f"Weather in {city}: 72°F, Sunny" response = llm.generate("What's the weather in Paris?", tools=[get_weather]) # 3. Structured Output class Person(BaseModel): name: str age: int person = llm.generate("Extract: John Doe is 25", response_model=Person) print(f"{person.name}, age {person.age}") ``` ### Common Development Patterns ```python # Streaming responses for user interfaces for chunk in llm.generate("Tell me a story", stream=True): print(chunk.content, end="", flush=True) # Session management for chatbots from abstractcore import BasicSession session = BasicSession(llm, system_prompt="You are a helpful assistant.") session.add_message('user', 'My name is Alice') response = session.generate('What is my name?') # Remembers context # Error handling for production apps try: response = llm.generate("Complex request") except Exception as e: print(f"Error: {e}") # Handle gracefully # Configuration for different environments from abstractcore import create_llm # Development (local models) dev_llm = create_llm("ollama", model="qwen3:4b") # Production (cloud APIs) prod_llm = create_llm("openai", model="gpt-4o-mini", api_key="your-key") # Cost-sensitive applications (smaller models) cost_llm = create_llm("anthropic", model="claude-3-5-haiku-latest") ``` --- ## Provider Discovery Discover available providers and their capabilities programmatically. This is particularly useful for building applications that need to adapt to available providers or for administrative tooling. 
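For example, here is a minimal sketch of such an adaptive setup, built only on the discovery helpers documented below; the preference list, model names, and fallback logic are illustrative assumptions, not part of the library:

```python
from abstractcore import create_llm
from abstractcore.providers import (
    is_provider_available,
    get_available_models_for_provider,
)

def first_available_llm(preferences):
    """Return an LLM for the first (provider, model) pair that is usable on this machine."""
    for provider, model in preferences:
        if not is_provider_available(provider):
            continue
        models = get_available_models_for_provider(provider)
        # Fall back to the provider's first listed model if the preferred one is missing
        chosen = model if model in models else (models[0] if models else model)
        return create_llm(provider, model=chosen)
    raise RuntimeError("No configured provider is available")

# Hypothetical preference order: free local models first, cloud as fallback
llm = first_available_llm([
    ("ollama", "qwen3:4b"),
    ("openai", "gpt-4o-mini"),
])
```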
### Basic Provider Discovery ```python from abstractcore.providers import get_all_providers_with_models # Get information about all available providers providers = get_all_providers_with_models() for provider in providers: print(f"Provider: {provider['display_name']}") print(f"Models available: {provider['model_count']}") print(f"Local provider: {provider['local_provider']}") print(f"Auth required: {provider['authentication_required']}") print(f"Features: {', '.join(provider['supported_features'])}") if provider.get('installation_extras'): print(f"Install: pip install abstractcore[{provider['installation_extras']}]") print("---") ``` ### Provider Status and Model Discovery ```python from abstractcore.providers import ( list_available_providers, # Get provider names get_provider_info, # Detailed provider info is_provider_available, # Check availability get_available_models_for_provider # Get models for provider ) # Check available providers available = list_available_providers() print(f"Available providers: {available}") # Get models for a specific provider if is_provider_available("ollama"): models = get_available_models_for_provider("ollama") print(f"Ollama models: {models[:5]}...") # First 5 models # Get detailed provider information if is_provider_available("openai"): info = get_provider_info("openai") print(f"Default model: {info.default_model}") print(f"Features: {info.supported_features}") ``` ### HTTP API Discovery Access provider information through the HTTP API: ```bash # Get all providers curl http://localhost:8000/providers # Get specific provider curl http://localhost:8000/providers/openai # Get models for provider curl http://localhost:8000/providers/ollama/models ``` The response includes comprehensive metadata: ```json { "providers": [ { "name": "openai", "display_name": "OpenAI", "model_count": 15, "status": "available", "local_provider": false, "authentication_required": true, "supported_features": ["chat", "completion", "embeddings", "native_tools"], "models": ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"] } ] } ``` --- ## Universal Tool Calling Enable LLMs to call Python functions across **ALL** providers, including those without native tool support. This allows you to build interactive applications where the LLM can execute code, query databases, make API calls, or perform any Python operation. 
```python from abstractcore import create_llm, BasicSession from abstractcore.tools import tool @tool def get_weather(city: str) -> str: """Get current weather for a specified city.""" return f"Weather in {city}: 72°F, Sunny" @tool def calculate(expression: str) -> float: """Perform mathematical calculations.""" return eval(expression) # Simplified for demo # Method 1: Direct provider usage llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M") response = llm.generate( "What's the weather in Tokyo and what's 15 * 23?", tools=[get_weather, calculate] # Pass tools directly ) # Method 2: Session with registered tools (recommended for conversations) session = BasicSession(llm, tools=[get_weather, calculate]) response = session.generate("What's the weather in Tokyo?") # Uses registered tools # Method 3: Session with per-call tools (overrides registered tools) response = session.generate("Calculate 15 * 23", tools=[calculate]) # Only use calculate ``` ### Enhanced Metadata The `@tool` decorator supports rich metadata that gets automatically injected into system prompts: ```python @tool( description="Search the database for records matching the query", tags=["database", "search", "query"], when_to_use="When the user asks for specific data from the database", examples=[ { "description": "Find all users named John", "arguments": { "query": "name=John", "table": "users" } } ] ) def search_database(query: str, table: str = "users") -> str: """Search the database for records matching the query.""" return f"Searching {table} for: {query}" ``` ### Universal Tool Support AbstractCore's tool system works across all providers through two mechanisms: 1. **Native Tool Support**: For providers with native tool APIs (OpenAI, Anthropic) 2. **Intelligent Prompting**: For providers without native tool support (Ollama, MLX, LMStudio) AbstractCore automatically: - Detects the model architecture (Qwen3, LLaMA3, etc.) - Formats tools with examples into system prompt - Parses tool calls from response using appropriate format - Executes tools locally and returns results ### Architecture-Aware Tool Call Detection | Architecture | Format | Example | |-------------|--------|---------| | **Qwen3** | `<|tool_call|>...JSON...` | `<|tool_call|>{"name": "get_weather", "arguments": {"city": "Paris"}}` | | **LLaMA3** | `...JSON...` | `{"name": "get_weather", "arguments": {"city": "Paris"}}` | | **OpenAI/Anthropic** | Native API tool calls | Structured JSON in API response | | **XML-based** | `...JSON...` | `{"name": "get_weather", "arguments": {"city": "Paris"}}` | ### Tool Chaining Tools can call other tools or return data that triggers additional tool calls: ```python @tool def get_user_location(user_id: str) -> str: """Get the location of a user.""" locations = {"user123": "Paris", "user456": "Tokyo"} return locations.get(user_id, "Unknown") @tool def get_weather(city: str) -> str: """Get weather for a city.""" return f"Weather in {city}: 72°F, sunny" # LLM can chain these tools: response = llm.generate( "What's the weather like for user123?", tools=[get_user_location, get_weather] ) # LLM will first call get_user_location, then get_weather with the result ``` ### Tool Syntax Rewriting **Cutting-edge solution to 2024-2025 tool call format fragmentation** AbstractCore provides real-time tool call format conversion, solving the current ecosystem fragmentation where different agent frameworks expect incompatible tool call formats. This addresses a critical compatibility challenge in the modern LLM agent landscape. 
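To make the idea concrete before diving into the details, here is a deliberately simplified, self-contained sketch of what tag rewriting means: the same JSON payload is re-wrapped in whatever tags a downstream framework expects. This toy function is illustrative only and is not AbstractCore's implementation (the real `ToolCallTagRewriter` additionally handles streaming chunks, JSON repair, and native API formats); it also assumes a single tool call at the end of the text:

```python
import re

def rewrite_tool_tags(text: str, start_tag: str, end_tag: str) -> str:
    """Re-wrap a Qwen3-style <|tool_call|> payload in a different tag pair (toy example)."""
    pattern = re.compile(r"<\|tool_call\|>\s*(\{.*\})", re.DOTALL)
    return pattern.sub(lambda m: f"{start_tag}{m.group(1)}{end_tag}", text)

raw = 'Sure.<|tool_call|>{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(rewrite_tool_tags(raw, "[[tool_use]]", "[[/tool_use]]"))
# Sure.[[tool_use]]{"name": "get_weather", "arguments": {"city": "Paris"}}[[/tool_use]]
```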
#### The Format Fragmentation Problem (2024-2025) ```python # Different models/frameworks expect different formats: # Qwen3: <|tool_call|>{"name": "func", "args": {...}} # LLaMA3: {"name": "func", "args": {...}} # XML-based: {"name": "func", "args": {...}} # Claude Code: [[tool_use]]{"name": "func", "args": {...}}[[/tool_use]] # OpenAI/Anthropic: Native JSON API calls # AbstractCore automatically handles ALL formats ``` #### Dynamic Format Conversion ```python from abstractcore import create_llm from abstractcore.tools.tag_rewriter import ToolCallTagRewriter # Custom tag configuration for any framework llm = create_llm( "ollama", model="qwen3-coder:30b", tool_call_tags="function_call" # Converts to: ...JSON... ) # Multiple custom formats llm = create_llm( "mlx", model="qwen3-air-4bit", tool_call_tags=("[[tool_use]]", "[[/tool_use]]") # Start and end tags ) # Real-time format detection and conversion llm = create_llm( "lmstudio", model="llama-3.2-8b-instruct", auto_detect_format=True, # Automatically detect model's preferred format fallback_format="xml" # Fallback if detection fails ) ``` #### Predefined Agent Framework Configurations ```python from abstractcore.tools.tag_rewriter import create_tag_rewriter # Pre-configured for popular agent frameworks formats = { "codex": "qwen3", # <|tool_call|>...JSON... "crush": "llama3", # ...JSON... "gemini": "xml", # ...JSON... "claude_code": "claude", # [[tool_use]]...JSON...[[/tool_use]] "openai": "native", # Native OpenAI API format "anthropic": "native", # Native Anthropic API format } # Apply framework-specific format rewriter = create_tag_rewriter("codex") # For Codex CLI compatibility llm = create_llm("ollama", model="qwen3-coder:30b", tag_rewriter=rewriter) # Custom framework format custom_rewriter = create_tag_rewriter( start_tag="", end_tag="", json_wrapper=True ) ``` #### Streaming-Aware Rewriting Advanced streaming support with tool call detection across chunks: ```python # Streaming with incremental tool detection llm = create_llm( "ollama", model="qwen3-coder:30b", tool_call_tags="function_call", streaming_buffer_size=512, # Buffer size for partial tool calls detect_partial_tools=True # Detect tools split across chunks ) # Stream with real-time rewriting for chunk in llm.generate("Use the calculator tool", tools=[calculate], stream=True): # Tool calls are automatically detected and rewritten in real-time if chunk.tool_calls: print(f"Tool detected: {chunk.tool_calls[0].name}") print(chunk.content, end="", flush=True) ``` #### Advanced Format Management ```python from abstractcore.tools.tag_rewriter import ToolCallTagRewriter # Create sophisticated rewriter rewriter = ToolCallTagRewriter( input_format="qwen3", # Expected input format output_format="llama3", # Desired output format preserve_native=True, # Keep native tool calls unchanged validate_json=True, # Validate JSON structure repair_malformed=True, # Attempt to repair broken JSON streaming_compatible=True # Enable streaming support ) # Multi-format support in single session session = BasicSession(llm) session.set_tool_format("qwen3") # Use Qwen3 format for this conversation # Format switching mid-conversation session.switch_tool_format("xml") # Switch to XML format # Format compatibility checking compatible = rewriter.is_compatible("llama3", "qwen3") supported_formats = rewriter.supported_formats() ``` #### Production Integration Examples ```python # Integration with different agent CLIs # For Codex CLI users codex_llm = create_llm( "ollama", model="qwen3-coder:30b", 
tool_call_tags="tool_call", # Codex expects <|tool_call|> preserve_whitespace=False # Codex prefers compact JSON ) # For Crush CLI users crush_llm = create_llm( "lmstudio", model="llama-3.2-8b-instruct", tool_call_tags="function_call", # Crush expects json_indent=2 # Crush prefers formatted JSON ) # For Claude Code users claude_code_llm = create_llm( "anthropic", model="claude-3-5-haiku-latest", tool_call_tags=("[[tool_use]]", "[[/tool_use]]"), # Claude Code format include_reasoning=True # Include reasoning in tool calls ) ``` #### Benefits of Tool Syntax Rewriting - **Universal Compatibility**: Works with any agent framework without code changes - **Ecosystem Bridge**: Connects incompatible tool calling standards - **Future-Proof**: Easily add support for new formats as they emerge - **Performance Optimized**: Minimal overhead, streaming-compatible - **Zero Configuration**: Automatic format detection in most cases - **Developer Friendly**: Simple API for complex format transformations --- ## Core Features ### Session Management Comprehensive conversation management with memory, analytics, and auto-compaction: ```python from abstractcore import BasicSession, create_llm llm = create_llm("openai", model="gpt-4o-mini") session = BasicSession(llm, system_prompt="You are a helpful assistant.") # Basic conversation with memory response1 = session.generate("My name is Alice") response2 = session.generate("What's my name?") # Remembers context # Advanced analytics and insights summary = session.generate_summary() assessment = session.generate_assessment() facts = session.extract_facts() # Auto-compaction with SOTA chat history compression session.compact(target_tokens=8000) # Compresses while preserving context # Timeout management session.set_timeout(30) # 30 second default timeout session.set_tool_timeout(60) # 60 second tool-specific timeout recovery_timeout = session.get_recovery_timeout() # Session serialization with metadata session.save("conversation.json", summary=True, assessment=True, facts=True) session.save("conversation_archive.json", format="session-archive/v1") # Load with full context restoration loaded_session = BasicSession.load("conversation.json") loaded_session = BasicSession.from_archive("conversation_archive.json") # Session metadata and statistics print(f"Messages: {len(session.messages)}") print(f"Total tokens: {session.total_tokens}") print(f"Session age: {session.age_minutes} minutes") ``` #### Session Analytics Extract insights from conversation history: ```python # Generate conversation summary summary = session.generate_summary( style="executive", # executive, technical, conversational length="brief" # brief, standard, detailed ) # Assess conversation quality and outcomes assessment = session.generate_assessment( criteria=["clarity", "helpfulness", "accuracy"], include_score=True ) # Extract structured facts and entities facts = session.extract_facts( entity_types=["person", "organization", "date", "location"], confidence_threshold=0.8 ) # Custom analytics with focus areas insights = session.generate_insights( focus="technical_decisions", format="structured" ) ``` #### Auto-Compaction System Intelligent conversation history compression that preserves context: ```python # Automatic compaction when approaching token limits session = BasicSession(llm, auto_compact=True, max_tokens=32000) # Manual compaction with strategies session.compact( target_tokens=8000, strategy="semantic", # semantic, chronological, importance preserve_recent=10, # Keep last 10 messages 
uncompacted preserve_system=True # Always preserve system prompt ) # Compaction with custom criteria session.compact( target_tokens=6000, preserve_patterns=["@tool", "important:"], # Preserve messages with these patterns compression_ratio=0.7 # Target 70% size reduction ) ``` ### Token Management Unified token parameter vocabulary and budget management across all providers: ```python from abstractcore import create_llm from abstractcore.utils.token_utils import estimate_tokens, calculate_token_budget # Unified token parameters llm = create_llm( "openai", model="gpt-4o-mini", max_tokens=32000, # Context window (input + output) max_output_tokens=8000, # Maximum output tokens max_input_tokens=24000 # Maximum input tokens (auto-calculated if not set) ) # Token estimation and validation text = "Your input text here..." estimated = estimate_tokens(text, model="gpt-4o-mini") print(f"Estimated tokens: {estimated}") # Budget-based token management budget = calculate_token_budget( context_window=32000, output_tokens=8000, system_prompt="You are a helpful assistant.", reserve_ratio=0.1 # Reserve 10% for safety ) print(f"Available input tokens: {budget['available_input']}") print(f"Reserved tokens: {budget['reserved']}") print(f"System prompt tokens: {budget['system_tokens']}") ``` #### Provider-Specific Parameter Mapping AbstractCore automatically maps unified parameters to provider-specific formats: ```python # Unified parameters work across all providers providers = ["openai", "anthropic", "ollama", "lmstudio", "mlx", "huggingface"] for provider in providers: llm = create_llm( provider, model="default", max_tokens=16000, # Maps to provider-specific parameter max_output_tokens=4000, # Maps to max_tokens, num_predict, etc. temperature=0.7 ) # Same interface, different internal mappings: # OpenAI: max_tokens, max_completion_tokens # Anthropic: max_tokens # Ollama: num_ctx, num_predict # LMStudio: max_tokens, max_tokens_to_sample # MLX: max_tokens # HuggingFace: max_length, max_new_tokens ``` #### Token Budget Validation Automatic validation and warnings for token limits: ```python # Enable token validation llm = create_llm( "anthropic", model="claude-3-5-haiku-latest", max_tokens=200000, max_output_tokens=8000, validate_tokens=True, # Enable validation warn_threshold=0.9 # Warn at 90% capacity ) # Token usage monitoring response = llm.generate("Write a detailed analysis...") print(f"Input tokens: {response.usage.input_tokens}") print(f"Output tokens: {response.usage.output_tokens}") print(f"Total tokens: {response.usage.total_tokens}") print(f"Cost estimate: ${response.usage.cost_usd:.4f}") # Budget warnings if response.usage.input_ratio > 0.9: print("Warning: Approaching input token limit") if response.usage.output_ratio > 0.9: print("Warning: Approaching output token limit") ``` #### Advanced Token Management ```python from abstractcore.utils.token_utils import TokenManager # Create token manager for complex scenarios manager = TokenManager( context_window=128000, target_output=8000, safety_margin=1000 ) # Chunk management for large inputs large_text = "..." 
# Very large text chunks = manager.chunk_text( large_text, chunk_size=manager.max_input_tokens, overlap=200, preserve_sentences=True ) # Process chunks with budget tracking results = [] for i, chunk in enumerate(chunks): print(f"Processing chunk {i+1}/{len(chunks)}") # Validate chunk fits in budget if manager.validate_input(chunk): response = llm.generate(chunk) results.append(response.content) # Update budget tracking manager.update_usage(response.usage) else: print(f"Chunk {i+1} exceeds token budget, skipping") # Combine results final_result = manager.combine_results(results) ``` ### Embeddings SOTA embedding models for semantic search and RAG applications: ```python from abstractcore.embeddings import EmbeddingManager # HuggingFace (default) embedder = EmbeddingManager(model="sentence-transformers/all-MiniLM-L6-v2") # Ollama embedder = EmbeddingManager(model="granite-embedding:278m", provider="ollama") # LMStudio embedder = EmbeddingManager(model="text-embedding-all-minilm-l6-v2-embedding", provider="lmstudio") # Generate embeddings embedding = embedder.embed("Hello world") embeddings = embedder.embed_batch(["Hello", "World", "AI"]) # Similarity computation similarity = embedder.compute_similarity("Hello", "Hi there") ``` ### Event System Comprehensive event-driven architecture for monitoring, control, and observability across all operations: #### Event Types ```python from abstractcore.events import EventType # Core generation events EventType.GENERATION_STARTED # LLM generation begins EventType.GENERATION_COMPLETED # LLM generation finishes # Tool execution events EventType.TOOL_STARTED # Tool execution begins EventType.TOOL_COMPLETED # Tool execution finishes # Error and retry events EventType.ERROR # Any error occurred EventType.RETRY_ATTEMPTED # Retry attempt made EventType.RETRY_EXHAUSTED # All retries failed # Validation events EventType.VALIDATION_FAILED # Input/output validation failed # Session events EventType.SESSION_CREATED # New session created EventType.SESSION_CLEARED # Session history cleared # Compaction events EventType.COMPACTION_STARTED # Session compaction begins EventType.COMPACTION_COMPLETED # Session compaction finishes ``` #### Global Event Bus ```python from abstractcore.events import GlobalEventBus, on_global, emit_global # Register global event handlers def cost_monitor(event): if event.cost_usd and event.cost_usd > 0.10: print(f"High cost request: ${event.cost_usd}") def performance_monitor(event): if event.duration_ms and event.duration_ms > 10000: print(f"Slow request: {event.duration_ms}ms") # Register handlers on_global(EventType.GENERATION_COMPLETED, cost_monitor) on_global(EventType.GENERATION_COMPLETED, performance_monitor) # Access global event bus directly bus = GlobalEventBus() bus.on(EventType.TOOL_STARTED, tool_security_check) bus.off(EventType.TOOL_STARTED, tool_security_check) # Unregister bus.clear() # Clear all handlers # Emit custom events emit_global(EventType.ERROR, { "error_type": "custom_error", "message": "Custom error occurred", "metadata": {"component": "custom"} }) ``` #### Event Prevention and Control ```python def prevent_dangerous_tools(event): """Prevent execution of dangerous tools.""" for call in event.data.get('tool_calls', []): if call.name in ['delete_file', 'system_command', 'execute_shell']: event.prevent() # Stop execution immediately print(f"Blocked dangerous tool: {call.name}") def limit_tool_execution_time(event): """Prevent tools that might run too long.""" tool_name = event.data.get('tool_name') if tool_name in 
['database_scan', 'file_search'] and not event.data.get('timeout'): event.prevent() print(f"Tool {tool_name} requires timeout parameter") # Register prevention handlers (execute before tool runs) on_global(EventType.TOOL_STARTED, prevent_dangerous_tools) on_global(EventType.TOOL_STARTED, limit_tool_execution_time) ``` #### Production Observability ```python # Comprehensive monitoring setup def setup_production_monitoring(): """Setup production-grade event monitoring.""" # Cost tracking def track_costs(event): if event.cost_usd: log_cost_metric(event.cost_usd, event.provider, event.model) # Performance monitoring def track_performance(event): metrics = { 'duration_ms': event.duration_ms, 'tokens_used': event.tokens_used, 'provider': event.provider, 'model': event.model } send_to_metrics_system(metrics) # Error tracking def track_errors(event): error_data = { 'error_type': event.error_type, 'message': event.message, 'provider': event.provider, 'stack_trace': event.stack_trace } send_to_error_tracking(error_data) # Tool usage analytics def track_tool_usage(event): tool_metrics = { 'tool_name': event.data.get('tool_name'), 'execution_time': event.duration_ms, 'success': event.data.get('success', True) } log_tool_analytics(tool_metrics) # Register all monitors on_global(EventType.GENERATION_COMPLETED, track_costs) on_global(EventType.GENERATION_COMPLETED, track_performance) on_global(EventType.ERROR, track_errors) on_global(EventType.TOOL_COMPLETED, track_tool_usage) setup_production_monitoring() ``` ### Memory Management For local providers, explicit memory management: ```python # Explicit memory management for local models llm = create_llm("ollama", model="large-model") response = llm.generate("Hello") llm.unload() # Free memory del llm ``` ## Built-in CLI Applications Three production-ready terminal tools - no Python code required: ### Common Use Cases ```bash # Generate executive summaries for business documents summarizer quarterly_report.pdf --style executive --length brief --output executive_summary.txt # Extract entities and relationships for research extractor research_paper.pdf --format json-ld --entity-types person,organization,technology --output entities.json # Evaluate document quality for different contexts judge technical_spec.md --criteria clarity,accuracy,completeness --context "technical documentation" # Process large documents with chunking summarizer large_document.pdf --chunk-size 15000 --style structured --verbose # Fast entity extraction for content processing extractor articles/ --mode fast --format json --entity-types person,organization --output results/ # Code review and quality assessment judge src/main.py --context "code review" --focus "error handling,documentation" --format plain ``` ### Alternative Methods ```bash # Method 1: Direct commands (recommended) summarizer document.txt --style executive extractor report.pdf --format triples judge essay.md --criteria soundness # Method 2: Via Python module python -m abstractcore.apps.summarizer document.txt --style executive python -m abstractcore.apps.extractor report.pdf --format triples python -m abstractcore.apps.judge essay.md --criteria soundness ``` ### Python API ```python from abstractcore.processing import BasicSummarizer, BasicExtractor, BasicJudge # Use programmatically summarizer = BasicSummarizer() summary = summarizer.summarize(text, style="executive", length="brief") extractor = BasicExtractor() kg = extractor.extract(text, output_format="jsonld") judge = BasicJudge() assessment = judge.evaluate(text, 
context="code review", focus="error handling") ``` --- ## HTTP Server Comprehensive OpenAI-compatible REST API with advanced routing and agent CLI integration: ### Quick Start ```bash # Start server with default configuration uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000 # Start with custom configuration uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000 --workers 4 --reload # Production deployment gunicorn abstractcore.server.app:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000 ``` ### Complete Endpoint Reference #### Core Endpoints ```bash # Chat completions (OpenAI-compatible) POST /v1/chat/completions # Text completions POST /v1/completions # Embeddings POST /v1/embeddings # OpenAI Responses API (100% Compatible) POST /v1/responses # Provider management GET /providers GET /providers/{provider_name} GET /providers/{provider_name}/models # Health and status GET /health GET /status GET /metrics ``` #### Provider Discovery API ```bash # Get all available providers with comprehensive metadata curl http://localhost:8000/providers # Get specific provider information curl http://localhost:8000/providers/openai # Get models for specific provider curl http://localhost:8000/providers/ollama/models # Provider status with error details curl http://localhost:8000/providers/status ``` ### OpenAI-Compatible Usage Drop-in replacement for OpenAI API with any provider: ```python import openai # Standard OpenAI client configuration client = openai.OpenAI( base_url="http://localhost:8000/v1", api_key="unused" # API key not required for local providers ) # Route to any provider using model format: provider/model response = client.chat.completions.create( model="ollama/qwen3-coder:30b", # Ollama provider messages=[{"role": "user", "content": "Hello!"}] ) # Alternative: Use anthropic provider response = client.chat.completions.create( model="anthropic/claude-3-5-haiku-latest", # Anthropic provider messages=[{"role": "user", "content": "Write code"}], max_tokens=8000, temperature=0.7 ) # Embeddings with any provider embeddings = client.embeddings.create( model="ollama/granite-embedding:278m", input=["Hello", "World"] ) ``` ### Multimodal Requests (Files, Images, Documents) AbstractCore server provides comprehensive support for multimodal requests through multiple compatible formats: #### 1. AbstractCore @filename Syntax Convenient syntax that works across all providers: ```python import openai client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused") # Simple file attachment response = client.chat.completions.create( model="openai/gpt-4o", messages=[{"role": "user", "content": "Analyze @report.pdf and @chart.png"}] ) # Works with any provider response = client.chat.completions.create( model="anthropic/claude-3-5-sonnet", messages=[{"role": "user", "content": "Summarize @document.docx"}] ) ``` #### 2. OpenAI Vision API Format (with NEW type="file" Support) Standard OpenAI multimodal format with enhanced file support: ```python # Image with URL response = client.chat.completions.create( model="ollama/qwen2.5vl:7b", messages=[{ "role": "user", "content": [ {"type": "text", "text": "What is in this image?"}, { "type": "image_url", "image_url": {"url": "https://example.com/image.jpg"} } ] }] ) # NEW: Explicit file type for documents (PDF, DOCX, XLSX, CSV, etc.) 
response = client.chat.completions.create( model="openai/gpt-4o", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Analyze this document"}, { "type": "file", "file_url": {"url": "https://example.com/report.pdf"} } ] }] ) # Mixed media (images, documents, data files) response = client.chat.completions.create( model="anthropic/claude-3.5-sonnet", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Compare this chart with the data"}, {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}, {"type": "file", "file_url": {"url": "https://example.com/data.csv"}} ] }] ) # Document with base64 data (PDF, DOCX, CSV, etc.) with open("report.pdf", "rb") as f: pdf_data = base64.b64encode(f.read()).decode() response = client.chat.completions.create( model="lmstudio/qwen/qwen3-next-80b", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Summarize this document"}, {"type": "file", "file_url": {"url": f"data:application/pdf;base64,{pdf_data}"}} ] }] ) # Image with base64 data import base64 with open("image.jpg", "rb") as f: image_data = base64.b64encode(f.read()).decode() response = client.chat.completions.create( model="openai/gpt-4o", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Describe this image"}, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"} } ] }] ) ``` #### 3. OpenAI File Format (Forward-Compatible) AbstractCore supports OpenAI's planned file format with simplified structure (consistent with image_url): ```python # HTTP/HTTPS URLs response = client.chat.completions.create( model="anthropic/claude-3-5-haiku-latest", messages=[{ "role": "user", "content": [ {"type": "text", "text": "What are the key findings in this report?"}, { "type": "file", "file_url": { "url": "https://example.com/documents/report.pdf" } } ] }] ) # Local file paths response = client.chat.completions.create( model="openai/gpt-4o", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Analyze this local spreadsheet"}, { "type": "file", "file_url": { "url": "/Users/username/documents/spreadsheet.xlsx" } } ] }] ) # Base64 data URLs import base64 with open("report.pdf", "rb") as f: file_data = base64.b64encode(f.read()).decode() response = client.chat.completions.create( model="ollama/qwen3:4b", messages=[{ "role": "user", "content": [ {"type": "text", "text": "What's in this document?"}, { "type": "file", "file_url": { "url": f"data:application/pdf;base64,{file_data}" } } ] }] ) ``` **Perfect Consistency with image_url:** ```python # Images and files use identical URL pattern content = [ {"type": "text", "text": "Compare these"}, { "type": "image_url", "image_url": {"url": "https://example.com/chart.png"} }, { "type": "file", "file_url": {"url": "https://example.com/data.xlsx"} } ] ``` #### Supported File Types - **Images**: PNG, JPEG, GIF, WEBP, BMP, TIFF - **Documents**: PDF, DOCX, XLSX, PPTX - **Data/Text**: CSV, TSV, TXT, MD, JSON, XML - **Size Limits**: 10MB per file, 32MB total per request ### OpenAI Responses API (/v1/responses) AbstractCore 2.5.0 introduces 100% OpenAI-compatible `/v1/responses` endpoint with native `input_file` support: #### Why Use /v1/responses? 
- **OpenAI Compatible**: Drop-in replacement for OpenAI's Responses API - **Native File Support**: `input_file` type designed specifically for document attachments - **Cleaner API**: Explicit separation between text (`input_text`) and files (`input_file`) - **Backward Compatible**: Existing `messages` format still works alongside new `input` format - **Optional Streaming**: Streaming opt-in with `"stream": true` (defaults to `false`) #### OpenAI Responses API Format ```python import requests # Standard OpenAI Responses API format response = requests.post( "http://localhost:8000/v1/responses", json={ "model": "gpt-4o", "input": [ { "role": "user", "content": [ {"type": "input_text", "text": "Analyze this document"}, {"type": "input_file", "file_url": "https://example.com/report.pdf"} ] } ], "stream": False # Optional streaming (defaults to False) } ) # Works with any provider response = requests.post( "http://localhost:8000/v1/responses", json={ "model": "lmstudio/qwen/qwen3-next-80b", "input": [ { "role": "user", "content": [ { "type": "input_text", "text": "Analyze the letter and provide a summary of the key points." }, { "type": "input_file", "file_url": "https://www.berkshirehathaway.com/letters/2024ltr.pdf" } ] } ] } ) # Streaming mode (opt-in) response = requests.post( "http://localhost:8000/v1/responses", json={ "model": "anthropic/claude-3.5-sonnet", "input": [ { "role": "user", "content": [ {"type": "input_text", "text": "Summarize this report"}, {"type": "input_file", "file_url": "https://example.com/report.pdf"} ] } ], "stream": True # Enable real-time streaming }, stream=True # Important for streaming responses ) for line in response.iter_lines(): if line: print(line.decode()) ``` #### Legacy Format (Still Supported) The endpoint automatically detects and supports the legacy `messages` format: ```python # Legacy format (backward compatible) response = requests.post( "http://localhost:8000/v1/responses", json={ "model": "openai/gpt-4", "messages": [ {"role": "user", "content": "Tell me a story"} ], "stream": False } ) ``` #### Automatic Format Detection The server automatically detects which format you're using: - **OpenAI Format**: Presence of `input` field → converts to internal format - **Legacy Format**: Presence of `messages` field → processes directly - **Error**: Missing both `input` and `messages` → returns 400 error with clear message #### Supported Media Types in input_file All file types supported via URL, local path, or base64: ```python # PDF from URL {"type": "input_file", "file_url": "https://example.com/report.pdf"} # Excel from local path {"type": "input_file", "file_url": "/path/to/spreadsheet.xlsx"} # CSV from base64 {"type": "input_file", "file_url": "data:text/csv;base64,RGF0ZSxQcm9kdW..."} # PowerPoint from URL {"type": "input_file", "file_url": "https://example.com/presentation.pptx"} ``` #### Complete Example with Multiple Files ```python response = requests.post( "http://localhost:8000/v1/responses", json={ "model": "openai/gpt-4o", "input": [ { "role": "user", "content": [ { "type": "input_text", "text": "Compare the financial data in the spreadsheet with the PDF report and the chart image" }, {"type": "input_file", "file_url": "https://example.com/financial_data.xlsx"}, {"type": "input_file", "file_url": "https://example.com/annual_report.pdf"}, {"type": "input_file", "file_url": "https://example.com/quarterly_chart.png"} ] } ], "max_tokens": 2000, "temperature": 0.7, "stream": False } ) print(response.json()["choices"][0]["message"]["content"]) ``` ####
Mixed Content Example Combine multiple file types in a single request: ```python # Multiple files with different formats response = client.chat.completions.create( model="openai/gpt-4o", messages=[{ "role": "user", "content": [ {"type": "text", "text": "Compare the chart with the spreadsheet data and summarize the document"}, { "type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoA..."} }, { "type": "file", "file_url": { "url": "https://example.com/data/spreadsheet.xlsx" } }, { "type": "file", "file_url": { "url": "/Users/username/documents/summary.pdf" } } ] }] ) ``` ### Advanced Server Features #### Request ID Tracking and Structured Logging ```python import requests import json # All requests include tracking response = requests.post( "http://localhost:8000/v1/chat/completions", json={ "model": "ollama/qwen3-coder:30b", "messages": [{"role": "user", "content": "Hello"}], "stream": False }, headers={"X-Request-ID": "my-custom-id"} # Optional custom ID ) # Response includes request tracking data = response.json() print(f"Request ID: {data.get('request_id')}") print(f"Provider: {data.get('provider')}") print(f"Model: {data.get('model')}") print(f"Duration: {data.get('duration_ms')}ms") ``` #### Agent CLI Integration Through Syntax Conversion The server automatically handles tool call format conversion for different agent CLIs: ```python # Request with tool calls for Codex CLI response = requests.post( "http://localhost:8000/v1/chat/completions", json={ "model": "ollama/qwen3-coder:30b", "messages": [{"role": "user", "content": "Use the calculator"}], "tools": [ { "type": "function", "function": { "name": "calculate", "description": "Perform calculations", "parameters": { "type": "object", "properties": { "expression": {"type": "string"} } } } } ], "tool_format": "codex" # Server converts to Qwen3 format } ) # Tool calls are automatically converted to requested format ``` #### Streaming Support ```python import requests # Streaming chat completions response = requests.post( "http://localhost:8000/v1/chat/completions", json={ "model": "ollama/qwen3-coder:30b", "messages": [{"role": "user", "content": "Write a story"}], "stream": True }, stream=True ) # Process streaming response for line in response.iter_lines(): if line: data = json.loads(line.decode('utf-8').replace('data: ', '')) if data.get('choices'): content = data['choices'][0]['delta'].get('content', '') print(content, end='', flush=True) ``` ### Server Configuration #### Environment Variables ```bash # Server configuration export ABSTRACTCORE_HOST=0.0.0.0 export ABSTRACTCORE_PORT=8000 export ABSTRACTCORE_WORKERS=4 # Provider configuration export OPENAI_API_KEY=your_openai_key export ANTHROPIC_API_KEY=your_anthropic_key export HUGGINGFACE_API_TOKEN=your_hf_token # Server features export ABSTRACTCORE_ENABLE_CORS=true export ABSTRACTCORE_MAX_REQUEST_SIZE=10485760 # 10MB export ABSTRACTCORE_REQUEST_TIMEOUT=300 # 5 minutes export ABSTRACTCORE_ENABLE_METRICS=true ``` #### Custom Configuration ```python # custom_server.py from abstractcore.server.app import app from abstractcore.server.config import ServerConfig # Custom server configuration config = ServerConfig( host="0.0.0.0", port=8000, enable_cors=True, max_request_size=10 * 1024 * 1024, # 10MB request_timeout=300, # 5 minutes enable_metrics=True, log_level="INFO", provider_timeout=60, # Provider timeout default_provider="ollama", # Default provider enable_tool_conversion=True # Enable tool format conversion ) if __name__ == "__main__": import uvicorn 
uvicorn.run(app, host=config.host, port=config.port) ``` ### Production Deployment #### Docker Deployment ```dockerfile # Dockerfile FROM python:3.11-slim WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . EXPOSE 8000 CMD ["uvicorn", "abstractcore.server.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"] ``` #### Kubernetes Deployment ```yaml # k8s-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: abstractcore-server spec: replicas: 3 selector: matchLabels: app: abstractcore-server template: metadata: labels: app: abstractcore-server spec: containers: - name: abstractcore-server image: abstractcore:latest ports: - containerPort: 8000 env: - name: ABSTRACTCORE_WORKERS value: "4" - name: ABSTRACTCORE_ENABLE_METRICS value: "true" --- apiVersion: v1 kind: Service metadata: name: abstractcore-service spec: selector: app: abstractcore-server ports: - port: 80 targetPort: 8000 type: LoadBalancer ``` ### Monitoring and Observability ```python # Access server metrics response = requests.get("http://localhost:8000/metrics") metrics = response.json() print(f"Total requests: {metrics['total_requests']}") print(f"Active connections: {metrics['active_connections']}") print(f"Average response time: {metrics['avg_response_time_ms']}ms") print(f"Provider status: {metrics['providers']}") ``` ### Benefits - **Drop-in OpenAI Replacement**: Use any provider through familiar OpenAI API - **Agent CLI Compatible**: Automatic tool call format conversion - **Production Ready**: Request tracking, metrics, health checks - **Multi-Provider Routing**: Route to any provider using model prefix - **Comprehensive Logging**: Structured logs with request tracing - **Scalable**: Supports horizontal scaling with load balancers ## Production Resilience Enterprise-grade error handling, retries, and circuit breaker patterns for robust applications: ### RetryManager and Circuit Breaker ```python from abstractcore import create_llm from abstractcore.resilience import RetryManager, CircuitBreaker # Configure retry strategy retry_manager = RetryManager( max_attempts=3, # Maximum retry attempts backoff_strategy="exponential", # exponential, linear, fixed base_delay=1.0, # Base delay in seconds max_delay=30.0, # Maximum delay between retries jitter=True, # Add randomness to prevent thundering herd retry_exceptions=[ # Which exceptions to retry "ConnectionError", "TimeoutError", "RateLimitError" ] ) # Configure circuit breaker circuit_breaker = CircuitBreaker( failure_threshold=5, # Failures before opening circuit success_threshold=3, # Successes needed to close circuit timeout=60, # Seconds to wait before trying again monitor_window=300 # Time window for failure counting ) # Create LLM with resilience features llm = create_llm( "openai", model="gpt-4o-mini", retry_manager=retry_manager, circuit_breaker=circuit_breaker ) # Resilient generation with automatic retries response = llm.generate("What is the capital of France?") # Automatically retries on transient failures ``` ### Recovery Timeout Management ```python from abstractcore.resilience import TimeoutManager # Configure comprehensive timeout strategy timeout_manager = TimeoutManager( connection_timeout=10, # Connection establishment timeout read_timeout=60, # Response read timeout total_timeout=120, # Total request timeout recovery_timeout=30, # Time to wait before retry after failure adaptive_timeout=True, # Adjust timeouts based on response patterns timeout_multiplier=1.5 # Multiplier for progressive timeout 
increases ) # Apply to LLM llm = create_llm( "anthropic", model="claude-3-5-haiku-latest", timeout_manager=timeout_manager ) # Session-level timeout management from abstractcore import BasicSession session = BasicSession(llm) session.set_timeout(45) # Default timeout for all operations session.set_tool_timeout(120) # Specific timeout for tool execution session.set_recovery_timeout(60) # Recovery timeout after failures # Get current timeout settings timeouts = session.get_timeout_config() print(f"Generation timeout: {timeouts['generation']}") print(f"Tool timeout: {timeouts['tool']}") print(f"Recovery timeout: {timeouts['recovery']}") ``` ### Automatic Backoff Strategies ```python from abstractcore.resilience import BackoffStrategy # Exponential backoff with jitter exponential_backoff = BackoffStrategy( strategy="exponential", base_delay=1.0, max_delay=60.0, multiplier=2.0, jitter=True, # Adds ±25% randomness jitter_range=0.25 ) # Linear backoff linear_backoff = BackoffStrategy( strategy="linear", base_delay=2.0, increment=1.5, max_delay=30.0 ) # Custom backoff function def custom_backoff(attempt, context): """Custom backoff based on error type and provider load.""" if context.error_type == "RateLimitError": return min(60, 2 ** attempt) # Exponential for rate limits elif context.provider_load > 0.8: return 5 + attempt * 2 # Linear increase when provider busy else: return 1 # Fast retry for other errors custom_backoff_strategy = BackoffStrategy( strategy="custom", custom_function=custom_backoff ) # Apply backoff strategy llm = create_llm( "ollama", model="qwen3-coder:30b", backoff_strategy=exponential_backoff ) ``` ### Exception Hierarchy and Error Handling ```python from abstractcore.exceptions import ( AbstractCoreError, # Base exception ProviderAPIError, # Provider-specific API errors ModelNotFoundError, # Model not available AuthenticationError, # Invalid credentials RateLimitError, # Rate limit exceeded TimeoutError, # Request timeout ValidationError, # Input validation failed TokenLimitError, # Token limit exceeded CircuitBreakerError # Circuit breaker open ) try: response = llm.generate("Complex request") except RateLimitError as e: print(f"Rate limited: {e.retry_after} seconds") print(f"Provider: {e.provider}") print(f"Current usage: {e.current_usage}") except TokenLimitError as e: print(f"Token limit exceeded: {e.token_count}/{e.token_limit}") print(f"Suggestion: {e.suggestion}") except CircuitBreakerError as e: print(f"Service unavailable: {e.estimated_recovery_time}") print(f"Failure count: {e.failure_count}") except ProviderAPIError as e: print(f"Provider error: {e.provider_message}") print(f"Error code: {e.error_code}") print(f"Retryable: {e.is_retryable}") except AbstractCoreError as e: print(f"AbstractCore error: {e}") print(f"Context: {e.context}") ``` ### Health Checks and Monitoring ```python from abstractcore.resilience import HealthChecker # Configure health monitoring health_checker = HealthChecker( check_interval=30, # Health check every 30 seconds failure_threshold=3, # Mark unhealthy after 3 failures success_threshold=2, # Mark healthy after 2 successes check_timeout=10, # Timeout for health checks checks=[ "provider_connectivity", "model_availability", "token_limits", "response_latency" ] ) # Monitor provider health llm = create_llm( "openai", model="gpt-4o-mini", health_checker=health_checker ) # Check current health status health_status = llm.get_health_status() print(f"Provider healthy: {health_status['healthy']}") print(f"Last check: 
{health_status['last_check']}") print(f"Response time: {health_status['avg_response_time']}ms") print(f"Success rate: {health_status['success_rate']}%") # Automatic failover based on health if not health_status['healthy']: # Automatically switch to backup provider backup_llm = create_llm("anthropic", model="claude-3-5-haiku-latest") response = backup_llm.generate("Fallback request") ``` ### Production Configuration Patterns ```python # Production-ready configuration from abstractcore import create_llm from abstractcore.resilience import ProductionConfig # Pre-configured production settings production_config = ProductionConfig( # Retry configuration max_retries=3, retry_delay=1.0, retry_backoff=2.0, # Circuit breaker circuit_failure_threshold=5, circuit_timeout=60, # Timeouts connection_timeout=10, read_timeout=60, total_timeout=120, # Health monitoring health_check_interval=30, health_failure_threshold=3, # Error handling log_all_errors=True, alert_on_circuit_open=True, metrics_collection=True ) # Apply production configuration llm = create_llm( "openai", model="gpt-4o-mini", production_config=production_config ) # Session with production resilience session = BasicSession( llm, production_config=production_config, auto_recovery=True, # Automatically recover from failures persistence_enabled=True, # Persist conversation through failures checkpoint_interval=10 # Save checkpoint every 10 messages ) ``` ### Graceful Degradation ```python from abstractcore.resilience import GracefulDegradation # Configure fallback strategies degradation = GracefulDegradation( fallback_providers=["anthropic", "ollama"], # Fallback order reduce_quality=True, # Use faster/smaller models if needed cache_responses=True, # Cache for offline scenarios offline_mode=True # Enable offline capabilities ) # LLM with graceful degradation llm = create_llm( "openai", model="gpt-4o", graceful_degradation=degradation ) # Automatically handles provider failures try: response = llm.generate("Complex analysis task") # Will automatically: # 1. Retry with exponential backoff # 2. Switch to Anthropic if OpenAI fails # 3. Switch to Ollama if both cloud providers fail # 4. 
Use cached response if all providers fail except Exception as e: print(f"All fallback strategies exhausted: {e}") ``` ### Benefits - **Enterprise Reliability**: Circuit breakers and retry mechanisms prevent cascading failures - **Intelligent Backoff**: Prevents overwhelming providers with failed requests - **Automatic Recovery**: Self-healing systems with minimal manual intervention - **Comprehensive Monitoring**: Health checks and metrics for proactive issue detection - **Graceful Degradation**: Maintains functionality even during partial system failures - **Production Ready**: Battle-tested patterns for high-availability applications ## Debug Capabilities and Self-Healing JSON Robust debugging and error recovery features: ```bash # Debug raw LLM responses for troubleshooting judge document.txt --debug --provider lmstudio --model qwen/qwen3-next-80b # Automatic JSON self-repair handles truncated/malformed responses # Uses self_fixes.py for intelligent error recovery # Increased token limits prevent truncation: max_tokens=32k, max_output_tokens=8k # Focus areas for targeted evaluation (new --focus parameter) judge README.md --focus "architectural diagrams, technical comparisons" --debug ``` **Key Features:** - **Self-Healing JSON**: Automatically repairs truncated or malformed JSON responses - **Debug Mode**: `--debug` flag shows raw LLM responses for troubleshooting - **Focus Areas**: `--focus` parameter for targeted evaluation (replaces `--custom-criteria`) - **Increased Token Limits**: Default `max_tokens=32000`, `max_output_tokens=8000` prevent truncation - **Consistent CLI Syntax**: All apps use space syntax (`--param value`) for consistency ### Complete CLI Parameters Reference #### Extractor Parameters ```bash # Core parameters --focus FOCUS # Specific focus area (e.g., "technology", "business") --format {json-ld,triples,json,yaml} # Output format --entity-types TYPES # Comma-separated types (person,organization,location,etc.) --output OUTPUT # Output file path # Performance & Quality --mode {fast,balanced,thorough} # Extraction mode (balanced=default) --iterate N # Refinement iterations (default: 1) --similarity-threshold 0.0-1.0 # Entity deduplication threshold (default: 0.85) --no-embeddings # Disable semantic deduplication --minified # Compact JSON output # LLM Configuration --provider PROVIDER --model MODEL # Custom LLM provider/model --max-tokens 32000 # Context window (default: 32000) --max-output-tokens 8000 # Output tokens (default: 8000) --timeout 300 # HTTP timeout seconds (default: 300) --chunk-size 8000 # Chunk size for large files ``` #### Judge Parameters ```bash # Evaluation Configuration --criteria CRITERIA # Standard criteria (clarity,soundness,etc.) 
--focus FOCUS # Primary focus areas for evaluation --context CONTEXT # Evaluation context description --reference FILE_OR_TEXT # Reference content for comparison # Output & Debug --format {json,plain,yaml} # Output format (default: json) --debug # Show raw LLM responses --include-criteria # Include detailed criteria explanations --exclude-global # Skip global assessment for multiple files # LLM Configuration --temperature 0.1 # Evaluation consistency (default: 0.1) --max-tokens 32000 # Context window (default: 32000) --max-output-tokens 8000 # Output tokens (default: 8000) --timeout 300 # HTTP timeout seconds ``` #### Summarizer Parameters ```bash # Content Configuration --style {structured,narrative,objective,analytical,executive,conversational} --length {brief,standard,detailed,comprehensive} --focus FOCUS # Specific focus area --chunk-size 8000 # Chunk size for large files (max: 32000) # Output & Performance --output OUTPUT # Output file path --max-tokens 32000 # Context window (default: 32000) --max-output-tokens 8000 # Output tokens (default: 8000) --verbose # Show detailed progress ``` --- ## Documentation Index ### Essential Guides - **[Getting Started](docs/getting-started.md)**: 5-minute setup and first LLM call - **[API Reference](docs/api-reference.md)**: Complete Python API with examples - **[Centralized Configuration](docs/centralized-config.md)**: Global defaults, app preferences, and configuration management - **[Media Handling System](docs/media-handling-system.md)**: Universal file attachment and processing across all providers - **[Vision Capabilities](docs/vision-capabilities.md)**: Image analysis across 7 providers with automatic optimization - **[Tool Calling](docs/tool-calling.md)**: Universal @tool decorator system - **[Session Management](docs/session.md)**: Persistent conversations with analytics ### CLI Applications - **[Summarizer](docs/apps/basic-summarizer.md)**: Document summarization (`summarizer doc.pdf`) - **[Extractor](docs/apps/basic-extractor.md)**: Knowledge graph extraction (`extractor report.txt`) - **[Judge](docs/apps/basic-judge.md)**: Text evaluation (`judge essay.txt`) ### Advanced Topics - **[Server Guide](docs/server.md)**: OpenAI-compatible HTTP API - **[Embeddings](docs/embeddings.md)**: Semantic search and RAG - **[Architecture](docs/architecture.md)**: System design overview with media processing - **Error Handling**: Comprehensive exception hierarchy for robust applications - **Token Management**: Centralized token counting, cost estimation, and management utilities (`abstractcore.utils.token_utils`) ### Examples & Support - **[Examples](examples/)**: Progressive tutorials and real-world patterns - **[Troubleshooting](docs/troubleshooting.md)**: Common issues and solutions - **[GitHub Issues](https://github.com/lpalbou/AbstractCore/issues)**: Report bugs and get help --- ## Quick Links - **GitHub**: https://github.com/lpalbou/AbstractCore - **PyPI**: `pip install abstractcore[all]` - **License**: MIT | **Python**: 3.9+ | **Status**: Production Ready **For Developers and Architects:** - **Provider Agnostic**: Switch between OpenAI, Anthropic, Ollama, and others without code changes - **Production Ready**: Enterprise-grade error handling, retries, and resilience patterns - **Universal Tool Calling**: Enable function calling on any provider, even those without native support - **Media Processing**: Attach any file type (PDFs, images, Office docs) with simple `@filename` syntax - **Built-in Apps**: Skip boilerplate with ready-to-use CLI tools 
for common document processing tasks.
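To close, here is a minimal sketch of the provider-agnostic pattern highlighted above: the same `generate()` call works whether the model runs at OpenAI, Anthropic, or a local Ollama instance. It assumes the relevant API keys are configured and/or a local Ollama server is running.

```python
from abstractcore import create_llm

# Same call pattern regardless of provider; only the factory arguments change.
# Assumes OPENAI_API_KEY / ANTHROPIC_API_KEY are set and Ollama is available locally.
for provider, model in [
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-3-5-haiku-latest"),
    ("ollama", "qwen3-coder:30b"),
]:
    llm = create_llm(provider, model=model)
    response = llm.generate("Summarize AbstractCore in one sentence.")
    print(f"{provider}: {response}")
```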