Media Handling System
Attach any file type to your LLM requests. One simple API works across all providers with intelligent processing and graceful fallback.
Overview
AbstractCore provides a production-ready unified media handling system that enables seamless file attachment and processing across all LLM providers. The system automatically processes images, documents, and other media files (including audio/video inputs via explicit policies) using the same simple API, with intelligent provider-specific formatting and graceful fallback handling.
Key Features
- Universal API - Same media=[] parameter works across all providers
- CLI Integration - Simple @filename syntax for instant file attachment
- Intelligent Processing - Automatic file type detection with specialized processors
- Provider Adaptation - Automatic formatting for each provider's API requirements
- Robust Fallback - Graceful degradation when advanced processing fails
- Cross-Format Support - Images, PDFs, Office docs, audio/video, CSV/TSV all work (policy-driven for audio/video)
Quick Start
Python API
from abstractcore import create_llm
# Works with any provider - just change the provider name
llm = create_llm("openai", model="gpt-4o", api_key="your-key")
response = llm.generate(
"What's in this image and document?",
media=["photo.jpg", "report.pdf"]
)
# Same code works with any provider
llm = create_llm("anthropic", model="claude-haiku-4-5")
response = llm.generate(
"Analyze these materials",
media=["chart.png", "data.csv", "presentation.pptx"]
)
CLI Integration
Use the simple @filename syntax to attach any file type:
# PDF Analysis
python -m abstractcore.utils.cli --prompt "What is this document about? @report.pdf"
# Office Documents
python -m abstractcore.utils.cli --prompt "Summarize this presentation @slides.pptx"
python -m abstractcore.utils.cli --prompt "What data is in @spreadsheet.xlsx"
python -m abstractcore.utils.cli --prompt "Analyze this document @contract.docx"
# Data Files
python -m abstractcore.utils.cli --prompt "What patterns are in @sales_data.csv"
# Images
python -m abstractcore.utils.cli --prompt "What's in this image? @screenshot.png"
# Mixed Media
python -m abstractcore.utils.cli --prompt "Compare @chart.png and @data.csv and explain trends"
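Conceptually, the CLI pulls @filename tokens out of the prompt before the remaining text is sent to the model. A minimal illustration of that idea (not AbstractCore's actual parser; the regex and function name here are assumptions for illustration):

```python
import re

def extract_attachments(prompt: str) -> tuple[str, list[str]]:
    """Split a prompt into clean text and a list of @filename attachments.

    Illustrative only: the real CLI parser may handle quoting,
    spaces in paths, and escaping differently.
    """
    pattern = re.compile(r"@([\w./-]+)")
    files = pattern.findall(prompt)
    text = pattern.sub("", prompt).strip()
    return text, files

text, files = extract_attachments("Compare @chart.png and @data.csv and explain trends")
print(files)  # ['chart.png', 'data.csv']
```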
Supported File Types
Images (Vision Models)
- Formats: PNG, JPEG, GIF, WEBP, BMP, TIFF
- Features: Automatic optimization, resizing, format conversion, EXIF handling
- Max Size: Oversized images are automatically resized for optimal model performance
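Automatic resizing typically preserves aspect ratio while capping the longest side. A minimal sketch of that computation (the 1568-pixel cap is an arbitrary example, not AbstractCore's configured limit):

```python
def fit_within(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is <= max_side,
    preserving aspect ratio. Returns the original size if already small enough."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(fit_within(4000, 3000))  # (1568, 1176)
```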
Documents
- PDF: Full text extraction with PyMuPDF4LLM, preserves formatting and structure
- Word (DOCX): Full document analysis with structure preservation
- Excel (XLSX): Sheet-by-sheet extraction with data analysis
- PowerPoint (PPTX): Slide content extraction with comprehensive analysis
Data Files
- Text Files: TXT, MD with intelligent parsing
- Data: CSV, TSV with data analysis
- Structured: JSON with intelligent parsing
Audio (policy-driven)
- Formats: MP3, WAV, M4A, OGG, FLAC, AAC
- Default policy: audio_policy="native_only" (fails unless the selected model supports native audio input)
- Speech-to-text fallback: audio_policy="speech_to_text" injects a transcript (typically requires pip install abstractvoice)
- Auto: audio_policy="auto" uses native audio when supported, otherwise STT when configured, otherwise errors
- Reserved: audio_policy="caption" is not configured in v0 (must error)
- Transparency: response.metadata.media_enrichment[] records what was injected and which backend was used
- See: Audio & Voice (STT/TTS)
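The policy semantics above can be sketched as a small decision function (an illustration of the documented behavior, not AbstractCore's internal code):

```python
def resolve_audio_policy(policy: str, native_supported: bool, stt_configured: bool) -> str:
    """Return how an audio file would be handled under a given policy.

    Mirrors the documented semantics: "native_only" and "speech_to_text"
    are strict, "auto" prefers native and falls back to STT, and
    "caption" is reserved in v0 (must error).
    """
    if policy == "native_only":
        if native_supported:
            return "native"
        raise ValueError("model does not support native audio input")
    if policy == "speech_to_text":
        if stt_configured:
            return "stt"
        raise ValueError("no speech-to-text backend configured")
    if policy == "auto":
        if native_supported:
            return "native"
        if stt_configured:
            return "stt"
        raise ValueError("no native audio support and no STT configured")
    raise ValueError(f"unsupported audio_policy: {policy!r}")

print(resolve_audio_policy("auto", native_supported=False, stt_configured=True))  # stt
```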
Video (policy-driven)
- Formats: MP4, MOV, MKV, WEBM, AVI, WMV
- Default policy: video_policy="auto" (native when supported; otherwise samples frames and routes them through vision/image handling)
- Frames fallback: video_policy="frames_caption" always samples frames (portable across providers)
- Strict: video_policy="native_only" fails unless the selected model supports native video input
- Budgets: frame count + downscale are explicit and logged (video_max_frames, video_max_frame_side)
- Note: frame extraction may require ffmpeg/ffprobe on your PATH
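When frames are sampled, a fixed frame budget has to be spread across the clip's duration. A minimal sketch of evenly spaced sampling under a video_max_frames-style budget (illustrative only; AbstractCore's actual sampling strategy may differ):

```python
def sample_frame_times(duration_s: float, max_frames: int) -> list[float]:
    """Pick up to max_frames timestamps spread evenly across the clip,
    sampling at the midpoint of each equal-length segment."""
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round((i + 0.5) * step, 3) for i in range(max_frames)]

print(sample_frame_times(10.0, 4))  # [1.25, 3.75, 6.25, 8.75]
```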
How It Works
The media system uses a sophisticated multi-layer architecture:
- File Attachment Processing - CLI @filename syntax and Python media=[] parameter
- Intelligent Processing - AutoMediaHandler selects appropriate processors (Image, PDF, Office, Text, Audio, Video)
- Provider Formatting - Same content formatted differently for each provider's API
- Graceful Fallback - Multi-level fallback ensures users always get meaningful results
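The processor-selection step can be illustrated with a simple extension-based dispatch (a conceptual sketch; AutoMediaHandler's real detection is more sophisticated, and this mapping is illustrative, not exhaustive):

```python
from pathlib import Path

# Illustrative extension -> processor mapping (not AbstractCore's actual table)
PROCESSORS = {
    ".png": "ImageProcessor", ".jpg": "ImageProcessor",
    ".pdf": "PDFProcessor",
    ".docx": "OfficeProcessor", ".xlsx": "OfficeProcessor", ".pptx": "OfficeProcessor",
    ".txt": "TextProcessor", ".csv": "TextProcessor",
    ".wav": "AudioProcessor", ".mp3": "AudioProcessor",
    ".mp4": "VideoProcessor",
}

def select_processor(path: str) -> str:
    """Pick a processor name by file extension, case-insensitively."""
    ext = Path(path).suffix.lower()
    try:
        return PROCESSORS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext}") from None

print(select_processor("report.PDF"))  # PDFProcessor
```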
Provider-Specific Formatting Example
AbstractCore automatically formats the same content differently for each provider:
# OpenAI Format (JSON)
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze these files"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}},
{"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
]
}
# Anthropic Format (Messages API)
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze these files"},
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0..."}},
{"type": "text", "text": "PDF Content: # Report Title\n\nExecutive Summary..."}
]
}
Text-only models and vision fallback
For vision-capable models (for example gpt-4o), AbstractCore sends native multimodal message blocks.
For text-only models, AbstractCore can optionally run a configured vision backend to produce short grounded observations and inject them as text:
- Vision fallback (recommended): route images through your configured vision model and inject a short description
- No fallback configured: the model will receive a minimal placeholder marker, and media_enrichment will record that enrichment was skipped
In both cases, you can inspect response.metadata.media_enrichment for transparency about what was injected or skipped.
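For example, you might summarize the enrichment records into audit-log lines after each request. The field names inside each record below are assumptions for illustration; inspect response.metadata.media_enrichment on a real response to see the exact shape:

```python
# Hypothetical enrichment records, shaped for illustration only -
# check response.metadata.media_enrichment on a real response.
media_enrichment = [
    {"file": "call.wav", "action": "speech_to_text", "backend": "abstractvoice"},
    {"file": "photo.jpg", "action": "skipped", "backend": None},
]

def summarize_enrichment(records: list[dict]) -> list[str]:
    """One human-readable line per media file, for audit logs."""
    return [
        f"{r['file']}: {r['action']}" + (f" via {r['backend']}" if r["backend"] else "")
        for r in records
    ]

for line in summarize_enrichment(media_enrichment):
    print(line)
# call.wav: speech_to_text via abstractvoice
# photo.jpg: skipped
```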
Media with conversation history (messages)
If you're supplying OpenAI-style messages=[...], keep prompt="" to avoid duplicating the final user turn.
AbstractCore will attach media to the most recent user message.
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate(
prompt="",
messages=[{"role": "user", "content": "What's in this image?"}],
media=["photo.jpg"],
)
print(resp.content)
Audio inputs (speech-to-text)
Audio is handled via an explicit policy to avoid silently changing semantics. For speech audio, use
audio_policy="speech_to_text" (requires pip install abstractvoice).
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate(
"Summarize this call.",
media=["./call.wav"],
audio_policy="speech_to_text",
)
print(resp.content)
See Audio & Voice (STT/TTS) for configuration and server endpoints.
Token Estimation & No Truncation Policy
AbstractCore processors do not silently truncate content. When available, token estimates are added to media metadata so you can make an explicit choice (summarize, chunk, error) at the application layer.
# Example (metadata shape varies by processor)
from abstractcore.media.processors import TextProcessor
processor = TextProcessor()
result = processor.process_file("data.csv")
media_content = result.media_content
print(media_content.metadata.get("estimated_tokens"))
print(media_content.metadata.get("content_length"))
For large files that exceed model context limits, use a summarizer workflow or implement custom chunking.
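A simple character-based chunker is often a good starting point for such a workflow (the ~4 characters-per-token ratio is a rough heuristic for English text, not AbstractCore's estimator):

```python
def chunk_text(text: str, max_tokens: int = 2000, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that each fit a rough token budget,
    breaking on paragraph boundaries where possible."""
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(f"Paragraph {i}: " + "x" * 500 for i in range(40))
print(len(chunk_text(doc, max_tokens=2000)))  # 3
```

Each chunk can then be summarized separately and the summaries combined in a final request.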
Common Use Cases
Document Analysis
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o")
# Analyze PDF
response = llm.generate(
"Summarize the key findings in this research paper",
media=["research_paper.pdf"]
)
# Extract data from Excel
response = llm.generate(
"What are the top 5 sales regions by revenue?",
media=["sales_report.xlsx"]
)
# Analyze PowerPoint
response = llm.generate(
"List the main talking points from this presentation",
media=["quarterly_review.pptx"]
)
Multi-File Analysis
# Compare multiple files
response = llm.generate(
"Compare the financial data across these three reports",
media=["q1_report.pdf", "q2_report.pdf", "q3_report.pdf"]
)
# Mixed media types
response = llm.generate(
"Verify that the chart matches the data in the spreadsheet",
media=["sales_chart.png", "sales_data.csv"]
)
Image Analysis with Documents
# Combine images and documents
response = llm.generate(
"Compare the architectural designs with the specifications",
media=["design1.jpg", "design2.jpg", "specifications.pdf"]
)
HTTP Server Support
The media handling system is fully integrated with the OpenAI-compatible HTTP server:
Using @filename Syntax
import openai
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Analyze @report.pdf and @chart.png"}]
)
Using OpenAI Responses API Format
import requests
response = requests.post(
"http://localhost:8000/v1/responses",
json={
"model": "gpt-4o",
"input": [
{
"role": "user",
"content": [
{"type": "input_text", "text": "Analyze this document"},
{"type": "input_file", "file_url": "https://example.com/report.pdf"}
]
}
]
}
)
Error Handling and Fallback
AbstractCore provides robust error handling with graceful degradation:
- Format Detection Failure - Falls back to basic text extraction
- Processing Errors - Returns partial content with error indication
- Unsupported Files - Clear error messages with supported format list
- Size Limits - Automatic chunking for large documents
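The multi-level fallback pattern can be sketched as trying handlers in order of fidelity (a conceptual illustration, not AbstractCore's internal implementation; the handler names are made up):

```python
def process_with_fallback(path: str, handlers) -> str:
    """Try (name, fn) handler pairs from most to least capable;
    return the first successful result, or raise with all errors collected."""
    errors = []
    for name, fn in handlers:
        try:
            return fn(path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    # Last resort: a clear error instead of silent failure
    raise RuntimeError("all processors failed: " + "; ".join(errors))

def rich_extract(path):  # hypothetical advanced processor that fails
    raise ValueError("layout parser unavailable")

def basic_text(path):  # hypothetical basic fallback
    return f"[basic text extraction of {path}]"

print(process_with_fallback("report.pdf", [("rich", rich_extract), ("basic", basic_text)]))
```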
Best Practices
- File Size - Keep individual files under 10MB for optimal performance
- Image Quality - Use high-quality images but let AbstractCore handle optimization
- Multiple Files - Limit to 5-10 files per request to avoid token limits
- File Types - Stick to supported formats for reliable processing
- Clear Prompts - Specify what you want to extract or analyze from the files
Installation
To use media handling features, install the media extras:
# Base install (media extras not included)
pip install abstractcore
# Install with media support
pip install "abstractcore[media]"
# Combine extras (zsh: keep quotes)
pip install "abstractcore[openai,media]"