Vision Compression User Guide

⚠️ EXPERIMENTAL FEATURE: Vision compression and glyph compression are currently in experimental status. The API and behavior may change in future releases.

Vision Model Requirement: These features ONLY work with vision-capable models (e.g., gpt-4o, claude-3-5-sonnet, llama3.2-vision, gemini-1.5-pro). Attempting to use compression with non-vision models will raise UnsupportedFeatureError.

Table of Contents

  1. Overview
  2. Quick Start
  3. Compression Methods
  4. Provider Optimization
  5. Advanced Usage
  6. Analytics and Monitoring
  7. Troubleshooting
  8. API Reference
  9. Best Practices
  10. Limitations

Overview

AbstractCore's Vision Compression system transforms long text documents into visual representations for vision-capable models. This can reduce token usage for long inputs, but compression ratios and quality vary significantly by content, model, and configuration.

Quick Start

Installation

# Glyph compression (Pillow renderer)
pip install "abstractcore[compression]"

Optional (experimental): Direct PDF→image conversion requires pdf2image and its system dependencies (Poppler).

Basic Glyph Compression

The simplest way to compress text using vision-based compression:

from abstractcore.compression import GlyphProcessor
from abstractcore.compression.config import GlyphConfig

# Initialize processor
config = GlyphConfig()
config.enabled = True
processor = GlyphProcessor(config=config)

# Compress text
text = "Your long document text here..."
compressed = processor.process_text(
    text,
    provider="openai",
    model="gpt-4o"
)

# Use with LLM
from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o")
response = llm.generate(
    "Summarize this document",
    media=compressed
)

What to expect:

  • Compression ratio and quality depend heavily on content, model OCR behavior, and rendering settings.
  • Treat this as a tuning problem: higher DPI/font sizes typically improve fidelity but increase image tokens.
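
As a starting point for that tuning, you can take an optimized profile and bias it toward fidelity. A minimal sketch; the dpi and font_size fields are the same ones adjusted in the Troubleshooting section, and the values here are illustrative:

from abstractcore.compression.optimizer import create_optimized_config

# Start from the provider profile, then trade image tokens for fidelity
config = create_optimized_config("openai", "gpt-4o")
config.dpi = 120       # higher DPI: sharper glyphs, more image tokens
config.font_size = 9   # larger font: easier OCR, less compression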

PDF Compression

For PDF documents, you can use the (experimental) DirectPDFProcessor:

from abstractcore.media.processors.direct_pdf_processor import DirectPDFProcessor

# Requires: `pip install pdf2image` (+ Poppler installed on your system)
processor = DirectPDFProcessor(pages_per_image=2, dpi=150)
result = processor.process_file("document.pdf")
if not result.success:
    raise RuntimeError(result.error_message)

# MediaContent (base64 PNG) you can pass to a vision-capable model:
media_image = result.media_content
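
The resulting MediaContent can then be passed to a vision-capable model the same way as glyph output (here wrapped in a list, matching the media usage in the glyph examples):

from abstractcore import create_llm

llm = create_llm("openai", model="gpt-4o")
response = llm.generate(
    "Summarize this document",
    media=[media_image]  # same media parameter as in the glyph examples
)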

Compression Methods

1. Standard Glyph Compression

Use Case: General document compression

from abstractcore.compression import GlyphProcessor

processor = GlyphProcessor()
compressed = processor.process_text(
    text,
    provider="openai",
    model="gpt-4o",
    user_preference="auto"  # or "always" to force compression
)

Characteristics:

  • Compression/quality tradeoffs vary by content, model, and rendering settings
  • No external infrastructure beyond a vision-capable model
  • Latency depends on rendering + model inference

2. Optimized Glyph Compression

Use Case: Provider-specific optimization for better compression

from abstractcore.compression.optimizer import CompressionOptimizer
from abstractcore.compression import GlyphProcessor

# Get optimized configuration
optimizer = CompressionOptimizer()
config = optimizer.get_optimized_config(
    provider="openai",
    model="gpt-4o",
    aggressive=False  # Set True for more compression
)

# `config` holds the optimized rendering settings for inspection or tuning;
# process_text applies provider-specific settings based on provider/model
processor = GlyphProcessor()
compressed = processor.process_text(text, provider="openai", model="gpt-4o")

Characteristics:

  • More aggressive provider-specific rendering defaults (when configured)
  • Compression/quality tradeoffs vary by content, model, and rendering settings
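
If you want to estimate the payoff before rendering anything, the optimizer exposes analyze_compression_potential (see the API Reference). A minimal sketch, printing the whole result since its keys are provider- and version-specific:

from abstractcore.compression.optimizer import CompressionOptimizer

optimizer = CompressionOptimizer()
potential = optimizer.analyze_compression_potential(
    text_length=len(text),  # character count of the candidate document
    provider="openai",
    model="gpt-4o"
)
print(potential)  # Dict[str, Any] with the provider-specific estimate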

3. Hybrid Compression (Experimental)

Use Case: Maximum compression for large documents

from abstractcore.compression.vision_compressor import HybridCompressionPipeline
from abstractcore import create_llm

# Initialize hybrid pipeline
pipeline = HybridCompressionPipeline(
    vision_provider="ollama",
    vision_model="llama3.2-vision"
)

# Compress with target ratio
result = pipeline.compress(
    text,
    target_ratio=20.0,  # Target 20x compression
    min_quality=0.90    # Minimum 90% quality
)

print(f"Achieved: {result['total_compression_ratio']:.1f}x compression")
print(f"Quality: {result['total_quality_score']:.1%}")

# Use the compressed images with an LLM
llm = create_llm("ollama", model="llama3.2-vision")
response = llm.generate(
    "Summarize this document",
    media=result['media']  # the compressed images (List[MediaContent])
)

Characteristics:

  • Compression/quality tradeoffs vary by mode and model
  • Requires a vision-capable model
  • Latency depends on rendering + model inference

Provider Optimization

Pre-configured Profiles

AbstractCore includes provider/model rendering profiles as starting points. They are heuristics (not guarantees) and may need tuning for your content and target vision model.

Examples:

  • OpenAI: gpt-4o, gpt-4o-mini
  • Anthropic: claude-3-5-sonnet, claude-haiku-4-5
  • Ollama: llama3.2-vision

Using Provider Profiles

from abstractcore.compression.optimizer import create_optimized_config
from abstractcore.compression import GlyphProcessor

# Automatic provider optimization
config = create_optimized_config("openai", "gpt-4o")
processor = GlyphProcessor()

# Process with optimized settings (profile selected via provider/model)
compressed = processor.process_text(
    text,
    provider="openai",
    model="gpt-4o"
)

Custom Profiles

Create custom optimization profiles:

from abstractcore.compression.optimizer import OptimizationProfile

custom_profile = OptimizationProfile(
    provider="custom",
    model="custom-model",
    dpi=72,              # Lower = more compression
    font_size=6,         # Smaller = more compression
    line_height=7,       # Tighter = more compression
    columns=6,           # More = more compression
    margin_x=2,
    margin_y=2,
    target_compression=5.0,
    quality_threshold=0.85,
    notes="Ultra-aggressive compression"
)

# Convert to rendering config
config = custom_profile.to_rendering_config()

Advanced Usage

Adaptive Compression

Automatically select compression level based on document characteristics:

from abstractcore.compression.vision_compressor import VisionCompressor

compressor = VisionCompressor()

# Adaptive compression to meet targets
# (`compressed_images` is the List[MediaContent] returned by GlyphProcessor.process_text)
result = compressor.adaptive_compress(
    glyph_images=compressed_images,
    original_tokens=25000,
    target_ratio=20.0,   # Aim for 20x
    min_quality=0.85     # But maintain 85% quality
)

print(f"Selected mode: {result.metadata['mode']}")
print(f"Achieved: {result.compression_ratio:.1f}x at {result.quality_score:.1%} quality")

Quality Control

Configure quality thresholds and validation:

from abstractcore.compression import GlyphProcessor
from abstractcore.compression.config import GlyphConfig

config = GlyphConfig()
config.quality_threshold = 0.95  # Require 95% quality
config.min_token_threshold = 1000  # Only compress if >1000 tokens
config.target_compression_ratio = 4.0  # Target 4x compression

processor = GlyphProcessor(config=config)

# Quality validation happens automatically
# (CompressionQualityError import not shown; its path may vary by version)
try:
    compressed = processor.process_text(text, provider="openai", model="gpt-4o")
except CompressionQualityError as e:
    print(f"Quality too low: {e.quality_score:.1%}")
    # Fall back to uncompressed text

Caching

Enable caching for repeated compressions:

from abstractcore.compression import GlyphProcessor

# Caching (via abstractcore.compression.cache.CompressionCache) is enabled
# by default in GlyphProcessor; no separate setup is needed
processor = GlyphProcessor()

# First compression (renders images; slower)
compressed1 = processor.process_text(text, provider="openai", model="gpt-4o")

# Second compression (served from the cache)
compressed2 = processor.process_text(text, provider="openai", model="gpt-4o")

# Check cache statistics
stats = processor.get_compression_stats()
print(f"Cache hits: {stats['cache_stats']['hits']}")

Analytics and Monitoring

Track Compression Performance

from abstractcore.compression.analytics import get_analytics

analytics = get_analytics()

# Record compression operation
analytics.record_compression(
    provider="openai",
    model="gpt-4o",
    original_tokens=25000,
    compressed_tokens=7500,
    quality_score=0.92,
    processing_time=1.5,
    method="glyph"
)

# Get provider statistics
stats = analytics.get_provider_stats("openai")
print(f"Average compression: {stats['avg_compression_ratio']:.1f}x")
print(f"Average quality: {stats['avg_quality_score']:.1%}")

# Generate report
report = analytics.generate_report()
print(report)

# Get compression trends
trends = analytics.get_trends(hours=24)
print(f"Compression trend: {trends['ratio_trend']}")
print(f"Quality trend: {trends['quality_trend']}")

# Get optimization suggestions
suggestions = analytics.get_optimization_suggestions()
for suggestion in suggestions:
    print(f"- {suggestion}")

Benchmark Configurations

from abstractcore.compression.optimizer import CompressionOptimizer

optimizer = CompressionOptimizer()
test_text = "Sample text for benchmarking..."

# Benchmark a profile
profile = optimizer.profiles["openai/gpt-4o"]
results = optimizer.benchmark_profile(profile, test_text)

print(f"Compression: {results['compression_ratio']:.1f}x")
print(f"Quality: {results['quality_score']:.1%}")
print(f"Time: {results['processing_time']:.2f}s")
print(f"Meets target: {results['meets_target']}")

Troubleshooting

Common Issues and Solutions

1. Low Compression Ratio

Problem: Getting less than 2x compression

Solutions:

from abstractcore.compression.optimizer import create_optimized_config

# Use aggressive mode (provider/model as in the earlier examples)
config = create_optimized_config(provider, model, aggressive=True)

# Lower quality threshold
config.quality_threshold = 0.85

# Increase columns and reduce font
config.columns = 6
config.font_size = 6

2. Quality Too Low

Problem: Text becomes unreadable or quality score <90%

Solutions:

# Use conservative settings
config.dpi = 96  # Higher DPI
config.font_size = 9  # Larger font
config.columns = 3  # Fewer columns

# Force higher quality threshold
config.quality_threshold = 0.95

3. Processing Too Slow

Problem: Compression takes >5 seconds

Solutions:

# Enable caching
processor = GlyphProcessor()  # Cache enabled by default

# Reduce image count
config.pages_per_image = 3  # More pages per image

# Use simpler renderer
config.auto_crop = False  # Skip auto-cropping

4. UnsupportedFeatureError with glyph_compression

Problem: UnsupportedFeatureError: Glyph compression requires a vision-capable model

Cause: Attempting to use glyph_compression="always" with a non-vision model

Solution:

from abstractcore import create_llm

# WRONG: Non-vision model with forced compression
llm = create_llm("openai", model="gpt-4")  # No vision support
response = llm.generate(
    "Summarize",
    media=["doc.txt"],
    glyph_compression="always"  # Raises UnsupportedFeatureError!
)

# RIGHT: Use a vision-capable model
llm = create_llm("openai", model="gpt-4o")  # Has vision support ✓
response = llm.generate(
    "Summarize",
    media=["doc.txt"],
    glyph_compression="always"  # Works!
)

# ALTERNATIVE: Use auto mode (graceful fallback)
llm = create_llm("openai", model="gpt-4")
response = llm.generate(
    "Summarize",
    media=["doc.txt"]
    # glyph_compression="auto" is default
    # Logs warning and falls back to text processing
)

Vision-Capable Models:

  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4-vision-preview
  • Anthropic: claude-3-5-sonnet, claude-3-opus, claude-3-sonnet, claude-3-haiku
  • Ollama: llama3.2-vision, llava, bakllava, moondream
  • Google: gemini-1.5-pro, gemini-1.5-flash

5. Provider Doesn't Support Vision

Problem: Provider rejects compressed images

Solution:

# Check provider capabilities first
# (`processor` and `llm` as created in the earlier examples)
from abstractcore.media.capabilities import get_model_capabilities

capabilities = get_model_capabilities(provider, model)
if capabilities.get('vision_support'):
    # Safe to use compression
    compressed = processor.process_text(text, provider, model)
else:
    # Use text directly
    response = llm.generate(text)

API Reference

GlyphProcessor

class GlyphProcessor:
    def __init__(self, config: Optional[GlyphConfig] = None)

    def process_text(
        self,
        content: str,
        provider: str = None,
        model: str = None,
        user_preference: str = "auto"
    ) -> List[MediaContent]

    def can_process(
        self,
        content: str,
        provider: str,
        model: str
    ) -> bool

    def get_compression_stats(self) -> Dict[str, Any]
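
For example, can_process works as a cheap pre-flight gate before rendering. A minimal sketch based on the signature above; the exact conditions it checks are assumptions here:

processor = GlyphProcessor()
if processor.can_process(text, "openai", "gpt-4o"):
    media = processor.process_text(text, provider="openai", model="gpt-4o")
else:
    media = None  # e.g., input too short or model lacks vision support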

CompressionOptimizer

class CompressionOptimizer:
    def get_optimized_config(
        self,
        provider: str,
        model: str,
        aggressive: bool = False
    ) -> RenderingConfig

    def analyze_compression_potential(
        self,
        text_length: int,
        provider: str,
        model: str
    ) -> Dict[str, Any]

    def benchmark_profile(
        self,
        profile: OptimizationProfile,
        test_text: str
    ) -> Dict[str, Any]

HybridCompressionPipeline

class HybridCompressionPipeline:
    def __init__(
        self,
        vision_provider: str = "ollama",
        vision_model: str = "llama3.2-vision"
    )

    def compress(
        self,
        text: str,
        target_ratio: float = 30.0,
        min_quality: float = 0.85
    ) -> Dict[str, Any]
    # Returns dict with:
    #   - media: List[MediaContent] - The compressed images to use with LLM
    #   - total_compression_ratio: float - Achieved compression ratio
    #   - total_quality_score: float - Quality score (0-1)
    #   - original_tokens: int - Original token count
    #   - final_tokens: int - Compressed token count

CompressionAnalytics

class CompressionAnalytics:
    def record_compression(...) -> CompressionMetrics
    def get_provider_stats(provider: str) -> Dict[str, Any]
    def get_trends(hours: int = 24) -> Dict[str, Any]
    def get_optimization_suggestions() -> List[str]
    def generate_report() -> str

Best Practices

1. Choose the Right Method

  • < 1,000 tokens: Don’t compress (overhead often isn’t worth it)
  • 1,000–10,000 tokens: Standard Glyph may help (measure on your workload)
  • 10,000–100,000 tokens: Optimized Glyph may help (measure on your workload)
  • > 100,000 tokens: Consider hybrid (experimental; validate quality carefully)
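
These size tiers translate into a simple dispatcher. A minimal sketch, with whitespace word count standing in for real token counting (an approximation) and each tier mirroring the examples earlier in this guide:

from abstractcore.compression import GlyphProcessor
from abstractcore.compression.optimizer import create_optimized_config
from abstractcore.compression.vision_compressor import HybridCompressionPipeline

def choose_compression(text: str, provider: str, model: str):
    tokens = len(text.split())  # crude token proxy; use a real tokenizer if available
    if tokens < 1_000:
        return None  # don't compress: overhead usually isn't worth it
    processor = GlyphProcessor()
    if tokens <= 10_000:
        # Standard glyph compression
        return processor.process_text(text, provider=provider, model=model)
    if tokens <= 100_000:
        # Optimized glyph: inspect/tune the profile; settings applied via provider/model
        config = create_optimized_config(provider, model)
        return processor.process_text(text, provider=provider, model=model)
    # >100k tokens: experimental hybrid path; validate quality carefully
    pipeline = HybridCompressionPipeline(vision_provider=provider, vision_model=model)
    return pipeline.compress(text)["media"]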

2. Balance Quality and Compression

# For critical documents
config.quality_threshold = 0.95
config.target_compression_ratio = 3.0

# For general use
config.quality_threshold = 0.90
config.target_compression_ratio = 4.0

# For archives
config.quality_threshold = 0.85
config.target_compression_ratio = 5.0

3. Use Provider-Specific Optimization

Start from the optimized profile for your provider; profiles are heuristics, so verify results on your own content:

# Good
config = create_optimized_config("openai", "gpt-4o")

# Better (for more compression)
config = create_optimized_config("openai", "gpt-4o", aggressive=True)

4. Monitor and Improve

# Track performance
analytics = get_analytics()

# Review weekly
report = analytics.generate_report()

# Apply suggestions
suggestions = analytics.get_optimization_suggestions()

5. Handle Failures Gracefully

import logging

logger = logging.getLogger(__name__)

try:
    compressed = processor.process_text(text, provider, model)
    response = llm.generate(prompt, media=compressed)
except CompressionQualityError:
    # Fall back to uncompressed
    response = llm.generate(prompt + "\n\n" + text)
except Exception as e:
    logger.error(f"Compression failed: {e}")
    # Use alternative approach

Limitations

Realistic Expectations

  • Compression ratio: Varies widely by content, rendering settings, and the vision model’s OCR/tokenizer behavior
  • Overhead: Rendering + vision inference adds latency; caching can help for repeated runs
  • Quality tradeoff: Higher compression targets can reduce fidelity
  • Provider dependency: Requires vision-capable models

Not Suitable For

  • Real-time chat (latency too high)
  • Short messages (<1000 tokens)
  • Mission-critical accuracy requirements
  • Providers without vision support

For technical details and research background, see the Vision Compression Reality Report.