CLI Applications

Production-ready terminal applications for document processing, knowledge extraction, and text evaluation.

Overview

AbstractCore includes three production-ready CLI applications that run directly from the terminal, with no Python programming required. These tools provide immediate access to advanced LLM capabilities for common text-processing tasks.

📄 Summarizer

Intelligent document summarization with multiple styles and lengths

summarizer document.pdf --style=executive

🔍 Extractor

Knowledge graph extraction with multiple output formats

extractor report.txt --format=json-ld

⚖️ Judge

Text evaluation and scoring with customizable criteria

judge essay.txt --criteria=clarity,accuracy

Installation & Setup

Apps are automatically available after installing AbstractCore:

# Install with all features
pip install abstractcore[all]

# Apps are immediately available
summarizer --help
extractor --help
judge --help

Usage Methods

Each application can be used in two different ways:

✅ Direct Commands (Recommended)

summarizer document.txt
extractor report.pdf
judge essay.md

🐍 Python Module

python -m abstractcore.apps.summarizer document.txt
python -m abstractcore.apps.extractor report.pdf
python -m abstractcore.apps.judge essay.md

📄 Summarizer

Intelligent document summarization with multiple styles and lengths.

Quick Examples

# Basic summarization
summarizer document.pdf

# Executive summary with brief length
summarizer report.txt --style=executive --length=brief --output=summary.txt

# Technical focus with detailed analysis
summarizer spec.md --focus="implementation details" --style=analytical --length=detailed

# Large document with custom chunking
summarizer large_manual.txt --chunk-size=15000 --verbose

Key Parameters

| Parameter | Options | Default | Description |
|-----------|---------|---------|-------------|
| `--style` | structured, narrative, objective, analytical, executive, conversational | structured | Summary presentation style |
| `--length` | brief, standard, detailed, comprehensive | standard | Summary length and depth |
| `--focus` | Any text | None | Specific focus area for summarization |
| `--output` | File path | Console | Output file path |

🔍 Extractor

Knowledge graph extraction with multiple output formats and entity types.

Quick Examples

# Basic knowledge graph extraction
extractor document.pdf

# Focus on specific domain with detailed extraction
extractor tech_report.pdf --focus=technology --length=detailed --format=json-ld

# Extract specific entity types only
extractor article.txt --entity-types=person,organization,location --output=entities.jsonld

# High-quality extraction with multiple iterations
extractor research_paper.pdf --iterate=3 --length=comprehensive --verbose

# Fast extraction for large documents
extractor large_doc.txt --mode=fast --minified --output=kg_fast.jsonld

Key Parameters

| Parameter | Options | Default | Description |
|-----------|---------|---------|-------------|
| `--format` | json-ld, triples, json, yaml | json-ld | Output format |
| `--entity-types` | person, organization, location, technology, etc. | All types | Entity types to focus on |
| `--mode` | fast, balanced, thorough | balanced | Extraction mode |
| `--iterate` | 1-10 | 1 | Number of refinement iterations |

Entity Types

person - People and individuals
organization - Companies, institutions
location - Places, cities, countries
technology - Software, hardware, systems
concept - Abstract concepts, ideas
event - Occurrences, meetings

🔧 Debug Capabilities and Self-Healing JSON

Robust debugging and error recovery features for production use.

Key Features

  • Self-Healing JSON: Automatically repairs truncated or malformed JSON responses
  • Debug Mode: --debug flag shows raw LLM responses for troubleshooting
  • Focus Areas: --focus parameter for targeted evaluation
  • Increased Token Limits: Default max_tokens=32000, max_output_tokens=8000
  • Consistent CLI Syntax: all apps share the same flag conventions; both --param value and --param=value forms work
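The self-healing idea can be pictured with a small sketch. This is an illustration of the general repair technique, not AbstractCore's actual implementation: scan the truncated response, track unclosed strings and containers, then append the missing closers before parsing.

```python
import json

def repair_truncated_json(text: str) -> dict:
    """Best-effort repair of a truncated JSON object: close any
    unterminated string, drop a dangling comma, and append the
    closers for every still-open brace or bracket."""
    stack = []          # closers owed for each open container
    in_string = False
    escape = False
    for ch in text:
        if escape:
            escape = False
            continue
        if ch == "\\" and in_string:
            escape = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    repaired = text
    if in_string:
        repaired += '"'                       # close an unterminated string
    repaired = repaired.rstrip().rstrip(",")  # drop a trailing comma
    repaired += "".join(reversed(stack))      # close open containers
    return json.loads(repaired)

# A response cut off mid-generation:
truncated = '{"entities": [{"name": "Acme", "type": "organization"'
print(repair_truncated_json(truncated))
# → {'entities': [{'name': 'Acme', 'type': 'organization'}]}
```

A repair like this only handles truncation; responses with wrong quoting or interleaved prose need heavier heuristics, which is where the `--debug` flag helps by exposing the raw LLM output.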

Debug Examples

# Debug raw LLM responses for troubleshooting
judge document.txt --debug --provider lmstudio --model qwen/qwen3-next-80b

# Focus areas for targeted evaluation
judge README.md --focus "architectural diagrams, technical comparisons" --debug

# Automatic JSON self-repair with increased token limits
extractor large_doc.txt --mode thorough --max-tokens 32000 --max-output-tokens 8000

⚖️ Judge

Text evaluation and scoring with customizable criteria, contexts, and debug capabilities.

Enhanced Examples

# Text evaluation with focus areas and debug capabilities
judge essay.txt --criteria clarity,accuracy,coherence --context "academic writing" --include-criteria
judge code.py --context "code review" --format plain --verbose --debug
judge README.md --focus "technical accuracy,examples" --temperature 0.05 --max-output-tokens 8000

# Advanced evaluation scenarios  
judge document.txt --debug --provider lmstudio --model qwen/qwen3-next-80b
judge proposal.md --focus "architectural diagrams, technical comparisons" --debug
judge multiple_files.txt --exclude-global --reference reference.md

Key Parameters

| Parameter | Options | Default | Description |
|-----------|---------|---------|-------------|
| `--context` | Any text | None | Evaluation context (e.g., "code review") |
| `--criteria` | clarity, soundness, effectiveness, etc. | Default set | Standard evaluation criteria |
| `--custom-criteria` | Custom comma-separated list | None | Custom evaluation criteria |
| `--format` | json, plain, yaml | json | Output format |

Available Criteria

clarity - Clear and understandable
soundness - Logically sound and valid
effectiveness - Achieves intended purpose
completeness - Covers all necessary aspects
coherence - Well-structured and logical
innovation - Novel and creative approach

Common Parameters

Parameters available across all three applications:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `--provider` + `--model` | Use different LLM providers | `--provider=openai --model=gpt-4o-mini` |
| `--output` | Save results to file | `--output=results.txt` |
| `--verbose` | Show detailed progress | `--verbose` |
| `--timeout` | HTTP timeout (seconds) | `--timeout=600` |

📋 Complete CLI Parameters Reference

Comprehensive parameter reference for all three applications.

Extractor Parameters

# Core parameters
--focus FOCUS                    # Specific focus area (e.g., "technology", "business")
--format {json-ld,triples,json,yaml}  # Output format
--entity-types TYPES             # Comma-separated types (person,organization,location,etc.)
--output OUTPUT                  # Output file path

# Performance & Quality
--mode {fast,balanced,thorough}  # Extraction mode (balanced=default)
--iterate N                      # Refinement iterations (default: 1)
--similarity-threshold 0.0-1.0   # Entity deduplication threshold (default: 0.85)
--no-embeddings                  # Disable semantic deduplication
--minified                       # Compact JSON output

# LLM Configuration
--provider PROVIDER --model MODEL  # Custom LLM provider/model
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--timeout 300                    # HTTP timeout seconds (default: 300)
--chunk-size 8000                # Chunk size for large files
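To show how `--similarity-threshold` drives entity deduplication, here is a hypothetical sketch (not AbstractCore's code): entities whose embeddings exceed the threshold in cosine similarity are treated as duplicates and merged. The toy three-dimensional vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe_entities(entities, threshold=0.85):
    """Greedy merge: an entity is dropped if its embedding is within
    `threshold` cosine similarity of one already kept."""
    kept = []
    for name, vec in entities:
        if all(cosine(vec, kv) < threshold for _, kv in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]

# Toy embeddings: "IBM" and "I.B.M." nearly parallel, "Paris" orthogonal.
entities = [
    ("IBM", [1.0, 0.0, 0.1]),
    ("I.B.M.", [0.98, 0.02, 0.12]),
    ("Paris", [0.0, 1.0, 0.0]),
]
print(dedupe_entities(entities, threshold=0.85))
# → ['IBM', 'Paris']
```

Raising the threshold keeps more near-duplicates apart; `--no-embeddings` skips this semantic pass entirely, which is faster but leaves surface-form variants unmerged.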

Judge Parameters

# Evaluation Configuration
--criteria CRITERIA              # Standard criteria (clarity,soundness,etc.)
--focus FOCUS                    # Primary focus areas for evaluation
--context CONTEXT                # Evaluation context description
--reference FILE_OR_TEXT         # Reference content for comparison

# Output & Debug
--format {json,plain,yaml}       # Output format (default: json)
--debug                          # Show raw LLM responses
--include-criteria               # Include detailed criteria explanations
--exclude-global                 # Skip global assessment for multiple files

# LLM Configuration  
--temperature 0.1                # Evaluation consistency (default: 0.1)
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--timeout 300                    # HTTP timeout seconds

Summarizer Parameters

# Content Configuration
--style {structured,narrative,objective,analytical,executive,conversational}
--length {brief,standard,detailed,comprehensive}
--focus FOCUS                    # Specific focus area
--chunk-size 8000                # Chunk size for large files (max: 32000)

# Output & Performance
--output OUTPUT                  # Output file path
--max-tokens 32000               # Context window (default: 32000)
--max-output-tokens 8000         # Output tokens (default: 8000)
--verbose                        # Show detailed progress
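The effect of `--chunk-size` can be sketched as paragraph-aware splitting. This is an illustrative sketch under the assumption that chunking prefers paragraph boundaries; the library's actual strategy may differ.

```python
def chunk_text(text: str, chunk_size: int = 8000) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters,
    preferring paragraph boundaries so paragraphs are not cut mid-way."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para  # an oversized paragraph becomes its own chunk
    if current:
        chunks.append(current)
    return chunks

# Ten ~62-character paragraphs, chunked at 150 characters:
doc = "\n\n".join(f"Paragraph {i} " + "x" * 50 for i in range(10))
chunks = chunk_text(doc, chunk_size=150)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk would then be summarized independently and the partial summaries combined, which is why larger `--chunk-size` values (up to 32000) mean fewer LLM calls per document.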

📁 Media Handling with @filename Syntax

All CLI apps support universal media handling with simple @filename syntax for analyzing images, PDFs, Office documents, and data files.

Supported File Types

  • Images: PNG, JPEG, GIF, WEBP, BMP, TIFF (analyzed via vision models)
  • Documents: PDF, DOCX, XLSX, PPTX (text extracted automatically)
  • Data Files: CSV, TSV, TXT, MD, JSON (parsed intelligently)

CLI Examples

# Summarize PDF document
summarizer @report.pdf --style executive

# Extract entities from Office document
extractor @presentation.pptx --format json-ld

# Evaluate document with image reference
judge @document.docx --reference @original.pdf

# Multiple files
summarizer @chapter1.pdf @chapter2.pdf --length comprehensive

# Mixed media - combine chart analysis with data
extractor "@sales_chart.png What trends do you see?" --focus "business metrics"
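One way such @filename routing might work is sketched below. This is a hypothetical illustration, not AbstractCore's code: the extension sets come from the supported-type list above, and the function name is invented.

```python
from pathlib import Path

def resolve_media_args(args: list[str]) -> list[dict]:
    """Classify each @filename argument by extension so it can be
    routed to the right handler (vision, text extraction, or parsing)."""
    image_exts = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff"}
    doc_exts = {".pdf", ".docx", ".xlsx", ".pptx"}
    resolved = []
    for arg in args:
        if not arg.startswith("@"):
            resolved.append({"kind": "text", "value": arg})
            continue
        path = Path(arg[1:])
        ext = path.suffix.lower()
        if ext in image_exts:
            kind = "image"      # analyzed via a vision model
        elif ext in doc_exts:
            kind = "document"   # text extracted automatically
        else:
            kind = "data"       # parsed as CSV/TSV/TXT/MD/JSON
        resolved.append({"kind": kind, "value": str(path)})
    return resolved

print(resolve_media_args(["@report.pdf", "@chart.png", "--style", "executive"]))
```

Ordinary flags pass through untouched, so @file attachments can be freely mixed with the parameters documented above.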

Learn more: Media Handling System Guide and Vision Capabilities Guide

Related Documentation

Centralized Configuration

Set default providers and models once, never specify again

Configure CLI Apps →

Media Handling System

Universal file attachment across all apps with @filename syntax

Learn Media Handling →

Vision Capabilities

Image analysis with vision fallback for text-only models

Explore Vision Features →

Complete Documentation

📄 Summarizer

Complete guide with advanced usage patterns and Python API

View Full Documentation →

🔍 Extractor

Comprehensive knowledge graph extraction guide with examples

View Full Documentation →

⚖️ Judge

Complete evaluation guide with LLM-as-a-judge best practices

View Full Documentation →