Error Handling¶
Handle errors gracefully with Strutex's comprehensive exception hierarchy.
Exception Hierarchy¶
All Strutex errors inherit from StrutexError, enabling catch-all patterns:
StrutexError (base)
├── ProviderError # LLM API failures
│ ├── RateLimitError # 429 - retry after delay
│ ├── AuthenticationError # 401 - invalid API key
│ └── ModelNotFoundError # 404 - model unavailable
├── ExtractionError # Document processing failed
│ ├── DocumentParseError # Can't read file
│ └── SchemaError # Invalid schema definition
├── ValidationError # Output validation failed
├── ConfigurationError # Setup/config issues
│ └── PluginError # Plugin loading failed
├── CacheError # Cache operations failed
├── SecurityError # Security check failed
│ └── InjectionDetectedError # Prompt injection detected
└── TimeoutError # Operation timed out
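The catch-all pattern depends on ordinary Python subclassing, so the order of `except` clauses matters: the most specific class has to come first. The sketch below rebuilds a small slice of the tree with local stand-in classes (placeholders defined only for this illustration, not imports from Strutex) and reports which handler would catch a given error. The `classify` helper is hypothetical.

```python
# Stand-in classes mirroring part of the tree above.
# They are placeholders for illustration, not the real Strutex classes.
class StrutexError(Exception):
    pass


class ProviderError(StrutexError):
    pass


class RateLimitError(ProviderError):
    pass


class ValidationError(StrutexError):
    pass


def classify(error: Exception) -> str:
    """Return the label of the first handler that catches `error`."""
    try:
        raise error
    except RateLimitError:
        # Most specific class first; otherwise ProviderError would shadow it.
        return "rate_limit"
    except ProviderError:
        return "provider"
    except StrutexError:
        # Catch-all for anything else in the hierarchy.
        return "strutex"
    except Exception:
        return "other"
```

If the `StrutexError` clause were listed first, it would catch every error in the hierarchy and the more specific handlers would never run.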
Quick Start¶
from strutex import (
    DocumentProcessor,
    StrutexError,
    ProviderError,
    RateLimitError,
    ValidationError,
    ExtractionError,
)
import time

processor = DocumentProcessor(provider="gemini")

try:
    result = processor.process("invoice.pdf", "Extract invoice", schema=MySchema)
except RateLimitError as e:
    # Handle rate limiting with retry
    print(f"Rate limited, wait {e.retry_after}s")
    time.sleep(e.retry_after or 60)
except ProviderError as e:
    # Check if error is retryable
    if e.retryable:
        print(f"Retryable error from {e.provider}: {e.message}")
    else:
        print(f"Permanent error: {e.message}")
except ValidationError as e:
    # Handle validation failures
    print(f"Validation failed: {e.issues}")
except ExtractionError as e:
    # Handle extraction failures
    print(f"Failed at stage {e.stage}: {e.message}")
except StrutexError as e:
    # Catch-all for any Strutex error
    print(f"Strutex error: {e.message}")
    print(f"Details: {e.details}")
Error Hook for Global Handling¶
from strutex import DocumentProcessor, GeminiProvider
import logging

def global_error_handler(error, file_path, context):
    """Handle all extraction errors."""
    logging.error(f"Extraction failed for {file_path}: {error}")
    # Option 1: Return fallback
    return {"extraction_failed": True, "error": str(error)}
    # Option 2: Re-raise (return None)
    # return None

processor = DocumentProcessor(
    provider=GeminiProvider(),
    on_error=global_error_handler
)
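The two options in the handler above (return a fallback value, or return `None` to let the error propagate) describe a simple contract. The following is a minimal sketch of how a caller could honor that contract. It is not Strutex's internal implementation: `run_with_hook`, the `extract` callable, and the contents of the context dict are all hypothetical.

```python
from typing import Callable, Optional

# Signature assumed from the handler above: (error, file_path, context) -> fallback or None.
ErrorHook = Callable[[Exception, str, dict], Optional[dict]]


def run_with_hook(extract: Callable[[str], dict], file_path: str, on_error: ErrorHook) -> dict:
    """Call `extract`; on failure use the hook's fallback, or re-raise if it returns None."""
    try:
        return extract(file_path)
    except Exception as error:
        # The context contents here are illustrative only.
        fallback = on_error(error, file_path, {"stage": "extract"})
        if fallback is None:
            raise  # Hook declined to handle the error: propagate the original exception.
        return fallback


def broken_extract(file_path: str) -> dict:
    """Stand-in extractor that always fails."""
    raise ValueError("bad file")


result = run_with_hook(
    broken_extract,
    "invoice.pdf",
    lambda error, path, ctx: {"extraction_failed": True, "error": str(error)},
)
```

Returning a marker such as `extraction_failed` keeps batch jobs running, at the cost that downstream code must check for it.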
Retry Strategies¶
Simple Retry¶
import time
from strutex import DocumentProcessor, GeminiProvider

def process_with_retry(processor, file_path, prompt, schema, max_retries=3):
    for attempt in range(max_retries):
        try:
            return processor.process(file_path, prompt, schema=schema)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # Exponential backoff
            print(f"Attempt {attempt + 1} failed, retrying in {wait}s...")
            time.sleep(wait)

processor = DocumentProcessor(provider=GeminiProvider())
result = process_with_retry(processor, "doc.pdf", "Extract", MySchema)
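The loop above retries on every exception, including failures such as an invalid API key that will not succeed on a second attempt. A possible refinement, sketched here with the standard library only, retries just the exception types passed in and adds a little random jitter to each delay. The `retry_call` helper and its parameters are hypothetical names, and the `sleep` argument exists so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying only on `retry_on` with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error.
            # The delay doubles on each attempt; up to 10% jitter spreads out
            # clients that would otherwise retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_retries must be at least 1")
```

With Strutex this could be called as `retry_call(lambda: processor.process(...), retry_on=(RateLimitError,))`, so that exceptions outside `retry_on` fail immediately instead of consuming retries.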
Using RetryConfig¶
from strutex import DocumentProcessor, ProviderChain, RetryConfig
from strutex import GeminiProvider, OpenAIProvider

# Create chain with retry config
chain = ProviderChain(
    providers=[GeminiProvider(), OpenAIProvider()],
    retry_config=RetryConfig(
        max_retries=3,
        backoff_factor=2.0,
        # As written, these two names are Python's built-in exceptions;
        # import TimeoutError from strutex if the Strutex class is intended.
        retry_on=[ConnectionError, TimeoutError]
    )
)

processor = DocumentProcessor(provider=chain)
Provider Fallback¶
Fall back to another provider on failure:
from strutex import DocumentProcessor, ProviderChain
from strutex import GeminiProvider, OpenAIProvider, OllamaProvider

# Try providers in order
chain = ProviderChain([
    GeminiProvider(),   # Try first
    OpenAIProvider(),   # Fall back
    OllamaProvider(),   # Last resort (local)
])

processor = DocumentProcessor(provider=chain)

# Automatically tries next provider on failure
result = processor.process("doc.pdf", "Extract", schema=MySchema)
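To make the "try the next provider on failure" behavior concrete, here is a minimal sketch of the idea using plain callables in place of providers. It illustrates the pattern only and is not how `ProviderChain` is implemented; `first_success` and the stand-in extractors are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[str], dict]


def first_success(providers: Sequence[Tuple[str, Extractor]], file_path: str) -> Tuple[str, dict]:
    """Try each (name, extractor) pair in order and return the first successful result."""
    failures: List[str] = []
    for name, extract in providers:
        try:
            return name, extract(file_path)
        except Exception as error:
            # Remember why this provider failed, then move on to the next one.
            failures.append(f"{name}: {error}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


def unavailable(file_path: str) -> dict:
    """Stand-in for a provider that is currently down."""
    raise ConnectionError("service unavailable")


def local_model(file_path: str) -> dict:
    """Stand-in for a provider that succeeds."""
    return {"source": file_path, "total": 42}


used, data = first_success([("primary", unavailable), ("local", local_model)], "doc.pdf")
```

Collecting the per-provider failure messages makes the final error useful when every provider in the chain fails.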
Debugging Extraction Issues¶
Enable Logging¶
import logging
from strutex import DocumentProcessor, GeminiProvider

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("strutex").setLevel(logging.DEBUG)

processor = DocumentProcessor(provider=GeminiProvider())
result = processor.process("doc.pdf", "Extract", schema=MySchema)
# Detailed logs are now written to stderr
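`logging.basicConfig(level=logging.DEBUG)` turns on debug output for every library in the process, which can be noisy. One alternative, sketched below with the standard `logging` module, attaches a handler to the `"strutex"` logger only and captures its output in memory. The `capture_logger` helper and the child logger name `strutex.providers` are assumptions made for the demonstration; the actual logger names Strutex uses may differ.

```python
import io
import logging
from typing import Tuple


def capture_logger(name: str, level: int = logging.DEBUG) -> Tuple[logging.Logger, io.StringIO]:
    """Attach an in-memory handler to one named logger and return both."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger, buffer


logger, buffer = capture_logger("strutex")

# Records from child loggers propagate up to the "strutex" handler.
# The child name below is hypothetical.
logging.getLogger("strutex.providers").debug("sending request")

captured = buffer.getvalue()
```

The captured text can then be attached to a bug report or inspected in a test without changing the log level of unrelated libraries.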
Inspect Raw Response¶
# Add post-process hook to see raw result
@processor.on_post_process
def debug_result(result, context):
    print(f"Raw result: {result}")
    print(f"Context: {context}")
    return result
Check Provider Health¶
from strutex import GeminiProvider

provider = GeminiProvider()
if provider.health_check():
    print("Provider is healthy")
else:
    print("Provider unavailable")
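The same check can be used to pick a provider before starting a long job. The sketch below assumes only that each provider exposes a `health_check()` method returning a boolean, as in the snippet above. `StubProvider` and `first_healthy` are hypothetical stand-ins so the example runs without network access.

```python
from typing import Iterable, Optional


class StubProvider:
    """Stand-in provider with a fixed health status (not a Strutex class)."""

    def __init__(self, name: str, healthy: bool) -> None:
        self.name = name
        self.healthy = healthy

    def health_check(self) -> bool:
        return self.healthy


def first_healthy(providers: Iterable[StubProvider]) -> Optional[StubProvider]:
    """Return the first provider whose health check passes, or None."""
    for provider in providers:
        try:
            if provider.health_check():
                return provider
        except Exception:
            # Treat a failing health check the same as an unhealthy provider.
            continue
    return None


chosen = first_healthy([StubProvider("primary", False), StubProvider("backup", True)])
```

A health check only reflects the moment it ran, so error handling around the actual `process` call is still needed.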
Common Issues & Solutions¶
| Issue | Solution |
|---|---|
| "API key not found" | Set env var: export GOOGLE_API_KEY=... |
| "Rate limit exceeded" | Add retry with backoff, use caching |
| "Invalid JSON response" | Check prompt, use verify=True |
| "Schema validation failed" | Check field types, use Optional[] |
| "File too large" | Use chunking or switch to Claude |
| "Extraction timeout" | Increase timeout, use smaller model |
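For the first row of the table, it can help to fail early with a clear message instead of waiting for the provider to reject the request. This is a small sketch using only the standard library. `require_env` is a hypothetical helper, and the optional `env` argument exists so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the provider")
    return value


# Demonstration with an explicit mapping instead of the real environment.
api_key = require_env("GOOGLE_API_KEY", {"GOOGLE_API_KEY": "example-key"})
```

Calling `require_env("GOOGLE_API_KEY")` at startup turns a late authentication failure into an immediate configuration error.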
Validation Errors¶
Handle validation failures separately:
from strutex.validators import ValidationChain, SchemaValidator, SumValidator

chain = ValidationChain([SchemaValidator(), SumValidator()])
result = processor.process("invoice.pdf", "Extract", schema=InvoiceSchema)

# Validate separately
is_valid, errors = chain.validate(result, InvoiceSchema)
if not is_valid:
    print(f"Validation errors: {errors}")
    # Handle: log, alert, request human review
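As an illustration of the kind of check a sum validator might perform, the sketch below verifies that an invoice's line items add up to its stated total and returns the same `(is_valid, errors)` shape used above. It is a standalone example, not the `SumValidator` implementation. The field names `line_items`, `amount`, and `total` are assumptions, as is the tolerance of 0.01.

```python
from decimal import Decimal
from typing import List, Tuple


def check_invoice_sum(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> Tuple[bool, List[str]]:
    """Check that line item amounts sum to the invoice total within `tolerance`."""
    errors: List[str] = []
    total = invoice.get("total")
    if total is None:
        errors.append("missing field: total")
    else:
        # Convert through str() so floats such as 10.5 become exact decimals.
        amounts = (Decimal(str(item["amount"])) for item in invoice.get("line_items", []))
        item_sum = sum(amounts, Decimal("0"))
        if abs(item_sum - Decimal(str(total))) > tolerance:
            errors.append(f"line items sum to {item_sum}, but total is {total}")
    return (not errors, errors)


invoice = {"line_items": [{"amount": 10.50}, {"amount": 4.50}], "total": 15.00}
is_valid, errors = check_invoice_sum(invoice)
```

Using `Decimal` avoids the rounding noise that binary floats introduce when currency amounts are added.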
Error Recovery in Batches¶
from strutex import DocumentProcessor, GeminiProvider, OpenAIProvider

processor = DocumentProcessor(provider=GeminiProvider())
files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = []
errors = []

for file_path in files:
    try:
        result = processor.process(file_path, "Extract", schema=MySchema)
        results.append({"file": file_path, "data": result})
    except Exception as e:
        # Record the failure and keep going with the remaining files
        errors.append({"file": file_path, "error": str(e)})

print(f"Success: {len(results)}, Failed: {len(errors)}")

# Retry failed files with different provider
if errors:
    fallback = DocumentProcessor(provider=OpenAIProvider())
    for err in errors:
        try:
            result = fallback.process(err["file"], "Extract", schema=MySchema)
            results.append({"file": err["file"], "data": result})
        except Exception:
            # Avoid a bare except, which would also swallow KeyboardInterrupt
            pass  # Truly failed
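The script above silently drops files that fail on both providers. A variation that keeps track of them is sketched below as a reusable function. Plain callables stand in for the two processors so the example runs on its own; `process_batch` and the stand-in extractors are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Extractor = Callable[[str], dict]


def process_batch(
    files: Iterable[str], primary: Extractor, fallback: Extractor
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Run `primary` on every file, retry failures with `fallback`, and report what still failed."""
    results: Dict[str, dict] = {}
    failed: Dict[str, str] = {}

    # First pass: one bad file must not stop the rest of the batch.
    for path in files:
        try:
            results[path] = primary(path)
        except Exception as error:
            failed[path] = str(error)

    # Second pass: retry only the failures, keeping both error messages if the retry fails too.
    for path in list(failed):
        try:
            results[path] = fallback(path)
            del failed[path]
        except Exception as error:
            failed[path] = f"{failed[path]}; fallback: {error}"

    return results, failed


def primary_stub(path: str) -> dict:
    """Stand-in primary extractor that fails on everything except doc1.pdf."""
    if path != "doc1.pdf":
        raise TimeoutError("timed out")
    return {"file": path, "by": "primary"}


def fallback_stub(path: str) -> dict:
    """Stand-in fallback extractor that fails only on doc3.pdf."""
    if path == "doc3.pdf":
        raise ValueError("unreadable")
    return {"file": path, "by": "fallback"}


results, failed = process_batch(["doc1.pdf", "doc2.pdf", "doc3.pdf"], primary_stub, fallback_stub)
```

Returning the remaining failures alongside the results lets the caller log them or route them to human review.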
Next Steps¶
| Want to... | Go to... |
|---|---|
| See real examples | Use Cases |
| Optimize prompts | Prompt Engineering |
| Add caching | Caching |