
Hooks System

Extend the strutex processing pipeline without modifying core components.

New in v0.4.2

Callback and decorator hooks: no pluggy knowledge required. Register plain callbacks or decorated functions directly on DocumentProcessor.


Plugins vs Hooks

Before diving in, understand the distinction:

| | Plugins (Provider, Validator, etc.) | Hooks (pre_process, post_process) |
|---|---|---|
| Role | Core components: do the actual work | Monitors: observe without replacing |
| Pattern | Strategy (replace engine) | Observer (wrap engine) |
| Quantity | One at a time | Many simultaneously |
| Use case | "Use OpenAI instead of Gemini" | "Log every request and add timestamps" |

Rule of thumb:

  • Changing what runs? → Plugin
  • Observing when things run? → Hook

# Callbacks: pass hook functions to the constructor
from strutex import DocumentProcessor

processor = DocumentProcessor(
    on_pre_process=lambda fp, prompt, schema, mime, ctx: {
        "prompt": prompt + "\nBe precise."
    },
    on_post_process=lambda result, ctx: {
        **result, "processed": True
    },
    on_error=lambda error, fp, ctx: {
        "status": "error", "message": str(error)
    }
)

# Decorators: register hook functions on an existing processor
from strutex import DocumentProcessor
from datetime import datetime

processor = DocumentProcessor()

@processor.on_post_process
def add_timestamp(result, context):
    result["processed_at"] = datetime.now().isoformat()
    return result

@processor.on_pre_process
def add_instructions(file_path, prompt, schema, mime_type, context):
    return {"prompt": prompt + "\nExtract all values precisely."}

@processor.on_error
def handle_rate_limit(error, file_path, context):
    if "rate limit" in str(error).lower():
        return {"error": "Rate limited, please retry later"}
    return None  # Propagate other errors

Hook Types

| Hook | Called | Receives | Returns |
|---|---|---|---|
| on_pre_process | Before processing | (file_path, prompt, schema, mime_type, context) | {"prompt": ...} or None |
| on_post_process | After processing | (result, context) | Modified result or None |
| on_error | On exception | (error, file_path, context) | Fallback result or None |
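
Because each hook is a plain callable with one of the signatures listed above, it can be exercised by hand, without a processor or an LLM call. The sketch below does exactly that with three example hooks. The file name, prompt, and empty `context` dict are made-up test inputs, and passing `None` for `schema` is an assumption made only to keep the example short.

```python
from datetime import datetime


def add_instructions(file_path, prompt, schema, mime_type, context):
    # Pre-process hook: return a dict carrying the replacement prompt.
    return {"prompt": prompt + "\nExtract all values precisely."}


def add_timestamp(result, context):
    # Post-process hook: return the (modified) result.
    result["processed_at"] = datetime.now().isoformat()
    return result


def handle_rate_limit(error, file_path, context):
    # Error hook: return a fallback result, or None to decline.
    if "rate limit" in str(error).lower():
        return {"error": "Rate limited, please retry later"}
    return None


# Hypothetical inputs, used only to call the hooks directly.
context = {}
pre = add_instructions(
    "invoice.pdf", "Extract invoice data", None, "application/pdf", context
)
post = add_timestamp({"total": 42.0}, context)
limited = handle_rate_limit(RuntimeError("Rate limit exceeded"), "invoice.pdf", context)
other = handle_rate_limit(ValueError("bad file"), "invoice.pdf", context)
```

Here `pre` carries the extended prompt, `post` gains a `processed_at` key, `limited` is the fallback dict, and `other` is `None` because the hook declines errors it does not recognize.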

Callbacks vs Decorators

| Approach | Best For |
|---|---|
| Callbacks | Quick, inline transformations; lambda functions |
| Decorators | Reusable, named functions; complex logic |

You can combine both approaches on the same processor; hooks run in registration order:

processor = DocumentProcessor(
    on_post_process=lambda r, c: {**r, "via_callback": True}
)

@processor.on_post_process
def via_decorator(result, context):
    result["via_decorator"] = True
    return result

# Result will have both keys
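
One way to picture "both keys" is as a chain in which each post-process hook receives the result left by the previous one. The helper below, `run_post_hooks`, is a hypothetical stand-in written for illustration and is not strutex's implementation. It assumes hooks run in registration order and that a hook returning `None` leaves the result unchanged.

```python
def run_post_hooks(result, context, hooks):
    """Apply post-process hooks in order (illustrative model only)."""
    for hook in hooks:
        updated = hook(result, context)
        # Assumption: None means "keep the current result".
        if updated is not None:
            result = updated
    return result


def via_callback(result, context):
    return {**result, "via_callback": True}


def via_decorator(result, context):
    result["via_decorator"] = True
    return result


def log_only(result, context):
    # Observes the result and returns None, so nothing changes.
    context.setdefault("seen", []).append(sorted(result))
    return None


context = {}
final = run_post_hooks({"total": 10}, context, [via_callback, log_only, via_decorator])
print(final)  # {'total': 10, 'via_callback': True, 'via_decorator': True}
```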

Complete Example

from strutex import DocumentProcessor, Object, String, Number
from datetime import datetime
import time

processor = DocumentProcessor(provider="gemini")

@processor.on_pre_process
def start_timer(file_path, prompt, schema, mime_type, context):
    context["start_time"] = time.time()
    print(f"Processing: {file_path}")
    return None

@processor.on_post_process
def add_metadata(result, context):
    result["_processed_at"] = datetime.now().isoformat()
    result["_elapsed_seconds"] = time.time() - context["start_time"]
    return result

@processor.on_error
def fallback_handler(error, file_path, context):
    print(f"Error processing {file_path}: {error}")
    return {"error": str(error), "file": file_path}

schema = Object(properties={"invoice_number": String(), "total": Number()})
result = processor.process("invoice.pdf", "Extract invoice data", schema)

Hook Execution Order

  1. Pre-process hooks run in registration order
  2. Security validation (input sanitization)
  3. Provider processing (LLM extraction)
  4. Security validation (output validation)
  5. Post-process hooks run in registration order
  6. Pydantic validation (if model was provided)

If an error occurs at step 3, error hooks run in turn until one returns a fallback result. If every error hook returns None, the error propagates.
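
The ordering above can be modeled in a few lines. Everything in this sketch is a hypothetical stand-in (`run_pipeline`, the `provider` callable, the hard-coded file name), not strutex code. The security and Pydantic validation steps are left out for brevity, and returning a fallback without running post-process hooks is an assumption of this model.

```python
def run_pipeline(prompt, provider, pre_hooks=(), post_hooks=(), error_hooks=()):
    """Toy model of the hook ordering; not strutex's actual code."""
    context = {}

    # Step 1: pre-process hooks, in registration order.
    for hook in pre_hooks:
        update = hook("doc.pdf", prompt, None, "application/pdf", context)
        if update and "prompt" in update:
            prompt = update["prompt"]

    # Step 3: provider call; error hooks run until one returns a fallback.
    try:
        result = provider(prompt)
    except Exception as error:
        for hook in error_hooks:
            fallback = hook(error, "doc.pdf", context)
            if fallback is not None:
                return fallback
        raise  # No hook handled the error, so it propagates.

    # Step 5: post-process hooks, in registration order.
    for hook in post_hooks:
        updated = hook(result, context)
        if updated is not None:
            result = updated
    return result


def echo_provider(prompt):
    return {"prompt_seen": prompt}


def failing_provider(prompt):
    raise RuntimeError("rate limit exceeded")


ok = run_pipeline(
    "Extract totals",
    echo_provider,
    pre_hooks=[lambda fp, p, s, m, ctx: {"prompt": p + "\nBe precise."}],
    post_hooks=[lambda r, ctx: {**r, "processed": True}],
)
failed = run_pipeline(
    "Extract totals",
    failing_provider,
    error_hooks=[
        lambda e, fp, ctx: None,  # Declines, so the next hook is tried.
        lambda e, fp, ctx: {"status": "error", "message": str(e)},
    ],
)
```

In this model `ok` contains the extended prompt plus `"processed": True`, while `failed` is the dict returned by the second error hook.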


Advanced: Pluggy Integration

Callback/decorator hooks are automatically integrated with pluggy. This means:

  • Your callbacks work alongside global pluggy plugins
  • Third-party packages can register hooks via entry points
  • All hooks execute through the same pipeline

Global Pluggy Hooks

For distributed plugins or complex scenarios:

import time

from strutex.plugins import hookimpl, register_hook_plugin

class MetricsPlugin:
    @hookimpl
    def pre_process(self, file_path, prompt, schema, mime_type, context):
        # Store the start time in the shared context for post_process
        context["start_time"] = time.time()

    @hookimpl
    def post_process(self, result, context):
        elapsed = time.time() - context["start_time"]
        # `metrics` is a placeholder for your own metrics client
        metrics.record("extraction_time", elapsed)

register_hook_plugin(MetricsPlugin())
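
The `metrics` object in the example above is not defined in the snippet. For local experiments, a minimal in-memory stand-in with a matching `record(name, value)` method might look like the following. `InMemoryMetrics` and `average` are hypothetical names, and a real deployment would more likely forward to an existing metrics client.

```python
from collections import defaultdict


class InMemoryMetrics:
    """Collects recorded values per metric name (illustrative only)."""

    def __init__(self):
        self.values = defaultdict(list)

    def record(self, name, value):
        self.values[name].append(value)

    def average(self, name):
        samples = self.values[name]
        return sum(samples) / len(samples) if samples else None


metrics = InMemoryMetrics()
metrics.record("extraction_time", 1.5)
metrics.record("extraction_time", 2.5)
print(metrics.average("extraction_time"))  # 2.0
```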

How It Works

┌─────────────────────────────────────────────────────────────────┐
│  processor.process() calls call_hook("post_process", ...)       │
│                              │                                  │
│                              ▼                                  │
│              pluggy.PluginManager.hook.post_process()           │
│                              │                                  │
│     ┌────────────────────────┼────────────────────────┐         │
│     ▼                        ▼                        ▼         │
│  Callback             Decorator           Global Pluggy         │
│  Hooks                Hooks               Plugins               │
│  (wrapped in          (wrapped in         (registered via       │
│  _CallbackHook)       _CallbackHook)      register_hook_plugin) │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
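
The wrapping step in the diagram is an adapter: a bare callable is given the same method name that plugin objects expose, so one dispatcher can call all of them uniformly. The sketch below models that idea with the standard library only. `CallbackAdapter` and `dispatch_post_process` are hypothetical names; the real `_CallbackHook` class and pluggy's dispatch (including its call order and result handling) may differ.

```python
class CallbackAdapter:
    """Gives a bare callable a `post_process` method (illustrative only)."""

    def __init__(self, func):
        self._func = func

    def post_process(self, result, context):
        return self._func(result, context)


class AuditPlugin:
    """A plugin-style object that already exposes `post_process`."""

    def post_process(self, result, context):
        context["audited"] = True
        return None  # Observes only.


def dispatch_post_process(plugins, result, context):
    # Call every registered object through the same method name.
    for plugin in plugins:
        updated = plugin.post_process(result, context)
        if updated is not None:
            result = updated
    return result


registry = [
    CallbackAdapter(lambda r, c: {**r, "via_callback": True}),
    AuditPlugin(),
]
context = {}
final = dispatch_post_process(registry, {"total": 10}, context)
```

The dispatcher never needs to know whether an entry started life as a lambda or as a plugin class, which is the point of wrapping callbacks before registration.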

API Reference

register_hook_plugin(plugin: Any) -> None

Register a plugin with hook implementations.

| Parameter | Type | Description |
|---|---|---|
| plugin | Any | An object with @hookimpl-decorated methods |

| Raises | Description |
|---|---|
| RuntimeError | If pluggy is not installed |

Source code in strutex/plugins/hooks.py
def register_hook_plugin(plugin: Any) -> None:
    """
    Register a plugin with hook implementations.

    Args:
        plugin: An object with @hookimpl decorated methods

    Raises:
        RuntimeError: If pluggy is not installed
    """
    pm = get_plugin_manager()
    if pm is None:
        raise RuntimeError("pluggy is required for hook plugins. Install with: pip install pluggy")

    pm.register(plugin)


unregister_hook_plugin(plugin: Any) -> None

Unregister a previously registered hook plugin.

| Parameter | Type | Description |
|---|---|---|
| plugin | Any | The plugin object to unregister |

Source code in strutex/plugins/hooks.py
def unregister_hook_plugin(plugin: Any) -> None:
    """
    Unregister a previously registered hook plugin.

    Args:
        plugin: The plugin object to unregister
    """
    pm = get_plugin_manager()
    if pm is not None:
        pm.unregister(plugin)
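
Both functions appear to go through a shared plugin manager, so tests and short-lived scripts may want to pair every registration with an unregistration. One possible pattern is a small context manager. `temporary_plugin` is a hypothetical helper, not part of strutex; the register and unregister functions are passed in as arguments so the sketch runs here with in-memory stand-ins.

```python
from contextlib import contextmanager


@contextmanager
def temporary_plugin(plugin, register, unregister):
    """Register `plugin` for the duration of a `with` block."""
    register(plugin)
    try:
        yield plugin
    finally:
        # Runs even if the block raises, so the plugin never leaks.
        unregister(plugin)


# In-memory stand-ins so the example is self-contained.
active = []
plugin = object()

with temporary_plugin(plugin, active.append, active.remove):
    inside = list(active)

after = list(active)
```

With strutex installed, the same helper would presumably be called as `temporary_plugin(MetricsPlugin(), register_hook_plugin, unregister_hook_plugin)`.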
