Plugin System

Everything in strutex is pluggable. Use defaults or register your own implementations.

New in v0.3.0

Plugin System v2 introduces auto-registration via inheritance, lazy loading, entry points, priority-based ordering, and CLI tooling.

Architecture: Plugins vs Hooks

Strutex has two extension mechanisms that serve different purposes:

┌─────────────────────────────────────────────────────────────────┐
│                     DocumentProcessor.process()                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • pre_process           │ ◄── Logging, timing, prompt mods   │
│  └─────────────────────────┘                                     │
│              │                                                   │
│              ▼                                                   │
│  ┌─── PLUGINS (Components) ─┐                                    │
│  │ • SecurityPlugin         │ ◄── Validates input               │
│  │ • Extractor              │ ◄── PDF → text                    │
│  │ • Provider               │ ◄── LLM call                      │
│  │ • Validator              │ ◄── Validates output              │
│  │ • Postprocessor          │ ◄── Transforms result             │
│  └──────────────────────────┘                                    │
│              │                                                   │
│              ▼                                                   │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • post_process          │ ◄── Add metadata, notifications    │
│  │ • on_error              │ ◄── Fallbacks, alerting            │
│  └─────────────────────────┘                                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

When to Use Which?

| Feature | Plugins (Base Classes) | Hooks System |
| --- | --- | --- |
| Pattern | Strategy Pattern | Observer/Middleware Pattern |
| Role | Drivers: define how a step is performed | Observers: react to pipeline events |
| Cardinality | 1:1 (one Provider, one Extractor per run) | 1:N (many hooks can run simultaneously) |
| Complexity | Higher: implement interface methods | Lower: just a function or decorator |
| Goal | Interchangeability: replace the engine | Cross-cutting concerns: add behavior without touching the engine |

Use a Plugin when:

- Changing the fundamental logic (e.g., "use OCR instead of text extraction")
- Replacing a core component (e.g., a different LLM provider)

Use a Hook when:

- Observing events (logging, timing, metrics)
- Modifying data generically (e.g., adding metadata to all results)
- Handling errors (fallbacks, alerting)
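The split can be sketched in plain Python. Everything below (`run_pipeline`, `PlainExtractor`, `UpperExtractor`, the hook list) is a hypothetical stand-in rather than strutex API; it only illustrates one swappable component per run versus any number of observers.

```python
from typing import Callable, Dict, List, Protocol


class Extractor(Protocol):
    """Strategy interface: exactly one implementation is used per run."""

    def extract(self, raw: str) -> str: ...


class PlainExtractor:
    def extract(self, raw: str) -> str:
        return raw.strip()


class UpperExtractor:
    def extract(self, raw: str) -> str:
        return raw.strip().upper()


def run_pipeline(
    raw: str,
    extractor: Extractor,
    post_hooks: List[Callable[[Dict[str, str]], None]],
) -> Dict[str, str]:
    # The plugin (strategy) decides HOW the step is performed.
    result = {"text": extractor.extract(raw)}
    # Every hook (observer) gets to react to the same event.
    for hook in post_hooks:
        hook(result)
    return result


seen: List[str] = []
hooks = [
    lambda r: seen.append(r["text"]),   # logging-style observer
    lambda r: r.update(source="demo"),  # adds metadata to every result
]

plain = run_pipeline("  hello ", PlainExtractor(), hooks)
upper = run_pipeline("  hello ", UpperExtractor(), hooks)
```

Swapping the extractor changes what the step produces, while the same hooks run unchanged in both calls.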

Plugin Types

| Type | Purpose | Built-in Examples |
| --- | --- | --- |
| `provider` | LLM backends | Gemini, OpenAI |
| `security` | Input/output protection | InputSanitizer, PromptInjectionDetector |
| `extractor` | Document parsing | PDF, Image, Excel |
| `validator` | Output validation | Schema, business rules |
| `postprocessor` | Data transformation | DateNormalizer |

The PluginType enum provides type-safe access:

from strutex.plugins import PluginType

PluginType.PROVIDER      # "provider"
PluginType.EXTRACTOR     # "extractor"
PluginType.VALIDATOR     # "validator"
PluginType.POSTPROCESSOR # "postprocessor"
PluginType.SECURITY      # "security"
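As a rough illustration of what "type-safe" buys you, here is a local stand-in enum. Whether the real `PluginType` also subclasses `str` is an assumption; the helper `normalize_type` is hypothetical and not part of strutex.

```python
from enum import Enum
from typing import Union


class PluginType(str, Enum):
    """Local stand-in mirroring the values listed above (assumed str-valued)."""

    PROVIDER = "provider"
    EXTRACTOR = "extractor"
    VALIDATOR = "validator"
    POSTPROCESSOR = "postprocessor"
    SECURITY = "security"


def normalize_type(plugin_type: Union[str, PluginType]) -> str:
    # Enum lookup by value rejects typos instead of silently passing them on.
    return PluginType(plugin_type).value


try:
    normalize_type("providr")
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

With a plain string, a typo such as `"providr"` would only surface later as a missing plugin; the enum lookup fails immediately.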

Quick Start

Auto-Registration via Inheritance

Simply inherit from a base class and your plugin is automatically registered:

from strutex.plugins import Provider

class MyProvider(Provider):
    """Auto-registered as 'myprovider'"""
    capabilities = ["vision"]

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        return {"result": "data"}

That's it! No decorators or manual registration needed.

Customizing Registration

Use a class keyword argument to customize the registered name, and class attributes to set priority, cost, and capabilities:

class FastProvider(Provider, name="fast"):
    """Registered as 'fast' with high priority"""
    priority = 90  # Priority is a class attribute
    cost = 0.5
    capabilities = ["vision", "batch"]

    def process(self, *args, **kwargs):
        ...

Opting Out of Auto-Registration

For intermediate base classes:

class BasePdfProvider(Provider, register=False):
    """NOT registered - abstract base class"""
    def common_pdf_logic(self):
        ...

class AdobeProvider(BasePdfProvider):
    """Registered as 'adobeprovider'"""
    def process(self, *args, **kwargs):
        ...

Tip

Classes with unimplemented @abstractmethods are automatically skipped.
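One way such registration can work is through `__init_subclass__`. The sketch below is not strutex's implementation: `REGISTRY` and `_is_abstract` are hypothetical, and the stand-in `Provider` is a plain class (the real one derives from `ABC`) so the example stays portable across Python versions.

```python
from abc import abstractmethod
from typing import Any, Dict, Optional

REGISTRY: Dict[str, type] = {}  # hypothetical stand-in for the plugin registry


def _is_abstract(cls: type) -> bool:
    # True if any attribute resolved on cls still carries the abstract marker.
    return any(
        getattr(getattr(cls, attr, None), "__isabstractmethod__", False)
        for attr in dir(cls)
    )


class Provider:
    def __init_subclass__(
        cls, name: Optional[str] = None, register: bool = True, **kwargs: Any
    ) -> None:
        super().__init_subclass__(**kwargs)
        if not register or _is_abstract(cls):
            return  # opted out, or still missing an implementation
        REGISTRY[(name or cls.__name__).lower()] = cls

    @abstractmethod  # only sets a marker here; a plain class does not enforce it
    def process(self, *args: Any, **kwargs: Any) -> Any: ...


class MyProvider(Provider):
    def process(self, *args: Any, **kwargs: Any) -> Any:
        return {"result": "data"}


class FastProvider(Provider, name="fast"):
    def process(self, *args: Any, **kwargs: Any) -> Any:
        return {}


class BasePdfProvider(Provider, register=False):
    def common_pdf_logic(self) -> str:
        return "shared"


class AdobeProvider(BasePdfProvider):
    def process(self, *args: Any, **kwargs: Any) -> Any:
        return {}


class Unfinished(Provider):
    """No process() implementation, so it is skipped."""


registered = sorted(REGISTRY)
```

Note that `register=False` applies only to the class that passes it: `AdobeProvider` inherits from the opted-out base but is registered under its own lowercased name.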

Plugin Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `strutex_plugin_version` | `str` | `"1.0"` | API version for compatibility |
| `priority` | `int` | `50` | Order in waterfall (0-100, higher = preferred) |
| `cost` | `float` | `1.0` | Cost hint (lower = cheaper) |
| `capabilities` | `list` | `[]` | Features this plugin supports |
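To make the attributes concrete, here is a minimal waterfall sketch over stand-in classes. The `waterfall` helper is hypothetical, and using `cost` as a tie-breaker is an assumption of this example; the documentation only describes it as a hint.

```python
from typing import List, Optional, Sequence, Type


class PluginBase:
    """Stand-in carrying the documented defaults."""

    priority: int = 50
    cost: float = 1.0
    capabilities: List[str] = []


class Default(PluginBase):
    pass


class Local(PluginBase):
    priority = 60
    capabilities = ["local", "vision"]


class Fast(PluginBase):
    priority = 90
    cost = 0.5
    capabilities = ["vision", "batch"]


def waterfall(
    candidates: Sequence[Type[PluginBase]], capability: Optional[str] = None
) -> List[Type[PluginBase]]:
    wanted = capability.lower() if capability else None
    eligible = [
        c
        for c in candidates
        if wanted is None or wanted in (x.lower() for x in c.capabilities)
    ]
    # Higher priority first; lower cost breaks ties (assumption of this sketch).
    return sorted(eligible, key=lambda c: (-c.priority, c.cost))


order = [c.__name__ for c in waterfall([Default, Local, Fast])]
local_only = waterfall([Default, Local, Fast], "LOCAL")
```

A caller would try the returned classes in order and fall through to the next one on failure.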

Registration Methods

1. Auto-Registration (Recommended)

Just inherit from a base class:

class MyProvider(Provider):
    def process(self, *args, **kwargs): ...
# → Registered as "myprovider"

class MyProvider(Provider, name="custom"):
    def process(self, *args, **kwargs): ...
# → Registered as "custom"

2. Entry Points (For Packages)

For distributable packages, register in pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

[project.entry-points."strutex.validators"]
my_validator = "my_package:MyValidator"

Plugins are lazy loaded — only imported when first used.
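Lazy loading can be illustrated with the standard library's `importlib.metadata.EntryPoint`. In this sketch the entry point is built by hand and points at `json:loads` instead of a real `my_package:MyProvider`; the `ENTRY_POINTS` and `LOADED` dictionaries and the `get` helper are hypothetical stand-ins for the registry's internals.

```python
from importlib.metadata import EntryPoint
from typing import Any, Dict, Optional

# What discovery would collect: names and import paths, nothing imported yet.
ENTRY_POINTS: Dict[str, EntryPoint] = {
    "my_provider": EntryPoint(
        name="my_provider", value="json:loads", group="strutex.providers"
    )
}
LOADED: Dict[str, Any] = {}


def get(name: str) -> Optional[Any]:
    key = name.lower()
    if key not in LOADED:
        ep = ENTRY_POINTS.get(key)
        if ep is None:
            return None
        LOADED[key] = ep.load()  # the import happens here, on first use
    return LOADED[key]


before = dict(LOADED)        # nothing loaded until someone asks
plugin = get("My_Provider")  # triggers the import and caches the object
```

Until `get` is called, only the name and the `module:attribute` string are held in memory, which is why unused plugins add no import cost.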

3. Manual Registration

from strutex.plugins import PluginRegistry

PluginRegistry.register("provider", "my_provider", MyProvider)

CLI Commands

# List all plugins
strutex plugins list

# Filter by type
strutex plugins list --type provider

# JSON output
strutex plugins list --json

# Plugin details
strutex plugins info gemini --type provider

# Refresh discovery cache
strutex plugins refresh

Creating Custom Plugins

Custom Provider

from strutex.plugins import Provider

class OllamaProvider(Provider):
    priority = 60
    capabilities = ["local", "vision"]

    def __init__(self, model="llama3"):
        self.model = model

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        # Your implementation
        ...

Custom Validator

from strutex.plugins import Validator, ValidationResult

class SumValidator(Validator):
    """Verify line items sum to total."""
    priority = 70

    def validate(self, data, schema=None):
        items_sum = sum(i.get("amount", 0) for i in data.get("items", []))
        total = data.get("total", 0)

        if abs(items_sum - total) > 0.01:
            return ValidationResult(
                valid=False,
                data=data,
                issues=[f"Sum mismatch: {items_sum} != {total}"]
            )
        return ValidationResult(valid=True, data=data)
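To see the check in action without installing strutex, the same logic can be run against a stand-in result type. The `ValidationResult` dataclass and the `check_sum` function below are hypothetical local substitutes whose fields simply mirror the keyword arguments used above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationResult:
    """Local stand-in; fields mirror the keyword arguments used above."""

    valid: bool
    data: Dict[str, Any]
    issues: List[str] = field(default_factory=list)


def check_sum(data: Dict[str, Any], tolerance: float = 0.01) -> ValidationResult:
    items_sum = sum(i.get("amount", 0) for i in data.get("items", []))
    total = data.get("total", 0)
    # Compare with a tolerance because amounts are floats.
    if abs(items_sum - total) > tolerance:
        return ValidationResult(
            valid=False,
            data=data,
            issues=[f"Sum mismatch: {items_sum} != {total}"],
        )
    return ValidationResult(valid=True, data=data)


ok = check_sum({"items": [{"amount": 10.0}, {"amount": 5.5}], "total": 15.5})
bad = check_sum({"items": [{"amount": 10.0}], "total": 12.0})
```

The tolerance absorbs floating-point rounding in the line items; for currency data, `decimal.Decimal` amounts would avoid the issue entirely.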

Custom Postprocessor

from strutex.plugins import Postprocessor
import re

class DateNormalizer(Postprocessor):
    """Convert DD.MM.YYYY to YYYY-MM-DD."""

    def process(self, data):
        result = data.copy()
        if "date" in result:
            match = re.match(r'(\d{2})\.(\d{2})\.(\d{4})', result["date"])
            if match:
                d, m, y = match.groups()
                result["date"] = f"{y}-{m}-{d}"
        return result
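The transformation itself can be exercised as a plain function. This is a slightly stricter variant of the class above, not the same code: it uses `fullmatch` so trailing text is not accepted, and it skips non-string values. The name `normalize_date` is hypothetical.

```python
import re
from typing import Any, Dict

DATE_RE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def normalize_date(data: Dict[str, Any]) -> Dict[str, Any]:
    result = data.copy()  # shallow copy so the caller's dict is not mutated
    value = result.get("date")
    if isinstance(value, str):
        match = DATE_RE.fullmatch(value)
        if match:
            day, month, year = match.groups()
            result["date"] = f"{year}-{month}-{day}"
    return result


original = {"date": "31.12.2024", "total": 10}
converted = normalize_date(original)
untouched = normalize_date({"date": "2024-12-31"})
missing = normalize_date({"date": None})
```

Values that do not match the expected pattern pass through unchanged, so the postprocessor is safe to run on data that is already normalized.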

API Reference

`PluginRegistry`

Central registry for all plugin types with lazy loading.

Plugins are stored as EntryPoint objects and only loaded when first accessed via get(). This improves startup time and avoids importing unused dependencies.

Usage:

# Get a plugin (loads on first access)
cls = PluginRegistry.get("provider", "gemini")

# List all plugins of a type (loads each one; use list_names() to avoid loading)
all_providers = PluginRegistry.list("provider")

# Run discovery from entry points (pass force=True to re-scan)
count = PluginRegistry.discover()

`clear(plugin_type: Optional[str] = None) -> None` `classmethod`

Clear registered plugins.

PARAMETER	DESCRIPTION
`plugin_type`	If provided, only clear this type. Otherwise clear all. TYPE: `Optional[str]` DEFAULT: `None`

Source code in strutex/plugins/registry.py

@classmethod
def clear(cls, plugin_type: Optional[str] = None) -> None:
    """
    Clear registered plugins.

    Args:
        plugin_type: If provided, only clear this type. Otherwise clear all.
    """
    if plugin_type:
        cls._entry_points.pop(plugin_type, None)
        cls._loaded.pop(plugin_type, None)
        cls._manual.pop(plugin_type, None)
    else:
        cls._entry_points.clear()
        cls._loaded.clear()
        cls._manual.clear()
        cls._discovered = False

`discover(group_prefix: str = 'strutex', force: bool = False) -> int` `classmethod`

Discover and register plugins from entry points.

Scans for entry points matching the pattern:

- strutex.providers
- strutex.validators
- strutex.postprocessors
- strutex.security
- etc.

Entry points are stored for lazy loading - they are not imported until first use via get().

PARAMETER	DESCRIPTION
`group_prefix`	Entry point group prefix (default: "strutex") TYPE: `str` DEFAULT: `'strutex'`
`force`	Force re-discovery even if already discovered TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`int`	Number of entry points discovered

Example pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

Source code in strutex/plugins/registry.py

@classmethod
def discover(cls, group_prefix: str = "strutex", force: bool = False) -> int:
    """
    Discover and register plugins from entry points.

    Scans for entry points matching the pattern:
    - strutex.providers
    - strutex.validators
    - strutex.postprocessors
    - strutex.security
    - etc.

    Entry points are stored for lazy loading - they are not imported
    until first use via get().

    Args:
        group_prefix: Entry point group prefix (default: "strutex")
        force: Force re-discovery even if already discovered

    Returns:
        Number of entry points discovered

    Example pyproject.toml:
        [project.entry-points."strutex.providers"]
        my_provider = "my_package:MyProvider"
    """
    if cls._discovered and not force:
        return sum(len(eps) for eps in cls._entry_points.values())

    discovered = 0

    # Get entry_points function
    if sys.version_info >= (3, 10):
        from importlib.metadata import entry_points
    else:
        try:
            from importlib_metadata import entry_points
        except ImportError:
            cls._discovered = True
            return 0

    # Get all entry point groups
    try:
        all_eps = entry_points()

        # Get group names that match our prefix
        if hasattr(all_eps, 'groups'):
            # Python 3.12+ style
            groups = [g for g in all_eps.groups if g.startswith(f"{group_prefix}.")]
        elif hasattr(all_eps, 'keys'):
            # Python 3.9-3.11 style (dict-like)
            groups = [g for g in all_eps.keys() if g.startswith(f"{group_prefix}.")]
        else:
            groups = []
    except Exception:
        cls._discovered = True
        return 0

    for group in groups:
        # Extract plugin type from group name
        # e.g., "strutex.providers" -> "provider"
        plugin_type = group.replace(f"{group_prefix}.", "").rstrip("s")

        if plugin_type not in cls._entry_points:
            cls._entry_points[plugin_type] = {}

        try:
            # Get entry points for this group
            if hasattr(all_eps, 'select'):
                eps = all_eps.select(group=group)
            else:
                eps = all_eps.get(group, [])

            for ep in eps:
                # Store entry point for lazy loading
                cls._entry_points[plugin_type][ep.name.lower()] = ep
                discovered += 1

        except Exception:
            pass

    cls._discovered = True
    return discovered
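The group-to-type mapping in the loop above is worth a closer look, because `str.rstrip("s")` removes every trailing `s`, not just one. The sketch below reproduces that expression and compares it with a hypothetical stricter helper (`group_to_type_strict` is not part of strutex); the group name `strutex.glass` is made up purely to show the difference.

```python
def group_to_type(group: str, prefix: str = "strutex") -> str:
    # Same expression as in the quoted source.
    return group.replace(f"{prefix}.", "").rstrip("s")


def group_to_type_strict(group: str, prefix: str = "strutex") -> str:
    # Hypothetical alternative: drop the prefix once and at most one trailing "s".
    head = f"{prefix}."
    name = group[len(head):] if group.startswith(head) else group
    return name[:-1] if name.endswith("s") else name


documented = {
    g: group_to_type(g)
    for g in ("strutex.providers", "strutex.postprocessors", "strutex.security")
}
quirk = group_to_type("strutex.glass")          # both trailing "s" are stripped
strict = group_to_type_strict("strutex.glass")  # only one is stripped
```

For the documented groups both helpers agree; the difference only matters for custom group names that end in a doubled `s`.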

`get(plugin_type: str, name: str) -> Optional[Type]` `classmethod`

Get a registered plugin class by type and name.

If the plugin is registered via entry point and not yet loaded, it will be loaded on first access (lazy loading).

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin TYPE: `str`
`name`	Name of the plugin TYPE: `str`

RETURNS	DESCRIPTION
`Optional[Type]`	The plugin class, or None if not found

Source code in strutex/plugins/registry.py

@classmethod
def get(cls, plugin_type: str, name: str) -> Optional[Type]:
    """
    Get a registered plugin class by type and name.

    If the plugin is registered via entry point and not yet loaded,
    it will be loaded on first access (lazy loading).

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        The plugin class, or None if not found
    """
    name_lower = name.lower()

    # Ensure discovery has run
    if not cls._discovered:
        cls.discover()

    # Check loaded cache first
    if name_lower in cls._loaded.get(plugin_type, {}):
        return cls._loaded[plugin_type][name_lower]

    # Check manual registrations
    if name_lower in cls._manual.get(plugin_type, {}):
        return cls._manual[plugin_type][name_lower]

    # Try to lazy load from entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        plugin_cls = cls._load_entry_point(ep, plugin_type, name_lower)
        if plugin_cls is not None:
            return plugin_cls

    return None
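The lookup order above (loaded cache, then manual registrations, then entry points) can be mimicked with a small stand-in. `MiniRegistry` is hypothetical, handles a single plugin type, and uses zero-argument callables in place of real entry points; caching the result of a lazy load is an assumption, since `_load_entry_point` is not shown here.

```python
from typing import Callable, Dict, Optional


class MiniRegistry:
    """Hypothetical single-type registry mirroring the lookup order above."""

    def __init__(self) -> None:
        self._loaded: Dict[str, type] = {}
        self._manual: Dict[str, type] = {}
        self._lazy: Dict[str, Callable[[], type]] = {}  # stand-in for entry points

    def add_lazy(self, name: str, loader: Callable[[], type]) -> None:
        self._lazy[name.lower()] = loader

    def register(self, name: str, plugin_cls: type) -> None:
        # Manual registration also fills the loaded cache, as in register().
        self._manual[name.lower()] = plugin_cls
        self._loaded[name.lower()] = plugin_cls

    def get(self, name: str) -> Optional[type]:
        key = name.lower()
        if key in self._loaded:
            return self._loaded[key]
        if key in self._manual:
            return self._manual[key]
        loader = self._lazy.get(key)
        if loader is not None:
            plugin_cls = loader()
            self._loaded[key] = plugin_cls  # assumed caching of lazy loads
            return plugin_cls
        return None


class FromEntryPoint: ...
class Manual: ...


reg = MiniRegistry()
reg.add_lazy("Gemini", lambda: FromEntryPoint)
reg.add_lazy("ollama", lambda: FromEntryPoint)
reg.register("GEMINI", Manual)
```

Because names are lowercased on both sides, lookups are case-insensitive, and a manual registration shadows an entry point registered under the same name.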

`get_plugin_info(plugin_type: str, name: str) -> Optional[Dict[str, Any]]` `classmethod`

Get metadata about a plugin without necessarily loading it.

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin TYPE: `str`
`name`	Name of the plugin TYPE: `str`

RETURNS	DESCRIPTION
`Optional[Dict[str, Any]]`	Dict with plugin info, or None if not found

Source code in strutex/plugins/registry.py

@classmethod
def get_plugin_info(cls, plugin_type: str, name: str) -> Optional[Dict[str, Any]]:
    """
    Get metadata about a plugin without necessarily loading it.

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        Dict with plugin info, or None if not found
    """
    name_lower = name.lower()

    if not cls._discovered:
        cls.discover()

    # Check if loaded
    if name_lower in cls._loaded.get(plugin_type, {}):
        plugin_cls = cls._loaded[plugin_type][name_lower]
        return {
            "name": name_lower,
            "version": getattr(plugin_cls, "strutex_plugin_version", "unknown"),
            "priority": getattr(plugin_cls, "priority", 50),
            "cost": getattr(plugin_cls, "cost", 1.0),
            "capabilities": getattr(plugin_cls, "capabilities", []),
            "loaded": True,
            "healthy": cls._check_health(plugin_cls),
        }

    # Check entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        return {
            "name": name_lower,
            "entry_point": f"{ep.group}:{ep.name}",
            "loaded": False,
            "healthy": None,  # Unknown until loaded
        }

    return None

`get_sorted(plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]]` `classmethod`

Get all plugins of a type sorted by priority.

Useful for waterfall selection where you want to try higher-priority plugins first.

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin TYPE: `str`
`reverse`	If True (default), higher priority first TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`List[Tuple[str, Type]]`	List of (name, class) tuples sorted by priority

Source code in strutex/plugins/registry.py

@classmethod
def get_sorted(cls, plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]]:
    """
    Get all plugins of a type sorted by priority.

    Useful for waterfall selection where you want to try
    higher-priority plugins first.

    Args:
        plugin_type: Type of plugin
        reverse: If True (default), higher priority first

    Returns:
        List of (name, class) tuples sorted by priority
    """
    plugins = cls.list(plugin_type)
    return sorted(
        plugins.items(),
        key=lambda x: getattr(x[1], 'priority', 50),
        reverse=reverse
    )

`list(plugin_type: str) -> Dict[str, Type]` `classmethod`

List all plugins of a given type.

Note: This loads all plugins of the type. Use list_names() for a lightweight listing without loading.

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin TYPE: `str`

RETURNS	DESCRIPTION
`Dict[str, Type]`	Dictionary mapping names to plugin classes

Source code in strutex/plugins/registry.py

@classmethod
def list(cls, plugin_type: str) -> Dict[str, Type]:
    """
    List all plugins of a given type.

    Note: This loads all plugins of the type. Use list_names()
    for a lightweight listing without loading.

    Args:
        plugin_type: Type of plugin

    Returns:
        Dictionary mapping names to plugin classes
    """
    if not cls._discovered:
        cls.discover()

    result = {}

    # Get all names from entry points and manual registrations
    all_names = set()
    all_names.update(cls._entry_points.get(plugin_type, {}).keys())
    all_names.update(cls._manual.get(plugin_type, {}).keys())
    all_names.update(cls._loaded.get(plugin_type, {}).keys())

    # Load each plugin
    for name in all_names:
        plugin_cls = cls.get(plugin_type, name)
        if plugin_cls is not None:
            result[name] = plugin_cls

    return result

`list_names(plugin_type: str) -> List[str]` `classmethod`

List names of all plugins of a given type without loading them.

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin TYPE: `str`

RETURNS	DESCRIPTION
`List[str]`	List of plugin names

Source code in strutex/plugins/registry.py

@classmethod
def list_names(cls, plugin_type: str) -> List[str]:
    """
    List names of all plugins of a given type without loading them.

    Args:
        plugin_type: Type of plugin

    Returns:
        List of plugin names
    """
    if not cls._discovered:
        cls.discover()

    names = set()
    names.update(cls._entry_points.get(plugin_type, {}).keys())
    names.update(cls._manual.get(plugin_type, {}).keys())
    names.update(cls._loaded.get(plugin_type, {}).keys())

    return sorted(names)

`list_types() -> List[str]` `classmethod`

List all registered plugin types.

Source code in strutex/plugins/registry.py

@classmethod
def list_types(cls) -> List[str]:
    """List all registered plugin types."""
    if not cls._discovered:
        cls.discover()

    types = set()
    types.update(cls._entry_points.keys())
    types.update(cls._manual.keys())
    types.update(cls._loaded.keys())

    return sorted(types)

`register(plugin_type: str, name: str, plugin_cls: Type) -> None` `classmethod`

Register a plugin class manually.

This is used by the @register decorator for backwards compatibility. Prefer using entry points in pyproject.toml for new plugins.

PARAMETER	DESCRIPTION
`plugin_type`	Type of plugin (e.g., "provider", "security", "validator") TYPE: `str`
`name`	Unique name for this plugin TYPE: `str`
`plugin_cls`	The plugin class to register TYPE: `Type`

Source code in strutex/plugins/registry.py

@classmethod
def register(cls, plugin_type: str, name: str, plugin_cls: Type) -> None:
    """
    Register a plugin class manually.

    This is used by the @register decorator for backwards compatibility.
    Prefer using entry points in pyproject.toml for new plugins.

    Args:
        plugin_type: Type of plugin (e.g., "provider", "security", "validator")
        name: Unique name for this plugin
        plugin_cls: The plugin class to register
    """
    if plugin_type not in cls._manual:
        cls._manual[plugin_type] = {}

    cls._manual[plugin_type][name.lower()] = plugin_cls

    # Also add to loaded cache
    if plugin_type not in cls._loaded:
        cls._loaded[plugin_type] = {}
    cls._loaded[plugin_type][name.lower()] = plugin_cls


`Provider`

Bases: ABC

Base class for LLM providers.

All providers must implement the process method to handle document extraction via their specific LLM API.

Subclassing auto-registers the plugin. Use class arguments to customize:

class MyProvider(Provider, name="custom", priority=90):
    ...

ATTRIBUTE	DESCRIPTION
`strutex_plugin_version`	API version for compatibility checks TYPE: `str`
`priority`	Ordering priority (0-100, higher = preferred) TYPE: `int`
`cost`	Cost hint for optimization (lower = cheaper) TYPE: `float`
`capabilities`	List of supported features TYPE: `List[str]`

`aprocess(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any` `async`

Async version of process. Override for true async support. Default implementation calls sync version.

Source code in strutex/plugins/base.py

async def aprocess(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs
) -> Any:
    """
    Async version of process. Override for true async support.
    Default implementation calls sync version.
    """
    return self.process(file_path, prompt, schema, mime_type, **kwargs)
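Since the default `aprocess` calls the synchronous `process` directly, a slow call would block the event loop. One possible override, sketched here with stand-in classes rather than the real `Provider`, pushes the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The class names are hypothetical.

```python
import asyncio
import time
from typing import Any, Dict, List


class SyncOnlyProvider:
    """Stand-in whose aprocess mirrors the default shown above."""

    def process(
        self, file_path: str, prompt: str, schema: Any, mime_type: str, **kwargs: Any
    ) -> Dict[str, str]:
        time.sleep(0.05)  # simulate a blocking API call
        return {"file": file_path}

    async def aprocess(
        self, file_path: str, prompt: str, schema: Any, mime_type: str, **kwargs: Any
    ) -> Dict[str, str]:
        return self.process(file_path, prompt, schema, mime_type, **kwargs)


class ThreadedProvider(SyncOnlyProvider):
    async def aprocess(
        self, file_path: str, prompt: str, schema: Any, mime_type: str, **kwargs: Any
    ) -> Dict[str, str]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(
            self.process, file_path, prompt, schema, mime_type, **kwargs
        )


async def main() -> List[Dict[str, str]]:
    provider = ThreadedProvider()
    return await asyncio.gather(
        provider.aprocess("a.pdf", "extract", None, "application/pdf"),
        provider.aprocess("b.pdf", "extract", None, "application/pdf"),
    )


results = asyncio.run(main())
```

A provider backed by a natively asynchronous client would instead await that client directly inside `aprocess`.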

`has_capability(capability: str) -> bool`

Check if this provider has a specific capability.

Source code in strutex/plugins/base.py

def has_capability(self, capability: str) -> bool:
    """Check if this provider has a specific capability."""
    return capability.lower() in [c.lower() for c in self.capabilities]

`health_check() -> bool` `classmethod`

Check if this provider is healthy and ready to use.

Override in subclasses for custom health checks (e.g., API connectivity).

RETURNS	DESCRIPTION
`bool`	True if healthy, False otherwise

Source code in strutex/plugins/base.py

@classmethod
def health_check(cls) -> bool:
    """
    Check if this provider is healthy and ready to use.

    Override in subclasses for custom health checks (e.g., API connectivity).

    Returns:
        True if healthy, False otherwise
    """
    return True
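A custom health check might, for example, verify that credentials are configured before the provider is offered to callers. The classes, the `EXAMPLE_API_KEY` variable name, and the `healthy_only` helper below are all hypothetical; the sketch only shows the override pattern, not how strutex consumes the result.

```python
import os
from typing import List, Sequence


class BaseProvider:
    """Stand-in with the default health check shown above."""

    @classmethod
    def health_check(cls) -> bool:
        return True


class KeyedProvider(BaseProvider):
    env_var = "EXAMPLE_API_KEY"  # hypothetical variable name

    @classmethod
    def health_check(cls) -> bool:
        # Healthy only if a non-empty key is configured.
        return bool(os.environ.get(cls.env_var, "").strip())


def healthy_only(candidates: Sequence[type]) -> List[type]:
    return [c for c in candidates if c.health_check()]
```

Because `health_check` is a classmethod, it can be evaluated without instantiating the provider, which keeps the check cheap.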

`process(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any` `abstractmethod`

Process a document and extract structured data.

PARAMETER	DESCRIPTION
`file_path`	Path to the document file TYPE: `str`
`prompt`	Extraction prompt/instructions TYPE: `str`
`schema`	Expected output schema TYPE: `Schema`
`mime_type`	MIME type of the file TYPE: `str`
`**kwargs`	Provider-specific options DEFAULT: `{}`

RETURNS	DESCRIPTION
`Any`	Extracted data matching the schema

Source code in strutex/plugins/base.py

@abstractmethod
def process(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs
) -> Any:
    """
    Process a document and extract structured data.

    Args:
        file_path: Path to the document file
        prompt: Extraction prompt/instructions
        schema: Expected output schema
        mime_type: MIME type of the file
        **kwargs: Provider-specific options

    Returns:
        Extracted data matching the schema
    """
    pass

