Plugin System

Everything in strutex is pluggable. Use defaults or register your own implementations.

New in v0.3.0

Plugin System v2 introduces auto-registration via inheritance, lazy loading, entry points, priority-based ordering, and CLI tooling.


Architecture: Plugins vs Hooks

Strutex has two extension mechanisms that serve different purposes:

┌─────────────────────────────────────────────────────────────────┐
│                     DocumentProcessor.process()                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • pre_process           │ ◄── Logging, timing, prompt mods   │
│  └─────────────────────────┘                                     │
│              │                                                   │
│              ▼                                                   │
│  ┌─── PLUGINS (Components) ─┐                                    │
│  │ • SecurityPlugin         │ ◄── Validates input               │
│  │ • Extractor              │ ◄── PDF → text                    │
│  │ • Provider               │ ◄── LLM call                      │
│  │ • Validator              │ ◄── Validates output              │
│  │ • Postprocessor          │ ◄── Transforms result             │
│  └──────────────────────────┘                                    │
│              │                                                   │
│              ▼                                                   │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • post_process          │ ◄── Add metadata, notifications    │
│  │ • on_error              │ ◄── Fallbacks, alerting            │
│  └─────────────────────────┘                                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

When to Use Which?

| Feature     | Plugins (Base Classes)                     | Hooks System                                       |
|-------------|--------------------------------------------|----------------------------------------------------|
| Pattern     | Strategy Pattern                           | Observer/Middleware Pattern                        |
| Role        | Drivers: define how a step is performed    | Observers: react to pipeline events                |
| Cardinality | 1:1 (one Provider, one Extractor per run)  | 1:N (many hooks can run simultaneously)            |
| Complexity  | Higher: implement interface methods        | Lower: just a function or decorator                |
| Goal        | Interchangeability: replace the engine     | Cross-cutting concerns: add without touching engine |

Use a Plugin when:

  • Changing the fundamental logic (e.g., "use OCR instead of text extraction")
  • Replacing a core component (different LLM provider)

Use a Hook when:

  • Observing events (logging, timing, metrics)
  • Modifying data generically (add metadata to all results)
  • Handling errors (fallbacks, alerting)
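
The distinction can be sketched in a few lines of plain Python. Everything below (Pipeline, the two extractors, the hook functions) is a hypothetical stand-in written for illustration, not strutex's API: the extractor slot holds exactly one interchangeable component, while any number of hooks can be attached around it.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins: none of these names come from strutex.
class PlainTextExtractor:
    def extract(self, raw: str) -> str:
        return raw.strip()


class UppercaseExtractor:
    def extract(self, raw: str) -> str:
        return raw.strip().upper()


class Pipeline:
    def __init__(self, extractor) -> None:
        # Plugin slot (strategy): exactly one extractor per run.
        self.extractor = extractor
        # Hook list (observers): any number of callables.
        self.post_hooks: List[Callable[[Dict], Dict]] = []

    def run(self, raw: str) -> Dict:
        result = {"text": self.extractor.extract(raw)}
        for hook in self.post_hooks:
            result = hook(result)
        return result


def add_length(result: Dict) -> Dict:
    return {**result, "length": len(result["text"])}


def add_source(result: Dict) -> Dict:
    return {**result, "source": "demo"}


# Swapping the extractor changes how the step is performed;
# adding hooks layers behaviour on top without touching it.
pipeline = Pipeline(UppercaseExtractor())
pipeline.post_hooks += [add_length, add_source]
output = pipeline.run("  invoice 42  ")
```

Replacing UppercaseExtractor with PlainTextExtractor changes the core result, whereas removing a hook only removes the extra field it contributed.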

Plugin Types

| Type          | Purpose                 | Built-in Examples                       |
|---------------|-------------------------|-----------------------------------------|
| provider      | LLM backends            | Gemini, OpenAI                          |
| security      | Input/output protection | InputSanitizer, PromptInjectionDetector |
| extractor     | Document parsing        | PDF, Image, Excel                       |
| validator     | Output validation       | Schema, business rules                  |
| postprocessor | Data transformation     | DateNormalizer                          |

The PluginType enum provides type-safe access:

from strutex.plugins import PluginType

PluginType.PROVIDER      # "provider"
PluginType.EXTRACTOR     # "extractor"
PluginType.VALIDATOR     # "validator"
PluginType.POSTPROCESSOR # "postprocessor"
PluginType.SECURITY      # "security"

Quick Start

Auto-Registration via Inheritance

Simply inherit from a base class and your plugin is automatically registered:

from strutex.plugins import Provider

class MyProvider(Provider):
    """Auto-registered as 'myprovider'"""
    capabilities = ["vision"]

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        return {"result": "data"}

That's it! No decorators or manual registration needed.

Customizing Registration

Use a class argument to customize the name, and class attributes to set priority, cost, and capabilities:

class FastProvider(Provider, name="fast"):
    """Registered as 'fast' with high priority"""
    priority = 90  # Priority is a class attribute
    cost = 0.5
    capabilities = ["vision", "batch"]

    def process(self, *args, **kwargs):
        ...

Opting Out of Auto-Registration

Pass register=False for intermediate base classes that should not be registered themselves:

class BasePdfProvider(Provider, register=False):
    """NOT registered - abstract base class"""
    def common_pdf_logic(self):
        ...

class AdobeProvider(BasePdfProvider):
    """Registered as 'adobeprovider'"""
    def process(self, *args, **kwargs):
        ...

Tip

Classes with unimplemented @abstractmethods are automatically skipped.


Plugin Attributes

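| Attribute              | Type  | Default | Description                                    |
|------------------------|-------|---------|------------------------------------------------|
| strutex_plugin_version | str   | "1.0"   | API version for compatibility                  |
| priority               | int   | 50      | Order in waterfall (0-100, higher = preferred) |
| cost                   | float | 1.0     | Cost hint (lower = cheaper)                    |
| capabilities           | list  | []      | Features this plugin supports                  |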

Registration Methods

1. Inheritance

Just inherit from a base class. With the default name:

class MyProvider(Provider):
    def process(self, *args, **kwargs): ...
# → Registered as "myprovider"

With a custom name:

class MyProvider(Provider, name="custom"):
    def process(self, *args, **kwargs): ...
# → Registered as "custom"

2. Entry Points (For Packages)

For distributable packages, register in pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

[project.entry-points."strutex.validators"]
my_validator = "my_package:MyValidator"

Plugins are lazy-loaded: a plugin's module is imported only when the plugin is first used.
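
Lazy loading of entry points can be sketched with the standard library alone. LazyRegistry below is a hypothetical miniature written for illustration, not strutex's PluginRegistry, and the entry point targets the stdlib class fractions.Fraction purely as a stand-in for a plugin class. Only `importlib.metadata.EntryPoint` and its `load()` method are real library API here.

```python
from importlib.metadata import EntryPoint


class LazyRegistry:
    """Hypothetical miniature registry illustrating lazy loading."""

    def __init__(self):
        self._entry_points = {}  # name -> EntryPoint (nothing imported yet)
        self._loaded = {}        # name -> loaded object (cache)

    def add(self, ep):
        self._entry_points[ep.name.lower()] = ep

    def names(self):
        # Listing names never triggers an import.
        return sorted(self._entry_points)

    def get(self, name):
        key = name.lower()  # lookups are case-insensitive
        if key not in self._loaded:
            ep = self._entry_points.get(key)
            if ep is None:
                return None
            self._loaded[key] = ep.load()  # the import happens here
        return self._loaded[key]


registry = LazyRegistry()
registry.add(
    EntryPoint(name="fraction", value="fractions:Fraction", group="strutex.providers")
)

names_before = registry.names()
loaded_before = dict(registry._loaded)  # still empty: nothing loaded yet
plugin_cls = registry.get("Fraction")   # first access loads and caches
```

The value string follows the same "module:attribute" form used in the pyproject.toml entries above.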

3. Manual Registration

from strutex.plugins import PluginRegistry

PluginRegistry.register("provider", "my_provider", MyProvider)

CLI Commands

# List all plugins
strutex plugins list

# Filter by type
strutex plugins list --type provider

# JSON output
strutex plugins list --json

# Plugin details
strutex plugins info gemini --type provider

# Refresh discovery cache
strutex plugins refresh

Creating Custom Plugins

Custom Provider

from strutex.plugins import Provider

class OllamaProvider(Provider):
    priority = 60
    capabilities = ["local", "vision"]

    def __init__(self, model="llama3"):
        self.model = model

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        # Your implementation
        ...

Custom Validator

from strutex.plugins import Validator, ValidationResult

class SumValidator(Validator):
    """Verify line items sum to total."""
    priority = 70

    def validate(self, data, schema=None):
        items_sum = sum(i.get("amount", 0) for i in data.get("items", []))
        total = data.get("total", 0)

        if abs(items_sum - total) > 0.01:
            return ValidationResult(
                valid=False,
                data=data,
                issues=[f"Sum mismatch: {items_sum} != {total}"]
            )
        return ValidationResult(valid=True, data=data)
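
The 0.01 tolerance in the comparison matters because amounts stored as floats rarely add up exactly. The short, library-free check below uses made-up sample amounts to show an exact comparison rejecting a correct invoice while the tolerance-based check accepts it.

```python
# Made-up sample data: two line items that should add up to the total.
items = [{"amount": 0.1}, {"amount": 0.2}]
total = 0.3

items_sum = sum(i.get("amount", 0) for i in items)

# Binary floating point makes the exact comparison fail...
exact_match = items_sum == total
# ...while the tolerance used by the validator accepts the data.
within_tolerance = abs(items_sum - total) <= 0.01
```

For currency data where exactness matters, converting amounts to decimal.Decimal before summing is a common alternative to a fixed tolerance.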

Custom Postprocessor

from strutex.plugins import Postprocessor
import re

class DateNormalizer(Postprocessor):
    """Convert DD.MM.YYYY to YYYY-MM-DD."""

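    def process(self, data):
        result = data.copy()  # shallow copy so the caller's dict is not mutated
        value = result.get("date")
        # Only strings can be matched; leave missing or non-string values alone
        if isinstance(value, str):
            match = re.match(r'(\d{2})\.(\d{2})\.(\d{4})', value)
            if match:
                d, m, y = match.groups()
                result["date"] = f"{y}-{m}-{d}"
        return result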
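
The date conversion can be tried in isolation. The helper name normalize_date below is hypothetical and only wraps the same regular expression, so its behaviour on a few inputs is easy to inspect.

```python
import re

DATE_RE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def normalize_date(value: str) -> str:
    """Convert a leading DD.MM.YYYY to YYYY-MM-DD; return other input unchanged."""
    match = DATE_RE.match(value)
    if not match:
        return value
    d, m, y = match.groups()
    return f"{y}-{m}-{d}"


converted = normalize_date("31.12.2024")
untouched = normalize_date("2024-12-31")
# re.match anchors only at the start, so trailing text is dropped.
with_time = normalize_date("31.12.2024 10:30")
```

If trailing text should block the conversion instead of being discarded, `re.fullmatch` is the stricter choice.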

API Reference

PluginRegistry

Central registry for all plugin types with lazy loading.

Plugins are stored as EntryPoint objects and only loaded when first accessed via get(). This improves startup time and avoids importing unused dependencies.

Usage

# Get a plugin class (loads on first access)
cls = PluginRegistry.get("provider", "gemini")

# List all plugins of a type (loads each one; use list_names() to avoid loading)
all_providers = PluginRegistry.list("provider")

# Discover plugins from entry points (pass force=True to re-scan)
count = PluginRegistry.discover()

clear(plugin_type: Optional[str] = None) -> None classmethod

Clear registered plugins.

Parameters:

  • plugin_type (Optional[str], default: None): If provided, only clear this type. Otherwise clear all.

Source code in strutex/plugins/registry.py
@classmethod
def clear(cls, plugin_type: Optional[str] = None) -> None:
    """
    Clear registered plugins.

    Args:
        plugin_type: If provided, only clear this type. Otherwise clear all.
    """
    if plugin_type:
        cls._entry_points.pop(plugin_type, None)
        cls._loaded.pop(plugin_type, None)
        cls._manual.pop(plugin_type, None)
    else:
        cls._entry_points.clear()
        cls._loaded.clear()
        cls._manual.clear()
        cls._discovered = False

discover(group_prefix: str = 'strutex', force: bool = False) -> int classmethod

Discover and register plugins from entry points.

Scans for entry points matching the pattern:

  • strutex.providers
  • strutex.validators
  • strutex.postprocessors
  • strutex.security
  • etc.

Entry points are stored for lazy loading - they are not imported until first use via get().

Parameters:

  • group_prefix (str, default: 'strutex'): Entry point group prefix
  • force (bool, default: False): Force re-discovery even if already discovered

Returns:

  • int: Number of entry points discovered

Example pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

Source code in strutex/plugins/registry.py
@classmethod
def discover(cls, group_prefix: str = "strutex", force: bool = False) -> int:
    """
    Discover and register plugins from entry points.

    Scans for entry points matching the pattern:
    - strutex.providers
    - strutex.validators
    - strutex.postprocessors
    - strutex.security
    - etc.

    Entry points are stored for lazy loading - they are not imported
    until first use via get().

    Args:
        group_prefix: Entry point group prefix (default: "strutex")
        force: Force re-discovery even if already discovered

    Returns:
        Number of entry points discovered

    Example pyproject.toml:
        [project.entry-points."strutex.providers"]
        my_provider = "my_package:MyProvider"
    """
    if cls._discovered and not force:
        return sum(len(eps) for eps in cls._entry_points.values())

    discovered = 0

    # Get entry_points function
    if sys.version_info >= (3, 10):
        from importlib.metadata import entry_points
    else:
        try:
            from importlib_metadata import entry_points
        except ImportError:
            cls._discovered = True
            return 0

    # Get all entry point groups
    try:
        all_eps = entry_points()

        # Get group names that match our prefix
        if hasattr(all_eps, 'groups'):
            # Python 3.12+ style
            groups = [g for g in all_eps.groups if g.startswith(f"{group_prefix}.")]
        elif hasattr(all_eps, 'keys'):
            # Python 3.9-3.11 style (dict-like)
            groups = [g for g in all_eps.keys() if g.startswith(f"{group_prefix}.")]
        else:
            groups = []
    except Exception:
        cls._discovered = True
        return 0

    for group in groups:
        # Extract plugin type from group name
        # e.g., "strutex.providers" -> "provider"
        plugin_type = group.replace(f"{group_prefix}.", "").rstrip("s")

        if plugin_type not in cls._entry_points:
            cls._entry_points[plugin_type] = {}

        try:
            # Get entry points for this group
            if hasattr(all_eps, 'select'):
                eps = all_eps.select(group=group)
            else:
                eps = all_eps.get(group, [])

            for ep in eps:
                # Store entry point for lazy loading
                cls._entry_points[plugin_type][ep.name.lower()] = ep
                discovered += 1

        except Exception:
            pass

    cls._discovered = True
    return discovered
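
The group-to-type mapping used in the loop above is worth trying on its own. The helper name plugin_type_from_group is hypothetical, but its body is the same expression as in the source listing. The last two group names are invented to show a caveat: `rstrip("s")` removes every trailing "s", not just a plural suffix.

```python
def plugin_type_from_group(group: str, prefix: str = "strutex") -> str:
    # Same expression as in discover(): drop the prefix, then trailing "s" characters.
    return group.replace(f"{prefix}.", "").rstrip("s")


documented = {
    g: plugin_type_from_group(g)
    for g in ("strutex.providers", "strutex.validators", "strutex.security")
}

# Invented group names, only to illustrate the caveat.
surprising = {
    g: plugin_type_from_group(g)
    for g in ("strutex.classes", "strutex.process")
}
```

For the documented groups the result matches the plugin type names; a custom group whose singular form ends in "s" would need a different name or a different mapping.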

get(plugin_type: str, name: str) -> Optional[Type] classmethod

Get a registered plugin class by type and name.

If the plugin is registered via entry point and not yet loaded, it will be loaded on first access (lazy loading).

Parameters:

  • plugin_type (str): Type of plugin
  • name (str): Name of the plugin

Returns:

  • Optional[Type]: The plugin class, or None if not found

Source code in strutex/plugins/registry.py
@classmethod
def get(cls, plugin_type: str, name: str) -> Optional[Type]:
    """
    Get a registered plugin class by type and name.

    If the plugin is registered via entry point and not yet loaded,
    it will be loaded on first access (lazy loading).

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        The plugin class, or None if not found
    """
    name_lower = name.lower()

    # Ensure discovery has run
    if not cls._discovered:
        cls.discover()

    # Check loaded cache first
    if name_lower in cls._loaded.get(plugin_type, {}):
        return cls._loaded[plugin_type][name_lower]

    # Check manual registrations
    if name_lower in cls._manual.get(plugin_type, {}):
        return cls._manual[plugin_type][name_lower]

    # Try to lazy load from entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        plugin_cls = cls._load_entry_point(ep, plugin_type, name_lower)
        if plugin_cls is not None:
            return plugin_cls

    return None

get_plugin_info(plugin_type: str, name: str) -> Optional[Dict[str, Any]] classmethod

Get metadata about a plugin without necessarily loading it.

Parameters:

  • plugin_type (str): Type of plugin
  • name (str): Name of the plugin

Returns:

  • Optional[Dict[str, Any]]: Dict with plugin info, or None if not found

Source code in strutex/plugins/registry.py
@classmethod
def get_plugin_info(cls, plugin_type: str, name: str) -> Optional[Dict[str, Any]]:
    """
    Get metadata about a plugin without necessarily loading it.

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        Dict with plugin info, or None if not found
    """
    name_lower = name.lower()

    if not cls._discovered:
        cls.discover()

    # Check if loaded
    if name_lower in cls._loaded.get(plugin_type, {}):
        plugin_cls = cls._loaded[plugin_type][name_lower]
        return {
            "name": name_lower,
            "version": getattr(plugin_cls, "strutex_plugin_version", "unknown"),
            "priority": getattr(plugin_cls, "priority", 50),
            "cost": getattr(plugin_cls, "cost", 1.0),
            "capabilities": getattr(plugin_cls, "capabilities", []),
            "loaded": True,
            "healthy": cls._check_health(plugin_cls),
        }

    # Check entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        return {
            "name": name_lower,
            "entry_point": f"{ep.group}:{ep.name}",
            "loaded": False,
            "healthy": None,  # Unknown until loaded
        }

    return None

get_sorted(plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]] classmethod

Get all plugins of a type sorted by priority.

Useful for waterfall selection where you want to try higher-priority plugins first.

Parameters:

  • plugin_type (str): Type of plugin
  • reverse (bool, default: True): If True (default), higher priority first

Returns:

  • List[Tuple[str, Type]]: List of (name, class) tuples sorted by priority

Source code in strutex/plugins/registry.py
@classmethod
def get_sorted(cls, plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]]:
    """
    Get all plugins of a type sorted by priority.

    Useful for waterfall selection where you want to try
    higher-priority plugins first.

    Args:
        plugin_type: Type of plugin
        reverse: If True (default), higher priority first

    Returns:
        List of (name, class) tuples sorted by priority
    """
    plugins = cls.list(plugin_type)
    return sorted(
        plugins.items(),
        key=lambda x: getattr(x[1], 'priority', 50),
        reverse=reverse
    )
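
A waterfall built on this ordering might look like the sketch below. The three plugin classes and run_waterfall are hypothetical and self-contained; only the sort key (priority with a default of 50, highest first) mirrors the listing above.

```python
class Primary:
    priority = 90

    def process(self, text):
        raise RuntimeError("primary backend unavailable")


class Fallback:
    # No priority attribute: the sort key treats it as 50.
    def process(self, text):
        return text.upper()


class LastResort:
    priority = 10

    def process(self, text):
        return text


plugins = {"last": LastResort, "primary": Primary, "fallback": Fallback}

# Same key as get_sorted(): missing priority defaults to 50, highest first.
ordered = sorted(
    plugins.items(),
    key=lambda item: getattr(item[1], "priority", 50),
    reverse=True,
)


def run_waterfall(text):
    """Try plugins in priority order and return the first success."""
    for name, plugin_cls in ordered:
        try:
            return name, plugin_cls().process(text)
        except Exception:
            continue  # fall through to the next plugin
    raise RuntimeError("all plugins failed")


order = [name for name, _ in ordered]
winner = run_waterfall("hello")
```

Here the highest-priority plugin fails, so the result comes from the next one in line.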

list(plugin_type: str) -> Dict[str, Type] classmethod

List all plugins of a given type.

Note: This loads all plugins of the type. Use list_names() for a lightweight listing without loading.

Parameters:

  • plugin_type (str): Type of plugin

Returns:

  • Dict[str, Type]: Dictionary mapping names to plugin classes

Source code in strutex/plugins/registry.py
@classmethod
def list(cls, plugin_type: str) -> Dict[str, Type]:
    """
    List all plugins of a given type.

    Note: This loads all plugins of the type. Use list_names()
    for a lightweight listing without loading.

    Args:
        plugin_type: Type of plugin

    Returns:
        Dictionary mapping names to plugin classes
    """
    if not cls._discovered:
        cls.discover()

    result = {}

    # Get all names from entry points and manual registrations
    all_names = set()
    all_names.update(cls._entry_points.get(plugin_type, {}).keys())
    all_names.update(cls._manual.get(plugin_type, {}).keys())
    all_names.update(cls._loaded.get(plugin_type, {}).keys())

    # Load each plugin
    for name in all_names:
        plugin_cls = cls.get(plugin_type, name)
        if plugin_cls is not None:
            result[name] = plugin_cls

    return result

list_names(plugin_type: str) -> List[str] classmethod

List names of all plugins of a given type without loading them.

Parameters:

  • plugin_type (str): Type of plugin

Returns:

  • List[str]: List of plugin names

Source code in strutex/plugins/registry.py
@classmethod
def list_names(cls, plugin_type: str) -> List[str]:
    """
    List names of all plugins of a given type without loading them.

    Args:
        plugin_type: Type of plugin

    Returns:
        List of plugin names
    """
    if not cls._discovered:
        cls.discover()

    names = set()
    names.update(cls._entry_points.get(plugin_type, {}).keys())
    names.update(cls._manual.get(plugin_type, {}).keys())
    names.update(cls._loaded.get(plugin_type, {}).keys())

    return sorted(names)

list_types() -> List[str] classmethod

List all registered plugin types.

Source code in strutex/plugins/registry.py
@classmethod
def list_types(cls) -> List[str]:
    """List all registered plugin types."""
    if not cls._discovered:
        cls.discover()

    types = set()
    types.update(cls._entry_points.keys())
    types.update(cls._manual.keys())
    types.update(cls._loaded.keys())

    return sorted(types)

register(plugin_type: str, name: str, plugin_cls: Type) -> None classmethod

Register a plugin class manually.

This is used by the @register decorator for backwards compatibility. Prefer using entry points in pyproject.toml for new plugins.

Parameters:

  • plugin_type (str): Type of plugin (e.g., "provider", "security", "validator")
  • name (str): Unique name for this plugin
  • plugin_cls (Type): The plugin class to register

Source code in strutex/plugins/registry.py
@classmethod
def register(cls, plugin_type: str, name: str, plugin_cls: Type) -> None:
    """
    Register a plugin class manually.

    This is used by the @register decorator for backwards compatibility.
    Prefer using entry points in pyproject.toml for new plugins.

    Args:
        plugin_type: Type of plugin (e.g., "provider", "security", "validator")
        name: Unique name for this plugin
        plugin_cls: The plugin class to register
    """
    if plugin_type not in cls._manual:
        cls._manual[plugin_type] = {}

    cls._manual[plugin_type][name.lower()] = plugin_cls

    # Also add to loaded cache
    if plugin_type not in cls._loaded:
        cls._loaded[plugin_type] = {}
    cls._loaded[plugin_type][name.lower()] = plugin_cls


Provider

Bases: ABC

Base class for LLM providers.

All providers must implement the process method to handle document extraction via their specific LLM API.

Subclassing auto-registers the plugin. Use class arguments to customize:

class MyProvider(Provider, name="custom", priority=90):
    ...

Attributes:

  • strutex_plugin_version (str): API version for compatibility checks
  • priority (int): Ordering priority (0-100, higher = preferred)
  • cost (float): Cost hint for optimization (lower = cheaper)
  • capabilities (List[str]): List of supported features

aprocess(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any async

Async version of process. Override for true async support. Default implementation calls sync version.

Source code in strutex/plugins/base.py
async def aprocess(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs
) -> Any:
    """
    Async version of process. Override for true async support.
    Default implementation calls sync version.
    """
    return self.process(file_path, prompt, schema, mime_type, **kwargs)
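
Because the default aprocess calls the synchronous process directly, a slow provider blocks the event loop while it runs. One possible override, sketched here with a hypothetical stand-in class rather than the real Provider base, hands the blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


class SlowProvider:
    """Hypothetical stand-in for a Provider subclass with a blocking process()."""

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        time.sleep(0.05)  # simulates a blocking network call
        return {"file": file_path}

    async def aprocess(self, file_path, prompt, schema, mime_type, **kwargs):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(
            self.process, file_path, prompt, schema, mime_type, **kwargs
        )


async def main():
    provider = SlowProvider()
    # The three calls overlap instead of running back to back.
    return await asyncio.gather(
        *(
            provider.aprocess(f"doc{i}.pdf", "extract", None, "application/pdf")
            for i in range(3)
        )
    )


results = asyncio.run(main())
```

A provider whose SDK offers a native async client could await that client directly instead of using a thread.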

has_capability(capability: str) -> bool

Check if this provider has a specific capability.

Source code in strutex/plugins/base.py
def has_capability(self, capability: str) -> bool:
    """Check if this provider has a specific capability."""
    return capability.lower() in [c.lower() for c in self.capabilities]

health_check() -> bool classmethod

Check if this provider is healthy and ready to use.

Override in subclasses for custom health checks (e.g., API connectivity).

Returns:

  • bool: True if healthy, False otherwise

Source code in strutex/plugins/base.py
@classmethod
def health_check(cls) -> bool:
    """
    Check if this provider is healthy and ready to use.

    Override in subclasses for custom health checks (e.g., API connectivity).

    Returns:
        True if healthy, False otherwise
    """
    return True
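
As an illustration of a custom health check, the sketch below reports a provider as unhealthy when its API key is not configured. KeyedProvider and the environment variable name EXAMPLE_LLM_API_KEY are hypothetical; a real provider might ping its API instead.

```python
import os


class KeyedProvider:
    """Hypothetical stand-in for a Provider subclass that needs an API key."""

    api_key_env = "EXAMPLE_LLM_API_KEY"  # hypothetical variable name

    @classmethod
    def health_check(cls) -> bool:
        # Healthy only when the key is present and non-empty.
        return bool(os.environ.get(cls.api_key_env))


# Demonstration: toggle the variable and observe the result.
os.environ.pop(KeyedProvider.api_key_env, None)
without_key = KeyedProvider.health_check()

os.environ[KeyedProvider.api_key_env] = "dummy-key"
with_key = KeyedProvider.health_check()
```

Since health_check is a classmethod, it can be called without constructing the provider.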

process(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any abstractmethod

Process a document and extract structured data.

Parameters:

  • file_path (str): Path to the document file
  • prompt (str): Extraction prompt/instructions
  • schema (Schema): Expected output schema
  • mime_type (str): MIME type of the file
  • **kwargs (default: {}): Provider-specific options

Returns:

  • Any: Extracted data matching the schema

Source code in strutex/plugins/base.py
@abstractmethod
def process(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs
) -> Any:
    """
    Process a document and extract structured data.

    Args:
        file_path: Path to the document file
        prompt: Extraction prompt/instructions
        schema: Expected output schema
        mime_type: MIME type of the file
        **kwargs: Provider-specific options

    Returns:
        Extracted data matching the schema
    """
    pass
