Plugin System
Everything in strutex is pluggable. Use defaults or register your own implementations.
New in v0.3.0
Plugin System v2 introduces auto-registration via inheritance, lazy loading, entry points, priority-based ordering, and CLI tooling.
Architecture: Plugins vs Hooks
Strutex has two extension mechanisms that serve different purposes:
┌─────────────────────────────────────────────────────────────────┐
│                   DocumentProcessor.process()                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─── HOOKS (Observers) ───┐                                    │
│  │ • pre_process           │ ◄── Logging, timing, prompt mods   │
│  └─────────────────────────┘                                    │
│               │                                                 │
│               ▼                                                 │
│  ┌─── PLUGINS (Components) ─┐                                   │
│  │ • SecurityPlugin         │ ◄── Validates input               │
│  │ • Extractor              │ ◄── PDF → text                    │
│  │ • Provider               │ ◄── LLM call                      │
│  │ • Validator              │ ◄── Validates output              │
│  │ • Postprocessor          │ ◄── Transforms result             │
│  └──────────────────────────┘                                   │
│               │                                                 │
│               ▼                                                 │
│  ┌─── HOOKS (Observers) ───┐                                    │
│  │ • post_process          │ ◄── Add metadata, notifications    │
│  │ • on_error              │ ◄── Fallbacks, alerting            │
│  └─────────────────────────┘                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
When to Use Which?
| Feature | Plugins (Base Classes) | Hooks System |
|---|---|---|
| Pattern | Strategy Pattern | Observer/Middleware Pattern |
| Role | Drivers — define how a step is performed | Observers — react to pipeline events |
| Cardinality | 1:1 — one Provider, one Extractor per run | 1:N — many hooks can run simultaneously |
| Complexity | Higher — implement interface methods | Lower — just a function or decorator |
| Goal | Interchangeability — replace the engine | Cross-cutting concerns — add without touching engine |
Use a Plugin when:
- Changing the fundamental logic (e.g., "use OCR instead of text extraction")
- Replacing a core component (different LLM provider)
Use a Hook when:
- Observing events (logging, timing, metrics)
- Modifying data generically (add metadata to all results)
- Handling errors (fallbacks, alerting)
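The distinction above can be sketched in a few lines of plain Python. This illustrates the two patterns only and is not strutex's implementation: the `Pipeline` class, its `add_hook` method, and `UpperCaseProvider` are hypothetical names invented for this sketch.

```python
from typing import Any, Callable, Dict, List


class UpperCaseProvider:
    """Hypothetical stand-in for a plugin: the single component that does the work."""

    def process(self, text: str) -> Dict[str, Any]:
        return {"text": text.upper()}


class Pipeline:
    """Hypothetical pipeline: exactly one provider (1:1), any number of hooks (1:N)."""

    def __init__(self, provider: Any) -> None:
        self.provider = provider  # Strategy: swap this to change how the step is done
        self.hooks: Dict[str, List[Callable[..., None]]] = {
            "pre_process": [],
            "post_process": [],
            "on_error": [],
        }

    def add_hook(self, event: str, func: Callable[..., None]) -> None:
        self.hooks[event].append(func)

    def run(self, text: str) -> Dict[str, Any]:
        for hook in self.hooks["pre_process"]:
            hook(text)
        try:
            result = self.provider.process(text)
        except Exception as exc:
            # Observers are told about the failure; the error still propagates.
            for hook in self.hooks["on_error"]:
                hook(exc)
            raise
        for hook in self.hooks["post_process"]:
            hook(result)
        return result


events: List[str] = []
pipeline = Pipeline(UpperCaseProvider())
pipeline.add_hook("pre_process", lambda text: events.append(f"pre:{text}"))
pipeline.add_hook("post_process", lambda result: events.append("post"))
pipeline.add_hook("post_process", lambda result: result.update(source="sketch"))
output = pipeline.run("invoice")
```

Replacing `UpperCaseProvider` changes how the step is performed, while adding or removing hooks leaves the provider untouched.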
Plugin Types
| Type | Purpose | Built-in Examples |
|---|---|---|
| provider | LLM backends | Gemini, OpenAI |
| security | Input/output protection | InputSanitizer, PromptInjectionDetector |
| extractor | Document parsing | PDF, Image, Excel |
| validator | Output validation | Schema, business rules |
| postprocessor | Data transformation | DateNormalizer |
The PluginType enum provides type-safe access:
from strutex.plugins import PluginType

PluginType.PROVIDER       # "provider"
PluginType.EXTRACTOR      # "extractor"
PluginType.VALIDATOR      # "validator"
PluginType.POSTPROCESSOR  # "postprocessor"
PluginType.SECURITY       # "security"
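The inline comments above suggest that each member carries a plain string value. One common way to get that behaviour is a `str`-mixin `Enum`. Whether strutex defines `PluginType` exactly this way is an assumption, so the class below is a local stand-in rather than the library's definition.

```python
from enum import Enum


class PluginType(str, Enum):
    """Local stand-in mirroring the values listed above (assumed to be str-based)."""

    PROVIDER = "provider"
    EXTRACTOR = "extractor"
    VALIDATOR = "validator"
    POSTPROCESSOR = "postprocessor"
    SECURITY = "security"


# Members compare equal to their string values, so they can be used
# wherever a plain plugin-type string is expected.
same_as_string = PluginType.PROVIDER == "provider"

# Look up a member from a string, e.g. one read from configuration.
from_config = PluginType("validator")

# Typos fail loudly instead of silently naming a non-existent plugin type.
try:
    PluginType("providr")
    typo_rejected = False
except ValueError:
    typo_rejected = True
```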
Quick Start
Auto-Registration via Inheritance
Simply inherit from a base class and your plugin is automatically registered:
from strutex.plugins import Provider

class MyProvider(Provider):
    """Auto-registered as 'myprovider'"""
    capabilities = ["vision"]

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        return {"result": "data"}
That's it! No decorators or manual registration needed.
Customizing Registration
Use class keyword arguments to customize the registered name; priority, cost, and capabilities are set as class attributes:
class FastProvider(Provider, name="fast"):
    """Registered as 'fast' with high priority"""
    priority = 90  # Priority is a class attribute
    cost = 0.5
    capabilities = ["vision", "batch"]

    def process(self, *args, **kwargs):
        ...
Opting Out of Auto-Registration
For intermediate base classes, pass register=False:
class BasePdfProvider(Provider, register=False):
    """NOT registered - abstract base class"""

    def common_pdf_logic(self):
        ...

class AdobeProvider(BasePdfProvider):
    """Registered as 'adobeprovider'"""

    def process(self, *args, **kwargs):
        ...
Tip
Classes with unimplemented @abstractmethods are automatically skipped.
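Auto-registration of this kind is commonly built on `__init_subclass__`, which also receives class keyword arguments such as `name=` and `register=`. The sketch below shows one possible mechanism and is not strutex's source: `REGISTRY` and `_has_abstract_methods` are hypothetical names, and the base class is a plain class (not an `ABC`) to keep the example minimal.

```python
from abc import abstractmethod
from typing import Dict, Optional

REGISTRY: Dict[str, type] = {}  # hypothetical module-level registry


def _has_abstract_methods(cls: type) -> bool:
    """True if any attribute is still marked abstract (i.e. not overridden)."""
    return any(
        getattr(getattr(cls, attr, None), "__isabstractmethod__", False)
        for attr in dir(cls)
    )


class Provider:
    """Stand-in base class; subclasses register themselves when they are defined."""

    def __init_subclass__(
        cls, name: Optional[str] = None, register: bool = True, **kwargs
    ) -> None:
        super().__init_subclass__(**kwargs)
        if register and not _has_abstract_methods(cls):
            REGISTRY[name or cls.__name__.lower()] = cls

    @abstractmethod
    def process(self, *args, **kwargs):
        ...


class MyProvider(Provider):  # default name: lowercased class name
    def process(self, *args, **kwargs):
        return {"result": "data"}


class FastProvider(Provider, name="fast"):  # explicit name
    def process(self, *args, **kwargs):
        return {"result": "fast"}


class BasePdfProvider(Provider, register=False):  # opted out
    def common_pdf_logic(self):
        return "shared"


class AdobeProvider(BasePdfProvider):  # register defaults back to True
    def process(self, *args, **kwargs):
        return {"result": "pdf"}


class Incomplete(Provider):  # process() not implemented, so it is skipped
    pass
```

After these definitions, `REGISTRY` holds `myprovider`, `fast`, and `adobeprovider`; the opted-out base class and the incomplete subclass are absent.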
Plugin Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
| strutex_plugin_version | str | "1.0" | API version for compatibility |
| priority | int | 50 | Order in waterfall (0-100, higher = preferred) |
| cost | float | 1.0 | Cost hint (lower = cheaper) |
| capabilities | list | [] | Features this plugin supports |
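As a rough illustration of how these attributes might drive selection, the sketch below filters candidate classes by a required capability and then orders them by priority (higher first) and cost (lower first). The tie-breaking rule, the `PluginBase` stand-in, and the `pick` helper are assumptions made for this example; the table above does not say how strutex combines priority and cost.

```python
from typing import List, Optional, Type


class PluginBase:
    """Stand-in carrying the documented default attribute values."""

    strutex_plugin_version = "1.0"
    priority = 50
    cost = 1.0
    capabilities: List[str] = []


class CloudVision(PluginBase):
    priority = 90
    cost = 2.0
    capabilities = ["vision", "batch"]


class LocalVision(PluginBase):
    priority = 90
    cost = 0.5
    capabilities = ["vision", "local"]


class TextOnly(PluginBase):
    # Inherits priority=50 and cost=1.0 from the defaults.
    capabilities = ["batch"]


def pick(
    candidates: List[Type[PluginBase]], capability: str
) -> Optional[Type[PluginBase]]:
    """Return the preferred class that declares `capability`, or None."""
    matching = [c for c in candidates if capability in c.capabilities]
    # Assumed ordering: highest priority first, cheapest first among equals.
    matching.sort(key=lambda c: (-c.priority, c.cost))
    return matching[0] if matching else None


candidates = [CloudVision, LocalVision, TextOnly]
best_vision = pick(candidates, "vision")
best_batch = pick(candidates, "batch")
best_ocr = pick(candidates, "ocr")
```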
Registration Methods
1. Auto-Registration (Recommended)
Just inherit from a base class:
class MyProvider(Provider):
    def process(self, *args, **kwargs): ...
# → Registered as "myprovider"

class MyProvider(Provider, name="custom"):
    def process(self, *args, **kwargs): ...
# → Registered as "custom"
2. Entry Points (For Packages)
For distributable packages, register in pyproject.toml:
[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

[project.entry-points."strutex.validators"]
my_validator = "my_package:MyValidator"
Plugins are lazy-loaded: a plugin's module is imported only when the plugin is first used.
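Lazy loading of entry points can be sketched with the standard library alone. The `LazyRegistry` class below is a hypothetical illustration, not strutex's `PluginRegistry`: it stores `importlib.metadata.EntryPoint` objects and calls `load()` only on first access. A standard-library class stands in for `my_package:MyProvider` so the example runs anywhere.

```python
from importlib.metadata import EntryPoint
from typing import Dict, Optional, Union


class LazyRegistry:
    """Hypothetical registry: keeps EntryPoint objects until a plugin is requested."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Union[EntryPoint, type]] = {}

    def add_entry_point(self, ep: EntryPoint) -> None:
        self._plugins[ep.name] = ep  # nothing is imported here

    def is_loaded(self, name: str) -> bool:
        return not isinstance(self._plugins[name], EntryPoint)

    def get(self, name: str) -> Optional[type]:
        item = self._plugins.get(name)
        if isinstance(item, EntryPoint):
            item = item.load()  # imports the module and resolves the attribute
            self._plugins[name] = item  # cache the loaded class
        return item


registry = LazyRegistry()
# "json:JSONDecoder" plays the role of "my_package:MyProvider".
registry.add_entry_point(
    EntryPoint(name="my_provider", value="json:JSONDecoder", group="strutex.providers")
)

loaded_before = registry.is_loaded("my_provider")
plugin_cls = registry.get("my_provider")
loaded_after = registry.is_loaded("my_provider")
```

In a real package the `EntryPoint` objects would come from installed distribution metadata rather than being constructed by hand.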
3. Manual Registration
from strutex.plugins import PluginRegistry

# Arguments: plugin type, registered name, plugin class
PluginRegistry.register("provider", "my_provider", MyProvider)
CLI Commands
# List all plugins
strutex plugins list

# Filter by type
strutex plugins list --type provider

# JSON output
strutex plugins list --json

# Plugin details
strutex plugins info gemini --type provider

# Refresh discovery cache
strutex plugins refresh
Creating Custom Plugins
Custom Provider
from strutex.plugins import Provider

class OllamaProvider(Provider):
    priority = 60
    capabilities = ["local", "vision"]

    def __init__(self, model="llama3"):
        self.model = model

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        # Your implementation
        ...
Custom Validator
from strutex.plugins import Validator, ValidationResult

class SumValidator(Validator):
    """Verify line items sum to total."""
    priority = 70

    def validate(self, data, schema=None):
        items_sum = sum(i.get("amount", 0) for i in data.get("items", []))
        total = data.get("total", 0)
        if abs(items_sum - total) > 0.01:
            return ValidationResult(
                valid=False,
                data=data,
                issues=[f"Sum mismatch: {items_sum} != {total}"]
            )
        return ValidationResult(valid=True, data=data)
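To try the check without strutex installed, the same logic can be exercised against a stand-in result type. The `ValidationResult` dataclass below only mirrors the three fields used above (`valid`, `data`, `issues`) and is an assumption about the shape of the real class; `check_sum` is a hypothetical free-function version of `SumValidator.validate`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationResult:
    """Stand-in with the fields used in the example above (assumed shape)."""

    valid: bool
    data: Dict[str, Any]
    issues: List[str] = field(default_factory=list)


def check_sum(data: Dict[str, Any], tolerance: float = 0.01) -> ValidationResult:
    """Compare the sum of item amounts with the stated total, within a tolerance."""
    items_sum = sum(item.get("amount", 0) for item in data.get("items", []))
    total = data.get("total", 0)
    if abs(items_sum - total) > tolerance:
        return ValidationResult(False, data, [f"Sum mismatch: {items_sum} != {total}"])
    return ValidationResult(True, data)


ok = check_sum({"items": [{"amount": 10.0}, {"amount": 5.5}], "total": 15.5})
bad = check_sum({"items": [{"amount": 10.0}, {"amount": 5.5}], "total": 20.0})
```

The tolerance absorbs small rounding differences in extracted amounts; a stricter or looser value may suit other currencies or document types.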
Custom Postprocessor
import re

from strutex.plugins import Postprocessor

class DateNormalizer(Postprocessor):
    """Convert DD.MM.YYYY to YYYY-MM-DD."""

    def process(self, data):
        result = data.copy()
        if "date" in result:
            match = re.match(r'(\d{2})\.(\d{2})\.(\d{4})', result["date"])
            if match:
                d, m, y = match.groups()
                result["date"] = f"{y}-{m}-{d}"
        return result
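One caveat with the pattern above: `re.match` only anchors at the start of the string, so a value such as `31.12.2024 (approx.)` is still rewritten, and an impossible date like `31.02.2024` is converted unchecked. A stricter variant, sketched here as a hypothetical standalone function `normalize_date`, parses with `datetime.strptime` and leaves anything unparseable untouched.

```python
from datetime import datetime
from typing import Any, Dict


def normalize_date(data: Dict[str, Any], key: str = "date") -> Dict[str, Any]:
    """Return a shallow copy with data[key] converted from DD.MM.YYYY to YYYY-MM-DD."""
    result = data.copy()
    value = result.get(key)
    if isinstance(value, str):
        try:
            parsed = datetime.strptime(value, "%d.%m.%Y")
        except ValueError:
            return result  # not a parseable DD.MM.YYYY date: leave unchanged
        result[key] = parsed.strftime("%Y-%m-%d")
    return result


converted = normalize_date({"date": "31.12.2024", "total": 10})
impossible = normalize_date({"date": "31.02.2024"})
trailing = normalize_date({"date": "31.12.2024 (approx.)"})
no_date = normalize_date({"total": 5})
```

Wrapping this function in a `Postprocessor` subclass would follow the same shape as `DateNormalizer` above.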
API Reference
PluginRegistry
Central registry for all plugin types with lazy loading.
Plugins are stored as EntryPoint objects and only loaded when first accessed via get(). This improves startup time and avoids importing unused dependencies.
Usage:
# Get a plugin (loads on first access)
cls = PluginRegistry.get("provider", "gemini")

# List all plugins of a type (this loads them; use list_names() to list without loading)
all_providers = PluginRegistry.list("provider")

# Force discovery from entry points
count = PluginRegistry.discover()
clear(plugin_type: Optional[str] = None) -> None
classmethod
Clear registered plugins.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | Optional[str] | If provided, only clear this type. Otherwise clear all. |
Source code in strutex/plugins/registry.py
discover(group_prefix: str = 'strutex', force: bool = False) -> int
classmethod
Discover and register plugins from entry points.
Scans for entry points matching the pattern:
- strutex.providers
- strutex.validators
- strutex.postprocessors
- strutex.security
- etc.
Entry points are stored for lazy loading; they are not imported until first use via get().
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| group_prefix | str | Entry point group prefix (default: "strutex") |
| force | bool | Force re-discovery even if already discovered |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of entry points discovered |
Example pyproject.toml:
[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"
Source code in strutex/plugins/registry.py
get(plugin_type: str, name: str) -> Optional[Type]
classmethod
Get a registered plugin class by type and name.
If the plugin is registered via entry point and not yet loaded, it will be loaded on first access (lazy loading).
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin |
| name | str | Name of the plugin |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[Type] | The plugin class, or None if not found |
Source code in strutex/plugins/registry.py
get_plugin_info(plugin_type: str, name: str) -> Optional[Dict[str, Any]]
classmethod
Get metadata about a plugin without necessarily loading it.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin |
| name | str | Name of the plugin |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[Dict[str, Any]] | Dict with plugin info, or None if not found |
Source code in strutex/plugins/registry.py
get_sorted(plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]]
classmethod
Get all plugins of a type sorted by priority.
Useful for waterfall selection where you want to try higher-priority plugins first.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin |
| reverse | bool | If True (default), higher priority first |

| RETURNS | DESCRIPTION |
|---|---|
| List[Tuple[str, Type]] | List of (name, class) tuples sorted by priority |
Source code in strutex/plugins/registry.py
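A waterfall over a priority-sorted list could look like the sketch below. The `(name, class)` tuples are built by hand so the example runs without strutex; the `sort_by_priority` and `run_waterfall` helpers, the stand-in provider classes, and the choice to fall through on any exception are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple, Type


class FlakyProvider:
    priority = 90

    def process(self, text: str) -> dict:
        raise RuntimeError("service unavailable")


class SteadyProvider:
    priority = 60

    def process(self, text: str) -> dict:
        return {"text": text, "by": "steady"}


class LastResortProvider:
    priority = 10

    def process(self, text: str) -> dict:
        return {"text": text, "by": "last_resort"}


def sort_by_priority(
    plugins: Dict[str, Type], reverse: bool = True
) -> List[Tuple[str, Type]]:
    """Mimic the documented return shape: (name, class) tuples ordered by priority."""
    return sorted(plugins.items(), key=lambda kv: kv[1].priority, reverse=reverse)


def run_waterfall(candidates: List[Tuple[str, Type]], text: str) -> Tuple[str, Any]:
    """Try each candidate in order and return the first successful result."""
    errors = []
    for name, cls in candidates:
        try:
            return name, cls().process(text)
        except Exception as exc:  # fall through to the next candidate
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


plugins = {"last": LastResortProvider, "flaky": FlakyProvider, "steady": SteadyProvider}
ordered = sort_by_priority(plugins)
winner, result = run_waterfall(ordered, "invoice")
```

Here the highest-priority provider fails, so the second one in the sorted list produces the result.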
list(plugin_type: str) -> Dict[str, Type]
classmethod
List all plugins of a given type.
Note: This loads all plugins of the type. Use list_names() for a lightweight listing without loading.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Type] | Dictionary mapping names to plugin classes |
Source code in strutex/plugins/registry.py
list_names(plugin_type: str) -> List[str]
classmethod
List names of all plugins of a given type without loading them.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin |

| RETURNS | DESCRIPTION |
|---|---|
| List[str] | List of plugin names |
Source code in strutex/plugins/registry.py
list_types() -> List[str]
classmethod
List all registered plugin types.
Source code in strutex/plugins/registry.py
register(plugin_type: str, name: str, plugin_cls: Type) -> None
classmethod
Register a plugin class manually.
This is used by the @register decorator for backwards compatibility. Prefer using entry points in pyproject.toml for new plugins.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| plugin_type | str | Type of plugin (e.g., "provider", "security", "validator") |
| name | str | Unique name for this plugin |
| plugin_cls | Type | The plugin class to register |
Source code in strutex/plugins/registry.py
Provider
Bases: ABC
Base class for LLM providers.
All providers must implement the process method to handle document extraction via their specific LLM API.
Subclassing auto-registers the plugin. Use class arguments to customize:
class MyProvider(Provider, name="custom", priority=90):
    ...
| ATTRIBUTE | TYPE | DESCRIPTION |
|---|---|---|
| strutex_plugin_version | str | API version for compatibility checks |
| priority | int | Ordering priority (0-100, higher = preferred) |
| cost | float | Cost hint for optimization (lower = cheaper) |
| capabilities | list | List of supported features |
aprocess(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any
async
Async version of process. Override for true async support. The default implementation calls the sync version.
Source code in strutex/plugins/base.py
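The default described above (an async entry point that simply calls the sync method) and one way to override it can be sketched as follows. `BaseProvider`, `BlockingProvider`, and `ThreadedProvider` are hypothetical stand-ins with a shortened argument list, and off-loading to a worker thread with `asyncio.to_thread` (Python 3.9+) is only one possible override strategy.

```python
import asyncio
from typing import Any


class BaseProvider:
    def process(self, file_path: str, prompt: str, **kwargs) -> Any:
        raise NotImplementedError

    async def aprocess(self, file_path: str, prompt: str, **kwargs) -> Any:
        # Default: delegate to the sync method (this blocks the event loop).
        return self.process(file_path, prompt, **kwargs)


class BlockingProvider(BaseProvider):
    def process(self, file_path: str, prompt: str, **kwargs) -> Any:
        return {"file": file_path, "mode": "sync"}


class ThreadedProvider(BlockingProvider):
    async def aprocess(self, file_path: str, prompt: str, **kwargs) -> Any:
        # Override: run the blocking call in a worker thread instead.
        result = await asyncio.to_thread(self.process, file_path, prompt, **kwargs)
        return {**result, "mode": "thread"}


async def main() -> list:
    # gather() returns results in the order the awaitables were passed.
    return await asyncio.gather(
        BlockingProvider().aprocess("a.pdf", "extract"),
        ThreadedProvider().aprocess("b.pdf", "extract"),
    )


results = asyncio.run(main())
```

A provider whose SDK offers a native async client would typically await that client directly instead of using a thread.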
has_capability(capability: str) -> bool
health_check() -> bool
classmethod
Check if this provider is healthy and ready to use.
Override in subclasses for custom health checks (e.g., API connectivity).
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if healthy, False otherwise |
Source code in strutex/plugins/base.py
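As an illustration, a provider that needs an API key might report itself unhealthy when the key is missing, and a caller can then skip it. The environment variable name `EXAMPLE_API_KEY`, the stand-in base class with its always-healthy default, and the `first_healthy` helper are all hypothetical.

```python
import os
from typing import List, Optional, Type


class BaseProvider:
    @classmethod
    def health_check(cls) -> bool:
        return True  # assumed default for this sketch: healthy unless overridden


class KeyedProvider(BaseProvider):
    @classmethod
    def health_check(cls) -> bool:
        # Healthy only if the (hypothetical) API key is configured.
        return bool(os.environ.get("EXAMPLE_API_KEY"))


class LocalProvider(BaseProvider):
    pass


def first_healthy(
    candidates: List[Type[BaseProvider]],
) -> Optional[Type[BaseProvider]]:
    """Return the first candidate whose health check passes, or None."""
    for cls in candidates:
        if cls.health_check():
            return cls
    return None


os.environ.pop("EXAMPLE_API_KEY", None)
without_key = first_healthy([KeyedProvider, LocalProvider])

os.environ["EXAMPLE_API_KEY"] = "dummy"
with_key = first_healthy([KeyedProvider, LocalProvider])

os.environ.pop("EXAMPLE_API_KEY", None)  # clean up after the demonstration
```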
process(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs) -> Any
abstractmethod
Process a document and extract structured data.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| file_path | str | Path to the document file |
| prompt | str | Extraction prompt/instructions |
| schema | Schema | Expected output schema |
| mime_type | str | MIME type of the file |
| **kwargs | | Provider-specific options |

| RETURNS | DESCRIPTION |
|---|---|
| Any | Extracted data matching the schema |
Source code in strutex/plugins/base.py