
Plugin System

Everything in strutex is pluggable. Use defaults or register your own implementations.

New in v0.3.0

Plugin System v2 introduces auto-registration via inheritance, lazy loading, entry points, priority-based ordering, and CLI tooling.


Architecture: Plugins vs Hooks

Strutex has two extension mechanisms that serve different purposes:

┌─────────────────────────────────────────────────────────────────┐
│                     DocumentProcessor.process()                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • pre_process           │ ◄── Logging, timing, prompt mods   │
│  └─────────────────────────┘                                     │
│              │                                                   │
│              ▼                                                   │
│  ┌─── PLUGINS (Components) ─┐                                    │
│  │ • SecurityPlugin         │ ◄── Validates input               │
│  │ • Extractor              │ ◄── PDF → text                    │
│  │ • Provider               │ ◄── LLM call                      │
│  │ • Validator              │ ◄── Validates output              │
│  │ • Postprocessor          │ ◄── Transforms result             │
│  └──────────────────────────┘                                    │
│              │                                                   │
│              ▼                                                   │
│  ┌─── HOOKS (Observers) ───┐                                     │
│  │ • post_process          │ ◄── Add metadata, notifications    │
│  │ • on_error              │ ◄── Fallbacks, alerting            │
│  └─────────────────────────┘                                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

When to Use Which?

| Feature | Plugins (Base Classes) | Hooks System |
| --- | --- | --- |
| Pattern | Strategy Pattern | Observer/Middleware Pattern |
| Role | Drivers: define how a step is performed | Observers: react to pipeline events |
| Cardinality | 1:1 (one Provider, one Extractor per run) | 1:N (many hooks can run on the same event) |
| Complexity | Higher: implement interface methods | Lower: just a function or decorator |
| Goal | Interchangeability: replace the engine | Cross-cutting concerns: add without touching the engine |

Use a Plugin when:

  • Changing the fundamental logic (e.g., "use OCR instead of text extraction")
  • Replacing a core component (different LLM provider)

Use a Hook when:

  • Observing events (logging, timing, metrics)
  • Modifying data generically (add metadata to all results)
  • Handling errors (fallbacks, alerting)
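
The distinction can be sketched without strutex at all. The snippet below is a toy model, not the library's API: `Extractor`, `run`, and both extractor classes are hypothetical names used only to contrast one swappable component (the plugin) with a list of observers (the hooks).

```python
from typing import Callable, Dict, List, Protocol


class Extractor(Protocol):
    """Hypothetical plugin interface: exactly one implementation runs per call."""

    def extract(self, raw: str) -> str: ...


class PlainTextExtractor:
    def extract(self, raw: str) -> str:
        return raw.strip()


class UppercaseExtractor:
    """Stand-in for a different engine, e.g. OCR instead of text extraction."""

    def extract(self, raw: str) -> str:
        return raw.strip().upper()


Hook = Callable[[Dict[str, str]], None]


def run(raw: str, extractor: Extractor, hooks: List[Hook]) -> Dict[str, str]:
    result = {"text": extractor.extract(raw)}  # plugin: 1:1, defines how the step is done
    for hook in hooks:                         # hooks: 1:N, every registered hook runs
        hook(result)
    return result


seen: List[str] = []
hooks: List[Hook] = [
    lambda r: seen.append(r["text"]),           # observe (logging stand-in)
    lambda r: r.update(processed_by="sketch"),  # add metadata to every result
]

plain = run("  hello ", PlainTextExtractor(), hooks)
loud = run("  hello ", UppercaseExtractor(), hooks)
print(plain, loud, seen)
```

Swapping the extractor changes what the step produces; the hooks stay the same and run for both calls.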

Processing Pipeline Flow

When you call DocumentProcessor.process(), plugins and hooks are invoked in this order:

flowchart TD
    A["DocumentProcessor.process()"] --> B["1. PRE-PROCESS HOOKS"]
    B --> C["2. SecurityPlugin.validate_input()"]
    C --> D{"3. Cache Hit?"}
    D -->|Yes| K["Return Cached Result"]
    D -->|No| E["4. Provider.process()"]
    E -->|Success| G["5. SecurityPlugin.validate_output()"]
    E -->|Error| F["4a. ERROR HOOKS"]
    F -->|Fallback| G
    F -->|Re-raise| Z["Raise Exception"]
    G --> H["6. POST-PROCESS HOOKS"]
    H --> I["7. Pydantic Validation"]
    I --> J{"8. verify=True?"}
    J -->|Yes| L["Verification Loop"]
    L --> M["Return Result"]
    J -->|No| M

Step-by-Step Breakdown

| Step | Component | Purpose |
| --- | --- | --- |
| 1 | Pre-process Hooks | Modify prompt, add context, log start |
| 2 | SecurityPlugin.validate_input() | Sanitize input, detect injection attacks |
| 3 | Cache | Return cached result if available |
| 4 | Provider.process() | Send document + prompt to LLM |
| 4a | Error Hooks | Handle failures, return fallback |
| 5 | SecurityPlugin.validate_output() | Clean/validate extracted data |
| 6 | Post-process Hooks | Transform result, normalize data |
| 7 | Pydantic Validation | Validate against model (if provided) |
| 8 | Verification Loop | LLM self-checks output (if verify=True) |
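
The ordering of steps 1 to 6 can be traced with a small stand-alone model. This is only a sketch of the control flow described above, not strutex's implementation: `run_pipeline` and `trace` are hypothetical, the security and hook steps are reduced to trace entries, and "the first error hook that returns a non-None value wins" is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

trace: List[str] = []  # records the order in which steps run


def run_pipeline(
    doc: str,
    provider: Callable[[str], Dict[str, Any]],
    cache: Dict[str, Dict[str, Any]],
    error_hooks: List[Callable[[Exception], Optional[Dict[str, Any]]]],
) -> Dict[str, Any]:
    trace.append("pre_process")        # 1. pre-process hooks
    trace.append("validate_input")     # 2. security: input
    if doc in cache:                   # 3. a cache hit returns immediately
        trace.append("cache_hit")
        return cache[doc]
    try:
        trace.append("provider")       # 4. LLM call
        result = provider(doc)
    except Exception as exc:
        trace.append("error_hooks")    # 4a. ask error hooks for a fallback
        result = None
        for hook in error_hooks:
            result = hook(exc)
            if result is not None:     # assumption: first fallback wins
                break
        if result is None:
            raise                      # no fallback: re-raise the original error
    trace.append("validate_output")    # 5. security: output
    trace.append("post_process")       # 6. post-process hooks
    return result


def failing_provider(doc: str) -> Dict[str, Any]:
    raise RuntimeError("rate limit exceeded")


fallback = run_pipeline(
    "invoice.pdf", failing_provider, {}, [lambda exc: {"error": str(exc)}]
)
print(trace)
```

A fallback returned by an error hook re-enters the flow at step 5, which matches the `Fallback` edge in the flowchart. Steps 7 and 8 are left out of the sketch.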

Example: Full Pipeline with All Plugin Types

from strutex import DocumentProcessor
from strutex.providers import GeminiProvider
from strutex.security import SecurityChain, InputSanitizer
from strutex.plugins import Validator, Postprocessor, ValidationResult

# Custom Validator
class TotalValidator(Validator):
    def validate(self, data, schema=None):
        items = data.get("line_items", [])
        total = data.get("total", 0)
        items_sum = sum(i.get("amount", 0) for i in items)

        return ValidationResult(
            valid=abs(items_sum - total) < 0.01,
            data=data,
            issues=[] if abs(items_sum - total) < 0.01 else ["Sum mismatch"]
        )

# Custom Postprocessor
class DateNormalizer(Postprocessor):
    def process(self, data):
        if "date" in data:
            data["date"] = data["date"].replace("/", "-")
        return data

# Setup processor with plugins
processor = DocumentProcessor(
    provider=GeminiProvider(),
    security=SecurityChain([InputSanitizer()])
)

# Add hooks
@processor.on_pre_process
def log_start(file_path, prompt, schema, mime_type, context):
    print(f"Processing: {file_path}")

@processor.on_post_process
def add_metadata(result, context):
    result["_processed_by"] = "strutex"
    return result

@processor.on_error
def handle_rate_limit(error, file_path, context):
    if "rate limit" in str(error).lower():
        return {"error": "Rate limited", "retry": True}
    return None  # Re-raise other errors

# Process with full pipeline
# InvoiceSchema is assumed to be a Pydantic model defined elsewhere in your code
result = processor.process(
    "invoice.pdf",
    "Extract invoice data",
    model=InvoiceSchema,
    verify=True  # Enable verification loop
)

Plugin Types

| Type | Purpose | Built-in Examples |
| --- | --- | --- |
| provider | LLM backends | Gemini, OpenAI |
| security | Input/output protection | InputSanitizer, PromptInjectionDetector |
| extractor | Document parsing | PDF, Image, Excel |
| validator | Output validation | Schema, business rules |
| postprocessor | Data transformation | DateNormalizer |

The PluginType enum provides type-safe access:

from strutex.plugins import PluginType

PluginType.PROVIDER      # "provider"
PluginType.EXTRACTOR     # "extractor"
PluginType.VALIDATOR     # "validator"
PluginType.POSTPROCESSOR # "postprocessor"
PluginType.SECURITY      # "security"

Quick Start

Auto-Registration via Inheritance

Simply inherit from a base class and your plugin is automatically registered:

from strutex.plugins import Provider

class MyProvider(Provider):
    """Auto-registered as 'myprovider'"""
    capabilities = ["vision"]

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        return {"result": "data"}

That's it! No decorators or manual registration needed.

Customizing Registration

Use class arguments to customize the name:

class FastProvider(Provider, name="fast"):
    """Registered as 'fast' with high priority"""
    priority = 90  # Priority is a class attribute
    cost = 0.5
    capabilities = ["vision", "batch"]

    def process(self, *args, **kwargs):
        ...

Opting Out of Auto-Registration

For intermediate base classes:

class BasePdfProvider(Provider, register=False):
    """NOT registered - abstract base class"""
    def common_pdf_logic(self):
        ...

class AdobeProvider(BasePdfProvider):
    """Registered as 'adobeprovider'"""
    def process(self, *args, **kwargs):
        ...

Tip

Classes with unimplemented @abstractmethods are automatically skipped.


Plugin Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| strutex_plugin_version | str | "1.0" | API version for compatibility |
| priority | int | 50 | Order in waterfall (0-100, higher = preferred) |
| cost | float | 1.0 | Cost hint (lower = cheaper) |
| capabilities | list | [] | Features this plugin supports |

Registration Methods

1. Inheritance

Just inherit from a base class:

class MyProvider(Provider):
    def process(self, *args, **kwargs): ...
# → Registered as "myprovider"

Pass name= to choose the registered name:

class MyProvider(Provider, name="custom"):
    def process(self, *args, **kwargs): ...
# → Registered as "custom"

2. Entry Points (For Packages)

For distributable packages, register in pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

[project.entry-points."strutex.validators"]
my_validator = "my_package:MyValidator"

Entry-point plugins are lazy-loaded: a plugin's module is imported only when the plugin is first used.
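
The idea behind lazy loading can be shown with a small stand-alone sketch. `LazyRef` is a hypothetical helper, not part of strutex; it resolves the same `"module:attr"` string format used in entry point declarations, and standard-library classes stand in for plugin classes so the snippet runs anywhere.

```python
import importlib
from typing import Any, Dict, Optional


class LazyRef:
    """Holds a 'module:attr' reference and imports it on first use."""

    def __init__(self, target: str) -> None:
        self.target = target
        self._loaded: Optional[Any] = None

    @property
    def is_loaded(self) -> bool:
        return self._loaded is not None

    def load(self) -> Any:
        if self._loaded is None:
            module_name, _, attr = self.target.partition(":")
            module = importlib.import_module(module_name)  # the import happens here
            self._loaded = getattr(module, attr)
        return self._loaded


# Standard-library targets stand in for "my_package:MyProvider".
plugins: Dict[str, LazyRef] = {
    "decoder": LazyRef("json:JSONDecoder"),
    "fraction": LazyRef("fractions:Fraction"),
}

decoder_cls = plugins["decoder"].load()
print(decoder_cls.__name__, plugins["fraction"].is_loaded)
```

Only the reference that was asked for gets resolved; the other one stays an unresolved string until something calls `load()` on it.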

3. Manual Registration

from strutex.plugins import PluginRegistry

PluginRegistry.register("provider", "my_provider", MyProvider)

CLI Commands

# List all plugins
strutex plugins list

# Filter by type
strutex plugins list --type provider

# JSON output
strutex plugins list --json

# Plugin details
strutex plugins info gemini --type provider

# Refresh discovery cache
strutex plugins refresh

Creating Custom Plugins

Custom Provider

from strutex.plugins import Provider

class OllamaProvider(Provider):
    priority = 60
    capabilities = ["local", "vision"]

    def __init__(self, model="llama3"):
        self.model = model

    def process(self, file_path, prompt, schema, mime_type, **kwargs):
        # Your implementation
        ...

Custom Validator

from strutex.plugins import Validator, ValidationResult

class SumValidator(Validator):
    """Verify line items sum to total."""
    priority = 70

    def validate(self, data, schema=None):
        items_sum = sum(i.get("amount", 0) for i in data.get("items", []))
        total = data.get("total", 0)

        if abs(items_sum - total) > 0.01:
            return ValidationResult(
                valid=False,
                data=data,
                issues=[f"Sum mismatch: {items_sum} != {total}"]
            )
        return ValidationResult(valid=True, data=data)

Custom Postprocessor

from strutex.plugins import Postprocessor
import re

class DateNormalizer(Postprocessor):
    """Convert DD.MM.YYYY to YYYY-MM-DD."""

    def process(self, data):
        result = data.copy()
        if "date" in result:
            match = re.match(r'(\d{2})\.(\d{2})\.(\d{4})', result["date"])
            if match:
                d, m, y = match.groups()
                result["date"] = f"{y}-{m}-{d}"
        return result

API Reference

PluginRegistry

Central registry for all plugin types with lazy loading.

Plugins are stored as EntryPoint objects and only loaded when first accessed via get(). This improves startup time and avoids importing unused dependencies.

Usage

# Get a plugin (loads on first access)
cls = PluginRegistry.get("provider", "gemini")

# List all plugins of a type (this loads each of them; use list_names() to list without loading)
all_providers = PluginRegistry.list("provider")

# Run discovery from entry points (pass force=True to re-scan after a previous discovery)
count = PluginRegistry.discover()

clear(plugin_type: Optional[str] = None) -> None classmethod

Clear registered plugins.

PARAMETER DESCRIPTION
plugin_type

If provided, only clear this type. Otherwise clear all.

TYPE: Optional[str] DEFAULT: None

Source code in strutex/plugins/registry.py
@classmethod
def clear(cls, plugin_type: Optional[str] = None) -> None:
    """
    Clear registered plugins.

    Args:
        plugin_type: If provided, only clear this type. Otherwise clear all.
    """
    if plugin_type:
        cls._entry_points.pop(plugin_type, None)
        cls._loaded.pop(plugin_type, None)
        cls._manual.pop(plugin_type, None)
    else:
        cls._entry_points.clear()
        cls._loaded.clear()
        cls._manual.clear()
        cls._discovered = False

discover(group_prefix: str = 'strutex', force: bool = False) -> int classmethod

Discover and register plugins from entry points.

Scans for entry points matching the pattern:

  • strutex.providers
  • strutex.validators
  • strutex.postprocessors
  • strutex.security
  • etc.

Entry points are stored for lazy loading - they are not imported until first use via get().

PARAMETER DESCRIPTION
group_prefix

Entry point group prefix (default: "strutex")

TYPE: str DEFAULT: 'strutex'

force

Force re-discovery even if already discovered

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
int

Number of entry points discovered

Example pyproject.toml:

[project.entry-points."strutex.providers"]
my_provider = "my_package:MyProvider"

Source code in strutex/plugins/registry.py
@classmethod
def discover(cls, group_prefix: str = "strutex", force: bool = False) -> int:
    """
    Discover and register plugins from entry points.

    Scans for entry points matching the pattern:
    - strutex.providers
    - strutex.validators
    - strutex.postprocessors
    - strutex.security
    - etc.

    Entry points are stored for lazy loading - they are not imported
    until first use via get().

    Args:
        group_prefix: Entry point group prefix (default: "strutex")
        force: Force re-discovery even if already discovered

    Returns:
        Number of entry points discovered

    Example pyproject.toml:
        [project.entry-points."strutex.providers"]
        my_provider = "my_package:MyProvider"
    """
    if cls._discovered and not force:
        return sum(len(eps) for eps in cls._entry_points.values())

    discovered = 0

    # Get entry_points function
    if sys.version_info >= (3, 10):
        from importlib.metadata import entry_points
    else:
        try:
            from importlib_metadata import entry_points
        except ImportError:
            cls._discovered = True
            return 0

    # Get all entry points
    try:
        all_eps = entry_points()

        # Strategy: collect all matching EntryPoint objects first
        matching_eps: List["EntryPoint"] = []

        # Check for dict-like interface (Python < 3.10 stdlib or SelectableGroups in 3.10/3.11)
        if hasattr(all_eps, 'items'):
            for group, eps in all_eps.items():
                if group.startswith(f"{group_prefix}."):
                    # eps can be a single EntryPoint or list, depending on impl
                    # Standard is list
                    if isinstance(eps, list):
                        matching_eps.extend(eps)
                    else:
                        # Some implementations might return single object? Unlikely but safe.
                        try:
                            matching_eps.extend(eps)
                        except TypeError:
                            matching_eps.append(eps)
        else:
            # Sequence-like interface (Python 3.12+ EntryPoints, or importlib_metadata)
            for ep in all_eps:
                if ep.group.startswith(f"{group_prefix}."):
                    matching_eps.append(ep)

        # Now register them
        for ep in matching_eps:
            # Extract plugin type from group name (e.g. "strutex.providers" -> "provider")
            plugin_type = ep.group.replace(f"{group_prefix}.", "").rstrip("s")

            if plugin_type not in cls._entry_points:
                cls._entry_points[plugin_type] = {}

            cls._entry_points[plugin_type][ep.name.lower()] = ep
            discovered += 1

    except Exception:
        pass

    cls._discovered = True
    return discovered
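
The group-to-type mapping in the source above is a single expression, and it is worth seeing what it does to a few group names. The helper below copies that expression; `plugin_type_from_group` is a name introduced here for illustration, and `strutex.analysis` is a hypothetical group, not one strutex defines.

```python
def plugin_type_from_group(group: str, group_prefix: str = "strutex") -> str:
    """Same expression discover() uses: drop the prefix, then strip trailing 's' characters."""
    return group.replace(f"{group_prefix}.", "").rstrip("s")


groups = [
    "strutex.providers",
    "strutex.validators",
    "strutex.postprocessors",
    "strutex.security",
    "strutex.analysis",  # hypothetical group, used only to show the edge case
]
mapping = {group: plugin_type_from_group(group) for group in groups}
for group, plugin_type in mapping.items():
    print(f"{group} -> {plugin_type}")
```

Because `rstrip("s")` removes every trailing "s", a group whose singular name already ends in "s" loses that letter as well, so the hypothetical `strutex.analysis` would be stored under the type `analysi`.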

get(plugin_type: str, name: str) -> Optional[Type] classmethod

Get a registered plugin class by type and name.

If the plugin is registered via entry point and not yet loaded, it will be loaded on first access (lazy loading).

PARAMETER DESCRIPTION
plugin_type

Type of plugin

TYPE: str

name

Name of the plugin

TYPE: str

RETURNS DESCRIPTION
Optional[Type]

The plugin class, or None if not found

Source code in strutex/plugins/registry.py
@classmethod
def get(cls, plugin_type: str, name: str) -> Optional[Type]:
    """
    Get a registered plugin class by type and name.

    If the plugin is registered via entry point and not yet loaded,
    it will be loaded on first access (lazy loading).

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        The plugin class, or None if not found
    """
    name_lower = name.lower()

    # Ensure discovery has run
    if not cls._discovered:
        cls.discover()

    # Check loaded cache first
    if name_lower in cls._loaded.get(plugin_type, {}):
        return cls._loaded[plugin_type][name_lower]

    # Check manual registrations
    if name_lower in cls._manual.get(plugin_type, {}):
        return cls._manual[plugin_type][name_lower]

    # Try to lazy load from entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        plugin_cls = cls._load_entry_point(ep, plugin_type, name_lower)
        if plugin_cls is not None:
            return plugin_cls

    return None

get_plugin_info(plugin_type: str, name: str) -> Optional[Dict[str, Any]] classmethod

Get metadata about a plugin without necessarily loading it.

PARAMETER DESCRIPTION
plugin_type

Type of plugin

TYPE: str

name

Name of the plugin

TYPE: str

RETURNS DESCRIPTION
Optional[Dict[str, Any]]

Dict with plugin info, or None if not found

Source code in strutex/plugins/registry.py
@classmethod
def get_plugin_info(cls, plugin_type: str, name: str) -> Optional[Dict[str, Any]]:
    """
    Get metadata about a plugin without necessarily loading it.

    Args:
        plugin_type: Type of plugin
        name: Name of the plugin

    Returns:
        Dict with plugin info, or None if not found
    """
    name_lower = name.lower()

    if not cls._discovered:
        cls.discover()

    # Check if loaded
    if name_lower in cls._loaded.get(plugin_type, {}):
        plugin_cls = cls._loaded[plugin_type][name_lower]
        return {
            "name": name_lower,
            "version": getattr(plugin_cls, "strutex_plugin_version", "unknown"),
            "priority": getattr(plugin_cls, "priority", 50),
            "cost": getattr(plugin_cls, "cost", 1.0),
            "capabilities": getattr(plugin_cls, "capabilities", []),
            "loaded": True,
            "healthy": cls._check_health(plugin_cls),
        }

    # Check entry point
    ep = cls._entry_points.get(plugin_type, {}).get(name_lower)
    if ep is not None:
        return {
            "name": name_lower,
            "entry_point": f"{ep.group}:{ep.name}",
            "loaded": False,
            "healthy": None,  # Unknown until loaded
        }

    return None

get_sorted(plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]] classmethod

Get all plugins of a type sorted by priority.

Useful for waterfall selection where you want to try higher-priority plugins first.

PARAMETER DESCRIPTION
plugin_type

Type of plugin

TYPE: str

reverse

If True (default), higher priority first

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
List[Tuple[str, Type]]

List of (name, class) tuples sorted by priority

Source code in strutex/plugins/registry.py
@classmethod
def get_sorted(cls, plugin_type: str, reverse: bool = True) -> List[Tuple[str, Type]]:
    """
    Get all plugins of a type sorted by priority.

    Useful for waterfall selection where you want to try
    higher-priority plugins first.

    Args:
        plugin_type: Type of plugin
        reverse: If True (default), higher priority first

    Returns:
        List of (name, class) tuples sorted by priority
    """
    plugins = cls.list(plugin_type)
    return sorted(
        plugins.items(),
        key=lambda x: getattr(x[1], 'priority', 50),
        reverse=reverse
    )
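
A waterfall built on this ordering might look like the sketch below. It reuses the same sort key as `get_sorted()` (missing `priority` falls back to 50), but everything else is assumed for illustration: the three provider classes, `sort_by_priority`, and `waterfall` are hypothetical, and a plain dict stands in for the registry.

```python
from typing import Any, Dict, List, Tuple, Type


class CloudProvider:
    priority = 90

    def process(self, doc: str) -> Dict[str, Any]:
        raise ConnectionError("service unavailable")


class LocalProvider:
    priority = 60

    def process(self, doc: str) -> Dict[str, Any]:
        return {"source": "local", "doc": doc}


class PlainProvider:  # no priority attribute: sorted as if it were 50
    def process(self, doc: str) -> Dict[str, Any]:
        return {"source": "plain", "doc": doc}


def sort_by_priority(plugins: Dict[str, Type], reverse: bool = True) -> List[Tuple[str, Type]]:
    # Same key as get_sorted(): priority attribute, defaulting to 50.
    return sorted(plugins.items(), key=lambda x: getattr(x[1], "priority", 50), reverse=reverse)


def waterfall(plugins: Dict[str, Type], doc: str) -> Tuple[str, Dict[str, Any]]:
    """Try plugins from highest to lowest priority; return the first success."""
    errors: List[str] = []
    for name, plugin_cls in sort_by_priority(plugins):
        try:
            return name, plugin_cls().process(doc)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


plugins = {"plain": PlainProvider, "cloud": CloudProvider, "local": LocalProvider}
order = [name for name, _ in sort_by_priority(plugins)]
winner, result = waterfall(plugins, "invoice.pdf")
print(order, winner)
```

The highest-priority provider is tried first; when it fails, the next one in the sorted list handles the document.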

list(plugin_type: str) -> Dict[str, Type] classmethod

List all plugins of a given type.

Note: This loads all plugins of the type. Use list_names() for a lightweight listing without loading.

PARAMETER DESCRIPTION
plugin_type

Type of plugin

TYPE: str

RETURNS DESCRIPTION
Dict[str, Type]

Dictionary mapping names to plugin classes

Source code in strutex/plugins/registry.py
@classmethod
def list(cls, plugin_type: str) -> Dict[str, Type]:
    """
    List all plugins of a given type.

    Note: This loads all plugins of the type. Use list_names()
    for a lightweight listing without loading.

    Args:
        plugin_type: Type of plugin

    Returns:
        Dictionary mapping names to plugin classes
    """
    if not cls._discovered:
        cls.discover()

    result = {}

    # Get all names from entry points and manual registrations
    all_names: Set[str] = set()
    all_names.update(cls._entry_points.get(plugin_type, {}).keys())
    all_names.update(cls._manual.get(plugin_type, {}).keys())
    all_names.update(cls._loaded.get(plugin_type, {}).keys())

    # Load each plugin
    for name in all_names:
        plugin_cls = cls.get(plugin_type, name)
        if plugin_cls is not None:
            result[name] = plugin_cls

    return result

list_names(plugin_type: str) -> List[str] classmethod

List names of all plugins of a given type without loading them.

PARAMETER DESCRIPTION
plugin_type

Type of plugin

TYPE: str

RETURNS DESCRIPTION
List[str]

List of plugin names

Source code in strutex/plugins/registry.py
@classmethod
def list_names(cls, plugin_type: str) -> List[str]:
    """
    List names of all plugins of a given type without loading them.

    Args:
        plugin_type: Type of plugin

    Returns:
        List of plugin names
    """
    if not cls._discovered:
        cls.discover()

    names: Set[str] = set()
    names.update(cls._entry_points.get(plugin_type, {}).keys())
    names.update(cls._manual.get(plugin_type, {}).keys())
    names.update(cls._loaded.get(plugin_type, {}).keys())

    return sorted(names)

list_types() -> List[str] classmethod

List all registered plugin types.

Source code in strutex/plugins/registry.py
@classmethod
def list_types(cls) -> List[str]:
    """List all registered plugin types."""
    if not cls._discovered:
        cls.discover()

    types: Set[str] = set()
    types.update(cls._entry_points.keys())
    types.update(cls._manual.keys())
    types.update(cls._loaded.keys())

    return sorted(types)

register(plugin_type: str, name: str, plugin_cls: Type) -> None classmethod

Register a plugin class manually.

This is used by the @register decorator for backwards compatibility. Prefer using entry points in pyproject.toml for new plugins.

PARAMETER DESCRIPTION
plugin_type

Type of plugin (e.g., "provider", "security", "validator")

TYPE: str

name

Unique name for this plugin

TYPE: str

plugin_cls

The plugin class to register

TYPE: Type

Source code in strutex/plugins/registry.py
@classmethod
def register(cls, plugin_type: str, name: str, plugin_cls: Type) -> None:
    """
    Register a plugin class manually.

    This is used by the @register decorator for backwards compatibility.
    Prefer using entry points in pyproject.toml for new plugins.

    Args:
        plugin_type: Type of plugin (e.g., "provider", "security", "validator")
        name: Unique name for this plugin
        plugin_cls: The plugin class to register
    """
    if plugin_type not in cls._manual:
        cls._manual[plugin_type] = {}

    cls._manual[plugin_type][name.lower()] = plugin_cls

    # Also add to loaded cache
    if plugin_type not in cls._loaded:
        cls._loaded[plugin_type] = {}
    cls._loaded[plugin_type][name.lower()] = plugin_cls


Provider

Bases: ABC

Base class for LLM providers.

All providers must implement the process method to handle document extraction via their specific LLM API.

Subclassing auto-registers the plugin. Use class arguments to customize:

class MyProvider(Provider, name="custom", priority=90):
    ...
ATTRIBUTE DESCRIPTION
strutex_plugin_version

API version for compatibility checks

TYPE: str

priority

Ordering priority (0-100, higher = preferred)

TYPE: int

cost

Cost hint for optimization (lower = cheaper)

TYPE: float

capabilities

List of supported features

TYPE: List[str]

aprocess(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs: Any) -> Any async

Async version of process.

Runs the sync process() method in a thread pool to avoid blocking the event loop. Override this method for true native async support using async SDKs (e.g., AsyncOpenAI, AsyncAnthropic).

PARAMETER DESCRIPTION
file_path

Path to the document file

TYPE: str

prompt

Extraction prompt/instructions

TYPE: str

schema

Expected output schema

TYPE: Schema

mime_type

MIME type of the file

TYPE: str

**kwargs

Provider-specific options

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Any

Extracted data matching the schema

Source code in strutex/plugins/base.py
async def aprocess(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs: Any
) -> Any:
    """
    Async version of process.

    Runs the sync process() method in a thread pool to avoid blocking
    the event loop. Override this method for true native async support
    using async SDKs (e.g., AsyncOpenAI, AsyncAnthropic).

    Args:
        file_path: Path to the document file
        prompt: Extraction prompt/instructions
        schema: Expected output schema
        mime_type: MIME type of the file
        **kwargs: Provider-specific options

    Returns:
        Extracted data matching the schema
    """
    import asyncio
    return await asyncio.to_thread(
        self.process, file_path, prompt, schema, mime_type, **kwargs
    )
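
The effect of the default `aprocess()` can be seen with a toy provider. `SlowProvider` is a hypothetical class with a simplified one-argument signature; the only thing taken from the source above is the `asyncio.to_thread` delegation, which requires Python 3.9 or later.

```python
import asyncio
import time
from typing import Any, Dict, List


class SlowProvider:
    """Toy provider with a blocking process() method."""

    def process(self, file_path: str) -> Dict[str, Any]:
        time.sleep(0.05)  # stands in for a blocking SDK or HTTP call
        return {"file": file_path}

    async def aprocess(self, file_path: str) -> Dict[str, Any]:
        # Same delegation as the default implementation: run the sync method in a worker thread.
        return await asyncio.to_thread(self.process, file_path)


async def main() -> List[Dict[str, Any]]:
    provider = SlowProvider()
    paths = ["a.pdf", "b.pdf", "c.pdf"]
    # gather() preserves input order and lets the blocking calls overlap in threads.
    return await asyncio.gather(*(provider.aprocess(p) for p in paths))


results = asyncio.run(main())
print(results)
```

The event loop stays responsive while the three blocking calls run in worker threads. A provider backed by an async SDK would override `aprocess()` and await the SDK directly instead.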

has_capability(capability: str) -> bool

Check if this provider has a specific capability.

Source code in strutex/plugins/base.py
def has_capability(self, capability: str) -> bool:
    """Check if this provider has a specific capability."""
    return capability.lower() in [c.lower() for c in self.capabilities]

health_check() -> bool classmethod

Check if this provider is healthy and ready to use.

Override in subclasses for custom health checks (e.g., API connectivity).

RETURNS DESCRIPTION
bool

True if healthy, False otherwise

Source code in strutex/plugins/base.py
@classmethod
def health_check(cls) -> bool:
    """
    Check if this provider is healthy and ready to use.

    Override in subclasses for custom health checks (e.g., API connectivity).

    Returns:
        True if healthy, False otherwise
    """
    return True
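
An override could, for example, report a provider as unhealthy when its credentials are missing. The sketch below uses a toy `Provider` stand-in so it runs without strutex; `KeyedProvider`, the `EXAMPLE_PROVIDER_API_KEY` variable, and the `healthy` helper are all hypothetical.

```python
import os
from typing import List, Type


class Provider:
    """Toy stand-in with the same default as the base class: always healthy."""

    @classmethod
    def health_check(cls) -> bool:
        return True


class KeyedProvider(Provider):
    ENV_VAR = "EXAMPLE_PROVIDER_API_KEY"  # hypothetical variable name

    @classmethod
    def health_check(cls) -> bool:
        # Healthy only if the (hypothetical) API key is configured and non-empty.
        return bool(os.environ.get(cls.ENV_VAR))


def healthy(providers: List[Type[Provider]]) -> List[str]:
    """Return the names of the provider classes that report themselves healthy."""
    return [p.__name__ for p in providers if p.health_check()]


os.environ.pop(KeyedProvider.ENV_VAR, None)
before = healthy([Provider, KeyedProvider])
os.environ[KeyedProvider.ENV_VAR] = "dummy-key"
after = healthy([Provider, KeyedProvider])
print(before, after)
```

Since `health_check()` is a classmethod, it can be called on the class itself, with no provider instance needed.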

process(file_path: str, prompt: str, schema: Schema, mime_type: str, **kwargs: Any) -> Any abstractmethod

Process a document and extract structured data.

PARAMETER DESCRIPTION
file_path

Path to the document file

TYPE: str

prompt

Extraction prompt/instructions

TYPE: str

schema

Expected output schema

TYPE: Schema

mime_type

MIME type of the file

TYPE: str

**kwargs

Provider-specific options

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Any

Extracted data matching the schema

Source code in strutex/plugins/base.py
@abstractmethod
def process(
    self,
    file_path: str,
    prompt: str,
    schema: Schema,
    mime_type: str,
    **kwargs: Any
) -> Any:
    """
    Process a document and extract structured data.

    Args:
        file_path: Path to the document file
        prompt: Extraction prompt/instructions
        schema: Expected output schema
        mime_type: MIME type of the file
        **kwargs: Provider-specific options

    Returns:
        Extracted data matching the schema
    """
    pass
