Pydantic Support
Use Pydantic models for type-safe document extraction with automatic validation.
Quick Start
from pydantic import BaseModel, Field
from strutex import DocumentProcessor

class Invoice(BaseModel):
    invoice_number: str = Field(description="Unique ID")
    total: float = Field(description="Total amount")

processor = DocumentProcessor(provider="gemini")
result = processor.process(
    "invoice.pdf",
    "Extract invoice data",
    model=Invoice,  # Use model= instead of schema=
)

# result is a validated Invoice instance!
print(result.invoice_number)
print(result.total)
Nested Models
from typing import List, Optional
from pydantic import BaseModel, Field

# One row of the invoice's item table
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Vendor(BaseModel):
    name: str
    address: Optional[str] = None  # optional: may be absent from the document

class Invoice(BaseModel):
    invoice_number: str
    date: str = Field(description="YYYY-MM-DD format")
    vendor: Vendor          # nested model
    items: List[LineItem]   # list of nested models
    subtotal: float
    tax: Optional[float] = None
    total: float
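To illustrate how nested models behave, the sketch below validates a hand-written dictionary against the same model shapes using Pydantic alone. The `raw` payload is hypothetical and stands in for extracted data; no strutex call is made, and the exact shape strutex passes to Pydantic internally is an assumption.

```python
from typing import List, Optional
from pydantic import BaseModel

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Vendor(BaseModel):
    name: str
    address: Optional[str] = None

class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: Vendor
    items: List[LineItem]
    subtotal: float
    tax: Optional[float] = None
    total: float

# Hypothetical extraction output, written by hand for illustration.
raw = {
    "invoice_number": "INV-001",
    "date": "2024-01-15",
    "vendor": {"name": "Acme Corp"},
    "items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0, "total": 20.0},
    ],
    "subtotal": 20.0,
    "total": 20.0,
}

# Nested dicts are converted into Vendor and LineItem instances.
invoice = Invoice(**raw)
print(type(invoice.vendor).__name__)
print(invoice.items[0].quantity)
print(invoice.tax)  # omitted optional fields fall back to their default
```

Optional fields such as `address` and `tax` default to `None` when the payload omits them, which is usually what you want for values that may not appear in every document.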
Field Descriptions
Use Field(description=...) to guide the LLM:
class Invoice(BaseModel):
    invoice_number: str = Field(
        description="The unique invoice identifier, e.g. INV-2024-001"
    )
    date: str = Field(
        description="Invoice date in YYYY-MM-DD format"
    )
    total: float = Field(
        description="Final payable amount including tax"
    )
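One way to check what a description looks like once it leaves the model is to inspect the JSON schema that Pydantic itself generates. The sketch below uses only Pydantic; whether strutex forwards descriptions to the LLM through this exact schema is an assumption. The `getattr` fallback is there so the snippet runs on both Pydantic v2 (`model_json_schema`) and v1 (`schema`).

```python
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    invoice_number: str = Field(
        description="The unique invoice identifier, e.g. INV-2024-001"
    )
    date: str = Field(description="Invoice date in YYYY-MM-DD format")
    total: float = Field(description="Final payable amount including tax")

# Pydantic v2 exposes model_json_schema(); v1 exposes schema().
schema_fn = getattr(Invoice, "model_json_schema", None) or Invoice.schema
json_schema = schema_fn()

# Each field's description is carried alongside its type information.
for name, spec in json_schema["properties"].items():
    print(f"{name}: {spec['description']}")
```

Concrete descriptions (an example value, an expected format) give the model more to work with than a bare field name.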
Manual Conversion
You can also convert Pydantic models to strutex schemas manually:
from strutex import pydantic_to_schema, validate_with_pydantic
# Convert model to schema
schema = pydantic_to_schema(Invoice)
# Later, validate dict data
data = {"invoice_number": "INV-001", "total": 100.0}
invoice = validate_with_pydantic(data, Invoice)
Validation Errors
Pydantic validation runs after LLM extraction:
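As a minimal sketch of what a failed validation looks like, the example below validates a deliberately invalid dictionary with Pydantic alone. The `bad` payload is hypothetical, and how strutex surfaces the error from `processor.process` (re-raised as is, or wrapped) is an assumption not covered here.

```python
from pydantic import BaseModel, Field, ValidationError

class Invoice(BaseModel):
    invoice_number: str = Field(description="Unique ID")
    total: float = Field(description="Total amount")

# Hypothetical bad payload: invoice_number is missing, total is not numeric.
bad = {"total": "not a number"}

try:
    Invoice(**bad)
    failed_fields = []
except ValidationError as exc:
    # exc.errors() returns one entry per failing field; "loc" holds the field path.
    failed_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(failed_fields)
```

Pydantic reports all failing fields in a single exception, so one `except` block is enough to log or retry on every problem in the extracted data.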