Quickstart

Extract structured data from your first document in 5 minutes.

Installation

pip install strutex

Your First Extraction

from strutex import DocumentProcessor
from strutex.schemas import INVOICE_US

# Create processor
processor = DocumentProcessor(provider="gemini")  # or "openai", "anthropic"

# Extract structured data
result = processor.process(
    file_path="invoice.pdf",
    prompt="Extract invoice details",
    schema=INVOICE_US
)

# Use the result
print(f"Vendor: {result['vendor_name']}")
print(f"Total: ${result['total']}")
print(f"Date: {result['date']}")

That's it! You've extracted structured JSON from a document.
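
An LLM may omit a field or return it in an unexpected form, so it can help to read the result defensively instead of indexing it directly. The sketch below assumes the result behaves like a plain dict (the example above indexes it like one) and reuses the field names shown there. The helper name summarize_invoice is hypothetical and not part of strutex.

```python
from typing import Any, Mapping


def summarize_invoice(result: Mapping[str, Any]) -> str:
    """Build a one-line summary, tolerating fields the model did not return."""
    vendor = result.get("vendor_name") or "unknown vendor"
    date = result.get("date") or "unknown date"
    total = result.get("total")

    # Format the total as currency only when it is numeric.
    if isinstance(total, (int, float)):
        total_text = f"${total:,.2f}"
    elif total is None:
        total_text = "unknown total"
    else:
        total_text = str(total)

    return f"{vendor}: {total_text} on {date}"


# A hand-written dict standing in for a real extraction result.
sample = {"vendor_name": "Acme Corp", "total": 1234.5, "date": "2024-01-15"}
print(summarize_invoice(sample))
```

In a real script you would pass the value returned by processor.process() instead of the sample dict.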

What Just Happened?

DocumentProcessor — The main engine that handles extraction
INVOICE_US — A built-in schema defining what fields to extract
process() — Sends the document to an LLM and validates the result
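
To make the "validates the result" step concrete, here is a conceptual sketch of checking extracted data against a list of expected fields. It is not strutex's implementation, and the real schema format may differ. EXAMPLE_SCHEMA and find_schema_problems are hypothetical names used only for illustration.

```python
from typing import Any

# Hypothetical, simplified stand-in for a schema: field name -> accepted types.
EXAMPLE_SCHEMA: dict[str, tuple[type, ...]] = {
    "vendor_name": (str,),
    "total": (int, float),
    "date": (str,),
}


def find_schema_problems(
    data: dict[str, Any], schema: dict[str, tuple[type, ...]]
) -> list[str]:
    """Return human-readable problems; an empty list means the data passed."""
    problems = []
    for field, expected_types in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_types):
            actual = type(data[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems


# A response with a string total and no date fails two checks.
print(find_schema_problems({"vendor_name": "Acme", "total": "10"}, EXAMPLE_SCHEMA))
```

Collecting every problem, instead of stopping at the first one, gives a fuller picture of what the model got wrong in a single pass.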

Next Steps

Want to...	Go to...
Define your own schema	Your First Schema
Try different providers	Switching Providers
Add data validation	Adding Validation

Troubleshooting

"API key not found"

export GOOGLE_API_KEY="your-key"  # For Gemini
export OPENAI_API_KEY="your-key"  # For OpenAI
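
If the error persists, the variable may not be visible to the Python process, for example because it was exported in a different shell session. A small check like the following can confirm this before you create the processor. The variable names come from the export lines above. The provider-to-variable mapping and the missing_api_key helper are illustrative assumptions, not part of strutex.

```python
import os

# Variable names taken from the export commands above; the mapping is illustrative.
PROVIDER_ENV_VARS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_api_key(provider, environ=os.environ):
    """Return the name of the unset variable for this provider, or None if it is set."""
    var_name = PROVIDER_ENV_VARS[provider]
    # Treat an empty string the same as an unset variable.
    return None if environ.get(var_name) else var_name


missing = missing_api_key("gemini")
if missing:
    print(f"Set {missing} in this shell before creating the processor")
```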

"File not found"

# Use absolute path
result = processor.process("/full/path/to/invoice.pdf", ...)
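
Relative paths are resolved against the current working directory, which is a common cause of this error when a script is launched from another folder. As a sketch, the standard library's pathlib can build the absolute path and fail early with a clearer message. The helper name resolve_document is hypothetical.

```python
from pathlib import Path


def resolve_document(path_text: str) -> Path:
    """Return an absolute path to an existing file, or raise FileNotFoundError."""
    path = Path(path_text).expanduser().resolve()
    if not path.is_file():
        # Include the working directory so a wrong launch folder is easy to spot.
        raise FileNotFoundError(f"No file at {path} (cwd: {Path.cwd()})")
    return path
```

You could then pass str(resolve_document("invoice.pdf")) as the file_path argument shown in the first example.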