Quickstart
Extract structured data from your first document in 5 minutes.
Installation
Your First Extraction
```python
from strutex import DocumentProcessor
from strutex.schemas import INVOICE_US

# Create processor
processor = DocumentProcessor(provider="gemini")  # or "openai", "anthropic"

# Extract structured data
result = processor.process(
    file_path="invoice.pdf",
    prompt="Extract invoice details",
    schema=INVOICE_US,
)

# Use the result
print(f"Vendor: {result['vendor_name']}")
print(f"Total: ${result['total']}")
print(f"Date: {result['date']}")
```
That's it! You've extracted structured JSON from a document.
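A common next step is to save or inspect the whole result instead of printing individual fields. The sketch below assumes the result behaves like a plain dictionary, as the key lookups above suggest. The `result` value here is a hand-written stand-in, not real output from `process()`, and `to_pretty_json` is a hypothetical helper, not part of strutex.

```python
import json

# Stand-in for the value returned by processor.process(); the real field
# names and types depend on the schema you pass in.
result = {"vendor_name": "Acme Corp", "total": 1250.0, "date": "2024-03-15"}


def to_pretty_json(data: dict) -> str:
    """Serialize a dict-like extraction result to indented JSON."""
    # default=str is a fallback for values json cannot encode natively,
    # such as date or Decimal objects.
    return json.dumps(data, indent=2, default=str)


print(to_pretty_json(result))
```

If the real result is not a plain dictionary, convert it to one before serializing.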
What Just Happened?
- DocumentProcessor — The main engine that handles extraction
- INVOICE_US — A built-in schema defining what fields to extract
- process() — Sends the document to an LLM and validates the result
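To make the "validates the result" step concrete, here is a minimal sketch of what checking an LLM's JSON reply against a schema can look like. It is an illustration only and does not show how strutex implements validation. `EXAMPLE_SCHEMA`, `validate_response`, and the sample reply are all hypothetical, and the schema format is not the structure of `INVOICE_US`.

```python
import json

# Hypothetical mini-schema: field name -> accepted Python type(s).
# This is NOT the structure of strutex's INVOICE_US.
EXAMPLE_SCHEMA = {"vendor_name": str, "total": (int, float), "date": str}


def validate_response(raw_json: str, schema: dict) -> dict:
    """Parse a model's JSON reply and check required fields and types."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


# Hand-written sample reply standing in for real model output.
reply = '{"vendor_name": "Acme Corp", "total": 1250.0, "date": "2024-03-15"}'
print(validate_response(reply, EXAMPLE_SCHEMA))
```

Failing with an error on a missing or mistyped field means downstream code can rely on every field in the schema being present.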
Next Steps
| Want to... | Go to... |
|---|---|
| Define your own schema | Your First Schema |
| Try different providers | Switching Providers |
| Add data validation | Adding Validation |
Troubleshooting
"API key not found"
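This error usually means the provider's credentials are not visible to the Python process. This page does not say which environment variable strutex reads, so the name `EXAMPLE_PROVIDER_API_KEY` below is a placeholder; check your provider's documentation for the real one. `require_env` is a hypothetical helper that fails with a clear message before any extraction is attempted.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder name and dummy value so the example runs on its own;
# substitute the variable your provider actually expects.
os.environ.setdefault("EXAMPLE_PROVIDER_API_KEY", "dummy-key-for-demo")
key = require_env("EXAMPLE_PROVIDER_API_KEY")
print(f"Key loaded ({len(key)} characters)")
```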
"File not found"
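A relative path such as `invoice.pdf` is resolved against the directory the script is run from, which is a frequent cause of this error. The sketch below uses only the standard library to resolve the path and report where Python looked. `resolve_document` is a hypothetical helper, not a strutex function.

```python
from pathlib import Path


def resolve_document(path_str: str) -> Path:
    """Return an absolute path to the document, or raise if it is missing."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"No file at {path} (current directory: {Path.cwd()})"
        )
    return path


try:
    print(resolve_document("invoice.pdf"))
except FileNotFoundError as err:
    # Shows the absolute path that was checked, which makes typos and
    # wrong working directories easy to spot.
    print(err)
```

Passing the resolved absolute path as `file_path` means the call no longer depends on the current working directory.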