
Validators

Validate LLM output for correctness and data quality.


Overview

Validators check extracted data against rules and can be composed into chains.

from strutex import SchemaValidator, SumValidator, ValidationChain

chain = ValidationChain([
    SchemaValidator(),
    SumValidator(tolerance=0.01),
])

result = chain.validate(data, schema)
if not result.valid:
    print(result.issues)

Built-in Validators

SchemaValidator

Ensures the output structure matches the expected schema.

from strutex import SchemaValidator, Object, String, Number, Array

schema = Object(properties={
    "invoice_number": String,
    "total": Number,
    "items": Array(items=Object(properties={
        "amount": Number
    }))
})

validator = SchemaValidator()
result = validator.validate(data, schema)

Checks:

  • Required fields are present
  • Field types match (string, number, boolean, array, object)
  • Nested objects validated recursively (issues reported with full path, e.g., items.0.amount)
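
The recursive check and its dotted paths can be pictured with a small stand-alone sketch. It does not use strutex's schema classes: the plain-dict schema format, the `check_value` helper, and the issue wording are all hypothetical, chosen only to show how a path such as items.0.amount is built up during recursion.

```python
from typing import Any, List

# Hypothetical plain-dict schema, used only for this illustration
# (strutex itself uses the Object/String/Number/Array classes shown above).
SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "items": {
            "type": "array",
            "items": {"type": "object", "properties": {"amount": {"type": "number"}}},
        },
    },
}


def check_value(value: Any, schema: dict, path: str, issues: List[str]) -> None:
    """Recursively compare a value with the schema, recording dotted paths."""
    kind = schema["type"]
    if kind == "object":
        if not isinstance(value, dict):
            issues.append(f"{path or '<root>'}: expected object")
            return
        for name, sub_schema in schema["properties"].items():
            child_path = f"{path}.{name}" if path else name
            if name not in value:
                issues.append(f"{child_path}: missing required field")
            else:
                check_value(value[name], sub_schema, child_path, issues)
    elif kind == "array":
        if not isinstance(value, list):
            issues.append(f"{path}: expected array")
            return
        for index, element in enumerate(value):
            check_value(element, schema["items"], f"{path}.{index}", issues)
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            issues.append(f"{path}: expected number")
    elif kind == "string":
        if not isinstance(value, str):
            issues.append(f"{path}: expected string")


issues: List[str] = []
check_value(
    {"invoice_number": "INV-1", "total": 30, "items": [{"amount": "ten"}]},
    SCHEMA,
    "",
    issues,
)
print(issues)  # ['items.0.amount: expected number']
```

The path is extended by one segment at each level of recursion, which is why a problem deep inside a list element can still be reported with its full location.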

SumValidator

Verifies that line items sum to the stated total.

from strutex import SumValidator

validator = SumValidator(
    items_field="line_items",
    amount_field="price",
    total_field="grand_total",
    tolerance=0.01
)

result = validator.validate({
    "line_items": [{"price": 10.00}, {"price": 20.00}],
    "grand_total": 30.00
})
# result.valid == True
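
The tolerance matters because sums of floats rarely compare exactly equal. The sketch below mirrors the comparison in the SumValidator source listed in the API reference (a difference greater than the tolerance fails); `sums_match` is a hypothetical helper, not part of strutex.

```python
def sums_match(amounts, total, tolerance=0.01):
    """Return True when the amounts sum to the total within the tolerance."""
    return abs(sum(float(a) for a in amounts) - float(total)) <= tolerance


# Exact float equality fails for this classic case...
print(0.1 + 0.2 == 0.3)                   # False
# ...but a tolerance-based comparison accepts it.
print(sums_match([0.1, 0.2], 0.3))        # True
# A genuine mismatch is still rejected.
print(sums_match([10.00, 20.00], 30.50))  # False
```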

DateValidator

Validates date formats and ranges.

from strutex import DateValidator

validator = DateValidator(
    date_fields=["invoice_date", "due_date"],
    min_year=2020,
    max_year=2030
)

result = validator.validate({
    "invoice_date": "2024-01-15",
    "due_date": "2024-02-15"
})

Accepted formats: ISO (YYYY-MM-DD), European (DD.MM.YYYY), and US (MM/DD/YYYY).
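
These three families correspond to strptime patterns roughly like the ones below. The actual contents of DateValidator.DEFAULT_FORMATS are not shown on this page, so the `PATTERNS` list and the `parse_date` helper are assumptions made for illustration.

```python
from datetime import datetime

# Assumed patterns for ISO, European, and US dates; the library's
# actual DEFAULT_FORMATS list may differ.
PATTERNS = ["%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"]


def parse_date(value):
    """Return the first successful parse, or None if no pattern matches."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(value, pattern)
        except ValueError:
            continue
    return None


print(parse_date("2024-01-15"))  # 2024-01-15 00:00:00
print(parse_date("15.01.2024"))  # 2024-01-15 00:00:00
print(parse_date("01/15/2024"))  # 2024-01-15 00:00:00
print(parse_date("15/01/2024"))  # None (15 is not a valid month for MM/DD/YYYY)
```

Patterns are tried in order and the first match wins, so when custom formats share a separator (for example DD/MM/YYYY next to MM/DD/YYYY) their order decides how ambiguous values are read.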


Validation Chains

Compose multiple validators:

from strutex import ValidationChain, SchemaValidator, SumValidator, DateValidator

chain = ValidationChain([
    SchemaValidator(strict=True),
    SumValidator(tolerance=0.01),
    DateValidator(),
], strict=True)  # Stop on first failure

result = chain.validate(data, schema)

print(result.valid)   # True/False
print(result.issues)  # List of error messages
print(result.data)    # Possibly modified data

Modes:

  • strict=True — Stop on first failure
  • strict=False — Collect all issues
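
The difference between the two modes is easiest to see when more than one validator fails. The sketch below is stand-alone: `Result`, `AlwaysFails`, and `run_chain` are hypothetical stand-ins that follow the loop in the ValidationChain source listed in the API reference, not the library classes themselves.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Result:
    """Stand-in for ValidationResult (valid, data, issues)."""
    valid: bool
    data: Dict[str, Any]
    issues: List[str] = field(default_factory=list)


class AlwaysFails:
    """Hypothetical validator that reports one fixed issue."""

    def __init__(self, message: str):
        self.message = message

    def validate(self, data, schema=None, source_text=None):
        return Result(valid=False, data=data, issues=[self.message])


def run_chain(validators, data, strict):
    """Mirror of the chain loop: stop early when strict, otherwise collect."""
    all_issues: List[str] = []
    for validator in validators:
        result = validator.validate(data)
        if not result.valid:
            all_issues.extend(result.issues)
            if strict:
                return Result(False, data, all_issues)
        data = result.data
    return Result(len(all_issues) == 0, data, all_issues)


validators = [AlwaysFails("schema problem"), AlwaysFails("sum problem")]
print(run_chain(validators, {}, strict=True).issues)   # ['schema problem']
print(run_chain(validators, {}, strict=False).issues)  # ['schema problem', 'sum problem']
```

Strict mode is a reasonable default when later validators only make sense on structurally valid data; lenient mode is more useful when the goal is a complete report.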

Creating Custom Validators

from strutex.plugins import Validator, ValidationResult

class EmailValidator(Validator, name="email"):
    priority = 50

    def validate(self, data, schema=None, source_text=None):
        issues = []
        email = data.get("email", "")

        if email and "@" not in email:
            issues.append(f"Invalid email: {email}")

        return ValidationResult(
            valid=len(issues) == 0,
            data=data,
            issues=issues
        )
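
A quick way to try a custom validator is to call it directly. The sketch below runs without strutex installed: `ValidationResult` and `Validator` are minimal stand-ins (the `plugin_name` attribute is hypothetical), and `validate` accepts `source_text` because the ValidationChain source in the API reference passes it as a keyword argument to every validator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:
    """Stand-in with the same three fields used above."""
    valid: bool
    data: Dict[str, Any]
    issues: List[str] = field(default_factory=list)


class Validator:
    """Stand-in base class; it only accepts the name= class keyword."""

    def __init_subclass__(cls, name: Optional[str] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.plugin_name = name  # hypothetical attribute, for illustration only


class EmailValidator(Validator, name="email"):
    priority = 50

    def validate(self, data, schema=None, source_text=None):
        issues = []
        email = data.get("email", "")
        if email and "@" not in email:
            issues.append(f"Invalid email: {email}")
        return ValidationResult(valid=len(issues) == 0, data=data, issues=issues)


validator = EmailValidator()
print(validator.validate({"email": "billing@example.com"}).valid)  # True
print(validator.validate({"email": "not-an-email"}).issues)        # ['Invalid email: not-an-email']
print(validator.validate({}).valid)                                # True (a missing email is not checked)
```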

API Reference

SchemaValidator(strict: bool = False)

Bases: Validator

Validates that extracted data matches the expected schema structure.

Checks:

  • Required fields are present
  • Field types match (string, number, boolean, array, object)
  • Nested objects are validated recursively

ATTRIBUTE DESCRIPTION
strict

If True, fail on extra fields not in schema

Initialize the schema validator.

PARAMETER DESCRIPTION
strict

If True, reject data with fields not in schema

TYPE: bool DEFAULT: False

Source code in strutex/validators/schema.py
def __init__(self, strict: bool = False):
    """
    Initialize the schema validator.

    Args:
        strict: If True, reject data with fields not in schema
    """
    self.strict = strict

validate(data: Dict[str, Any], schema: Optional[Schema] = None, source_text: Optional[str] = None) -> ValidationResult

Validate data against a schema.

PARAMETER DESCRIPTION
data

The extracted data to validate

TYPE: Dict[str, Any]

schema

The expected schema structure

TYPE: Optional[Schema] DEFAULT: None

RETURNS DESCRIPTION
ValidationResult

ValidationResult with validation status and any issues

Source code in strutex/validators/schema.py
def validate(
    self,
    data: Dict[str, Any],
    schema: Optional[Schema] = None,
    source_text: Optional[str] = None
) -> ValidationResult:
    """
    Validate data against a schema.

    Args:
        data: The extracted data to validate
        schema: The expected schema structure

    Returns:
        ValidationResult with validation status and any issues
    """
    if schema is None:
        return ValidationResult(valid=True, data=data)

    issues: List[str] = []
    self._validate_value(data, schema, "", issues)

    return ValidationResult(
        valid=len(issues) == 0,
        data=data,
        issues=issues
    )


SumValidator(items_field: str = 'items', amount_field: str = 'amount', total_field: str = 'total', tolerance: float = 0.01, strict: bool = False)

Bases: Validator

Validates that line item amounts sum to the stated total.

Common use case: Invoice validation where item totals should match the invoice total.

ATTRIBUTE DESCRIPTION
items_field

Field name containing the list of items

amount_field

Field name in each item containing the amount

total_field

Field name containing the expected total

tolerance

Acceptable difference (for floating point comparison)

strict

If True, fail when required fields are missing

Initialize the sum validator.

PARAMETER DESCRIPTION
items_field

Name of the field containing line items

TYPE: str DEFAULT: 'items'

amount_field

Name of the amount field in each item

TYPE: str DEFAULT: 'amount'

total_field

Name of the total field

TYPE: str DEFAULT: 'total'

tolerance

Maximum acceptable difference

TYPE: float DEFAULT: 0.01

strict

If True, fail validation when items or total are missing

TYPE: bool DEFAULT: False

Source code in strutex/validators/sum.py
def __init__(
    self,
    items_field: str = "items",
    amount_field: str = "amount",
    total_field: str = "total",
    tolerance: float = 0.01,
    strict: bool = False
):
    """
    Initialize the sum validator.

    Args:
        items_field: Name of the field containing line items
        amount_field: Name of the amount field in each item
        total_field: Name of the total field
        tolerance: Maximum acceptable difference
        strict: If True, fail validation when items or total are missing
    """
    self.items_field = items_field
    self.amount_field = amount_field
    self.total_field = total_field
    self.tolerance = tolerance
    self.strict = strict

validate(data: Dict[str, Any], schema: Any = None, source_text: Optional[str] = None) -> ValidationResult

Validate that line items sum to the total.

PARAMETER DESCRIPTION
data

The extracted data to validate

TYPE: Dict[str, Any]

schema

Not used by this validator

TYPE: Any DEFAULT: None

RETURNS DESCRIPTION
ValidationResult

ValidationResult indicating if sums match

Source code in strutex/validators/sum.py
def validate(
    self,
    data: Dict[str, Any],
    schema: Any = None,
    source_text: Optional[str] = None
) -> ValidationResult:
    """
    Validate that line items sum to the total.

    Args:
        data: The extracted data to validate
        schema: Not used by this validator

    Returns:
        ValidationResult indicating if sums match
    """
    issues = []

    # Get items and total
    items = data.get(self.items_field, [])
    total = data.get(self.total_field)

    # Handle missing fields
    if not items or total is None:
        if self.strict:
            missing = []
            if not items:
                missing.append(f"'{self.items_field}' (line items)")
            if total is None:
                missing.append(f"'{self.total_field}'")
            issues.append(f"Missing required fields: {', '.join(missing)}")
            return ValidationResult(valid=False, data=data, issues=issues)
        # Non-strict mode: skip validation when fields are missing
        return ValidationResult(valid=True, data=data)

    # Calculate sum of items
    try:
        items_sum = sum(
            float(item.get(self.amount_field, 0)) 
            for item in items 
            if isinstance(item, dict)
        )
    except (TypeError, ValueError) as e:
        issues.append(f"Could not calculate sum: {e}")
        return ValidationResult(valid=False, data=data, issues=issues)

    # Compare with tolerance
    try:
        total_float = float(total)
    except (TypeError, ValueError):
        issues.append(f"Total field is not a number: {total}")
        return ValidationResult(valid=False, data=data, issues=issues)

    difference = abs(items_sum - total_float)

    if difference > self.tolerance:
        issues.append(
            f"Sum mismatch: items sum to {items_sum:.2f}, "
            f"but total is {total_float:.2f} "
            f"(difference: {difference:.2f})"
        )
        return ValidationResult(valid=False, data=data, issues=issues)

    return ValidationResult(valid=True, data=data)
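
A few consequences of the summing expression above are easy to miss: string amounts are converted with float(), items without the amount field count as 0, and list entries that are not dicts are skipped. The stand-alone sketch below reuses that expression on made-up data to show each case.

```python
items = [
    {"amount": "10.50"},             # numeric string: converted by float()
    {"amount": 4},                   # int: converted by float()
    {"description": "no amount"},    # missing amount: counted as 0
    "stray string",                  # not a dict: skipped entirely
]

items_sum = sum(
    float(item.get("amount", 0))
    for item in items
    if isinstance(item, dict)
)
print(items_sum)  # 14.5

# An explicit None amount cannot be converted, which is the case the
# "Could not calculate sum" branch handles.
try:
    sum(float(item.get("amount", 0)) for item in [{"amount": None}])
    error = None
except (TypeError, ValueError) as exc:
    error = f"Could not calculate sum: {exc}"
print(error)
```

Note also that in non-strict mode a missing items list or total yields valid=True, so pass strict=True when both fields are expected to be present.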


DateValidator(date_fields: Optional[List[str]] = None, formats: Optional[List[str]] = None, min_year: int = 1900, max_year: int = 2100)

Bases: Validator

Validates date fields for format and range.

Checks:

  • Date strings match expected formats
  • Dates are within acceptable range
  • Optional normalization to ISO format

ATTRIBUTE DESCRIPTION
date_fields

List of field names to validate

formats

Accepted date formats (strptime patterns)

min_year

Minimum acceptable year

max_year

Maximum acceptable year

Initialize the date validator.

PARAMETER DESCRIPTION
date_fields

Field names to validate (None = auto-detect)

TYPE: Optional[List[str]] DEFAULT: None

formats

Accepted date formats

TYPE: Optional[List[str]] DEFAULT: None

min_year

Minimum acceptable year

TYPE: int DEFAULT: 1900

max_year

Maximum acceptable year

TYPE: int DEFAULT: 2100

Source code in strutex/validators/date.py
def __init__(
    self,
    date_fields: Optional[List[str]] = None,
    formats: Optional[List[str]] = None,
    min_year: int = 1900,
    max_year: int = 2100,
):
    """
    Initialize the date validator.

    Args:
        date_fields: Field names to validate (None = auto-detect)
        formats: Accepted date formats
        min_year: Minimum acceptable year
        max_year: Maximum acceptable year
    """
    self.date_fields = date_fields
    self.formats = formats or self.DEFAULT_FORMATS
    self.min_year = min_year
    self.max_year = max_year

validate(data: Dict[str, Any], schema: Any = None, source_text: Optional[str] = None) -> ValidationResult

Validate date fields in the data.

PARAMETER DESCRIPTION
data

The extracted data to validate

TYPE: Dict[str, Any]

schema

Not used by this validator

TYPE: Any DEFAULT: None

RETURNS DESCRIPTION
ValidationResult

ValidationResult with validation status

Source code in strutex/validators/date.py
def validate(
    self,
    data: Dict[str, Any],
    schema: Any = None,
    source_text: Optional[str] = None
) -> ValidationResult:
    """
    Validate date fields in the data.

    Args:
        data: The extracted data to validate
        schema: Not used by this validator

    Returns:
        ValidationResult with validation status
    """
    issues = []

    # Determine which fields to check
    if self.date_fields:
        fields_to_check = self.date_fields
    else:
        # Auto-detect: look for fields with "date" in the name
        fields_to_check = [
            k for k in data.keys() 
            if "date" in k.lower()
        ]

    for field in fields_to_check:
        value = data.get(field)
        if value is None or value == "":
            continue

        if not isinstance(value, str):
            continue

        # Try to parse the date
        parsed_date = None
        for fmt in self.formats:
            try:
                parsed_date = datetime.strptime(value, fmt)
                break
            except ValueError:
                continue

        if parsed_date is None:
            issues.append(f"{field}: invalid date format '{value}'")
            continue

        # Check year range
        if parsed_date.year < self.min_year:
            issues.append(f"{field}: year {parsed_date.year} is before {self.min_year}")
        elif parsed_date.year > self.max_year:
            issues.append(f"{field}: year {parsed_date.year} is after {self.max_year}")

    return ValidationResult(
        valid=len(issues) == 0,
        data=data,
        issues=issues
    )
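
Auto-detection is a plain substring match on the key name, which can both over-match and under-match. The sketch below applies the same list comprehension as the source above to made-up keys.

```python
data = {
    "invoice_date": "2024-01-15",
    "DueDate": "2024-02-15",
    "updated_at": "2024-03-01",   # matches, because "updated" contains "date"
    "issued_on": "2024-01-10",    # a date, but the key has no "date" in it
    "total": 30.0,
}

# Same rule as the source above: keep keys whose lowercase name contains "date"
fields_to_check = [k for k in data.keys() if "date" in k.lower()]
print(fields_to_check)  # ['invoice_date', 'DueDate', 'updated_at']
```

When key names are not under your control, passing date_fields explicitly avoids both surprises. Values that are not strings are skipped without an issue, as the loop above shows.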


ValidationChain(validators: List[Validator], strict: bool = True)

Composes multiple validators into a sequential chain.

Validators run in order. If any validator fails (in strict mode), the chain stops and returns the failure. In lenient mode, all validators run and issues are collected.

Example
chain = ValidationChain([
    SchemaValidator(),
    SumValidator(tolerance=0.01),
    DateValidator(date_fields=["invoice_date"]),
])

result = chain.validate(data, schema)
if not result.valid:
    print(result.issues)

Initialize the validation chain.

PARAMETER DESCRIPTION
validators

List of validators to run in order

TYPE: List[Validator]

strict

If True, stop on first failure. If False, collect all issues.

TYPE: bool DEFAULT: True

Source code in strutex/validators/chain.py
def __init__(
    self,
    validators: List[Validator],
    strict: bool = True
):
    """
    Initialize the validation chain.

    Args:
        validators: List of validators to run in order
        strict: If True, stop on first failure. If False, collect all issues.
    """
    self.validators = validators
    self.strict = strict

add(validator: Validator) -> ValidationChain

Add a validator to the chain.

PARAMETER DESCRIPTION
validator

The validator to add

TYPE: Validator

RETURNS DESCRIPTION
ValidationChain

Self for method chaining

Source code in strutex/validators/chain.py
def add(self, validator: Validator) -> "ValidationChain":
    """
    Add a validator to the chain.

    Args:
        validator: The validator to add

    Returns:
        Self for method chaining
    """
    self.validators.append(validator)
    return self

validate(data: Dict[str, Any], schema: Any = None, source_text: Optional[str] = None) -> ValidationResult

Run all validators in the chain.

PARAMETER DESCRIPTION
data

The data to validate

TYPE: Dict[str, Any]

schema

Optional schema to pass to validators

TYPE: Any DEFAULT: None

RETURNS DESCRIPTION
ValidationResult

Combined ValidationResult from all validators

Source code in strutex/validators/chain.py
def validate(
    self,
    data: Dict[str, Any],
    schema: Any = None,
    source_text: Optional[str] = None
) -> ValidationResult:
    """
    Run all validators in the chain.

    Args:
        data: The data to validate
        schema: Optional schema to pass to validators

    Returns:
        Combined ValidationResult from all validators
    """
    all_issues: List[str] = []
    current_data = data

    for validator in self.validators:
        result = validator.validate(current_data, schema, source_text=source_text)

        if not result.valid:
            all_issues.extend(result.issues)

            if self.strict:
                return ValidationResult(
                    valid=False,
                    data=current_data,
                    issues=all_issues
                )

        # Use possibly modified data for next validator
        current_data = result.data

    return ValidationResult(
        valid=len(all_issues) == 0,
        data=current_data,
        issues=all_issues
    )
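
Because each validator receives the data returned by the previous one, a validator early in the chain can clean values for the validators that follow. The sketch below is stand-alone: `Result`, `TrimStrings`, and `RequireExactId` are hypothetical stand-ins, and the loop repeats only the data hand-off from the source above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Result:
    """Stand-in for ValidationResult (valid, data, issues)."""
    valid: bool
    data: Dict[str, Any]
    issues: List[str] = field(default_factory=list)


class TrimStrings:
    """Hypothetical validator that never fails but returns cleaned data."""

    def validate(self, data, schema=None, source_text=None):
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in data.items()}
        return Result(valid=True, data=cleaned)


class RequireExactId:
    """Hypothetical validator that expects invoice_number == 'INV-1'."""

    def validate(self, data, schema=None, source_text=None):
        ok = data.get("invoice_number") == "INV-1"
        return Result(ok, data, [] if ok else ["unexpected invoice_number"])


original = {"invoice_number": "  INV-1 "}
current = original
for validator in (TrimStrings(), RequireExactId()):
    result = validator.validate(current)
    current = result.data  # hand the (possibly modified) data to the next validator

print(result.valid)  # True
print(current)       # {'invoice_number': 'INV-1'}
print(original)      # {'invoice_number': '  INV-1 '}
```

With the order reversed, the second check would see the untrimmed value and fail, so normalizing validators belong at the front of the chain.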
