Description
Is your feature request related to a problem? Please describe.
We would like to store a Pydantic model inside the LLMStructuredColumnConfig, since it lets us validate the output more strictly. However, column configs need to be serializable, so we end up converting the model to a JSON schema and validating with jsonschema.
Unfortunately, validating against the JSON schema is weaker than validating with the Pydantic model. Recently, for instance, we had an issue where models would generate either "price": "12.34" (parsed as a string) or "price": 12.34 (parsed as a float) for a price field typed as Decimal. Both passed schema validation, and the mixed types later caused problems when writing to Parquet. See #171 for more details.
Describe the solution you'd like
It would be nice to be able to serialize the Pydantic model while keeping features such as complex types and validators. One possibility would be to implement our own serialization.
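As a rough illustration of what "our own serialization" could mean, one option is to store only a reference to the model class (its import path) in the config and re-import it on load, so validators and complex types stay intact. This is a minimal stdlib sketch, not the approach from the attached notebook; `class_to_path` and `path_to_class` are hypothetical helper names, and `Decimal` stands in for a model class. It only works for classes importable by name, so models defined interactively (for example in a notebook) would need a different mechanism.

```python
import importlib
from decimal import Decimal


def class_to_path(cls):
    """Serialize a class as 'module:qualname' (hypothetical helper)."""
    return f"{cls.__module__}:{cls.__qualname__}"


def path_to_class(path):
    """Re-import a class from a 'module:qualname' string (hypothetical helper)."""
    module_name, _, qualname = path.partition(":")
    obj = importlib.import_module(module_name)
    # Walk nested attributes so classes defined inside other classes also resolve.
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


# Decimal stands in for a Pydantic model class here.
serialized = class_to_path(Decimal)
restored = path_to_class(serialized)
```

The serialized value is a plain string, so it fits in a JSON or YAML config, at the cost of requiring the model's module to be installed wherever the config is loaded.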
Describe alternatives you've considered
Alternative solutions include adding type-specific guidance to the prompt, and standardizing types later, before writing to Parquet.
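The second alternative could look roughly like the sketch below: coerce every value of a Decimal-typed field to `Decimal` after generation, so the column has a single type by the time it is written. `to_decimal` is a hypothetical helper, and the records are made-up examples mirroring the two shapes described above.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Coerce a str, int, or float to Decimal (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it explicitly.
        raise TypeError("bool is not a valid decimal value")
    if isinstance(value, Decimal):
        return value
    if isinstance(value, (int, float, str)):
        try:
            # Go through str() so 12.34 becomes Decimal("12.34")
            # rather than the float's full binary expansion.
            return Decimal(str(value))
        except InvalidOperation as exc:
            raise ValueError(f"cannot convert {value!r} to Decimal") from exc
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Both shapes the model was observed to produce for the same field.
records = [{"price": "12.34"}, {"price": 12.34}]
normalized = [{**r, "price": to_decimal(r["price"])} for r in records]
```

This fixes the symptom at write time but leaves validation itself unchanged, so malformed values are only caught late.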
Additional context
See the attached notebook (by @nabinchha) for one possible way to serialize and deserialize Pydantic models.