Open
Labels
bug (Something isn't working)
Description
Priority Level
Medium (Annoying but has workaround)
Describe the bug
SchemaTransformProcessor intermittently hits a JSON serialization/deserialization error, depending on the LLM output in the upstream columns referenced by the processor template. The exception, a JSONDecodeError, is raised here:
ColumnWiseDatasetBuilder._run_processors(self, stage, dataframe, current_batch_number)
298 except Exception as e:
--> 299 raise DatasetProcessingError(
300 f"🛑 Failed to process dataset with processor {processor.name} in stage {stage}: {e}"
301 ) from e
302 return dataframe
DatasetProcessingError: 🛑 Failed to process dataset with processor SchemaTransformProcessor in stage BuildStage.POST_BATCH: Expecting ',' delimiter: line 1 column 313 (char 312)
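The error message is consistent with raw text being substituted into a JSON string without escaping, although I have not confirmed that this is what the processor does internally. The sketch below uses only the standard library; `render_naive` and the sample values are hypothetical stand-ins, not the library's code. It shows how the same template parses or fails depending on whether the upstream text happens to contain a double quote.

```python
import json

# Hypothetical stand-in for the processor template, serialized to a JSON string.
template = {"messages": [{"role": "assistant", "content": "{{greetings}}"}]}
template_str = json.dumps(template)


def render_naive(template_str: str, value: str) -> str:
    """Substitute raw text into the JSON string without escaping it."""
    return template_str.replace("{{greetings}}", value)


# LLM output without quotes: the rendered string is still valid JSON.
safe_value = "Bonjour, comment allez-vous ?"
safe_result = json.loads(render_naive(template_str, safe_value))

# LLM output containing double quotes: the first quote closes the JSON string
# early, so the decoder fails on whatever follows it.
quoted_value = 'Casual: "Salut !" Formal: "Bonjour."'
try:
    json.loads(render_naive(template_str, quoted_value))
    naive_error = None
except json.JSONDecodeError as exc:
    naive_error = exc.msg

print(naive_error)
```

Because the failure depends on what the model generates for a given row, it would show up only on some batches, which matches the intermittent behavior described above.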
Steps/Code to reproduce bug
Here's an example SDG config that hits this intermittently:
config_builder.add_column(
    SamplerColumnConfig(
        name="language",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["English", "French", "Spanish", "German"],
        ),
        drop=True
    )
)
config_builder.add_column(
    LLMTextColumnConfig(
        name="greetings",
        model_alias="nvidia-text",
        prompt="""
        Write a casual and formal response greeting in '{{language}}' language.
        """,
    )
)
config_builder.add_column(
    LLMTextColumnConfig(
        name="greetings_response",
        model_alias="nvidia-text",
        prompt="""
        Write a follow up natural response to the greeting in '{{greetings}}' said in '{{language}}' language.
        """,
    )
)
# preview_results = data_designer.preview(config_builder=config_builder)
config_builder.add_processor(
    SchemaTransformProcessorConfig(
        name="chat_format",
        template={
            "messages": [
                {
                    "role": "user",
                    "content": "Say hello in {{language}}"
                },
                {
                    "role": "assistant",
                    "content": "{{greeting}}"
                },
                {
                    "role": "user",
                    "content": "{{greetings_response}}"
                }
            ]
        }
    )
)
Expected behavior
The processor should not be flaky: it should succeed regardless of what text the LLM produces in the upstream columns.
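One way to get that behavior is to substitute values into the template's leaf strings directly, so the rendered text is never re-parsed as JSON. The sketch below assumes this approach; `render_structure`, `PLACEHOLDER`, and the sample row are hypothetical and not part of the library. Placeholder names here match the column names defined in the config.

```python
import json
import re

# Matches simple placeholders such as {{language}} or {{ greetings }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_structure(node, row):
    """Recursively fill placeholders in the leaf strings of a nested template."""
    if isinstance(node, dict):
        return {key: render_structure(value, row) for key, value in node.items()}
    if isinstance(node, list):
        return [render_structure(item, row) for item in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), node)
    return node


template = {
    "messages": [
        {"role": "user", "content": "Say hello in {{language}}"},
        {"role": "assistant", "content": "{{greetings}}"},
        {"role": "user", "content": "{{greetings_response}}"},
    ]
}

# Sample row with quotes and a newline, the kind of text that breaks string-level rendering.
row = {
    "language": "French",
    "greetings": 'Casual: "Salut !"\nFormal: "Bonjour."',
    "greetings_response": "Très bien, merci !",
}

rendered = render_structure(template, row)

# Serialization is handled entirely by json.dumps, which escapes quotes and newlines.
round_tripped = json.loads(json.dumps(rendered))
```

With this shape, quotes, backslashes, and newlines in the column values never touch JSON syntax until `json.dumps` escapes them, so the result no longer depends on what the model wrote.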
Additional context
No response