This repository was archived by the owner on Apr 18, 2025. It is now read-only.

Column info for invalid column number #8

Open
oliverfu89 wants to merge 3 commits into multimeric:master from oliverfu89:Column_Info_For_Invalid_Column_Number

@oliverfu89

I have updated the ValidationError message shown when the number of columns in the schema and the data frame do not match. It now lists the names of the columns that are present only in the schema and/or only in the data frame, which should be more informative for the user.
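For illustration, here is a minimal sketch of how such a message could be assembled from the two groups of mismatched column names. The function name `column_mismatch_messages` is hypothetical. The single-column wording copies the string asserted in the test under review (`'The column b exists in the schema but not in the data frame'`), while the plural wording for several columns is an assumption.

```python
def column_mismatch_messages(schema_only, df_only):
    """Hypothetical helper: build one message per direction in which the column sets differ."""

    def describe(names, where, missing_from):
        # Sort (by string form) so the message text does not depend on set ordering.
        names = sorted(names, key=str)
        if len(names) == 1:
            return f'The column {names[0]} exists in the {where} but not in the {missing_from}'
        joined = ', '.join(str(name) for name in names)
        return f'The columns {joined} exist in the {where} but not in the {missing_from}'

    messages = []
    if schema_only:
        messages.append(describe(schema_only, 'schema', 'data frame'))
    if df_only:
        messages.append(describe(df_only, 'data frame', 'schema'))
    return messages


print(column_mismatch_messages({'b'}, set()))
# ['The column b exists in the schema but not in the data frame']
```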

@multimeric
Owner

This is a good idea, and I'm sure the extra warnings would be helpful. That said, the code could be a bit cleaner if you used sets:

schema_columns = set(self.get_column_names())
df_columns = set(df.columns)

add_schema_columns = schema_columns - df_columns
add_df_columns = df_columns - schema_columns

Also, could you please add at least one test in test/test_schema.py that checks that your warning is correct?
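A self-contained sketch of the set-based suggestion, with plain lists standing in for `self.get_column_names()` and `df.columns` (assumption: both are iterables of hashable column names). Set difference is not symmetric, so each direction is computed separately. The `sorted(..., key=str)` call is an addition not present in the suggestion: sets have no defined order, so sorting keeps any message built from the names reproducible, and `key=str` avoids a `TypeError` when column names mix types such as `int` and `str`. The function name `column_differences` is hypothetical.

```python
def column_differences(schema_names, df_names):
    """Return (columns only in the schema, columns only in the data frame), each sorted."""
    schema_columns = set(schema_names)
    df_columns = set(df_names)

    # a - b keeps the elements of a that are absent from b.
    add_schema_columns = schema_columns - df_columns
    add_df_columns = df_columns - schema_columns

    # Sort by string form for a stable order, even with mixed-type names.
    return sorted(add_schema_columns, key=str), sorted(add_df_columns, key=str)


missing_in_df, extra_in_df = column_differences(['a', 'b', 'c'], ['a', 'd'])
print(missing_in_df)  # ['b', 'c']
print(extra_in_df)    # ['d']
```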

…s are shown in output message of validation warning
@oliverfu89
Author

I have added a unit test for the warning. Thanks for the hint.

@oliverfu89 oliverfu89 closed this Jan 18, 2019
@oliverfu89 oliverfu89 reopened this Jan 18, 2019
Comment thread test/test_schema.py
df = pd.DataFrame.from_dict({'a': [1, 2, 3]})

out = self.schema.validate(df, columns=['a', 'b'])
assert out[0].message == 'The column b exists in the schema but not in the data frame'
Owner

Comment thread test/test_schema.py
# should raise a PanSchArgumentError
self.assertRaises(PanSchArgumentError, self.schema.validate, df, columns=['c'])

def test_column_not_present_shown(self):
Owner

Please add a docstring that explains this test (one sentence is enough).

Comment thread pandas_schema/schema.py
)
return errors

errors.append(
Owner

Why do you add a second ValidationWarning here? You already add one error if add_schema_columns is non-empty and one if add_df_columns is non-empty. With this extra append, a mismatch in the number of columns produces two ValidationWarning objects when it should produce one.
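A minimal sketch of the control flow the comment asks for, reading it as "one warning per direction of mismatch, and no extra generic warning". `ValidationWarning` below is a hypothetical stand-in dataclass that models only the `message` attribute, not the real pandas_schema class; `check_columns` and the message wording are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationWarning:
    """Hypothetical stand-in for pandas_schema's ValidationWarning (message only)."""
    message: str


def check_columns(schema_names, df_names):
    """Return at most one warning per direction of mismatch, and nothing else."""
    schema_columns = set(schema_names)
    df_columns = set(df_names)
    add_schema_columns = sorted(schema_columns - df_columns, key=str)
    add_df_columns = sorted(df_columns - schema_columns, key=str)

    errors = []
    if add_schema_columns:
        errors.append(ValidationWarning(
            'Columns in the schema but not in the data frame: '
            + ', '.join(map(str, add_schema_columns))))
    if add_df_columns:
        errors.append(ValidationWarning(
            'Columns in the data frame but not in the schema: '
            + ', '.join(map(str, add_df_columns))))
    # Deliberately no trailing "wrong number of columns" append,
    # so each mismatch is reported exactly once.
    return errors


print(len(check_columns(['a', 'b'], ['a'])))  # 1
```

With only a schema-side mismatch, the list has a single entry, which matches the test's expectation that `out[0]` carries the schema-side message.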

Comment thread pandas_schema/schema.py
schema_columns = set(self.get_column_names())
df_columns = set(df.columns)

add_schema_columns = [col for col in schema_columns if col not in df_columns]
Owner

Use set operations here (https://docs.python.org/3/library/stdtypes.html#set), e.g. add_schema_columns = schema_columns - df_columns.

2 participants