Open
Description
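CSV files saved by MS Excel (and possibly other tools) may start with a UTF-8 byte order mark (BOM), which Python then reads as the character \ufeff at the start of the first field name.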
Adding print(headers) after L133 in UK_LLC_File_1_Checker.py to debug:
Checking filename
Loading data from file
['\ufeffSTUDY_ID', 'ROW_STATUS', 'NHS_NUMBER', 'SURNAME', 'FORENAME', 'MIDDLENAMES', 'ADDRESS_1', 'ADDRESS_2', 'ADDRESS_3', 'ADDRESS_4', 'ADDRESS_5', 'POSTCODE', 'ADDRESS_START_DATE', 'ADDRESS_END_DATE', 'DATE_OF_BIRTH', 'GENDER_CD', 'CREATE_DATE', 'UKLLC_STATUS', 'NHS_E_Linkage_Permission', 'NHS_Digital_Study_Number', 'NHS_S_Linkage_Permission', 'NHS_S_Study_Number', 'NHS_W_Linkage_Permission', 'NHS_NI_Linkage_Permission', 'NHS_NI_Study_Number', 'Geocoding_Permission', 'Small_Area_Permission', 'Environment_Permission', 'Property_Level_Permission', 'Multiple_Birth', 'National_Opt_Out', 'DFE_Linkage_Permission', 'DWP_Linkage_Permission', 'HMRC_Linkage_Permission']
Unrecognised field names
Unrecognised field names
Column field name(s) STUDY_ID are not as expected. Unable to continue.
Line(s) (ignoring header) 0
This confuses users because the BOM in front of STUDY_ID is invisible in MS Excel, VS Code, Notepad++ and most other tools. The status bar may show an indicator such as "Encoding: UTF-8 with BOM", but that is not intuitive.
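A minimal reproduction, independent of the checker, may help show why the name looks correct on screen but fails the comparison. The byte string below is a made-up two-column header, not the real file:

```python
import csv
import io

# Hypothetical header row as a BOM-writing tool would save it:
# the UTF-8 BOM bytes (EF BB BF) followed by the field names.
raw = b"\xef\xbb\xbfSTUDY_ID,ROW_STATUS\r\n"

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
headers = next(csv.reader(io.StringIO(text)))

print(headers[0])                 # usually displays as STUDY_ID
print(repr(headers[0]))           # '\ufeffSTUDY_ID'
print(headers[0] == "STUDY_ID")   # False
```

print() on the string alone hides the BOM in most terminals, whereas repr() (or printing the whole list, as in the debug output above) escapes it.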
> sw_vers
ProductName: macOS
ProductVersion: 26.2
BuildVersion: 25C56
> python -V
Python 3.13.11
Solution
The checker could print the raw value that Python parsed (not only the expected value) so that users can investigate. It could also specify another encoding when opening the file, but at the cost of general applicability.
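Both options could look roughly like the sketch below. It is not taken from UK_LLC_File_1_Checker.py: `read_headers`, `unrecognised_message` and the shortened `EXPECTED` list are hypothetical names for illustration, and the `utf-8-sig` codec is one possible choice that assumes the file is UTF-8 (it skips a leading BOM if present and otherwise behaves like `utf-8`).

```python
import csv
import os
import tempfile

# Shortened, hypothetical list of expected field names.
EXPECTED = ["STUDY_ID", "ROW_STATUS"]


def read_headers(path, encoding="utf-8-sig"):
    """Return the first CSV row; utf-8-sig drops a leading BOM if present."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))


def unrecognised_message(headers, expected=EXPECTED):
    """Report unexpected names via repr() so hidden characters are visible."""
    bad = [repr(h) for h in headers if h not in expected]
    if not bad:
        return ""
    return "Unrecognised field name(s): " + ", ".join(bad)


# Demo with a temporary BOM-prefixed file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfSTUDY_ID,ROW_STATUS\r\n")
path = tmp.name

kept_bom = read_headers(path, encoding="utf-8")  # BOM stays in the first name
dropped_bom = read_headers(path)                 # BOM skipped by utf-8-sig
os.remove(path)

print(unrecognised_message(kept_bom))
print(dropped_bom)
```

With plain `utf-8` the message contains the escaped `'\ufeffSTUDY_ID'`, which gives users something to search for. With `utf-8-sig` the header matches and no message is produced, but files in other encodings would still need separate handling.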