Skip to content
Doug Schmidt edited this page Mar 4, 2022 · 4 revisions

The header area of a CSV file is typically a single line.

Version 21.4.3+ of the plugin adds some more flexibilty to deal with multi-line headers.

Three Configuration properties work together to control the which lines from the header area will be used to defined the columns of the data rows.

  • HeaderRowCount - Defaults to 0, but automatically is set to 1 if any ColumnHeader (or '@' short-form) sources are defined.
  • IgnoredLeadingHeaderRowCount - Defaults to 0, but can be set to skip over some header rows before those rows contribute to the parsed column identifiers.
  • IgnoredTrailingingHeaderRowCount - Defaults to 0, but can be set to skip over some header rows after the column identifiers, but before the data rows start.

Column names are formed by joining all the fields from adjacent header rows, separated by a single space.

Some examples will help show how these properties influence the parsed data.

Example 0 - No header line at all!

This example is a reminder that Tabular can parse data from CSVs that are only data rows, with absolutely no header information. Columns are just referenced by column indexes, starting at 1. It's not a very forgiving format, but it is possible.

LOC1, 2020-04-01, 12.5
LOC2, 1985-02-15, -3.5

The above CSV can be parsed as air temperature readings using this TOML:

Location = '@#1'
Time = '@#2'

[Reading]
ParameterId = 'TA'
UnitId = 'degC'
Value = '@#3'

Example 1 - A single-line header

This is by far the most common case, where the first line in the file defines the columns and the following lines are data rows.

Location, Time, Air Temp
LOC1, 2020-04-01, 12.5
LOC2, 1985-02-15, -3.5

The TOML to parse that CSV file doesn't even need to set the HeaderRowCount, IgnoredLeadingHeaderRowCount, or IgnoredTrailingingHeaderRowCount, since everything is inferred from the '@' short-form syntax.

Location = '@Location'
Time = '@Time'

[Reading]
ParameterId = 'TA'
UnitId = 'degC'
Value = '@Air Temp'

Example 2 - A two-line header

Now let's add in an extra header row to that CSV:

       , Start, Air Temp
Location, Time, °C
LOC1, 2020-04-01, 12.5
LOC2, 1985-02-15, -3.5

The TOML to parse that CSV becomes:

HeaderRowCount = 2
Location = '@Location'
Time = '@Start Time'

[Reading]
ParameterId = 'TA'
UnitId = 'degC'
Value = '@Air Temp °C'

Notice that the column names Location, Start Time, and Air Temp ° are all formed by joining the same fields from adjacent header rows, trimmed of whitespace and separated by a single space.

  • Location is the combination of field 1 values of "" and "Location".
  • Start Time is the combination of field 2 values of "Start" and "Time".
  • Air Temp degC is the combination of field 1 values of "Air Temp" and "°C".

Example 3 - A multi-line header, but ignoring some trailing lines

Consider this next CSV variant, containing some separating lines between the header and the data:

        , Start, Air
Location,  Time, Temp
======================
======================
LOC1, 2020-04-01, 12.5
LOC2, 1985-02-15, -3.5

We'll need to ingore those line of equals, to make the TOML work.

HeaderRowCount = 4 # Two lines of columns followed by two lines of cruft
IgnoredTrailingHeaderRows = 2 # This ignores the === lines that follow
Location = '@Location'
Time = '@Start Time'

[Reading]
ParameterId = 'TA'
UnitId = 'degC'
Value = '@Air Temp'

Clone this wiki locally