
Only parse Schedule A itemizations #45

@NickCrews


Hi! Thanks for this great utility.

I only care about the Schedule A itemizations. In some multi-gigabyte .fec files, the non-Schedule A entries can take up more than half of the file, so they really slow down parsing.

Can we add some options to only parse particular itemizations?

In the meantime, I'm doing the following. Do you see any problems with it? For example, are Schedule A itemizations always guaranteed to come before the other schedules?

```shell
#!/bin/sh
# filter_fec.sh

# We only want the individual contributions from an FEC file. We don't want
# the other itemizations: they can be gigabytes in size and slow down parsing.

# From the FEC file format documentation:
#
# The first record of every electronic file that is submitted to the FEC must be an
# HDR record that precedes the main body of the ASCII CSV (comma separated values) data.
# The second record will be a "cover" record for the particular filing, (for example,
# a F3 or an F3X record for a FEC-3 or FEC-3X electronic report). An unlimited number
# of Schedule records (examples: SA, SB, SC/ ...) can follow the first two records of
# an FEC electronic report file. (Electronic files are usually assigned the file
# suffix ".fec".)

# So as soon as we see a line starting with "SB", "SC", or "SD", we stop.
# From https://stackoverflow.com/a/8940829/5156887
awk '{if(/^SB|^SC|^SD/)exit;else print}'
```

and use it as:

```shell
curl https://docquery.fec.gov/dcdev/posted/13360.fec | filter_fec.sh | fastfec 13360
```
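One way to avoid depending on record order at all is to keep the first two records (the HDR and cover records, per the documentation quoted above) plus every line that starts with `SA`, instead of stopping at the first `SB`/`SC`/`SD` line. This is a minimal sketch, assuming, as the script above does, that each record sits on its own line and begins with its unquoted record type; the function name `filter_sa_only` is hypothetical. The trade-off is that awk still has to read the whole file, but only Schedule A lines reach the parser.

```shell
#!/bin/sh
# Hypothetical helper: keep the HDR record, the cover record, and every
# Schedule A itemization, wherever it appears in the file.
# Assumes one record per line, each starting with its unquoted record type.
filter_sa_only() {
  # NR <= 2 keeps the first two records (HDR and cover);
  # /^SA/ keeps every Schedule A line; all other lines are dropped.
  # "$@" lets the function read named files, or stdin when given none.
  awk 'NR <= 2 || /^SA/' "$@"
}
```

After sourcing the file (`. ./filter_sa_only.sh`), it slots into the same pipeline: `curl https://docquery.fec.gov/dcdev/posted/13360.fec | filter_sa_only | fastfec 13360`. Unlike the early-exit version, a Schedule A line that appears after an `SB` record is still kept.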
