Hi! Thanks for this great utility.
I only care about the Schedule A itemizations. In some multi-gigabyte .fec files, the non-Schedule A entries take up more than half of the file and really slow down parsing.
Could we add an option to parse only particular itemizations?
In the meantime, I'm using the script below. Do you see any problems with it? For example, are Schedule A itemizations always going to come before the other schedules?
#!/bin/sh
# filter_fec.sh
# We only want the individual contributions from an FEC file. We don't want
# the other itemizations: they can run to gigabytes and slow down parsing.
# From the FEC file format documentation:
# The first record of every electronic file that is submitted to the FEC must be an
# HDR record that precedes the main body of the ASCII CSV (comma separated values) data.
# The second record will be a "cover" record for the particular filing, (for example,
# an F3 or an F3X record for an FEC-3 or FEC-3X electronic report). An unlimited number
# of Schedule records (examples: SA, SB, SC/ ...) can follow the first two records of
# an FEC electronic report file. (Electronic files are usually assigned the file
# suffix ".fec".)
# So as soon as we see a line starting with "SB", "SC", or "SD", we stop.
# From https://stackoverflow.com/a/8940829/5156887
awk '{if(/^SB|^SC|^SD/)exit;else print}'

And use it as:

curl https://docquery.fec.gov/dcdev/posted/13360.fec | sh filter_fec.sh | fastfec 13360
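Rather than trusting the ordering assumption, one way to test it on a particular file is to scan the whole file once and report whether any SA record appears after the first SB/SC/SD record, which is exactly where the script above stops reading. This is a minimal sketch in plain awk. `check_sa_order` is a hypothetical helper name, and like the script it assumes each record sits on one line with the record type at the very start of that line.

```sh
# Hypothetical helper (not part of fastfec): reports whether any SA record
# appears after the first SB/SC/SD record, i.e. whether filter_fec.sh would
# have cut off Schedule A itemizations. Exits 1 if it finds one, else 0.
check_sa_order() {
  awk '
    /^(SB|SC|SD)/ && !cut { cut = NR }       # line where filter_fec.sh would stop
    /^SA/ && cut          { late = NR; exit } # an SA record past that point
    END {
      if (late) {
        printf "SA record at line %d comes after line %d\n", late, cut
        exit 1
      }
      print "OK: no SA record after the first SB/SC/SD record"
    }
  '
}

# Usage (reads the whole file, so this is a one-off check, not a filter):
# curl https://docquery.fec.gov/dcdev/posted/13360.fec | check_sa_order
```

A clean result only says something about the files you have checked; it does not prove the ordering holds for every filing.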
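If the ordering turns out not to be reliable, a filter can select SA records wherever they appear instead of stopping early. This sketch keeps the first two records (HDR and cover, as in the FEC documentation quoted above) plus every line whose record type starts with `SA`. `filter_sa` is a hypothetical name, and the sketch assumes one record per line with the record type at the start of the line. The trade-off is that it reads the entire input instead of exiting at the first SB/SC/SD line, and it also drops every other record type (for example SE or TEXT records), not just SB/SC/SD.

```sh
# Hypothetical helper (not part of fastfec): keep the first two records
# (HDR and cover) plus every record whose type starts with "SA",
# regardless of where it appears in the file.
filter_sa() {
  awk 'NR <= 2 || /^SA/'
}

# Usage, with filter_sa defined in the current shell:
# curl https://docquery.fec.gov/dcdev/posted/13360.fec | filter_sa | fastfec 13360
```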