posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts of speech and returns sequences of words that share a grammatical relationship. See our article for more. You can also install posextract from PyPI with pip.
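To make the idea of traversing dependency relations concrete, here is a minimal sketch that pulls a subject-verb-object triple out of a hand-annotated parse. It is not posextract's implementation: the `Token` class, the `extract_svo` function, and the hard-coded labels are assumptions made for illustration, and a real pipeline would get the labels from a dependency parser.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, e.g. "nsubj", "aux", "dobj", "ROOT"
    head: int  # index of the token this one depends on


# Hand-annotated parse of "Landlords may exercise oppression."
# (hypothetical labels; a parser would normally supply these)
tokens = [
    Token("Landlords", "nsubj", 2),
    Token("may", "aux", 2),
    Token("exercise", "ROOT", 2),
    Token("oppression", "dobj", 2),
]


def extract_svo(tokens):
    """Pair every subject and direct object that attach to the same head word."""
    triples = []
    for verb_index, verb in enumerate(tokens):
        subjects = [t.text for t in tokens if t.dep == "nsubj" and t.head == verb_index]
        objects = [t.text for t in tokens if t.dep == "dobj" and t.head == verb_index]
        for subject in subjects:
            for obj in objects:
                triples.append((subject, verb.text, obj))
    return triples


print(extract_svo(tokens))
# [('Landlords', 'exercise', 'oppression')]
```

The auxiliary "may" is skipped because only the `nsubj` and `dobj` arcs are followed, which mirrors the default output shown in the examples further down.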
- extract_triples: to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples
- extract_adj_noun_pairs: to extract adjective-noun pairs
- extract_subj_verb_pairs: to extract subject-verb pairs
Required Parameters:
- input: the name of a csv file or an input string
- output: the name of the output file
Optional Parameters:
- --data_column: specify the column to extract triples from.
- --id_column: specify a unique ID field if a csv file is given.
- --file-delimiter: specify comma, pipe, or tab. Default is comma.
- --post-combine-adj: combine triples (adjective predicate with object).
- --add-auxiliary: extract future and past tense triples.
- --prep-phrase: extract the . Default set to false.
- --no-compound-noun: extract just the subject or object (e.g. "Indian Government" is extracted as just "Government").
- --lemma: specify whether to lemmatize parts of speech. Default is non-lemmatized.
- --verbose: print
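When the command line is driven from a script, it can help to assemble the arguments in one place. The sketch below is an assumption-laden helper, not part of posextract: the function name `build_extract_triples_command` and its keyword arguments are hypothetical, and it only uses flags listed above.

```python
import sys


def build_extract_triples_command(input_value, output_path, data_column=None,
                                  id_column=None, post_combine_adj=False):
    """Assemble the argument list for `python -m posextract.extract_triples`."""
    command = [sys.executable, "-m", "posextract.extract_triples"]
    if data_column is not None:
        command += ["--data_column", data_column]
    if id_column is not None:
        command += ["--id_column", id_column]
    if post_combine_adj:
        command.append("--post-combine-adj")
    # The two required parameters: input (csv file or string) and output file.
    command += [input_value, output_path]
    return command


command = build_extract_triples_command(
    "input.csv", "output.csv", data_column="sentence", id_column="sentence_id"
)
print(command[1:])

# To actually run it (requires posextract to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the input is a sentence containing spaces or punctuation.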
Extract grammatical triples.
from posextract import grammatical_triples
triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])
for triple in triples:
    print(triple)
# Output: Landlords exercise oppression, soldiers were ill
Extract grammatical triples using different options from default:
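from posextract import grammatical_triples
from posextract.util import TripleExtractorOptions
sent = ['Landlords may exercise oppression.']  # a list of sentences, as in the example above
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase=True))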
Or extract adjectives and the nouns they modify.
from posextract import adj_noun_pairs
adj_noun = adj_noun_pairs.extract()
Or extract subjects and their verbs.
from posextract import subj_verb_pairs
subj_verb = subj_verb_pairs.extract()
posextract can extract grammatical triples from text:
python -m posextract.extract_triples "Landlords may exercise oppression." output.csv
# Output: Landlords exercise oppression
posextract can extract SVO/SVA relationships separately or it can combine the adjective as part of an SVO triple:
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv
# Output: soldiers were terminally, soldiers were ill
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj
# Output: soldiers were terminally ill
If provided a .csv file:
python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
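As a minimal sketch of an input file that matches the command above, the following writes a comma-delimited `input.csv` with a `sentence_id` column and a `sentence` column. The presence of a header row and the example rows are assumptions for illustration; the helper name `write_input_csv` is hypothetical.

```python
import csv


def write_input_csv(path, rows):
    """Write rows with sentence_id and sentence columns as a comma-delimited csv."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sentence_id", "sentence"])
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"sentence_id": 1, "sentence": "Landlords may exercise oppression."},
    {"sentence_id": 2, "sentence": "The soldiers were ill."},
]
write_input_csv("input.csv", rows)
```

The column names are then passed to `--data_column` and `--id_column` exactly as they appear in the header.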
... see our Wiki: