pdfplumber return empty string on ImportError #481
bosd wants to merge 2 commits into invoice-x:master
024d674 to ed9499d
src/invoice2data/input/pdfplumber.py
      except ImportError:
    -     logger.debug("Cannot import pdfplumber")
    +     logger.error("Cannot import pdfplumber")
    +     return "".encode("UTF-8")
Returning an empty string suggests that the invoice was parsed but was empty.
If we want to return some value, then please make it None.
If you take a look at pdftotext.py, however, you'll see it raises EnvironmentError if pdftotext is missing. So returning None would make pdfplumber.py somewhat inconsistent with pdftotext.py.
As for which is better, returning None or raising EnvironmentError, I have no idea or preference.
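To make the three options in this comment concrete, here is a minimal sketch. The `read_*` functions and the `backend_available` flag are hypothetical stand-ins for an input reader and are not taken from invoice2data; they only mimic what a caller would see in each case.

```python
def read_empty(text, backend_available=True):
    # Option under review: hide the missing dependency behind empty bytes.
    return text.encode("UTF-8") if backend_available else "".encode("UTF-8")


def read_none(text, backend_available=True):
    # Alternative 1: signal "nothing was read" with None.
    return text.encode("UTF-8") if backend_available else None


def read_raise(text, backend_available=True):
    # Alternative 2: fail loudly, as the comment says pdftotext.py does.
    if not backend_available:
        raise EnvironmentError("pdfplumber is not installed")
    return text.encode("UTF-8")


# An empty invoice and a missing backend look identical with read_empty.
ambiguous = read_empty("", True) == read_empty("Invoice 42", False)

# With None the caller can tell the two cases apart.
distinguishable = read_none("", True) is not None and read_none("Invoice 42", False) is None
```

With the empty-bytes variant the caller cannot tell an unreadable environment from a blank document, which is the concern raised above.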
@rmilecki I agree, we need to think of a solution for this.
Returning None conflicts with invoice2data/src/invoice2data/main.py, line 89 in a5bdd50, as NoneType cannot be decoded.
(Maybe we can remove the decode line? I assume it is a Python 2 leftover.)
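The referenced line of main.py is not reproduced in this thread, so the following is only a sketch of the failure mode, assuming that line calls `.decode()` on whatever the input reader returns. `decode_reader_output` is a hypothetical helper, not code from the project.

```python
def decode_reader_output(raw):
    """Hypothetical guard around the decode step (not the real main.py code)."""
    if raw is None:
        # Turn the silent None into an explicit, readable failure.
        raise EnvironmentError("input reader returned no data")
    return raw.decode("utf-8")


# What happens without a guard when a reader returns None:
reader_result = None
try:
    reader_result.decode("utf-8")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
```

Either a guard like this or dropping the decode step would be needed before a reader can safely return None.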
pdftotext might be a different story, as it is one of the default/main parsers.
So making everything fail when it's unavailable is not a big deal.
Is there a way to raise the error, only if the pdfplumber input module is called?
We don't want the whole lib to fail on this missing requirement.
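One way to get that behaviour is to defer the import until the reader is actually called. The sketch below is an assumption about how this could look, not the project's implementation: `load_backend` and the `module_name` parameter are hypothetical, and the parameter exists only so the failure path can be exercised without uninstalling anything. `pdfplumber.open`, `pdf.pages` and `page.extract_text()` are the library's documented calls.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def load_backend(module_name="pdfplumber"):
    """Import the backend lazily; importing this file never needs pdfplumber."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        logger.error("Cannot import %s", module_name)
        raise EnvironmentError(
            f"{module_name} is not installed; it is required for this input reader"
        ) from exc


def to_text(path, module_name="pdfplumber"):
    """Extract text from a PDF, failing only when this reader is really used."""
    backend = load_backend(module_name)
    with backend.open(path) as pdf:
        # extract_text() may yield nothing for a page, so fall back to "".
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages).encode("UTF-8")
```

Users who never select this reader are unaffected, while `--input-reader=pdfplumber` on a system without the package would fail with a clear error instead of an empty result.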
PR #491 is currently failing because pdfplumber is missing.
(The test should not even run when it is unavailable, but that's a different subject.)
In that example, there should be an ImportError or EnvironmentError
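On the side remark about tests that should not run without pdfplumber, a minimal sketch using only the standard library follows. The test class and its single test are hypothetical and are not taken from the invoice2data test suite.

```python
import importlib.util
import unittest

# True only when the optional dependency can be found on this system.
HAVE_PDFPLUMBER = importlib.util.find_spec("pdfplumber") is not None


@unittest.skipUnless(HAVE_PDFPLUMBER, "pdfplumber is not installed")
class TestPdfplumberInput(unittest.TestCase):
    def test_backend_exposes_open(self):
        import pdfplumber

        self.assertTrue(hasattr(pdfplumber, "open"))
```

When the package is missing, the whole class is reported as skipped rather than failed.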
ed9499d to f7d42b7
f7d42b7 to 1ae80ab
1ae80ab to db0ac11
I was running some tests and encountered the following error when pdfplumber is not available.
This PR returns an empty value and lets invoice2data fail gracefully.
Before:
invoice2data input.pdf --input-reader=pdfplumber
After:
Fixes #362