Conversation
to make internal PLOS server processes easier for checking articles likely to have updated XML, adds `from_partial_doi` class method & `find_valid_partial_dois` regex.
The same as it can initialize an Article object from DOI, it can do that from a partial DOI as well.
| return 'pone.0040259' | ||
|
|
||
|
|
||
| def test_partial_doi_regex(test_partial_doi): |
There was a problem hiding this comment.
Could you test the regexes directly as well?
There was a problem hiding this comment.
Actually this test is confusingly named… its not testing the doi regexes it's testing the validate_partial_doi function.
| "|10\.1371/annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}") | ||
| partial_doi_regex_search = re.compile(r"p[a-zA-Z]{3}\.[\d]{7}" | ||
| "|annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}") | ||
| partial_doi_regex_match = re.compile(r"^p[a-zA-Z]{3}\.[\d]{7}$" |
There was a problem hiding this comment.
It looks like this is just the same string with ^+your_str+$, am I correct? and the last part of it is almost the same as the last part of the full_doi_regex_search?
Why not make this based on a single string that you then combine as needed to make the search and match?
I'm not sure you need to define search vs. match separately anyway (in this case at least). I ask below… but it'd be a lot easier to know whether that is the case if you tested the regexes directly.
| Example: 'pbio.2000777' is True, but '10.1371/journal.pbio.2000777' is False | ||
| :return: True if string is in valid PLOS partial DOI format; False if not | ||
| """ | ||
| return bool(partial_doi_regex_match.search(partial_doi)) |
There was a problem hiding this comment.
why is the match regex used to search?
| @property | ||
| def partial_doi(self): | ||
| """Convert a DOI to a partial DOI.""" | ||
| return self.doi.lstrip('10.1371/').replace('journal.', '') |
There was a problem hiding this comment.
why aren't you using doi_to_partial?
| elif key not in self.dois: | ||
| if partial_to_doi(key) in self.dois: | ||
| return Article.from_partial_doi(key, directory=self.directory) | ||
| elif validate_partial_doi(key): |
There was a problem hiding this comment.
wouldn't it be safer to validate first?
| return cls(filename_to_doi(filename), directory=directory) | ||
|
|
||
| @classmethod | ||
| def from_partial_doi(cls, partial_doi, directory=None): |
There was a problem hiding this comment.
Why don't you validate the partial doi first?
|
|
||
|
|
||
| @pytest.fixture | ||
| def corpus(): |
There was a problem hiding this comment.
We should consider the corpus a module/session level scoped fixture since this is basically the same as in some of the other tests. You don't need to do it for this PR but it's worth considering
|
|
||
|
|
||
| @pytest.fixture | ||
| def test_article(): |
There was a problem hiding this comment.
Generally fixtures don't have test prefixed to them (see corpus above)
|
|
||
| @pytest.fixture | ||
| def test_article(): | ||
| return Article('10.1371/journal.pone.0040259', directory=starterdir) |
There was a problem hiding this comment.
You could use the corpus & doi fixtures directly here, since fixtures can be used in other fixtures:
@pytest.fixture(scope='module')
def article(corpus, doi):
return corpus[doi]
|
|
||
|
|
||
| @pytest.fixture | ||
| def test_doi(): |
There was a problem hiding this comment.
Same as with test_article shouldn't be prefixed with test, better is to just use def doi():
|
|
||
|
|
||
| @pytest.fixture | ||
| def test_partial_doi(): |
There was a problem hiding this comment.
This should be called partial_doi
|
|
||
| def test_partial_doi_regex(test_partial_doi): | ||
| assert validate_partial_doi(test_partial_doi) | ||
| assert not validate_partial_doi(' pone.0040259') |
There was a problem hiding this comment.
If you're going to have a test like this it's better to make explicit what the manipulation is " " + partial_doi would do that.
| def test_partial_doi_regex(test_partial_doi): | ||
| assert validate_partial_doi(test_partial_doi) | ||
| assert not validate_partial_doi(' pone.0040259') | ||
| assert not validate_partial_doi('pone.0040259 ') |
There was a problem hiding this comment.
Same as above… reuse partial_doi… partial_doi + " "
| assert doi == test_doi | ||
|
|
||
|
|
||
| def test_partial_doi_method_article(test_partial_doi, test_article): |
There was a problem hiding this comment.
technically I would prefer this to be in its own test_article.py for article functionality but I don't think that exists.
| assert article == test_article | ||
|
|
||
|
|
||
| def test_partial_doi_method_corpus(corpus, test_article, test_partial_doi): |
There was a problem hiding this comment.
This should be in the test_corpus.py testing module.
|
I was thinking about this for a bit… why do you ever need to validate a partial doi? Why not validate only on dois since dois are the only things that actually have canonical representations? That way the canonical way to do everything is always to transform the partial doi into the doi first and then validate the doi. This seems like it introduces a lot of complexity and it's not clear to me what the wins are. |
This PR adds a number of methods to the Article and Corpus class to allow for Article() creation from a partial DOI, including from Corpus, transformations between partial and full DOIs, as well as regex to validate partial DOIs. I added a new pytest file as well to cover these changes.