Partial DOIs by eseiver · Pull Request #101 · PLOS/allofplos

eseiver · 2018-03-22T22:04:57Z

This PR adds methods to the Article and Corpus classes that allow creating an Article() from a partial DOI (including via Corpus), functions that convert between partial and full DOIs, and a regex to validate partial DOIs. I also added a new pytest file to cover these changes.

To make it easier for internal PLOS server processes to check articles likely to have updated XML, this adds the `from_partial_doi` class method and the `find_valid_partial_dois` regex.

Just as Corpus can initialize an Article object from a DOI, it can now do so from a partial DOI as well.

mpacer · 2018-04-10T00:35:40Z

allofplos/tests/test_partial_dois.py

+    return 'pone.0040259'
+
+
+def test_partial_doi_regex(test_partial_doi):


Could you test the regexes directly as well?

Actually, this test is confusingly named… it's not testing the DOI regexes, it's testing the validate_partial_doi function.

mpacer · 2018-04-10T00:38:11Z

allofplos/plos_regex.py

                                   "|10\.1371/annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}")
+partial_doi_regex_search = re.compile(r"p[a-zA-Z]{3}\.[\d]{7}"
+                                      "|annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}")
+partial_doi_regex_match = re.compile(r"^p[a-zA-Z]{3}\.[\d]{7}$"


It looks like this is just the same string wrapped as ^ + your_str + $, am I correct? And the last part of it is almost the same as the last part of full_doi_regex_search?

Why not make this based on a single string that you then combine as needed to make the search and match?

I'm not sure you need to define search vs. match separately anyway (in this case at least). I ask below… but it'd be a lot easier to know whether that is the case if you tested the regexes directly.
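As a minimal sketch of the single-string suggestion: keep the pattern in one place and derive both compiled forms from it. The constant names `UUID_PATTERN` and `PARTIAL_DOI_PATTERN` are hypothetical, and the character classes are copied from the diff above rather than from any merged code.

```python
import re

# Hypothetical shared fragments; character classes copied from the diff under review.
UUID_PATTERN = (
    r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}"
    r"-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}"
)
PARTIAL_DOI_PATTERN = r"p[a-zA-Z]{3}\.\d{7}|annotation/" + UUID_PATTERN

# Unanchored: finds partial DOIs anywhere inside a longer string.
partial_doi_regex_search = re.compile(PARTIAL_DOI_PATTERN)

# Anchored: the non-capturing group makes ^ and $ apply to both alternatives.
partial_doi_regex_match = re.compile(r"^(?:" + PARTIAL_DOI_PATTERN + r")$")
```

The `(?:...)` wrapper matters because `|` binds more loosely than the anchors; without it, `^` would apply only to the first alternative and `$` only to the last.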

mpacer · 2018-04-10T00:39:17Z

allofplos/plos_regex.py

+    Example: 'pbio.2000777' is True, but '10.1371/journal.pbio.2000777' is False
+    :return: True if string is in valid PLOS partial DOI format; False if not
+    """ 
+    return bool(partial_doi_regex_match.search(partial_doi))


Why is the match regex used to search?
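For context on this question, here is a small sketch of how `search()`, `match()`, and `fullmatch()` differ in Python's `re` module. The two patterns are simplified to the journal form only, and the helper `accepts` is a hypothetical name used just for this illustration.

```python
import re

unanchored = re.compile(r"p[a-zA-Z]{3}\.\d{7}")
anchored = re.compile(r"^p[a-zA-Z]{3}\.\d{7}$")


def accepts(method, text):
    """Return True if the bound regex method finds a match in text."""
    return method(text) is not None


padded_left = " pone.0040259"
padded_right = "pone.0040259 "

# With ^...$ in the pattern, search() and match() agree.
assert not accepts(anchored.search, padded_left)
assert not accepts(anchored.match, padded_left)

# Without anchors, the method decides how strict the check is.
assert accepts(unanchored.search, padded_left)          # found anywhere
assert not accepts(unanchored.match, padded_left)       # must start at index 0
assert accepts(unanchored.match, padded_right)          # trailing text is ignored
assert not accepts(unanchored.fullmatch, padded_right)  # whole string must match

# Caveat: $ also matches just before a trailing newline; fullmatch() does not.
assert accepts(anchored.search, "pone.0040259\n")
assert not accepts(unanchored.fullmatch, "pone.0040259\n")
```

So with an anchored pattern the choice between `search()` and `match()` makes no practical difference, and a single unanchored pattern used with `fullmatch()` would cover the validation case.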

mpacer · 2018-04-10T00:40:26Z

allofplos/article.py

+    @property
+    def partial_doi(self):
+        """Convert a DOI to a partial DOI."""
+        return self.doi.lstrip('10.1371/').replace('journal.', '')


Why aren't you using doi_to_partial?
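A further reason to route this through one helper: `str.lstrip` removes any leading characters drawn from the given set, not a literal prefix. The sketch below shows that the property's expression happens to work for journal DOIs but would over-strip a suffix that starts with one of those characters. `strip_prefix_sketch` is a hypothetical stand-in, not the project's `doi_to_partial`, and the second DOI is a made-up string used only to show the behavior.

```python
DOI_PREFIX = "10.1371/"


def strip_prefix_sketch(doi):
    """Hypothetical helper: remove the literal PLOS prefix, then 'journal.'."""
    if doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    return doi.replace("journal.", "", 1)


doi = "10.1371/journal.pone.0040259"

# Works here only because 'j' is not in the character set "10.371/".
via_lstrip = doi.lstrip("10.1371/").replace("journal.", "")

# Made-up suffix starting with '1': lstrip keeps eating characters from the set.
over_stripped = "10.1371/1234.x".lstrip("10.1371/")
```

Here `via_lstrip` and `strip_prefix_sketch(doi)` agree, while `over_stripped` loses the leading `1` of the suffix.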

mpacer · 2018-04-10T00:40:53Z

allofplos/corpus/corpus.py

        elif key not in self.dois:
+            if partial_to_doi(key) in self.dois:
+                return Article.from_partial_doi(key, directory=self.directory)
+            elif validate_partial_doi(key):


Wouldn't it be safer to validate first?
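A minimal sketch of the validate-first ordering, written as a standalone function rather than as the real `Corpus.__getitem__`. The name `resolve_key`, the journal-only pattern, and the choice to raise `KeyError` are all assumptions for illustration.

```python
import re

# Journal-form partial DOIs only; annotation DOIs are left out of this sketch.
_PARTIAL_DOI = re.compile(r"p[a-zA-Z]{3}\.\d{7}")


def resolve_key(key, dois):
    """Return the full DOI for key, validating partial DOIs before converting."""
    if key in dois:
        return key
    # Validate before converting, so malformed keys never reach the conversion.
    if _PARTIAL_DOI.fullmatch(key) is None:
        raise KeyError(f"{key!r} is neither a known DOI nor a valid partial DOI")
    full_doi = "10.1371/journal." + key
    if full_doi not in dois:
        raise KeyError(f"{full_doi!r} is not in the corpus")
    return full_doi
```

With this ordering, a malformed key fails with a message about its format, and only well-formed partial DOIs are ever converted and looked up.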

mpacer · 2018-04-10T00:41:28Z

allofplos/article.py

        return cls(filename_to_doi(filename), directory=directory)
+
+    @classmethod
+    def from_partial_doi(cls, partial_doi, directory=None):


Why don't you validate the partial doi first?
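One possible shape for a validating constructor, as a sketch only. `ArticleSketch` is a hypothetical stand-in for the real `Article` class, the pattern covers journal-form partial DOIs only, and raising `ValueError` is an assumed design choice.

```python
import re

_PARTIAL_DOI = re.compile(r"p[a-zA-Z]{3}\.\d{7}")


class ArticleSketch:
    """Hypothetical stand-in for Article, reduced to what the example needs."""

    def __init__(self, doi, directory=None):
        self.doi = doi
        self.directory = directory

    @classmethod
    def from_partial_doi(cls, partial_doi, directory=None):
        """Build an article from a partial DOI, rejecting malformed input early."""
        if _PARTIAL_DOI.fullmatch(partial_doi) is None:
            raise ValueError(f"not a valid partial DOI: {partial_doi!r}")
        return cls("10.1371/journal." + partial_doi, directory=directory)
```

Failing at construction time keeps a malformed partial DOI from turning into a plausible-looking but wrong full DOI further down the line.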

mpacer · 2018-04-10T00:44:03Z

allofplos/tests/test_partial_dois.py

+
+
+@pytest.fixture
+def corpus():


We should consider making corpus a module- or session-scoped fixture, since it is basically the same as in some of the other tests. You don't need to do it for this PR, but it's worth considering.

mpacer · 2018-04-10T00:44:50Z

allofplos/tests/test_partial_dois.py

+
+
+@pytest.fixture
+def test_article():


Generally, fixtures aren't prefixed with test (see corpus above).

mpacer · 2018-04-10T00:47:29Z

allofplos/tests/test_partial_dois.py

+
+@pytest.fixture
+def test_article():
+    return Article('10.1371/journal.pone.0040259', directory=starterdir)


You could use the corpus & doi fixtures directly here, since fixtures can be used in other fixtures:

    @pytest.fixture(scope='module')
    def article(corpus, doi):
        return corpus[doi]

mpacer · 2018-04-10T00:48:00Z

allofplos/tests/test_partial_dois.py

+
+
+@pytest.fixture
+def test_doi():


Same as with test_article: this shouldn't be prefixed with test. Better to just use def doi():

mpacer · 2018-04-10T00:48:34Z

allofplos/tests/test_partial_dois.py

+
+
+@pytest.fixture
+def test_partial_doi():


This should be called partial_doi

mpacer · 2018-04-10T00:49:17Z

allofplos/tests/test_partial_dois.py

+
+def test_partial_doi_regex(test_partial_doi):
+    assert validate_partial_doi(test_partial_doi)
+    assert not validate_partial_doi(' pone.0040259')


If you're going to have a test like this, it's better to make the manipulation explicit; " " + partial_doi would do that.

mpacer · 2018-04-10T00:49:49Z

allofplos/tests/test_partial_dois.py

+def test_partial_doi_regex(test_partial_doi):
+    assert validate_partial_doi(test_partial_doi)
+    assert not validate_partial_doi(' pone.0040259')
+    assert not validate_partial_doi('pone.0040259 ')


Same as above… reuse partial_doi… partial_doi + " "

mpacer · 2018-04-10T00:51:00Z

allofplos/tests/test_partial_dois.py

+    assert doi == test_doi
+
+
+def test_partial_doi_method_article(test_partial_doi, test_article):


Technically, I would prefer this to be in its own test_article.py for article functionality, but I don't think that exists.

mpacer · 2018-04-10T00:51:44Z

allofplos/tests/test_partial_dois.py

+    assert article == test_article
+
+
+def test_partial_doi_method_corpus(corpus, test_article, test_partial_doi):


This should be in the test_corpus.py testing module.

mpacer · 2018-04-10T00:55:27Z

I was thinking about this for a bit… why do you ever need to validate a partial doi?

Why not validate only on dois since dois are the only things that actually have canonical representations?

That way the canonical way to do everything is always to transform the partial doi into the doi first and then validate the doi.

Validating partial DOIs separately seems to introduce a lot of complexity, and it's not clear to me what the wins are.
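A sketch of the transform-then-validate approach described here. Only the annotation branch of the full-DOI regex is visible in the diff; the journal branch below is an assumption, and `partial_to_doi_sketch` and `is_valid_partial_doi` are hypothetical names.

```python
import re

_UUID = (
    r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}"
    r"-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}"
)
# The journal alternative is assumed; the annotation alternative follows the diff.
FULL_DOI_RE = re.compile(
    r"10\.1371/journal\.p[a-zA-Z]{3}\.\d{7}"
    r"|10\.1371/annotation/" + _UUID
)


def partial_to_doi_sketch(partial_doi):
    """Expand a partial DOI to the canonical full form without validating it."""
    if partial_doi.startswith("annotation/"):
        return "10.1371/" + partial_doi
    return "10.1371/journal." + partial_doi


def is_valid_partial_doi(partial_doi):
    """Validate only the canonical form: expand first, then check the full DOI."""
    return FULL_DOI_RE.fullmatch(partial_to_doi_sketch(partial_doi)) is not None
```

With this design there is a single regex to maintain, and a partial DOI is valid exactly when its expansion is a valid full DOI.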

eseiver force-pushed the partialdois branch from dfdad96 to 26edacf Compare March 22, 2018 22:46

eseiver requested a review from mpacer March 28, 2018 23:09

eseiver added 7 commits March 28, 2018 17:02

add partal dois regex & class method

c8a474e

add partial_doi property

dee96f4

add validate_partial_doi to regex

8ca7b20

add two-way doi <-> partial_doi conversions

30ee364

update from_partial_doi class method

322e7d0

update Corpus() to take partial doi

2148ccf

add tests for partial dois

ee71745

eseiver force-pushed the partialdois branch from 26edacf to ee71745 Compare March 29, 2018 00:04
