Skip to content

Faster tests#300

Open
tgrandje wants to merge 11 commits intomasterfrom
changed/faster-tests
Open

Faster tests#300
tgrandje wants to merge 11 commits intomasterfrom
changed/faster-tests

Conversation

@tgrandje
Copy link
Copy Markdown
Collaborator

@tgrandje tgrandje commented Mar 26, 2026

#299 and #296

@tgrandje
Copy link
Copy Markdown
Collaborator Author

In the present state, that PR should'nt be merged until we state on #303 : it uses a recent geopandas functionality which is not available in the latest py3.9-compatible geopandas release.

@tgrandje tgrandje marked this pull request as ready for review April 6, 2026 12:21
@tgrandje
Copy link
Copy Markdown
Collaborator Author

tgrandje commented Apr 6, 2026

OK, this is good for review.

I dumped most of the tree decision managing department's retrieval as it was not easy to maintain and could somehow be quite slow. It's been simplified to 3 cases:

  • either we're working on the department's dataset (same as before)
  • or we're on a dataset where INSEE (now) displays insee dpt codes (that's the case for EPCIs for instance) ;
  • or in any other case, it's fast enough to use a sjoin (assuming we already stored a preprocessed departments' geodataset).

During my manual tests, I've mostly seen awkward results for "collectivités territoriales" (Saint-Martin, Saint-Barthélémy) or some other exotic cases : for instance arrondissements municipaux datasets, cantons, where there are holes in french territory or when you choose to zoom in on Paris' inner suburbs and your current dataset has geometries maching Paris's external suburbs. I don't think that was ever perfectly rendered in pynsee...

Comment thread pynsee/geodata/translate_and_zoom.py Outdated
# (This shouldn't be necessary with modern ADMINEXPRESS datasets and
# is here as a backup safe)
dep["insee_dep_geometry"] = dep.simplify_coverage(1)
dep["insee_dep_geometry"] = dep["insee_dep_geometry"].buffer(-1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this occur only in the if? (otherwise there is no overlap so I don't see why we should change the correct polygons

Copy link
Copy Markdown
Collaborator Author

@tgrandje tgrandje Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No : if a geometry fully covers a city department, it will sjoin on a city and it's neighbours. The buffer of -1meter is gonna fix this in the overwhelming majority of cases.

EDIT : I meant a department and not a city, obviously... 😳

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I'm not getting this, could you maybe draw it on a piece of paper and take a photo? (or we can schedule a meeting to discuss it in real time if it's easier, in which case you can just email me)
To prepare for that discussion, even though I don't get the issue, I'm wondering why you chose not to test for small fractions of the original areas instead of modifying the original geometries.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

@tgrandje tgrandje Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this clearer?

EDIT
The negative buffer helps avoiding case 1A. Problems might occur on case 2B, but that seems too exotic a case to be handled by pynsee (I think this will concern only points in a 2meters-large bandwidth around DROM territories).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why you chose not to test for small fractions of the original areas instead of modifying the original geometries.

I'm not sure I understood what you meant either. Let's discuss this in our matrix chanel?

Comment thread pynsee/geodata/translate_and_zoom.py
Comment thread pynsee/geodata/translate_and_zoom.py Outdated
Comment on lines +139 to +141
# buffer on deps should be safe to be used without any duplication
# of gdf, except when there is a valid overlapping (ie regions
# covering multiple deps, interdep epcis...).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the buffer part is relevant here anymore, is it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still useful for case 1A (see above).

Comment thread pynsee/geodata/translate_and_zoom.py Outdated
Comment on lines +152 to +154
# Retrieve simplified geometries for deps. Note that it used a negative
# buffer (10 meters) which should not alter geographic transformations
# given it's range
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adapt to "may" if you indeed moved the buffer into the if

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fxed the comment (not a 10 meter range, but 1 meter)

@tgrandje
Copy link
Copy Markdown
Collaborator Author

Note for memory's sake: we also could detect multipoint datasets and link it to parent datasets. For instance chef_lieu_d_epci has a lien_vers_l_epci fields which links to cleabs in the epci dataset. But:

  • that seems overly complex to maintain (it's a rare use case I think) while sjoin are fasts on a point vs. polygon match,
  • and users may use the LATEST dataset vintage and there is no certitude that LATEST:chef_lieu_d_epci and LATEST:epci are of the same vintage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants