Improve firm geolocation quality #1

@memoryfull

The RFSD companion paper states:

<...> throughout 2014--2023 88.8% of total revenue is geocoded up to a house or street on average in the RFSD; location of 10.0% of revenue is available at city level only.

Figure 3(a) also shows that location accuracy degrades over time. There are three possible ways to improve location inference in the RFSD, which is currently performed by OpenStreetMap Nominatim (as of version 1.0.0).

  1. Partnership with a business intelligence provider. Most business information providers (SPARK, Kontur, SBIS, DaData) have already geocoded the legal addresses of Russian firms. One could ask them to contribute to the RFSD by open-sourcing their mapping from taxpayer/organization identifiers (INN, OGRN) to longitude and latitude.
  2. Partnership with location inference providers. A handful of vendors offer direct geocoding services (Yandex, DaData, 2GIS, Google, etc.). The query costs, however, are prohibitive: at least RUB 0.1 per query × ~9 million unique legal addresses in the EGRUL ≃ RUB 900,000 (≃ USD 9K). We may seek discounts for bulk processing, but they would have to be substantial.
  3. Rule-based address correction before Nominatim. One can develop a set of rules/regular expressions to pre-process addresses before sending them to Nominatim. The file egrul_addresses_sample_weighted_13jan24.csv.gz contains a weighted sample of 10,000 source addresses and their geocoding results: 60% of the addresses could not be geocoded, 20% were geocoded to city level, 15% to street level, and 5% to house level. This stratified sample is a starting point for developing rule-based pre-processing: the goal is a set of rules that maximises geocoding precision without degrading results for addresses that are already geocoded.
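
To make option 3 more concrete, here is a minimal sketch of what rule-based pre-processing and a regression check could look like. Everything in it is an assumption for illustration: the rules themselves (dropping a leading postal code, dropping office/room parts, expanding `г.`, `ул.`, `д.`), the names `RULES`, `normalize_address`, `compare_levels`, and the level labels `none`/`city`/`street`/`house`. None of this is part of the current RFSD pipeline, and whether a given rule actually helps Nominatim has to be verified on the sample.

```python
import re
from collections import Counter

# Hypothetical, illustrative rules applied in order; not the RFSD's actual pipeline.
RULES = [
    # Drop a leading six-digit postal code.
    (re.compile(r"^\s*\d{6}\s*,?\s*"), ""),
    # Drop office / room / apartment parts (assumed to be unresolvable by the geocoder).
    (re.compile(
        r",?\s*\b(?:офис|оф|помещение|пом|комната|ком|квартира|кв)\b\.?\s*[\w/-]+",
        re.IGNORECASE), ""),
    # Expand a few common abbreviations.
    (re.compile(r"\bг\.\s*", re.IGNORECASE), "город "),
    (re.compile(r"\bул\.\s*", re.IGNORECASE), "улица "),
    (re.compile(r"\bд\.\s*", re.IGNORECASE), "дом "),
    # Collapse whitespace, then tidy repeated or badly spaced commas.
    (re.compile(r"\s+"), " "),
    (re.compile(r"\s*,\s*(?:,\s*)*"), ", "),
]


def normalize_address(raw: str) -> str:
    """Apply every rule in order and trim stray commas and spaces."""
    text = raw
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip(" ,")


# Assumed ordering of geocoding precision, from worst to best.
LEVELS = {"none": 0, "city": 1, "street": 2, "house": 3}


def compare_levels(before, after):
    """Count addresses whose geocoding level improved, stayed the same, or degraded."""
    outcome = Counter()
    for old, new in zip(before, after):
        diff = LEVELS[new] - LEVELS[old]
        if diff > 0:
            outcome["improved"] += 1
        elif diff < 0:
            outcome["degraded"] += 1
        else:
            outcome["unchanged"] += 1
    return outcome
```

A candidate rule set would be run over the sample, the normalized addresses re-geocoded, and the old and new levels passed to `compare_levels`. A non-zero `degraded` count flags rules that hurt addresses that were already geocoded, which is the constraint stated in option 3.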

We thank you for any contributions.
