
Feat add endpoint and pages options to sparql filter #105

Open
zache-fi wants to merge 2 commits into main from FEAT-add-endpoint-and-pages-options-to-sparql-filter

zache-fi (Collaborator) commented Apr 8, 2026

Description

This adds support for full SELECT queries and for returning article URIs directly from SPARQL in the ?article variable.

Example below

{{Viikon kilpailu kriteerit|sparql|mode=pages|query=SELECT ?article WHERE { hint:Query hint:optimizer "None" . SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> { SERVICE <https://sparqlbridge.toolforge.org/newpages/sparql/wiki=fi,smn,olo,se,incubator&include_edited_pages=1&timestamp=20260331&user_list_page=w:fi:Wikiprojekti:Punaisten_linkkien_naiset/2026> { SELECT ?article ?item WHERE { ?article <http://schema.org/about> ?item . } GROUP BY ?article ?item } } ?item wdt:P21 ?gender . FILTER (?gender NOT IN (wd:Q6581097, wd:Q44148, wd:Q2449503)) } GROUP BY ?article }}

The endpoint parameter works like this:

{{Viikon kilpailu kriteerit|sparql|endpoint=https://query-main.wikidata.org/sparql|mode=pages|query=?item wdt:P31 wd:Q146. }}

How to test

ukbot --page Wikiprojekti:Punaisten_linkkien_naiset/2026-test --simulate config/config.fi-pln.yml

Note: QLever can fail with status code 503; this is unrelated to our code.

#104

What type of PR is this? (check all applicable)

  • 🍕 Feature

Related Tickets & Documents

Tested?

  • 👍 yes

Added to documentation?

  • 🙅 no documentation needed

[optional] Are there any pre- or post-deployment tasks we need to perform?

zache-fi (Collaborator, Author) commented Apr 8, 2026

Hmmph, I need to redo this as it deletes configs.

zache-fi closed this Apr 8, 2026

zache-fi (Collaborator, Author) commented Apr 8, 2026

It was OK after all, just a weird whitespace-change effect because the edit was at the end of the file.

Copilot AI (Contributor) left a comment

Pull request overview

Adds support to the SPARQL filter for (1) querying against a configurable SPARQL endpoint and (2) a new mode=pages where the query returns article URIs directly (for multiwiki/incubator use cases like sparqlbridge).

Changes:

  • Extend SparqlFilter to accept endpoint and mode (items vs pages) and to query the configured endpoint.
  • Add tests covering the new endpoint/mode plumbing and pages URL filtering behavior.
  • Expose endpoint/mode template parameters in multiple site configs.
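
To make the pages mode concrete, here is a minimal sketch of how an article URI returned in ?article could be split into a wiki host and a page title. The function name `parse_article_uri` and the assumption that article URIs always use the `/wiki/` path prefix are illustrative; the PR's actual parsing in ukbot/filters.py may differ.

```python
import urllib.parse


def parse_article_uri(uri):
    """Split an article URI into (host, title), or return None if it does not
    look like a wiki article URL. Hypothetical helper, not the PR's code."""
    parsed = urllib.parse.urlparse(uri)
    prefix = '/wiki/'  # assumption: article paths start with /wiki/
    if parsed.scheme not in ('http', 'https') or not parsed.hostname:
        return None
    if not parsed.path.startswith(prefix):
        return None
    # Decode percent-escapes and turn underscores back into spaces.
    title = urllib.parse.unquote(parsed.path[len(prefix):]).replace('_', ' ')
    if not title:
        return None
    return parsed.hostname.lower(), title
```

A URI such as `https://fi.wikipedia.org/wiki/Minna_Canth` would yield the host `fi.wikipedia.org` and the title `Minna Canth`; anything that is not an http(s) article URL is skipped rather than raising.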

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ukbot/filters.py | Adds endpoint + mode support, endpoint validation, and pages-mode URI parsing. |
| test/test_filters.py | Adds unit tests for SparqlFilter endpoint/mode behavior and pages filtering. |
| config/sites/nowiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/glwiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/fiwiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/euwiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/eswiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/enwiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/sites/cawiki.yml | Exposes endpoint and mode as SPARQL filter params. |
| config/config.se.yml | Adds endpoint mapping for the SPARQL filter (but not mode). |


Comment on lines +744 to +748
query_param = cfg['params']['query']
if not tpl.has_param(query_param):
    raise RuntimeError(_('No "%s" parameter given') % cfg['params']['query'])

endpoint_param = cfg['params'].get('endpoint')

Copilot AI commented Apr 9, 2026

SparqlFilter.make is now reading query_param from cfg['params']['query'] (a localized parameter name) and then passing that into tpl.has_param(...) / tpl.get_raw_param(...). Because FilterTemplate.has_param/get_raw_param already localize the provided internal key, this effectively double-localizes and will fail on wikis where the query param is translated (e.g. nowiki uses spørring). Use the internal keys ('query', 'endpoint', 'mode') when calling tpl.*, and let FilterTemplate handle localization.
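
A minimal sketch of the suggested fix: read the parameters with internal keys only and leave translation to the template object. `read_sparql_params` and `StubTemplate` are hypothetical names for illustration, the default mode of 'items' is an assumption, and the real FilterTemplate may behave differently in detail.

```python
def read_sparql_params(tpl):
    """Read SPARQL filter params via internal keys; tpl handles localization."""
    if not tpl.has_param('query'):
        raise RuntimeError('No "query" parameter given')
    query = tpl.get_raw_param('query')
    endpoint = tpl.get_raw_param('endpoint') if tpl.has_param('endpoint') else None
    # Assumption: 'items' is the default mode when none is given.
    mode = tpl.get_raw_param('mode') if tpl.has_param('mode') else 'items'
    return query, endpoint, mode


class StubTemplate:
    """Hypothetical stand-in for FilterTemplate, for illustration only."""

    def __init__(self, translations, params):
        self.translations = translations  # internal key -> localized name
        self.params = params              # localized name -> raw value

    def has_param(self, key):
        return self.translations[key] in self.params

    def get_raw_param(self, key):
        return self.params[self.translations[key]]
```

With translations such as `{'query': 'spørring', ...}`, calling `read_sparql_params` finds the value stored under `spørring` without the caller ever naming the localized parameter.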

Copilot uses AI. Check for mistakes.
Comment on lines +768 to +772
self.endpoint = endpoint or 'https://query.wikidata.org/sparql'
endpoint_scheme = urllib.parse.urlparse(self.endpoint).scheme.lower()
if endpoint_scheme not in ['http', 'https']:
    raise ValueError('Invalid sparql endpoint scheme: %s' % endpoint_scheme)
if mode not in ['items', 'pages']:

Copilot AI commented Apr 9, 2026

Allowing a user-provided endpoint makes the bot perform outbound HTTP requests to arbitrary hosts (contest pages are wiki-editable). Validating only the URL scheme (http/https) still permits SSRF to localhost/private IP ranges and internal services. Consider adding host allowlisting in config, and/or explicitly blocking localhost + RFC1918/link-local ranges after DNS resolution, so only trusted SPARQL endpoints can be used.
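
One way the suggested hardening could look, as a sketch rather than a drop-in patch: combine a host allowlist with a check that every resolved address is globally routable. The names `ALLOWED_ENDPOINT_HOSTS`, `resolve_host`, and `validate_endpoint` are hypothetical, and the allowlist contents are an assumption; in practice they would come from the bot's config.

```python
import ipaddress
import socket
import urllib.parse

# Hypothetical allowlist; a real one would be loaded from config.
ALLOWED_ENDPOINT_HOSTS = frozenset({
    'query.wikidata.org',
    'query-main.wikidata.org',
    'qlever.cs.uni-freiburg.de',
})


def resolve_host(host):
    """Return the set of IP address strings that a hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def validate_endpoint(url, allowed_hosts=ALLOWED_ENDPOINT_HOSTS, resolver=resolve_host):
    """Return url if it is an http(s) URL on an allowed, public host."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme.lower() not in ('http', 'https'):
        raise ValueError('Invalid sparql endpoint scheme: %s' % parsed.scheme)
    host = (parsed.hostname or '').lower()
    if host not in allowed_hosts:
        raise ValueError('SPARQL endpoint host not allowed: %s' % host)
    for addr in resolver(host):
        # Drop any IPv6 zone identifier (e.g. "%eth0") before parsing.
        ip = ipaddress.ip_address(addr.split('%')[0])
        # is_global is False for loopback, RFC 1918, and link-local addresses.
        if not ip.is_global:
            raise ValueError('SPARQL endpoint resolves to non-public address: %s' % ip)
    return url
```

The resolver is injectable so the check can be unit-tested without network access. Note that this validates at check time only; the later HTTP request resolves the name again, so the allowlist remains the stronger of the two defenses.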

Comment on lines 822 to +826
item_var = 'item'
if self.mode == 'pages':
    self.add_pages()
    logger.info('SparqlFilter: Initialized with %d articles', len(self.page_keys))
    return

Copilot AI commented Apr 9, 2026

In mode='pages', add_pages() relies on do_query() selecting the first variable in the SPARQL result (head.vars[0]). This means a valid query that returns ?article but lists another variable first (e.g. SELECT ?item ?article WHERE ...) will silently produce wrong/empty results. Since the PR description says ?article is the intended variable, consider letting do_query() accept an explicit variable name (e.g. var='article') and raising a clear error if it is missing.
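
A sketch of selecting the variable by name instead of by position, assuming the endpoint returns the standard SPARQL 1.1 JSON results format. `extract_bindings` is a hypothetical helper; how it would be wired into do_query() is left open.

```python
def extract_bindings(result, var):
    """Return the values bound to `var` in a SPARQL JSON results document.

    Raises ValueError if the query did not select `var` at all.
    """
    head_vars = result.get('head', {}).get('vars', [])
    if var not in head_vars:
        raise ValueError(
            'SPARQL result has no ?%s variable (got: %s)'
            % (var, ', '.join(head_vars) or 'none')
        )
    values = []
    for binding in result.get('results', {}).get('bindings', []):
        # A variable can be unbound in individual rows (e.g. with OPTIONAL).
        if var in binding:
            values.append(binding[var]['value'])
    return values
```

With this approach, `SELECT ?item ?article` and `SELECT ?article ?item` give the same result for `var='article'`, and a query that omits ?article fails loudly instead of returning the wrong column.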

Comment on lines +126 to +134
@patch('ukbot.filters.SparqlFilter.fetch')
def test_make_reads_endpoint_param(self, fetch_mock):
    tpl = Mock()
    tpl.sites = Mock()
    tpl.has_param = lambda name: name in ['query', 'endpoint']
    tpl.get_raw_param = lambda name: {
        'query': 'SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . }',
        'endpoint': 'https://example.org/sparql',
    }[name]

Copilot AI commented Apr 9, 2026

These tests mock tpl directly and therefore don’t exercise the real localization logic in FilterTemplate (where parameter names may be translated, e.g. query -> spørring). Given the changes in SparqlFilter.make, adding a test that uses a real FilterTemplate instance (or at least simulates has_param/get_raw_param localization behavior) would catch regressions on non-English configs.
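
A sketch of a test double that simulates localization, so that a caller passing an already-translated name fails the way double localization would. `make_localizing_tpl` and `NOWIKI_TRANSLATIONS` are hypothetical; only the `spørring` translation comes from the review, and raising KeyError for unknown keys is an assumption about how the real FilterTemplate reacts.

```python
from unittest.mock import Mock

# Assumed mapping of internal keys to localized names; only 'spørring'
# is taken from the review, the other two entries are placeholders.
NOWIKI_TRANSLATIONS = {'query': 'spørring', 'endpoint': 'endpoint', 'mode': 'mode'}


def make_localizing_tpl(translations, wikitext_params):
    """Build a Mock whose has_param/get_raw_param translate internal keys
    before looking them up, and raise KeyError for anything else."""
    tpl = Mock()
    tpl.sites = Mock()
    tpl.has_param.side_effect = lambda key: translations[key] in wikitext_params
    tpl.get_raw_param.side_effect = lambda key: wikitext_params[translations[key]]
    return tpl


def test_internal_keys_reach_localized_params():
    query = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . }'
    tpl = make_localizing_tpl(NOWIKI_TRANSLATIONS, {'spørring': query})

    # Internal keys are translated by the template double.
    assert tpl.has_param('query')
    assert tpl.get_raw_param('query') == query
    assert not tpl.has_param('endpoint')

    # Passing the localized name simulates the double-localization bug.
    try:
        tpl.has_param('spørring')
    except KeyError:
        pass
    else:
        raise AssertionError('expected KeyError for a localized name')
```

Passing such a double to SparqlFilter.make in place of the plain Mock would make the existing tests fail if make() ever hands a localized name back to the template.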

Comment on lines 101 to 105
ignore: ignore
sparql: sparql # as in {{ ukb criterion | sparql }}
query: query # as in {{ ukb criterion | sparql | query=... }}
endpoint: endpoint
pages:

Copilot AI commented Apr 9, 2026

config/config.se.yml adds endpoint but not mode, while the new SparqlFilter supports a mode parameter and the other site configs in this PR expose it. If this config is still used, add the corresponding mode translation/mapping here as well so mode=pages can be set on sewiki contests.

Development

Successfully merging this pull request may close these issues.

[FR] add incubator wiki and other wikimedia wiki support to SPARQL filter

2 participants