wikitermbase

Overview

Wiki Term Base is a tool designed to standardise terminology used on Arabic Wikipedia and accelerate vocabulary translation.

ℹ For functional documentation, please check the dedicated Wikipedia page مسرد الويكي (in Arabic).

🌐 The website is available at: https://wikitermbase.toolforge.org

It is hosted on Toolforge, as a Python ASGI application built with the FastAPI framework (served by gunicorn with uvicorn workers via the Toolforge Build Service), using a MariaDB relational database.

The website's frontend is built with React.

The Wikipedia gadget frontend is built with OOUI and can be enabled in Arabic Wikipedia's user preferences.

Wiki Gadget

The Wikipedia gadget can be activated in user preferences -> "مسرد الويكي".

The deployed version in Arabic Wikipedia:

On Wikipedia, gadgets are production-ready features, while user scripts serve as a flexible environment for development and experimentation.

The user script, available at gadget/SearchTerm.js, differs from gadget code in that it consolidates all imports, JavaScript code, and CSS styles into a single file.

Local Setup

Please note that the database content is managed in the project arabterm.

Clone the arabterm repository, and start the MariaDB database in a Docker container:

make init
make init_mariadb  # start or create container
make delete_mariadb  # delete database if exists
make migrate_to_mariadb  # migrate the SQLite content to MariaDB

Then, from the wikitermbase repository, install the Python dependencies (requires uv):

make init

Create a file at ./var/local.cnf with (adapt values):

[client]
user = MyUserName
password = MyTestPassword
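
For readers who want to check that the file parses as expected, here is a minimal sketch that reads the [client] section with Python's standard configparser. The function name read_client_credentials is hypothetical and is not taken from the wikitermbase code base; the application may load this file differently (for example by passing it to the MariaDB driver as a defaults file).

```python
import configparser
from pathlib import Path


def read_client_credentials(path: str = "./var/local.cnf") -> tuple[str, str]:
    """Return (user, password) from the [client] section of an option file.

    Hypothetical helper for illustration only.
    """
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    # Reading the text ourselves raises FileNotFoundError if the file is missing,
    # instead of configparser silently ignoring it.
    parser.read_string(Path(path).read_text(encoding="utf-8"))
    client = parser["client"]
    return client["user"], client["password"]
```

Calling read_client_credentials() from the repository root should return the two values written above.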

Start the application:

make run

You can then open the web application at http://127.0.0.1:5001/

Backend

Python version: 3.13

API

Interactive OpenAPI docs (Swagger UI) are available at /docs, with a ReDoc rendering at /redoc. Both are auto-generated from the FastAPI route signatures, and the Swagger UI page lets you try every endpoint from the browser.

  • Aggregated search (results are grouped by the Arabic term):
GET /api/v1/search/aggregated?q=magnetoscope
GET /api/v1/search/aggregated?q=اشتقاق

The response is JSON. An example can be found at gadget/response.json.

  • Raw search (without grouping):
GET /api/v1/search?q=magnetoscope
GET /api/v1/search?q=اشتقاق
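
Arabic queries have to be percent-encoded when the request is built by hand. The sketch below builds both kinds of search URL with the standard library. The helper name search_url is hypothetical, and the base URL defaults to the public Toolforge instance; pass http://127.0.0.1:5001 to target a local setup.

```python
from urllib.parse import urlencode

# Public instance named in this README; override for local development.
BASE_URL = "https://wikitermbase.toolforge.org"


def search_url(query: str, aggregated: bool = True, base_url: str = BASE_URL) -> str:
    """Build the URL for an aggregated or raw search request.

    Hypothetical helper for illustration only.
    """
    path = "/api/v1/search/aggregated" if aggregated else "/api/v1/search"
    # urlencode percent-encodes non-ASCII text (such as Arabic) as UTF-8.
    return f"{base_url}{path}?{urlencode({'q': query})}"
```

The resulting URL can be passed to any HTTP client, for example urllib.request.urlopen, and the body decoded with json.loads.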

API on Toolforge (Build Service)

ASGI applications cannot run on Toolforge's legacy python3.13 uWSGI webservice — they require the Build Service backend, which uses Cloud Native Buildpacks to build a container image directly from the public GitHub repo and runs it according to the Procfile. Frontend assets (backend/frontend/dist/) are committed to git so the Python buildpack alone is sufficient — no Node.js step in the build pipeline.

Initial Setup

DB credentials don't need to be configured: Toolforge auto-injects TOOL_REPLICA_USER and TOOL_REPLICA_PASSWORD into Build Service containers (same as for the legacy uWSGI webservice). The app reads them directly from os.environ.
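
As an illustration of reading the injected variables, here is a minimal sketch. The function name replica_credentials and the error message are hypothetical; only the two variable names come from this README, and the application's actual code may differ.

```python
import os
from collections.abc import Mapping


def replica_credentials(environ: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (user, password) from the Toolforge-injected variables.

    Hypothetical helper for illustration only.
    """
    names = ("TOOL_REPLICA_USER", "TOOL_REPLICA_PASSWORD")
    missing = [name for name in names if name not in environ]
    if missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return environ[names[0]], environ[names[1]]
```

Accepting the environment as a parameter keeps the function easy to exercise outside Toolforge, where the variables are not set.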

ssh toolforge
become wikitermbase

# Stop the legacy webservice if it was previously running on python3.13
toolforge webservice --backend=kubernetes python3.13 stop || true

# Build the image from the public GitHub repo
toolforge build start https://github.com/forzagreen/wikitermbase
toolforge build show   # wait until status is ok(Succeeded)

# Start the Build Service webservice
toolforge webservice buildservice start --mount=none

Test: https://wikitermbase.toolforge.org/api/v1/stats. Logs: toolforge webservice buildservice logs -f.

Updating the Codebase

Code deploys are automated. On push to main, the deploy-code job in .github/workflows/ci.yml SSHs into the bastion and runs toolforge build start + toolforge webservice buildservice restart. Markdown-only and data-only changes skip the rebuild. Manual re-deploy: Actions tab → "CI" → "Run workflow" on main.
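
The skip rule can be pictured as a filter over the changed paths. The sketch below is an assumption about how such a rule could be expressed, not a copy of the logic in ci.yml: it treats files ending in .md as Markdown-only changes and anything under db/ as data-only changes, and the function name needs_rebuild is hypothetical.

```python
from pathlib import PurePosixPath


def needs_rebuild(changed_paths: list[str]) -> bool:
    """Return True if any changed path is neither Markdown nor data.

    Assumption: "Markdown" means a .md suffix and "data" means a path under db/.
    """
    def is_skippable(path: str) -> bool:
        p = PurePosixPath(path)
        return p.suffix.lower() == ".md" or p.parts[:1] == ("db",)

    return any(not is_skippable(path) for path in changed_paths)
```

Under these assumptions, a push that touches only README.md or db/arabterm.sql skips the rebuild, while a push that also touches any other file triggers it.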

Include any frontend rebuild in the commit (make build_frontend && git add backend/frontend/dist && git commit). The Python buildpack auto-detects uv.lock and installs deps with uv sync, so committing changes to pyproject.toml + uv.lock is all that's needed when adding dependencies.

Verify the gadget on Arabic Wikipedia still works after each deploy.

Manual fallback (if GitHub Actions is down):

ssh toolforge && become wikitermbase
toolforge build start https://github.com/forzagreen/wikitermbase
toolforge build show   # wait until status is ok(Succeeded)
toolforge webservice buildservice restart

Database: MariaDB

Data lives in forzagreen/arabterm — that's the source of truth and where dictionary edits happen. When a PR touching db/mariadb/arabterm.sql.gz is merged to arabterm's main, the cross-repo CI flow auto-opens a PR here with the regenerated db/arabterm.sql; merging that PR triggers the production DB import (see "Updating the Database" below). For the upstream dump-generation workflow (make init_mariadb, make migrate_to_mariadb, make dump), see arabterm's README.

MariaDB on Toolforge

Initial Setup

Ref: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases

  • ssh toolforge and become wikitermbase
  • Find your database user name in $HOME/replica.my.cnf
  • Create the database:
    • Open the SQL console: sql tools
    • Create the database: MariaDB [(none)]> CREATE DATABASE s55953__arabterm;

Updating the Database

DB imports are automated. The flow is:

  1. Update data in forzagreen/arabterm and merge to main. When db/mariadb/arabterm.sql.gz changes, arabterm's notify-wikitermbase.yml dispatches an event to this repo.
  2. wikitermbase's refresh-dump.yml runs make download_dump && make fix_dump and opens a PR titled chore: refresh DB dump from arabterm@<sha>.
  3. Review the diff to db/arabterm.sql and merge. CI's deploy-db job SSHs into the bastion and runs mariadb ... < db/arabterm.sql automatically.

Manual triggers:

  • Re-run the dump regeneration: Actions tab → "Refresh DB dump from arabterm" → "Run workflow".

  • Re-import without a code change:

    ssh toolforge && become wikitermbase
    cd ~/wikitermbase
    mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud s55953__arabterm < db/arabterm.sql

Troubleshooting

All of these issues are fixed by running make fix_dump:

  • MDEV-34183 (https://jira.mariadb.org/browse/MDEV-34183): drop the line /*!999999\- enable the sandbox mode */ or /*M!999999\- enable the sandbox mode */.
  • ERROR 1273 (HY000) at line 25: Unknown collation: 'utf8mb4_uca1400_ai_ci': replace the collation with utf8mb4_unicode_520_ci.
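
To make the two fixes concrete, here is a minimal Python sketch of the same transformations applied to the dump text. It is not the implementation behind make fix_dump, which may well use other tools; the function name fix_dump_text is hypothetical.

```python
# The two spellings of the sandbox-mode line listed above.
SANDBOX_LINES = {
    r"/*!999999\- enable the sandbox mode */",
    r"/*M!999999\- enable the sandbox mode */",
}


def fix_dump_text(sql: str) -> str:
    """Drop sandbox-mode lines and swap the unsupported collation.

    Hypothetical helper for illustration only.
    """
    kept = [
        line
        for line in sql.splitlines(keepends=True)
        if line.strip() not in SANDBOX_LINES
    ]
    return "".join(kept).replace("utf8mb4_uca1400_ai_ci", "utf8mb4_unicode_520_ci")
```

Applied to the contents of db/arabterm.sql, this removes the sandbox line and rewrites every occurrence of the collation name, leaving the rest of the dump unchanged.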
