Merged
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -100,4 +100,4 @@ Open a [GitHub issue](https://github.com/nikazzio/scriptoria/issues/new) with:

## License

-By submitting a pull request, you agree that your contribution is licensed under the [MIT License](LICENSE) that covers this project.
+By submitting a pull request, you agree that your contribution is licensed under the [GNU GPL v3 or later](LICENSE) that covers this project.
695 changes: 674 additions & 21 deletions LICENSE

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions NOTICE
@@ -0,0 +1,5 @@
Copyright (C) 2026 Niki Corradetti

Scriptoria is licensed under the GNU General Public License,
version 3 or, at your option, any later version
(`GPL-3.0-or-later`).
4 changes: 2 additions & 2 deletions README.md
@@ -8,7 +8,7 @@
<a href="https://codecov.io/gh/nikazzio/scriptoria"><img alt="Coverage" src="https://img.shields.io/codecov/c/github/nikazzio/scriptoria?style=flat-square&logo=pytest&label=coverage"></a>
<a href="https://www.python.org/"><img alt="Python 3.10+" src="https://img.shields.io/badge/python-3.10+-3572A5?style=flat-square&logo=python&logoColor=white"></a>
<a href="https://github.com/nikazzio/scriptoria/releases"><img alt="Release" src="https://img.shields.io/github/v/release/nikazzio/scriptoria?display_name=tag&style=flat-square&color=0b7285"></a>
-<a href="LICENSE"><img alt="MIT" src="https://img.shields.io/badge/license-MIT-22d3ee?style=flat-square"></a>
+<a href="LICENSE"><img alt="GPL-3.0-or-later" src="https://img.shields.io/badge/license-GPL--3.0--or--later-22d3ee?style=flat-square"></a>
</p>

<p align="center">
@@ -125,5 +125,5 @@ Expected. Open an item from Library, or use the recent-work hub at `/studio`.
---

<p align="center">
-<sub>Built for manuscript-heavy research workflows · <a href="LICENSE">MIT</a></sub>
+<sub>Built for manuscript-heavy research workflows · <a href="LICENSE">GPL-3.0-or-later</a></sub>
</p>
4 changes: 2 additions & 2 deletions docs/CONFIG_REFERENCE.md
@@ -133,7 +133,7 @@ Default download policies used when library-specific override is not enabled.

## `settings.network.libraries.<library>`

-Libraries supported: `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `internet_culturale` (BETA), `unknown`.
+Libraries supported: `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `estense`, `internet_culturale` (BETA), `unknown`.

**HTTPClient Integration**: These settings are used by the centralized `HTTPClient` class for per-library network policies (rate limiting, retry, backoff, concurrency).

@@ -432,7 +432,7 @@ Discovery search configuration. Editable from Settings > Discovery tab in the we
- `max_results_per_provider` (`int`, default: `20`)
- Maximum number of results returned by each search provider per query.
- Clamped to [1, 50] at runtime and on save.
-- For paginatable providers (Archive.org, Harvard, LOC, Gallica, Internet Culturale (BETA)), additional results can be loaded via the "Carica altri risultati" button.
+- For paginatable providers (Archive.org, Harvard, LOC, Gallica, Estense, Internet Culturale (BETA)), additional results can be loaded via the "Carica altri risultati" button.
- Non-paginatable providers (Vatican, Bodleian, Cambridge, Heidelberg, Institut, e-codices) return at most this many results from a single API call.
- For Internet Culturale (BETA) the upstream page size is fixed at 20 regardless of `max_results_per_provider`; the "has more" check relies on the authoritative `totalPages` parsed from the HTML instead of the result cap.
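The clamping rule and the two "has more" strategies in this hunk can be sketched in a few lines. This is a hypothetical illustration, not Scriptoria's code: the function names are invented, and the cap-based check for the other paginatable providers is an assumption inferred from the contrast the doc draws with Internet Culturale.

```python
"""Hypothetical sketch of the max_results_per_provider rules (not Scriptoria's code)."""

MIN_RESULTS = 1
MAX_RESULTS = 50


def clamp_max_results(value: int) -> int:
    """Clamp a configured value into [1, 50], as described for runtime and save."""
    return max(MIN_RESULTS, min(MAX_RESULTS, int(value)))


def has_more_by_cap(returned: int, cap: int) -> bool:
    """Assumed heuristic: a page that fills the cap suggests more results upstream."""
    return returned >= cap


def has_more_by_total_pages(current_page: int, total_pages: int) -> bool:
    """Internet Culturale style: trust the upstream totalPages (1-based pages here)."""
    return current_page < total_pages
```

Trusting `totalPages` matters when the upstream page size differs from the configured cap: with a cap of 50 and a fixed ICCU page of 20, a cap-based check would wrongly report that nothing more is available.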

24 changes: 13 additions & 11 deletions docs/guides/discovery-and-library.md
@@ -6,15 +6,15 @@ Discovery is the boundary layer between Scriptoria and heterogeneous external pr

## What Discovery Does

-Discovery resolves external input into a candidate manuscript record. That input can be a direct IIIF manifest URL, a provider item URL, a shelfmark or provider-specific identifier, or a free-text query when the provider has a usable search adapter.
+Discovery resolves external input into a candidate item record. That input can be a direct IIIF manifest URL, a provider item URL, a shelfmark or provider-specific identifier, or a free-text query when the provider has a usable search adapter.

-The output of Discovery is not yet a full local manuscript workspace. It is a normalized candidate with enough metadata for preview, local registration, and later download.
+The output of Discovery is not yet a full local workspace. It is a normalized candidate with enough metadata for preview, local registration, and later download.

## Resolve Versus Search

Internally, Discovery supports both direct resolution and provider-specific search. Those are not the same operation.

-- direct resolution means Scriptoria can normalize a known URL or identifier directly into a manifest and manuscript identity;
+- direct resolution means Scriptoria can normalize a known URL or identifier directly into a manifest and stable item identity;
- search means Scriptoria asks a provider-specific search surface for possible results and then maps those results back into the product model.

Some providers are strong at both. Others are mostly direct-resolution providers with limited search value. This is why the type of input you paste matters.
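To make the resolve-versus-search distinction concrete, here is a rough sketch of how raw Discovery input might be triaged before choosing a path. The four categories mirror the input types listed above; the function name and the heuristics (a path ending in `manifest.json` or `/manifest` counts as a manifest, a single token containing a digit counts as an identifier) are assumptions for illustration, not Scriptoria's actual resolver logic.

```python
import re
from urllib.parse import urlparse

# Hypothetical heuristic: one whitespace-free token containing at least one digit,
# e.g. "Urb.lat.1779", is treated as a shelfmark or provider identifier.
IDENTIFIER_RE = re.compile(r"^\S*\d\S*$")


def classify_input(raw: str) -> str:
    """Return 'manifest_url', 'item_url', 'identifier', or 'free_text'."""
    text = raw.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        path = parsed.path.rstrip("/")
        if path.endswith("manifest.json") or path.endswith("/manifest"):
            return "manifest_url"  # direct resolution: fetch the manifest as-is
        return "item_url"  # direct resolution: map a viewer URL to a manifest
    if IDENTIFIER_RE.match(text):
        return "identifier"  # direct resolution via a provider-specific id
    return "free_text"  # only useful when the provider has a search adapter
```

In this sketch only `free_text` goes to a provider's search surface; the other three categories are candidates for direct resolution.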
@@ -34,7 +34,7 @@ That is not just user advice. It reflects the shape of the provider registry and

## What Happens When You Add An Item

-`Add item` does not force a full download. It persists a local manuscript record and related normalized metadata so the item becomes part of the local catalog.
+`Add item` does not force a full download. It persists a local item record and related normalized metadata so the item becomes part of the local catalog.

This is one of the most important product rules:

@@ -57,11 +57,13 @@ Discovery also reflects provider-specific result behavior. Some providers can ex

The practical posture is to treat Discovery as a normalized gateway, not as proof that every library offers the same search ergonomics.

+Biblioteca Estense (Modena) is a dedicated provider for the Jarvis backend (`jarvis.edl.beniculturali.it`). Unlike ICCU, it exposes native IIIF v2/v3 manifests with a level-2 Image API, so items read comfortably inside Mirador with real zoom. Search uses the Spring Data REST endpoint and covers short title, author, and pressmark in a single call; pagination works the same way as for other discovery-first providers.
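Spring Data REST collections are paged with the standard `page` (zero-based) and `size` query parameters and return a `page` metadata object (`size`, `totalElements`, `totalPages`, `number`) next to the HAL `_embedded` records. Under that convention, a pagination check for Estense could look like the sketch below. The search parameter name `q` and the embedded collection key `items` are hypothetical placeholders; the actual Jarvis endpoint and field names are not documented here.

```python
from typing import Any


def build_page_params(query: str, page: int = 0, size: int = 20) -> dict[str, str]:
    """Query string for one page; 'q' is a hypothetical parameter name."""
    # 'page' is zero-based in Spring Data's pageable convention.
    return {"q": query, "page": str(page), "size": str(size)}


def parse_page(payload: dict[str, Any], collection_key: str) -> tuple[list[dict[str, Any]], bool]:
    """Return the embedded records and whether another page exists."""
    records = payload.get("_embedded", {}).get(collection_key, [])
    meta = payload.get("page", {})
    number = int(meta.get("number", 0))
    total_pages = int(meta.get("totalPages", 0))
    return records, number + 1 < total_pages


if __name__ == "__main__":
    sample = {
        "_embedded": {"items": [{"title": "Example record"}]},  # hypothetical key
        "page": {"size": 20, "totalElements": 41, "totalPages": 3, "number": 0},
    }
    records, more = parse_page(sample, "items")
    print(len(records), more, build_page_params("dante", page=1))
```

The check is `number + 1 < totalPages` because Spring Data numbers pages from zero.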

Internet Culturale **(BETA)** is a special case worth calling out explicitly. It sits at the bottom of the provider select because the integration is experimental: useful when ICCU is the only channel to reach an Italian record, but less reliable than any native IIIF provider. It is an aggregator that fronts around fifty Italian libraries (Laurenziana, Marciana, BNCF, BNCR, Estense, and many smaller partners) and it routinely returns thousands of results for a single keyword. Scriptoria shows the upstream total as "Mostrati X di Y risultati" so the size of the result set is visible, and "Carica altri risultati" walks through the remaining pages twenty at a time. Because the upstream does not expose a IIIF manifest directly, the manifest used internally is converted on-the-fly from ICCU's MAG/XML document; partial records (those declaring more pages than the server actually serves) are still saved as partial scans rather than failing outright, but expect occasional teaser records where only the frontispiece is really available.
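The partial-record behavior amounts to comparing the page count a record declares with the number of pages the server actually delivered. A minimal sketch, with a hypothetical function name, using the `saved`, `partial`, and `complete` states that Library exposes (the exact thresholds Scriptoria applies are an assumption):

```python
def scan_state(declared_pages: int, downloaded_pages: int) -> str:
    """Classify local scan state; name and thresholds are illustrative assumptions."""
    if downloaded_pages <= 0:
        return "saved"  # registered locally, no images on disk yet
    if downloaded_pages < declared_pages:
        return "partial"  # e.g. a teaser record that only serves the frontispiece
    return "complete"
```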

## What Library Does

-Library is the local catalog of manuscript records and their current working state.
+Library is the local catalog of item records and their current working state.

In practical terms, Library is where you:

@@ -77,9 +79,9 @@ Library is not a passive bookmark list. It is the operational registry for the l

## What A Library Entry Represents

-A Library card is the visible UI form of a local manuscript record. That record can include provider identity, manuscript id and manifest URL, normalized title and metadata preview, path information, local manifest state, local scan state, local PDF state, missing-page information, and the asset-state hints later used by Studio and Output.
+A Library card is the visible UI form of a local item record. That record can include provider identity, item id and manifest URL, normalized title and metadata preview, path information, local manifest state, local scan state, local PDF state, missing-page information, and the asset-state hints later used by Studio and Output.

-This is why Library matters even before a full download exists. It is already the stable identity layer for the manuscript inside Scriptoria.
+This is why Library matters even before a full download exists. It is already the stable identity layer for the item inside Scriptoria.
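The fields listed for a Library card can be pictured as a single record keyed by provider and item id. The dataclass below is a hypothetical shape for illustration only; Scriptoria's real schema and column names may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """Hypothetical local catalog entry; field names are illustrative assumptions."""

    library: str  # provider identity, e.g. "vaticana" or "estense"
    item_id: str
    manifest_url: str
    title: str = ""
    has_local_manifest: bool = False
    total_pages: int = 0
    downloaded_pages: int = 0
    has_pdf: bool = False
    missing_pages: list[int] = field(default_factory=list)

    @property
    def key(self) -> tuple[str, str]:
        """Stable identity that exists before any image has been downloaded."""
        return (self.library, self.item_id)
```

In this sketch the identity depends only on provider and item id, so a record is meaningful while every asset-state field still holds its empty default.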

## Saved, Partial, Complete

@@ -115,7 +117,7 @@ Each of these is dispatched as a tracked download job with the standard pause, r

The other surface in Library is catalog-side: actions that change how an item is described or classified locally without re-downloading anything.

-- `Set type` records the manuscript type inside your own catalog.
+- `Set type` records the item type inside your own catalog.
- `Update notes` stores free-form annotations on the entry.
- `Refresh metadata` re-fetches normalized metadata from the upstream provider when the source record has changed.
- `Reclassify` re-runs provider classification for one item; `Reclassify all` and `Normalize states` are bulk passes used after registry or schema upgrades.
@@ -124,13 +126,13 @@ These actions are cheap, local-state operations. Use them to keep your catalog c

## Why Discovery And Library Must Stay Separate

-The separation is deliberate for three reasons. Providers are inconsistent, and the local catalog should not inherit the instability of upstream discovery surfaces. Local state also has to remain legible: a manuscript may be known locally long before it becomes a complete local asset set. Finally, the workflow is incremental by design. Scriptoria is built for shortlisting, staged download, partial repair, and later export, not only for all-or-nothing acquisition.
+The separation is deliberate for three reasons. Providers are inconsistent, and the local catalog should not inherit the instability of upstream discovery surfaces. Local state also has to remain legible: an item may be known locally long before it becomes a complete local asset set. Finally, the workflow is incremental by design. Scriptoria is built for shortlisting, staged download, partial repair, and later export, not only for all-or-nothing acquisition.

## Practical Rule Of Thumb

-If you are still deciding what the manuscript is, you are in `Discovery`.
+If you are still deciding what the document is, you are in `Discovery`.

-If Scriptoria already knows the manuscript and you are deciding what to do with its local state, you are in `Library`.
+If Scriptoria already knows the document and you are deciding what to do with its local state, you are in `Library`.

## Related Docs

8 changes: 4 additions & 4 deletions docs/intro/getting-started.md
@@ -1,6 +1,6 @@
# Getting Started

-This page is the shortest reliable path to a working local installation and a first useful session. It does not try to document every feature. Its purpose is to get you from clone to a real manuscript workflow without confusion about what the application is doing.
+This page is the shortest reliable path to a working local installation and a first useful session. It does not try to document every feature. Its purpose is to get you from clone to a real document workflow without confusion about what the application is doing.

Scriptoria exposes two entry points. `scriptoria` starts the web application and gives you the complete workflow. `scriptoria-cli "<manifest-url>"` is the direct CLI path when you already know the exact item you want. For most users, the web application is the right starting point because it exposes discovery, local cataloging, Studio work, and export in one place.

@@ -47,7 +47,7 @@ At first start, expect a local-first application rather than a public website. E

## What You Will See

-The interface is organized into four operational surfaces. `Discovery` resolves external inputs into candidate items. `Library` tracks the items already known to your local workspace. `Studio` opens one manuscript in a working context. `Output` handles page inspection, export preparation, and finished artifacts.
+The interface is organized into four operational surfaces. `Discovery` resolves external inputs into candidate items. `Library` tracks the items already known to your local workspace. `Studio` opens one item in a working context. `Output` handles page inspection, export preparation, and finished artifacts.

Those surfaces are separate because they represent different states in the workflow. Discovery is not Library, and Library is not the same thing as a complete local download.

@@ -80,7 +80,7 @@ That last option is the least universal. Some providers are good at discovery-fi

## When To Use The CLI

-Use the CLI when you already know the exact manuscript and do not need the full interactive workflow.
+Use the CLI when you already know the exact document and do not need the full interactive workflow.

Example:

@@ -111,7 +111,7 @@ You do not need to touch configuration for a first session. Once you start worki
Most first-run friction comes from a small set of predictable cases:

- the input pasted into Discovery is too vague for the chosen provider;
-- the manuscript is `saved` but not yet downloaded, so Studio opens in remote mode and looks slower than expected;
+- the item is `saved` but not yet downloaded, so Studio opens in remote mode and looks slower than expected;
- a partial download was interrupted and Library shows the item in a mid-state;
- the upstream provider rate-limited a fast acquisition.

18 changes: 9 additions & 9 deletions docs/reference/cli.md
@@ -2,7 +2,7 @@

The CLI lives in `src/universal_iiif_cli/cli.py` and is exposed by the `scriptoria-cli` entry point. It shares the same provider registry, resolver layer, and local vault used by the web application, so anything resolved or stored from the CLI shows up in the same Library that Studio reads.

-The CLI exists for two situations: direct acquisition when you already know the manuscript you want, and quick inspection or repair of local state without opening the web app.
+The CLI exists for two situations: direct acquisition when you already know the document you want, and quick inspection or repair of local state without opening the web app.

## Basic Usage

@@ -16,7 +16,7 @@ If you call `scriptoria-cli` with no positional argument, it enters an interacti

## Wizard Mode

-Wizard mode is intentionally minimal. It asks for a manuscript or viewer URL, an optional output filename, and an optional OCR model name. It is meant for one-off downloads where you do not want to remember flag names. Anything more advanced should use explicit flags.
+Wizard mode is intentionally minimal. It asks for a document or viewer URL, an optional output filename, and an optional OCR model name. It is meant for one-off downloads where you do not want to remember flag names. Anything more advanced should use explicit flags.

```text
🌍 UNIVERSAL IIIF DOWNLOADER 🌍
```

@@ -31,7 +31,7 @@ OCR Model (optional, e.g. 'kraken', press Enter to skip): ...
These flags control the acquisition run started by a positional URL or by the wizard.

- `-o, --output`
-- Output PDF filename. Without this flag, Scriptoria picks a name from the manuscript identifier.
+- Output PDF filename. Without this flag, Scriptoria picks a name from the item identifier.
- `-w, --workers`
- Concurrent downloads for the current run. Default `4`. Increase only if both your network and the upstream provider can absorb it without rate-limiting penalties.
- `--clean-cache`
@@ -48,15 +48,15 @@ These flags control the acquisition run started by a positional URL or by the wi
These flags do not start a download. They read or modify the local vault directly through `VaultManager`.

- `--list`
-- List local manuscripts in the database. The output shows manuscript id, status, page progress, and provider library, with a status icon: ✅ complete, ⏳ downloading, ❌ error, ⚪ other.
+- List local items in the database. The output shows item id, status, page progress, and provider library, with a status icon: ✅ complete, ⏳ downloading, ❌ error, ⚪ other.
- `--info ID`
-- Show stored fields for one manuscript (provider identity, status, paths, progress, manifest URL, and related metadata).
+- Show stored fields for one item (provider identity, status, paths, progress, manifest URL, and related metadata).
- `--delete ID`
-- Delete a manuscript record from the vault. This removes the local catalog entry; runtime files on disk are handled by separate cleanup flows.
+- Delete an item record from the vault. This removes the local catalog entry; runtime files on disk are handled by separate cleanup flows.
- `--delete-job JOB_ID`
- Remove a single download job row from the internal `download_jobs` table. Mostly useful during development or when stray records survive a crash.
- `--set-status ID STATUS`
-- Force the stored status for a manuscript. Standard values are `pending`, `downloading`, `complete`, and `error`. Other strings are accepted with a warning, but the rest of the system reasons in terms of the standard set.
+- Force the stored status for an item. Standard values are `pending`, `downloading`, `complete`, and `error`. Other strings are accepted with a warning, but the rest of the system reasons in terms of the standard set.
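The status icons used by `--list` and the validation described for `--set-status` reduce to a small lookup. This sketch reproduces the documented mapping and the "accept with a warning" rule; the function names are hypothetical and it is not the code in `cli.py`.

```python
import warnings

STANDARD_STATUSES = ("pending", "downloading", "complete", "error")
STATUS_ICONS = {"complete": "✅", "downloading": "⏳", "error": "❌"}


def status_icon(status: str) -> str:
    """Icon shown by --list; anything outside the mapping gets the neutral marker."""
    return STATUS_ICONS.get(status, "⚪")


def check_status(status: str) -> bool:
    """Accept any string, but warn when it is outside the standard set."""
    if status in STANDARD_STATUSES:
        return True
    warnings.warn(
        f"non-standard status {status!r}; other components expect one of {STANDARD_STATUSES}",
        stacklevel=2,
    )
    return False
```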

## Other Options

@@ -76,13 +76,13 @@ If you need to change those locations, edit `config.json` rather than passing pa
## Operational Notes

- Resolution and provider classification use the same registry as the web UI. If a URL resolves in the CLI, it will resolve the same way in Discovery.
-- Local state is shared with Studio. A manuscript downloaded from the CLI is immediately visible in Library and openable in Studio without further import.
+- Local state is shared with Studio. An item downloaded from the CLI is immediately visible in Library and openable in Studio without further import.
- The CLI is the right surface for shell pipelines, scripted batch acquisition, headless environments, and local-state inspection.
- The legacy entry points `iiif-cli` and `iiif-studio` are still installed as aliases for `scriptoria-cli` and `scriptoria` to avoid breaking older scripts. New work should use the `scriptoria` names.
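Because the CLI and the web UI share one provider registry, the same URL lands on the same library key from either surface. A toy version of such a host lookup is shown below. The host table is illustrative and incomplete, the function is hypothetical, and of these hosts only `digi.vatlib.it` and `jarvis.edl.beniculturali.it` appear in these docs. Unmatched hosts fall back to `unknown`, the same key that exists under `settings.network.libraries`.

```python
from urllib.parse import urlparse

# Illustrative host table, not Scriptoria's actual registry.
HOST_TO_LIBRARY = {
    "digi.vatlib.it": "vaticana",
    "gallica.bnf.fr": "gallica",
    "jarvis.edl.beniculturali.it": "estense",
    "internetculturale.it": "internet_culturale",
}


def classify_library(url: str) -> str:
    """Map a URL to a library key, matching the host or any of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for known_host, library in HOST_TO_LIBRARY.items():
        if host == known_host or host.endswith("." + known_host):
            return library
    return "unknown"
```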

## Examples

-Download a manuscript by direct manifest URL:
+Download an item by direct manifest URL:

```bash
scriptoria-cli "https://digi.vatlib.it/iiif/MSS_Urb.lat.1779/manifest.json"
```
4 changes: 2 additions & 2 deletions docs/reference/configuration.md
@@ -43,7 +43,7 @@ The `paths.*` keys define the local runtime directories used by Scriptoria.

These paths cover:

-- downloads and local manuscript workspaces;
+- downloads and local document workspaces;
- export output;
- temporary image staging;
- model caches;
@@ -78,7 +78,7 @@ It is split into three layers:
- `settings.network.download.*` for default document download behavior;
- `settings.network.libraries.<provider>.*` for provider-specific overrides.

-The supported provider keys under `settings.network.libraries.*` are `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `internet_culturale` **(BETA)**, and `unknown`. Setting `use_custom_policy: false` on a library makes it inherit the `settings.network.download.*` defaults; `true` activates the per-library override fields.
+The supported provider keys under `settings.network.libraries.*` are `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `estense`, `internet_culturale` **(BETA)**, and `unknown`. Setting `use_custom_policy: false` on a library makes it inherit the `settings.network.download.*` defaults; `true` activates the per-library override fields.
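The `use_custom_policy` switch describes a simple fallback: a library's own fields apply only when the flag is `true`; otherwise the `settings.network.download.*` defaults win. A minimal sketch of that merge follows. The field names are hypothetical (the real keys live in the configuration reference), and letting omitted fields keep the default value is an assumption.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


@dataclass(frozen=True)
class DownloadPolicy:
    """Hypothetical subset of per-download network settings."""

    workers_per_job: int
    min_delay_s: float
    max_delay_s: float


def effective_policy(defaults: DownloadPolicy, library_cfg: dict[str, Any]) -> DownloadPolicy:
    """Return the defaults unless the library opts in with use_custom_policy: true."""
    if not library_cfg.get("use_custom_policy", False):
        return defaults
    known = {f.name for f in fields(DownloadPolicy)}
    overrides = {k: v for k, v in library_cfg.items() if k in known}
    # Assumption: fields the library omits keep the default value.
    return replace(defaults, **overrides)
```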

`internet_culturale` (BETA) ships with a conservative default policy (2 workers per job, 1.0–3.0s delay, 300s cooldown on 403/429, 40 requests per 60s burst window) because the ICCU aggregator is a shared infrastructure and is noticeably less tolerant than large IIIF-native providers.
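The "40 requests per 60s burst window" part of that policy can be read as a sliding-window limiter. The class below is one generic way to implement such a window, not necessarily how Scriptoria's `HTTPClient` does it. The helper names are hypothetical, and drawing the 1.0 to 3.0 s delay uniformly from its range is an assumption.

```python
import random
from collections import deque

# Internet Culturale defaults quoted in the configuration docs.
DELAY_RANGE_S = (1.0, 3.0)
COOLDOWN_ON_403_429_S = 300.0


class BurstWindow:
    """Sliding-window limiter: at most max_requests within any window_s seconds."""

    def __init__(self, max_requests: int = 40, window_s: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._sent: deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the window has room; otherwise refuse."""
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()  # drop timestamps that have left the window
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False


def next_delay() -> float:
    """Assumed uniform pause between requests within the configured range."""
    return random.uniform(*DELAY_RANGE_S)


def cooldown_for(status_code: int) -> float:
    """Seconds to back off after a response; only 403 and 429 trigger the cooldown."""
    return COOLDOWN_ON_403_429_S if status_code in (403, 429) else 0.0
```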
