Merged
17 commits
30 changes: 25 additions & 5 deletions .github/workflows/test.yml
@@ -2,19 +2,22 @@ name: Tests

on:
push:
branches: [ lwt, test-suite ]
branches: [ master ]
pull_request:
branches: [ lwt ]
branches: [ master ]

permissions:
contents: read

jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1

- name: Set up OCaml
uses: ocaml/setup-ocaml@v3
uses: ocaml/setup-ocaml@e32b06a3e831ff2fbc6f08cf35be2085e3918014 # v3.6.1
with:
ocaml-compiler: 5.1.1
dune-cache: true
@@ -28,16 +31,33 @@ jobs:
opam install . --deps-only --update-invariant
npm install --no-save typescript browserify pug-lexer pug-parser pug-walk

- name: Cache QuickJS
id: cache-quickjs
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: quickjs
key: quickjs-2021-03-27-${{ runner.os }}

- name: Install QuickJS
if: steps.cache-quickjs.outputs.cache-hit != 'true'
run: |
curl -fsSL https://bellard.org/quickjs/quickjs-2021-03-27.tar.xz -o quickjs.tar.xz
tar xvf quickjs.tar.xz && rm quickjs.tar.xz
mv quickjs-2021-03-27 quickjs
cd quickjs && make

- name: Cache Flow
id: cache-flow
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: flow
key: flow-v0.183.1

- name: Install Flow
run: |
git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow
if [ ! -d flow ]; then
git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow
fi
ln -s "$(pwd)/flow/src/parser" src/flow_parser
ln -s "$(pwd)/flow/src/third-party/sedlex" src/sedlex
ln -s "$(pwd)/flow/src/hack_forked/utils/collections" src/collections
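
Because the `flow` checkout may now come from the cache, the symlink commands can end up running against a tree where the links already exist (for example on a developer machine that reruns the step). The following is a minimal sketch of an idempotent variant, not part of the workflow itself: `link_flow_sources` is a hypothetical helper name, and the demo uses a throwaway directory that only mimics the layout.

```sh
set -eu

# link_flow_sources ROOT (hypothetical helper): create or refresh the three
# symlinks under ROOT/src that point into the Flow checkout at ROOT/flow.
link_flow_sources() {
  # -s: symbolic link, -f: replace an existing link,
  # -n: treat an existing link to a directory as a file instead of descending into it.
  ln -sfn "$1/flow/src/parser" "$1/src/flow_parser"
  ln -sfn "$1/flow/src/third-party/sedlex" "$1/src/sedlex"
  ln -sfn "$1/flow/src/hack_forked/utils/collections" "$1/src/collections"
}

# Demo: a temporary tree with the same directory names (no real Flow clone needed).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/src" \
  "$demo_root/flow/src/parser" \
  "$demo_root/flow/src/third-party/sedlex" \
  "$demo_root/flow/src/hack_forked/utils/collections"

link_flow_sources "$demo_root"
link_flow_sources "$demo_root"  # a second run refreshes the links instead of failing
```

With plain `ln -s`, the second call would stop with "File exists"; `-sfn` makes the step safe to repeat.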
76 changes: 49 additions & 27 deletions AGENTS.md
@@ -1,6 +1,6 @@
# Agent Information - String Extractor

This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management.
This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML, Astro) and extracts strings for translation management.

## Documentation

@@ -10,69 +10,91 @@ This repository contains an OCaml-based internationalization (i18n) string extra
- Looking for specific functionality or function definitions before searching.

- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Contains instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when:
- Setting up the development environment or installing dependencies (OCaml, JS, QuickJS).
- Setting up the development environment or installing dependencies (OCaml, JS, QuickJS, Flow).
- Building the project for development or release.
- Executing the tool for manual verification or testing.
- Managing version numbers or release artifacts.

## Project Overview

- **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify).
- **Language**: OCaml (5.1.1 in CI) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify).
- **Architecture**:
- `src/cli/`: Main entry point, command-line interface, and output generation logic.
- `src/cli/`: Main entry point (`strings.ml`), command-line interface, output generation (`.strings`/`.json`), and Vue file splitting (`vue.ml`).
- `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS.
- `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml.
- `src/utils/`: Common utilities for collection, timing, and I/O.
- **Key Libraries**: `Core`, `Lwt` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`.
- **Key Libraries**: `Core`, `Lwt` (concurrency), `Angstrom`, `Yojson`, `Ppx_jane`.
- **Active branch context**: This codebase is the **Lwt** variant (an Eio port exists on other branches). CI runs on branches `lwt` and `test-suite`. Concurrency code uses `Lwt.Syntax`/`Lwt_io`, and `Strings.parse` returns `string Core.String.Table.t Lwt.t`.

## Essential Commands

### Build
- **Development build**: `dune build src/cli/strings.exe`
- **Watch mode**: `dune build src/cli/strings.exe -w`
- **Release build (MacOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe`
- **Full release cycle**: See `DEVELOPMENT.md` for `cp`, `strip`, and Docker commands.
- **Release build**: `DUNE_PROFILE=release dune build src/cli/strings.exe`
- **Full release cycle** (strip, Docker/Linux): see `DEVELOPMENT.md`.
- If `dune` is not on PATH, run `eval $(opam env)` first (or prefix with `opam exec --`).

### Test
```sh
eval $(opam env)
dune runtest tests/
```
This runs both the inline unit tests (`tests/test_runner.ml`) and an integration test defined as a `runtest` rule in `tests/dune`, which builds the CLI, runs it against `tests/fixtures/` in a temp directory, and verifies that existing French translations are preserved and `MISSING TRANSLATION` markers are emitted.
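
The two checks the integration rule performs can be sketched in a few lines of shell. This is an illustration only: `check_strings_output` is a hypothetical name, and the sample file layout (including where the `MISSING TRANSLATION` marker appears) is an assumption, not the tool's documented output format.

```sh
set -eu

# check_strings_output FILE EXPECTED_LINE (hypothetical helper)
# Succeeds only if FILE still contains EXPECTED_LINE verbatim (an existing
# translation was preserved) and at least one MISSING TRANSLATION marker.
check_strings_output() {
  grep -qF -- "$2" "$1" && grep -qF -- "MISSING TRANSLATION" "$1"
}

# Illustrative sample; the real generated file may be laid out differently.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
"Hello" = "Bonjour";
/* MISSING TRANSLATION */
"Goodbye" = "Goodbye";
EOF

if check_strings_output "$sample" '"Hello" = "Bonjour";'; then
  echo "fixture output looks right"
fi
```

Using `grep -F` keeps the quotes and semicolons in the expected line literal, so no regex escaping is needed.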

### Run
- After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]`
- The CLI expects to be run from the root of a project containing a `strings/` directory (or it will create one if a `.git` folder is present).

### Installation (Dev Setup)
Refer to `DEVELOPMENT.md` for specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.).
- The CLI **fails with "This program must be run from the root of your project"** unless the working directory contains either a `strings/` directory or a `.git` directory.
- Output directory defaults to `strings/`; override with `--output DIR` (`-o`).
- All long flags require the full `--` form (`~full_flag_required` is set everywhere).
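
The working-directory requirement above can be checked before invoking the binary. A minimal sketch, assuming the rule is exactly "a `strings/` or `.git` directory exists"; `is_project_root` is a hypothetical helper and not something the CLI exposes.

```sh
set -eu

# is_project_root DIR (hypothetical helper): true when DIR contains
# a strings/ directory or a .git directory.
is_project_root() {
  [ -d "$1/strings" ] || [ -d "$1/.git" ]
}

# Demo in a throwaway directory.
work="$(mktemp -d)"
if is_project_root "$work"; then
  echo "project root: $work"
else
  echo "not a project root yet: $work"
fi

mkdir "$work/strings"
if is_project_root "$work"; then
  echo "project root: $work"
fi
```

In a wrapper script, a failed check is a natural place to print a hint about `cd`-ing to the repository root before running `./_build/default/src/cli/strings.exe`.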

### CLI Flags (actual, from `src/cli/strings.ml`)
- `--output DIR` / `-o`: change output directory (default `strings`).
- `--ts`: treat scripts in HTML/Pug element attributes as TypeScript.
- `--slow-pug` / `--sp`: use the official Pug parser via QuickJS instead of the fast native OCaml one.
- `--debug-pug` / `--dp`, `--debug-html` / `--dh`, and `--debug-astro` / `--da`: debug template parsing (mutually exclusive). The first two target `.vue` files; `--debug-astro` targets `.astro` files.
- There is **no** `--show-debugging` flag.
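
Since the three debug flags are mutually exclusive, a calling script may want to validate its arguments before handing them to the CLI. The sketch below is a hypothetical pre-check (`count_debug_flags` is an invented name); it does not reproduce how `src/cli/strings.ml` enforces the rule.

```sh
set -eu

# count_debug_flags ARGS... (hypothetical helper): print how many of the
# debug flags listed above, in long or short form, appear in ARGS.
count_debug_flags() {
  n=0
  for arg in "$@"; do
    case "$arg" in
      --debug-pug|--dp|--debug-html|--dh|--debug-astro|--da) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# Demo: two debug flags in one invocation should be rejected by the caller.
if [ "$(count_debug_flags --ts --debug-pug --da)" -gt 1 ]; then
  echo "pick only one of --debug-pug, --debug-html, --debug-astro"
fi
```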

## Setup Gotchas (things that break builds)

- **Flow symlinks**: `src/flow_parser`, `src/sedlex`, and `src/collections` are symlinks into a cloned `flow` repo (v0.183.1) at the project root. If they're missing or dangling, builds fail with module errors. Recreate per `DEVELOPMENT.md`.
- **QuickJS dependency**: Requires a compiled `quickjs` directory (quickjs-2021-03-27, `make` run) at the project root. `dune` rules in `src/quickjs/dune` copy `quickjs.h`, `libquickjs.a`, and invoke `quickjs/qjsc` from there.
- **Generated runtime**: `src/quickjs/runtime.h` is generated at build time from `src/quickjs/parsers.js` via `npx browserify` then `qjsc`. Requires `npm install --no-save typescript browserify pug-lexer pug-parser pug-walk` at the repo root.
- **libomp**: `src/quickjs/dune` searches a hardcoded list of paths for `libomp.a`/`libgomp.a` (Homebrew Cellar paths on macOS, `/usr/lib/...` on Linux). If your system has it elsewhere, the build fails with "Could not find libomp.a" — add your path to the list in `src/quickjs/dune`.
- **Link flags**: Platform/profile-specific link flags live in `src/cli/link_flags.{system}.{dev,release}.dune` (the Linux dev one is just `()`). A missing file for your platform/profile combination breaks the build.
- **Version number**: `let version = "x.y.z"` in `src/cli/strings.ml` must be bumped manually for releases.
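
Several of these gotchas come down to files that must exist before `dune build` is run. A minimal preflight sketch follows; `preflight` is a hypothetical helper, and it assumes `quickjs.h`, `libquickjs.a`, and `qjsc` sit directly under `quickjs/`, which is where the text above says the `dune` rules look.

```sh
set -eu

# preflight ROOT (hypothetical helper): print one line per missing
# prerequisite and return non-zero if anything is missing.
preflight() {
  missing=0
  for link in src/flow_parser src/sedlex src/collections; do
    # -e follows symlinks, so a dangling link fails this test too.
    if [ ! -e "$1/$link" ]; then
      echo "missing or dangling: $link"
      missing=1
    fi
  done
  for f in quickjs/quickjs.h quickjs/libquickjs.a quickjs/qjsc; do
    if [ ! -e "$1/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Demo: an empty directory fails every check.
preflight "$(mktemp -d)" || echo "environment incomplete, see DEVELOPMENT.md"
```

The sketch does not cover `libomp`, the generated `runtime.h`, or the link-flag files, since those depend on platform-specific paths.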

## Code Conventions & Patterns

### Parsing Strategy
1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`.
2. **JS/TS Parsing**:
- Javascript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`.
2. **JS/TS Parsing**:
- JavaScript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`.
- TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`).
3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (`src/quickjs/`).
3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (enabled with `--slow-pug`).
4. **Astro Parsing**: Native Angstrom scanner (`src/parsing/astro.ml`) segments `.astro` files into frontmatter, `<I18n>`/`<i18n>` blocks, `{...}` expressions, and `<script>` blocks. All Astro possible-scripts are parsed as **TSX regardless of `--ts`** (`process_file ~template_script:Vue.TSX` in `src/cli/strings.ml`; `Quickjs.Typescript_tsx` → `extractFromTSX` in `src/quickjs/parsers.js`), so JSX inside expressions parses. `Astro.collect` also re-scans expression segments containing `<I18n`/`<i18n` for nested I18n blocks (conditional/mapped JSX rendering). A missing `is:raw` on an `<I18n>` whose text contains `{` produces a non-fatal warning.

### Extraction Pattern
- Content is extracted into a `Utils.Collector.t`.
- The collector tracks found strings, potential scripts (to be further parsed), and file errors.
- **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS.
- Content is extracted into a `Utils.Collector.t` (`{ path; strings: string Queue.t; ... }`).
- The collector tracks found strings, potential scripts (to be further parsed), file errors (fatal, `❌`), and warnings (non-fatal, `⚠️`). Use `Collector.blit_transfer` to merge collectors.
- **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS. In Vue/HTML/Pug templates and Astro files, text inside `i18n`/`I18n` elements is a translation key.

### Concurrency
- Uses `Lwt` for cooperative concurrency.
- Parallel traversal of directories is handled in `src/cli/strings.ml` via `Lwt_list` and `Lwt_pool`.
- JS workers (QuickJS) are managed via `Lwt_pool` and `Lwt_preemptive` in `src/quickjs/quickjs.ml`.
- Angstrom parsing of channels uses `Angstrom_lwt_unix.parse` taking an `Lwt_io.input_channel` (not raw strings or Eio flows).

## Important Gotchas

- **QuickJS Dependency**: Requires a compiled `quickjs` directory at the project root for building. `dune` rules in `src/quickjs/dune` copy headers and libraries from there.
- **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`.
- **Linking**: MacOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`.
- **OCamlFormat**: `.ocamlformat` is present; ensure you format OCaml code before submitting.
### Style
- **OCamlFormat**: `.ocamlformat` defines a custom "Asemio Style" (margin 106, `break-cases=all`, `if-then-else=keyword-first`, etc.). Format OCaml code before submitting.
- **Memory Safety**: Be cautious with C++ FFI code in `src/quickjs/quickjs.cpp`, particularly regarding OCaml's GC interaction (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`).

## Testing Approach

- **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml files or in the `tests/` directory.
- **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing.
- **Integration Tests**: Verification can be performed by running the built binary against fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory.
- **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results.
- **Inline Tests**: `ppx_inline_test` (`let%test_unit`) with `ppx_assert` (`[%test_eq:]`). Tests live in `tests/test_runner.ml` (a library with `(inline_tests)`); the `parsing` library also has `(inline_tests)` enabled (e.g. the tests at the bottom of `src/parsing/astro.ml`).
- **Lwt in tests**: Wrap async test bodies in `Lwt_main.run`. For testing `Strings.parse` without the filesystem, build an in-memory channel: `Lwt_io.of_bytes ~mode:Lwt_io.input (Lwt_bytes.of_string content)`.
- **Synchronous parsers**: `Js.extract_to_collector` and the HTML/Pug/Astro parsers work on raw strings synchronously — no Lwt needed in those tests.
- **Integration Tests**: `tests/dune` contains a bash-based `runtest` rule using fixtures in `tests/fixtures/` (demo.html, demo.js, demo.pug, demo.vue, demo.astro). The CI workflow (`.github/workflows/test.yml`) requires `mkdir -p strings` before `dune runtest tests/`.

## Troubleshooting

7 changes: 5 additions & 2 deletions ARCHITECTURE.md
@@ -21,6 +21,7 @@ The main entry point of the application is **`src/cli/strings.ml`**. It handles
│ │ ├── js.ml # JavaScript string extraction entry point
│ │ ├── pug.ml # Native Pug template parsing
│ │ ├── html.ml # HTML template parsing
│ │ ├── astro.ml # Native Astro file scanning (frontmatter, I18n, expressions)
│ │ ├── strings.ml # .strings file parsing logic
│ │ └── ... # Other specialized parsers (vue blocks, styles)
│ ├── quickjs/ # Interface to QuickJS for JS/TS/Pug parsing
@@ -50,13 +51,15 @@ The main entry point of the application is **`src/cli/strings.ml`**. It handles
- **`Parsing.Js.extract_to_collector`**: Entry point for scanning JavaScript source code.
- **`Parsing.Js_ast.extract`**: A comprehensive walker for the Flow AST that identifies and extracts strings from `L("...")` calls.
- **`Parsing.Pug.collect`**: Traverses the native Pug AST to extract strings.
- **`Parsing.Astro.parser` / `Parsing.Astro.collect`**: Native Angstrom scanner for `.astro` files. Segments a file into frontmatter, `<script>` blocks, `{...}` expressions (brace matching respects strings, template literals, and comments), and `<I18n>`/`<i18n>` blocks. `collect` enqueues I18n slot text into `strings`, all code segments into `possible_scripts` (always parsed as TSX so JSX in expressions works), re-scans expressions containing `<I18n`/`<i18n` for nested I18n blocks, and emits a non-fatal warning when I18n text contains `{placeholders}` without the `is:raw` directive. `parser` takes `unit` because it uses an internal shared buffer — create a fresh parser per file.
- **`Parsing.Strings.parse`**: Parses existing `.strings` files into a lookup table. Takes a `Lwt_io.input_channel` and returns a `string Core.String.Table.t Lwt.t`.

### `src/quickjs/`
- **`Quickjs.extract_to_collector`**: Offloads extraction to QuickJS for TypeScript and advanced Pug templates.

### `src/utils/`
- **`Utils.Collector.create`**: Initializes a new string collection state for a specific file. (type `t = { path: string; strings: string Queue.t; ... }`)
- **`Utils.Collector.create`**: Initializes a new string collection state for a specific file. (type `t = { path: string; strings: string Queue.t; possible_scripts: string Queue.t; file_errors: string Queue.t; warnings: string Queue.t }`)
- **`Utils.Collector.render_errors` / `Utils.Collector.render_warnings`**: Render collected errors (`❌`, fatal) and warnings (`⚠️`, non-fatal) for terminal output.
- **`Utils.Collector.blit_transfer`**: Merges results from one collector into another.

## Control Flow
@@ -75,6 +78,6 @@ The project implements a multi-layered testing strategy:
- JavaScript string extraction via `Flow_parser`.
- HTML extraction via `SZXX` and Pug extraction via `Angstrom`.
- Apple-style `.strings` file parsing (via `Lwt_main.run` and `Lwt_io`).
3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types. The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files).
3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types (including `demo.astro`). The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files).

The `tests/dune` file configures the test library and enables inline tests for the module.