diff --git a/projspec-rs/CODEGEN.md b/projspec-rs/CODEGEN.md new file mode 100644 index 0000000..0125b64 --- /dev/null +++ b/projspec-rs/CODEGEN.md @@ -0,0 +1,523 @@ +# CODEGEN.md — projspec-rs generation guide + +This document records exactly how `projspec-rs` was produced from the Python +reference implementation, and all decisions made along the way. It is the +primary input for the next regeneration: read this before touching any code. + +--- + +## Purpose and scope + +`projspec-rs` is a Rust re-implementation of the `projspec` Python package and +CLI. The Python version remains the **reference implementation** — it is +never modified to accommodate the Rust port. This directory is regenerated +from scratch every time the Python source changes significantly. + +Scope: +- All `ProjectSpec` matchers and parsers (40+ types) +- All `BaseContent` and `BaseArtifact` types +- All three enums (`Stack`, `Precision`, `Architecture`) +- Artifact execution (`make`, `clean`, `state`) +- `Project` struct + `resolve()` logic + child walking +- `ProjectLibrary` (JSON persistence) +- Config file (`~/.config/projspec/projspec.json`) +- Project scaffolding (`create`) +- CLI (`scan`, `make`, `create`, `info`, `version`, `library`, `config`) + +Out of scope (same as AGENTS.md): +- `vsextension/`, `pycharm_plugin/`, `src/projspec/qtapp/` +- HTML output (`html.py`, `data_html.py`) +- Remote filesystem support (fsspec S3/GCS/HTTP) — Rust version is local-only + +--- + +## How to regenerate + +### Prerequisites + +```bash +# Install Rust (only needed once per machine) +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y +. "$HOME/.cargo/env" +``` + +### Steps + +1. **Read AGENTS.md** in the repo root. It describes the Python architecture + that this Rust code mirrors. Pay particular attention to: + - The three class families (Project, ProjectSpec, BaseContent/BaseArtifact) + - The `parse()` contract + - The grouping convention in `_contents` / `_artifacts` + +2. 
**Read all Python source files** (in order): + ``` + src/projspec/__init__.py + src/projspec/proj/__init__.py # list of all concrete spec classes + src/projspec/proj/base.py # Project, ProjectSpec, ProjectExtra + src/projspec/proj/*.py # every concrete spec + src/projspec/content/__init__.py + src/projspec/content/*.py + src/projspec/artifact/__init__.py + src/projspec/artifact/*.py + src/projspec/__main__.py # CLI commands + src/projspec/library.py + src/projspec/config.py + src/projspec/tools.py + ``` + +3. **Re-create the Rust source files** following the module layout below. + Use this CODEGEN.md (especially the Decisions section) to avoid repeating + solved problems. + +4. **Compile and test**: + ```bash + cd projspec-rs + cargo build + ./target/debug/projspec scan /data + ./target/debug/projspec scan /data --json | python3 -c "import json,sys; d=json.load(sys.stdin); print(list(d['specs'].keys()))" + ./target/debug/projspec scan /data --walk --summary + ``` + +--- + +## Module layout + +``` +projspec-rs/src/ + main.rs Module declarations + fn main() → cli::run() + types.rs Stack, Precision, Architecture enums + content.rs Content enum (one variant per BaseContent subclass) + artifact.rs Artifact enum + MakeResult + FileArtifact / Process / etc. + spec.rs ParseCtx, SpecResult, all_parsers() + one fn per spec + project.rs Project struct, resolve(), to_json(), from_json() + library.rs ProjectLibrary (load/save/add/delete/filter) + config.rs Config (load/save/get/set/unset/defaults_table) + create.rs CreateSpec, all_creators() + one fn per spec type + cli.rs clap CLI; dispatches to the above modules +``` + +--- + +## Key design decisions + +### D1: Flat enum for Content and Artifact, not trait objects + +**Decision**: `Content` and `Artifact` are each a single Rust enum with one +variant per concrete Python subclass. No `Box` or trait objects. + +**Rationale**: Trait objects would require `dyn` + boxing everywhere and make +JSON serialisation harder. 
Enums are exhaustive-matchable, `Clone`able, and
+`Serialize`/`Deserialize`-able trivially.
+
+**Consequence**: Every time a new Python subclass is added, a new variant must
+be added to the enum. This is intentional — the compiler enforces
+completeness.
+
+### D2: SpecResult + ParseCtx pattern instead of struct per spec
+
+**Decision**: Each spec is a free function
+`parse_<spec>(ctx: &ParseCtx) -> Option<SpecResult>`.
+There is no per-spec struct.
+
+**Rationale**: The Python `ProjectSpec` subclass pattern (class + `match()` +
+`parse()` methods) does not map naturally to Rust without trait objects.
+A function that returns `Option<SpecResult>` captures the same semantics:
+`None` = `match()` returned False, `Some(r)` = successfully parsed.
+
+`ParseCtx` is a borrow of everything a parser needs: `url`, `basenames`,
+`pyproject`. Parsers call `ctx.read_text()`, `ctx.read_toml()`,
+`ctx.read_yaml()` — the same operations as Python's `self.proj.get_file()`.
+
+### D3: `all_parsers()` returns a Vec of `(name, fn)` pairs
+
+**Decision**: Registration is a static `Vec` returned by `all_parsers()`.
+
+**Rationale**: Python uses `__init_subclass__` auto-registration into a
+module-level dict. Rust has no equivalent runtime mechanism. A static Vec
+is simple, ordered, and trivially iterable. Order matters: more-specific
+specs (e.g. `RattlerRecipe` before `CondaRecipe`, `Uv`/`Poetry` before
+`PythonLibrary`) must be listed first.
+
+**Important for regeneration**: When Python adds a new spec class, add a
+corresponding `parse_*` function in `spec.rs` and an entry in `all_parsers()`.
+Check the Python `proj/__init__.py` import order — that determines priority.
+
+### D4: ProjectExtra specs are identified by `is_extra = true` in SpecResult
+
+**Decision**: `SpecResult.is_extra` mirrors Python's `ProjectExtra`. The
+`resolve()` loop merges extras into `proj.contents`/`proj.artifacts` rather
+than storing them in `proj.specs`.
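The D2/D3 shape can be sketched in miniature. This is an illustrative stand-in, not the codebase's actual types: the real `ParseCtx` also carries `url`, `pyproject`, a `Vfs` borrow, and the prefetch caches, and the real `SpecResult` carries contents/artifacts; the two example parsers and their match criteria are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real ParseCtx.
struct ParseCtx {
    basenames: HashMap<String, String>,
}

impl ParseCtx {
    fn has(&self, name: &str) -> bool {
        self.basenames.contains_key(name)
    }
}

/// Simplified stand-in for the real SpecResult.
#[derive(Debug)]
struct SpecResult {
    is_extra: bool,
}

type ParserFn = fn(&ParseCtx) -> Option<SpecResult>;

// None = Python match() returned False; Some(r) = successfully parsed.
fn parse_uv(ctx: &ParseCtx) -> Option<SpecResult> {
    if !ctx.has("uv.lock") {
        return None;
    }
    Some(SpecResult { is_extra: false })
}

fn parse_git(ctx: &ParseCtx) -> Option<SpecResult> {
    ctx.has(".git").then(|| SpecResult { is_extra: true })
}

/// Static registry; Vec order encodes priority (most specific first).
fn all_parsers() -> Vec<(&'static str, ParserFn)> {
    vec![("uv", parse_uv), ("git", parse_git)]
}

fn main() {
    let mut basenames = HashMap::new();
    basenames.insert("uv.lock".to_string(), "uv.lock".to_string());
    let ctx = ParseCtx { basenames };
    for (name, parser) in all_parsers() {
        if let Some(r) = parser(&ctx) {
            println!("{name} matched (is_extra={})", r.is_extra);
        }
    }
}
```

Adding a new spec means adding one function and one registry entry, which is exactly the maintenance rule D3 states.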
+ +### D5: TOML → JSON conversion via `toml_to_json()` + +**Decision**: `pyproject.toml` is parsed with the `toml` crate, then converted +to `serde_json::Value` via a recursive `toml_to_json()` helper. + +**Rationale**: All parsers use `serde_json::Value` as the common interchange +type. This avoids having both `toml::Value` and `serde_json::Value` in scope. +The conversion is lossless for all types relevant to pyproject.toml. + +### D6: Jinja stripping for conda YAML files + +**Decision**: `strip_jinja()` removes lines containing `{%…%}` and strips +selector comments like `# [linux]`. Template variables (`{{…}}`) are left +as-is if jinja2-style rendering would be needed. + +**Rationale**: The Python reference uses `_yaml_no_jinja()` with similar logic. +The Rust version is simpler — it does not attempt to evaluate Jinja templates, +it only removes control-flow lines so the YAML is parseable. + +**Consequence**: Recipes that heavily use Jinja2 set-expressions will parse +with placeholder strings instead of resolved values. This is acceptable for +introspection purposes. + +### D7: to_json() is custom, from_json() is partial + +**Decision**: `Project::to_json()` builds a `serde_json::Value` manually +rather than using `serde::Serialize` on the struct. + +**Rationale**: The `Content` and `Artifact` enums use `#[serde(tag = ...)]` +which requires a `klass_name` field that is not present in the struct. +Using `serde_json::to_value()` directly on the top-level `Project` produces +`{}` due to the internally-tagged enum issue. The manual approach is explicit +and produces a clean, predictable JSON shape. + +`from_json()` only reconstructs `path`, `url`, `specs` (name + spec_doc only), +and `children` — enough for library listing and filtering. +Contents/artifacts are not round-tripped from JSON at this time; they are +re-parsed on demand by calling `Project::new()`. 
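The D6 jinja-stripping idea can be sketched with plain `std` string handling. This is a simplified illustration of the approach, not the codebase's actual `strip_jinja()` (which also uses the `regex` crate); the exact selector-comment handling here is an assumption.

```rust
/// Make conda-style YAML parseable: drop lines containing Jinja
/// control flow ({% ... %}) and cut trailing selector comments
/// like `# [linux]`. Template variables ({{ ... }}) are left as-is,
/// matching decision D6.
fn strip_jinja(text: &str) -> String {
    text.lines()
        .filter(|line| !line.contains("{%"))
        .map(|line| match line.find("# [") {
            Some(i) => line[..i].trim_end().to_string(),
            None => line.to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "{% set version = \"1.0\" %}\npackage:\n  name: demo  # [linux]\n  version: {{ version }}";
    let cleaned = strip_jinja(raw);
    assert_eq!(cleaned, "package:\n  name: demo\n  version: {{ version }}");
    println!("{cleaned}");
}
```

As D6 notes, the `{{ version }}` placeholder survives un-rendered; the output only needs to be valid YAML, not a resolved recipe.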
+ +### D8: serde_yaml for YAML parsing + +**Decision**: `serde_yaml` crate (0.9) is used for all YAML files. + +**Rationale**: Simple API, parses to `serde_json::Value`-compatible structures. +Note: `serde_yaml 0.9` is marked deprecated by its author in favour of +`libyaml-safer`, but remains widely used and stable. Consider migrating to +`serde_yml` (the community fork) on next regeneration if needed. + +### D9: Artifact execution model + +**Decision**: `Artifact::make(cwd, wait)` returns `MakeResult`. + +- `FileArtifact::make()` runs the command, then globs for produced files → + `MakeResult::FilesProduced`. +- `Process::make(wait=true)` runs to completion → `MakeResult::Completed`. +- `Process::make(wait=false)` spawns and forgets (server mode) → + `MakeResult::ProcessSpawned { pid }`. +- `HelmDeployment::make()` → `MakeResult::Deployed`. + +**Rationale**: The Python version stores a live `subprocess.Popen` handle on +the artifact instance. Rust cannot do this without unsafe lifetime tricks. +Instead, the CLI prints the PID and exits; the user manages the server process +externally. For batch processes, `wait=true` is the default. + +### D10: Library persistence is local-only + +**Decision**: `ProjectLibrary` reads/writes a local JSON file only. No fsspec +remote support. + +**Rationale**: The Python version uses `fsspec.open()` which transparently +supports S3/GCS/etc. The Rust version uses `std::fs` only. Remote library +paths are listed as a future enhancement. + +### D11: `scan_max_size` and content scanning + +**Decision**: The Rust version does **not** implement content scanning +(reading file bytes to detect Marimo, Flask, etc.). Instead, `parse_marimo()` +reads `.py` files directly from disk. + +**Rationale**: The Python version's `scanned_files` dict pre-reads small files +for all spec parsers to share. In Rust, each parser reads what it needs. +The `scan_max_size` config is preserved for compatibility but not enforced +(all files are read on demand). 
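The batch-versus-server split in D9 reduces to `Command::output()` versus `Command::spawn()` in `std::process`. A minimal sketch — the `demo` helper is hypothetical and assumes a Unix `echo` on `PATH`; the real code paths are `run_to_completion()` and `Process::make()` in `artifact.rs`:

```rust
use std::process::Command;

/// wait=true: run to completion and capture output (batch artifacts).
/// wait=false: spawn, report the PID, and return immediately —
/// the caller manages the server process externally, per D9.
fn demo(wait: bool) -> String {
    let mut cmd = Command::new("echo");
    cmd.arg("hello");
    if wait {
        let out = cmd.output().expect("spawn failed");
        format!("completed: {}", String::from_utf8_lossy(&out.stdout).trim())
    } else {
        let child = cmd.spawn().expect("spawn failed");
        // Dropping the Child handle does not kill the process in std;
        // it keeps running after the CLI exits.
        format!("spawned pid {}", child.id())
    }
}

fn main() {
    println!("{}", demo(true)); // completed: hello
    println!("{}", demo(false));
}
```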
+
+---
+
+## What changed from Python → Rust
+
+| Python | Rust |
+|---|---|
+| `__init_subclass__` auto-registration | Static `all_parsers()` Vec |
+| `ProjectSpec` class per spec type | Free function `parse_<spec>()` |
+| `AttrDict` | `HashMap` |
+| `ProjectExtra` subclass | `SpecResult { is_extra: true }` |
+| `fsspec` remote FS | `opendal::blocking::Operator` via `Vfs` struct in `fs.rs` |
+| `scanned_files` pre-read cache | On-demand file reads per parser |
+| `to_dict(compact=False)` round-trip | Partial: `to_json()` + `from_json()` |
+| HTML output | Not implemented |
+| `Project.make(qname)` | `Project::find_artifact(qname)` + `art.make()` |
+| `pydoc.doc(cls)` in `info` command | JSON class listing |
+
+---
+
+## Known gaps / future work
+
+1. **Content/artifact round-trip from JSON** — `from_json()` currently drops
+   contents and artifacts when loading from library. The JSON is saved
+   correctly; add `Content`/`Artifact` deserialisers to restore fully.
+
+2. **Remote filesystem** — Implemented via `opendal` (see D-FS1–D-FS5 below).
+   S3 (including moto/minio), HTTP, local Fs, and memory backends are supported.
+   GCS / Azure / HDFS are available via opendal features but not yet wired up in
+   `vfs_from_url()`.
+
+3. **UvScript spec** — Inline `# /// script` metadata in `.py` files requires
+   content scanning. Currently not implemented; add it alongside D11 fix.
+
+4. **Django app detection** — Full detection requires walking `*/settings.py`
+   + `*/urls.py`. Current implementation matches `manage.py` but does not find
+   app directories.
+
+5. **Pixi lock file parsing** — The Rust version reads lock-file presence but
+   does not parse individual environment packages from `pixi.lock` (YAML).
+   Add this to get full environment content from locked pixi projects.
+
+6. **`briefcase` multi-app / platform support** — Python version iterates all
+   apps × platforms. Rust version only adds a linux-deb artifact as a stub.
+
+7.
**`serde_yaml` migration** — When the `serde_yaml` 0.9 deprecation becomes + a practical issue, migrate to `serde_yml` (drop-in fork). + +--- + +## Regeneration checklist + +When the Python reference changes, run through this list: + +- [ ] Re-read `src/projspec/proj/__init__.py` — were new spec classes added? + Add corresponding `parse_*` functions in `spec.rs` and entries in + `all_parsers()`. +- [ ] Re-read changed `proj/*.py` files — did `match()` criteria change? + Update the corresponding `parse_*` function. +- [ ] Re-read `content/*.py` — were new content fields or classes added? + Update `Content` enum variants and their `summary()` arms. +- [ ] Re-read `artifact/*.py` — were new artifact types added? + Update `Artifact` enum and execution logic. +- [ ] Re-read `__main__.py` — were new CLI commands or options added? + Update `cli.rs`. +- [ ] Re-read `config.py` — were new config keys added? + Update `Config` struct and `defaults_table()`. +- [ ] **Prefetch maintenance** (see D-PF4): + - New `ctx.vfs_exists("sub/path")` in spec.rs? → add to `subpaths_to_prefetch()`. + - New parser reads a file with an extension not in `PREFETCH_EXTS`? → add extension. + - New `ctx.read_text("name.yaml")` etc.? → no action needed (covered by extension rule). +- [ ] Compile: `cargo build` +- [ ] Run tests: `cargo test --test test_memory && cargo test --test test_http && cargo test --test test_s3 -- --test-threads=1` +- [ ] Smoke-test: `./target/debug/projspec scan /data` and compare output + with Python: `python -m projspec scan /data` +- [ ] Update this CODEGEN.md with any new decisions made. + +--- + +## Remote filesystem with opendal (D-FS1 – D-FS5) + +Added in the second iteration of projspec-rs. + +### D-FS1: `opendal::blocking::Operator` — not async + +opendal's primary API is async (tokio-based). The parsers are CPU-bound string +processing and are synchronous throughout. We use `opendal::blocking::Operator` +which internally calls `block_on()` on a tokio runtime. 
A single global +`tokio::runtime::Runtime` is created via `OnceLock` in `fs.rs:get_runtime()` and +reused for every Vfs operation. + +**Decision**: one global runtime, not per-request. Multiple `Vfs` instances +share the same runtime; this is safe because `blocking::Operator` is `Clone`. + +### D-FS2: `Vfs` is a struct, not a trait + +`Vfs` wraps `opendal::blocking::Operator` directly. No `dyn Vfs` boxing. +`ParseCtx` holds `&'a Vfs` (a borrow), so it is zero-cost. If multiple +heterogeneous backends per parse-pass are ever needed, introduce a trait then. + +### D-FS3: Operator root = project root; paths are relative + +Each `Vfs` is rooted at the project directory (or bucket prefix for S3). +All file reads inside parsers use relative paths (e.g. `"pyproject.toml"`, +not `"/data/pyproject.toml"`). `vfs.basenames()` returns `{basename: basename}` +(the relative path equals the basename at the root level). + +**For S3**: `list_dir("/")` returns object keys relative to the configured root +prefix. No path translation is needed. + +**For HTTP**: `list_dir()` is not supported (opendal Http only provides read+stat). +The `parse_http()` helper in the HTTP test manually constructs the basenames map. +In production `Project::new_with_vfs()` with an HTTP backend will produce an +empty project unless basenames are supplied externally (e.g. from a manifest). + +### D-FS4: HTTP backend limitation — no listing + +The opendal `services::Http` builder supports `read` and `stat` only. +`list_dir()` returns an empty vec for HTTP backends. + +**Consequence**: `Project::new_with_vfs()` produces an empty project when called +with an HTTP `Vfs` and no externally-provided basenames, because `vfs.basenames()` +returns `{}`. + +**Workaround implemented in tests**: `parse_http()` in `tests/test_http.rs` +manually builds the basenames from a known list of filenames, then constructs +a `ParseCtx` directly. 
+
+**Future work**: add a `basenames_override` parameter to `Project::new_with_vfs()`
+so callers can supply a pre-populated basenames map (e.g. from an index file or
+directory manifest).
+
+### D-FS5: S3 credentials from environment variables only
+
+The opendal S3 builder automatically loads credentials from environment:
+`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT_URL`.
+We do **not** call `disable_config_load()` so the full AWS credential chain works.
+
+For moto (test S3), these env vars are set to dummy values before each test.
+The `set_moto_env()` helper uses `unsafe { std::env::set_var(...) }` because
+`set_var` is unsafe in Rust 2024; tests run with `--test-threads=1` to avoid
+data races on env vars.
+
+`AWS_ENDPOINT_URL` is the standard env var for pointing S3 clients at a custom
+endpoint (moto, minio, etc.). opendal picks it up during environment/config
+loading, which stays enabled because `disable_config_load()` is never called.
+
+### D-FS6: `lib.rs` added for integration tests
+
+Because this crate is a `[[bin]]`-only crate, integration tests in `tests/`
+cannot import internal modules. Adding `[lib]` with `name = "projspec_rs"` and
+a `src/lib.rs` that re-exports all modules allows tests to use `projspec_rs::fs`,
+`projspec_rs::project`, etc.
+
+The binary (`main.rs`) continues to declare its own `mod` statements and call
+`cli::run()`; the library crate independently exposes the same modules.
+
+### D-FS7: moto server startup protocol
+
+The Python moto fixture (`tests/fixtures/moto_server.py`):
+1. Starts a werkzeug HTTP server on port 0 (OS-assigned).
+2. Populates a `projspec-test` bucket with fixture files.
+3. **Prints the actual port on stdout** (single line, flushed).
+4. Blocks on `sys.stdin.read()` until stdin closes (Rust drops the child's stdin pipe).
+
+The Rust test reads the first stdout line to get the port. werkzeug request
+logs are suppressed (`logging.getLogger("werkzeug").setLevel(ERROR)`) to prevent
+them from appearing before the port line.
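The D-FS7 handshake can be sketched with `std::process` alone. Here `sh -c "echo 5123; cat"` is a stand-in for the real moto fixture (print the port, then block on stdin), and the port value is made up for the demo:

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Spawn a server-like child that prints its port on the first stdout
/// line, read that line, then close stdin so the child shuts down —
/// mirroring how the Rust S3 tests drive the moto fixture.
fn spawn_and_read_port() -> u16 {
    let mut child = Command::new("sh")
        .args(["-c", "echo 5123; cat"]) // print port, then block on stdin
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("spawn failed");

    let stdout = child.stdout.take().expect("no stdout");
    let mut first_line = String::new();
    BufReader::new(stdout)
        .read_line(&mut first_line)
        .expect("read failed");
    let port: u16 = first_line.trim().parse().expect("not a port");

    // Dropping stdin closes the pipe; the child's blocking read returns
    // and it exits.
    drop(child.stdin.take());
    child.wait().expect("wait failed");
    port
}

fn main() {
    println!("server port: {}", spawn_and_read_port());
}
```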
+
+### Testing commands
+
+```bash
+# Fast: no external processes
+cargo test --test test_memory
+
+# Requires Python + http.server stdlib (built-in)
+cargo test --test test_http
+
+# Requires: pip install moto[server] boto3 flask-cors
+# Run single-threaded to avoid env-var races
+cargo test --test test_s3 -- --test-threads=1
+
+# All tests
+cargo test --test test_memory && \
+cargo test --test test_http && \
+cargo test --test test_s3 -- --test-threads=1
+```
+
+---
+
+## Concurrent file prefetch (D-PF1 – D-PF4)
+
+Added in the third iteration of projspec-rs.
+
+### Motivation
+
+Each call to `ctx.read_text()` / `ctx.read_yaml()` / `ctx.read_toml()` inside a
+parser is a synchronous VFS operation. For local `Fs` this costs microseconds.
+For HTTP and S3 each call is a network round-trip (1–100 ms). With ~25 files
+potentially read across all parsers, sequential access adds up to seconds on
+remote backends.
+
+### D-PF1: Pre-fetch all candidate files in parallel before parsers run
+
+In `project.rs::resolve()`, after `vfs.basenames()` returns and `pyproject.toml`
+is parsed, two parallel operations are launched via `rayon::par_iter`:
+
+1. **File reads** — `files_to_prefetch()` returns the static list of all
+   filenames any parser might read, intersected with the files actually present
+   in `basenames`. Each present file is read once concurrently. Result:
+   `file_cache: HashMap<String, String>` (relative path → file contents).
+
+2. **Existence checks** — `subpaths_to_prefetch()` returns sub-paths below the
+   root that parsers check via `ctx.vfs_exists()` (e.g. `.vscode/settings.json`,
+   `.idea`). Each is stat'd concurrently. Result:
+   `exists_cache: HashMap<String, bool>`.
+
+Both caches are owned by `resolve()` and passed into `ParseCtx` as borrows.
+
+### D-PF2: `pyproject.toml` is read before prefetch, not during it
+
+`pyproject.toml` must be read before `files_to_prefetch()` is called because
+future enhancements might use its contents to decide which additional files to
+include (e.g.
if `[tool.pixi]` is present, include `pixi.lock`). It is +therefore excluded from the static prefetch list to avoid double-reading. + +### D-PF3: `ParseCtx` cache check order + +`ParseCtx::read_text(name)` now: +1. Checks `file_cache` (HashMap lookup, O(1), zero I/O). +2. On miss: looks up `rel = basenames[name]`, then calls `vfs.read_text(rel)`. + +`ParseCtx::vfs_exists(path)` now: +1. Checks `exists_cache`. +2. On miss: calls `vfs.exists(path)` live. + +`parse_marimo` — the only content-scanning parser — calls `ctx.read_text_path(rel)` +which also checks `file_cache` keyed by the relative path. Since `.py` files are +included in the dynamic part of `files_to_prefetch()`, marimo scanning is always +cache-served after the first project instantiation. + +### D-PF4: Maintenance rules — what requires manual upkeep + +**File contents (`files_to_prefetch`)** — **no manual upkeep required** for +new parsers, provided they read files of a recognised type. The function +prefetches every file in `basenames` whose extension (or name) matches a fixed +set of metadata formats: + +| Extension / name | Examples | +|---|---| +| `.toml` | pixi.toml, Cargo.toml, book.toml, pyscript.toml | +| `.yaml` / `.yml` | Chart.yaml, .readthedocs.yaml, environment.yml | +| `.json` | package.json, datapackage.json, .zenodo.json | +| `.txt` | requirements.txt, LICENSE.txt | +| `.md` | README.md, CITATION.md | +| `.lock` | uv.lock, poetry.lock, pixi.lock | +| `.cff` | CITATION.cff | +| `.py` | marimo content-scan, pyscript | +| `.mod` | go.mod | +| `MLFlow`, `Dockerfile` | exact-name match | +| `LICENSE*`, `LICENCE*`, `COPYING*` | prefix match | + +`pyproject.toml` is always excluded — it is read before prefetch. + +If a future parser reads a file with an extension **not** in this list (e.g. +`.rb`, `.gradle`), add that extension to `PREFETCH_EXTS` in +`project.rs::files_to_prefetch()`. That is the only maintenance needed. + +**File listing (`basenames`)** — **no upkeep needed at all**. 
`basenames` is +built by a single `vfs.basenames()` call at the start of `resolve()` and is +available to all parsers as a free HashMap lookup via `ctx.has()` / +`ctx.has_any()`. It already covers the full root listing. + +**Sub-path existence (`subpaths_to_prefetch`)** — **manual upkeep required**. +These are paths *inside* sub-directories (e.g. `.vscode/settings.json`) that +cannot be inferred from the root listing. Add an entry here whenever a new +`ctx.vfs_exists("some/sub/path")` call is introduced in spec.rs. + +### Performance impact + +| Backend | Before | After | +|---|---|---| +| Local `Fs` | ~0.5 ms (25 sequential syscalls) | ~0.1 ms (parallel, OS cache) | +| Memory | ~0 ms | ~0 ms (no change) | +| HTTP (100 ms RTT) | ~2.5 s (25 × 100 ms) | ~100 ms (parallel) | +| S3 (20 ms RTT) | ~500 ms (25 × 20 ms) | ~20 ms (parallel) | + +Actual numbers depend on backend latency, connection pool size, and how many +files are actually present. For a minimal Python project (pyproject.toml + uv.lock ++ README.md) the effective file count is ~3, so the improvement is ~3× even +without parallelism. 
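The D-PF1 fan-out can be approximated with std scoped threads. The codebase itself uses rayon's `par_iter`; the `prefetch` and `fetch` names below are illustrative stand-ins for the real `files_to_prefetch()` result and a `Vfs::read_text` round-trip:

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Read every candidate file concurrently and collect the results
/// into a file cache, as resolve() does before running the parsers.
/// `fetch` stands in for a (potentially slow) remote read.
fn prefetch(names: &[&str], fetch: impl Fn(&str) -> String + Sync) -> HashMap<String, String> {
    let cache = Mutex::new(HashMap::new());
    thread::scope(|s| {
        for &name in names {
            let cache = &cache;
            let fetch = &fetch;
            // One thread per file; all in flight at once, so total
            // wall time is ~one round-trip instead of N.
            s.spawn(move || {
                let body = fetch(name);
                cache.lock().unwrap().insert(name.to_string(), body);
            });
        }
    });
    cache.into_inner().unwrap()
}

fn main() {
    let names = ["README.md", "uv.lock", "environment.yml"];
    let cache = prefetch(&names, |n| format!("contents of {n}"));
    assert_eq!(cache.len(), 3);
    println!("prefetched {} files", cache.len());
}
```

With a thread (or rayon task) per file, total latency collapses from N round-trips to roughly one, which is the effect the performance table above reports for HTTP and S3.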
diff --git a/projspec-rs/Cargo.toml b/projspec-rs/Cargo.toml new file mode 100644 index 0000000..c2a6fc4 --- /dev/null +++ b/projspec-rs/Cargo.toml @@ -0,0 +1,61 @@ +[package] +name = "projspec-rs" +version = "0.1.0" +edition = "2024" +description = "Rust port of the projspec project-introspection library and CLI" +authors = ["projspec contributors"] + +[[bin]] +name = "projspec" +path = "src/main.rs" + +[lib] +name = "projspec_rs" +path = "src/lib.rs" + +[dependencies] +# CLI +clap = { version = "4", features = ["derive", "env"] } + +# Serialisation +serde = { version = "1", features = ["derive"] } +serde_json = "1" + +# TOML parsing +toml = "0.8" + +# YAML parsing +serde_yaml = "0.9" + +# Regex +regex = "1" + +# Filesystem helpers (used by artifact/create/library modules only) +walkdir = "2" +glob = "0.3" + +# Error handling +anyhow = "1" +thiserror = "2" + +# Home dir (~/.config) +dirs = "6" + +# Virtual filesystem — opendal blocking API + backends we support +opendal = { version = "0.55", features = [ + "blocking", + "services-fs", + "services-memory", + "services-http", + "services-s3", +] } + +# Async runtime required by opendal's blocking wrapper +tokio = { version = "1", features = ["rt-multi-thread"] } + +# Parallel prefetch of project files +rayon = "1" + +[dev-dependencies] +tempfile = "3" +tokio = { version = "1", features = ["rt-multi-thread", "macros"] } diff --git a/projspec-rs/src/artifact.rs b/projspec-rs/src/artifact.rs new file mode 100644 index 0000000..57c3408 --- /dev/null +++ b/projspec-rs/src/artifact.rs @@ -0,0 +1,473 @@ +/// Artifact types — executable actions or producible outputs. +/// Each variant maps to a Python BaseArtifact subclass. +/// +/// Unlike the Python version which holds a live subprocess handle, the Rust +/// version models the *description* (cmd, fn_glob, etc.) separately from +/// the *execution result* (MakeResult). Execution is done by `make()`. 
+
+use std::collections::HashMap;
+use std::process::{Command as StdCommand, Stdio};
+use anyhow::{Result, Context};
+use serde::{Deserialize, Serialize};
+use crate::types::Architecture;
+
+// ---------------------------------------------------------------------------
+// Make result — what make() returns to the caller
+// ---------------------------------------------------------------------------
+
+/// The outcome of executing an artifact.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub enum MakeResult {
+    /// A long-running process was spawned; its PID is returned.
+    /// The process is left running.
+    ProcessSpawned { pid: u32, cmd: Vec<String> },
+    /// A file artifact was produced; here are the paths.
+    FilesProduced(Vec<String>),
+    /// The artifact ran to completion (process exited 0) but produces no files.
+    Completed { cmd: Vec<String>, stdout: String, stderr: String },
+    /// Deployment-style action (e.g. helm upgrade).
+    Deployed { release: String },
+}
+
+impl std::fmt::Display for MakeResult {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            MakeResult::ProcessSpawned { pid, cmd } => {
+                write!(f, "Process spawned (pid={pid}): {}", cmd.join(" "))
+            }
+            MakeResult::FilesProduced(files) => {
+                write!(f, "Files produced: {}", files.join(", "))
+            }
+            MakeResult::Completed { cmd, stdout, stderr } => {
+                write!(f, "Completed: {}\n{}{}", cmd.join(" "), stdout, stderr)
+            }
+            MakeResult::Deployed { release } => {
+                write!(f, "Deployed release: {release}")
+            }
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Artifact state
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "lowercase")]
+pub enum ArtifactState {
+    Clean,
+    Done,
+    Pending,
+    Unknown,
+}
+
+impl std::fmt::Display for ArtifactState {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self
{
+            ArtifactState::Clean => write!(f, "clean"),
+            ArtifactState::Done => write!(f, "done"),
+            ArtifactState::Pending => write!(f, "pending"),
+            ArtifactState::Unknown => write!(f, ""),
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Individual artifact kinds
+// ---------------------------------------------------------------------------
+
+/// Common fields shared by all artifacts.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ArtifactBase {
+    /// The command to run to produce/launch this artifact.
+    pub cmd: Vec<String>,
+}
+
+/// A `FileArtifact` — output is one or more files matched by a glob.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct FileArtifact {
+    #[serde(flatten)]
+    pub base: ArtifactBase,
+    /// Glob pattern for the expected output file(s).
+    pub fn_glob: String,
+}
+
+impl FileArtifact {
+    pub fn state(&self) -> ArtifactState {
+        let matches = glob::glob(&self.fn_glob)
+            .map(|paths| paths.filter_map(|p| p.ok()).count())
+            .unwrap_or(0);
+        if matches > 0 {
+            ArtifactState::Done
+        } else {
+            ArtifactState::Clean
+        }
+    }
+
+    pub fn make(&self, cwd: &str) -> Result<MakeResult> {
+        run_to_completion(&self.base.cmd, cwd)?;
+        // Re-glob to find what was produced
+        let files: Vec<String> = glob::glob(&self.fn_glob)
+            .context("glob error")?
+            .filter_map(|p| p.ok())
+            .map(|p| p.to_string_lossy().to_string())
+            .collect();
+        Ok(MakeResult::FilesProduced(files))
+    }
+
+    pub fn clean(&self) -> Result<()> {
+        let files: Vec<_> = glob::glob(&self.fn_glob)
+            .context("glob error")?
+            .filter_map(|p| p.ok())
+            .collect();
+        for f in files {
+            std::fs::remove_file(&f)
+                .with_context(|| format!("removing {}", f.display()))?;
+        }
+        Ok(())
+    }
+}
+
+/// A `Process` — a subprocess (batch or long-running).
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Process {
+    #[serde(flatten)]
+    pub base: ArtifactBase,
+    /// If true, this is a server that should stay running.
+    #[serde(default)]
+    pub server: bool,
+    /// Optional port argument name (e.g. "--server.port")
+    #[serde(default)]
+    pub port_arg: Option<String>,
+    /// Optional address argument name (e.g. "--server.address")
+    #[serde(default)]
+    pub address_arg: Option<String>,
+}
+
+impl Process {
+    pub fn make(&self, cwd: &str, port: Option<u16>, address: Option<&str>, wait: bool) -> Result<MakeResult> {
+        let mut cmd = self.base.cmd.clone();
+        if let (Some(port), Some(arg)) = (port, &self.port_arg) {
+            cmd.push(arg.clone());
+            cmd.push(port.to_string());
+        }
+        if let (Some(addr), Some(arg)) = (address, &self.address_arg) {
+            cmd.push(arg.clone());
+            cmd.push(addr.to_string());
+        }
+
+        if self.server || !wait {
+            // spawn and leave running
+            let child = StdCommand::new(&cmd[0])
+                .args(&cmd[1..])
+                .current_dir(cwd)
+                .spawn()
+                .with_context(|| format!("spawning {}", cmd.join(" ")))?;
+            let pid = child.id();
+            // leak the child so process keeps running
+            std::mem::forget(child);
+            Ok(MakeResult::ProcessSpawned { pid, cmd })
+        } else {
+            run_to_completion(&cmd, cwd)
+        }
+    }
+}
+
+/// A `LockFile` — a file artifact where the output is a lock file.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct LockFile {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+}
+
+/// A `VirtualEnv` — a Python virtual environment directory.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct VirtualEnv {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+}
+
+/// A `CondaEnv` — a conda environment directory.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct CondaEnv {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+}
+
+/// An `EnvPack` — a packed environment archive.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct EnvPack {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+}
+
+/// A Python wheel.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Wheel {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+}
+
+/// A conda package.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct CondaPackage {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+    #[serde(default)]
+    pub name: Option<String>,
+}
+
+/// A system-installable package (deb, rpm, msi, dmg, …).
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SystemInstallablePackage {
+    #[serde(flatten)]
+    pub file: FileArtifact,
+    pub arch: Architecture,
+    pub filetype: String,
+}
+
+/// A Docker image.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DockerImage {
+    pub cmd: Vec<String>,
+    #[serde(default)]
+    pub tag: Option<String>,
+}
+
+impl DockerImage {
+    pub fn new(tag: Option<String>) -> Self {
+        let cmd = if let Some(ref t) = tag {
+            vec!["docker".into(), "build".into(), ".".into(), "-t".into(), t.clone()]
+        } else {
+            vec!["docker".into(), "build".into(), ".".into()]
+        };
+        DockerImage { cmd, tag }
+    }
+
+    pub fn make(&self, cwd: &str) -> Result<MakeResult> {
+        run_to_completion(&self.cmd, cwd)
+    }
+}
+
+/// A Docker runtime (container running from an image).
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DockerRuntime {
+    pub image: DockerImage,
+}
+
+/// A Helm deployment.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct HelmDeployment {
+    pub release: String,
+    pub cmd: Vec<String>,
+    pub clean_cmd: Vec<String>,
+}
+
+impl HelmDeployment {
+    pub fn new(release: &str) -> Self {
+        HelmDeployment {
+            release: release.to_string(),
+            cmd: vec!["helm".into(), "upgrade".into(), "--install".into(), release.into(), ".".into()],
+            clean_cmd: vec!["helm".into(), "uninstall".into(), release.into()],
+        }
+    }
+
+    pub fn state(&self) -> ArtifactState {
+        let status = StdCommand::new("helm")
+            .args(["status", &self.release])
+            .output();
+        match status {
+            Ok(out) if out.status.success() => ArtifactState::Done,
+            _ => ArtifactState::Clean,
+        }
+    }
+
+    pub fn make(&self, cwd: &str) -> Result<MakeResult> {
+        run_to_completion(&self.cmd, cwd)?;
+        Ok(MakeResult::Deployed { release: self.release.clone() })
+    }
+
+    pub fn clean(&self, cwd: &str) -> Result<()> {
+        run_to_completion(&self.clean_cmd, cwd)?;
+        Ok(())
+    }
+}
+
+/// A `PreCommit` artifact.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct PreCommit {
+    pub cmd: Vec<String>,
+}
+
+impl Default for PreCommit {
+    fn default() -> Self {
+        PreCommit { cmd: vec!["pre-commit".into(), "run".into(), "-a".into()] }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// The main Artifact enum
+// ---------------------------------------------------------------------------
+
+/// Named group of artifacts of the same kind, keyed by label (e.g. "debug"/"release").
+pub type ArtifactGroup = HashMap<String, Artifact>;
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "klass_name", rename_all = "snake_case")]
+pub enum Artifact {
+    FileArtifact(FileArtifact),
+    Process(Process),
+    LockFile(LockFile),
+    VirtualEnv(VirtualEnv),
+    CondaEnv(CondaEnv),
+    EnvPack(EnvPack),
+    Wheel(Wheel),
+    CondaPackage(CondaPackage),
+    SystemInstallablePackage(SystemInstallablePackage),
+    DockerImage(DockerImage),
+    DockerRuntime(DockerRuntime),
+    HelmDeployment(HelmDeployment),
+    PreCommit(PreCommit),
+    /// Multiple artifacts of the same kind, keyed by a label.
+    Group(ArtifactGroup),
+}
+
+impl Artifact {
+    pub fn summary(&self) -> String {
+        match self {
+            Artifact::FileArtifact(fa) => {
+                format!("FileArtifact({}, {})", fa.base.cmd.join(" "), fa.state())
+            }
+            Artifact::Process(p) => format!("Process({})", p.base.cmd.join(" ")),
+            Artifact::LockFile(lf) => {
+                format!("LockFile({}, {})", lf.file.base.cmd.join(" "), lf.file.state())
+            }
+            Artifact::VirtualEnv(ve) => {
+                format!("VirtualEnv({}, {})", ve.file.base.cmd.join(" "), ve.file.state())
+            }
+            Artifact::CondaEnv(ce) => {
+                format!("CondaEnv({}, {})", ce.file.base.cmd.join(" "), ce.file.state())
+            }
+            Artifact::EnvPack(ep) => {
+                format!("EnvPack({}, {})", ep.file.base.cmd.join(" "), ep.file.state())
+            }
+            Artifact::Wheel(w) => {
+                format!("Wheel({}, {})", w.file.base.cmd.join(" "), w.file.state())
+            }
+            Artifact::CondaPackage(cp) => {
+                format!("CondaPackage({}, {})", cp.file.base.cmd.join(" "), cp.file.state())
+            }
+            Artifact::SystemInstallablePackage(sip) => {
+                format!("SystemInstallablePackage({}, {})", sip.file.base.cmd.join(" "), sip.arch)
+            }
+            Artifact::DockerImage(di) => format!("DockerImage({})", di.cmd.join(" ")),
+            Artifact::DockerRuntime(_) => "DockerRuntime".to_string(),
+            Artifact::HelmDeployment(hd) => format!("HelmDeployment({})", hd.release),
+            Artifact::PreCommit(pc) => format!("PreCommit({})", pc.cmd.join(" ")),
+            Artifact::Group(g) => {
+                let entries: Vec<_> =
+                    g.iter().map(|(k, v)| format!("{k}: {}", v.summary())).collect();
+                format!("{{{}}}", entries.join(", "))
+            }
+        }
+    }
+
+    /// Execute this artifact.
+    /// `wait`: for Process artifacts, whether to wait for completion.
+    pub fn make(&self, cwd: &str, wait: bool) -> Result<MakeResult> {
+        match self {
+            Artifact::FileArtifact(fa) => fa.make(cwd),
+            Artifact::Process(p) => p.make(cwd, None, None, wait),
+            Artifact::LockFile(lf) => lf.file.make(cwd),
+            Artifact::VirtualEnv(ve) => ve.file.make(cwd),
+            Artifact::CondaEnv(ce) => ce.file.make(cwd),
+            Artifact::EnvPack(ep) => ep.file.make(cwd),
+            Artifact::Wheel(w) => w.file.make(cwd),
+            Artifact::CondaPackage(cp) => cp.file.make(cwd),
+            Artifact::SystemInstallablePackage(sip) => sip.file.make(cwd),
+            Artifact::DockerImage(di) => di.make(cwd),
+            Artifact::DockerRuntime(dr) => dr.image.make(cwd),
+            Artifact::HelmDeployment(hd) => hd.make(cwd),
+            Artifact::PreCommit(pc) => run_to_completion(&pc.cmd, cwd),
+            Artifact::Group(g) => {
+                // make the first entry in the group
+                if let Some((_, first)) = g.iter().next() {
+                    first.make(cwd, wait)
+                } else {
+                    anyhow::bail!("empty artifact group")
+                }
+            }
+        }
+    }
+
+    /// Return the state of this artifact.
+    pub fn state(&self) -> ArtifactState {
+        match self {
+            Artifact::FileArtifact(fa) => fa.state(),
+            Artifact::LockFile(lf) => lf.file.state(),
+            Artifact::VirtualEnv(ve) => ve.file.state(),
+            Artifact::CondaEnv(ce) => ce.file.state(),
+            Artifact::EnvPack(ep) => ep.file.state(),
+            Artifact::Wheel(w) => w.file.state(),
+            Artifact::CondaPackage(cp) => cp.file.state(),
+            Artifact::SystemInstallablePackage(sip) => sip.file.state(),
+            Artifact::HelmDeployment(hd) => hd.state(),
+            _ => ArtifactState::Unknown,
+        }
+    }
+
+    /// Remove / stop this artifact.
+    pub fn clean(&self, cwd: &str) -> Result<()> {
+        match self {
+            Artifact::FileArtifact(fa) => fa.clean(),
+            Artifact::LockFile(lf) => lf.file.clean(),
+            Artifact::VirtualEnv(ve) => ve.file.clean(),
+            Artifact::CondaEnv(ce) => ce.file.clean(),
+            Artifact::EnvPack(ep) => ep.file.clean(),
+            Artifact::Wheel(w) => w.file.clean(),
+            Artifact::CondaPackage(cp) => cp.file.clean(),
+            Artifact::SystemInstallablePackage(sip) => sip.file.clean(),
+            Artifact::HelmDeployment(hd) => hd.clean(cwd),
+            _ => Ok(()),
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+/// Run a command synchronously, return stdout+stderr as MakeResult::Completed.
+pub fn run_to_completion(cmd: &[String], cwd: &str) -> Result<MakeResult> {
+    anyhow::ensure!(!cmd.is_empty(), "empty command");
+    let output = StdCommand::new(&cmd[0])
+        .args(&cmd[1..])
+        .current_dir(cwd)
+        .stdout(Stdio::piped())
+        .stderr(Stdio::piped())
+        .output()
+        .with_context(|| format!("running {}", cmd.join(" ")))?;
+
+    let stdout = String::from_utf8_lossy(&output.stdout).to_string();
+    let stderr = String::from_utf8_lossy(&output.stderr).to_string();
+
+    if !output.status.success() {
+        anyhow::bail!(
+            "command `{}` failed (exit {})\nstdout: {stdout}\nstderr: {stderr}",
+            cmd.join(" "),
+            output.status
+        );
+    }
+    Ok(MakeResult::Completed { cmd: cmd.to_vec(), stdout, stderr })
+}
+
+/// Resolve glob and return existing paths (for FileArtifact state checks).
+pub fn glob_paths(pattern: &str) -> Vec<String> {
+    glob::glob(pattern)
+        .map(|paths| {
+            paths
+                .filter_map(|p| p.ok())
+                .map(|p| p.to_string_lossy().to_string())
+                .collect()
+        })
+        .unwrap_or_default()
+}
diff --git a/projspec-rs/src/cli.rs b/projspec-rs/src/cli.rs
new file mode 100644
index 0000000..f13ce10
--- /dev/null
+++ b/projspec-rs/src/cli.rs
@@ -0,0 +1,393 @@
+/// cli.rs — clap-based CLI mirroring projspec.__main__.
+/// Commands: scan, make, create, info, version, library (list/clear/delete/add), config (get/set/unset/show/defaults) + +use clap::{Parser, Subcommand, Args}; +use anyhow::Result; + +use crate::config::Config; +use crate::create::all_creators; +use crate::library::ProjectLibrary; +use crate::project::Project; +use crate::spec::all_parsers; + +// --------------------------------------------------------------------------- +// Top-level CLI +// --------------------------------------------------------------------------- + +#[derive(Parser)] +#[command( + name = "projspec", + about = "Project introspection and management tool", + version = env!("CARGO_PKG_VERSION"), +)] +struct Cli { + #[command(subcommand)] + command: Commands, +} + +#[derive(Subcommand)] +enum Commands { + /// Scan a directory for project types and display results + Scan(ScanArgs), + /// Execute an artifact in a project + Make(MakeArgs), + /// Create a new project of the given type + Create(CreateArgs), + /// Display information about known spec/content/artifact types + Info(InfoArgs), + /// Print version + Version, + /// Interact with the project library + #[command(subcommand)] + Library(LibraryCommands), + /// Interact with projspec configuration + #[command(subcommand)] + Config(ConfigCommands), +} + +// --------------------------------------------------------------------------- +// Scan +// --------------------------------------------------------------------------- + +#[derive(Args)] +struct ScanArgs { + /// Path to scan (default: current directory) + #[arg(default_value = ".")] + path: String, + + /// Only scan for these spec types (comma-separated, camel or snake case) + #[arg(long, default_value = "")] + types: String, + + /// Exclude these spec types (comma-separated) + #[arg(long, default_value = "")] + xtypes: String, + + /// Descend into all child directories + #[arg(long)] + walk: bool, + + /// Output abbreviated summary + #[arg(long)] + summary: bool, + + /// Output JSON + #[arg(long)] 
+    json: bool,
+
+    /// Add to library after scanning
+    #[arg(long)]
+    library: bool,
+}
+
+fn parse_types(s: &str) -> Option<Vec<String>> {
+    if s.is_empty() || s == "ALL" {
+        None
+    } else {
+        Some(s.split(',').map(|t| t.trim().to_string()).collect())
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Make
+// ---------------------------------------------------------------------------
+
+#[derive(Args)]
+struct MakeArgs {
+    /// Artifact name: [spec.]type[.name]
+    artifact: String,
+
+    /// Path to the project (default: current directory)
+    #[arg(default_value = ".")]
+    path: String,
+
+    /// For Process artifacts: wait for completion (default true)
+    #[arg(long, default_value_t = true)]
+    wait: bool,
+
+    /// Only scan for these spec types
+    #[arg(long, default_value = "")]
+    types: String,
+
+    /// Exclude these spec types
+    #[arg(long, default_value = "")]
+    xtypes: String,
+}
+
+// ---------------------------------------------------------------------------
+// Create
+// ---------------------------------------------------------------------------
+
+#[derive(Args)]
+struct CreateArgs {
+    /// Spec type to create (snake_case)
+    #[arg(name = "type")]
+    spec_type: String,
+
+    /// Target directory (default: current directory)
+    #[arg(default_value = ".")]
+    path: String,
+}
+
+// ---------------------------------------------------------------------------
+// Info
+// ---------------------------------------------------------------------------
+
+#[derive(Args)]
+struct InfoArgs {
+    /// Class name to show docs for; omit to list all
+    #[arg(default_value = "ALL")]
+    name: String,
+}
+
+// ---------------------------------------------------------------------------
+// Library sub-commands
+// ---------------------------------------------------------------------------
+
+#[derive(Subcommand)]
+enum LibraryCommands {
+    /// List all entries in the library
+    List {
+        /// Output as JSON
+        #[arg(long)]
+        json: bool,
+    },
+    /// Clear all entries from the library
+    Clear,
+    /// Delete a specific entry from the library
+    Delete {
+        /// URL of the entry to delete (as shown in `library list`)
+        url: String,
+    },
+    /// Scan a path and add it to the library
+    Add {
+        /// Path to add
+        #[arg(default_value = ".")]
+        path: String,
+        #[arg(long, default_value = "")]
+        types: String,
+        #[arg(long)]
+        walk: bool,
+    },
+}
+
+// ---------------------------------------------------------------------------
+// Config sub-commands
+// ---------------------------------------------------------------------------
+
+#[derive(Subcommand)]
+enum ConfigCommands {
+    /// Get a config value
+    Get { key: String },
+    /// Set a config value
+    Set { key: String, value: String },
+    /// Unset a config value (reset to default)
+    Unset { key: String },
+    /// Show current config
+    Show,
+    /// Show all defaults and their descriptions
+    Defaults,
+}
+
+// ---------------------------------------------------------------------------
+// Entry point
+// ---------------------------------------------------------------------------
+
+pub fn run() {
+    let cli = Cli::parse();
+    if let Err(e) = dispatch(cli) {
+        eprintln!("Error: {e:#}");
+        std::process::exit(1);
+    }
+}
+
+fn dispatch(cli: Cli) -> Result<()> {
+    match cli.command {
+        Commands::Version => {
+            println!("projspec-rs {}", env!("CARGO_PKG_VERSION"));
+        }
+
+        Commands::Scan(args) => cmd_scan(args)?,
+        Commands::Make(args) => cmd_make(args)?,
+        Commands::Create(args) => cmd_create(args)?,
+        Commands::Info(args) => cmd_info(args),
+
+        Commands::Library(sub) => {
+            let cfg = Config::load();
+            let mut lib = ProjectLibrary::load(&cfg);
+            match sub {
+                LibraryCommands::List { json } => {
+                    if json {
+                        let map: serde_json::Map<String, serde_json::Value> = lib.entries.iter()
+                            .map(|(k, v)| (k.clone(), v.to_json()))
+                            .collect();
+                        println!("{}", serde_json::to_string_pretty(&serde_json::Value::Object(map))?);
+                    } else {
+                        let mut urls: Vec<&str> = lib.entries.keys().map(|s| s.as_str()).collect();
+                        urls.sort();
+                        for url in urls {
+ let proj = &lib.entries[url]; + println!("{}", proj.text_summary(true)); + } + } + } + LibraryCommands::Clear => { + lib.clear()?; + eprintln!("Library cleared."); + } + LibraryCommands::Delete { url } => { + lib.delete_entry(&url)?; + eprintln!("Deleted {url}"); + } + LibraryCommands::Add { path, types, walk } => { + let types_list = parse_types(&types); + let proj = Project::new( + &path, + Some(walk), + types_list.as_deref(), + None, + None, + )?; + let url = proj.url.clone(); + lib.add_entry(&url, proj)?; + eprintln!("Added {url} to library."); + } + } + } + + Commands::Config(sub) => { + let mut cfg = Config::load(); + match sub { + ConfigCommands::Get { key } => { + match cfg.get(&key) { + Some(v) => println!("{v}"), + None => { + eprintln!("Unknown key: {key}"); + std::process::exit(1); + } + } + } + ConfigCommands::Set { key, value } => { + cfg.set(&key, &value)?; + cfg.save()?; + eprintln!("Set {key} = {value}"); + } + ConfigCommands::Unset { key } => { + cfg.unset(&key)?; + cfg.save()?; + eprintln!("Unset {key} (reset to default)"); + } + ConfigCommands::Show => { + println!("{}", serde_json::to_string_pretty(&cfg)?); + } + ConfigCommands::Defaults => { + let config_dir = std::env::var("PROJSPEC_CONFIG_DIR").unwrap_or_else(|_| "(unset)".to_string()); + println!("PROJSPEC_CONFIG_DIR: {config_dir}"); + println!(); + for (key, default, doc) in Config::defaults_table() { + println!("{key}: {default} -- {doc}"); + } + } + } + } + } + Ok(()) +} + +// --------------------------------------------------------------------------- +// Command implementations +// --------------------------------------------------------------------------- + +fn cmd_scan(args: ScanArgs) -> Result<()> { + let types_list = parse_types(&args.types); + let xtypes_list = parse_types(&args.xtypes); + + let proj = Project::new( + &args.path, + if args.walk { Some(true) } else { None }, + types_list.as_deref(), + xtypes_list.as_deref(), + None, + )?; + + if args.json { + println!("{}", 
serde_json::to_string_pretty(&proj.to_json())?); + } else if args.summary { + println!("{}", proj.text_summary(false)); + } else { + println!("{}", proj.text_full()); + } + + if args.library { + let cfg = Config::load(); + let mut lib = ProjectLibrary::load(&cfg); + let url = proj.url.clone(); + lib.add_entry(&url, proj)?; + eprintln!("Added to library: {url}"); + } + Ok(()) +} + +fn cmd_make(args: MakeArgs) -> Result<()> { + let types_list = parse_types(&args.types); + let xtypes_list = parse_types(&args.xtypes); + + let proj = Project::new( + &args.path, + None, + types_list.as_deref(), + xtypes_list.as_deref(), + None, + )?; + + let (artifact, cwd) = proj.find_artifact(&args.artifact) + .ok_or_else(|| anyhow::anyhow!("Artifact '{}' not found in project at '{}'", args.artifact, args.path))?; + + let result = artifact.make(cwd, args.wait)?; + println!("{result}"); + Ok(()) +} + +fn cmd_create(args: CreateArgs) -> Result<()> { + std::fs::create_dir_all(&args.path)?; + + let creators = all_creators(); + let creator = creators.iter().find(|c| c.name == args.spec_type) + .ok_or_else(|| { + let names: Vec<&str> = creators.iter().map(|c| c.name).collect(); + anyhow::anyhow!("Unknown spec type '{}'. 
Supported: {}", args.spec_type, names.join(", "))
+        })?;
+
+    let files = (creator.creator)(&args.path)?;
+    for f in &files {
+        println!("{f}");
+    }
+    Ok(())
+}
+
+fn cmd_info(args: InfoArgs) {
+    if args.name == "ALL" {
+        // Print structured JSON of all known types
+        let specs: Vec<serde_json::Value> = all_parsers().iter().map(|(name, _)| {
+            serde_json::json!({"name": name, "category": "spec"})
+        }).collect();
+        let creators: Vec<serde_json::Value> = all_creators().iter().map(|c| {
+            serde_json::json!({"name": c.name, "doc": c.doc, "category": "create"})
+        }).collect();
+        let info = serde_json::json!({
+            "specs": specs,
+            "creators": creators,
+        });
+        println!("{}", serde_json::to_string_pretty(&info).unwrap());
+    } else {
+        // look up by name
+        if let Some((name, _)) = all_parsers().iter().find(|(n, _)| *n == args.name.as_str()) {
+            println!("spec: {name}");
+        } else if let Some(c) = all_creators().iter().find(|c| c.name == args.name.as_str()) {
+            println!("create: {} — {}", c.name, c.doc);
+        } else {
+            eprintln!("Name not found: {}", args.name);
+            std::process::exit(1);
+        }
+    }
+}
diff --git a/projspec-rs/src/config.rs b/projspec-rs/src/config.rs
new file mode 100644
index 0000000..8736f0a
--- /dev/null
+++ b/projspec-rs/src/config.rs
@@ -0,0 +1,157 @@
+/// Config — mirrors projspec.config.
+/// Reads/writes ~/.config/projspec/projspec.json.
+/// Individual values can be overridden by PROJSPEC_ env vars.
+
+use std::path::PathBuf;
+use anyhow::Result;
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Config {
+    #[serde(default = "default_library_path_str")]
+    pub library_path: String,
+    #[serde(default = "default_scan_types")]
+    pub scan_types: Vec<String>,
+    #[serde(default = "default_scan_max_files")]
+    pub scan_max_files: usize,
+    #[serde(default = "default_scan_max_size")]
+    pub scan_max_size: u64,
+    #[serde(default = "default_false")]
+    pub remote_artifact_status: bool,
+    #[serde(default = "default_true")]
+    pub capture_artifact_output: bool,
+}
+
+fn default_config_dir() -> PathBuf {
+    dirs::config_dir()
+        .unwrap_or_else(|| PathBuf::from("."))
+        .join("projspec")
+}
+
+fn default_library_path_str() -> String {
+    default_config_dir()
+        .join("library.json")
+        .to_string_lossy()
+        .to_string()
+}
+
+fn default_scan_types() -> Vec<String> {
+    vec![
+        ".py".into(), ".yaml".into(), ".yml".into(),
+        ".toml".into(), ".json".into(), ".md".into(),
+    ]
+}
+fn default_scan_max_files() -> usize { 100 }
+fn default_scan_max_size() -> u64 { 5 * 1024 }
+fn default_false() -> bool { false }
+fn default_true() -> bool { true }
+
+impl Default for Config {
+    fn default() -> Self {
+        Config {
+            library_path: default_library_path_str(),
+            scan_types: default_scan_types(),
+            scan_max_files: default_scan_max_files(),
+            scan_max_size: default_scan_max_size(),
+            remote_artifact_status: false,
+            capture_artifact_output: true,
+        }
+    }
+}
+
+impl Config {
+    /// Load config from file, falling back to defaults for missing values.
+    pub fn load() -> Self {
+        let path = Self::config_file();
+        let mut cfg = if path.exists() {
+            let content = std::fs::read_to_string(&path).unwrap_or_default();
+            serde_json::from_str(&content).unwrap_or_default()
+        } else {
+            Config::default()
+        };
+
+        // env-var overrides
+        if let Ok(v) = std::env::var("PROJSPEC_LIBRARY_PATH") {
+            cfg.library_path = v;
+        }
+        if let Ok(v) = std::env::var("PROJSPEC_SCAN_MAX_FILES") {
+            if let Ok(n) = v.parse() { cfg.scan_max_files = n; }
+        }
+        if let Ok(v) = std::env::var("PROJSPEC_SCAN_MAX_SIZE") {
+            if let Ok(n) = v.parse() { cfg.scan_max_size = n; }
+        }
+        if let Ok(v) = std::env::var("PROJSPEC_REMOTE_ARTIFACT_STATUS") {
+            cfg.remote_artifact_status = matches!(v.as_str(), "true" | "True" | "1" | "T");
+        }
+        if let Ok(v) = std::env::var("PROJSPEC_CAPTURE_ARTIFACT_OUTPUT") {
+            cfg.capture_artifact_output = matches!(v.as_str(), "true" | "True" | "1" | "T");
+        }
+        cfg
+    }
+
+    pub fn save(&self) -> Result<()> {
+        let path = Self::config_file();
+        if let Some(parent) = path.parent() {
+            std::fs::create_dir_all(parent)?;
+        }
+        let json = serde_json::to_string_pretty(self)?;
+        std::fs::write(&path, json)?;
+        Ok(())
+    }
+
+    pub fn config_file() -> PathBuf {
+        std::env::var("PROJSPEC_CONFIG_DIR")
+            .map(PathBuf::from)
+            .unwrap_or_else(|_| default_config_dir())
+            .join("projspec.json")
+    }
+
+    pub fn get(&self, key: &str) -> Option<String> {
+        match key {
+            "library_path" => Some(self.library_path.clone()),
+            "scan_max_files" => Some(self.scan_max_files.to_string()),
+            "scan_max_size" => Some(self.scan_max_size.to_string()),
+            "remote_artifact_status" => Some(self.remote_artifact_status.to_string()),
+            "capture_artifact_output" => Some(self.capture_artifact_output.to_string()),
+            _ => None,
+        }
+    }
+
+    pub fn set(&mut self, key: &str, value: &str) -> Result<()> {
+        match key {
+            "library_path" => self.library_path = value.to_string(),
+            "scan_max_files" => self.scan_max_files = value.parse()?,
+            "scan_max_size" => self.scan_max_size = value.parse()?,
+            "remote_artifact_status" => self.remote_artifact_status = value.parse()?,
+            "capture_artifact_output" => self.capture_artifact_output = value.parse()?,
+            _ => anyhow::bail!("unknown config key: {key}"),
+        }
+        Ok(())
+    }
+
+    pub fn unset(&mut self, key: &str) -> Result<()> {
+        // reset to default
+        let def = Config::default();
+        match key {
+            "library_path" => self.library_path = def.library_path,
+            "scan_max_files" => self.scan_max_files = def.scan_max_files,
+            "scan_max_size" => self.scan_max_size = def.scan_max_size,
+            "remote_artifact_status" => self.remote_artifact_status = def.remote_artifact_status,
+            "capture_artifact_output" => self.capture_artifact_output = def.capture_artifact_output,
+            _ => anyhow::bail!("unknown config key: {key}"),
+        }
+        Ok(())
+    }
+
+    pub fn defaults_table() -> Vec<(&'static str, String, &'static str)> {
+        let d = Config::default();
+        vec![
+            ("library_path", d.library_path, "location of persisted project objects"),
+            ("scan_types", d.scan_types.join(", "), "file extensions automatically read for scanning"),
+            ("scan_max_files", d.scan_max_files.to_string(), "don't scan files if more than this number in the project"),
+            ("scan_max_size", d.scan_max_size.to_string(), "don't scan files bigger than this (bytes)"),
+            ("remote_artifact_status", d.remote_artifact_status.to_string(), "whether to check status for remote artifacts"),
+            ("capture_artifact_output", d.capture_artifact_output.to_string(), "capture subprocess output from Process artifacts"),
+        ]
+    }
+}
diff --git a/projspec-rs/src/content.rs b/projspec-rs/src/content.rs
new file mode 100644
index 0000000..181b089
--- /dev/null
+++ b/projspec-rs/src/content.rs
@@ -0,0 +1,205 @@
+/// Content types — read-only descriptive information extracted from a project.
+/// Each variant maps 1-to-1 to a Python BaseContent subclass.
+/// All variants carry `serde` attributes so they round-trip to/from the
+/// Python `to_dict(compact=False)` JSON format (`{"klass": ["content", ""], ...}`).
+
+use std::collections::HashMap;
+use serde::{Deserialize, Serialize};
+use crate::types::{Precision, Stack};
+
+// ---------------------------------------------------------------------------
+// Environment
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Environment {
+    pub stack: Stack,
+    pub precision: Precision,
+    pub packages: Vec<String>,
+    #[serde(default)]
+    pub channels: Vec<String>,
+}
+
+// ---------------------------------------------------------------------------
+// Command
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(untagged)]
+pub enum CmdValue {
+    List(Vec<String>),
+    Str(String),
+}
+
+impl CmdValue {
+    pub fn display(&self) -> String {
+        match self {
+            CmdValue::List(v) => v.join(" "),
+            CmdValue::Str(s) => s.clone(),
+        }
+    }
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Command {
+    pub cmd: CmdValue,
+}
+
+// ---------------------------------------------------------------------------
+// Metadata
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DescriptiveMetadata {
+    pub meta: HashMap<String, serde_json::Value>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Citation {
+    pub meta: HashMap<String, serde_json::Value>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct License {
+    #[serde(default = "unknown_str")]
+    pub shortname: String,
+    #[serde(default = "unknown_str")]
+    pub fullname: String,
+    #[serde(default)]
+    pub url: String,
+}
+
+fn unknown_str() -> String {
+    "unknown".to_string()
+}
+
+// ---------------------------------------------------------------------------
+// Package types
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct PythonPackage {
+    pub package_name: String,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct RustModule {
+    pub name: String,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NodePackage {
+    pub name: String,
+}
+
+// ---------------------------------------------------------------------------
+// Data types
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct TabularData {
+    pub name: String,
+    #[serde(default)]
+    pub schema: serde_json::Value,
+    #[serde(default)]
+    pub metadata: HashMap<String, serde_json::Value>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DataResource {
+    pub path: String,
+    pub format: String,
+    #[serde(default)]
+    pub modality: String,
+    #[serde(default)]
+    pub layout: String,
+    #[serde(default)]
+    pub file_count: u64,
+    #[serde(default)]
+    pub total_size: u64,
+    #[serde(default)]
+    pub schema: serde_json::Value,
+    #[serde(default)]
+    pub sample_path: String,
+    #[serde(default)]
+    pub metadata: HashMap<String, serde_json::Value>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct IntakeSource {
+    pub name: String,
+}
+
+// ---------------------------------------------------------------------------
+// Environment variables
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct EnvironmentVariables {
+    pub variables: HashMap<String, Option<String>>,
+}
+
+// ---------------------------------------------------------------------------
+// The main Content enum — one variant per concrete type
+// ---------------------------------------------------------------------------
+
+/// A named group of Content items of the same type, keyed by an identifying label.
+/// This mirrors the Python AttrDict nesting: `{"environment": {"default": ..., "test": ...}}`.
+pub type ContentGroup = HashMap<String, Content>;
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "klass_name", rename_all = "snake_case")]
+pub enum Content {
+    // single items
+    Environment(Environment),
+    Command(Command),
+    DescriptiveMetadata(DescriptiveMetadata),
+    Citation(Citation),
+    License(License),
+    PythonPackage(PythonPackage),
+    RustModule(RustModule),
+    NodePackage(NodePackage),
+    TabularData(TabularData),
+    DataResource(DataResource),
+    IntakeSource(IntakeSource),
+    EnvironmentVariables(EnvironmentVariables),
+    // grouped (multiple of same type, keyed by name)
+    Group(ContentGroup),
+    // list form (used by some specs that return plain lists)
+    List(Vec<Content>),
+    // raw JSON catch-all (DVCRepo remotes, Django apps, etc.)
+    Raw(serde_json::Value),
+}
+
+impl Content {
+    /// Human-readable one-line summary for text output.
+    pub fn summary(&self) -> String {
+        match self {
+            Content::Environment(e) => {
+                format!("Environment({}, {}, {} pkgs)", e.stack, e.precision, e.packages.len())
+            }
+            Content::Command(c) => format!("Command({})", c.cmd.display()),
+            Content::DescriptiveMetadata(m) => {
+                let keys: Vec<_> = m.meta.keys().cloned().collect();
+                format!("DescriptiveMetadata({})", keys.join(", "))
+            }
+            Content::Citation(_) => "Citation".to_string(),
+            Content::License(l) => format!("License({})", l.shortname),
+            Content::PythonPackage(p) => format!("PythonPackage({})", p.package_name),
+            Content::RustModule(r) => format!("RustModule({})", r.name),
+            Content::NodePackage(n) => format!("NodePackage({})", n.name),
+            Content::TabularData(t) => format!("TabularData({})", t.name),
+            Content::DataResource(d) => format!("DataResource({}, {})", d.path, d.format),
+            Content::IntakeSource(i) => format!("IntakeSource({})", i.name),
+            Content::EnvironmentVariables(ev) => {
+                format!("EnvironmentVariables({} vars)", ev.variables.len())
+            }
+            Content::Group(g) => {
+                let entries: Vec<_> =
+                    g.iter().map(|(k, v)| format!("{k}: {}", v.summary())).collect();
+                format!("{{{}}}", entries.join(", "))
+            }
+            Content::List(l) => format!("[{}]", l.iter().map(|c| c.summary()).collect::<Vec<_>>().join(", ")),
+            Content::Raw(v) => format!("Raw({})", v),
+        }
+    }
+}
diff --git a/projspec-rs/src/create.rs b/projspec-rs/src/create.rs
new file mode 100644
index 0000000..25a3d20
--- /dev/null
+++ b/projspec-rs/src/create.rs
@@ -0,0 +1,434 @@
+/// create.rs — Project scaffolding (ProjectSpec::create equivalent).
+/// For each spec type that supports creation, writes the minimal files.
+
+use std::fs;
+use std::path::Path;
+use anyhow::{Context, Result};
+
+pub struct CreateSpec {
+    pub name: &'static str,
+    pub doc: &'static str,
+    pub creator: fn(path: &str) -> Result<Vec<String>>,
+}
+
+pub fn all_creators() -> Vec<CreateSpec> {
+    vec![
+        CreateSpec { name: "python_library", doc: "pyproject.toml + src layout", creator: create_python_library },
+        CreateSpec { name: "python_code", doc: "__init__.py", creator: create_python_code },
+        CreateSpec { name: "git_repo", doc: "git init", creator: create_git_repo },
+        CreateSpec { name: "pixi", doc: "pixi.toml", creator: create_pixi },
+        CreateSpec { name: "conda_recipe", doc: "meta.yaml", creator: create_conda_recipe },
+        CreateSpec { name: "rattler_recipe", doc: "recipe.yaml", creator: create_rattler_recipe },
+        CreateSpec { name: "golang", doc: "go.mod + hello.go", creator: create_golang },
+        CreateSpec { name: "rust", doc: "cargo init", creator: create_rust },
+        CreateSpec { name: "node", doc: "package.json", creator: create_node },
+        CreateSpec { name: "helm_chart", doc: "Chart.yaml + templates/", creator: create_helm_chart },
+        CreateSpec { name: "m_d_book", doc: "book.toml + src/", creator: create_mdbook },
+        CreateSpec { name: "r_t_d", doc: ".readthedocs.yaml + docs/", creator: create_rtd },
+        CreateSpec { name: "django", doc: "python -m django startproject", creator: create_django },
+        CreateSpec { name: "streamlit", doc: ".streamlit/ + streamlit_app.py", creator: create_streamlit },
+        CreateSpec { name: "marimo", doc: "marimo-app.py", creator: create_marimo },
+        CreateSpec { name: "data_package", doc: "datapackage.json", creator: create_datapackage },
+        CreateSpec { name: "backstage_catalog", doc: "catalog-info.yaml", creator: create_backstage },
+        CreateSpec { name: "m_l_flow", doc: "MLFlow + conda.yaml", creator: create_mlflow },
+        CreateSpec { name: "pyscript", doc: "pyscript.toml + index.html", creator: create_pyscript },
+        CreateSpec { name: "intake_catalog", doc: "catalog.yaml", creator: create_intake_catalog },
+        CreateSpec { name: "uv", doc: "uv init --lib", creator: create_uv },
+        CreateSpec { name: "conda_project", doc: "conda-project.yml + environment.yml", creator: create_conda_project },
+    ]
+}
+
+// ---------------------------------------------------------------------------
+// Individual creators
+// ---------------------------------------------------------------------------
+
+fn write(path: &str, content: &str) -> Result<()> {
+    fs::write(path, content).with_context(|| format!("writing {path}"))
+}
+
+fn mkdir(path: &str) -> Result<()> {
+    fs::create_dir_all(path).with_context(|| format!("creating directory {path}"))
+}
+
+fn created(paths: &[&str]) -> Vec<String> {
+    paths.iter().map(|p| p.to_string()).collect()
+}
+
+fn create_python_library(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    let pyproject = format!(
+r#"[build-system]
+requires = ["setuptools >= 77.0.3"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "{name}"
+version = "0.1.0"
+dependencies = []
+requires-python = ">=3.10"
+description = "A Python library"
+"#);
+    write(&format!("{path}/pyproject.toml"), &pyproject)?;
+    mkdir(&format!("{path}/src/{name}"))?;
+    write(&format!("{path}/src/{name}/__init__.py"), "")?;
+    Ok(created(&[&format!("{path}/pyproject.toml"), &format!("{path}/src/{name}/__init__.py")]))
+}
+
+fn create_python_code(path: &str) -> Result<Vec<String>> {
+    write(&format!("{path}/__init__.py"), "")?;
+    Ok(created(&[&format!("{path}/__init__.py")]))
+}
+
+fn create_git_repo(path: &str) -> Result<Vec<String>> {
+    std::process::Command::new("git").args(["init"]).current_dir(path).status()
+        .context("git init failed")?;
+    Ok(vec![format!("{path}/.git")])
+}
+
+fn create_pixi(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    let content = format!(
+r#"[workspace]
+name = "{name}"
+channels = ["conda-forge"]
+platforms = ["osx-arm64", "linux-64", "win-64"]
+version = "0.1.0"
+
+[dependencies]
+python = ">=3.10"
+
+[tasks]
+hello = "echo 'hello world'"
+"#);
+    write(&format!("{path}/pixi.toml"), &content)?;
+    Ok(created(&[&format!("{path}/pixi.toml")]))
+}
+
+fn create_conda_recipe(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    let content = format!(
+r#"package:
+  name: {name}
+  version: 0.1.0
+
+source:
+  path: .
+
+requirements:
+  build:
+    - python >=3.10
+  run:
+    - python >=3.10
+"#);
+    write(&format!("{path}/meta.yaml"), &content)?;
+    Ok(created(&[&format!("{path}/meta.yaml")]))
+}
+
+fn create_rattler_recipe(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    let content = format!(
+r#"context:
+  name: {name}
+  version: "0.1.0"
+
+package:
+  name: ${{{{ name }}}}
+  version: ${{{{ version }}}}
+
+source:
+  path: .
+
+requirements:
+  run:
+    - python >=3.10
+"#);
+    write(&format!("{path}/recipe.yaml"), &content)?;
+    Ok(created(&[&format!("{path}/recipe.yaml")]))
+}
+
+fn create_golang(path: &str) -> Result<Vec<String>> {
+    let module = format!("example.com/{}", basename(path));
+    write(&format!("{path}/go.mod"), &format!("module {module}\n\ngo 1.21\n"))?;
+    write(&format!("{path}/hello.go"),
+r#"package main
+
+import "fmt"
+
+func main() {
+    fmt.Println("Hello, World!")
+}
+"#)?;
+    Ok(created(&[&format!("{path}/go.mod"), &format!("{path}/hello.go")]))
+}
+
+fn create_rust(path: &str) -> Result<Vec<String>> {
+    std::process::Command::new("cargo").args(["init"]).current_dir(path).status()
+        .context("cargo init failed")?;
+    Ok(vec![format!("{path}/Cargo.toml"), format!("{path}/src/main.rs")])
+}
+
+fn create_node(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    let content = format!(
+r#"{{
+  "name": "{name}",
+  "version": "0.1.0",
+  "description": "",
+  "main": "index.js",
+  "scripts": {{
+    "build": "echo 'build'"
+  }},
+  "dependencies": {{}}
+}}
+"#);
+    write(&format!("{path}/package.json"), &content)?;
+    Ok(created(&[&format!("{path}/package.json")]))
+}
+
+fn create_helm_chart(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    write(&format!("{path}/Chart.yaml"), &format!(
+r#"apiVersion: v2
+name: {name}
+description: A Helm chart for {name}
+type: application
+version: 0.1.0
+appVersion: "1.0.0"
+"#))?;
+    write(&format!("{path}/values.yaml"),
+r#"replicaCount: 1
+image:
+  repository: nginx
+  tag: latest
+  pullPolicy: IfNotPresent
+"#)?;
+    mkdir(&format!("{path}/templates"))?;
+    write(&format!("{path}/templates/deployment.yaml"),
+r#"apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: {{ .Release.Name }}
+spec:
+  replicas: {{ .Values.replicaCount }}
+"#)?;
+    Ok(created(&[&format!("{path}/Chart.yaml"), &format!("{path}/values.yaml")]))
+}
+
+fn create_mdbook(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    write(&format!("{path}/book.toml"), &format!(
+r#"[book]
+title = "{name}"
+authors = []
+description = ""
+
+[build]
+build-dir = "book"
+"#))?;
+    mkdir(&format!("{path}/src"))?;
+    write(&format!("{path}/src/SUMMARY.md"), "# Summary\n\n- [Introduction](./introduction.md)\n")?;
+    write(&format!("{path}/src/introduction.md"), &format!("# Introduction\n\nWelcome to {name}.\n"))?;
+    Ok(created(&[&format!("{path}/book.toml"), &format!("{path}/src/SUMMARY.md")]))
+}
+
+fn create_rtd(path: &str) -> Result<Vec<String>> {
+    let name = basename(path);
+    write(&format!("{path}/.readthedocs.yaml"),
+r#"version: 2
+
+build:
+  os: ubuntu-24.04
+  tools:
+    python: "3.12"
+
+sphinx:
+  configuration: docs/conf.py
+
+python:
+  install:
+    - requirements: docs/requirements.txt
+"#)?;
+    mkdir(&format!("{path}/docs"))?;
+    write(&format!("{path}/docs/conf.py"), &format!(
+r#"project = "{name}"
+extensions = []
+html_theme = "alabaster"
+"#))?;
+    write(&format!("{path}/docs/index.rst"), &format!("{name}\n{}\n\n.. toctree::\n   :maxdepth: 2\n", "=".repeat(name.len())))?;
+    write(&format!("{path}/docs/requirements.txt"), "sphinx\n")?;
+    Ok(created(&[&format!("{path}/.readthedocs.yaml"), &format!("{path}/docs/conf.py")]))
+}
+
+fn create_django(path: &str) -> Result<Vec<String>> {
+    std::process::Command::new("python")
+        .args(["-m", "django", "startproject", "mysite", path])
+        .status()
+        .context("django startproject failed")?;
+    Ok(vec![format!("{path}/manage.py"), format!("{path}/mysite/")])
+}
+
+fn create_streamlit(path: &str) -> Result<Vec<String>> {
+    mkdir(&format!("{path}/.streamlit"))?;
+    write(&format!("{path}/.streamlit/config.toml"),
+r#"[global]
+
+[logger]
+level = "info"
+
+[server]
+headless = true
+"#)?;
+    write(&format!("{path}/streamlit_app.py"),
+r#"import streamlit as st
+st.title("My Streamlit App")
+st.write("Hello, world!")
+"#)?;
+    write(&format!("{path}/requirements.txt"), "streamlit\n")?;
+    Ok(created(&[&format!("{path}/streamlit_app.py"), &format!("{path}/.streamlit/config.toml")]))
+}
+
+fn create_marimo(path: &str) -> Result<Vec<String>> {
+
write(&format!("{path}/marimo-app.py"), +r#"import marimo +__generated_with = "0.19.11" +app = marimo.App() + +@app.cell +def _(): + import marimo as mo + return "Hello, marimo!" + +if __name__ == "__main__": + app.run() +"#)?; + Ok(created(&[&format!("{path}/marimo-app.py")])) +} + +fn create_datapackage(path: &str) -> Result> { + write(&format!("{path}/datapackage.json"), +r#"{ + "name": "my-data-package", + "title": "My Data Package", + "description": "An example data package", + "licenses": [{"name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/"}], + "resources": [{"name": "data", "path": "data.csv", "format": "csv"}] +} +"#)?; + Ok(created(&[&format!("{path}/datapackage.json")])) +} + +fn create_backstage(path: &str) -> Result> { + let name = basename(path); + write(&format!("{path}/catalog-info.yaml"), &format!( +r#"apiVersion: backstage.io/v1alpha1 +kind: Component +metadata: + name: {name} + description: A {name} component +spec: + type: service + lifecycle: experimental + owner: team-default +"#))?; + Ok(created(&[&format!("{path}/catalog-info.yaml")])) +} + +fn create_mlflow(path: &str) -> Result> { + write(&format!("{path}/MLFlow"), +r#"name: tutorial + +conda_env: conda.yaml + +entry_points: + main: + parameters: + alpha: {type: float, default: 0.5} + command: "python train.py {alpha}" +"#)?; + write(&format!("{path}/conda.yaml"), +r#"name: ml-project +channels: + - conda-forge +dependencies: + - python=3.10 +"#)?; + write(&format!("{path}/train.py"), "# MLFlow training code\n")?; + Ok(created(&[&format!("{path}/MLFlow"), &format!("{path}/conda.yaml"), &format!("{path}/train.py")])) +} + +fn create_pyscript(path: &str) -> Result> { + write(&format!("{path}/pyscript.toml"), +r#"name = "pyscript-app" +description = "A PyScript app" +packages = [] +"#)?; + write(&format!("{path}/main.py"), "# Replace with your code\nprint('Hello, world!')\n")?; + write(&format!("{path}/index.html"), +r#" + + + PyScript App + + + + + + + +"#)?; + 
Ok(created(&[&format!("{path}/pyscript.toml"), &format!("{path}/main.py"), &format!("{path}/index.html")])) +} + +fn create_intake_catalog(path: &str) -> Result> { + write(&format!("{path}/catalog.yaml"), +r#"aliases: {} +data: {} +entries: {} +metadata: {} +user_parameters: {} +version: 2 +"#)?; + Ok(created(&[&format!("{path}/catalog.yaml")])) +} + +fn create_uv(path: &str) -> Result> { + std::process::Command::new("uv") + .args(["init", "--lib", "--package", "--vcs", "none"]) + .current_dir(path) + .status() + .context("uv init failed")?; + Ok(vec![ + format!("{path}/pyproject.toml"), + format!("{path}/src/"), + ]) +} + +fn create_conda_project(path: &str) -> Result> { + let name = basename(path); + write(&format!("{path}/environment.yml"), +r#"channels: + - conda-forge +dependencies: + - python >=3.10 +"#)?; + write(&format!("{path}/conda-project.yml"), &format!( +r#"name: {name} +environments: + default: + - environment.yml +variables: {{}} +commands: {{}} +"#))?; + Ok(created(&[&format!("{path}/environment.yml"), &format!("{path}/conda-project.yml")])) +} + +// --------------------------------------------------------------------------- +// Utility +// --------------------------------------------------------------------------- + +fn basename(path: &str) -> String { + Path::new(path) + .file_name() + .map(|s| s.to_string_lossy().to_string()) + .unwrap_or_else(|| "project".to_string()) +} diff --git a/projspec-rs/src/fs.rs b/projspec-rs/src/fs.rs new file mode 100644 index 0000000..15a2ed5 --- /dev/null +++ b/projspec-rs/src/fs.rs @@ -0,0 +1,241 @@ +/// fs.rs — Virtual filesystem abstraction backed by opendal. +/// +/// Design decisions: +/// +/// D-FS1: We use `opendal::blocking::Operator` throughout. +/// The parsers are synchronous (they do string processing, not IO-heavy work), +/// and the blocking operator is the clearest fit. The tokio runtime is created +/// once in `operator_from_url()` and lives for the duration of the scan. 
+///
+/// D-FS2: `Vfs` is a thin struct wrapping `opendal::blocking::Operator`, not a
+/// trait. This avoids `dyn Vfs` boxing complexity and lets the compiler inline
+/// all calls. If heterogeneous backends per-parse are ever needed, promote to
+/// a trait at that point.
+///
+/// D-FS3: Paths inside the operator are always *relative* to the operator's root.
+/// The operator is configured with root = the project directory. So listing
+/// "/" gives the project root entries, and reading "pyproject.toml" reads the
+/// file at root/pyproject.toml. The caller (project.rs) strips the url prefix
+/// before calling Vfs methods.
+///
+/// D-FS4: opendal::Http service only supports `read` and `stat` — no `list`.
+/// We work around this: the caller must supply the file listing when constructing
+/// a project from an HTTP backend. In practice we use the HTTP service for
+/// reading specific files whose names are already known from the listing.
+/// For tests we provide the basenames directly.
+///
+/// D-FS5: `vfs_from_url()` reads configuration from environment variables
+/// only (no explicit config struct yet). Each backend reads its own standard
+/// env vars (AWS_ACCESS_KEY_ID, etc.) because opendal's S3 builder loads them
+/// automatically when `disable_config_load` is NOT called.
+
+use std::collections::HashMap;
+use std::sync::OnceLock;
+
+use anyhow::{Context, Result};
+use opendal::blocking::Operator as BlockingOp;
+use opendal::{services, ErrorKind, Operator};
+
+// ---------------------------------------------------------------------------
+// Runtime singleton — opendal's blocking wrapper requires an active tokio Handle
+// ---------------------------------------------------------------------------
+
+static RUNTIME: OnceLock<tokio::runtime::Runtime> = OnceLock::new();
+
+fn get_runtime() -> &'static tokio::runtime::Runtime {
+    RUNTIME.get_or_init(|| {
+        tokio::runtime::Builder::new_multi_thread()
+            .enable_all()
+            .build()
+            .expect("failed to build tokio runtime for opendal")
+    })
+}
+
+fn make_blocking(op: Operator) -> Result<BlockingOp> {
+    let _guard = get_runtime().enter();
+    BlockingOp::new(op).map_err(|e| anyhow::anyhow!("opendal blocking: {e}"))
+}
+
+// ---------------------------------------------------------------------------
+// Vfs — thin wrapper around blocking::Operator
+// ---------------------------------------------------------------------------
+
+#[derive(Clone)]
+pub struct Vfs {
+    pub op: BlockingOp,
+    /// Human-readable scheme label for error messages / display.
+    pub scheme: String,
+}
+
+impl Vfs {
+    // ------------------------------------------------------------------
+    // Constructors
+    // ------------------------------------------------------------------
+
+    /// Local filesystem backend, rooted at `path`.
+    pub fn local(path: &str) -> Result<Vfs> {
+        let builder = services::Fs::default().root(path);
+        let op = make_blocking(Operator::new(builder)?.finish())?;
+        Ok(Vfs { op, scheme: "file".into() })
+    }
+
+    /// In-memory backend. Caller populates it via `write_bytes`.
+    pub fn memory() -> Result<Vfs> {
+        let op = make_blocking(Operator::new(services::Memory::default())?.finish())?;
+        Ok(Vfs { op, scheme: "memory".into() })
+    }
+
+    /// HTTP read-only backend. `endpoint` is e.g. `http://127.0.0.1:8080`.
+    /// `root` is the path prefix on the server (e.g. `""` or `"/projects/foo"`).
+    pub fn http(endpoint: &str, root: &str) -> Result<Vfs> {
+        let mut builder = services::Http::default().endpoint(endpoint);
+        if !root.is_empty() {
+            builder = builder.root(root);
+        }
+        let op = make_blocking(Operator::new(builder)?.finish())?;
+        Ok(Vfs { op, scheme: "http".into() })
+    }
+
+    /// S3 backend. All configuration comes from environment variables:
+    /// AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION,
+    /// AWS_ENDPOINT_URL (for moto/minio), AWS_DEFAULT_REGION.
+    /// `bucket` and `root` (key prefix) are required.
+    pub fn s3(bucket: &str, root: &str, endpoint: Option<&str>, region: Option<&str>) -> Result<Vfs> {
+        let mut builder = services::S3::default().bucket(bucket);
+        if !root.is_empty() {
+            builder = builder.root(root);
+        }
+        if let Some(ep) = endpoint {
+            builder = builder.endpoint(ep);
+        }
+        if let Some(r) = region {
+            builder = builder.region(r);
+        }
+        // Env-var credentials are loaded automatically; we do NOT call
+        // disable_config_load() so that AWS_ACCESS_KEY_ID etc. are respected.
+        let op = make_blocking(Operator::new(builder)?.finish())?;
+        Ok(Vfs { op, scheme: "s3".into() })
+    }
+
+    // ------------------------------------------------------------------
+    // Write helper (used by test helpers and memory backend setup)
+    // ------------------------------------------------------------------
+
+    pub fn write_bytes(&self, path: &str, data: Vec<u8>) -> Result<()> {
+        self.op.write(path, data)
+            .with_context(|| format!("write {path}"))?;
+        Ok(())
+    }
+
+    // ------------------------------------------------------------------
+    // Read operations — used by ParseCtx
+    // ------------------------------------------------------------------
+
+    /// Read a file, returning its UTF-8 content. Returns None on any error.
+    pub fn read_text(&self, path: &str) -> Option<String> {
+        let buf = self.op.read(path).ok()?;
+        String::from_utf8(buf.to_bytes().to_vec()).ok()
+    }
+
+    /// Check whether a path exists (file or dir).
+    pub fn exists(&self, path: &str) -> bool {
+        self.op.exists(path).unwrap_or(false)
+    }
+
+    /// List direct children of a directory path (e.g. `""` = root).
+    /// Returns basenames only (no leading slash).
+    pub fn list_dir(&self, path: &str) -> Vec<String> {
+        // opendal requires dirs to end with "/"; root is "/"
+        let listing_path = if path.is_empty() || path == "/" {
+            "/".to_string()
+        } else if path.ends_with('/') {
+            path.to_string()
+        } else {
+            format!("{path}/")
+        };
+
+        match self.op.list(&listing_path) {
+            Ok(entries) => entries
+                .into_iter()
+                .map(|e| {
+                    // strip trailing "/" from directory names
+                    e.path().trim_start_matches('/').trim_end_matches('/').to_string()
+                })
+                .filter(|s| !s.is_empty())
+                .collect(),
+            Err(_) => vec![],
+        }
+    }
+
+    /// List direct children and return {basename: relative_path} map.
+    /// For a local backend the relative_path equals the basename.
+    /// For S3/HTTP we use the same relative path (object key within root).
+    pub fn basenames(&self) -> HashMap<String, String> {
+        self.list_dir("")
+            .into_iter()
+            .map(|name| (name.clone(), name))
+            .collect()
+    }
+}
+
+// ---------------------------------------------------------------------------
+// vfs_from_url — build a Vfs from a URL string
+// ---------------------------------------------------------------------------
+//
+// Supported URL schemes:
+//   /abs/path or ./rel/path → local fs (services::Fs)
+//   file:///abs/path        → local fs
+//   s3://bucket/prefix      → S3 (env-var creds)
+//   http://host/root        → HTTP read-only
+//   https://host/root       → HTTP read-only
+//   memory://               → in-memory (only useful for tests via Vfs::memory())
+//
+// For S3: the URL host is the bucket, the path is the root prefix.
+// Region and endpoint are read from AWS_REGION / AWS_ENDPOINT_URL env vars.
+
+pub fn vfs_from_url(url: &str) -> Result<(Vfs, String)> {
+    if url.starts_with("s3://") {
+        let without_scheme = &url[5..];
+        let (bucket, root) = without_scheme.split_once('/').unwrap_or((without_scheme, ""));
+        let endpoint = std::env::var("AWS_ENDPOINT_URL").ok();
+        let region = std::env::var("AWS_REGION")
+            .or_else(|_| std::env::var("AWS_DEFAULT_REGION"))
+            .ok();
+        let vfs = Vfs::s3(
+            bucket,
+            if root.is_empty() { "/" } else { root },
+            endpoint.as_deref(),
+            region.as_deref(),
+        )?;
+        // canonical URL is the s3:// URL itself (no local path)
+        return Ok((vfs, url.to_string()));
+    }
+
+    if url.starts_with("http://") || url.starts_with("https://") {
+        // Split endpoint from root path: http://host[:port]/root/path
+        // We set endpoint = scheme://host[:port] and root = /root/path
+        let without_scheme = if url.starts_with("https://") { &url[8..] } else { &url[7..] };
+        let scheme_prefix = if url.starts_with("https://") { "https://" } else { "http://" };
+        let (host_port, root_path) = without_scheme.split_once('/').unwrap_or((without_scheme, ""));
+        let endpoint = format!("{scheme_prefix}{host_port}");
+        let root = if root_path.is_empty() { "/".to_string() } else { format!("/{root_path}") };
+        let vfs = Vfs::http(&endpoint, &root)?;
+        return Ok((vfs, url.to_string()));
+    }
+
+    if url.starts_with("file://") {
+        let path = &url[7..];
+        let canonical = std::fs::canonicalize(path)
+            .map(|p| p.to_string_lossy().to_string())
+            .unwrap_or_else(|_| path.to_string());
+        let vfs = Vfs::local(&canonical)?;
+        return Ok((vfs, canonical));
+    }
+
+    // Default: treat as local path
+    let canonical = std::fs::canonicalize(url)
+        .map(|p| p.to_string_lossy().to_string())
+        .unwrap_or_else(|_| url.to_string());
+    let vfs = Vfs::local(&canonical)?;
+    Ok((vfs, canonical))
+}
diff --git a/projspec-rs/src/lib.rs b/projspec-rs/src/lib.rs
new file mode 100644
index 0000000..13f6be1
--- /dev/null
+++ b/projspec-rs/src/lib.rs
@@ -0,0 +1,14 @@
+// lib.rs — re-exports all internal modules for integration tests.
+// This exists only to enable `#[cfg(test)]` integration tests to import
+// internal modules without the `#[path]` hack (which causes duplicate imports).
+
+pub mod artifact;
+pub mod cli;
+pub mod config;
+pub mod content;
+pub mod create;
+pub mod fs;
+pub mod library;
+pub mod project;
+pub mod spec;
+pub mod types;
diff --git a/projspec-rs/src/library.rs b/projspec-rs/src/library.rs
new file mode 100644
index 0000000..8121104
--- /dev/null
+++ b/projspec-rs/src/library.rs
@@ -0,0 +1,92 @@
+/// library.rs — ProjectLibrary persistence.
+/// Mirrors projspec.library.ProjectLibrary.
+/// Format: JSON file at library_path, dict of {url: project_dict}.
+
+use std::collections::HashMap;
+use anyhow::Result;
+use serde_json::Value as JsVal;
+
+use crate::config::Config;
+use crate::project::Project;
+
+pub struct ProjectLibrary {
+    pub path: String,
+    pub entries: HashMap<String, Project>,
+}
+
+impl ProjectLibrary {
+    pub fn load(config: &Config) -> Self {
+        let path = config.library_path.clone();
+        let entries = load_entries(&path);
+        ProjectLibrary { path, entries }
+    }
+
+    pub fn load_at(path: &str) -> Self {
+        ProjectLibrary {
+            path: path.to_string(),
+            entries: load_entries(path),
+        }
+    }
+
+    pub fn save(&self) -> Result<()> {
+        if let Some(parent) = std::path::Path::new(&self.path).parent() {
+            std::fs::create_dir_all(parent)?;
+        }
+        let map: HashMap<String, JsVal> = self.entries.iter()
+            .map(|(k, v)| (k.clone(), v.to_json()))
+            .collect();
+        let json = serde_json::to_string_pretty(&map)?;
+        std::fs::write(&self.path, json)?;
+        Ok(())
+    }
+
+    pub fn add_entry(&mut self, url: &str, proj: Project) -> Result<()> {
+        self.entries.insert(url.to_string(), proj);
+        self.save()
+    }
+
+    pub fn delete_entry(&mut self, url: &str) -> Result<()> {
+        if self.entries.remove(url).is_none() {
+            anyhow::bail!("URL not found in library: {url}");
+        }
+        self.save()
+    }
+
+    pub fn clear(&mut self) -> Result<()> {
+        self.entries.clear();
+        if std::path::Path::new(&self.path).exists() {
+            std::fs::remove_file(&self.path)?;
+        }
+        Ok(())
+    }
+
+    /// Filter entries by spec/artifact/content names.
+    /// Each filter is ("spec"|"artifact"|"content", name).
+    pub fn filter(&self, filters: &[(&str, &str)]) -> Vec<(&str, &Project)> {
+        self.entries.iter()
+            .filter(|(_, proj)| {
+                filters.iter().all(|(cat, val)| match *cat {
+                    "spec" => proj.has_spec(val),
+                    "artifact" => proj.all_artifacts().iter().any(|(k, _)| *k == *val),
+                    "content" => proj.all_contents().iter().any(|(k, _)| *k == *val),
+                    _ => true,
+                })
+            })
+            .map(|(k, v)| (k.as_str(), v))
+            .collect()
+    }
+}
+
+fn load_entries(path: &str) -> HashMap<String, Project> {
+    let text = match std::fs::read_to_string(path) {
+        Ok(t) => t,
+        Err(_) => return HashMap::new(),
+    };
+    let map: HashMap<String, JsVal> = match serde_json::from_str(&text) {
+        Ok(m) => m,
+        Err(_) => return HashMap::new(),
+    };
+    map.into_iter()
+        .filter_map(|(k, v)| Project::from_json(&v).ok().map(|p| (k, p)))
+        .collect()
+}
diff --git a/projspec-rs/src/main.rs b/projspec-rs/src/main.rs
new file mode 100644
index 0000000..c0670de
--- /dev/null
+++ b/projspec-rs/src/main.rs
@@ -0,0 +1,33 @@
+// projspec-rs — Rust port of the projspec library and CLI
+//
+// Allow dead_code and unused at crate level: many items are public library API
+// that may not be called by the CLI binary itself.
+#![allow(dead_code)]
+#![allow(unused_imports)]
+//
+// Module layout mirrors the Python package:
+//   types    — enums (Stack, Precision, Architecture)
+//   content  — BaseContent variants
+//   artifact — BaseArtifact variants + execution
+//   spec     — ProjectSpec implementations (match + parse)
+//   project  — Project struct + resolve logic
+//   fs       — Virtual filesystem abstraction (opendal-backed)
+//   library  — ProjectLibrary (JSON persistence)
+//   config   — Config file read/write
+//   create   — Project scaffolding (ProjectSpec::create)
+//   cli      — clap-based CLI (main entry point lives here)
+
+mod artifact;
+mod cli;
+mod config;
+mod content;
+mod create;
+mod fs;
+mod library;
+mod project;
+mod spec;
+mod types;
+
+fn main() {
+    cli::run();
+}
diff --git a/projspec-rs/src/project.rs b/projspec-rs/src/project.rs
new file mode 100644
index 0000000..8a1138a
--- /dev/null
+++ b/projspec-rs/src/project.rs
@@ -0,0 +1,543 @@
+/// project.rs — Project struct and resolve logic.
+/// Mirrors projspec.proj.base.Project.
+
+use std::collections::HashMap;
+use anyhow::Result;
+use rayon::prelude::*;
+use serde::{Deserialize, Serialize};
+use serde_json::Value as JsVal;
+
+use crate::artifact::Artifact;
+use crate::content::Content;
+use crate::fs::{Vfs, vfs_from_url};
+use crate::spec::{all_parsers, ParseCtx};
+
+// ---------------------------------------------------------------------------
+// Default exclusions when walking child directories
+// ---------------------------------------------------------------------------
+
+fn default_excludes() -> std::collections::HashSet<String> {
+    ["bld", "build", "dist", "env", "envs", "htmlcov", "node_modules"]
+        .iter().map(|s| s.to_string()).collect()
+}
+
+// ---------------------------------------------------------------------------
+// ParsedSpec — a matched spec with its contents and artifacts
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ParsedSpec {
+    pub name: String,
+    pub spec_doc: String,
+    pub contents: HashMap<String, Content>,
+    pub artifacts: HashMap<String, Artifact>,
+}
+
+// ---------------------------------------------------------------------------
+// Project
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Project {
+    /// Original path / URL as supplied by the caller.
+    pub path: String,
+    /// Canonical URL (absolute local path, or s3:// / http:// URL).
+    pub url: String,
+    /// Matched project specs (not extras).
+    pub specs: HashMap<String, ParsedSpec>,
+    /// Contents from ProjectExtra specs (merged into root).
+    pub contents: HashMap<String, Content>,
+    /// Artifacts from ProjectExtra specs (merged into root).
+    pub artifacts: HashMap<String, Artifact>,
+    /// Child projects found by walking subdirectories (local only).
+    pub children: HashMap<String, Project>,
+}
+
+impl Project {
+    // -----------------------------------------------------------------------
+    // Constructors
+    // -----------------------------------------------------------------------
+
+    /// Parse a local path or URL, building a Vfs automatically.
+    pub fn new(
+        path: &str,
+        walk: Option<bool>,
+        types: Option<&[String]>,
+        xtypes: Option<&[String]>,
+        excludes: Option<&std::collections::HashSet<String>>,
+    ) -> Result<Project> {
+        let (vfs, url) = vfs_from_url(path)?;
+        Self::new_with_vfs(path, &url, vfs, walk, types, xtypes, excludes)
+    }
+
+    /// Parse a project given an already-constructed Vfs.
+    /// `display_path` is used as the `path` field (user-facing).
+    /// `url` is the canonical location identifier.
+    /// The Vfs root must be set to the project root already.
+    pub fn new_with_vfs(
+        display_path: &str,
+        url: &str,
+        vfs: Vfs,
+        walk: Option<bool>,
+        types: Option<&[String]>,
+        xtypes: Option<&[String]>,
+        excludes: Option<&std::collections::HashSet<String>>,
+    ) -> Result<Project> {
+        let default_exc = default_excludes();
+        let excludes = excludes.unwrap_or(&default_exc);
+
+        let mut proj = Project {
+            path: display_path.to_string(),
+            url: url.to_string(),
+            specs: HashMap::new(),
+            contents: HashMap::new(),
+            artifacts: HashMap::new(),
+            children: HashMap::new(),
+        };
+
+        proj.resolve(url, &vfs, walk, types, xtypes, excludes)?;
+        Ok(proj)
+    }
+
+    fn resolve(
+        &mut self,
+        url: &str,
+        vfs: &Vfs,
+        walk: Option<bool>,
+        types: Option<&[String]>,
+        xtypes: Option<&[String]>,
+        excludes: &std::collections::HashSet<String>,
+    ) -> Result<()> {
+        // Build basenames map via Vfs
+        let basenames = vfs.basenames();
+
+        // Parse pyproject.toml via Vfs (needed before prefetch so parsers can
+        // filter on build-backend / tool table without re-reading the file).
+        // pyproject.toml is intentionally read here rather than in the prefetch
+        // because it drives which other files are worth reading.
+        let pyproject: JsVal = basenames.get("pyproject.toml")
+            .and_then(|rel| vfs.read_text(rel))
+            .and_then(|text| toml::from_str::<toml::Value>(&text).ok())
+            .map(toml_to_json)
+            .unwrap_or(JsVal::Object(Default::default()));
+
+        // --- Concurrent prefetch ---
+        // Build the lists of files and sub-paths to check in parallel, then
+        // fire all reads/stats concurrently via rayon. This collapses N
+        // sequential network round-trips into ~1 round-trip worth of latency
+        // for HTTP and S3 backends.
+        let file_names = files_to_prefetch(&basenames, &pyproject);
+        let sub_paths = subpaths_to_prefetch();
+
+        // Parallel file reads: only fetch files that are present in basenames.
+        let file_cache: HashMap<String, String> = file_names
+            .par_iter()
+            .filter_map(|name| {
+                let rel = basenames.get(*name)?;
+                let text = vfs.read_text(rel)?;
+                Some((name.to_string(), text))
+            })
+            .collect();
+
+        // Parallel existence checks for sub-paths below the root.
+        let exists_cache: HashMap<String, bool> = sub_paths
+            .par_iter()
+            .map(|path| (path.to_string(), vfs.exists(path)))
+            .collect();
+
+        let ctx = ParseCtx {
+            url,
+            basenames: &basenames,
+            pyproject: &pyproject,
+            vfs,
+            file_cache: &file_cache,
+            exists_cache: &exists_cache,
+        };
+
+        // Run all parsers
+        for (spec_name, parser_fn) in all_parsers() {
+            if let Some(types) = types {
+                if !types.is_empty() && !types.iter().any(|t| camel_or_snake_eq(t, spec_name)) {
+                    continue;
+                }
+            }
+            if let Some(xtypes) = xtypes {
+                if xtypes.iter().any(|t| camel_or_snake_eq(t, spec_name)) {
+                    continue;
+                }
+            }
+
+            if let Some(result) = parser_fn(&ctx) {
+                if result.is_extra {
+                    self.contents.extend(result.contents);
+                    self.artifacts.extend(result.artifacts);
+                } else {
+                    self.specs.insert(result.spec_name.clone(), ParsedSpec {
+                        name: result.spec_name,
+                        spec_doc: result.spec_doc,
+                        contents: result.contents,
+                        artifacts: result.artifacts,
+                    });
+                }
+            }
+        }
+
+        // Walk child directories — only supported for local Fs backend
+        // (opendal::Http and S3 list_dir would require recursive listing)
+        let should_walk = match walk {
+            Some(true) => true,
+            Some(false) => false,
+            None => self.specs.is_empty(),
+        };
+
+        if should_walk && vfs.scheme == "file" {
+            for basename in vfs.list_dir("") {
+                if excludes.contains(&basename) || basename.starts_with('.') || basename.starts_with('_') {
+                    continue;
+                }
+                // Check it is a directory by trying to list it
+                let sub_entries = vfs.list_dir(&basename);
+                if sub_entries.is_empty() { continue; }
+
+                let child_url = format!("{url}/{basename}");
+                if let Ok(child_vfs) = Vfs::local(&child_url) {
+                    let child_result = Project::new_with_vfs(
+                        &child_url,
+                        &child_url,
+                        child_vfs,
+                        walk.map(|_| false),
+                        types,
+                        xtypes,
+                        Some(excludes),
+                    );
+                    if let Ok(child) = child_result {
+                        if !child.specs.is_empty() {
+                            self.children.insert(basename, child);
+                        } else if !child.children.is_empty() {
+                            for (s2, p) in child.children {
+                                self.children.insert(format!("{basename}/{s2}"), p);
+                            }
+                        }
+                    }
+                }
+            }
+        }
+        Ok(())
+    }
+
+    // -----------------------------------------------------------------------
+    // Query helpers
+    // -----------------------------------------------------------------------
+
+    pub fn has_spec(&self, name: &str) -> bool {
+        self.specs.contains_key(name)
+            || self.children.values().any(|c| c.has_spec(name))
+    }
+
+    pub fn all_artifacts(&self) -> Vec<(&str, &Artifact)> {
+        let mut out: Vec<(&str, &Artifact)> = vec![];
+        for spec in self.specs.values() {
+            for (k, a) in &spec.artifacts { out.push((k, a)); }
+        }
+        for (k, a) in &self.artifacts { out.push((k, a)); }
+        out
+    }
+
+    pub fn all_contents(&self) -> Vec<(&str, &Content)> {
+        let mut out: Vec<(&str, &Content)> = vec![];
+        for spec in self.specs.values() {
+            for (k, c) in &spec.contents { out.push((k, c)); }
+        }
+        for (k, c) in &self.contents { out.push((k, c)); }
+        out
+    }
+
+    /// Find an artifact by qualified name: `[spec.]type[.name]`
+    pub fn find_artifact(&self, qname: &str) -> Option<(&Artifact, &str)> {
+        let parts: Vec<&str> = qname.splitn(3, '.').collect();
+        match parts.as_slice() {
+            [artifact_type] => {
+                for spec in self.specs.values() {
+                    if let Some(a) = spec.artifacts.get(*artifact_type) {
+                        return Some((a, &self.url));
+                    }
+                }
+                self.artifacts.get(*artifact_type).map(|a| (a, self.url.as_str()))
+            }
+            [spec_name, artifact_type] => {
+                let spec = self.specs.get(*spec_name)?;
+                spec.artifacts.get(*artifact_type).map(|a| (a, self.url.as_str()))
+            }
+            [spec_name, artifact_type, item_name] => {
+                let spec = self.specs.get(*spec_name)?;
+                let art = spec.artifacts.get(*artifact_type)?;
+                if let Artifact::Group(g) = art {
+                    g.get(*item_name).map(|a| (a, self.url.as_str()))
+                } else {
+                    Some((art, self.url.as_str()))
+                }
+            }
+            _ => None,
+        }
+    }
+
+    // -----------------------------------------------------------------------
+    // Text output
+    // -----------------------------------------------------------------------
+
+    pub fn text_summary(&self, bare: bool) -> String {
+        let header = if bare {
+            self.url.clone()
+        } else {
+            format!("<{}>", self.url)
+        };
+        let spec_names: Vec<String> = self.specs.keys().cloned().collect();
+        let mut lines = vec![format!(" /: {}", spec_names.join(" "))];
+        for (k, child) in &self.children {
+            let cnames: Vec<String> = child.specs.keys().cloned().collect();
+            lines.push(format!(" {k}: {}", cnames.join(" ")));
+        }
+        format!("{header}\n{}", lines.join("\n"))
+    }
+
+    pub fn text_full(&self) -> String {
+        let mut lines = vec![format!("<{}>", self.url)];
+
+        for (sname, spec) in &self.specs {
+            lines.push(format!("\n<{sname}>"));
+            if !spec.spec_doc.is_empty() {
+                lines.push(format!(" spec_doc: {}", spec.spec_doc));
+            }
+            if !spec.contents.is_empty() {
+                lines.push(" Contents:".to_string());
+                for (k, v) in &spec.contents {
+                    lines.push(format!(" {k}: {}", v.summary()));
+                }
+            }
+            if !spec.artifacts.is_empty() {
+                lines.push(" Artifacts:".to_string());
+                for (k, v) in &spec.artifacts {
+                    lines.push(format!(" {k}: {}", v.summary()));
+                }
+            }
+        }
+
+        if !self.contents.is_empty() {
+            lines.push("\n".to_string());
+            lines.push(" Contents:".to_string());
+            for (k, v) in &self.contents {
+                lines.push(format!(" {k}: {}", v.summary()));
+            }
+        }
+        if !self.artifacts.is_empty() {
+            if self.contents.is_empty() { lines.push("\n".to_string()); }
+            lines.push(" Artifacts:".to_string());
+            for (k, v) in &self.artifacts {
+                lines.push(format!(" {k}: {}", v.summary()));
+            }
+        }
+
+        if !self.children.is_empty() {
+            lines.push("\nChildren:".to_string());
+            for (k, child) in &self.children {
+                let cnames: Vec<String> = child.specs.keys().cloned().collect();
+                lines.push(format!(" {k}: {}", cnames.join(" ")));
+            }
+        }
+        lines.join("\n")
+    }
+
+    // -----------------------------------------------------------------------
+    // JSON serialisation
+    // -----------------------------------------------------------------------
+
+    pub fn to_json(&self) -> serde_json::Value {
+        fn content_to_json(c: &Content) -> serde_json::Value {
+            serde_json::to_value(c).unwrap_or(serde_json::Value::Null)
+        }
+        fn artifact_to_json(a: &Artifact) -> serde_json::Value {
+            serde_json::to_value(a).unwrap_or(serde_json::Value::Null)
+        }
+        fn spec_to_json(s: &ParsedSpec) -> serde_json::Value {
+            let contents: serde_json::Map<_, _> = s.contents.iter()
+                .map(|(k, v)| (k.clone(), content_to_json(v))).collect();
+            let artifacts: serde_json::Map<_, _> = s.artifacts.iter()
+                .map(|(k, v)| (k.clone(), artifact_to_json(v))).collect();
+            serde_json::json!({
+                "name": s.name,
+                "spec_doc": s.spec_doc,
+                "contents": contents,
+                "artifacts": artifacts,
+            })
+        }
+        fn proj_to_json(proj: &Project) -> serde_json::Value {
+            let specs: serde_json::Map<_, _> = proj.specs.iter()
+                .map(|(k, v)| (k.clone(), spec_to_json(v))).collect();
+            let contents: serde_json::Map<_, _> = proj.contents.iter()
+                .map(|(k, v)| (k.clone(), content_to_json(v))).collect();
+            let artifacts: serde_json::Map<_, _> = proj.artifacts.iter()
+                .map(|(k, v)| (k.clone(), artifact_to_json(v))).collect();
+            let children: serde_json::Map<_, _> = proj.children.iter()
+                .map(|(k, v)| (k.clone(), proj_to_json(v))).collect();
+            serde_json::json!({
+                "path": proj.path,
+                "url": proj.url,
+                "specs": specs,
+                "contents": contents,
+                "artifacts": artifacts,
+                "children": children,
+            })
+        }
+        proj_to_json(self)
+    }
+
+    pub fn from_json(v: &serde_json::Value) -> Result<Project> {
+        let path = v.get("path").and_then(|x| x.as_str()).unwrap_or("").to_string();
+        let url = v.get("url").and_then(|x| x.as_str()).unwrap_or(&path).to_string();
+
+        let specs = v.get("specs").and_then(|x| x.as_object()).map(|obj| {
+            obj.iter().map(|(k, spec_val)| {
+                let name = spec_val.get("name").and_then(|x| x.as_str()).unwrap_or(k).to_string();
+                let spec_doc = spec_val.get("spec_doc").and_then(|x| x.as_str()).unwrap_or("").to_string();
+                (k.clone(), ParsedSpec {
+                    name,
+                    spec_doc,
+                    contents: HashMap::new(),
+                    artifacts: HashMap::new(),
+                })
+            }).collect()
+        }).unwrap_or_default();
+
+        let children = v.get("children").and_then(|x| x.as_object()).map(|obj| {
+            obj.iter().filter_map(|(k, child_val)| {
+                Project::from_json(child_val).ok().map(|p| (k.clone(), p))
+            }).collect()
+        }).unwrap_or_default();
+
+        Ok(Project {
+            path,
+            url,
+            specs,
+            contents: HashMap::new(),
+            artifacts: HashMap::new(),
+            children,
+        })
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+/// Return the set of root-level files to pre-fetch concurrently before parsers run.
+///
+/// Strategy: fetch every file present in `basenames` whose extension (or full name)
+/// is in the set of metadata types that parsers commonly read. This is purely
+/// extension-driven — no per-filename maintenance is required when a new parser
+/// is added, as long as it reads a recognised metadata format.
+///
+/// Recognised extensions / names:
+///   .toml  — pixi.toml, Cargo.toml, book.toml, pyscript.toml, uv.toml, …
+///   .yaml  — Chart.yaml, conda-project.yml, .readthedocs.yaml, …
+///   .yml   — same (alternate extension)
+///   .json  — package.json, datapackage.json, .zenodo.json, …
+///   .txt   — requirements.txt, LICENSE.txt, …
+///   .md    — README.md, CITATION.md, …
+///   .lock  — uv.lock, poetry.lock, pixi.lock, …
+///   .cff   — CITATION.cff
+///   .py    — marimo content-scan (all root-level .py files)
+///   .mod   — go.mod
+///
+/// Extensionless special cases read by parsers:
+///   MLFlow, Dockerfile, LICENSE, LICENCE, COPYING — matched by name prefix/exact.
+///
+/// pyproject.toml is intentionally excluded — it is read before prefetch so
+/// its contents are available to seed any future dynamic candidate logic.
+///
+/// The file listing itself (basenames) is already a single cached VFS call made
+/// at the start of resolve(); ctx.has() / ctx.has_any() are free HashMap lookups.
+/// This function is only about pre-reading file *contents*, not listing.
+pub fn files_to_prefetch<'a>(
+    basenames: &'a HashMap<String, String>,
+    _pyproject: &JsVal,
+) -> Vec<&'a str> {
+    // Extensions whose files are always worth pre-fetching.
+    const PREFETCH_EXTS: &[&str] = &[
+        ".toml", ".yaml", ".yml", ".json",
+        ".txt", ".md", ".lock", ".cff", ".py", ".mod",
+    ];
+
+    // Extensionless basenames that parsers read by exact name.
+    const PREFETCH_EXACT: &[&str] = &["MLFlow", "Dockerfile"];
+
+    // Prefixes for extensionless license/copying files.
+    const LICENSE_PREFIXES: &[&str] = &["LICENSE", "LICENCE", "COPYING"];
+
+    basenames
+        .keys()
+        // exclude pyproject.toml — read separately before prefetch
+        .filter(|name| *name != "pyproject.toml")
+        .filter(|name| {
+            // matches a known extension?
+            if PREFETCH_EXTS.iter().any(|ext| name.ends_with(ext)) {
+                return true;
+            }
+            // exact match (MLFlow, Dockerfile)?
+ if PREFETCH_EXACT.contains(&name.as_str()) { + return true; + } + // license-family prefix (extensionless, e.g. "LICENSE", "COPYING")? + LICENSE_PREFIXES.iter().any(|pfx| name.starts_with(pfx)) + }) + .map(|s| s.as_str()) + .collect() +} + +/// Sub-paths below the root whose *existence* parsers check via ctx.vfs_exists(). +/// These are not visible in basenames (they are inside sub-directories) so they +/// cannot be covered by the extension-based prefetch above. +/// +/// Unlike files_to_prefetch, this list DOES require manual maintenance: +/// add an entry whenever a new `ctx.vfs_exists("some/sub/path")` call is added +/// to spec.rs. +pub fn subpaths_to_prefetch() -> Vec<&'static str> { + vec![ + ".vscode/settings.json", // parse_vscode + ".idea", // parse_jetbrains + ".project/spec.yaml", // parse_nvidia_workbench + "cmd", // parse_golang binary check + ] +} + +/// Convert toml::Value to serde_json::Value recursively. +pub fn toml_to_json(v: toml::Value) -> JsVal { + match v { + toml::Value::String(s) => JsVal::String(s), + toml::Value::Integer(i) => JsVal::Number(i.into()), + toml::Value::Float(f) => { + serde_json::Number::from_f64(f).map(JsVal::Number).unwrap_or(JsVal::Null) + } + toml::Value::Boolean(b) => JsVal::Bool(b), + toml::Value::Array(a) => JsVal::Array(a.into_iter().map(toml_to_json).collect()), + toml::Value::Table(t) => { + JsVal::Object(t.into_iter().map(|(k, v)| (k, toml_to_json(v))).collect()) + } + toml::Value::Datetime(d) => JsVal::String(d.to_string()), + } +} + +fn camel_or_snake_eq(user: &str, snake: &str) -> bool { + user == snake || camel_to_snake(user) == snake +} + +fn camel_to_snake(s: &str) -> String { + let mut out = String::new(); + for (i, ch) in s.char_indices() { + if ch.is_ascii_uppercase() && i > 0 { + out.push('_'); + } + out.push(ch.to_ascii_lowercase()); + } + out +} diff --git a/projspec-rs/src/spec.rs b/projspec-rs/src/spec.rs new file mode 100644 index 0000000..ed2c7b2 --- /dev/null +++ b/projspec-rs/src/spec.rs @@ 
-0,0 +1,1384 @@
+//! spec.rs — All ProjectSpec matchers and parsers.
+//!
+//! Design decisions:
+//! - Each spec is a function `parse_<name>(ctx: &ParseCtx) -> Option<SpecResult>`
+//!   returning None if match() fails, Some(SpecResult) if it succeeds.
+//! - `ParseCtx` carries everything the parsers need: basenames, pyproject, url.
+//! - ProjectExtra specs return a SpecResult with is_extra=true; their contents/
+//!   artifacts are merged into the root project.
+//! - Parsers are intentionally lenient: if a file is missing or malformed,
+//!   they return a partial result rather than failing completely.
+//! - YAML/TOML jinja stripping is done naively (skip lines with {%...%}).
+
+use std::collections::HashMap;
+use serde_json::Value as JsVal;
+
+use crate::artifact::{
+    Artifact, ArtifactBase, ArtifactGroup, CondaEnv, CondaPackage, DockerImage, DockerRuntime,
+    FileArtifact, HelmDeployment, LockFile, PreCommit, Process, SystemInstallablePackage,
+    VirtualEnv, Wheel,
+};
+use crate::content::{
+    Citation, Command, Content, ContentGroup, DataResource, DescriptiveMetadata,
+    Environment, IntakeSource, License, NodePackage, PythonPackage, TabularData,
+};
+use crate::fs::Vfs;
+use crate::types::{Architecture, Precision, Stack};
+
+// ---------------------------------------------------------------------------
+// ParseCtx — shared context for all parsers
+// ---------------------------------------------------------------------------
+
+pub struct ParseCtx<'a> {
+    /// Canonical URL / path of the project root (for building artifact paths).
+    pub url: &'a str,
+    /// {basename -> relative_path_within_vfs} for every entry at the root.
+    pub basenames: &'a HashMap<String, String>,
+    /// Parsed pyproject.toml, or empty object.
+    pub pyproject: &'a JsVal,
+    /// Virtual filesystem — abstracts local, S3, HTTP, memory.
+    pub vfs: &'a Vfs,
+    /// Pre-fetched file contents: {basename -> UTF-8 text}.
+    /// Populated concurrently before parsers run. Cache hit avoids a VFS round-trip.
+    pub file_cache: &'a HashMap<String, String>,
+    /// Pre-checked existence of sub-paths not visible in basenames
+    /// (e.g. ".vscode/settings.json", ".idea", ".project/spec.yaml").
+    /// Populated concurrently alongside file_cache.
+    pub exists_cache: &'a HashMap<String, bool>,
+}
+
+impl<'a> ParseCtx<'a> {
+    pub fn has(&self, name: &str) -> bool {
+        self.basenames.contains_key(name)
+    }
+
+    pub fn has_any(&self, names: &[&str]) -> bool {
+        names.iter().any(|n| self.has(n))
+    }
+
+    /// Read a root-level text file. Returns the pre-fetched copy when available,
+    /// falls through to a live VFS read otherwise (cache miss or file not in
+    /// prefetch list).
+    pub fn read_text(&self, name: &str) -> Option<String> {
+        // Cache hit — zero network/disk cost
+        if let Some(text) = self.file_cache.get(name) {
+            return Some(text.clone());
+        }
+        // Cache miss — resolve relative path then read live
+        let rel = self.basenames.get(name)?;
+        self.vfs.read_text(rel)
+    }
+
+    /// Parse a root-level TOML file; returns None on error.
+    pub fn read_toml(&self, name: &str) -> Option<toml::Value> {
+        let text = self.read_text(name)?;
+        toml::from_str(&text).ok()
+    }
+
+    /// Parse a root-level YAML file (after stripping jinja); returns None on error.
+    pub fn read_yaml(&self, name: &str) -> Option<JsVal> {
+        let text = self.read_text(name)?;
+        let stripped = strip_jinja(&text);
+        serde_yaml::from_str(&stripped).ok()
+    }
+
+    /// Read a file at an arbitrary path relative to the vfs root.
+    /// Checks file_cache first (keyed by the relative path itself).
+    pub fn read_text_path(&self, path: &str) -> Option<String> {
+        if let Some(text) = self.file_cache.get(path) {
+            return Some(text.clone());
+        }
+        self.vfs.read_text(path)
+    }
+
+    pub fn read_yaml_path(&self, path: &str) -> Option<JsVal> {
+        let text = self.read_text_path(path)?;
+        let stripped = strip_jinja(&text);
+        serde_yaml::from_str(&stripped).ok()
+    }
+
+    /// Check existence of a sub-path (e.g. ".vscode/settings.json").
+    /// Uses the pre-checked exists_cache when available, falls back to a live VFS call.
+    pub fn vfs_exists(&self, path: &str) -> bool {
+        if let Some(&result) = self.exists_cache.get(path) {
+            return result;
+        }
+        self.vfs.exists(path)
+    }
+
+    /// tool.[name] table from pyproject.toml.
+    pub fn pyproject_tool(&self, name: &str) -> Option<&JsVal> {
+        self.pyproject.get("tool")?.get(name)
+    }
+
+    /// project.* table from pyproject.toml.
+    pub fn pyproject_project(&self) -> Option<&JsVal> {
+        self.pyproject.get("project")
+    }
+}
+
+// ---------------------------------------------------------------------------
+// SpecResult — what a successful parse() returns
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Default)]
+pub struct SpecResult {
+    pub spec_name: String,
+    pub contents: HashMap<String, Content>,
+    pub artifacts: HashMap<String, Artifact>,
+    /// If true this is a ProjectExtra: contents/artifacts go to root, not specs.
+    pub is_extra: bool,
+    /// URL to upstream spec docs.
+    pub spec_doc: String,
+}
+
+impl SpecResult {
+    fn new(name: &str) -> Self {
+        SpecResult {
+            spec_name: name.to_string(),
+            ..Default::default()
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Registry — list of all spec parsers
+// ---------------------------------------------------------------------------
+
+pub type SpecParser = fn(&ParseCtx) -> Option<SpecResult>;
+
+/// Return all registered spec parsers in a stable order.
+/// Order matters: more-specific specs (e.g. RattlerRecipe) come before general ones (CondaRecipe).
+pub fn all_parsers() -> Vec<(&'static str, SpecParser)> { + vec![ + // Python / packaging + ("uv", parse_uv), + ("poetry", parse_poetry), + ("python_library", parse_python_library), + ("python_code", parse_python_code), + ("pyscript", parse_pyscript), + // Node + ("j_lab_extension", parse_jlab_extension), + ("yarn", parse_yarn), + ("node", parse_node), + // Conda + ("pixi", parse_pixi), + ("conda_project", parse_conda_project), + ("rattler_recipe", parse_rattler_recipe), + ("conda_recipe", parse_conda_recipe), + // Rust + ("rust_python", parse_rust_python), + ("rust", parse_rust), + // Go + ("golang", parse_golang), + // Containers / infra + ("helm_chart", parse_helm_chart), + // Documentation + ("m_d_book", parse_mdbook), + ("r_t_d", parse_rtd), + // Web apps + ("django", parse_django), + ("streamlit", parse_streamlit), + ("marimo", parse_marimo), + // Data + ("data_package", parse_datapackage), + ("d_v_c_repo", parse_dvc_repo), + // Publishing / citation + ("hugging_face_repo", parse_hf_repo), + ("hugging_face_dataset", parse_hf_dataset), + // Packaging (binary) + ("briefcase", parse_briefcase), + // Meta / misc + ("backstage_catalog", parse_backstage), + ("m_l_flow", parse_mlflow), + ("git_repo", parse_git_repo), + ("a_i_enabled", parse_ai_enabled), + // IDE configs + ("v_s_code", parse_vscode), + ("jetbrains_i_d_e", parse_jetbrains), + ("nvidia_a_i_workbench", parse_nvidia_workbench), + // ProjectExtra (merge into root) + ("docker", parse_docker), + ("pre_committed", parse_pre_committed), + ("licensed", parse_licensed), + ("python_requirements", parse_python_requirements), + ("conda_env_file", parse_conda_env_file), + ("intake_catalog", parse_intake_catalog), + ("cited", parse_cited), + ("zenodo", parse_zenodo), + ("data", parse_data), + ] +} + +// --------------------------------------------------------------------------- +// Helper constructors +// --------------------------------------------------------------------------- + +fn file_artifact(cmd: Vec<&str>, 
fn_glob: &str) -> Artifact {
+    Artifact::FileArtifact(FileArtifact {
+        base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+        fn_glob: fn_glob.to_string(),
+    })
+}
+
+fn lock_artifact(cmd: Vec<&str>, fn_path: &str) -> Artifact {
+    Artifact::LockFile(LockFile {
+        file: FileArtifact {
+            base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+            fn_glob: fn_path.to_string(),
+        },
+    })
+}
+
+fn process_artifact(cmd: Vec<&str>) -> Artifact {
+    Artifact::Process(Process {
+        base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+        server: false,
+        port_arg: None,
+        address_arg: None,
+    })
+}
+
+fn server_artifact(cmd: Vec<&str>) -> Artifact {
+    Artifact::Process(Process {
+        base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+        server: true,
+        port_arg: None,
+        address_arg: None,
+    })
+}
+
+fn venv_artifact(cmd: Vec<&str>, fn_path: &str) -> Artifact {
+    Artifact::VirtualEnv(VirtualEnv {
+        file: FileArtifact {
+            base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+            fn_glob: fn_path.to_string(),
+        },
+    })
+}
+
+fn conda_env_artifact(cmd: Vec<&str>, fn_path: &str) -> Artifact {
+    Artifact::CondaEnv(CondaEnv {
+        file: FileArtifact {
+            base: ArtifactBase { cmd: cmd.into_iter().map(str::to_string).collect() },
+            fn_glob: fn_path.to_string(),
+        },
+    })
+}
+
+fn env(stack: Stack, precision: Precision, packages: Vec<String>) -> Content {
+    Content::Environment(Environment {
+        stack,
+        precision,
+        packages,
+        channels: vec![],
+    })
+}
+
+fn env_with_channels(stack: Stack, precision: Precision, packages: Vec<String>, channels: Vec<String>) -> Content {
+    Content::Environment(Environment { stack, precision, packages, channels })
+}
+
+fn meta(pairs: Vec<(&str, &str)>) -> Content {
+    Content::DescriptiveMetadata(DescriptiveMetadata {
+        meta: pairs.into_iter().map(|(k, v)| (k.to_string(), JsVal::String(v.to_string()))).collect(),
+    })
+}
+
+fn meta_from_map(map: HashMap<String, String>) -> Content {
+    Content::DescriptiveMetadata(DescriptiveMetadata {
+        meta: map.into_iter().map(|(k, v)| (k, JsVal::String(v))).collect(),
+    })
+}
+
+// ---------------------------------------------------------------------------
+// Jinja stripping helper (for conda recipes and conda-project yamls)
+// ---------------------------------------------------------------------------
+
+fn strip_jinja(text: &str) -> String {
+    text.lines()
+        .filter(|line| !line.contains("{%"))
+        .map(|line| {
+            // strip selector comments like `# [linux]`
+            if let Some(idx) = line.find(" # [") {
+                &line[..idx]
+            } else {
+                line
+            }
+        })
+        .collect::<Vec<_>>()
+        .join("\n")
+}
+
+// ---------------------------------------------------------------------------
+// Parsers
+// ---------------------------------------------------------------------------
+
+// --- Python / packaging ---
+
+fn parse_python_library(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has_any(&["pyproject.toml", "setup.py"]) {
+        return None;
+    }
+    let mut r = SpecResult::new("python_library");
+    r.spec_doc = "https://packaging.python.org/en/latest/specifications/pyproject-toml/".into();
+
+    // build artifact
+    if ctx.pyproject.get("build-system").is_some() {
+        r.artifacts.insert("wheel".into(), Artifact::Wheel(Wheel {
+            file: FileArtifact {
+                base: ArtifactBase { cmd: vec!["python".into(), "-m".into(), "build".into()] },
+                fn_glob: format!("{}/dist/*.whl", ctx.url),
+            },
+        }));
+    } else if ctx.has("setup.py") {
+        r.artifacts.insert("wheel".into(), Artifact::Wheel(Wheel {
+            file: FileArtifact {
+                base: ArtifactBase { cmd: vec!["python".into(), format!("{}/setup.py", ctx.url), "bdist_wheel".into()] },
+                fn_glob: format!("{}/dist/*.whl", ctx.url),
+            },
+        }));
+    }
+
+    // project metadata
+    if let Some(proj) = ctx.pyproject_project() {
+        if let Some(name) = proj.get("name").and_then(|v| v.as_str()) {
+            r.contents.insert("python_package".into(), Content::PythonPackage(PythonPackage { package_name: name.to_string() }));
+        }
+        // dependencies → environment
+        let deps: Vec<String> = proj.get("dependencies")
+            .and_then(|v| v.as_array())
+            .map(|a| a.iter().filter_map(|v| v.as_str().map(str::to_string)).collect())
+            .unwrap_or_default();
+        if !deps.is_empty() {
+            r.contents.insert("environment".into(), Content::Group({
+                let mut g = ContentGroup::new();
+                g.insert("default".into(), env(Stack::Pip, Precision::Spec, deps));
+                g
+            }));
+        }
+    }
+    Some(r)
+}
+
+fn parse_python_code(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has("__init__.py") { return None; }
+    let mut r = SpecResult::new("python_code");
+    r.spec_doc = "https://docs.python.org/3/reference/import.html#regular-packages".into();
+    let pkg_name = ctx.url.rsplit('/').next().unwrap_or("").to_string();
+    r.contents.insert("python_package".into(), Content::PythonPackage(PythonPackage { package_name: pkg_name }));
+    if ctx.has("__main__.py") {
+        let mut group = ArtifactGroup::new();
+        group.insert("main".into(), process_artifact(vec!["python", "__main__.py"]));
+        r.artifacts.insert("process".into(), Artifact::Group(group));
+    }
+    Some(r)
+}
+
+fn parse_pyscript(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has_any(&["pyscript.toml", "pyscript.json"]) { return None; }
+    let mut r = SpecResult::new("pyscript");
+    r.spec_doc = "https://docs.pyscript.net/2023.11.2/user-guide/configuration/".into();
+    if let Some(meta) = ctx.read_toml("pyscript.toml") {
+        if let Some(pkgs) = meta.get("packages").and_then(|v| v.as_array()) {
+            let packages: Vec<String> = pkgs.iter().filter_map(|v| v.as_str().map(str::to_string)).collect();
+            r.contents.insert("environment".into(), Content::Group({
+                let mut g = ContentGroup::new();
+                g.insert("default".into(), env(Stack::Pip, Precision::Spec, packages));
+                g
+            }));
+        }
+    }
+    r.artifacts.insert("server".into(), server_artifact(vec!["pyscript", "run"]));
+    Some(r)
+}
+
+fn parse_uv(ctx: &ParseCtx) -> Option<SpecResult> {
+    let has_uv_files = ctx.has_any(&["uv.lock", "uv.toml", ".python-version"]);
+    let has_uv_backend = ctx.pyproject.get("build-system")
+        .and_then(|v| v.get("build-backend"))
+        .and_then(|v| v.as_str())
+        .map(|s| s == "uv_build")
+        .unwrap_or(false);
+    if !has_uv_files && !has_uv_backend { return None; }
+
+    let mut r = SpecResult::new("uv");
+    r.spec_doc = "https://docs.astral.sh/uv/concepts/configuration-files/".into();
+
+    // inherit from python_library
+    if let Some(base) = parse_python_library(ctx) {
+        r.contents.extend(base.contents);
+        r.artifacts.extend(base.artifacts);
+    }
+
+    r.artifacts.insert("lock_file".into(), lock_artifact(vec!["uv", "lock"], &format!("{}/uv.lock", ctx.url)));
+    r.artifacts.insert("virtual_env".into(), venv_artifact(vec!["uv", "sync"], &format!("{}/.venv", ctx.url)));
+
+    // parse lock file for locked environment
+    if let Some(lock_text) = ctx.read_text("uv.lock") {
+        if let Ok(lock) = toml::from_str::<toml::Value>(&lock_text) {
+            let py_ver = lock.get("requires-python").and_then(|v| v.as_str()).unwrap_or("");
+            let mut pkgs = vec![format!("python {py_ver}")];
+            if let Some(packages) = lock.get("package").and_then(|v| v.as_array()) {
+                for p in packages {
+                    if let (Some(name), Some(ver)) = (p.get("name").and_then(|v| v.as_str()),
+                                                      p.get("version").and_then(|v| v.as_str())) {
+                        pkgs.push(format!("{name} =={ver}"));
+                    }
+                }
+            }
+            let envs = r.contents.entry("environment".into()).or_insert_with(|| Content::Group(ContentGroup::new()));
+            if let Content::Group(g) = envs {
+                g.insert("lockfile".into(), env(Stack::Pip, Precision::Lock, pkgs));
+            }
+        }
+    }
+    Some(r)
+}
+
+fn parse_poetry(ctx: &ParseCtx) -> Option<SpecResult> {
+    let has_poetry = ctx.pyproject_tool("poetry").is_some()
+        || ctx.pyproject.get("build-system")
+            .and_then(|v| v.get("build-backend")).and_then(|v| v.as_str())
+            .map(|s| s.starts_with("poetry.")).unwrap_or(false);
+    if !has_poetry { return None; }
+
+    let mut r = SpecResult::new("poetry");
+    r.spec_doc = "https://python-poetry.org/docs/pyproject/".into();
+
+    if let Some(base) = parse_python_library(ctx) {
+        r.contents.extend(base.contents);
+        r.artifacts.extend(base.artifacts);
+    }
+
+    r.artifacts.insert("lock_file".into(), lock_artifact(vec!["poetry", "lock"], &format!("{}/poetry.lock", ctx.url)));
+
+    // override wheel cmd
+    r.artifacts.insert("wheel".into(), Artifact::Wheel(Wheel {
+        file: FileArtifact {
+            base: ArtifactBase { cmd: vec!["poetry".into(), "build".into()] },
+            fn_glob: format!("{}/dist/*.whl", ctx.url),
+        },
+    }));
+
+    // parse poetry.lock for locked env
+    if let Some(lock_text) = ctx.read_text("poetry.lock") {
+        if let Ok(lock) = toml::from_str::<toml::Value>(&lock_text) {
+            let pkgs: Vec<String> = lock.get("package").and_then(|v| v.as_array())
+                .map(|a| a.iter().filter_map(|p| {
+                    let name = p.get("name")?.as_str()?;
+                    let ver = p.get("version")?.as_str()?;
+                    Some(format!("{name} =={ver}"))
+                }).collect())
+                .unwrap_or_default();
+            let envs = r.contents.entry("environment".into()).or_insert_with(|| Content::Group(ContentGroup::new()));
+            if let Content::Group(g) = envs {
+                g.insert("default.lock".into(), env(Stack::Pip, Precision::Lock, pkgs));
+            }
+        }
+    }
+    Some(r)
+}
+
+// --- Node ---
+
+fn parse_node(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has("package.json") { return None; }
+    let mut r = SpecResult::new("node");
+    r.spec_doc = "https://docs.npmjs.com/cli/v11/configuring-npm/package-json".into();
+
+    let pkg_text = ctx.read_text("package.json")?;
+    let pkg: JsVal = serde_json::from_str(&pkg_text).ok()?;
+
+    if let Some(name) = pkg.get("name").and_then(|v| v.as_str()) {
+        r.contents.insert("node_package".into(), Content::NodePackage(NodePackage { name: name.to_string() }));
+        let mut m = HashMap::new();
+        m.insert("name".to_string(), name.to_string());
+        if let Some(ver) = pkg.get("version").and_then(|v| v.as_str()) {
+            m.insert("version".to_string(), ver.to_string());
+        }
+        r.contents.insert("descriptive_metadata".into(), meta_from_map(m));
+    }
+
+    // dependencies
+    let deps: Vec<String> = pkg.get("dependencies").and_then(|v| v.as_object())
+        .map(|m| m.keys().cloned().collect()).unwrap_or_default();
+    if !deps.is_empty() {
+        r.contents.insert("environment".into(), Content::Group({
+            let mut g = ContentGroup::new();
+            g.insert("node".into(), env(Stack::Npm, Precision::Spec, deps));
+            g
+        }));
+    }
+
+    // lock file
+    if ctx.has("package-lock.json") {
+        r.artifacts.insert("lock_file".into(), lock_artifact(
+            vec!["npm", "install"],
+            ctx.basenames.get("package-lock.json").unwrap(),
+        ));
+    }
+
+    // scripts → process artifacts for "build"
+    if let Some(scripts) = pkg.get("scripts").and_then(|v| v.as_object()) {
+        if scripts.contains_key("build") {
+            r.artifacts.insert("build".into(), process_artifact(vec!["npm", "run", "build"]));
+        }
+    }
+    Some(r)
+}
+
+fn parse_yarn(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has(".yarnrc.yml") { return None; }
+    let mut r = parse_node(ctx)?;
+    r.spec_name = "yarn".into();
+    r.spec_doc = "https://yarnpkg.com/configuration/yarnrc".into();
+
+    if ctx.has("yarn.lock") {
+        let lock_path = ctx.basenames.get("yarn.lock").cloned().unwrap_or_default();
+        r.artifacts.insert("lock_file".into(), lock_artifact(vec!["yarn", "install"], &lock_path));
+    }
+    Some(r)
+}
+
+fn parse_jlab_extension(ctx: &ParseCtx) -> Option<SpecResult> {
+    if !ctx.has("package.json") || ctx.pyproject.as_object().map(|m| m.is_empty()).unwrap_or(true) {
+        return None;
+    }
+    let pkg_text = ctx.read_text("package.json")?;
+    let pkg: JsVal = serde_json::from_str(&pkg_text).ok()?;
+    let build_script = pkg.get("scripts")?.get("build")?.as_str()?;
+    if !build_script.starts_with("jlpm") { return None; }
+
+    let mut r = parse_yarn(ctx).unwrap_or_else(|| parse_node(ctx).unwrap_or_else(|| SpecResult::new("j_lab_extension")));
+    r.spec_name = "j_lab_extension".into();
+    r.spec_doc = "https://jupyterlab.readthedocs.io/en/latest/developer/contributing.html".into();
+    r.artifacts.insert("lock_file".into(), lock_artifact(
+        vec!["jlpm", "install"],
+        &format!("{}/yarn.lock", ctx.url),
+    ));
+    Some(r)
+}
+
+// --- Conda ---
+
+fn parse_pixi(ctx: &ParseCtx) -> Option<SpecResult> {
+    let has_pixi = ctx.has("pixi.toml") ||
ctx.pyproject_tool("pixi").is_some(); + if !has_pixi { return None; } + + let mut r = SpecResult::new("pixi"); + r.spec_doc = "https://pixi.sh/latest/reference/pixi_manifest".into(); + + let meta: toml::Value = if let Some(t) = ctx.read_toml("pixi.toml") { + t + } else { + return None; + }; + + // tasks → processes + if let Some(tasks) = meta.get("tasks").and_then(|v| v.as_table()) { + let mut procs = ArtifactGroup::new(); + for name in tasks.keys() { + procs.insert(name.clone(), process_artifact(vec!["pixi", "run", name])); + } + if !procs.is_empty() { + r.artifacts.insert("process".into(), Artifact::Group(procs)); + } + } + + r.artifacts.insert("lock_file".into(), lock_artifact(vec!["pixi", "lock"], &format!("{}/pixi.lock", ctx.url))); + + // conda envs from lock file + if ctx.has("pixi.lock") { + // just note its existence; detailed lock parsing is expensive + let mut conda_envs = ArtifactGroup::new(); + conda_envs.insert("default".into(), conda_env_artifact( + vec!["pixi", "install"], + &format!("{}/.pixi/envs/default", ctx.url), + )); + r.artifacts.insert("conda_env".into(), Artifact::Group(conda_envs)); + } + + // package build + if let Some(pkg) = meta.get("package").and_then(|v| v.as_table()) { + if let Some(name) = pkg.get("name").and_then(|v| v.as_str()) { + let ver = pkg.get("version").and_then(|v| v.as_str()).unwrap_or("*"); + r.artifacts.insert("conda_package".into(), Artifact::CondaPackage(CondaPackage { + file: FileArtifact { + base: ArtifactBase { cmd: vec!["pixi".into(), "build".into()] }, + fn_glob: format!("{}/{}-{}*.conda", ctx.url, name, ver), + }, + name: Some(name.to_string()), + })); + } + } + + // dependencies → environment + if let Some(deps) = meta.get("dependencies").and_then(|v| v.as_table()) { + let packages: Vec = deps.keys().cloned().collect(); + r.contents.insert("environment".into(), Content::Group({ + let mut g = ContentGroup::new(); + g.insert("default".into(), env(Stack::Conda, Precision::Spec, packages)); + g + })); + } + 
Some(r) +} + +fn parse_conda_project(ctx: &ParseCtx) -> Option { + if !ctx.has_any(&["conda-project.yml", "conda-meta.yaml"]) { return None; } + let mut r = SpecResult::new("conda_project"); + r.spec_doc = "https://conda-incubator.github.io/conda-project/tutorial.html".into(); + + let meta = ctx.read_yaml("conda-project.yml").or_else(|| ctx.read_yaml("conda-project.yaml"))?; + let mut envs = ContentGroup::new(); + let mut locks = ArtifactGroup::new(); + let mut conda_envs = ArtifactGroup::new(); + + if let Some(environments) = meta.get("environments").and_then(|v| v.as_object()) { + for (env_name, _) in environments { + envs.insert(env_name.clone(), env(Stack::Conda, Precision::Spec, vec![])); + locks.insert(env_name.clone(), lock_artifact( + vec!["conda", "project", "lock", env_name], + &format!("{}/conda-lock.{env_name}.yml", ctx.url), + )); + conda_envs.insert(env_name.clone(), conda_env_artifact( + vec!["conda", "project", "prepare", env_name], + &format!("{}/./envs/{env_name}/", ctx.url), + )); + } + } + + if !envs.is_empty() { + r.contents.insert("environment".into(), Content::Group(envs)); + r.artifacts.insert("lock_file".into(), Artifact::Group(locks)); + r.artifacts.insert("conda_env".into(), Artifact::Group(conda_envs)); + } + + // commands + if let Some(commands) = meta.get("commands").and_then(|v| v.as_object()) { + let mut procs = ArtifactGroup::new(); + let mut cmds = ContentGroup::new(); + for (name, cmd_val) in commands { + procs.insert(name.clone(), process_artifact(vec!["conda", "project", "run", name])); + let cmd_str = cmd_val.as_str().unwrap_or("").to_string(); + cmds.insert(name.clone(), Content::Command(Command { + cmd: crate::content::CmdValue::Str(cmd_str), + })); + } + r.artifacts.insert("process".into(), Artifact::Group(procs)); + r.contents.insert("command".into(), Content::Group(cmds)); + } + Some(r) +} + +fn parse_conda_recipe(ctx: &ParseCtx) -> Option { + if !ctx.has_any(&["meta.yaml", "meta.yml", "conda.yaml"]) { return None; } + let 
mut r = SpecResult::new("conda_recipe"); + r.spec_doc = "https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html".into(); + + r.artifacts.insert("conda_package".into(), Artifact::CondaPackage(CondaPackage { + file: FileArtifact { + base: ArtifactBase { cmd: vec!["conda-build".into(), format!("{ctx_url}/*.conda", ctx_url = ctx.url)] }, + fn_glob: format!("{}/output/**/*.conda", ctx.url), + }, + name: None, + })); + + // parse requirements if available + for fname in &["meta.yaml", "meta.yml", "conda.yaml"] { + if let Some(meta) = ctx.read_yaml(fname) { + if let Some(reqs) = meta.get("requirements").and_then(|v| v.as_object()) { + let mut envs = ContentGroup::new(); + for (phase, dep_list) in reqs { + let pkgs: Vec = dep_list.as_array() + .map(|a| a.iter().filter_map(|v| v.as_str().map(str::to_string)).collect()) + .unwrap_or_default(); + envs.insert(phase.clone(), env(Stack::Conda, Precision::Spec, pkgs)); + } + r.contents.insert("environment".into(), Content::Group(envs)); + } + break; + } + } + Some(r) +} + +fn parse_rattler_recipe(ctx: &ParseCtx) -> Option { + if !ctx.has("recipe.yaml") { return None; } + let mut r = SpecResult::new("rattler_recipe"); + r.spec_doc = "https://rattler.build/latest/reference/recipe_file/".into(); + + let meta = ctx.read_yaml("recipe.yaml")?; + let name = meta.get("context").and_then(|v| v.get("name")) + .or_else(|| meta.get("recipe").and_then(|v| v.get("name"))) + .or_else(|| meta.get("package").and_then(|v| v.get("name"))) + .and_then(|v| v.as_str()).unwrap_or("package"); + + r.artifacts.insert("conda_package".into(), Artifact::CondaPackage(CondaPackage { + file: FileArtifact { + base: ArtifactBase { cmd: vec!["rattler-build".into(), "build".into(), "-r".into(), ctx.url.to_string(), "--output-dir".into(), format!("{}/output", ctx.url)] }, + fn_glob: format!("{}/output/{}/*.conda", ctx.url, name), + }, + name: Some(name.to_string()), + })); + Some(r) +} + +// --- Rust --- + +fn parse_rust(ctx: &ParseCtx) 
-> Option { + if !ctx.has("Cargo.toml") { return None; } + let mut r = SpecResult::new("rust"); + r.spec_doc = "https://doc.rust-lang.org/cargo/reference/manifest.html".into(); + + if let Some(meta) = ctx.read_toml("Cargo.toml") { + if let Some(pkg) = meta.get("package").and_then(|v| v.as_table()) { + let name = pkg.get("name").and_then(|v| v.as_str()).unwrap_or("package"); + let mut m = HashMap::new(); + for key in &["name", "version", "description"] { + if let Some(v) = pkg.get(*key).and_then(|v| v.as_str()) { + m.insert(key.to_string(), v.to_string()); + } + } + r.contents.insert("descriptive_metadata".into(), meta_from_map(m)); + + let mut bin_group = ArtifactGroup::new(); + bin_group.insert("debug".into(), file_artifact( + vec!["cargo", "build"], + &format!("{}/target/debug/{}*", ctx.url, name), + )); + bin_group.insert("release".into(), file_artifact( + vec!["cargo", "build", "--release"], + &format!("{}/target/release/{}*", ctx.url, name), + )); + r.artifacts.insert("file".into(), Artifact::Group(bin_group)); + } + } + Some(r) +} + +fn parse_rust_python(ctx: &ParseCtx) -> Option { + if !ctx.has("Cargo.toml") { return None; } + let has_maturin = ctx.pyproject_tool("maturin").is_some() + || ctx.pyproject.get("build-system").and_then(|v| v.get("build-backend")) + .and_then(|v| v.as_str()).map(|s| s == "maturin").unwrap_or(false); + if !has_maturin { return None; } + + let mut r = SpecResult::new("rust_python"); + r.spec_doc = "https://www.maturin.rs/config.html".into(); + + // inherit from both rust and python_library + if let Some(base) = parse_rust(ctx) { r.contents.extend(base.contents); r.artifacts.extend(base.artifacts); } + if let Some(base) = parse_python_library(ctx) { r.contents.extend(base.contents); r.artifacts.extend(base.artifacts); } + Some(r) +} + +// --- Go --- + +fn parse_golang(ctx: &ParseCtx) -> Option { + if !ctx.has("go.mod") { return None; } + let mut r = SpecResult::new("golang"); + r.spec_doc = 
"https://go.dev/doc/modules/gomod-ref".into(); + + if let Some(text) = ctx.read_text("go.mod") { + let mut m = HashMap::new(); + for line in text.lines() { + if let Some(path) = line.strip_prefix("module ") { + m.insert("module".to_string(), path.trim().to_string()); + } + if let Some(ver) = line.strip_prefix("go ") { + m.insert("go".to_string(), ver.trim().to_string()); + } + } + if !m.is_empty() { + r.contents.insert("descriptive_metadata".into(), meta_from_map(m)); + } + } + + r.artifacts.insert("build".into(), process_artifact(vec!["go", "build", "./..."])); + r.artifacts.insert("test".into(), process_artifact(vec!["go", "test", "./..."])); + + // binary output if cmd/ exists + if ctx.vfs_exists("cmd") { + r.artifacts.insert("binary".into(), file_artifact( + vec!["go", "build", "-o", "bin/", "./cmd/..."], + &format!("{}/bin/*", ctx.url), + )); + } + Some(r) +} + +// --- Containers / infra --- + +fn parse_helm_chart(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("Chart.yaml") { return None; } + let mut r = SpecResult::new("helm_chart"); + r.spec_doc = "https://helm.sh/docs/topics/charts/#the-chartyaml-file".into(); + + if let Some(chart) = ctx.read_yaml("Chart.yaml") { + let mut m = HashMap::new(); + for key in &["name", "version", "appVersion", "description", "type"] { + if let Some(v) = chart.get(*key).and_then(|v| v.as_str()) { + m.insert(key.to_string(), v.to_string()); + } + } + let name = m.get("name").cloned().unwrap_or_else(|| "release".to_string()); + let version = m.get("version").cloned().unwrap_or_default(); + r.contents.insert("descriptive_metadata".into(), meta_from_map(m)); + + if !name.is_empty() && !version.is_empty() { + r.artifacts.insert("packaged_chart".into(), file_artifact( + vec!["helm", "package", "."], + &format!("{}/{name}-{version}.tgz", ctx.url), + )); + } + r.artifacts.insert("chart_lock".into(), file_artifact( + vec!["helm", "dependency", "update", "."], + &format!("{}/Chart.lock", ctx.url), + )); + r.artifacts.insert("release".into(),
Artifact::HelmDeployment(HelmDeployment::new(&name))); + r.artifacts.insert("lint".into(), process_artifact(vec!["helm", "lint", "."])); + } + Some(r) +} + +// --- Documentation --- + +fn parse_mdbook(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("book.toml") { return None; } + let mut r = SpecResult::new("m_d_book"); + r.spec_doc = "https://rust-lang.github.io/mdBook/format/configuration/index.html".into(); + + let build_dir = ctx.read_toml("book.toml") + .and_then(|t| t.get("build").and_then(|b| b.get("build-dir")).and_then(|v| v.as_str()).map(str::to_string)) + .unwrap_or_else(|| "book".to_string()); + + r.artifacts.insert("book".into(), file_artifact( + vec!["mdbook", "build"], + &format!("{}/{build_dir}/index.html", ctx.url), + )); + r.artifacts.insert("server".into(), server_artifact(vec!["mdbook", "serve"])); + Some(r) +} + +fn parse_rtd(ctx: &ParseCtx) -> Option<SpecResult> { + let rtd_file = ctx.basenames.keys() + .find(|k| { + let k = k.as_str(); + k == ".readthedocs.yaml" || k == "readthedocs.yaml" || k == ".readthedocs.yml" || k == "readthedocs.yml" + })?.clone(); + let mut r = SpecResult::new("r_t_d"); + r.spec_doc = "https://docs.readthedocs.com/platform/stable/config-file/v2.html".into(); + + if let Some(cfg) = ctx.read_yaml(&rtd_file) { + if cfg.get("sphinx").is_some() { + let conf_py = cfg.get("sphinx").and_then(|s| s.get("configuration")).and_then(|v| v.as_str()).unwrap_or("docs/conf.py"); + // everything before the final '/' is the docs dir (handles nested dirs like docs/source) + let docs_dir = conf_py.rsplit_once('/').map(|(dir, _)| dir).unwrap_or("docs"); + r.artifacts.insert("docs".into(), file_artifact( + vec!["sphinx-build", "-b", "html", docs_dir, &format!("{docs_dir}/_build/html")], + &format!("{}/{docs_dir}/_build/html/index.html", ctx.url), + )); + } else if cfg.get("mkdocs").is_some() { + r.artifacts.insert("docs".into(), file_artifact( + vec!["mkdocs", "build"], + &format!("{}/site/index.html", ctx.url), + )); + } + } + Some(r) +} + +// --- Web apps --- + +fn parse_django(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("manage.py") { return None; } + let
mut r = SpecResult::new("django"); + r.spec_doc = "https://docs.djangoproject.com/en/6.0/ref/settings/".into(); + r.artifacts.insert("server".into(), server_artifact(vec!["python", "manage.py", "runserver"])); + Some(r) +} + +fn parse_streamlit(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has_any(&[".streamlit", "streamlit_app.py"]) { return None; } + let mut r = SpecResult::new("streamlit"); + r.spec_doc = "https://docs.streamlit.io/deploy/streamlit-community-cloud/deploy-your-app/file-organization".into(); + // find .py files + let py_files: Vec<&String> = ctx.basenames.keys().filter(|k| k.ends_with(".py")).collect(); + if py_files.len() == 1 { + r.artifacts.insert("server".into(), server_artifact(vec!["streamlit", "run", py_files[0]])); + } else { + // use streamlit_app.py if it exists + if ctx.has("streamlit_app.py") { + r.artifacts.insert("server".into(), server_artifact(vec!["streamlit", "run", "streamlit_app.py"])); + } + } + Some(r) +} + +fn parse_marimo(ctx: &ParseCtx) -> Option<SpecResult> { + // Only match if at least one .py file contains marimo patterns. + let py_files: Vec<&String> = ctx.basenames.keys().filter(|k| k.ends_with(".py")).collect(); + if py_files.is_empty() { return None; } + let mut found = false; + let mut servers = ArtifactGroup::new(); + for py in &py_files { + let rel = ctx.basenames.get(*py)?; + // read_text_path checks file_cache first — avoids a live VFS read when + // .py files were pre-fetched as part of the small-file scan.
+ if let Some(content) = ctx.read_text_path(rel) { + if (content.contains("import marimo") || content.contains("from marimo")) && content.contains("marimo.App(") { + let name = py.trim_end_matches(".py"); + let path = format!("{}/{}", ctx.url, rel); + servers.insert(name.to_string(), server_artifact(vec!["marimo", "run", &path])); + found = true; + } + } + } + if !found { return None; } + let mut r = SpecResult::new("marimo"); + r.spec_doc = "https://docs.marimo.io/".into(); + r.artifacts.insert("server".into(), Artifact::Group(servers)); + Some(r) +} + +// --- Data --- + +fn parse_datapackage(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("datapackage.json") { return None; } + let mut r = SpecResult::new("data_package"); + r.spec_doc = "https://datapackage.org/standard/data-package/#structure".into(); + + if let Some(text) = ctx.read_text("datapackage.json") { + if let Ok(conf) = serde_json::from_str::<JsVal>(&text) { + let mut m = HashMap::new(); + for key in &["name", "title", "description"] { + if let Some(v) = conf.get(*key).and_then(|v| v.as_str()) { + m.insert(key.to_string(), v.to_string()); + } + } + r.contents.insert("descriptive_metadata".into(), meta_from_map(m)); + + if let Some(licenses) = conf.get("licenses").and_then(|v| v.as_array()) { + if let Some(lic) = licenses.first() { + r.contents.insert("license".into(), Content::License(License { + shortname: lic.get("name").and_then(|v| v.as_str()).unwrap_or("unknown").to_string(), + fullname: "unknown".to_string(), + url: lic.get("path").and_then(|v| v.as_str()).unwrap_or("").to_string(), + })); + } + } + + if let Some(resources) = conf.get("resources").and_then(|v| v.as_array()) { + let tables: Vec<Content> = resources.iter().filter_map(|res| { + let name = res.get("name")?.as_str()?.to_string(); + Some(Content::TabularData(TabularData { + name, + schema: res.get("schema").cloned().unwrap_or(JsVal::Null), + metadata: HashMap::new(), + })) + }).collect(); + if !tables.is_empty() {
r.contents.insert("frictionless_data".into(), Content::List(tables)); + } + } + } + } + Some(r) +} + +fn parse_dvc_repo(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has(".dvc") { return None; } + let mut r = SpecResult::new("d_v_c_repo"); + r.spec_doc = "https://doc.dvc.org/command-reference/config".into(); + Some(r) +} + +// --- Publishing / citation --- + +fn parse_hf_repo(ctx: &ParseCtx) -> Option<SpecResult> { + let text = ctx.read_text("README.md")?; + if text.matches("---\n").count() < 2 { return None; } + let front_matter = text.split("---\n").nth(1)?; + let meta: JsVal = serde_yaml::from_str(front_matter).ok()?; + if !meta.is_object() { return None; } + // dataset discriminators mean it's a dataset card, not a model card + let dataset_keys = ["dataset_info", "source_datasets", "task_categories", "task_ids"]; + if dataset_keys.iter().any(|k| meta.get(k).is_some()) { return None; } + + let mut r = SpecResult::new("hugging_face_repo"); + r.spec_doc = "https://huggingface.co/docs/hub/en/model-cards".into(); + + let mut m = HashMap::new(); + for key in &["language", "library_name", "base_model"] { + if let Some(v) = meta.get(*key).and_then(|v| v.as_str()) { + m.insert(key.to_string(), v.to_string()); + } + } + r.contents.insert("descriptive_metadata".into(), meta_from_map(m)); + + if let Some(lic) = meta.get("license").and_then(|v| v.as_str()) { + r.contents.insert("license".into(), Content::License(License { + shortname: lic.to_string(), fullname: "unknown".to_string(), url: String::new(), + })); + } + Some(r) +} + +fn parse_hf_dataset(ctx: &ParseCtx) -> Option<SpecResult> { + let text = ctx.read_text("README.md")?; + if text.matches("---\n").count() < 2 { return None; } + let front_matter = text.split("---\n").nth(1)?; + let meta: JsVal = serde_yaml::from_str(front_matter).ok()?; + if !meta.is_object() { return None; } + // must have at least one dataset key + let dataset_keys = ["dataset_info", "source_datasets", "task_categories", "task_ids", "size_categories"]; + if
!dataset_keys.iter().any(|k| meta.get(k).is_some()) { return None; } + + let mut r = SpecResult::new("hugging_face_dataset"); + r.spec_doc = "https://huggingface.co/docs/hub/datasets-cards".into(); + + let mut m = HashMap::new(); + for key in &["pretty_name", "language", "task_categories", "size_categories"] { + if let Some(v) = meta.get(*key) { + m.insert(key.to_string(), v.to_string()); + } + } + r.contents.insert("descriptive_metadata".into(), Content::DescriptiveMetadata(DescriptiveMetadata { + meta: m.into_iter().map(|(k, v)| (k, JsVal::String(v))).collect(), + })); + Some(r) +} + +// --- Briefcase --- + +fn parse_briefcase(ctx: &ParseCtx) -> Option<SpecResult> { + ctx.pyproject_tool("briefcase")?; + let mut r = SpecResult::new("briefcase"); + r.spec_doc = "https://briefcase.readthedocs.io/en/stable/reference/configuration.html".into(); + // Add cross-platform installers as best-effort + r.artifacts.insert("linux-deb".into(), Artifact::SystemInstallablePackage(SystemInstallablePackage { + file: FileArtifact { + base: ArtifactBase { cmd: vec!["briefcase".into(), "package".into(), "-p".into(), "deb".into()] }, + fn_glob: format!("{}/dist/*.deb", ctx.url), + }, + arch: Architecture::Linux, + filetype: "deb".to_string(), + })); + Some(r) +} + +// --- Backstage --- + +fn parse_backstage(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("catalog-info.yaml") { return None; } + let mut r = SpecResult::new("backstage_catalog"); + r.spec_doc = "https://backstage.io/docs/features/software-catalog/descriptor-format/".into(); + + if let Some(yaml_text) = ctx.read_text("catalog-info.yaml") { + let mut meta_entries = ContentGroup::new(); + // iterate yaml documents + for doc_str in yaml_text.split("---\n").filter(|s| !s.trim().is_empty()) { + if let Ok(doc) = serde_yaml::from_str::<JsVal>(doc_str) { + if let Some(api) = doc.get("apiVersion").and_then(|v| v.as_str()) { + if api.starts_with("backstage.io/") { + let kind = doc.get("kind").and_then(|v| v.as_str()).unwrap_or("unknown"); + let metadata =
doc.get("metadata"); + let name = metadata.and_then(|m| m.get("name")).and_then(|v| v.as_str()).unwrap_or("unnamed"); + let key = format!("{}.{}", kind.to_lowercase(), name); + let mut m = HashMap::new(); + m.insert("kind".to_string(), JsVal::String(kind.to_string())); + m.insert("name".to_string(), JsVal::String(name.to_string())); + if let Some(desc) = metadata.and_then(|m| m.get("description")).and_then(|v| v.as_str()) { + m.insert("description".to_string(), JsVal::String(desc.to_string())); + } + meta_entries.insert(key, Content::DescriptiveMetadata(DescriptiveMetadata { meta: m })); + } + } + } + } + if !meta_entries.is_empty() { + r.contents.insert("descriptive_metadata".into(), Content::Group(meta_entries)); + } + } + Some(r) +} + +// --- MLFlow --- + +fn parse_mlflow(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("MLproject") { return None; } + let mut r = SpecResult::new("m_l_flow"); + r.spec_doc = "https://mlflow.org/docs/latest/ml/projects/#mlproject-file-configuration".into(); + + if let Some(meta) = ctx.read_yaml("MLproject") { + let stack = if meta.get("python_env").is_some() { Stack::Pip } else { Stack::Conda }; + r.contents.insert("environment".into(), env(stack, Precision::Spec, vec![])); + + if let Some(eps) = meta.get("entry_points").and_then(|v| v.as_object()) { + let mut procs = ArtifactGroup::new(); + let mut cmds = ContentGroup::new(); + for (name, ep) in eps { + procs.insert(name.clone(), process_artifact(vec!["mlflow", "run", ".", "-e", name])); + let cmd_str = ep.get("command").and_then(|v| v.as_str()).unwrap_or("").to_string(); + cmds.insert(name.clone(), Content::Command(Command { + cmd: crate::content::CmdValue::Str(cmd_str), + })); + } + r.artifacts.insert("process".into(), Artifact::Group(procs)); + r.contents.insert("command".into(), Content::Group(cmds)); + } + } + Some(r) +} + +// --- Git --- + +fn parse_git_repo(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has(".git") { return None; } + let mut r = SpecResult::new("git_repo"); + r.spec_doc =
"https://git-scm.com/docs/git-config#_configuration_file".into(); + + // Read branches from .git/refs/heads (local fs only; skip silently for remote) + let branches: Vec<String> = ctx.vfs + .list_dir(".git/refs/heads") + .into_iter() + .map(|s| s.rsplit('/').next().unwrap_or(&s).to_string()) + .collect(); + r.contents.insert("branches".into(), Content::Raw(JsVal::Array(branches.into_iter().map(JsVal::String).collect()))); + + let tags: Vec<String> = ctx.vfs + .list_dir(".git/refs/tags") + .into_iter() + .map(|s| s.rsplit('/').next().unwrap_or(&s).to_string()) + .collect(); + r.contents.insert("tags".into(), Content::Raw(JsVal::Array(tags.into_iter().map(JsVal::String).collect()))); + Some(r) +} + +// --- AI enabled --- + +fn parse_ai_enabled(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has_any(&["AGENTS.md", "CLAUDE.md", ".specify"]) { return None; } + let mut r = SpecResult::new("a_i_enabled"); + r.spec_doc = "https://agents.md/".into(); + Some(r) +} + +// --- IDEs --- + +fn parse_vscode(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.vfs_exists(".vscode/settings.json") { return None; } + let mut r = SpecResult::new("v_s_code"); + r.spec_doc = "https://code.visualstudio.com/docs/configure/settings#_settings-json-file".into(); + r.artifacts.insert("launch".into(), process_artifact(vec!["code", ctx.url])); + Some(r) +} + +fn parse_jetbrains(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.vfs_exists(".idea") { return None; } + let mut r = SpecResult::new("jetbrains_i_d_e"); + r.artifacts.insert("launch".into(), process_artifact(vec!["pycharm", ctx.url, "nosplash", "dontReopenProjects"])); + Some(r) +} + +fn parse_nvidia_workbench(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.vfs_exists(".project/spec.yaml") { return None; } + let mut r = SpecResult::new("nvidia_a_i_workbench"); + r.spec_doc = "https://docs.nvidia.com/ai-workbench/user-guide/latest/projects/spec.html".into(); + r.artifacts.insert("set_project".into(), process_artifact(vec!["nvwb", "open", ctx.url])); + Some(r) +} + +// --- ProjectExtra parsers
(is_extra = true) --- + +fn parse_docker(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("Dockerfile") { return None; } + let mut r = SpecResult::new("docker"); + r.is_extra = true; + r.artifacts.insert("docker_image".into(), Artifact::DockerImage(DockerImage::new(None))); + r.artifacts.insert("docker_runtime".into(), Artifact::DockerRuntime(DockerRuntime { image: DockerImage::new(None) })); + Some(r) +} + +fn parse_pre_committed(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has(".pre-commit-config.yaml") { return None; } + let mut r = SpecResult::new("pre_committed"); + r.is_extra = true; + r.artifacts.insert("precommit".into(), Artifact::PreCommit(PreCommit::default())); + Some(r) +} + +fn parse_licensed(ctx: &ParseCtx) -> Option<SpecResult> { + let lic_file = ctx.basenames.keys().find(|k| { + let ku = k.to_uppercase(); + ku.starts_with("LICENSE") || ku.starts_with("LICENCE") || ku.starts_with("COPYING") + })?.clone(); + + let mut r = SpecResult::new("licensed"); + r.is_extra = true; + + let known: &[(&str, &str, &str)] = &[ + ("GNU GENERAL PUBLIC LICENSE", "GPL-3.0-or-later", "GNU General Public License v3.0 or later"), + ("MIT License", "MIT", "MIT License"), + ("Apache License", "Apache-2.0", "Apache License 2.0"), + ("BSD 3-Clause", "BSD-3-Clause", "BSD 3-Clause License"), + ]; + + let lic = if let Some(text) = ctx.read_text(&lic_file) { + let mut found = License { shortname: "unknown".into(), fullname: "unknown".into(), url: lic_file.clone() }; + for (pattern, short, full) in known { + if text.contains(pattern) { + found = License { + shortname: short.to_string(), + fullname: full.to_string(), + url: format!("https://spdx.org/licenses/{short}.html"), + }; + break; + } + } + found + } else { + License { shortname: "unknown".into(), fullname: "unknown".into(), url: lic_file } + }; + + r.contents.insert("license".into(), Content::License(lic)); + Some(r) +} + +fn parse_python_requirements(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("requirements.txt") { return None; } + let text =
ctx.read_text("requirements.txt")?; + let deps: Vec<String> = text.lines() + .map(str::trim) + .filter(|l| !l.is_empty() && !l.starts_with('#')) + .map(str::to_string) + .collect(); + // an empty requirements file is a Spec, not a Lock + let precision = if !deps.is_empty() && deps.iter().all(|d| d.contains("==")) { Precision::Lock } else { Precision::Spec }; + let mut r = SpecResult::new("python_requirements"); + r.is_extra = true; + r.contents.insert("environment".into(), env(Stack::Pip, precision, deps)); + Some(r) +} + +fn parse_conda_env_file(ctx: &ParseCtx) -> Option<SpecResult> { + let fname = if ctx.has("environment.yaml") { "environment.yaml" } + else if ctx.has("environment.yml") { "environment.yml" } + else { return None; }; + let yaml = ctx.read_yaml(fname)?; + let deps: Vec<String> = yaml.get("dependencies").and_then(|v| v.as_array()) + .map(|a| a.iter().filter_map(|v| v.as_str().map(str::to_string)).collect()) + .unwrap_or_default(); + let channels: Vec<String> = yaml.get("channels").and_then(|v| v.as_array()) + .map(|a| a.iter().filter_map(|v| v.as_str().map(str::to_string)).collect()) + .unwrap_or_default(); + let mut r = SpecResult::new("conda_env_file"); + r.is_extra = true; + r.contents.insert("environment".into(), env_with_channels(Stack::Conda, Precision::Spec, deps, channels)); + r.artifacts.insert("conda_env".into(), conda_env_artifact( + vec!["conda", "env", "create", "-f", fname], + fname, + )); + Some(r) +} + +fn parse_intake_catalog(ctx: &ParseCtx) -> Option<SpecResult> { + let cat_file = ctx.basenames.keys().find(|k| { + let k = k.as_str(); + k == "cat.yaml" || k == "cat.yml" || k == "catalog.yaml" || k == "catalog.yml" + })?.clone(); + + let yaml = ctx.read_yaml(&cat_file)?; + let mut r = SpecResult::new("intake_catalog"); + r.is_extra = true; + + let entries: Vec<String> = if yaml.get("version").and_then(|v| v.as_i64()) == Some(2) { + yaml.get("entries").and_then(|v| v.as_object()).map(|m| m.keys().cloned().collect()).unwrap_or_default() + } else { + yaml.get("sources").and_then(|v| v.as_object()).map(|m| m.keys().cloned().collect()).unwrap_or_default() + }; +
if entries.is_empty() { return None; } + + let sources: Vec<Content> = entries.into_iter().map(|name| Content::IntakeSource(IntakeSource { name })).collect(); + r.contents.insert("intake_source".into(), Content::List(sources)); + Some(r) +} + +fn parse_cited(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has("CITATION.cff") { return None; } + let mut r = SpecResult::new("cited"); + r.is_extra = true; + r.spec_doc = "https://citation-file-format.github.io/".into(); + if let Some(meta_yaml) = ctx.read_yaml("CITATION.cff") { + let meta: HashMap<String, JsVal> = meta_yaml.as_object() + .map(|m| m.iter().map(|(k, v)| (k.clone(), v.clone())).collect()) + .unwrap_or_default(); + r.contents.insert("descriptive_metadata".into(), Content::Citation(Citation { meta })); + } + Some(r) +} + +fn parse_zenodo(ctx: &ParseCtx) -> Option<SpecResult> { + if !ctx.has(".zenodo.json") { return None; } + let mut r = SpecResult::new("zenodo"); + r.is_extra = true; + r.spec_doc = "https://help.zenodo.org/docs/github/describe-software/zenodo-json/".into(); + if let Some(text) = ctx.read_text(".zenodo.json") { + if let Ok(meta) = serde_json::from_str::<HashMap<String, JsVal>>(&text) { + r.contents.insert("descriptive_metadata".into(), Content::Citation(Citation { meta })); + } + } + Some(r) +} + +fn parse_data(ctx: &ParseCtx) -> Option<SpecResult> { + // Only match if there are data files at the root and no non-data sentinels override. + // This is a simplified version — we check for common data extensions.
+ let data_exts = [".csv", ".parquet", ".parq", ".arrow", ".hdf5", ".h5", ".nc", + ".zarr", ".npy", ".npz", ".feather", ".orc", ".avro"]; + let has_data = ctx.basenames.keys().any(|k| data_exts.iter().any(|e| k.ends_with(e))); + + let layout_sentinels = [".zattrs", ".zgroup", "zarr.json", "_metadata"]; + let has_layout = ctx.basenames.keys().any(|k| layout_sentinels.contains(&k.as_str())); + + if !has_data && !has_layout { return None; } + + let non_data_sentinels = ["pyproject.toml", "setup.py", "Cargo.toml", "package.json", + "go.mod", "Dockerfile", "Chart.yaml", "pixi.toml"]; + let has_non_data = ctx.basenames.keys().any(|k| non_data_sentinels.contains(&k.as_str())); + + if has_non_data && !has_data { return None; } + + let mut r = SpecResult::new("data"); + r.is_extra = true; + + // Produce DataResource entries for each data file type found + let mut resources: HashMap<String, Content> = HashMap::new(); + for (basename, full_path) in &ctx.basenames { + let ext = basename.rsplit('.').next().map(|e| format!(".{e}")).unwrap_or_default(); + if data_exts.contains(&ext.as_str()) { + let fmt = ext.trim_start_matches('.'); + resources.insert(basename.clone(), Content::DataResource(DataResource { + path: basename.clone(), + format: fmt.to_string(), + modality: "tabular".to_string(), + layout: "flat".to_string(), + file_count: 1, + total_size: 0, // size not available without std::fs on remote backends + schema: JsVal::Object(Default::default()), + sample_path: full_path.clone(), + metadata: HashMap::new(), + })); + } + } + + if resources.len() == 1 { + let (_, content) = resources.into_iter().next().unwrap(); + r.contents.insert("data_resource".into(), content); + } else if !resources.is_empty() { + r.contents.insert("data_resource".into(), Content::Group(resources)); + } + + Some(r) +} diff --git a/projspec-rs/src/types.rs b/projspec-rs/src/types.rs new file mode 100644 index 0000000..f185f8f --- /dev/null +++ b/projspec-rs/src/types.rs @@ -0,0 +1,65 @@ +/// Enums that mirror Python's
projspec.content.environment.Stack/Precision +/// and projspec.artifact.installable.Architecture. + +use serde::{Deserialize, Serialize}; + +/// Packaging technology for an Environment. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "UPPERCASE")] +pub enum Stack { + Pip, + Conda, + Npm, +} + +impl std::fmt::Display for Stack { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + Stack::Pip => write!(f, "PIP"), + Stack::Conda => write!(f, "CONDA"), + Stack::Npm => write!(f, "NPM"), + } + } +} + +/// How precisely an environment specification is pinned. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "UPPERCASE")] +pub enum Precision { + Spec, + Lock, +} + +impl std::fmt::Display for Precision { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + Precision::Spec => write!(f, "SPEC"), + Precision::Lock => write!(f, "LOCK"), + } + } +} + +/// Target platform / architecture for system-installable packages. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub enum Architecture { + Android, + Ios, + Linux, + Macos, + Web, + Windows, +} + +impl std::fmt::Display for Architecture { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + let s = match self { + Architecture::Android => "android", + Architecture::Ios => "iOS", + Architecture::Linux => "linux", + Architecture::Macos => "macOS", + Architecture::Web => "web", + Architecture::Windows => "windows", + }; + write!(f, "{s}") + } +}
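For `Stack` and `Precision`, the hand-written `Display` impls are meant to agree with the `#[serde(rename_all = "UPPERCASE")]` attribute, so that CLI text output and `--json` output show the same value. A minimal standalone sketch of that invariant (hypothetical check, not part of the generated sources; the enum is re-declared here without serde so the snippet is self-contained):

```rust
// Hypothetical consistency check: Display must emit exactly the UPPERCASE
// variant names that `#[serde(rename_all = "UPPERCASE")]` would produce,
// otherwise text and JSON output of `projspec scan` disagree.
use std::fmt;

#[derive(Debug, Clone, Copy)]
enum Stack {
    Pip,
    Conda,
    Npm,
}

impl fmt::Display for Stack {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Stack::Pip => "PIP",
            Stack::Conda => "CONDA",
            Stack::Npm => "NPM",
        };
        write!(f, "{s}")
    }
}

fn main() {
    // What serde's UPPERCASE rename derives from each Rust variant name.
    let variants = [(Stack::Pip, "Pip"), (Stack::Conda, "Conda"), (Stack::Npm, "Npm")];
    for (stack, variant_name) in variants {
        assert_eq!(stack.to_string(), variant_name.to_uppercase());
    }
    println!("Display matches UPPERCASE serde names");
}
```

A regeneration that adds a new `Stack` or `Precision` variant only has to extend both the enum and the match arm; the same check then guards the new name.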