Add S3-compatible output mode to serve-ltml #37

@rowland

Description

Summary

Add a new serve-ltml output mode that stores each rendered PDF in an Amazon S3-compatible object store and returns the object location instead of streaming the PDF bytes directly in the HTTP response.

The new mode should work with AWS S3 and S3-compatible providers such as MinIO, Cloudflare R2, and Backblaze B2, without disturbing the existing direct-PDF response mode.

Goals

  • Preserve the current default behavior: POST /render returns application/pdf unless object-storage mode is explicitly enabled.
  • Add a configurable object-storage mode that uploads the finished PDF to an S3-compatible bucket.
  • Return a stable machine-readable response describing where the PDF was stored.
  • Cover configuration, credentials, object naming, visibility, expiration/lifetime, cleanup, and failure behavior.
  • Keep request-scoped upload assets and the base-path overlay behavior unchanged.

Non-Goals

  • Replacing the current direct streaming mode.
  • Adding a provider-specific implementation that only works with AWS.
  • Building a background job queue in the first iteration.
  • Managing bucket lifecycle rules automatically via cloud APIs in the first iteration.

Proposed API Shape

Keep POST /render as the main endpoint and introduce an alternate response mode:

  • Default mode: unchanged; response is the rendered PDF stream.
  • Object-storage mode: response is JSON, for example:
{
  "storage": "s3",
  "bucket": "example-bucket",
  "key": "renders/2026/03/28/4f3d.../output.pdf",
  "url": "https://storage.example.com/example-bucket/renders/.../output.pdf",
  "expires_at": "2026-03-29T12:34:56Z"
}

Questions to settle in implementation/design review:

  • Should the mode be enabled globally at process startup, or selectable per request?
  • If per request, should selection use Accept: application/json, a query parameter, or an explicit request field/header?
  • Should the response include a direct object URL, a presigned URL, or only bucket/key metadata?
  • Should the server still support inline PDF streaming when storage upload is configured but a request opts out?

My recommendation: start with a server-level config switch for simplicity, but keep the response schema general enough that per-request selection can be added later without breaking clients.

Configuration Plan

Add object-storage configuration as flags and environment variables, following the existing namsral/flag pattern.

Suggested settings:

  • -output-mode / OUTPUT_MODE
    Values: inline (default), s3
  • -s3-endpoint / S3_ENDPOINT
    Required for most S3-compatible providers; optional for AWS if region-based endpoint resolution is used.
  • -s3-region / S3_REGION
    Required for AWS-style signing.
  • -s3-bucket / S3_BUCKET
    Required when output-mode=s3.
  • -s3-prefix / S3_PREFIX
    Optional object key prefix such as renders/.
  • -s3-path-style / S3_PATH_STYLE
    Boolean for providers that require path-style addressing.
  • -s3-public-base-url / S3_PUBLIC_BASE_URL
    Optional externally reachable base URL to use in the returned location instead of deriving from the SDK endpoint.
  • -s3-presign-ttl / S3_PRESIGN_TTL
    Optional duration; when set, return a presigned GET URL that expires after the given TTL.
  • -s3-server-side-encryption / S3_SERVER_SIDE_ENCRYPTION
    Optional value such as AES256 or provider-specific mode.
  • -s3-storage-class / S3_STORAGE_CLASS
    Optional storage tier.
  • -s3-metadata-*
    Optional future extension; not required in the first pass.

Validation rules:

  • output-mode must be validated at startup.
  • When output-mode=s3, require bucket, region/signing configuration, and whatever endpoint settings are necessary for the chosen provider.
  • Reject invalid combinations such as both s3-public-base-url and s3-presign-ttl if the implementation cannot safely honor both.
  • Log the chosen mode and non-secret storage settings at startup.
  • Never log secrets.
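The validation rules could be sketched roughly as below. The struct shape and error messages are illustrative; the actual flags would be registered via namsral/flag like the existing ones, and treating `s3-public-base-url` and `s3-presign-ttl` as mutually exclusive is only one possible resolution of that open question.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// s3Config holds the proposed settings after flag/env parsing.
// Field names mirror the flags above; this is not the final Config shape.
type s3Config struct {
	OutputMode    string        // -output-mode: "inline" or "s3"
	Endpoint      string        // -s3-endpoint
	Region        string        // -s3-region
	Bucket        string        // -s3-bucket
	PublicBaseURL string        // -s3-public-base-url
	PresignTTL    time.Duration // -s3-presign-ttl
}

// validate enforces the startup rules: inline needs nothing extra, s3 needs
// a bucket plus region and/or endpoint, and conflicting URL settings fail.
func (c s3Config) validate() error {
	switch c.OutputMode {
	case "inline":
		return nil
	case "s3":
		if c.Bucket == "" {
			return errors.New("output-mode=s3 requires -s3-bucket")
		}
		if c.Region == "" && c.Endpoint == "" {
			return errors.New("output-mode=s3 requires -s3-region and/or -s3-endpoint")
		}
		if c.PublicBaseURL != "" && c.PresignTTL > 0 {
			return errors.New("-s3-public-base-url and -s3-presign-ttl are mutually exclusive")
		}
		return nil
	default:
		return fmt.Errorf("invalid output-mode %q (want inline or s3)", c.OutputMode)
	}
}

func main() {
	cfg := s3Config{OutputMode: "s3", Bucket: "example-bucket", Region: "us-east-1"}
	fmt.Println(cfg.validate()) // → <nil>
}
```

Failing fast at startup keeps misconfiguration out of the request path, matching the rule that output-mode must be validated before serving.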

Authorization And Credential Strategy

Use the AWS SDK for Go v2 and rely on its standard credential/provider chain where possible.

Credential sources to support:

  • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).
  • Shared AWS config/credentials files.
  • IAM role / instance profile / ECS/EKS task role when running on AWS.
  • Static credentials for non-AWS S3-compatible providers via the same environment variables.

Implementation notes:

  • Prefer the default credential chain before introducing project-specific secret flags.
  • Avoid adding custom -s3-access-key / -s3-secret-key flags unless there is a strong operational need; flags are easier to leak via process listings and shell history.
  • If explicit credential flags are ever added, document their risks and keep env/config-chain auth as the preferred path.
  • Ensure the issue covers signature compatibility for custom endpoints and path-style addressing.
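For the S3 sink, client construction with the default credential chain and custom endpoint/path-style support might look like the following with AWS SDK for Go v2. This is an untested sketch; `newS3Client` and its parameters are assumptions layered over the flags above.

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// newS3Client builds a client using the SDK's standard credential chain:
// env vars, shared config/credentials files, then IAM/task roles.
func newS3Client(ctx context.Context, region, endpoint string, pathStyle bool) (*s3.Client, error) {
	awsCfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
	if err != nil {
		return nil, err
	}
	return s3.NewFromConfig(awsCfg, func(o *s3.Options) {
		if endpoint != "" {
			o.BaseEndpoint = aws.String(endpoint) // MinIO, R2, B2, etc.
		}
		o.UsePathStyle = pathStyle // some S3-compatible providers require this
	}), nil
}

func main() {
	client, err := newS3Client(context.Background(), "us-east-1", "http://localhost:9000", true)
	fmt.Println(client != nil, err)
}
```

Because credentials are resolved by the default chain, no project-specific secret flags are needed, which is the preferred path above.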

Object Naming And Response Contract

Define a predictable but collision-resistant key layout. Example:

  • <prefix>/<yyyy>/<mm>/<dd>/<request-id>/output.pdf

Requirements:

  • Keys must be unique across concurrent requests.
  • Returned metadata must include at least bucket and key.
  • If a URL is returned, specify whether it is:
    • a durable public URL,
    • a derived internal endpoint URL, or
    • a time-limited presigned URL.
  • Set object Content-Type to application/pdf.
  • Consider setting Content-Disposition metadata if the object will be downloaded by browsers.
  • Include expires_at only when the returned URL or object retention policy actually has a meaningful expiration.

Lifetime Management And Cleanup

This needs explicit design because the current server model is request-scoped and ephemeral, while object storage is durable by default.

Plan:

  • Treat upload to S3 as the final output handoff after the PDF file has been fully rendered locally.
  • Keep the existing request temp-directory cleanup exactly as it is today.
  • Add documentation for expected bucket lifecycle management.
  • Support one of these initial lifetime approaches:
    • No automatic deletion by the server; rely on bucket lifecycle rules configured out of band.
    • Optional presigned URL expiration only; object may outlive the URL.
    • Optional prefix dedicated to ephemeral renders so operators can attach lifecycle expiration rules.

My recommendation for v1:

  • Do not have serve-ltml delete objects itself.
  • Document that operators should attach lifecycle rules to the configured prefix/bucket.
  • Optionally return expires_at only for presigned URLs, not as a promise that the object itself will be deleted then.

This keeps the server stateless and avoids hidden cleanup jobs or partially reliable delete-on-timer behavior.

Failure Handling

Define behavior for each phase:

  • LTML parse/render failure: same 400/500 behavior as today; no object should be created.
  • Upload failure after successful render: return 500 Internal Server Error; do not return partial location metadata.
  • Response serialization failure after successful upload: object may already exist; log enough context to find it.
  • If multipart upload is used for large files in the future, abort failed multipart uploads cleanly.

Implementation Plan

  1. Add a small storage abstraction in cmd/serve-ltml for render outputs.
    • Example: type renderSink interface { Store(ctx context.Context, pdfPath string) (RenderLocation, error) }
    • Provide an inline sink for the current behavior and an S3 sink for object storage.
  2. Extend Config with validated output-mode and S3 settings.
  3. Update startup/config docs in cmd/serve-ltml/README.md.
  4. Refactor the handler/render pipeline so rendering produces a finished temp PDF file before the final delivery step.
    • This mostly matches current behavior already.
  5. Implement an S3 sink using AWS SDK for Go v2.
    • Configure custom endpoint resolution for S3-compatible providers.
    • Support path-style mode.
    • Set Content-Type: application/pdf.
  6. Define the JSON response schema for storage mode.
  7. Decide whether to return raw bucket/key only, bucket/key plus URL, or bucket/key plus optional presigned URL.
  8. Add tests for config validation, key generation, JSON responses, and upload failure paths.
  9. Add an integration-style test seam using a fake uploader rather than requiring live cloud credentials.
  10. Update cmd/render-ltml documentation if remote clients need to understand JSON location responses.

Testing Checklist

  • Config tests for valid and invalid output-mode=s3 combinations.
  • Unit tests for object key generation and prefix handling.
  • Handler test proving default mode still returns application/pdf.
  • Handler test proving S3 mode returns JSON with the expected fields.
  • Handler test proving render failures do not attempt upload.
  • Handler test proving upload failures return 500.
  • Tests verifying request temp directories are still cleaned up in both modes.
  • Tests verifying Content-Type and optional metadata on uploaded objects.
  • Tests for path-style/custom-endpoint configuration using a fake or stubbed uploader.

Documentation Checklist

  • Update cmd/serve-ltml/README.md with the new mode, flags/env vars, and response examples.
  • Document credential sourcing and recommend the AWS default credential chain.
  • Document the distinction between object expiration and presigned URL expiration.
  • Document operator expectations around bucket lifecycle rules and retention.
  • Document any compatibility impact for render-ltml -submit or other clients.

Open Questions

  • Should object-storage mode be process-wide or request-selectable?
  • Should the server return bucket/key only, or also a usable URL?
  • If a usable URL is returned, should it be public or presigned?
  • Should there be a configurable object naming template, or is prefix + generated request ID enough?
  • Do we want to expose extra upload headers such as cache control or content disposition in v1?
  • Should render-ltml -submit eventually grow a mode that prints the returned JSON location instead of expecting PDF bytes?

Acceptance Criteria

  • serve-ltml can be started in an explicit S3 output mode without breaking the existing inline-PDF mode.
  • A successful render in S3 mode uploads exactly one PDF object with the correct content type and a unique key.
  • The HTTP response returns machine-readable location metadata.
  • Credentials are sourced without introducing insecure defaults.
  • The temp-file lifecycle remains request-scoped and cleaned up locally.
  • The retention/lifetime story is clearly documented for operators.
  • go build ./... and go test ./... remain green.
