Add S3-compatible output mode to serve-ltml #37

@rowland

Description

Summary

Add a new serve-ltml output mode that stores each rendered PDF in an Amazon S3-compatible object store and returns the object location instead of streaming the PDF bytes directly in the HTTP response.

The new mode should work with AWS S3 and S3-compatible providers such as MinIO, Cloudflare R2, and Backblaze B2, without disturbing the existing direct-PDF response mode.

Goals

  • Preserve the current default behavior: POST /render returns application/pdf unless object-storage mode is explicitly enabled.
  • Add a configurable object-storage mode that uploads the finished PDF to an S3-compatible bucket.
  • Return a stable machine-readable response describing where the PDF was stored.
  • Cover configuration, credentials, object naming, visibility, expiration/lifetime, cleanup, and failure behavior.
  • Keep request-scoped upload assets and the base-path overlay behavior unchanged.

Non-Goals

  • Replacing the current direct streaming mode.
  • Adding a provider-specific implementation that only works with AWS.
  • Building a background job queue in the first iteration.
  • Managing bucket lifecycle rules automatically via cloud APIs in the first iteration.

Proposed API Shape

Keep POST /render as the main endpoint and introduce an alternate response mode:

  • Default mode: unchanged; response is the rendered PDF stream.
  • Object-storage mode: response is JSON, for example:
{
  "storage": "s3",
  "bucket": "example-bucket",
  "key": "renders/2026/03/28/4f3d.../output.pdf",
  "url": "https://storage.example.com/example-bucket/renders/.../output.pdf",
  "expires_at": "2026-03-29T12:34:56Z"
}

Questions to settle in implementation/design review:

  • Should the mode be enabled globally at process startup, or selectable per request?
  • If per request, should selection use Accept: application/json, a query parameter, or an explicit request field/header?
  • Should the response include a direct object URL, a presigned URL, or only bucket/key metadata?
  • Should the server still support inline PDF streaming when storage upload is configured but a request opts out?

My recommendation: start with a server-level config switch for simplicity, but keep the response schema general enough that per-request selection can be added later without breaking clients.

Configuration Plan

Add object-storage configuration as flags and environment variables, following the existing namsral/flag pattern.

Suggested settings:

  • -output-mode / OUTPUT_MODE
    Values: inline (default), s3
  • -s3-endpoint / S3_ENDPOINT
    Required for most S3-compatible providers; optional for AWS if region-based endpoint resolution is used.
  • -s3-region / S3_REGION
    Required for AWS-style signing.
  • -s3-bucket / S3_BUCKET
    Required when output-mode=s3.
  • -s3-prefix / S3_PREFIX
    Optional object key prefix such as renders/.
  • -s3-path-style / S3_PATH_STYLE
    Boolean for providers that require path-style addressing.
  • -s3-public-base-url / S3_PUBLIC_BASE_URL
    Optional externally reachable base URL to use in the returned location instead of deriving from the SDK endpoint.
  • -s3-presign-ttl / S3_PRESIGN_TTL
    Optional duration; when set, return a presigned GET URL that expires after the given TTL.
  • -s3-server-side-encryption / S3_SERVER_SIDE_ENCRYPTION
    Optional value such as AES256 or provider-specific mode.
  • -s3-storage-class / S3_STORAGE_CLASS
    Optional storage tier.
  • -s3-metadata-*
    Optional future extension; not required in the first pass.

Validation rules:

  • output-mode must be validated at startup.
  • When output-mode=s3, require bucket, region/signing configuration, and whatever endpoint settings are necessary for the chosen provider.
  • Reject invalid combinations such as both s3-public-base-url and s3-presign-ttl if the implementation cannot safely honor both.
  • Log the chosen mode and non-secret storage settings at startup.
  • Never log secrets.
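The validation rules could be sketched roughly as below. The struct shape and error messages are illustrative; the actual flags would be registered via namsral/flag like the existing ones, and treating `s3-public-base-url` and `s3-presign-ttl` as mutually exclusive is only one possible resolution of that open question.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// s3Config holds the proposed settings after flag/env parsing.
// Field names mirror the flags above; this is not the final Config shape.
type s3Config struct {
	OutputMode    string        // -output-mode: "inline" or "s3"
	Endpoint      string        // -s3-endpoint
	Region        string        // -s3-region
	Bucket        string        // -s3-bucket
	PublicBaseURL string        // -s3-public-base-url
	PresignTTL    time.Duration // -s3-presign-ttl
}

// validate enforces the startup rules: inline needs nothing extra, s3 needs
// a bucket plus region and/or endpoint, and conflicting URL settings fail.
func (c s3Config) validate() error {
	switch c.OutputMode {
	case "inline":
		return nil
	case "s3":
		if c.Bucket == "" {
			return errors.New("output-mode=s3 requires -s3-bucket")
		}
		if c.Region == "" && c.Endpoint == "" {
			return errors.New("output-mode=s3 requires -s3-region and/or -s3-endpoint")
		}
		if c.PublicBaseURL != "" && c.PresignTTL > 0 {
			return errors.New("-s3-public-base-url and -s3-presign-ttl are mutually exclusive")
		}
		return nil
	default:
		return fmt.Errorf("invalid output-mode %q (want inline or s3)", c.OutputMode)
	}
}

func main() {
	cfg := s3Config{OutputMode: "s3", Bucket: "example-bucket", Region: "us-east-1"}
	fmt.Println(cfg.validate()) // → <nil>
}
```

Failing fast at startup keeps misconfiguration out of the request path, matching the rule that output-mode must be validated before serving.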

Authorization And Credential Strategy

Use the AWS SDK for Go v2 and rely on its standard credential/provider chain where possible.

Credential sources to support:

  • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).
  • Shared AWS config/credentials files.
  • IAM role / instance profile / ECS/EKS task role when running on AWS.
  • Static credentials for non-AWS S3-compatible providers via the same environment variables.

Implementation notes:

  • Prefer the default credential chain before introducing project-specific secret flags.
  • Avoid adding custom -s3-access-key / -s3-secret-key flags unless there is a strong operational need; flags are easier to leak via process listings and shell history.
  • If explicit credential flags are ever added, document their risks and keep env/config-chain auth as the preferred path.
  • Ensure the issue covers signature compatibility for custom endpoints and path-style addressing.
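For the S3 sink, client construction with the default credential chain and custom endpoint/path-style support might look like the following with AWS SDK for Go v2. This is an untested sketch; `newS3Client` and its parameters are assumptions layered over the flags above.

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// newS3Client builds a client using the SDK's standard credential chain:
// env vars, shared config/credentials files, then IAM/task roles.
func newS3Client(ctx context.Context, region, endpoint string, pathStyle bool) (*s3.Client, error) {
	awsCfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
	if err != nil {
		return nil, err
	}
	return s3.NewFromConfig(awsCfg, func(o *s3.Options) {
		if endpoint != "" {
			o.BaseEndpoint = aws.String(endpoint) // MinIO, R2, B2, etc.
		}
		o.UsePathStyle = pathStyle // some S3-compatible providers require this
	}), nil
}

func main() {
	client, err := newS3Client(context.Background(), "us-east-1", "http://localhost:9000", true)
	fmt.Println(client != nil, err)
}
```

Because credentials are resolved by the default chain, no project-specific secret flags are needed, which is the preferred path above.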

Object Naming And Response Contract

Define a predictable but collision-resistant key layout. Example:

  • <prefix>/<yyyy>/<mm>/<dd>/<request-id>/output.pdf

Requirements:

  • Keys must be unique across concurrent requests.
  • Returned metadata must include at least bucket and key.
  • If a URL is returned, specify whether it is:
    • a durable public URL,
    • a derived internal endpoint URL, or
    • a time-limited presigned URL.
  • Set object Content-Type to application/pdf.
  • Consider setting Content-Disposition metadata if the object will be downloaded by browsers.
  • Include expires_at only when the returned URL or object retention policy actually has a meaningful expiration.

Lifetime Management And Cleanup

This needs explicit design because the current server model is request-scoped and ephemeral, while object storage is durable by default.

Plan:

  • Treat upload to S3 as the final output handoff after the PDF file has been fully rendered locally.
  • Keep the existing request temp-directory cleanup exactly as it is today.
  • Add documentation for expected bucket lifecycle management.
  • Support one of these initial lifetime approaches:
    • No automatic deletion by the server; rely on bucket lifecycle rules configured out of band.
    • Optional presigned URL expiration only; object may outlive the URL.
    • Optional prefix dedicated to ephemeral renders so operators can attach lifecycle expiration rules.

My recommendation for v1:

  • Do not have serve-ltml delete objects itself.
  • Document that operators should attach lifecycle rules to the configured prefix/bucket.
  • Optionally return expires_at only for presigned URLs, not as a promise that the object itself will be deleted then.

This keeps the server stateless and avoids hidden cleanup jobs or partially reliable delete-on-timer behavior.

Failure Handling

Define behavior for each phase:

  • LTML parse/render failure: same 400/500 behavior as today; no object should be created.
  • Upload failure after successful render: return 500 Internal Server Error; do not return partial location metadata.
  • Response serialization failure after successful upload: object may already exist; log enough context to find it.
  • If multipart upload is used for large files in the future, abort failed multipart uploads cleanly.

Implementation Plan

  1. Add a small storage abstraction in cmd/serve-ltml for render outputs.
    • Example: type renderSink interface { Store(ctx context.Context, pdfPath string) (RenderLocation, error) }
    • Provide an inline sink for the current behavior and an S3 sink for object storage.
  2. Extend Config with validated output-mode and S3 settings.
  3. Update startup/config docs in cmd/serve-ltml/README.md.
  4. Refactor the handler/render pipeline so rendering produces a finished temp PDF file before the final delivery step.
    • This mostly matches current behavior already.
  5. Implement an S3 sink using AWS SDK for Go v2.
    • Configure custom endpoint resolution for S3-compatible providers.
    • Support path-style mode.
    • Set Content-Type: application/pdf.
  6. Define the JSON response schema for storage mode.
  7. Decide whether to return raw bucket/key only, bucket/key plus URL, or bucket/key plus optional presigned URL.
  8. Add tests for config validation, key generation, JSON responses, and upload failure paths.
  9. Add an integration-style test seam using a fake uploader rather than requiring live cloud credentials.
  10. Update cmd/render-ltml documentation if remote clients need to understand JSON location responses.

Testing Checklist

  • Config tests for valid and invalid output-mode=s3 combinations.
  • Unit tests for object key generation and prefix handling.
  • Handler test proving default mode still returns application/pdf.
  • Handler test proving S3 mode returns JSON with the expected fields.
  • Handler test proving render failures do not attempt upload.
  • Handler test proving upload failures return 500.
  • Tests verifying request temp directories are still cleaned up in both modes.
  • Tests verifying Content-Type and optional metadata on uploaded objects.
  • Tests for path-style/custom-endpoint configuration using a fake or stubbed uploader.

Documentation Checklist

  • Update cmd/serve-ltml/README.md with the new mode, flags/env vars, and response examples.
  • Document credential sourcing and recommend the AWS default credential chain.
  • Document the distinction between object expiration and presigned URL expiration.
  • Document operator expectations around bucket lifecycle rules and retention.
  • Document any compatibility impact for render-ltml -submit or other clients.

Open Questions

  • Should object-storage mode be process-wide or request-selectable?
  • Should the server return bucket/key only, or also a usable URL?
  • If a usable URL is returned, should it be public or presigned?
  • Should there be a configurable object naming template, or is prefix + generated request ID enough?
  • Do we want to expose extra upload headers such as cache control or content disposition in v1?
  • Should render-ltml -submit eventually grow a mode that prints the returned JSON location instead of expecting PDF bytes?

Acceptance Criteria

  • serve-ltml can be started in an explicit S3 output mode without breaking the existing inline-PDF mode.
  • A successful render in S3 mode uploads exactly one PDF object with the correct content type and a unique key.
  • The HTTP response returns machine-readable location metadata.
  • Credentials are sourced without introducing insecure defaults.
  • The temp-file lifecycle remains request-scoped and cleaned up locally.
  • The retention/lifetime story is clearly documented for operators.
  • go build ./... and go test ./... remain green.
