Plan agent connectivity to the backend API from outside the cluster #206

@vredchenko

Background

The SmartEM Agent runs on Windows EPU workstations alongside microscopes, outside the k8s cluster that hosts smartem-decisions. It needs to reach the backend API to:

  • Read and write acquisition data over REST (CRUD on sessions, gridsquares, foilholes, etc.)
  • Receive ML recommendations via SSE
  • Authenticate with Keycloak using the SmartEM_Agent confidential client (client-credentials grant, per smartem-decisions#284)
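
As a rough sketch of the authentication bullet, a client-credentials token request could be built as below. The Keycloak base URL, realm name and secret are placeholders rather than values from this issue; the token endpoint path is the standard Keycloak OpenID Connect one (older deployments prefix it with /auth).

```python
import urllib.parse
import urllib.request


def build_token_request(keycloak_base: str, realm: str,
                        client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    url = f"{keycloak_base.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    # Supplying data makes urllib issue a POST; the body is form-encoded.
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical values, for illustration only.
req = build_token_request("https://keycloak.example.org", "example-realm",
                          "SmartEM_Agent", "not-a-real-secret")
```

Sending the request (for example with urllib.request.urlopen) should return a JSON document whose access_token field the agent would then present as a Bearer token on REST and SSE calls.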

Current state

  • Development (k3s): the agent reaches the backend via the smartem-http-api-service NodePort 30080 (http://<node-ip>:30080). This works only because the dev cluster is on the same network as the developer machine.
  • Staging / production: no defined story. The frontend k8s manifests landing in feat(k8s): deploy smartem-frontend across dev/staging/production #205 cover browser traffic only: the SPA pod's own nginx reverse-proxies /api/ to smartem-http-api-service from inside the pod, so the existing frontend ingress is not a route the agent can use from outside the cluster.

What needs deciding

A deployment-friendly story for the agent's outside-cluster connectivity to the backend in non-dev environments. Sketch of the option space:

Option A — Separate backend ingress

  • New k8s/environments/{staging,production}/smartem-http-api-ingress.yaml routing a dedicated host (e.g. smartem-api-staging.diamond.ac.uk / smartem-api.diamond.ac.uk) to smartem-http-api-service.
  • Pros: clean separation, agent connects to a stable, well-named host; independent failure domain from the frontend; sizing matches workload (SSE + bulk REST, not browser navigations).
  • Cons: extra TLS cert, extra DNS record, extra ingress rule.
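
To make Option A concrete, here is a minimal sketch that generates such an Ingress as JSON (which kubectl accepts as well as YAML). It assumes the ingress-nginx controller, whose annotations are used for the SSE-related settings; the TLS secret name, the service port 8000 and the one-hour timeouts are hypothetical and would need checking against the real manifests.

```python
import json


def backend_ingress(host: str, tls_secret: str, service_port: int) -> dict:
    """Return a networking.k8s.io/v1 Ingress routing `host` to the backend service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": "smartem-http-api-ingress",
            "annotations": {
                # ingress-nginx specific (assumed controller): keep long-lived
                # SSE streams open and pass events through unbuffered.
                "nginx.ingress.kubernetes.io/proxy-read-timeout": "3600",
                "nginx.ingress.kubernetes.io/proxy-send-timeout": "3600",
                "nginx.ingress.kubernetes.io/proxy-buffering": "off",
            },
        },
        "spec": {
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": "smartem-http-api-service",
                        "port": {"number": service_port},
                    }},
                }]},
            }],
        },
    }


# Hypothetical secret name and port; the host is the staging example above.
manifest = backend_ingress("smartem-api-staging.diamond.ac.uk",
                           "smartem-api-staging-tls", 8000)
print(json.dumps(manifest, indent=2))
```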

Option B — Agent traffic via the frontend ingress

  • Reuse smartem-staging.diamond.ac.uk / smartem.diamond.ac.uk. Either (i) keep the SPA pod's nginx in the path, or (ii) add a second backend rule alongside / so the cluster ingress controller proxies /api/ to smartem-http-api-service directly.
  • Pros: one hostname, one cert, one ingress rule (variant ii also one fewer hop).
  • Cons (variant i): couples agent traffic to the SPA pod's nginx, intertwining failure modes; SPA pod sized for browser traffic, not N concurrent SSE streams. (variant ii): mixes user-facing and machine-facing traffic on the same name; same-origin is irrelevant for the agent (not browser-based).

Option C — LoadBalancer on the backend service

  • Set smartem-http-api-service.type: LoadBalancer in staging/production (or sit a MetalLB / on-prem LB in front).
  • Pros: simple, no ingress controller involved.
  • Cons: on-prem LB scarcity; no TLS termination by default; one LB IP per service.

Option D — Other

For example: a service mesh, a per-microscope tunnel, or routing the agent through a relay. Probably not warranted for the current shape of the workload, but worth a brief mention.

Constraints to factor in

  • Auth: agent uses SmartEM_Agent client-credentials against the DLS Keycloak realm. The backend already accepts tokens with azp: SmartEM_Agent (added to KEYCLOAK_ALLOWED_AZP in feat(k8s): deploy smartem-frontend across dev/staging/production #205). No CORS concerns since the agent isn't a browser.
  • TLS: agent traffic should be TLS-terminated at the ingress in non-dev environments. The agent does not need to live on the DLS internal network if a properly-secured public ingress is exposed.
  • SSE: the agent subscribes to ML recommendations via long-lived SSE streams. Whichever route is chosen must support that (ingress controller timeouts, response buffering off, keep-alives).
  • Scale: multiple agents per facility, each holding at least one SSE connection plus periodic REST traffic.
  • Locality: on-prem at DLS, the agent and cluster will share the DLS network, so the path can be much shorter than going through a public ingress. Worth deciding whether to design for a single deployment shape or two (DLS-internal vs federated facility).
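
To illustrate why the SSE constraint matters on the agent side, the sketch below parses a server-sent event stream line by line. Comment lines (those starting with a colon) are what servers typically send as keep-alives, and an event is only delivered once its terminating blank line arrives, which is why response buffering along the route delays recommendations. The event name and payload are invented for the example and are not the backend's actual schema.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if any data was collected.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            # Comment line, commonly used as a keep-alive; nothing to deliver.
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream contents, for illustration only.
sample = [
    ": keep-alive\n",
    "event: recommendation\n",
    'data: {"gridsquare": "example"}\n',
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(sample))
```

A production agent would additionally honour the id and retry fields and reconnect with backoff when the stream drops, which is where ingress idle timeouts become visible.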

Related

  • smartem-decisions#284 — agent auth strategy (closed: Keycloak client-credentials with SmartEM_Agent)
  • smartem-devtools#205 — frontend k8s deploy (adds the frontend ingress; explicitly defers this agent connectivity story)
  • smartem-devtools#181 — broader k8s modernisation (Gateway API, Ingress, ClusterIP); overlapping scope, this issue is the narrower agent-specific slice
  • smartem-devtools#179 — staging/production manifests vs on-prem reality (where this lands in practice)

Out of scope

Metadata

Labels: devops (CI/CD, deployment, infrastructure, or tooling work), research (investigation, spikes, or proof-of-concept work), smartem-agent (EPU workstation agent for microscope integration)
