From 2f6ab0addfe69a1c1046020dc404b7baef482505 Mon Sep 17 00:00:00 2001 From: Akim Khalitov Date: Fri, 12 Jun 2026 16:52:46 +0400 Subject: [PATCH] add MPF pipeline docs: setup guide and endpoint reference --- SUMMARY.md | 2 + .../operations/mpf-pipeline-api.md | 128 +++++++++++ .../provider-directory-pipeline.md | 200 ++++++++++++++++++ 3 files changed, 330 insertions(+) create mode 100644 docs/api-reference/operations/mpf-pipeline-api.md create mode 100644 docs/run-payerbox/provider-directory-pipeline.md diff --git a/SUMMARY.md b/SUMMARY.md index 353391e..91ffd24 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -9,6 +9,7 @@ * [Run Payerbox](run-payerbox/README.md) * [Architecture](run-payerbox/architecture.md) * [Deploy](run-payerbox/deploy.md) + * [MPF Pipeline](run-payerbox/provider-directory-pipeline.md) * [Maintain](run-payerbox/maintain/README.md) * [Observability](run-payerbox/maintain/observability.md) * [Upgrade](run-payerbox/maintain/upgrade.md) @@ -55,4 +56,5 @@ * [order-select](api-reference/operations/cds-hook-order-select.md) * [order-dispatch](api-reference/operations/cds-hook-order-dispatch.md) * [appointment-book](api-reference/operations/cds-hook-appointment-book.md) + * [MPF Endpoints](api-reference/operations/mpf-pipeline-api.md) * [Releases](releases.md) diff --git a/docs/api-reference/operations/mpf-pipeline-api.md b/docs/api-reference/operations/mpf-pipeline-api.md new file mode 100644 index 0000000..d721249 --- /dev/null +++ b/docs/api-reference/operations/mpf-pipeline-api.md @@ -0,0 +1,128 @@ +--- +description: >- + MPF pipeline endpoint reference: the sync and refresh triggers and the + public directory endpoint. +--- + +# MPF Endpoints + +## Auth + +The trigger endpoints take a Bearer token of a `client_credentials` client listed in `MPF_TRIGGER_CLIENT_IDS`. 
Mint the token from Aidbox: + +```http +POST /auth/token +Content-Type: application/json + +{ "grant_type": "client_credentials", "client_id": "mpf-sync", "client_secret": "" } +``` + +## POST /admin/mpf/sync + +The full run: export, filter, bundle, publish. CMS crawls the registered URLs daily, so schedule a daily run. + +### Body + +All fields are optional. Both only set the publish path, and the export scope stays the one baked into the image. + + + + + + + +
| Field | Description |
|---|---|
| `contract` | Publish path segment. Defaults to the contract baked into the image, which may not match this deployment, so always pass the deployment's own contract. |
| `year` | Publish path segment. Defaults to the current year. CMS reads one URL per contract and year, so during open enrollment run a second sync with next year's value. |
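
The open-enrollment case comes down to two request bodies that differ only in `year`. A minimal sketch, assuming a hypothetical `sync_body` helper (not part of the portal) and the sample contract `H1234` used on this page; each printed body would then be posted to `/admin/mpf/sync` with the Bearer token from the Auth section:

```bash
# sync_body is a hypothetical helper for illustration only.
# $1 = contract, $2 = year; prints the JSON body for POST /admin/mpf/sync.
sync_body() {
  printf '{"contract":"%s","year":%d}\n' "$1" "$2"
}

this_year=$(date +%Y)
next_year=$((this_year + 1))

# Regular daily run: the current plan year.
sync_body H1234 "$this_year"

# Open enrollment only: a second run for the upcoming plan year.
sync_body H1234 "$next_year"
```

Passing `year` explicitly in both calls keeps the two runs symmetrical, even though the first value matches the default.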
+ +### Example + +{% tabs %} +{% tab title="Request" %} +```http +POST /admin/mpf/sync +Authorization: Bearer +Content-Type: application/json + +{ "contract": "H1234", "year": 2026 } +``` +{% endtab %} +{% tab title="Response" %} +```json +{ + "status": "accepted", + "contract": "H1234", + "year": 2026, + "generationTime": "2026-06-12T06:15:00.000Z", + "message": "Sync started: Aidbox $export → poll → pipeline. Watch pod logs for progress." +} +``` +{% endtab %} +{% endtabs %} + +## POST /admin/mpf/refresh + +A debugging shortcut: skips the `$export` (where a full sync spends most of its time) and rebuilds bundles from a previous export. Answers `404` when the folder has no files. + +### Body + + + + + + + + +
| Field | Description |
|---|---|
| `folder` (required) | A previous export's folder in the source bucket. The portal re-signs its files through Aidbox and rebuilds the bundles. |
| `contract` | The same value as in the sync being re-bundled. |
| `year` | The same value as in the sync being re-bundled. |
+ +### Example + +{% tabs %} +{% tab title="Request" %} +```http +POST /admin/mpf/refresh +Authorization: Bearer +Content-Type: application/json + +{ "folder": "20260610_1ebb44dd-5669-4181-a7b2-1b72d116a7eb" } +``` +{% endtab %} +{% tab title="Response" %} +```json +{ + "status": "accepted", + "contract": "H1234", + "year": 2026, + "folder": "20260610_1ebb44dd-5669-4181-a7b2-1b72d116a7eb", + "generationTime": "2026-06-12T10:30:00.000Z", + "message": "Refresh started from export folder (re-sign + re-bundle). Watch pod logs." +} +``` +{% endtab %} +{% endtabs %} + +## GET /mpf-provider-directory/{contract}/{year}/{file} + +Public, no auth: the endpoint CMS crawls. Proxies the storage bucket (which can stay private) and supports conditional GET, so repeat crawls only download files that changed. `index.json` lists the bundle URLs. The resources inside the bundles conform to the Plan-Net profiles. + +```http +GET /mpf-provider-directory/H1234/2026/index.json +GET /mpf-provider-directory/H1234/2026/PractitionerRole-001.json +``` + +| Status | Meaning | +|---|---| +| `200` | File served. | +| `304` | Unchanged since the crawler's last visit (`If-None-Match` / `If-Modified-Since`). | +| `404` | Unknown path, nothing published yet, or a publish in progress. | +| `502` | Signing or storage failed. The body says which: `Failed to obtain signed download URL` (check the access policy and `MPF_EXPORT_CLIENT_*`) or `Upstream storage error` (check the bucket). | +| `503` | `MPF_STORAGE_*` is not set on the portal. 
| + +{% code title="index.json" %} +```json +{ + "provider_urls": [ + "https:///H1234/2026/InsurancePlan-001.json", + "https:///H1234/2026/Organization-001.json" + ] +} +``` +{% endcode %} diff --git a/docs/run-payerbox/provider-directory-pipeline.md b/docs/run-payerbox/provider-directory-pipeline.md new file mode 100644 index 0000000..c398e9d --- /dev/null +++ b/docs/run-payerbox/provider-directory-pipeline.md @@ -0,0 +1,200 @@ +--- +description: >- + Set up and run the Payerbox pipeline that builds a scoped CMS Plan-Net + provider directory from the FHIR engine and publishes it to object storage. +--- + +# MPF Pipeline + +An optional module of the [FHIR App Portal](../fhir-app-portal/README.md), built into its [image](https://hub.docker.com/r/healthsamurai/fhir-app-portal) and enabled with `MPF_ENABLED=true`. It builds a CMS Plan-Net provider directory and publishes it as static FHIR `Bundle` files for the CMS [Medicare Plan Finder](https://www.medicare.gov/plan-compare/) (MPF) crawler. + +Data flow: + +```mermaid +graph LR + T(scheduler):::yellow2 --> S(portal<br>sync endpoint):::violet2 + S --> A(FHIR engine<br>/$export):::blue2 + A --> SRC(source bucket):::green2 + SRC --> B(portal<br>scope filter):::violet2 + B --> C(bundles + index.json):::violet2 + C --> PUB(storage bucket):::green2 + PUB --> E(portal<br>public endpoint):::violet2 + E --> F(CMS crawler):::yellow2 +``` + +All bucket access goes through Aidbox-signed URLs, so neither bucket needs to be public. The pipeline is triggered over HTTP, typically by a daily Kubernetes CronJob. A production-scale run takes upwards of half an hour. Endpoint details live in the [API reference](../api-reference/operations/mpf-pipeline-api.md). + +## Prerequisites + +- Aidbox access (Payerbox's FHIR engine). +- Two buckets: a **source bucket** for `$export` output and a **storage bucket** for the final files. +- Aidbox connected to both buckets. All bucket access goes through it. GCP and Azure use workload identity, AWS an `AwsAccount` resource. [File storage](https://www.health-samurai.io/docs/aidbox/file-storage) in the Aidbox docs covers the setups and IAM roles. [Step 4](#verify-bucket-signing) verifies the setup. + +{% hint style="warning" %} +Every run adds a new folder to the source bucket. Set a lifecycle rule to expire old ones, keeping a few days for `folder` re-bundling. The storage bucket holds only the latest set. +{% endhint %} + +## Set up + +{% stepper %} +{% step %} + +### Create the sync client + +The pipeline authenticates to Aidbox as its own client. `PUT` this (and [step 2](#create-the-access-policy)'s policy) with admin credentials. The secret reappears in [step 3](#configure-the-environment). + +{% code title="PUT /Client/mpf-sync" %} +```json +{ + "resourceType": "Client", + "id": "mpf-sync", + "secret": "", + "grant_types": ["client_credentials"], + "auth": { "client_credentials": { "access_token_expiration": 3600 } } +} +``` +{% endcode %} + +{% endstep %} +{% step %} + +### Create the access policy + +Least privilege: only the calls the portal makes. 
+ +{% code title="PUT /AccessPolicy/mpf-sync-policy" %} +```json +{ + "resourceType": "AccessPolicy", + "id": "mpf-sync-policy", + "engine": "matcho", + "matcho": { + "$one-of": [ + { "client": { "id": "mpf-sync" }, "request-method": "get", "uri": "#^/fhir/\\$export(\\?|$)" }, + { "client": { "id": "mpf-sync" }, "request-method": "get", "uri": "#^/fhir/\\$export-status/" }, + { "client": { "id": "mpf-sync" }, "request-method": "put", "uri": "#^/Notification/" }, + { "client": { "id": "mpf-sync" }, "request-method": "post", "uri": "#^/Notification/[^/]+/\\$send$" }, + { "client": { "id": "mpf-sync" }, "request-method": "post", "uri": "#^/gcp/workload-identity/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "get", "uri": "#^/gcp/workload-identity/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "delete", "uri": "#^/gcp/workload-identity/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "post", "uri": "#^/aws/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "get", "uri": "#^/aws/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "delete", "uri": "#^/aws/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "post", "uri": "#^/azure/workload-identity/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "get", "uri": "#^/azure/workload-identity/storage/" }, + { "client": { "id": "mpf-sync" }, "request-method": "delete", "uri": "#^/azure/workload-identity/storage/" } + ] + } +} +``` +{% endcode %} + +{% endstep %} +{% step %} + +### Configure the environment + +On **Aidbox**, point `$export` at the source bucket: + +| Variable | Description | +|---|---| +| `BOX_FHIR_BULK_STORAGE_PROVIDER` (required) | `gcp`, `aws`, or `azure`. Lets `$export` write to object storage. | +| `BOX_FHIR_BULK_STORAGE_GCP_BUCKET` (required) | The source bucket (setting name is provider-specific, GCP shown). 
| + +On the **portal**: + +| Variable | Description | +|---|---| +| `MPF_ENABLED` (required) | `true` to turn the module on. | +| `MPF_EXPORT_CLIENT_ID`, `MPF_EXPORT_CLIENT_SECRET` (required) | `mpf-sync` and the secret from [step 1](#create-the-sync-client). | +| `MPF_STORAGE_PROVIDER` (required) | Same provider as Aidbox's bulk storage. | +| `MPF_STORAGE_BUCKET` (required) | The bucket the bundles and `index.json` are published to. | +| `MPF_PUBLIC_BASE_URL` (required) | Prefix for the bundle links in `index.json`: the portal's public endpoint (`https:///mpf-provider-directory`) or a public bucket. | +| `MPF_FULL_URL_BASE` (required) | FHIR base URL for bundle entries' `fullUrl`, e.g. `https://fhir./fhir`. | +| `MPF_TRIGGER_CLIENT_IDS` (required) | Clients allowed to trigger runs. Set `admin-api,mpf-sync` (the default lacks `mpf-sync`). | +| `MPF_BUCKET_PREFIX` | Source bucket root URL. Only `folder` refresh uses it. | +| `MPF_ALERT_EMAIL_TO` | Failure-alert recipients via Aidbox `Notification` (needs its email provider configured). Unset: log-only. | +| `MPF_STORAGE_ACCOUNT_ID` | On AWS: the `AwsAccount` resource id. On Azure: the storage account name. Not used on GCP. | +| `MPF_BUNDLE_SIZE` | Max entries per bundle. Default `1000`. | +| `MPF_MAX_BUNDLE_BYTES` | Max bytes per bundle before rolling to a new file. Default 250 MB. | +| `MPF_OUTPUT_DIR` | Local directory where bundles are staged. Default `./mpf-output`. | + +Resource types, profile filters, scope IDs, and the default contract are fixed in the portal image. Changing them is a portal release (coordinate with Health Samurai), or use the [Custom export flow](#custom-export-flow). 
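
Before the first run, it can help to confirm that every variable marked required in the portal table is present in the portal's environment. A minimal sketch in POSIX-compatible shell; `missing_mpf_vars` and `REQUIRED_MPF_VARS` are hypothetical names, not part of the portal image, and the check only tests for non-empty values, not for correct ones:

```bash
# Hypothetical preflight check; the list mirrors the rows marked
# "(required)" in the portal table.
REQUIRED_MPF_VARS="MPF_ENABLED MPF_EXPORT_CLIENT_ID MPF_EXPORT_CLIENT_SECRET MPF_STORAGE_PROVIDER MPF_STORAGE_BUCKET MPF_PUBLIC_BASE_URL MPF_FULL_URL_BASE MPF_TRIGGER_CLIENT_IDS"

# Print the names of required variables that are unset or empty,
# space-separated on one line (an empty line means none are missing).
missing_mpf_vars() {
  missing=""
  for name in $REQUIRED_MPF_VARS; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named by $name
    [ -n "$value" ] || missing="$missing $name"
  done
  printf '%s\n' "${missing# }"
}

missing_mpf_vars
```

Run it in the same environment the portal container receives; optional variables such as `MPF_BUCKET_PREFIX` are deliberately left out of the list.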
+ +{% endstep %} +{% step %} + +### Verify bucket signing + +Prove the signing chain from [Prerequisites](#prerequisites) with one object before running the pipeline: + +{% code title="Signing probe" %} +```bash +# get a token +TOKEN=$(curl -s -X POST https:///auth/token \ + -H 'Content-Type: application/json' \ + -d '{"grant_type":"client_credentials","client_id":"mpf-sync","client_secret":""}' \ + | jq -r .access_token) + +# get a presigned upload URL +URL=$(curl -s -X POST https:///gcp/workload-identity/storage/ \ + -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \ + -d '{"filename":"_probe.json"}' | jq -r .url) + +# put data through it +curl -i -X PUT "$URL" \ + -H 'Content-Type: application/json' -d '{"probe":true}' +``` +{% endcode %} + +On AWS or Azure, the endpoint prefix is `/aws/storage//` or `/azure/workload-identity/storage//`. + +{% endstep %} +{% step %} + +### Run and verify + +Trigger a sync as `mpf-sync` (listed in `MPF_TRIGGER_CLIENT_IDS`, [step 3](#configure-the-environment)): + +{% code title="First run" %} +```bash +TOKEN=$(curl -s -X POST https:///auth/token \ + -H 'Content-Type: application/json' \ + -d '{"grant_type":"client_credentials","client_id":"mpf-sync","client_secret":""}' \ + | jq -r .access_token) + +curl -X POST https:///admin/mpf/sync \ + -H "Authorization: Bearer $TOKEN" \ + -H 'Content-Type: application/json' \ + -d '{"contract":"H1234"}' +``` +{% endcode %} + +The endpoint is asynchronous and the pipeline runs in the background. Verify, in order: + +1. Pod logs: `[mpf:sync] export kicked off`, later `export completed`. +2. Source bucket: a new folder of NDJSON files. +3. Logs: `publishing via signed URLs`. A `403` here means the policy is missing the signing branches from [step 2](#create-the-access-policy). +4. Logs: `run completed` with `uploaded=true`. The storage bucket holds bundles and `index.json`. +5. 
The public endpoint (the URL the crawler will go to) works: `GET https:///mpf-provider-directory/H1234/2026/index.json`. + +{% endstep %} +{% endstepper %} + +## Custom export flow + +The prebuilt pipeline covers the whole path out of the box. A custom flow (different scope, resource types, or post-processing) can reuse the same `$export`, client, policy, and storage setup. A runnable example lives in the [Aidbox examples repository](https://github.com/Aidbox/examples). + +## Related + +{% content-ref url="../api-reference/operations/mpf-pipeline-api.md" %} +[mpf-pipeline-api.md](../api-reference/operations/mpf-pipeline-api.md) +{% endcontent-ref %} + +{% content-ref url="../interop-apis/provider-directory.md" %} +[provider-directory.md](../interop-apis/provider-directory.md) +{% endcontent-ref %} + +{% content-ref url="../compliance/cms-9115.md" %} +[cms-9115.md](../compliance/cms-9115.md) +{% endcontent-ref %}