This repository contains experimental tooling for transforming legacy WMDR 1.0 XML records into a simplified WMDR10 JSON representation and then into a draft WMDR2 JSON representation.
The current WMDR2 output is a facility-centric full record encoded as a GeoJSON-like Feature with WMDR2-specific content in properties. Output files use the .json extension.
- OGC API - Records - Part 1: Core
- WMO-No. 1192 WIGOS Metadata Standard
- WMDR 1.0 schemas
- WMDR2 draft UML aligned to OMS
The current conversion workflow has two active stages.
python convert_wmdr10_xml_to_wmdr10_json.py
or explicitly:
python convert_wmdr10_xml_to_wmdr10_json.py \
--config config.yaml \
--source resources/wmdr10_xml_examples \
--target resources/wmdr10_json_examples
This stage simplifies the XML representation while preserving relevant WMDR content. XML/GML bookkeeping identifiers are stripped from container and descriptive objects, but source identifiers are preserved for referenceable WMDR entities such as deployments, contacts, equipment, or instruments when present.
python convert_wmdr10_json_to_wmdr2_json.py
or explicitly:
python convert_wmdr10_json_to_wmdr2_json.py \
--config config.yaml \
--source resources/wmdr10_json_examples \
--target results/wmdr2_json_examples
This stage writes one .json WMDR2 full record per facility.
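The two stages can also be chained from a small driver script. The sketch below is an illustration, not part of the repository: the names STAGES, build_stage_command, and run_pipeline are hypothetical, and it assumes both converter scripts sit in the current working directory and accept the --config, --source, and --target options shown above.

```python
import subprocess
import sys

# (script, source directory, target directory), copied from the commands above.
STAGES = [
    ("convert_wmdr10_xml_to_wmdr10_json.py",
     "resources/wmdr10_xml_examples", "resources/wmdr10_json_examples"),
    ("convert_wmdr10_json_to_wmdr2_json.py",
     "resources/wmdr10_json_examples", "results/wmdr2_json_examples"),
]


def build_stage_command(script, source, target, config="config.yaml"):
    """Return the argument list for one converter stage (hypothetical helper)."""
    return [sys.executable, script,
            "--config", config, "--source", source, "--target", target]


def run_pipeline(stages=STAGES, config="config.yaml"):
    """Run the stages in order and stop at the first failing stage."""
    for script, source, target in stages:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_stage_command(script, source, target, config),
                       check=True)
```

Calling run_pipeline() from the repository root would run stage 1 and then stage 2; the target directory of the first stage is the source directory of the second.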
Each WMDR2 output file is a facility-centric JSON Feature.
{
"type": "Feature",
"id": "facility:0-20000-0-06725",
"geometry": {
"type": "Point",
"coordinates": [7.8232, 46.4204, 1540]
},
"temporalGeometry": {
"coordinates": [
[7.823197, 46.420453, 1538],
[7.8232, 46.4204, 1540]
],
"dates": ["2000-08-17", "2024-01-17"]
},
"time": {
"interval": ["2000-08-17", "2025-05-28"]
},
"conformsTo": [
"http://wigos.wmo.int/spec/wmdr/2/conf/core"
],
"properties": {
"type": "facility",
"title": "Blatten",
"created": "2025-07-29T00:00:00Z",
"updated": "2025-07-29T00:00:00Z",
"facilitySets": ["facilitySet:gaw"],
"keywords": ["0-20000-0-06725", "Blatten"],
"observations": [],
"deployments": [],
"instruments": []
}
}
- type: always Feature.
- id: facility identifier, usually based on the WIGOS station identifier.
- geometry: current GeoJSON point geometry, derived from the most recent known coordinates.
- temporalGeometry: optional WMDR2 MovingPoint coordinate history. It remains the only temporal object that uses aligned coordinates and dates arrays.
- time: facility lifecycle interval. This uses date resolution only. Unknown bounds are represented with "..".
- conformsTo: declares the WMDR2 core conformance class. The only allowed value is http://wigos.wmo.int/spec/wmdr/2/conf/core. Use http, not https, because this is a stable identifier URI, not primarily a dereferenceable web URL.
- properties: contains the facility, observation, deployment, instrument, schedule, and facility-set references.
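The top-level rules above can be spot-checked without a full schema validator. The following is a minimal sketch, not the repository's validation logic: check_feature is a hypothetical helper, and it assumes that conformsTo must be non-empty and that time, when present, carries a two-element interval.

```python
CORE_CONFORMANCE = "http://wigos.wmo.int/spec/wmdr/2/conf/core"


def check_feature(feature):
    """Return a list of problems found in a WMDR2 facility Feature."""
    problems = []
    if feature.get("type") != "Feature":
        problems.append("type must be 'Feature'")
    if not isinstance(feature.get("id"), str) or not feature["id"]:
        problems.append("id must be a non-empty string")
    # Assumption: at least one entry, and every entry is the core URI (http).
    conforms = feature.get("conformsTo") or []
    if not conforms or any(uri != CORE_CONFORMANCE for uri in conforms):
        problems.append("conformsTo must contain only the core conformance URI")
    # temporalGeometry is optional, but its two arrays must stay aligned.
    moving_point = feature.get("temporalGeometry")
    if moving_point is not None:
        coordinates = moving_point.get("coordinates", [])
        dates = moving_point.get("dates", [])
        if len(coordinates) != len(dates):
            problems.append("temporalGeometry coordinates and dates must align")
    # Assumption: the lifecycle interval has a start and an end bound.
    if "time" in feature and len(feature["time"].get("interval", [])) != 2:
        problems.append("time.interval needs exactly two bounds")
    return problems
```

An empty list means none of these checks failed; it does not replace validation against the JSON Schemas.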
externalIds is not emitted, because it only repeats the feature id.
temporalGeometry is special and remains an aligned-array MovingPoint object:
"temporalGeometry": {
"coordinates": [
[7.823197, 46.420453, 1538],
[7.8232, 46.4204, 1540]
],
"dates": ["2000-08-17", "2024-01-17"]
}
All other temporal* histories use arrays of dated objects. This avoids parallel-array alignment errors and keeps each historical assertion self-contained.
"temporalProgramAffiliation": [
{
"programAffiliation": "GOSGeneral",
"reportingStatus": "operational",
"programSpecificFacilityId": "GOS-06725",
"programSpecificFacilityTitle": "Blatten GOS facility",
"date": "2000-08-17"
},
{
"programAffiliation": "GBON",
"reportingStatus": "operational",
"date": "2022-09-08"
}
]
Examples of the same convention include:
"temporalTerritory": [
{"territory": "CHE", "date": "2000-08-17"}
]
"deployments": [
{
"id": "deployment:abc123",
"temporalObservingSchedule": [
{"observingSchedule": "schedule_daily_12", "date": "2025-01-01"}
]
}
]
Environmental histories are grouped under properties.environment. temporalTopographyBathymetry is not emitted. Its former sub-elements are promoted to first-level environment temporal histories.
"environment": {
"temporalClimateZone": [
{"climateZone": "Cfb", "date": "1980-01-01"}
],
"temporalSurfaceCover": [
{"surfaceCover": "urbanBuiltup", "date": "1981-01-01"}
],
"temporalPopulationDensities": [
{"populationDensity": [100.0, 200.0], "date": "1990-01-01"}
],
"temporalSurfaceRoughness": [
{"surfaceRoughness": "rough", "date": "1991-01-01"}
],
"temporalLocalTopography": [
{"localTopography": "flat", "date": "1970-01-01"}
],
"temporalRelativeElevation": [
{"relativeElevation": "hilltop", "date": "1970-01-01"}
],
"temporalTopographicContext": [
{"topographicContext": "valley", "date": "1970-01-01"}
],
"temporalAltitudeOrDepth": [
{"altitudeOrDepth": 1540, "date": "1970-01-01"}
]
}
A facility record references facility sets with facilitySets:
"facilitySets": ["facilitySet:gaw"]
Facility-set catalogue entries are validated separately by schemas/wmdr2-facility-sets.schema.json:
{
"facilitySets": [
{
"id": "facilitySet:gaw",
"title": "GAW",
"description": "Global Atmosphere Watch facilities."
}
]
}
The singular facilitySet property is obsolete.
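Because the facility record only stores identifiers, a consumer may want to confirm that each reference resolves against the catalogue. The sketch below assumes both documents are already loaded as Python dictionaries; unresolved_facility_sets and uses_obsolete_facility_set are hypothetical helper names, not functions provided by this repository.

```python
def unresolved_facility_sets(record, catalogue):
    """Return facilitySets references that have no entry in the catalogue."""
    known = {entry["id"] for entry in catalogue.get("facilitySets", [])}
    referenced = record.get("properties", {}).get("facilitySets", [])
    return [ref for ref in referenced if ref not in known]


def uses_obsolete_facility_set(record):
    """Return True if the record still carries the singular facilitySet key."""
    return "facilitySet" in record.get("properties", {})
```

A record that references facilitySet:gaw resolves cleanly against the catalogue example above, while a reference to an identifier missing from the catalogue is returned in the list.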
The WMDR2 JSON output stores compact values, not full code-list URLs.
"observedVariable": 12006,
"observedDomain": "atmosphere",
"facilityType": "landFixed",
"wmoRegion": "europe"
Validation against WMO code lists is expected to be handled by a validator that knows which code list applies to each property.
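A validator of that kind needs a mapping from property name to code list. The sketch below shows one possible shape and rests on several assumptions: CODE_LISTS and expand_code are hypothetical names, the mapping lists only the two code-list URLs that this README mentions, and forming an entry URI by appending the compact value to the code-list URL is an assumed convention.

```python
# Assumed property-to-code-list mapping; extend it for other properties.
CODE_LISTS = {
    "observedVariable": "http://codes.wmo.int/wmdr/ObservedVariable",
    "observableGeometry": "http://codes.wmo.int/wmdr/Geometry",
}


def expand_code(prop, value, code_lists=CODE_LISTS):
    """Return the assumed code-list URI for a compact value, or None.

    None means the mapping does not know which code list applies.
    """
    base = code_lists.get(prop)
    if base is None:
        return None
    return f"{base}/{value}"
```

Keeping the mapping outside the records is what allows the JSON output to store compact values such as 12006 or "point".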
Observations contain observation-specific metadata and references to deployments.
{
"id": "observation:12006",
"title": "domain: atmosphere; geometry: point; variable: 12006 Horizontal wind speed at specified distance from reference surface",
"observedVariable": 12006,
"observedDomain": "atmosphere",
"observedGeometryType": "point",
"programAffiliations": ["GAWregional"],
"deployments": ["deployment:abc123"]
}
Observation-level program affiliation is intentionally non-temporal and plural: use programAffiliations: ["GAWregional"]. Do not use the old singular temporal-object form:
"programAffiliation": [
{"programAffiliation": "GAWregional", "date": ".."}
]
Facility-level program affiliation remains temporal under properties.temporalProgramAffiliation, because it can carry reportingStatus, programSpecificFacilityId, and programSpecificFacilityTitle.
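Observation objects written in the old form can be brought to the plural form mechanically. The following is a minimal sketch of such a migration, assuming the old value is a list of objects with a programAffiliation key; normalize_program_affiliations is a hypothetical helper, and the dates of the old entries are dropped because the observation-level property is non-temporal.

```python
def normalize_program_affiliations(observation):
    """Return a copy of the observation that uses plural programAffiliations."""
    result = dict(observation)  # shallow copy; the input is left unchanged
    old_entries = result.pop("programAffiliation", None) or []
    names = list(result.get("programAffiliations", []))
    for entry in old_entries:
        name = entry["programAffiliation"]
        if name not in names:  # keep order, avoid duplicates
            names.append(name)
    if names:
        result["programAffiliations"] = names
    return result
```

Facility-level histories under temporalProgramAffiliation are not touched by this helper.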
Observation reporting uses aligned arrays. Reporting information is sourced from the WMDR1 dataGeneration.reporting block and belongs to the observation, not to the deployment schedule:
"reporting": {
"internationalExchange": [false],
"temporalAggregate": ["P1M"],
"uom": ["DU"],
"dataPolicy": [
{
"dataPolicy": "noLimitation",
"attribution": {
"originator": {
"role": null
}
}
}
],
"levelOfData": ["level1"],
"temporalTimeliness": [
{"timeliness": "PT30M", "date": "1982-03-13"}
]
}
Schedules are first-class reusable objects in the WMDR2 full-record model. They are stored under properties.schedules as JSCalendar / RFC 8984 Event objects with a small WMDR2 extension profile. Observations do not embed schedule objects directly.
The schedule applicability history belongs under deployments[].temporalObservingSchedule, because the deployment is the atomic data-collection unit. Each deployment can use a different schedule, or several deployments can reuse the same schedule uid.
"schedules": [
{
"@type": "Event",
"uid": "schedule_df3ec3dc94b9",
"start": "0001-01-01T00:00:00",
"timeZone": "UTC",
"duration": "P1D",
"recurrenceRules": [
{"@type": "RecurrenceRule", "frequency": "daily"}
],
"wmo.int:aggregation": {
"temporalAggregate": "P1M",
"diurnalBaseTime": "00:00:00"
}
}
],
"deployments": [
{
"id": "deployment:abc123",
"temporalObservingSchedule": [
{"observingSchedule": "schedule_df3ec3dc94b9", "date": "1982-03-13"}
]
}
]
Deployments are referenceable objects. Their id is preserved from the WMDR1 XML source when the source provides a useful deployment identifier.
{
"id": "deployment:abc123",
"observingMethod": "automaticWeatherStation",
"localReferenceSurface": "localGround",
"verticalDistanceFromReferenceSurface": 2.0,
"instrument": ["instrument:def456"],
"serialNumbers": {
"serialNumber": ["S123"],
"dates": ["2020-01-01"]
},
"temporalObservingSchedule": [
{"observingSchedule": "schedule_daily_12", "date": "2025-01-01"}
]
}
Deployment records do not carry title, type, manufacturer, or model properties.
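Since a deployment refers to schedules by uid through temporalObservingSchedule, a consumer has to resolve both the date and the reference. The sketch below illustrates one way to do that. It assumes that each dated object applies from its date until the next entry takes over (this README does not spell out that rule) and that all dates are full YYYY-MM-DD strings, which compare correctly as text. entry_in_effect and schedule_in_effect are hypothetical helper names.

```python
def entry_in_effect(history, on_date):
    """Return the latest dated object whose date is on or before on_date."""
    candidates = [entry for entry in history if entry["date"] <= on_date]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry["date"])


def schedule_in_effect(deployment, schedules, on_date):
    """Resolve the schedule object that applies to a deployment on a date."""
    history = deployment.get("temporalObservingSchedule", [])
    entry = entry_in_effect(history, on_date)
    if entry is None:
        return None
    # Schedules live under properties.schedules and are keyed by uid.
    by_uid = {schedule["uid"]: schedule for schedule in schedules}
    return by_uid.get(entry["observingSchedule"])
```

The same entry_in_effect helper would work for any other temporal* history that follows the dated-object convention, such as temporalTerritory.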
Instruments are reusable catalogue objects. Manufacturer and model are stored here, while serial-number histories remain with deployments. Optional title, description, verticalRange, observableVariables, and observableGeometry properties are part of the schema, but are only emitted when suitable source values are available; WMDR 1.0 records often do not provide all of them. verticalRange is a WMDR2 object with numeric min and max limits. observableVariables is an array of compact values from http://codes.wmo.int/wmdr/ObservedVariable where possible, or free-text descriptions where no code-list value is available. observableGeometry is a compact term from http://codes.wmo.int/wmdr/Geometry.
{
"id": "instrument:def456",
"title": "Weather transmitter",
"description": "Automatic weather instrument.",
"manufacturer": "Vaisala",
"model": "WXT536",
"verticalRange": {
"min": 0,
"max": 30
},
"observableVariables": [12006, "local free-text variable"],
"observableGeometry": "point"
}
Contacts are stored in properties.contacts. A contact may include an id, organization, name, position, emails, phones, links, and roles.
{
"id": "contact:owner:rmi",
"organization": "Royal Meteorological Institute of Belgium",
"roles": ["owner"]
}
Role values should be specific role codes, not URLs to a generic role code list.
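A quick way to catch contacts that still carry URL-style roles is a simple string test. This is only a heuristic sketch: url_like_roles is a hypothetical helper, and treating any value that contains "://" as a URL is an assumption, not a rule taken from the schemas.

```python
def url_like_roles(contact):
    """Return the role values of a contact that look like URLs."""
    return [role for role in contact.get("roles", []) if "://" in role]
```

For the contact shown above the result is an empty list, because "owner" is a plain role code.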
keywords are retained as lightweight discovery text only when configured. If the converter section has no discovery block, the built-in defaults emit facility keywords from identifier and name, and deployment keywords from selected instrument/deployment fields. As soon as a discovery block is present in config.yaml, it is authoritative: omitted buckets and empty lists suppress extraction. For example, this disables keywords completely:
convert_wmdr10_json_to_wmdr2_json:
  source: resources/wmdr10_json_examples
  target: results/wmdr2_json_examples
  discovery:
    facility:
      keywords: []
      links: []
    observation:
      keywords: []
      links: []
    deployment:
      keywords: []
      links: []
To retain the former default facility keywords explicitly, use:
convert_wmdr10_json_to_wmdr2_json:
  discovery:
    facility:
      keywords: [identifier, name]
themes are intentionally not emitted in the current WMDR2 core representation. Controlled-vocabulary values are represented as explicit WMDR2 properties instead.
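The rule that a discovery block is authoritative once present can be summarised in a few lines. The sketch below is an illustration of that rule, not the converter's actual code: keyword_fields and BUILT_IN_KEYWORDS are hypothetical names, the converter section is assumed to be already parsed into a dictionary, and the default deployment fields are left out because this README does not list them.

```python
# Built-in defaults used only when no discovery block exists at all.
# Only the facility defaults are stated in this README.
BUILT_IN_KEYWORDS = {
    "facility": ["identifier", "name"],
}


def keyword_fields(converter_config, bucket, defaults=BUILT_IN_KEYWORDS):
    """Return the source fields used to extract keywords for one bucket."""
    if "discovery" not in converter_config:
        return list(defaults.get(bucket, []))
    # The block is present, so it is authoritative: an omitted bucket or an
    # empty list both suppress extraction.
    discovery = converter_config["discovery"] or {}
    bucket_config = discovery.get(bucket) or {}
    return list(bucket_config.get("keywords") or [])
```

With an empty converter section the facility bucket yields identifier and name; with a discovery block that omits the facility bucket it yields an empty list.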
The JSON Schemas carry human-readable description annotations adapted from WMDR 1.0 xs:documentation for comparable concepts. Examples include deployment vertical distance and local reference surface, equipment manufacturer/model/description, facility environmental context, surface cover, climate zone, programme affiliation, reporting status, population, surface roughness, and facility-set association. New WMDR2-only instrument elements such as verticalRange, observableVariables, and observableGeometry are documented directly in the WMDR2 schema and are optional when no WMDR 1.0 source content exists.
The active schema files should live under schemas/:
schemas/
  wmdr2-common.schema.json
  wmdr2-record-feature.schema.json
  wmdr2-facility-sets.schema.json
Run the schema tests with:
pytest -q tests/test_wmdr2_schemas.py
Run all tests with:
pytest -q
The current WMDR2 workflow no longer uses the previous Records Part 1 GeoJSON conversion path. The following root-level files can be removed if they are not referenced by local branches or pending work:
- convert_wmdr10_json_to_records_part1.py
- settings.geojson
- version.geojson
- root-level wmdr2-common.schema.json
- root-level wmdr2-feature-collection.schema.json
- root-level wmdr2-record-feature.schema.json
Do not remove the active schema files under schemas/.