36 changes: 36 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,42 @@ uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed

- **BREAKING**: JSON parsing migrated from `nlohmann/json` v3.11.3 to
[`Glaze`](https://github.com/stephenberry/glaze) v7.6.0. The public
client API (`CDOClient::get_*`, `DataServiceClient::get_*`,
`parse_csv_data` / `parse_ssv_data`) is unchanged. Internal
`from_json(const nlohmann::json&, T&)` overloads have been replaced
with `deserialize_<T>(std::string_view, T&) -> Result<void>` in the
`ncei::` namespace. The transitional `json_string` / `json_int` /
`json_double` / `json_bool` helpers in `models/common.hpp` are gone
(no external consumers). Benchmark: ~9-15x parse speedup on a
representative 21 KB CDO `/stations` list-response payload
(nlohmann ~360-590 us/op → Glaze ~32-40 us/op on x86_64-v3,
GCC 13.3, -O3 -DNDEBUG).
- C++23 baseline reaffirmed — Glaze requires C++23 for its
compile-time reflection path. `CMakeLists.txt` already enforced this.
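
As a rough check on the figures quoted in the breaking-change entry, the speedup range can be worked out directly. The sketch below only restates the changelog's timings as constants; the function and constant names are illustrative and do not come from the codebase.

```cpp
// Ratio of the old per-parse time to the new per-parse time.
constexpr double speedup(double before_us, double after_us) {
    return before_us / after_us;
}

// Timings quoted in the changelog entry, in microseconds per parse.
constexpr double kNlohmannFastUs = 360.0;
constexpr double kNlohmannSlowUs = 590.0;
constexpr double kGlazeFastUs = 32.0;
constexpr double kGlazeSlowUs = 40.0;

// Measured against the slower Glaze timing, the range is 9.0x to 14.75x,
// which is where a "~9-15x" summary comes from.
static_assert(speedup(kNlohmannFastUs, kGlazeSlowUs) == 9.0, "lower bound");
static_assert(speedup(kNlohmannSlowUs, kGlazeSlowUs) == 14.75, "upper bound");

// Pairing the slowest nlohmann run with the fastest Glaze run gives the
// most optimistic ratio the quoted timings allow.
static_assert(speedup(kNlohmannSlowUs, kGlazeFastUs) == 18.4375, "best case");
```

The quoted range is therefore the conservative reading of the measurements: it divides both nlohmann timings by the slower of the two Glaze timings.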

### Added

- `tests/glaze_test.cpp` — verifies parse-output shape parity with the
pre-migration behavior (null-safety on every scalar field,
unknown-key tolerance, dynamic DataPoint attribute preservation,
CDO list-response envelope walking, snake_case ↔ camelCase
JSON-key aliasing for `datacoverage` / `mindate` / `maxdate` /
`elevationUnit`).
- `tests/parse_benchmark.cpp`: parse-throughput regression guard. Fails
above 200 us/op (≈5-6x the migration-time Glaze figure of ~32-40 us/op),
with a 30 s ctest timeout.
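
A throughput guard of this kind can be sketched with the standard library alone. The block below is a minimal illustration under stated assumptions, not the contents of `tests/parse_benchmark.cpp`: `measure_micros_per_op`, `within_budget`, and `count_open_braces` are hypothetical names, and the stand-in "parser" just counts braces so the example needs no JSON library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string_view>

// Cap mirroring the 200 us/op figure quoted in the changelog entry.
constexpr double kMaxMicrosPerOp = 200.0;

// Stand-in for a real parse call (hypothetical): counts '{' characters.
inline std::size_t count_open_braces(std::string_view payload) {
    std::size_t count = 0;
    for (const char c : payload) {
        if (c == '{') {
            ++count;
        }
    }
    return count;
}

// Runs `parse(payload)` `iterations` times and returns the mean wall-clock
// cost per call in microseconds. Returns 0.0 when `iterations` is zero.
template <typename ParseFn>
double measure_micros_per_op(ParseFn&& parse, std::string_view payload,
                             std::size_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    using Clock = std::chrono::steady_clock;
    volatile std::size_t sink = 0; // keeps the calls from being optimized away
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + parse(payload);
    }
    const Clock::time_point stop = Clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// True when the measured cost is at or under the cap.
constexpr bool within_budget(double micros_per_op, double cap) {
    return micros_per_op <= cap;
}

// Usage check for the budget logic only; it does not depend on timing.
inline void usage_check() {
    assert(within_budget(40.0, kMaxMicrosPerOp));
    assert(!within_budget(250.0, kMaxMicrosPerOp));
}
```

In a real test the result of `measure_micros_per_op` would be passed to `within_budget` and reported through the test framework rather than `assert`, since the benchmark is built with `-DNDEBUG`, which compiles `assert` out.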

### Removed

- `src/core/pagination.cpp` (the nlohmann `from_json` overload for
`ResultSetMetadata`). The CDO envelope is now parsed by a templated
Glaze meta specialization in `src/models/pagination_detail.hpp` (an
internal-only header).

## [0.1.1] - 2026-05-10

### CI
5 changes: 2 additions & 3 deletions CLAUDE.md
Expand Up @@ -18,16 +18,15 @@ make clean # Remove build/
- **C++23**: `std::expected<T, Error>` for all returns, no exceptions
- **Two clients**: CDOClient (token auth, rate limited, paginated) + DataServiceClient (no auth, multi-format)
- **Patterns**: Pimpl (HttpClient, CDOClient, DataServiceClient), non-copyable/movable, `[[nodiscard]]`
- **JSON**: nlohmann/json via FetchContent. Use `json_string()` / `json_int()` / `json_double()` / `json_bool()` helpers from `models/common.hpp`.
- **JSON**: [Glaze](https://github.com/stephenberry/glaze) v7.6.0 via FetchContent (compile-time reflection, ~9-15x parse speedup over nlohmann on the CDO list-response shape — migrated 2026-05-11). Public entry points are the `deserialize_*(std::string_view, T&) -> Result<void>` family in `include/ncei/models/common.hpp`; per-T `glz::meta` specializations live in each model `.cpp`. The `skip_null_members_on_read = true` opt is wired through `ncei::detail::kReadOpts` so CDO's frequent `"datacoverage": null` rows leave the field at its default. Dynamic-key payloads (DataPoint's user-driven TMAX/TMIN/PRCP columns) use `glz::generic` — search for `TODO(glaze):` markers. See `tests/parse_benchmark.cpp` for the regression guard.
- **Tests**: GoogleTest via FetchContent. Fixture files in `tests/fixtures/`.

## Conventions

- Code style: `.clang-format` (LLVM base, tabs, 100 cols)
- Namespace: `ncei`
- **No `auto`**: Use explicit types. `auto` is only acceptable for iterators, structured bindings (`auto& [key, val]`), and range-for loops (`const auto& x : container`).
- All model `from_json` functions use the null-safe helpers, NOT `j.value("key", "")`.
- Models declare `from_json` in headers, implement in `.cpp` files.
- Model structs are declared in `include/ncei/models/...`, with `glz::meta` specializations and the `deserialize_*` implementations in matching `src/models/.../*.cpp` files. The pre-migration `from_json(const nlohmann::json&, T&)` overloads have been removed; downstream consumers use the high-level client methods (`CDOClient::get_*`, `DataServiceClient::get_*`), never these helpers directly.
- Include order: project headers first, then system headers (enforced by clang-format).

## CDO API Notes
13 changes: 8 additions & 5 deletions CMakeLists.txt
Expand Up @@ -70,14 +70,17 @@ endif()
find_package(CURL REQUIRED)

include(FetchContent)

# Glaze: JSON library (compile-time reflection, ~9-15x parse speedup over
# nlohmann on the CDO list-response payload, which is the NCEI hot path).
# License: MIT.
FetchContent_Declare(
json
GIT_REPOSITORY https://github.com/nlohmann/json.git
GIT_TAG v3.11.3
glaze
GIT_REPOSITORY https://github.com/stephenberry/glaze.git
GIT_TAG v7.6.0
GIT_SHALLOW TRUE
)
set(JSON_BuildTests OFF CACHE INTERNAL "")
FetchContent_MakeAvailable(json)
FetchContent_MakeAvailable(glaze)

# Include directories
include_directories(${PROJECT_SOURCE_DIR}/include)
12 changes: 9 additions & 3 deletions CONTRIBUTING.md
Expand Up @@ -51,9 +51,15 @@ below).
`make format` applies it.
- **Includes**: project headers first, then system headers
(enforced by clang-format `SortIncludes`).
- **JSON**: use null-safe helpers from `models/common.hpp`. Do NOT use
`j.value("key", default)` — it throws on JSON null in nlohmann/json
v3.
- **JSON**: Glaze v7.6.0 via FetchContent. Add new model types as
`glz::meta<T>` specializations in the per-model `.cpp` files, then
expose them through `deserialize_*` wrappers in
`include/ncei/models/common.hpp`. Use `ncei::detail::kReadOpts`
(defined in `src/models/common_glaze_detail.hpp`) to inherit the
CDO-friendly defaults (`error_on_unknown_keys = false`,
`skip_null_members_on_read = true`). For dynamic-key payloads
(DataPoint's user-driven attribute columns), use `glz::generic`
and tag the call site with a `// TODO(glaze):` marker.
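
The behavior those defaults are meant to give (aliased JSON keys, nulls leaving a member at its default, unknown keys ignored) can be modeled without Glaze. The sketch below is a standard-library illustration of the semantics only; it is not how `glz::meta` or `kReadOpts` are implemented, and `StationSketch`, `KeyBinding`, `kBindings`, `apply_field`, and the `"METERS"` default are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model with snake_case members, as in the project's structs.
struct StationSketch {
    std::string name;
    std::string min_date;
    std::string max_date;
    std::string elevation_unit{"METERS"}; // illustrative default
};

// One row per accepted JSON key: the wire name and the member it fills.
struct KeyBinding {
    std::string_view json_key;
    std::string StationSketch::* member;
};

// Wire keys differ from member names, which is the aliasing a per-type
// mapping has to express.
constexpr std::array<KeyBinding, 4> kBindings{{
    {"name", &StationSketch::name},
    {"mindate", &StationSketch::min_date},
    {"maxdate", &StationSketch::max_date},
    {"elevationUnit", &StationSketch::elevation_unit},
}};

// Applies one decoded key/value pair; std::nullopt models a JSON null.
// Returns true only when a member was written.
inline bool apply_field(StationSketch& out, std::string_view json_key,
                        const std::optional<std::string>& value) {
    if (!value.has_value()) {
        return false; // null: keep the member's current (default) value
    }
    for (const KeyBinding& binding : kBindings) {
        if (binding.json_key == json_key) {
            out.*(binding.member) = *value;
            return true;
        }
    }
    return false; // unknown key: tolerated and ignored
}

// Usage check covering the three cases described above.
inline void usage_check() {
    StationSketch station;
    assert(apply_field(station, "maxdate", std::string("2024-12-31")));
    assert(station.max_date == "2024-12-31");
    assert(!apply_field(station, "elevationUnit", std::nullopt));
    assert(station.elevation_unit == "METERS");
    assert(!apply_field(station, "someFutureKey", std::string("x")));
}
```

Keeping the key-to-member table in one place per type is the design point: adding a model means adding one table, and the null and unknown-key policy stays shared.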

## PR conventions

2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -222,7 +222,7 @@ make run-data_service_search # Dataset metadata (no auth)
| Library | Purpose | Integration |
|---------|---------|-------------|
| libcurl | HTTP requests | `find_package(CURL)` |
| nlohmann/json | JSON parsing | `FetchContent` |
| Glaze v7.6.0 | JSON parsing (compile-time reflection) | `FetchContent` |
| GoogleTest | Unit testing | `FetchContent` |
| libnetcdf | NetCDF support (optional) | `find_package(netCDF)` |

2 changes: 1 addition & 1 deletion SECURITY.md
Expand Up @@ -48,5 +48,5 @@ You can expect:
- Operational issues (rate-limit handling, network blips) — file a
regular issue.
- Theoretical issues against dependencies — report them upstream
(`openssl`, `libcurl`, `nlohmann/json`, `googletest`). We pin via
(`openssl`, `libcurl`, `Glaze`, `googletest`). We pin via
FetchContent and bump on credible advisories.
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/data.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -13,6 +12,4 @@ struct DataRecord {
double value{0.0};
};

void from_json(const nlohmann::json& j, DataRecord& d);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/data_category.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -10,6 +9,4 @@ struct DataCategory {
std::string name;
};

void from_json(const nlohmann::json& j, DataCategory& d);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/data_type.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -13,6 +12,4 @@ struct DataType {
std::string max_date;
};

void from_json(const nlohmann::json& j, DataType& d);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/dataset.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -13,6 +12,4 @@ struct Dataset {
std::string max_date;
};

void from_json(const nlohmann::json& j, Dataset& d);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/location.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -13,6 +12,4 @@ struct Location {
std::string max_date;
};

void from_json(const nlohmann::json& j, Location& l);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/location_category.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -10,6 +9,4 @@ struct LocationCategory {
std::string name;
};

void from_json(const nlohmann::json& j, LocationCategory& lc);

} // namespace ncei
3 changes: 0 additions & 3 deletions include/ncei/models/cdo/station.hpp
@@ -1,6 +1,5 @@
#pragma once

#include <nlohmann/json_fwd.hpp>
#include <string>

namespace ncei {
Expand All @@ -17,6 +16,4 @@ struct CDOStation {
std::string elevation_unit;
};

void from_json(const nlohmann::json& j, CDOStation& s);

} // namespace ncei
65 changes: 58 additions & 7 deletions include/ncei/models/common.hpp
@@ -1,14 +1,65 @@
#pragma once

#include <cstdint>
#include <nlohmann/json_fwd.hpp>
#include <string>
/// @file common.hpp
/// @brief Common Glaze-deserializer entry points for NCEI model types
///
/// Backed by [Glaze](https://github.com/stephenberry/glaze) for JSON
/// deserialization. The public surface from this header is the
/// `deserialize_*(std::string_view, T&) -> Result<void>` family. The
/// previous `from_json(const nlohmann::json&, T&)` overloads and the
/// transitional `json_string` / `json_int` / `json_double` / `json_bool`
/// helpers have been removed; downstream consumers (`crawler`,
/// `kalshi-trainer`) only use the high-level client methods, never
/// these internal helpers.

#include "ncei/error.hpp"
#include "ncei/models/cdo/data.hpp"
#include "ncei/models/cdo/data_category.hpp"
#include "ncei/models/cdo/data_type.hpp"
#include "ncei/models/cdo/dataset.hpp"
#include "ncei/models/cdo/location.hpp"
#include "ncei/models/cdo/location_category.hpp"
#include "ncei/models/cdo/station.hpp"
#include "ncei/models/data_service/data_point.hpp"
#include "ncei/models/data_service/dataset_metadata.hpp"
#include "ncei/models/data_service/search_result.hpp"
#include "ncei/pagination.hpp"

#include <string_view>
#include <vector>

namespace ncei {

[[nodiscard]] std::string json_string(const nlohmann::json& j, const char* key);
[[nodiscard]] std::int32_t json_int(const nlohmann::json& j, const char* key, std::int32_t def = 0);
[[nodiscard]] double json_double(const nlohmann::json& j, const char* key, double def = 0.0);
[[nodiscard]] bool json_bool(const nlohmann::json& j, const char* key, bool def = false);
// ===== Deserializers (Glaze-backed, return Result<void>) =====
//
// Each function parses a JSON body (string_view, zero-copy where possible)
// into the corresponding struct. On failure returns Error::parse(...).
//
// The CDO list-response family (parse a `{metadata, results}` envelope into
// the ResultSetMetadata + a vector<T>) is exposed via templated overloads
// in pagination.hpp; the single-record deserializers are below.

[[nodiscard]] Result<void> deserialize_dataset(std::string_view body, Dataset& out);
[[nodiscard]] Result<void> deserialize_data_category(std::string_view body, DataCategory& out);
[[nodiscard]] Result<void> deserialize_data_type(std::string_view body, DataType& out);
[[nodiscard]] Result<void> deserialize_location_category(std::string_view body,
LocationCategory& out);
[[nodiscard]] Result<void> deserialize_location(std::string_view body, Location& out);
[[nodiscard]] Result<void> deserialize_station(std::string_view body, CDOStation& out);
[[nodiscard]] Result<void> deserialize_data_record(std::string_view body, DataRecord& out);

[[nodiscard]] Result<void> deserialize_data_point_collection(std::string_view body,
DataPointCollection& out);
[[nodiscard]] Result<void> deserialize_dataset_metadata(std::string_view body,
DatasetMetadata& out);
[[nodiscard]] Result<void> deserialize_data_search_result(std::string_view body,
DataSearchResult& out);
[[nodiscard]] Result<void> deserialize_dataset_search_result(std::string_view body,
DatasetSearchResult& out);

// CDO list-response (envelope { metadata: {...}, results: [...] }) deserializer
// — templated, defined in pagination.hpp where CDOResponse<T> lives.
template <typename T>
[[nodiscard]] Result<void> deserialize_cdo_list(std::string_view body, CDOResponse<T>& out);

} // namespace ncei
4 changes: 0 additions & 4 deletions include/ncei/models/data_service/data_point.hpp
@@ -1,6 +1,5 @@
#pragma once
#include <cstdint>
#include <nlohmann/json_fwd.hpp>
#include <optional>
#include <string>
#include <string_view>
Expand All @@ -27,9 +26,6 @@ struct DataPointCollection {
std::vector<DataPoint> records;
};

void from_json(const nlohmann::json& j, DataPoint& dp);
void from_json(const nlohmann::json& j, DataPointCollection& dpc);

[[nodiscard]] DataPointCollection parse_csv_data(std::string_view csv_text);
[[nodiscard]] DataPointCollection parse_ssv_data(std::string_view ssv_text);

4 changes: 0 additions & 4 deletions include/ncei/models/data_service/dataset_metadata.hpp
@@ -1,5 +1,4 @@
#pragma once
#include <nlohmann/json_fwd.hpp>
#include <string>
#include <vector>

Expand All @@ -19,7 +18,4 @@ struct DatasetMetadata {
std::vector<DatasetField> fields;
};

void from_json(const nlohmann::json& j, DatasetField& f);
void from_json(const nlohmann::json& j, DatasetMetadata& m);

} // namespace ncei
5 changes: 0 additions & 5 deletions include/ncei/models/data_service/search_result.hpp
@@ -1,7 +1,5 @@
#pragma once
#include <nlohmann/json_fwd.hpp>
#include <string>
#include <utility>
#include <vector>

namespace ncei {
Expand All @@ -26,7 +24,4 @@ struct DatasetSearchResult {
std::vector<std::string> data_types;
};

void from_json(const nlohmann::json& j, DataSearchResult& r);
void from_json(const nlohmann::json& j, DatasetSearchResult& r);

} // namespace ncei
6 changes: 3 additions & 3 deletions include/ncei/pagination.hpp
@@ -1,6 +1,8 @@
#pragma once
#include "ncei/error.hpp"

#include <cstdint>
#include <nlohmann/json_fwd.hpp>
#include <string_view>
#include <vector>

namespace ncei {
Expand All @@ -11,8 +13,6 @@ struct ResultSetMetadata {
std::int32_t limit{25};
};

void from_json(const nlohmann::json& j, ResultSetMetadata& m);

template <typename T>
struct CDOResponse {
ResultSetMetadata metadata;
9 changes: 7 additions & 2 deletions src/CMakeLists.txt
@@ -1,16 +1,20 @@
# Core library (error, rate limiting, retry, date_range, csv_parser)
#
# Glaze is header-only and pulled in via include-path only (matching how
# nlohmann was integrated pre-migration). This sidesteps CMake's "INTERFACE
# target not in export set" error that fires if we link glaze::glaze
# directly onto these installable static libs.
add_library(ncei_core STATIC
core/error.cpp
core/rate_limit.cpp
core/retry.cpp
core/date_range.cpp
core/csv_parser.cpp
core/pagination.cpp
)
target_compile_features(ncei_core PUBLIC cxx_std_23)
target_include_directories(ncei_core PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
$<BUILD_INTERFACE:${json_SOURCE_DIR}/include>
$<BUILD_INTERFACE:${glaze_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)

Expand Down Expand Up @@ -41,6 +45,7 @@ add_library(ncei_models STATIC
target_link_libraries(ncei_models PUBLIC ncei_core)
target_include_directories(ncei_models PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
$<BUILD_INTERFACE:${glaze_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)
