19 changes: 17 additions & 2 deletions SETUP.md
@@ -173,13 +173,27 @@ curl "${curlArgs[@]}" -XGET http://localhost:8000/v1/databases/d3/tables/t1

#### Update a Table

The PUT request requires two values from a prior GET response:

- **`baseTableVersion`** — use the `tableVersion` field from GET (after the first update this becomes a metadata file path, not `"INITIAL_VERSION"`)
- **`tableProperties`** — must include all `openhouse.*` properties from the GET response merged with any user-defined properties; omitting them causes a 500 in the server's cross-cluster eligibility check

First GET the current state:

```
curl "${curlArgs[@]}" -XGET http://localhost:8000/v1/databases/d3/tables/t1
```
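
The two values can be pulled out of the GET response and merged in a script before issuing the PUT. A minimal sketch using `jq` (assumed installed), run against a canned response with illustrative values rather than real server output:

```shell
# Canned GET response (illustrative values, not real server output).
get_resp='{
  "tableId": "t1",
  "databaseId": "d3",
  "tableVersion": "d3/t1/metadata/v2.metadata.json",
  "tableProperties": {
    "openhouse.tableId": "t1",
    "openhouse.databaseId": "d3"
  }
}'

# baseTableVersion for the PUT is the tableVersion field from GET.
base_version=$(echo "$get_resp" | jq -r '.tableVersion')

# tableProperties must carry every openhouse.* key from GET plus any
# user-defined keys; dropping the openhouse.* keys causes the 500.
merged_props=$(echo "$get_resp" | jq -c '.tableProperties + {"key": "value"}')

echo "baseTableVersion: $base_version"
echo "tableProperties:  $merged_props"
```

The resulting values drop into the `baseTableVersion` and `tableProperties` fields of the PUT body.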

Then PUT with the returned `tableVersion` and `tableProperties`:

```
curl "${curlArgs[@]}" -XPUT http://localhost:8000/v1/databases/d3/tables/t1 \
--data-raw '{
  "tableId": "t1",
  "databaseId": "d3",
  "baseTableVersion":<fill in previous version>
  "clusterId": "<fill in cluster id>",
  "clusterId": "<clusterId from GET response>",
  "tableType": "PRIMARY_TABLE",
  "baseTableVersion": "<tableVersion from GET response>",
  "schema": "{\"type\": \"struct\", \"fields\": [{\"id\": 1,\"required\": true,\"name\": \"id\",\"type\": \"string\"},{\"id\": 2,\"required\": true,\"name\": \"name\",\"type\": \"string\"},{\"id\": 3,\"required\": true,\"name\": \"ts\",\"type\": \"timestamp\"}, {\"id\": 4,\"required\": true,\"name\": \"country\",\"type\": \"string\"}]}",
  "timePartitioning": {
    "columnName": "ts",
@@ -191,6 +205,7 @@ curl "${curlArgs[@]}" -XPUT http://localhost:8000/v1/databases/d3/tables/t1 \
    }
  ],
  "tableProperties": {
    "<copy all key/value pairs from tableProperties in GET response, including openhouse.* keys>": "...",
    "key": "value"
  }
}'
54 changes: 48 additions & 6 deletions buildSrc/src/main/groovy/openhouse.springboot-conventions.gradle
@@ -13,18 +13,60 @@ ext {
springLog4jVersion = '2.3.4.RELEASE'
}

configurations {
  // Excluding these libraries avoids competing implementations for LoggerFactory
  // Standardizing on slf4j + log4j2 as implementation.
  all*.exclude module : 'spring-boot-starter-logging'
  all*.exclude module : 'logback-classic'
def configureBootLoggingUnification = {
  configurations.configureEach {
    // Excluding these libraries avoids competing implementations for LoggerFactory
    // in boot applications where we standardize on slf4j + log4j2.
    exclude module: 'spring-boot-starter-logging'
    exclude module: 'logback-classic'
    // Exclude Log4j 1.x and its SLF4J bridge so Hadoop transitive deps don't introduce
    // a competing SLF4J binding alongside log4j-slf4j2-impl. log4j-1.2-api below provides
    // the Log4j 1.x API compatibility layer that routes to Log4j2 instead.
    exclude group: 'org.slf4j', module: 'slf4j-log4j12'
    exclude group: 'log4j', module: 'log4j'
    // Exclude the SLF4J 1.x binding for log4j2. log4j-1.2-api:2.25.3 transitively pulls in
    // log4j-core:2.25.3 which requires SLF4J 2.x (slf4j-api:2.0.x). log4j-slf4j-impl (1.x
    // binding) has no SLF4JServiceProvider and causes SLF4J 2.x to fall back to Hadoop's
    // slf4j-log4j12 at runtime, routing all logging through log4j 1.x instead of log4j2.
    // log4j-slf4j2-impl below provides the correct SLF4J 2.x binding.
    exclude group: 'org.apache.logging.log4j', module: 'log4j-slf4j-impl'
  }

  dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-log4j2:' + springLog4jVersion
    // Bridge Log4j 1.x API calls (from Hadoop) to Log4j2, completing the logging unification.
    // With slf4j-log4j12 and log4j:log4j excluded above, this is the sole provider of the
    // Log4j 1.x API and routes all calls through Log4j2, making them visible to Spring
    // Boot Actuator's /actuator/loggers endpoint.
    implementation 'org.apache.logging.log4j:log4j-1.2-api:2.25.3'
    // SLF4J 2.x binding for log4j2. log4j-1.2-api:2.25.3 transitively brings in
    // log4j-core:2.25.3, which pulls in slf4j-api:2.0.x. The old log4j-slf4j-impl
    // artifact only supports SLF4J 1.x (StaticLoggerBinder). log4j-slf4j2-impl provides
    // the SLF4J 2.x ServiceProvider so SLF4J routes to log4j2 instead of falling back
    // to Hadoop's slf4j-log4j12 at runtime.
    implementation 'org.apache.logging.log4j:log4j-slf4j2-impl:2.25.3'
  }
}

// Libraries that consume this convention should not dictate transitive logging bindings
// for downstream consumers, so only apply logging unification when boot plugin is present.
pluginManager.withPlugin('org.springframework.boot') {
  configureBootLoggingUnification()
}

// Library modules that use this convention still run Spring-based tests and can pull
// Hadoop's legacy SLF4J binding transitively. Keep test classpaths deterministic
// without exporting logging exclusions to downstream consumers.
configurations.matching { cfg ->
  ['testRuntimeClasspath', 'testCompileClasspath', 'testImplementation', 'testRuntimeOnly'].contains(cfg.name)
}.configureEach {
  exclude group: 'org.slf4j', module: 'slf4j-log4j12'
}

dependencies {
  api 'io.micrometer:micrometer-registry-prometheus:1.12.3'
  api 'org.springframework.boot:spring-boot-starter-web:' + springVersion

  implementation 'org.springframework.boot:spring-boot-starter-log4j2:' + springLog4jVersion
  api 'org.springframework.boot:spring-boot-starter-actuator:2.7.8'
  api 'org.springframework.boot:spring-boot-starter-validation:' + springVersion
  annotationProcessor 'org.springframework.boot:spring-boot-configuration-processor:' + springVersion
@@ -21,6 +21,11 @@ dependencies {
exclude group: 'org.ow2.asm'
exclude group: 'org.xerial'
exclude group: 'javax'
// Keep fixtures logging-neutral so consuming apps own their logging stack.
exclude group: 'org.springframework.boot', module: 'spring-boot-starter-log4j2'
exclude group: 'org.apache.logging.log4j'
exclude group: 'org.slf4j'
exclude group: 'log4j'
}

compileOnly 'org.springframework.boot:spring-boot-starter-tomcat:' + spring_web_version
120 changes: 120 additions & 0 deletions scripts/enable-hdfs-debug.sh
@@ -0,0 +1,120 @@
#!/usr/bin/env bash
#
# enable-hdfs-debug.sh — Enable HDFS/Iceberg debug logging on a local OpenHouse cluster.
#
# Targets the tables-service running via:
#   ./gradlew dockerUp -Precipe=oh-hadoop
#
# Sets DEBUG on the key Hadoop, HDFS, and Iceberg loggers via the Spring Boot
# Actuator /actuator/loggers endpoint at runtime. No restart required.
#
# Usage:
#   ./scripts/enable-hdfs-debug.sh          # Enable DEBUG logging
#   ./scripts/enable-hdfs-debug.sh --undo   # Reset to INFO
#
# To target a non-default host/port:
#   ./scripts/enable-hdfs-debug.sh --host localhost --port 8000
#
# To set a specific logger manually:
#   curl -X POST http://localhost:8000/actuator/loggers/org.apache.hadoop.hdfs.DFSClient \
#     -H 'Content-Type: application/json' \
#     -d '{"configuredLevel": "DEBUG"}'
#
# To check the current effective level of a logger:
#   curl -s http://localhost:8000/actuator/loggers/org.apache.hadoop.hdfs.DFSClient \
#     | python3 -m json.tool
#
# To reset a logger to its inherited level:
#   curl -X POST http://localhost:8000/actuator/loggers/org.apache.hadoop.hdfs.DFSClient \
#     -H 'Content-Type: application/json' \
#     -d '{"configuredLevel": null}'
#
set -euo pipefail

HOST="localhost"
PORT="8000"
UNDO=false

LOGGERS=(
  # Filesystem layer — logs on FileSystem init and mount table refresh
  "org.apache.hadoop.fs"
  # NameNode HA proxy provider — covers all variants (IPFailover, ObserverRead, RequestHedging)
  "org.apache.hadoop.hdfs.server.namenode.ha"
  # IPC/RPC layer — connection establishment and retry events
  "org.apache.hadoop.ipc.Client"
  "org.apache.hadoop.io.retry.RetryInvocationHandler"
  # HDFS client data path — silent on healthy ops; verbose on block errors/retries
  "org.apache.hadoop.hdfs.DFSClient"
  "org.apache.hadoop.hdfs.DFSInputStream"
  "org.apache.hadoop.hdfs.DFSOutputStream"
  "org.apache.hadoop.hdfs.DataStreamer"
  # Iceberg metadata operations — table refresh, commit, CAS
  "org.apache.iceberg"
)

usage() {
  cat <<'EOF'
Usage: enable-hdfs-debug.sh [OPTIONS]

Enable or disable HDFS/Iceberg debug logging on a local OpenHouse cluster.

Options:
  --host HOST   tables-service host (default: localhost)
  --port PORT   tables-service port (default: 8000)
  --undo        Reset loggers to inherited level
  -h, --help    Show this help
EOF
  exit 0
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --host) HOST="$2"; shift 2 ;;
    --port) PORT="$2"; shift 2 ;;
    --undo) UNDO=true; shift ;;
    -h|--help) usage ;;
    *) echo "Unknown option: $1"; usage ;;
  esac
done

BASE_URL="http://${HOST}:${PORT}"

if ! curl -sf "${BASE_URL}/actuator/health" &>/dev/null; then
  echo "ERROR: tables-service not responding at ${BASE_URL}."
  echo "       Start the cluster: ./gradlew dockerUp -Precipe=oh-hadoop"
  exit 1
fi

if $UNDO; then
  echo "Resetting HDFS debug loggers at ${BASE_URL} ..."
  for logger in "${LOGGERS[@]}"; do
    rc=$(curl -s -o /dev/null -w '%{http_code}' \
      -X POST "${BASE_URL}/actuator/loggers/${logger}" \
      -H 'Content-Type: application/json' \
      -d '{"configuredLevel": null}')
    [[ "$rc" == "204" || "$rc" == "200" ]] && echo "  RESET  $logger" || echo "  FAILED $logger (HTTP $rc)"
  done
  echo ""
  echo "Done. Logging restored to defaults."
else
  echo "Enabling HDFS debug logging at ${BASE_URL} ..."
  echo ""
  FAILED=0
  for logger in "${LOGGERS[@]}"; do
    rc=$(curl -s -o /dev/null -w '%{http_code}' \
      -X POST "${BASE_URL}/actuator/loggers/${logger}" \
      -H 'Content-Type: application/json' \
      -d '{"configuredLevel": "DEBUG"}')
    if [[ "$rc" == "204" || "$rc" == "200" ]]; then
      echo "  DEBUG  $logger"
    else
      echo "  FAILED $logger (HTTP $rc)"
      FAILED=$((FAILED + 1))
    fi
  done
  echo ""
  # Guard with if: a bare `[[ ... ]] && echo` returns nonzero when FAILED is 0,
  # which would abort the script here under `set -e` before the final messages.
  if [[ $FAILED -gt 0 ]]; then
    echo "WARNING: $FAILED logger(s) failed to set."
  fi
  echo "HDFS debug logging active. Run table operations to generate log output."
  echo ""
  echo "When done: $0 --undo"
fi
1 change: 0 additions & 1 deletion services/tables/src/main/resources/application.properties
@@ -17,7 +17,6 @@ management.endpoint.health.enabled=true
management.endpoint.shutdown.enabled=true
management.endpoint.prometheus.enabled=true
management.endpoint.beans.enabled=true
management.endpoint.loggers.enabled=true
management.metrics.distribution.percentiles-histogram.all=true
management.metrics.distribution.maximum-expected-value.catalog_metadata_retrieval_latency=600s
server.shutdown=graceful