fix: source jvm.config from peon.sh for K8s peon containers#19364
Open
Yomanz wants to merge 5 commits intoapache:masterfrom
Open
fix: source jvm.config from peon.sh for K8s peon containers#19364Yomanz wants to merge 5 commits intoapache:masterfrom
Yomanz wants to merge 5 commits intoapache:masterfrom
Conversation
| * {@code /status/properties} on the peon pod. | ||
| */ | ||
| @Disabled("requires charts.datainfra.io chart, see https://github.com/apache/druid/pull/19047") | ||
| public class KubernetesPeonJvmConfigDockerTest extends BaseKubernetesTaskRunnerDockerTest |
Member
There was a problem hiding this comment.
P2 Regression coverage is permanently disabled
The new regression test that verifies peon.sh sources jvm.config is annotated with @disabled, so the fix has no active automated coverage. Since this bug is specifically in container startup behavior, leaving the only regression test disabled means the same breakage can return without CI noticing. Please either make this test runnable in the existing embedded test setup or add a narrower active test around the script/config generation path.
Author
There was a problem hiding this comment.
Just pushed something to cover this
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #18791.
Description
distribution/docker/peon.shnever sourced$SERVICE_CONF_DIR/jvm.config. As a result, JVM options declared in ajvm.configConfigMap, which is documented atdocs/development/extensions-core/k8s-jobs.mdunder the Custom Template Pod Adapter are silently ignored by peon containers.Peons started with whatever
$JAVA_OPTSthe container happened to inherit and then go back to container-aware JDK defaults (MaxRAMPercentage=25%,MaxDirectMemorySize=Xmx). This causesOutOfMemoryErrorwith a mysterious cause.The bug affects all three K8s pod adapters equally at the shell level.
peon.shis the entry point for every K8s-launched peon but is most impactful forcustomTemplateAdapterusers.K8sTaskAdapter(used byoverlordSingleContainer/overlordMultiContainer) additionally injects aJAVA_OPTSenv var fromdruid.indexer.runner.javaOptsArray, so its users had a working configuration path even withoutjvm.configsourcing.PodTemplateTaskAdapter#getEnvdoes not injectJAVA_OPTS, leavingcustomTemplateAdapterusers with only the documented-but-brokenjvm.configpath and the workaround of hardcodingJAVA_OPTSinto the pod template'senv:block.Fix: source jvm.config from peon.sh
Mirror the behaviour that
distribution/docker/druid.shalready has at line 188:The
-fguard is the only deviation fromdruid.sh. It preserves existing behaviour for clusters that don't mount ajvm.configat all, avoiding a newcat: No such file or directorystderr line on every peon startup.Options in
JAVA_OPTStake precedence over those injvm.configunder OpenJDK (same caveat asdruid.sh), so adapters that injectJAVA_OPTSviajavaOptsArraycontinue to win duplicates.Docs update
Added a bit of info to
docs/development/extensions-core/k8s-jobs.mdunder the Custom Template Pod Adapter section, making the mechanism explicit:peon.shsourcesjvm.configfrom the directory mounted asnodetype-config-volume, prepends it toJAVA_OPTS, andJAVA_OPTSwins for duplicates. The existing example ConfigMap with-Xmx1024M -XX:MaxDirectMemorySize=1000Mnow behaves as documented.Test added
Added
KubernetesPeonJvmConfigDockerTestunderembedded-tests/.../k8s/. It uses a new operator manifest variant that injects-Ddruid.test.peon.jvmconfig.marker=trueinto the cluster-leveljvm.options(written by the Druid operator into each node'sjvm.config), submits a long-runningNoopTask, port-forwards to the resulting peon pod via the fabric8 client, and asserts the marker appears in the peon JVM's/status/properties. The test is disabled though, like the existingKubernetesTaskRunnerDirectModeDockerTest/KubernetesTaskRunnerCachingModeDockerTestpending resolution of thecharts.datainfra.ioavailability issue (#19047).Small harness refactor to support this:
BaseKubernetesTaskRunnerDockerTestnow exposesgetManifestTemplate()as an overridable hook and promotes the localk3sClustervariable to a protected field so subclasses can reach into the cluster.K3sClusterResourcegains a publicgetKubernetesClient()getter returning the fabric8 client.Release note
Fixed: JVM options declared in a
jvm.configConfigMap are now honoured by peons launched via the Kubernetes overlord extension (including when using thecustomTemplateAdapter). Previously,peon.shsilently ignoredjvm.config, causing the peon JVM to fall back to container-aware JDK defaults.Key changed/added files in this PR
distribution/docker/peon.shdocs/development/extensions-core/k8s-jobs.mdembedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/KubernetesPeonJvmConfigDockerTest.java(new)embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/BaseKubernetesTaskRunnerDockerTest.javaembedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/K3sClusterResource.javaembedded-tests/src/test/resources/manifests/druid-service-with-operator-peonjvmconfig.yaml(new)This PR has: