Skip to content

Conversation

@vverman
Copy link
Contributor

@vverman vverman commented Nov 7, 2025

This feature implements the ability for agentic identities to authenticate themselves via X509 cert bound tokens. We are limiting the scope here to only cloud run based agentic workloads.

@vverman vverman requested review from a team as code owners November 7, 2025 01:55
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Nov 7, 2025
@vverman vverman assigned vverman and nbayati and unassigned vverman Nov 7, 2025
@vverman vverman added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Nov 7, 2025
Copy link
Contributor

@nbayati nbayati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a really good PR! Just a few small comments on the implementation. I'll review the unit tests next.

private AgentIdentityUtils() {}

/**
* Gets the Agent Identity certificate if available and enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can make this documentation more accurate by saying something along the lines of "Loads and parses the Agent Identity certificate if available and not opted out. "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the changes, do lmk if this looks good!

@vverman vverman force-pushed the feat/agentic-identities-cloudrun branch from e0d450a to 5353838 Compare December 2, 2025 17:12
@vverman vverman requested a review from nbayati December 4, 2025 20:43
import java.security.cert.CertificateFactory;
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you list the imports out?

final class AgentIdentityUtils {
private static final Logger LOGGER = Logger.getLogger(AgentIdentityUtils.class.getName());

static final String GOOGLE_API_CERTIFICATE_CONFIG = "GOOGLE_API_CERTIFICATE_CONFIG";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a constant here:

static final String CERTIFICATE_CONFIGURATION_ENV_VARIABLE = "GOOGLE_API_CERTIFICATE_CONFIG";

Could we re-use the constant there?

Comment on lines +65 to +67
ImmutableList.of(
Pattern.compile("^agents\\.global\\.org-\\d+\\.system\\.id\\.goog$"),
Pattern.compile("^agents\\.global\\.proj-\\d+\\.system\\.id\\.goog$"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to double check this regex:

  1. It is supposed to be proj-\d+ and not project-\d+
  2. $ end of string regex and nothing comes after it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. This is captured in point 1. here in the design doc

boolean warned = false;

// Deterministic polling loop based on pre-calculated intervals.
for (long sleepInterval : POLLING_INTERVALS) {
Copy link
Member

@lqiu96 lqiu96 Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this given that this is to be opt-out and not opt-in. By default, customers may now experience up to 30s of busy waiting just because the filesystem on certain environments can't be loaded fast enough. This applies for customers who don't touch Agent Identities.

IIUC the code from above, the only way to to know if Agent Identities shouldn't be used:

  1. Load the cert file (up to 30s of busy waiting)
  2. Then load and parse the cert file to check the SAN

OR
Require users to explicitly add the new env var to their workloads + envs

Can you let me know if my understanding is correct? If this is the case that the cert has to be loaded, then I feel that opt-in would be better to avoid the possibility busy waiting for all customers.

Comment on lines +192 to +197
LOGGER.warning(
String.format(
"Certificate config file not found at %s (from %s environment variable). "
+ "Retrying for up to %d seconds.",
certConfigPath, GOOGLE_API_CERTIFICATE_CONFIG, TOTAL_TIMEOUT_MS / 1000));
warned = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this would be better if we start using slf4j logging?

CC: @zhumin8

for (long sleepInterval : POLLING_INTERVALS) {
try {
if (Files.exists(Paths.get(certConfigPath))) {
String certPath = extractCertPathFromConfig(certConfigPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, null is used to represent that the cert_path doesn't exist inside the certConfigPath. I think this would be a case where we wouldn't want to retry anymore since the certConfigPath is found.

try {
if (Files.exists(Paths.get(certConfigPath))) {
String certPath = extractCertPathFromConfig(certConfigPath);
if (!Strings.isNullOrEmpty(certPath) && Files.exists(Paths.get(certPath))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing that the certPath from certConfig exists, right? I think this would be the happy path.

Just want to confirm: What about the test case that the certPath from certConfig doesn't exist on the file system. Should we stop retrying and fail instead of continue to retry?

static final String GOOGLE_API_PREVENT_AGENT_TOKEN_SHARING_FOR_GCP_SERVICES =
"GOOGLE_API_PREVENT_AGENT_TOKEN_SHARING_FOR_GCP_SERVICES";

private static final List<Pattern> AGENT_IDENTITY_SPIFFE_PATTERNS =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you add a link to what SPIFFE is? I'm ok with this acronym, but may be helpful as I don't think it's something as well known as URI, WWW, etc

ImmutableList.of(
Pattern.compile("^agents\\.global\\.org-\\d+\\.system\\.id\\.goog$"),
Pattern.compile("^agents\\.global\\.proj-\\d+\\.system\\.id\\.goog$"));
private static final int SAN_URI_TYPE = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same here, perhaps we can spell out SAN? I don't think this is as well known of an acronym

Comment on lines +72 to +75
private static final int FAST_POLL_CYCLES = 50;
private static final long FAST_POLL_INTERVAL_MS = 100; // 0.1 seconds
private static final long SLOW_POLL_INTERVAL_MS = 500; // 0.5 seconds
private static final long TOTAL_TIMEOUT_MS = 30000; // 30 seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any specific reason for these pre-configured values? I'm guessing the data shows that 5s resolves for a vast majority of cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Indicates a pull request not ready for merge, due to either quality or timing. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants