HIVE-29671: Harden Schema path authorization for avro serde#6549

Open
saihemanth-cloudera wants to merge 2 commits into apache:master from saihemanth-cloudera:HIVE-29671
Conversation

@saihemanth-cloudera

Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@sonarqubecloud

Copilot AI left a comment

Contributor

Pull request overview

This PR hardens Avro SerDe schema resolution by restricting/validating avro.schema.url schemes (and optionally allowlisting HTTP hosts), and extends Hive authorization plumbing so filesystem-based schema URLs are authorized as DFS_URI inputs during create/alter and query-time reads.

Changes:

  • Add scheme/host validation for avro.schema.url, disable remote HTTP fetch by default, and materialize resolved schemas into avro.schema.literal.
  • Enrich authorization inputs to include filesystem-backed Avro schema URLs (metastore events + query authorization paths).
  • Add new HiveConf knobs and expand unit/integration test coverage for the new behavior.
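The scheme and host checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class name `SchemaUrlPolicy`, its constructor parameters, and the rule that an empty host allowlist permits any host once remote HTTP is enabled are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical policy object; names and defaults are illustrative assumptions. */
public final class SchemaUrlPolicy {
  private static final Set<String> HTTP_SCHEMES = Set.of("http", "https");

  private final Set<String> allowedFsSchemes;   // e.g. "hdfs", "file" (lowercase)
  private final boolean remoteHttpEnabled;      // remote HTTP fetch is off by default in the PR
  private final Set<String> allowedHttpHosts;   // lowercase host names; empty = no host restriction (assumption)

  public SchemaUrlPolicy(Set<String> allowedFsSchemes, boolean remoteHttpEnabled,
      Set<String> allowedHttpHosts) {
    this.allowedFsSchemes = allowedFsSchemes;
    this.remoteHttpEnabled = remoteHttpEnabled;
    this.allowedHttpHosts = allowedHttpHosts;
  }

  /** Returns true only if the URL's scheme (and, for HTTP, its host) passes the policy. */
  public boolean isAllowed(String schemaUrl) {
    if (schemaUrl == null) {
      return false;
    }
    URI uri;
    try {
      uri = new URI(schemaUrl);
    } catch (URISyntaxException e) {
      return false; // unparseable URLs are rejected rather than guessed at
    }
    String scheme = uri.getScheme();
    if (scheme == null) {
      return false; // scheme-less URLs are rejected in this sketch
    }
    String schemeLower = scheme.toLowerCase(Locale.ROOT);
    if (HTTP_SCHEMES.contains(schemeLower)) {
      if (!remoteHttpEnabled) {
        return false;
      }
      String host = uri.getHost();
      if (host == null) {
        return false;
      }
      return allowedHttpHosts.isEmpty()
          || allowedHttpHosts.contains(host.toLowerCase(Locale.ROOT));
    }
    return allowedFsSchemes.contains(schemeLower);
  }
}
```

Rejecting anything that does not parse or has no scheme keeps the check fail-closed; whether the PR treats scheme-less paths the same way is not shown in this overview.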

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java | Validates schema URL scheme/host and changes how schemas are fetched/materialized. |
| serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java | Adds tests for scheme/host validation and schema materialization. |
| common/src/java/org/apache/hadoop/hive/conf/HiveConf.java | Introduces new config vars for allowed schemes and remote HTTP settings/allowlist. |
| ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java | Adds helpers to derive/authorize filesystem schema URL inputs on reads. |
| ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java | Enriches read inputs before privilege checks to include schema URL entities. |
| ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java | Hooks input addition to also add schema URL read entities. |
| ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/events/CreateTableEvent.java | Adds DFS_URI privilege objects for Avro avro.schema.url at create time. |
| ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/events/AlterTableEvent.java | Adds DFS_URI privilege objects for updated Avro avro.schema.url at alter time. |
| ql/src/test/org/apache/hadoop/hive/ql/security/authorization/TestAvroSchemaUrlAuthorizationUtils.java | New unit tests for schema URL authorization utilities. |
| ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/TestHiveMetaStoreAuthorizer.java | Adds/adjusts metastore authorizer tests for DFS_URI objects from schema URLs. |
| ql/src/test/results/clientpositive/llap/compustat_avro.q.out | Updates golden output masking lines affected by input changes. |
| ql/src/test/results/clientpositive/llap/avro_extschema_insert.q.out | Updates golden output masking lines affected by input changes. |
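The "materialization" step mentioned for AvroSerdeUtils.java, where a schema resolved from avro.schema.url is written into avro.schema.literal, might look something like the sketch below. The class `SchemaMaterializer`, the `fetcher` callback, and the choice to leave the URL property in place are assumptions for illustration; only the property names and the "none" sentinel come from the Avro SerDe itself.

```java
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical helper; a sketch of the idea, not the PR's implementation. */
public final class SchemaMaterializer {
  public static final String SCHEMA_URL = "avro.schema.url";
  public static final String SCHEMA_LITERAL = "avro.schema.literal";
  public static final String SCHEMA_NONE = "none";

  private SchemaMaterializer() {
  }

  /**
   * Copies the schema text behind avro.schema.url into avro.schema.literal.
   * The fetcher stands in for a validated, authorized read of the URL.
   * Returns true if the literal was written, false if nothing was done.
   */
  public static boolean materialize(Properties props, Function<String, String> fetcher) {
    String literal = props.getProperty(SCHEMA_LITERAL);
    if (isSet(literal)) {
      return false; // an explicit literal already exists; leave it alone
    }
    String url = props.getProperty(SCHEMA_URL);
    if (!isSet(url)) {
      return false; // no URL to resolve
    }
    props.setProperty(SCHEMA_LITERAL, fetcher.apply(url));
    return true;
  }

  private static boolean isSet(String value) {
    return value != null && !value.isEmpty() && !SCHEMA_NONE.equals(value);
  }
}
```

Once the literal is in place, later readers of the table properties no longer need to open the URL, which is presumably why the PR pairs materialization with the stricter fetch rules.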

Comment on lines +149 to +152
URL url = new URL(schemaString);
try (InputStream in = url.openStream()) {
  s = getSchemaParser().parse(in);
}
}

final String schemeLower = scheme.toLowerCase(Locale.ROOT);
if (HTTP_SCHEMES.contains(schemeLower)) {
Comment on lines +228 to +230
String schemeLower = scheme.toLowerCase(Locale.ROOT);
return !HTTP_SCHEMES.contains(schemeLower)
    && getAllowedFilesystemSchemes(null).contains(schemeLower);
Comment on lines +78 to +98
    addAvroSchemaUrlInputAuth(ret, table, database);

    return ret;
  }

  private void addAvroSchemaUrlInputAuth(List<HivePrivilegeObject> ret, Table table, Database database) {
    if (!AvroSerDe.class.getName().equals(table.getSd().getSerdeInfo().getSerializationLib())) {
      return;
    }
    String schemaUrl = table.getParameters().get(AvroTableProperties.SCHEMA_URL.getPropName());
    if (StringUtils.isEmpty(schemaUrl) || AvroSerdeUtils.SCHEMA_NONE.equals(schemaUrl)) {
      return;
    }
    if (!AvroSerdeUtils.isFilesystemSchemaUrl(schemaUrl)) {
      return;
    }
    if (!needDFSUriAuth(schemaUrl, getDefaultTablePath(database, table))) {
      return;
    }
    ret.add(getHivePrivilegeObjectDfsUri(schemaUrl));
  }
Comment on lines +136 to +139
String newSchemaUrl = newTable.getParameters().get(AvroTableProperties.SCHEMA_URL.getPropName());
if (StringUtils.isEmpty(newSchemaUrl) || AvroSerdeUtils.SCHEMA_NONE.equals(newSchemaUrl)) {
  return;
}
Comment on lines 29 to 32
import org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.api.HiveObjectPrivilege;
Comment on lines 48 to +51
import org.apache.hadoop.hive.ql.security.authorization.plugin.metastore.filtercontext.TableFilterContext;
import org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils;
import org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.AvroTableProperties;
import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
Comment on lines +231 to +236
HiveConf conf = new HiveConf();
conf.setBoolVar(HiveConf.ConfVars.HIVE_AVRO_SCHEMA_URL_REMOTE_HTTP_ENABLED, true);
conf.setVar(HiveConf.ConfVars.HIVE_AVRO_SCHEMA_URL_HTTP_ALLOWED_HOSTS, "schema.example.com");
Properties props = new Properties();
props.put(AvroTableProperties.SCHEMA_URL.getPropName(), "http://schema.example.com/schema.avsc");

3 participants