diff --git a/README.md b/README.md
index 6f70a47..3f12fb0 100644
--- a/README.md
+++ b/README.md
@@ -28,8 +28,7 @@ with Google Cloud Platform as a cloud provider.
 ```
 5. [Create Google Cloud Platform account](https://cloud.google.com/free).
 6. [Create a new GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
-7. [Create GCP bucket](https://cloud.google.com/storage/docs/creating-buckets)
-8. Create storage integration object in Snowflake using the following command:
+7. Depending on using GCS or S3 as file system execute one of the following commands to create storage integration object in Snowflake:
 ```
 CREATE OR REPLACE STORAGE INTEGRATION
 TYPE = EXTERNAL_STAGE
@@ -37,11 +36,21 @@ with Google Cloud Platform as a cloud provider.
 ENABLED = TRUE
 STORAGE_ALLOWED_LOCATIONS = ('gcs:///');
 ```
+ ```
+ CREATE STORAGE INTEGRATION aws_integration
+ TYPE = EXTERNAL_STAGE
+ STORAGE_PROVIDER = S3
+ ENABLED = TRUE
+ STORAGE_AWS_ROLE_ARN = ''
+ STORAGE_ALLOWED_LOCATIONS = ('s3:///')
+ ```
 Please note that `gcs` prefix is used here, not `gs`.
-9. Authorize Snowflake to operate on your bucket by following [Step 3. Grant the Service Account Permissions to Access Bucket Objects](https://docs.snowflake.com/en/user-guide/data-load-gcs-config.html#step-3-grant-the-service-account-permissions-to-access-bucket-objects)
-10. Setup gcloud on your computer by following [Using the Google Cloud SDK installer](https://cloud.google.com/sdk/docs/downloads-interactive)
-11. [Install gradle](https://gradle.org/install/)
-12. Run following command to set gradle wrapper
+7. Authorize Snowflake to operate on your bucket
+ 1. For GCS follow [Step 3. Grant the Service Account Permissions to Access Bucket Objects](https://docs.snowflake.com/en/user-guide/data-load-gcs-config.html#step-3-grant-the-service-account-permissions-to-access-bucket-objects)
+ 1. For S3 follow [Configuring a Snowflake Storage Integration](https://docs.snowflake.com/en/user-guide/data-load-s3-config.html#option-1-configuring-a-snowflake-storage-integration)
+9. Setup gcloud on your computer by following [Using the Google Cloud SDK installer](https://cloud.google.com/sdk/docs/downloads-interactive)
+10. [Install gradle](https://gradle.org/install/)
+11. Run following command to set gradle wrapper
 ```
 gradle wrapper
 ```
@@ -64,7 +73,7 @@ An example consists of two pipelines:
 ```
 ./gradlew run -PmainClass=batching.WordCountExample --args=" \
 --inputFile=gs://apache-beam-samples/shakespeare/* \
- --output=gs:///counts \
+ --output=:///counts \
 --serverName= \
 --username= \
 --password= \
@@ -72,11 +81,14 @@ An example consists of two pipelines:
 --schema= \
 --tableName= \
 --storageIntegrationName= \
- --stagingBucketName= \
+ --stagingBucketName= \
 --runner= \
 --project= \
 --gcpTempLocation= \
 --region= \
+ --awsRegion= \
+ --awsAccessKey=\
+ --awsSecretKey=\
 --appName="
 ```
 2. Go to Snowflake console to check saved counts:
@@ -84,7 +96,7 @@ An example consists of two pipelines:
 select from ..;
 ```
 ![Batching snowflake result](./images/batching_snowflake_result.png)
-3. Go to GCS bucket to check saved files:
+3. Go to GCS or S3 bucket to check saved files:
 ![Batching gcs result](./images/batching_gcs_result.png)
 4. Go to DataFlow to check submitted jobs:
 ![Batching DataFlow result](./images/batching_dataflow_result.png)
@@ -102,12 +114,17 @@ An example is streaming taxi rides from PubSub into Snowflake.
 lat double
 );
 ```
-2. [Create Snowflake stage](https://docs.snowflake.com/en/sql-reference/sql/create-stage.html)
+2. Depending on using GCS or S3 execute one of the following commands to [create Snowflake stage](https://docs.snowflake.com/en/sql-reference/sql/create-stage.html)
 ```
 create or replace stage
 url = 'gcs:///data/'
 storage_integration = ;
 ```
+ ```
+ create stage
+ url = 'S3:///data/'
+ storage_integration = ;
+ ```
 note: SnowflakeIO requires that url must have /data/ as a sufix
 3. [Create Key/Pair](https://docs.snowflake.com/en/user-guide/snowsql-start.html#using-key-pair-authentication)
 for authentication process.
@@ -133,10 +150,13 @@ for authentication process.
 --schema= \
 --snowPipe= \
 --storageIntegrationName= \
- --stagingBucketName= \
+ --stagingBucketName= \
 --runner= \
 --project= \
 --region= \
+ --awsRegion= \
+ --awsAccessKey=\
+ --awsSecretKey=\
 --appName="
 ```
 2. Go to Snowflake console to check saved taxi rides:
@@ -166,7 +186,7 @@ list for currently supported runtime options.
 --templateLocation=gs:///templates/