
Google Cloud Search Connector API - Python Example

Google Cloud Search lets users search and retrieve information from a data repository. To index items in Google Cloud Search, you can create a custom connector using the Google Cloud Search SDK or the REST API.

This repository contains an example of a working custom connector written in Python using the Cloud Search API. It is not an official or supported connector; its purpose is to show the steps, and the order of API calls, needed to build your own custom connector.

In this example, files are stored in a GCS bucket with the following structure:

bucket
  |--> Folder A
  |     |---> file_1.pdf
  |     |---> file_2.pdf        
  |--> Folder B
  |     |---> file_3.pdf
  |     |---> file_4.pdf        

The name of each folder is saved as a metadata property defined in the schema.
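As a minimal sketch of how the folder name can be recovered: GCS has no real directories, so an object such as "Folder A/file_1.pdf" carries its folder in the name prefix. The helpers below are hypothetical (they are not taken from item_create.py); they assume the one-level layout shown above, and listing assumes the google-cloud-storage package.

```python
from collections import defaultdict


def group_by_folder(object_names):
    """Group GCS object names of the form "<folder>/<file>" by folder.

    GCS has no real directories: "Folder A/file_1.pdf" is a single object
    name, so the folder is the prefix before the first "/".
    """
    groups = defaultdict(list)
    for name in object_names:
        if name.endswith("/"):
            # Zero-byte placeholder objects that represent "folders".
            continue
        folder, sep, filename = name.partition("/")
        if not sep or "/" in filename:
            # Assumed policy: skip objects outside the one-level layout.
            continue
        groups[folder].append(filename)
    return dict(groups)


def list_object_names(bucket_name):
    """List all object names in a bucket (requires google-cloud-storage)."""
    # Imported lazily so group_by_folder works without the package.
    from google.cloud import storage

    client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name)]


if __name__ == "__main__":
    names = ["Folder A/", "Folder A/file_1.pdf", "Folder A/file_2.pdf",
             "Folder B/file_3.pdf", "Folder B/file_4.pdf"]
    print(group_by_folder(names))
```

Objects at the bucket root or nested more than one level deep are skipped here; a real connector may prefer to index them with an empty or composite folder value.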

Prerequisites

  • A G Suite domain whitelisted for Cloud Search
  • The Google Cloud Search API enabled
  • A custom data source configured in Cloud Search
  • A project storing the files to be indexed in GCS. Files are expected to be organized in folders, as shown above
  • A service account in the project, with access to the GCS bucket and authorized to index the custom data source
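A minimal sketch of turning the service account key file into an authorized API client, assuming the google-auth and google-api-python-client packages and the cloud_search.indexing OAuth scope. build_service and datasource_name are illustrative helper names, not functions from this repository.

```python
# OAuth scope covering the indexing methods (schema and items).
SCOPES = ["https://www.googleapis.com/auth/cloud_search.indexing"]


def datasource_name(datasource_id):
    """Resource name the indexing API uses for a data source."""
    return f"datasources/{datasource_id}"


def build_service(service_account_file):
    """Return an authorized Cloud Search v1 client for a service account key."""
    # Lazy imports: require google-auth and google-api-python-client.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        service_account_file, scopes=SCOPES
    )
    return build("cloudsearch", "v1", credentials=credentials, cache_discovery=False)
```

Usage: service = build_service("service.json"). The returned client exposes the indexing methods as service.indexing().datasources()...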

Configure virtualenv

We suggest using a Python virtualenv to run the scripts (see the virtualenv installation guide). Create a dedicated virtualenv with the following commands:

$ virtualenv -p python3 env
$ . env/bin/activate

Install all packages listed in the REQUIREMENTS.txt file with the following command:

(env) $ pip install -r REQUIREMENTS.txt

When you are done, deactivate and exit the virtualenv with the following command:

$ deactivate

Schema operations

For a given Google Cloud Search data source, you can define a schema. To manage it, use the schema methods of the Cloud Search indexing API.
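For illustration, a schema matching the bucket layout above could declare one object type with a single text property named folder. The object type name document and the option values below are assumptions; adjust them to your own schema.json. Printing the dict produces JSON that can be saved as schema.json.

```python
import json


def folder_schema(object_name="document"):
    """Schema with one object type carrying a returnable, facetable "folder" text property."""
    return {
        "objectDefinitions": [
            {
                "name": object_name,
                "propertyDefinitions": [
                    {
                        "name": "folder",
                        "isReturnable": True,
                        # Facets on text properties also require isReturnable.
                        "isFacetable": True,
                        "textPropertyOptions": {
                            # Exposes the property as the query operator "folder".
                            "operatorOptions": {"operatorName": "folder"},
                            "retrievalImportance": {"importance": "DEFAULT"},
                        },
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(folder_schema(), indent=2))
```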

Once you have described the structure of the items you want to index in the schema.json file, you can:

  • Create or update the schema with the following command:
(env) $ python schema_create_or_update.py \
  --service_account_file service.json \
  --datasources YOUR_DATASOURCE_ID \
  --schema_json schema.json 
  • Delete the schema with the following command:
(env) $ python schema_delete.py \
  --service_account_file service.json \
  --datasources YOUR_DATASOURCE_ID 
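Creating or deleting a schema maps to the datasources.updateSchema and datasources.deleteSchema methods of the indexing API. The sketch below assumes service is a Cloud Search v1 client built with google-api-python-client; it is an illustration, not the code of schema_create_or_update.py or schema_delete.py. Both calls return a long-running operation rather than the final result.

```python
import json


def update_schema(service, datasource_id, schema_path, validate_only=False):
    """Send the schema in schema_path to the data source (returns an Operation)."""
    with open(schema_path, encoding="utf-8") as f:
        schema = json.load(f)
    # validateOnly=True checks the schema without applying it.
    body = {"schema": schema, "validateOnly": validate_only}
    return (
        service.indexing()
        .datasources()
        .updateSchema(name=f"datasources/{datasource_id}", body=body)
        .execute()
    )


def delete_schema(service, datasource_id):
    """Delete the data source schema (returns an Operation)."""
    return (
        service.indexing()
        .datasources()
        .deleteSchema(name=f"datasources/{datasource_id}")
        .execute()
    )
```

Running update_schema once with validate_only=True is a cheap way to catch schema errors before applying it.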

Items operations

For a given Google Cloud Search data source and schema, you manage the items to index with the items methods of the Cloud Search indexing API.
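A sketch of the body of an items.index request for one PDF from the bucket, assuming the schema declares an object type named document with a text property folder, and that every user in the domain may read the documents. The hashed item ID and the version argument (for example, derived from the GCS object generation) are illustrative choices, not requirements of the API.

```python
import base64
import hashlib

INLINE_LIMIT_BYTES = 100 * 1024  # inline content limit: 100 KiB


def build_index_request(datasource_id, object_name, content, version,
                        mime_type="application/pdf", object_type="document"):
    """Build an items.index body for one "<folder>/<file>" object."""
    if len(content) > INLINE_LIMIT_BYTES:
        raise ValueError("content exceeds 100 KiB; upload it instead of inlining")
    folder, _, filename = object_name.partition("/")
    # Hypothetical ID scheme: a hash keeps spaces and slashes out of the name.
    item_id = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    item = {
        "name": f"datasources/{datasource_id}/items/{item_id}",
        "itemType": "CONTENT_ITEM",
        # Readable by every user in the G Suite domain.
        "acl": {"readers": [{"gsuitePrincipal": {"gsuiteDomain": True}}]},
        "metadata": {"title": filename, "objectType": object_type, "mimeType": mime_type},
        "structuredData": {
            "object": {"properties": [{"name": "folder", "textValues": {"values": [folder]}}]}
        },
        # Bytes fields travel base64-encoded in the JSON body.
        "content": {"inlineContent": base64.b64encode(content).decode("ascii"),
                    "contentFormat": "RAW"},
        # Must increase each time the item is re-indexed.
        "version": base64.b64encode(version).decode("ascii"),
    }
    return {"item": item, "mode": "SYNCHRONOUS"}
```

The request can then be sent with service.indexing().datasources().items().index(name=req["item"]["name"], body=req).execute().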

If you need to index content larger than 100 KiB, the content cannot be sent inline in the index request; it must first be uploaded through a separate upload session, which requires additional API calls.
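A minimal sketch of that flow, assuming a Cloud Search v1 client built with google-api-python-client: items.upload opens an upload session, media.upload sends the bytes, and the returned reference replaces inlineContent in the item passed to items.index. The helper names are hypothetical.

```python
import io

INLINE_LIMIT_BYTES = 100 * 1024  # inline content limit: 100 KiB


def needs_upload(content):
    """True when content is too large to be sent inline."""
    return len(content) > INLINE_LIMIT_BYTES


def upload_content(service, datasource_id, item_id, content, mime_type="application/pdf"):
    """Upload large content and return an ItemContent dict that references it."""
    # Lazy import: requires google-api-python-client.
    from googleapiclient.http import MediaIoBaseUpload

    item_name = f"datasources/{datasource_id}/items/{item_id}"
    # 1. Start an upload session for the item (items.upload).
    ref = service.indexing().datasources().items().upload(name=item_name, body={}).execute()
    # 2. Send the bytes to the returned resource (media.upload).
    media = MediaIoBaseUpload(io.BytesIO(content), mimetype=mime_type)
    service.media().upload(
        resourceName=ref["name"], body={"resourceName": ref["name"]}, media_body=media
    ).execute()
    # 3. Use this as item["content"] in the items.index request.
    return {"contentDataRef": {"name": ref["name"]}, "contentFormat": "RAW"}
```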

Once you have specified the basic structure of the items you want to index in the item.json file, you can:

  • Index the items stored in the GCS bucket with the following command:
(env) $ python item_create.py  \
  --service_account_file service.json \
  --datasources YOUR_DATASOURCE_ID \
  --item_json item.json \
  --document_bucket GCS_BUCKET_NAME
  • List the items in the data source with the following command:
(env) $ python item_list.py \
  --service_account_file service.json \
  --datasources YOUR_DATASOURCE_ID
  • Delete all items in the data source with the following command:
(env) $ python item_delete.py \
  --service_account_file service.json \
  --datasources YOUR_DATASOURCE_ID 
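The list and delete operations can be sketched as follows, assuming a Cloud Search v1 client named service (google-api-python-client). This is an illustration, not the code of item_list.py or item_delete.py. All items are listed before any deletion so that removals do not disturb pagination; passing a version higher than the indexed ones is an assumption about how your items were versioned.

```python
import base64


def list_item_names(service, datasource_id, page_size=100):
    """Return the names of all items in a data source, following nextPageToken."""
    items = service.indexing().datasources().items()
    names, page_token = [], None
    while True:
        kwargs = {"name": f"datasources/{datasource_id}", "pageSize": page_size}
        if page_token:
            kwargs["pageToken"] = page_token
        response = items.list(**kwargs).execute()
        names.extend(item["name"] for item in response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return names


def delete_all_items(service, datasource_id, version):
    """Delete every item; version (bytes) should exceed the indexed versions."""
    names = list_item_names(service, datasource_id)  # list first, then delete
    encoded = base64.b64encode(version).decode("ascii")
    items = service.indexing().datasources().items()
    for name in names:
        items.delete(name=name, version=encoded, mode="SYNCHRONOUS").execute()
    return len(names)
```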

Create a custom search application

Now that your Google Cloud Search data source is populated with your documents, you can create a custom search application by following the procedure described here.