
chore: Updating VectorStore batch size to improve performance#182

Open
jamie-ons wants to merge 6 commits into main from 181-review-batch_size-behaviour


jamie-ons commented Jun 8, 2026

✨ Summary

VectorStore previously exposed batch_size as a repeated parameter on individual methods, creating multiple independent sources of truth. This PR consolidates that to a single value set at construction time.

To inform the choice of default, a profiling analysis was run across the target GCP instance range at batch sizes from 2 to 250. The default has been updated to the value that minimises search time without risking OOM on the smallest supported instances.

Constraints: must not break or perform significantly worse on 2 vCPU instances; optimised for typical cloud deployments at 4–8 vCPUs.

📜 Changes Introduced

  • VectorStore methods updated so that batch_size defaults to self.batch_size, set in the constructor.
  • Profiling analysis across e2-standard-2, e2-medium, and e2-standard-8 measuring latency and memory
  • Default batch_size updated from 8 to 250 based on analysis findings
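The single-source-of-truth pattern described above can be sketched as follows. This is a minimal stand-in, not classifai's actual `VectorStore`: the class name `SketchVectorStore`, the `_validate` and `_resolve_batch_size` helpers, and the use of `None` as an "inherit from the instance" sentinel are all assumptions for illustration. A per-call `batch_size` is kept as an optional, non-persistent override, which is what test step 4 below checks.

```python
from typing import List, Optional

DEFAULT_BATCH_SIZE = 250  # default proposed in this PR


class SketchVectorStore:
    """Hypothetical stand-in (not classifai's VectorStore) with a constructor-level batch_size."""

    def __init__(self, batch_size: int = DEFAULT_BATCH_SIZE) -> None:
        # Single source of truth for every method's default batch size.
        self.batch_size = self._validate(batch_size)

    @staticmethod
    def _validate(batch_size: int) -> int:
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        return batch_size

    def _resolve_batch_size(self, batch_size: Optional[int]) -> int:
        # None means "inherit from the instance"; an int overrides for this call only.
        if batch_size is None:
            return self.batch_size
        return self._validate(batch_size)

    def search(self, queries: List[str], batch_size: Optional[int] = None) -> List[int]:
        """Stand-in for a real search: returns the size of each batch it would process."""
        size = self._resolve_batch_size(batch_size)
        return [len(queries[i:i + size]) for i in range(0, len(queries), size)]


store = SketchVectorStore(batch_size=4)
queries = [f"query {i}" for i in range(10)]
print(store.search(queries))                # [4, 4, 2]
print(store.search(queries, batch_size=5))  # [5, 5]
print(store.batch_size)                     # 4: the per-call override is not persisted
```

Using `None` in method signatures, rather than repeating a literal default such as `batch_size=8`, means the constructor value is the only place the default lives.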

✅ Checklist

  • Code passes linting with Ruff
  • Docstrings follow Google style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

  1. Run DEMO/general_workflow_demo.ipynb and confirm it runs as usual with the default batch size of 250.
  2. Pass batch_size=any_value when creating the VectorStore and re-run; confirm that both VectorStore creation and search use this batch size.
  3. Replace the VectorStore creation with reloading the saved VectorStore via VectorStore.from_filespace, and check that batch_size is identical to the value you set in step 2.
  4. With the reloaded VectorStore, check that passing batch_size to search overrides it temporarily, for that search only.
  5. Read the documentation updates and confirm they are clear to a prospective or current user.

Additionally, test this on one GCP instance (Workstation or other) and locally, to confirm it works on both types of hardware:

  • e2-medium
  • e2-standard-2
  • e2-standard-8
  • MacBook M4

jamie-ons linked an issue on Jun 8, 2026 that may be closed by this pull request

```python
return result_df

def search(self, query: VectorStoreSearchInput, n_results=10, batch_size=8) -> VectorStoreSearchOutput: # noqa: C901, PLR0912, PLR0915
```

Contributor

I think we'd like to retain the option for users to specify a different batch size at this point, but we'd want the default behaviour to follow the single source of truth.

lukeroantreeONS (Contributor)

A few things to note for the updates:

We're moving to having all (default) batch sizes inherit from the VectorStore's batch_size, so we'll need to make sure that the VectorStore's batch_size:

  • is persisted in metadata
  • is made available as an attribute when a VectorStore is reloaded via the .from_filespace() method
  • can be overridden to a new value via a parameter in the .from_filespace() method
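One way to meet these three requirements is sketched below, under stated assumptions: the metadata is assumed to be a JSON file named `metadata.json` inside the filespace folder, and `PersistedStore` and its `save` method are hypothetical names. The real `.from_filespace()` signature and metadata layout in classifai may well differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class PersistedStore:
    """Hypothetical store showing batch_size persisted in, and reloaded from, metadata."""

    METADATA_FILE = "metadata.json"  # assumed file name, for illustration only

    def __init__(self, batch_size: int = 250) -> None:
        self.batch_size = batch_size

    def save(self, folder: Path) -> None:
        # 1. Persist batch_size alongside any other metadata.
        folder.mkdir(parents=True, exist_ok=True)
        metadata = {"batch_size": self.batch_size}
        (folder / self.METADATA_FILE).write_text(json.dumps(metadata))

    @classmethod
    def from_filespace(cls, folder: Path, batch_size: Optional[int] = None) -> "PersistedStore":
        metadata = json.loads((folder / cls.METADATA_FILE).read_text())
        # 2. Restore the saved value as an attribute...
        # 3. ...unless the caller passes a new value, which takes precedence.
        resolved = metadata["batch_size"] if batch_size is None else batch_size
        return cls(batch_size=resolved)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "store"
    PersistedStore(batch_size=64).save(folder)
    reloaded = PersistedStore.from_filespace(folder)
    overridden = PersistedStore.from_filespace(folder, batch_size=16)

print(reloaded.batch_size)    # 64
print(overridden.batch_size)  # 16
```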

@lukeroantreeONS lukeroantreeONS self-requested a review June 14, 2026 09:42
@jamie-ons jamie-ons marked this pull request as ready for review June 22, 2026 15:27
@jamie-ons jamie-ons requested a review from a team as a code owner June 22, 2026 15:27
github-actions (bot) added the chore label Jun 22, 2026
Outdated comment thread on src/classifai/indexers/main.py
@jamie-ons jamie-ons self-assigned this Jun 23, 2026
jamie-ons (Author)

I used the Hugging Face vectoriser:

```python
vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")
```

with two vector stores: one of 92 records (small dataset) and one of 44,000 records (standard dataset).

I ran a search of 2,000 input queries at a range of batch sizes, and repeated this test 3 times on different hardware:

| Machine type | vCPUs | Fractional vCPUs | Memory (GB) |
|---|---|---|---|
| e2-medium | 2 | 1 | 4 |
| e2-standard-2 | 2 | N/A | 8 |
| e2-standard-8 | 8 | N/A | 32 |
| MacBook M4 | 16-CPU, 40-GPU | N/A | 36 |

As GCP models only allow up to 250 input texts for each request, I tested the following batch sizes:

```python
BATCH_SIZES = [2, 4, 8, 16, 32, 64, 128, 250]
```
[Figure: time taken to process the 2,000 input queries at each batch size]

We can see a relationship resembling exponential decay between batch_size and the time taken to process the 2,000 input queries.

  • The advantage of a higher batch size is more pronounced with larger datasets and more compute.
  • The effect is smaller on low-compute machines and small datasets, but no adverse effect is seen either.
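A minimal harness for this kind of measurement (timing a fixed set of 2,000 queries at each batch size, repeated 3 times) might look like the sketch below. `fake_search` is a hypothetical stand-in for `VectorStore.search` with a made-up fixed per-batch overhead, so its timings do not reproduce the graph; in a real run you would pass the store's search method instead. Taking the median of the repeats is an assumption here, not necessarily how the figures above were aggregated.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence

BATCH_SIZES = [2, 4, 8, 16, 32, 64, 128, 250]
REPEATS = 3


def fake_search(queries: Sequence[str], batch_size: int) -> None:
    """Hypothetical stand-in for VectorStore.search with a fixed per-batch overhead."""
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        time.sleep(0.0001)  # made-up per-request overhead
        _ = [q.upper() for q in batch]  # made-up per-query work


def profile(
    search: Callable[[Sequence[str], int], None],
    queries: Sequence[str],
    batch_sizes: Sequence[int] = BATCH_SIZES,
    repeats: int = REPEATS,
) -> Dict[int, float]:
    """Return the median wall-clock time in seconds for each batch size."""
    results: Dict[int, float] = {}
    for size in batch_sizes:
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            search(queries, size)
            timings.append(time.perf_counter() - start)
        results[size] = statistics.median(timings)
    return results


queries = [f"query {i}" for i in range(2000)]
median_times = profile(fake_search, queries)
for size, seconds in median_times.items():
    print(f"batch_size={size:>3}: {seconds:.4f}s")
```

With a fixed cost per request, fewer and larger batches amortise that cost, which is one plausible reason the curve flattens as batch_size grows.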

Therefore, I think setting the default to 250, the maximum value allowed by GCP models, is the best choice.
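Since 250 is the per-request limit cited above for GCP models, code that sends texts to such a model could validate the configured batch size and split inputs into request-sized chunks. The sketch below is illustrative only: `MAX_GCP_BATCH`, `chunk_for_gcp`, and the choice to raise an error rather than silently clamp are assumptions, not classifai's behaviour.

```python
from typing import Iterator, List, Sequence

MAX_GCP_BATCH = 250  # per-request input limit for GCP models, as stated above


def chunk_for_gcp(texts: Sequence[str], batch_size: int = MAX_GCP_BATCH) -> Iterator[List[str]]:
    """Yield request-sized chunks, refusing batch sizes the API would reject."""
    # Generator: the check runs when iteration starts, not when the function is called.
    if not 1 <= batch_size <= MAX_GCP_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_GCP_BATCH}, got {batch_size}")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


sizes = [len(chunk) for chunk in chunk_for_gcp([f"text {i}" for i in range(2000)])]
print(len(sizes), sizes[-1])  # 8 250
```

Raising makes a misconfigured batch size visible immediately. Clamping would also work but would hide the mismatch, and larger batches (such as those tried on the On-Net machines) would only be safe for vectorisers without this limit.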

frayle-ons (Contributor)

Have we done any testing for this with the On-Net machines? If not, it would be a good idea to test and confirm these findings, since our current main user base uses these machines.

jamie-ons (Author) commented Jun 24, 2026

@frayle-ons

Yes, the Mac in the graph is the On-Net machine. It performs best on the On-Net machine, which is good.

  • We did some brief further testing at batch sizes above 250, and performance does increase on the On-Net machines (although the gains are marginal, depending on the dataset).
  • Because it also needs to work on cloud and (relatively) low-compute VMs, I think 250 is best.

If you mean the ThinkPad: its compute is far greater than that of the chosen GCP instances, so I would assume it will also perform well.



Development

Successfully merging this pull request may close these issues.

Review 'batch_size' behaviour

3 participants