Fix/tokenize simple output #2019

Merged
dirkkul merged 16 commits into dev/1.37 from fix/tokenize_simple_output
Apr 22, 2026

Conversation

@amourao amourao commented Apr 20, 2026

  • Update the input and output formats to match the latest server changes

Copilot AI review requested due to automatic review settings April 20, 2026 14:59
@amourao amourao requested a review from a team as a code owner April 20, 2026 14:59

@orca-security-eu orca-security-eu Bot left a comment

Orca Security Scan Summary

| Status | Check | Issues by priority |
| --- | --- | --- |
| Passed | Secrets | high 0, medium 0, low 0, info 0 |

Contributor

Copilot AI left a comment

Pull request overview

Updates the client’s tokenization request/response shapes to align with recent server changes for /v1/tokenize, including stopwords handling and the minimal response format.

Changes:

  • Simplifies TokenizeResult to match the generic endpoint’s minimal response (indexed/query only) and makes tokenization optional (property endpoint only).
  • Updates tokenization executor input shape: adds stopwords, changes stopword_presets to Dict[str, List[str]], and enforces the server’s mutual-exclusion rule client-side.
  • Adjusts integration tests and CI Weaviate version pin to reflect the updated behavior.
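The client-side mutual-exclusion check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `weaviate/tokenization/executor.py`: the helper name `build_tokenize_payload`, the payload keys, the shape of `stopwords`, and the assumption that the rule is between `stopwords` and `stopword_presets` are all hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_tokenize_payload(
    text: str,
    tokenization: str,
    stopwords: Optional[Dict[str, Any]] = None,
    stopword_presets: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a request body for a tokenize call."""
    # Assumed rule: the two stopword options cannot be combined, so fail
    # early on the client instead of waiting for a server-side rejection.
    if stopwords is not None and stopword_presets is not None:
        raise ValueError("stopwords and stopword_presets are mutually exclusive")

    # Key names below are illustrative, not the confirmed wire format.
    payload: Dict[str, Any] = {"text": text, "tokenization": tokenization}
    if stopwords is not None:
        payload["stopwords"] = stopwords
    if stopword_presets is not None:
        payload["stopword_presets"] = stopword_presets
    return payload
```

Validating before the request is sent keeps the error close to the caller, at the cost of duplicating a rule that the server also enforces.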

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| weaviate/tokenization/models.py | Aligns TokenizeResult fields with new server response shape and makes tokenization optional. |
| weaviate/tokenization/executor.py | Updates request payload schema for stopwords/presets and adds client-side mutex validation. |
| integration/test_tokenize.py | Updates expected outputs and adds coverage for default stopword behavior and mutex validation. |
| .github/workflows/main.yaml | Pins the 1.37.1 CI Weaviate version to a specific build suffix. |
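For orientation, the minimal response shape noted for `weaviate/tokenization/models.py` might look something like the sketch below. It uses a plain dataclass and assumes that `indexed` and `query` are lists of token strings; the class name `TokenizeResultSketch` and the parser `parse_tokenize_response` are hypothetical and do not reflect the client's real model definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TokenizeResultSketch:
    """Hypothetical stand-in for the simplified TokenizeResult."""

    indexed: List[str]  # assumed: tokens as they would be indexed
    query: List[str]  # assumed: tokens as they would be used at query time
    tokenization: Optional[str] = None  # only expected from the property endpoint


def parse_tokenize_response(body: Dict[str, Any]) -> TokenizeResultSketch:
    """Build the result from a decoded JSON body, tolerating a missing tokenization."""
    return TokenizeResultSketch(
        indexed=list(body.get("indexed", [])),
        query=list(body.get("query", [])),
        tokenization=body.get("tokenization"),
    )
```

Making `tokenization` optional lets one model serve both endpoints: the generic one omits the field and the property one fills it in.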

Comment thread weaviate/tokenization/executor.py Outdated
Comment thread weaviate/tokenization/executor.py Outdated

codecov-commenter commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 41.17647% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.72%. Comparing base (d760577) to head (081aaef).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| integration/test_tokenize.py | 40.98% | 36 Missing ⚠️ |
| weaviate/tokenization/executor.py | 17.85% | 23 Missing ⚠️ |
| weaviate/collections/classes/config.py | 87.50% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           dev/1.37    #2019      +/-   ##
============================================
+ Coverage     86.69%   86.72%   +0.02%     
============================================
  Files           296      297       +1     
  Lines         22862    22826      -36     
============================================
- Hits          19821    19795      -26     
+ Misses         3041     3031      -10     


Comment thread integration/test_tokenize.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.



Comment thread weaviate/tokenization/models.py
Comment thread weaviate/collections/classes/config.py
Collaborator

@tsmith023 tsmith023 left a comment

For collection.config.tokenize_property, what do you think about moving it to collection.tokenize.property, to align the namespaces between the client and collection objects?
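To make the suggested layout concrete, here is a rough sketch of a `tokenize` namespace on a collection object. Every class and method here is a hypothetical stand-in, and the method only returns a description of the call rather than contacting a server; it is not the client's actual structure.

```python
from typing import Dict


class _CollectionTokenizeSketch:
    """Hypothetical namespace object reachable as `collection.tokenize`."""

    def __init__(self, collection_name: str) -> None:
        self._collection_name = collection_name

    def property(self, property_name: str, text: str) -> Dict[str, str]:
        # A real client would issue a request here; this sketch only
        # describes the call it would make.
        return {
            "collection": self._collection_name,
            "property": property_name,
            "text": text,
        }


class CollectionSketch:
    """Hypothetical collection exposing the proposed namespace."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tokenize = _CollectionTokenizeSketch(name)


articles = CollectionSketch("Article")
request = articles.tokenize.property("title", "Hello world")
```

With this layout a property-level call reads as `collection.tokenize.property(...)`, so both the client and collection objects would expose tokenization under a `tokenize` name.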

@dirkkul dirkkul merged commit cae3e33 into dev/1.37 Apr 22, 2026
234 of 241 checks passed
@dirkkul dirkkul deleted the fix/tokenize_simple_output branch April 22, 2026 17:20
5 participants