
feat(medcat): CU-869cw9zmj Improve inference speed #410

Merged
mart-r merged 10 commits into main from feat/medcat/CU-869cw9zmj-improve-inference-speed
Apr 24, 2026

Collaborator

@mart-r mart-r commented Apr 13, 2026

This PR improves inference speed somewhat (around 10%) by:

  • Using a more efficient unitvec calculation
    • This gave a nice speed boost (around 20% on this segment)
    • But it made the change in place, which polluted other bits and broke a bunch of stuff
    • Also, this segment accounts for only a very small part of the overall time
    • So the overall speedup from this would have been closer to 1%
  • Reusing smaller context vectors multiple times
    • We've got multiple context windows (small, medium, large, extra large)
    • So far we've been recalculating the vectors from scratch for every context window
    • However, the smaller windows will always be a subset of the bigger ones
    • So this change takes advantage of that
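The two ideas in the list above can be sketched roughly as follows. This is an illustration only, not the code from this PR: plain Python lists stand in for NumPy arrays, and `unitvec_copy`, `unitvec_inplace`, `window_vectors` and the `embed` callback are hypothetical names. The first pair of functions shows why an in-place normalisation can leak into other code that holds the same vector (NumPy's `vec /= norm` has the same aliasing effect); the last one shows how growing the window outwards means each token is embedded at most once.

```python
import math
from typing import Callable, Sequence


def unitvec_copy(vec: Sequence[float]) -> list[float]:
    """Return a normalised copy; the caller's vector is left untouched."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def unitvec_inplace(vec: list[float]) -> list[float]:
    """Normalise in place: no new list, but every holder of `vec` sees the change."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm > 0:
        for i, x in enumerate(vec):
            vec[i] = x / norm
    return vec


def window_vectors(
    tokens: Sequence[str],
    center: int,
    window_sizes: Sequence[int],
    embed: Callable[[str], object],
) -> dict[int, list]:
    """Map each window size to the vectors of the tokens within that distance
    of `center` (center itself excluded), in reading order."""
    out: dict[int, list] = {}
    left_vecs: list = []   # vectors left of center, nearest first
    right_vecs: list = []  # vectors right of center, nearest first
    prev = 0
    for size in sorted(window_sizes):
        # Only tokens between the previous radius and the current one are new.
        for dist in range(prev + 1, size + 1):
            if center - dist >= 0:
                left_vecs.append(embed(tokens[center - dist]))
            if center + dist < len(tokens):
                right_vecs.append(embed(tokens[center + dist]))
        prev = size
        # Build a fresh list so later, larger windows do not alter this one.
        out[size] = left_vecs[::-1] + right_vecs
    return out
```

With window sizes 1, 2 and 4 around a token that has four neighbours on each side, `embed` is called eight times in total, rather than 2 + 4 + 8 = 14 times when every window is recomputed from scratch.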

prev_right = tokens_right

# Center is identical for all window sizes, only compute once
if center_vecs is None:
Collaborator


Minor feeling that this might flow better: could the centre tokens be handled on their own outside of the loop?

That way not every window has to do the check.

        prev_left_vecs: list[np.ndarray] = []
        prev_right_vecs: list[np.ndarray] = []

        # Calc center vecs once outside the loop?
        if not self.config.context_ignore_center_tokens:
            tokens_center = self.get_context_tokens(
                entity, doc, sorted_contexts[0], per_doc_valid_token_cache)[1]  # or whatever is the best way
            center_vecs = list(
                self._preprocess_center_tokens(cui, tokens_center))
        else:
            center_vecs = []

        for context_type, window_size in sorted_contexts:
            # do everything other than the center_vecs calculation

Collaborator Author


Yeah, good point. If it's only done once, why do it in the loop at all!

Collaborator Author


Well, turns out the reason is that the centre tokens aren't available before you get to the loop!
But I do think it logically belongs outside the loop still.
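One way to reconcile the two points above (centre tokens only become available once a window has been looked up, yet the centre computation logically belongs outside the loop) is to fetch the smallest window before the loop and reuse that result for the first iteration. The sketch below is an assumption about how that could look, not the merged code: `context_vectors`, `get_split` and `embed` are hypothetical stand-ins, and the left and right sides are re-embedded per window to keep the focus on the centre handling.

```python
from typing import Callable, Sequence

# (left tokens, center tokens, right tokens) for one window size
Split = tuple[list[str], list[str], list[str]]


def context_vectors(
    window_sizes: Sequence[int],
    get_split: Callable[[int], Split],
    embed: Callable[[str], float],
    ignore_center: bool = False,
) -> dict[int, list[float]]:
    sizes = sorted(window_sizes)
    if not sizes:
        return {}
    # The center tokens only exist once a split has been fetched, so fetch
    # the smallest window first and build the center vectors before the loop.
    first_split = get_split(sizes[0])
    center_vecs = [] if ignore_center else [embed(t) for t in first_split[1]]

    out: dict[int, list[float]] = {}
    for size in sizes:
        # Reuse the split already fetched for the smallest window.
        left, _, right = first_split if size == sizes[0] else get_split(size)
        out[size] = (
            [embed(t) for t in left] + center_vecs + [embed(t) for t in right]
        )
    return out
```

The loop body then no longer needs an `if center_vecs is None` check, at the cost of one lookup being made ahead of the loop.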

@mart-r mart-r merged commit cfdbab0 into main Apr 24, 2026
22 checks passed
@mart-r mart-r deleted the feat/medcat/CU-869cw9zmj-improve-inference-speed branch April 24, 2026 11:15