Commit 8a630ba
perf: fix low-hanging performance issues in MetaCAT and linking (#400)
* perf(metacat): scope max_seq_len and batch slice to current batch
create_batch_piped_data was computing max_seq_len over the entire
dataset on every batch call, and slicing data[start_ind:end_ind]
three times. Scope both to a single batch slice — reduces padding
overhead and eliminates redundant iteration.
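
The batch-scoping change described above can be sketched roughly as follows. This is a minimal illustration, not the actual `create_batch_piped_data` code: the function name `pad_batch`, the `(token_ids, center_position, label)` sample layout, and the plain-list return values are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical sample layout: (token_ids, center_position, label).
Sample = Tuple[List[int], int, int]


def pad_batch(data: List[Sample], start_ind: int, end_ind: int,
              pad_id: int) -> Tuple[List[List[int]], List[int], List[int]]:
    """Pad one batch to the longest sequence in that batch only."""
    batch = data[start_ind:end_ind]  # slice once and reuse it below
    # Longest sequence in this batch, not in the whole dataset.
    max_seq_len = max((len(tokens) for tokens, _, _ in batch), default=0)
    x = [tokens + [pad_id] * (max_seq_len - len(tokens))
         for tokens, _, _ in batch]
    cpos = [center for _, center, _ in batch]
    y = [label for _, _, label in batch]
    return x, cpos, y


data = [([1, 2], 0, 0), ([3], 0, 1), ([4, 5, 6, 7, 8], 2, 0)]
x, cpos, y = pad_batch(data, 0, 2, pad_id=0)
```

In this toy dataset the first batch is padded to length 2, whereas a dataset-wide maximum would have padded it to length 5.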
* perf(linking): update similarities in-place during disambiguation
Replace list copy + clear + rebuild with a simple in-place loop.
Eliminates three intermediate list allocations in the disambiguation
hot path.
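
As an illustration of the in-place pattern named above, the sketch below overwrites list slots instead of copying, clearing, and rebuilding the list. The function name `boost_in_place`, the `is_preferred` flags, the `boost` factor, and the 0.99 cap are hypothetical and are not taken from the linking code.

```python
from typing import List


def boost_in_place(similarities: List[float], is_preferred: List[bool],
                   boost: float) -> None:
    """Scale selected similarity scores without allocating new lists."""
    for i, (sim, preferred) in enumerate(zip(similarities, is_preferred)):
        if preferred:
            # Overwrite the existing slot; the 0.99 cap is an arbitrary
            # choice for this example.
            similarities[i] = min(0.99, sim * (1 + boost))


sims = [0.5, 0.8, 0.25]
alias = sims  # a second reference, as a caller might hold
boost_in_place(sims, [True, False, True], boost=0.5)
```

Because the list object is never replaced, any caller holding a reference to it (here `alias`) sees the updated scores.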
* perf(metacat): replace O(n) dict values scan with O(1) key lookup
undersample_data and encode_category_values both checked membership
against category_value2id.values() (linear scan) on every iteration.
Since label_data dicts are keyed by the same IDs, check membership
against the dict itself (O(1) hash lookup).
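
A small sketch of the lookup change described above, assuming (as the commit message states) that the per-label dict is keyed by the same IDs that appear as values of `category_value2id`. The helper `count_labels` and the sample layout are hypothetical.

```python
from typing import Dict, List, Tuple


def count_labels(samples: List[Tuple[str, int]],
                 category_value2id: Dict[str, int]) -> Dict[int, int]:
    """Count samples per known label ID."""
    # Keyed by label ID, i.e. by the values of category_value2id.
    label_data: Dict[int, int] = {v: 0 for v in category_value2id.values()}
    for _, label_id in samples:
        # Hash lookup on the dict keys; the slower equivalent would be
        # `label_id in category_value2id.values()`, a linear scan.
        if label_id in label_data:
            label_data[label_id] += 1
    return label_data


category_value2id = {"Negative": 0, "Positive": 1}
samples = [("a", 1), ("b", 0), ("c", 1), ("d", 7)]
counts = count_labels(samples, category_value2id)
```

The two membership checks agree only while both containers hold the same set of IDs, which is the precondition the commit message relies on.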
* perf(metacat): use append instead of list concatenation in eval
dict.get(k, []) + [item] allocates a new list on every iteration,
making example collection O(n*k). Use setdefault + append for O(1)
amortized per insertion.
1 parent 823151c commit 8a630ba
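
The `setdefault` + `append` pattern from the last commit-message item can be sketched as below. The function name `collect_examples` and the `(label, text)` pair layout are assumptions for illustration and do not mirror the eval code itself.

```python
from typing import Dict, List, Tuple


def collect_examples(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group example texts by label."""
    examples: Dict[str, List[str]] = {}
    for label, text in pairs:
        # Slower variant: examples[label] = examples.get(label, []) + [text],
        # which copies the whole list on every insertion.
        examples.setdefault(label, []).append(text)
    return examples


pairs = [("FP", "doc1"), ("FN", "doc2"), ("FP", "doc3")]
grouped = collect_examples(pairs)
```

`collections.defaultdict(list)` would give the same behavior; `setdefault` keeps the result a plain `dict`.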
3 files changed
Lines changed: 17 additions & 18 deletions
File tree
- medcat-v2/medcat/components
- addons/meta_cat
- linking
| Original file lines | New file lines | Deleted (original numbering) | Added (new numbering) |
|---|---|---|---|
| 319-328 | 319-327 | 322-325 | 322-324 |
| 414-422 | 413-421 | 417-419 | 416-418 |
| Original file lines | New file lines | Deleted (original numbering) | Added (new numbering) |
|---|---|---|---|
| 63-76 | 63-77 | 66, 68-69, 73 | 66-67, 69-70, 74 |
| 511-520 | 512-521 | 514-515, 517 | 515-516, 518 |
Lines changed: 3 additions & 4 deletions
| Original file lines | New file lines | Deleted (original numbering) | Added (new numbering) |
|---|---|---|---|
| 231-240 | 231-239 | 234-237 | 234-236 |