-
Notifications
You must be signed in to change notification settings - Fork 731
fix: software value failing for large repos [CM-1029] #3947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
80db187
22868fc
e6955b8
d849c6c
beb0f25
0bf3da1
4a68cc7
76e7d30
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,4 +1,5 @@ | ||||||||||
| import json | ||||||||||
| import subprocess | ||||||||||
| import time | ||||||||||
| from decimal import Decimal | ||||||||||
|
|
||||||||||
|
|
@@ -8,6 +9,21 @@ | |||||||||
| from crowdgit.services.base.base_service import BaseService | ||||||||||
| from crowdgit.services.utils import run_shell_command | ||||||||||
|
|
||||||||||
| _LARGE_REPO_THRESHOLD_BYTES = 10 * 1024 * 1024 * 1024 # 10 GB | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def _get_repo_size_bytes(repo_path: str) -> int: | ||||||||||
| """Return total disk usage of repo_path in bytes using du -sb.""" | ||||||||||
| try: | ||||||||||
| result = subprocess.run( | ||||||||||
| ["du", "-sb", repo_path], capture_output=True, text=True, timeout=120 | ||||||||||
| ) | ||||||||||
|
Comment on lines
+18
to
+20
|
||||||||||
| if result.returncode == 0: | ||||||||||
| return int(result.stdout.split()[0]) | ||||||||||
| except Exception: | ||||||||||
| pass | ||||||||||
| return 0 | ||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Blocking synchronous subprocess call in async methodMedium Severity
Additional Locations (1)
Comment on lines
+23
to
+25
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| class SoftwareValueService(BaseService): | ||||||||||
| """Service for calculating software value metrics""" | ||||||||||
|
|
@@ -20,16 +36,30 @@ def __init__(self): | |||||||||
| async def run(self, repo_id: str, repo_path: str) -> None: | ||||||||||
| """ | ||||||||||
| Triggers software value binary for given repo. | ||||||||||
| Results are saved into insights database directly | ||||||||||
| Results are saved into insights database directly. | ||||||||||
| For repos larger than 10 GB, scc is run with minimum parallelism (1 worker) | ||||||||||
| to avoid OOM; results are identical. | ||||||||||
|
Comment on lines
+40
to
+41
|
||||||||||
| For repos larger than 10 GB, scc is run with minimum parallelism (1 worker) | |
| to avoid OOM; results are identical. | |
| For repos larger than 10 GB, scc is run with the --no-large flag (skipping files >100MB) | |
| to reduce memory usage and avoid OOM errors. |
Copilot
AI
Mar 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New behavior adds conditional --no-large based on repo size, but there are no tests asserting the command built for large vs small repos. Since this is reliability-critical logic, please add unit coverage (e.g., mock _get_repo_size_bytes / run_shell_command and assert --no-large is included only when expected).


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runSCChard-codes the large-file threshold as the string literal"100000000". To reduce magic numbers and keep the threshold consistent across Go/Python (and with the flag help text), consider extracting this into a named constant (and potentially expressing it as 10010001000 or 10010241024 explicitly).