Summary
pr_files.filename and pr_file_contents.filename are defined as VARCHAR(500). A valid Git file path can exceed 500 characters when composed from multiple path components.
When DAS ingests PR files, GitHubFetcherService.fetchAndStorePrFiles() upserts GitHub’s file.filename directly into pr_files.filename. If a PR contains a valid path longer than 500 characters, Postgres rejects the insert with:
ERROR: value too long for type character varying(500)
That makes the PR_FILES job fail, preventing scoring_data_stored=true and leaving the PR without file/content scoring data.
Reproduction Steps
- Confirm local Git accepts a 602-character repo-relative file path.
- Insert a smoke repo and PR into local Postgres.
- Attempt to insert the same 602-character path into pr_files.filename.
- Observe Postgres reject the row because the mirror schema caps filenames at 500 characters.
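A path of the length used in these steps can be built from several ordinary-sized components. The following is a minimal sketch, not the script used for the reproduction: the helper name buildLongPath and the fill length of 90 are assumptions, chosen so that six components of the form segmentN-aaa... plus a .ts extension add up to 602 characters. Each component is 99 characters, well under the 255-byte filename limit common to most filesystems.

```typescript
// Hypothetical helper for illustration; the name and segment sizes are assumptions.
const VARCHAR_LIMIT = 500; // matches VARCHAR(500) in the schema files

function buildLongPath(segmentCount: number, fillLength: number): string {
  const segments: string[] = [];
  for (let i = 0; i < segmentCount; i++) {
    // Each component is "segment<i>-" followed by fillLength "a" characters.
    segments.push(`segment${i}-${"a".repeat(fillLength)}`);
  }
  // Join the components into a repo-relative path and add a file extension.
  return `${segments.join("/")}.ts`;
}

// 6 components * 99 chars + 5 separators + 3 chars for ".ts" = 602
const longPath = buildLongPath(6, 90);
console.log(`long_path_length=${longPath.length}`);
console.log(`exceeds_limit=${longPath.length > VARCHAR_LIMIT}`);
```

The resulting string can be committed as a file path in a scratch repository and then used as the value for the pr_files.filename insert.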
Actual Result
Local Git accepted the file path:
git_path_length 602
A segment0-aaaaaaaa.../segment5-aaaaaaaa....ts
Postgres rejected the mirror row:
long_path_length=602
exit_code=1
ERROR: value too long for type character varying(500)
Expected Result
DAS should store any valid GitHub PR file path that GitHub can return from /pulls/:number/files.
A long-but-valid path should not fail PR file ingestion or block scoring data capture.
Evidence
Schema caps the filename fields:
-- packages/db/08_pr_files.sql
filename VARCHAR(500) NOT NULL
-- packages/db/09_pr_file_contents.sql
filename VARCHAR(500) NOT NULL
Runtime path:
// packages/das/src/webhook/github-fetcher.service.ts
filename: file.filename
Validation performed:
- DAS: npm run format:check, npm run lint, and npm run build passed.
- gittensor: 237 passed in 4.39s.
- Runtime: DAS started in NODE_ENV=production, and /api/v1/health returned status ok.
Root Cause
The DB schema uses fixed VARCHAR(500) for Git file paths:
filename VARCHAR(500) NOT NULL
previous_filename VARCHAR(500)
But the ingestion path stores GitHub file paths directly, with no length check or truncation. Valid Git paths can exceed 500 characters, so the column limit is too small.
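To confirm which files in a PR would hit the limit, a small diagnostic can scan the file list before it reaches the database. This is a sketch under assumptions: the PrFile shape includes only the two path fields named in this report, and findOverlongPaths is a hypothetical helper, not part of GitHubFetcherService. It is meant for triage, not as a replacement for widening the columns.

```typescript
// Shape assumed from the fields named in this report; not the actual DAS type.
interface PrFile {
  filename: string;
  previous_filename?: string;
}

const FILENAME_LIMIT = 500; // current VARCHAR(500) cap

// Postgres VARCHAR(n) counts characters, so count code points
// rather than UTF-16 code units.
function charLength(value: string): number {
  return Array.from(value).length;
}

// Returns every path (current or previous) that would not fit in the column.
function findOverlongPaths(files: PrFile[], limit: number = FILENAME_LIMIT): string[] {
  const overlong: string[] = [];
  for (const file of files) {
    for (const candidate of [file.filename, file.previous_filename]) {
      if (candidate !== undefined && charLength(candidate) > limit) {
        overlong.push(candidate);
      }
    }
  }
  return overlong;
}

// Example: one short path, one path of 501 characters.
const sampleFiles: PrFile[] = [
  { filename: "src/index.ts" },
  { filename: "a".repeat(501), previous_filename: "old/name.ts" },
];
console.log(findOverlongPaths(sampleFiles).length);
```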
Security/Business Impact
A miner or contributor can create a PR containing a valid long path and cause DAS PR file ingestion to fail for that PR.
Impact:
- PR file metadata/content is not stored.
- scoring_data_stored cannot be marked complete.
- Token/tree-diff scoring can be skipped or degraded.
- The failure occurs in production ingestion, not only in an API edge case.
Suggested Fix
Use TEXT for file paths in both file tables and matching TypeORM entities:
ALTER TABLE pr_files
ALTER COLUMN filename TYPE TEXT,
ALTER COLUMN previous_filename TYPE TEXT;
ALTER TABLE pr_file_contents
ALTER COLUMN filename TYPE TEXT;
Also update schema files and add a regression test that inserts a PR file path longer than 500 characters and verifies ingestion succeeds.
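The expectation behind the regression test can be sketched without a database. The code below is an in-memory stand-in, not real Postgres and not the DAS test suite: storeFilename and insertSucceeds are hypothetical helpers in which a numeric limit models VARCHAR(n) and null models TEXT. It ignores Postgres details such as the handling of trailing spaces. A real test would run the same path through fetchAndStorePrFiles() against a migrated schema.

```typescript
// In-memory stand-in for a column, NOT real Postgres:
// a numeric limit models VARCHAR(n); null models TEXT (no length cap).
function storeFilename(value: string, limit: number | null): string {
  const length = Array.from(value).length; // count characters, as VARCHAR(n) does
  if (limit !== null && length > limit) {
    throw new Error(`value too long for type character varying(${limit})`);
  }
  return value;
}

// Returns true when the simulated insert is accepted.
function insertSucceeds(value: string, limit: number | null): boolean {
  try {
    storeFilename(value, limit);
    return true;
  } catch {
    return false;
  }
}

// Six 99-character components + 5 separators + ".ts" = 602 characters.
const regressionPath =
  ["a", "b", "c", "d", "e", "f"].map((c) => c.repeat(99)).join("/") + ".ts";

console.log(`before_fix=${insertSucceeds(regressionPath, 500)}`);
console.log(`after_fix=${insertSucceeds(regressionPath, null)}`);
```

The property worth asserting in the real test is the second one: a path longer than 500 characters is stored unchanged once the columns are TEXT.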