Critical: PR File Ingestion Fails for Valid Git Paths Longer Than 500 Characters #134

@Helios531

Summary

pr_files.filename and pr_file_contents.filename are defined as VARCHAR(500). Git itself imposes no 500-character limit: a path built from several components, each within the usual 255-byte per-component filesystem limit, can easily exceed 500 characters in total.
When DAS ingests PR files, GitHubFetcherService.fetchAndStorePrFiles() upserts GitHub’s file.filename directly into pr_files.filename. If a PR contains a valid path longer than 500 characters, Postgres rejects the insert with:

ERROR: value too long for type character varying(500)

This fails the PR_FILES job, so scoring_data_stored is never set to true and the PR is left without file or content scoring data.

Reproduction Steps

  1. Confirm local Git accepts a 602-character repo-relative file path.
  2. Insert a smoke repo and PR into local Postgres.
  3. Attempt to insert the same 602-character path into pr_files.filename.
  4. Observe Postgres reject the row because the mirror schema caps filenames at 500 characters.
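
A minimal sketch of how a path of the right length could be generated for steps 1 and 3. The helper names (`fill`, `buildLongPath`) and the even split of filler characters are assumptions for illustration; the reporter's exact path is elided in this issue and is not reproduced here.

```typescript
// Hypothetical helper: repeats `ch` `count` times.
function fill(ch: string, count: number): string {
  let out = "";
  for (let i = 0; i < count; i++) out += ch;
  return out;
}

// Hypothetical helper: builds a repo-relative path of exactly `total` characters
// from `segments` components named "segment<i>-aaaa...", with `ext` on the last one.
function buildLongPath(total: number, segments: number, ext: string): string {
  const prefixes: string[] = [];
  let fixed = segments - 1 + ext.length; // "/" separators plus the extension
  for (let i = 0; i < segments; i++) {
    const prefix = "segment" + i + "-";
    prefixes.push(prefix);
    fixed += prefix.length;
  }
  const filler = total - fixed;
  if (filler < 0) throw new Error("total is too small for the requested layout");

  const base = Math.floor(filler / segments);
  const extra = filler % segments; // the first `extra` components get one more character
  const parts: string[] = [];
  for (let i = 0; i < segments; i++) {
    let name = prefixes[i] + fill("a", base + (i < extra ? 1 : 0));
    if (i === segments - 1) name += ext;
    // Most filesystems cap a single component at 255 bytes (ASCII here, so bytes == chars).
    if (name.length > 255) throw new Error("path component exceeds 255 bytes");
    parts.push(name);
  }
  return parts.join("/");
}

const longPath = buildLongPath(602, 6, ".ts");
console.log("long_path_length=" + longPath.length); // prints "long_path_length=602"
```

The resulting string can be used both as the file to `git add` locally and as the value for the attempted insert into pr_files.filename.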

Actual Result

Local Git accepted the file path:

git_path_length 602
A  segment0-aaaaaaaa.../segment5-aaaaaaaa....ts

Postgres rejected the mirror row:

long_path_length=602
exit_code=1
ERROR:  value too long for type character varying(500)

Expected Result

DAS should store any valid file path that GitHub can return from /pulls/:number/files.
A long-but-valid path should not fail PR file ingestion or block scoring data capture.

Evidence

Schema caps the filename fields:

-- packages/db/08_pr_files.sql
filename VARCHAR(500) NOT NULL

-- packages/db/09_pr_file_contents.sql
filename VARCHAR(500) NOT NULL

Runtime path:

// packages/das/src/webhook/github-fetcher.service.ts
filename: file.filename

Validation performed:

DAS:
npm run format:check
npm run lint
npm run build
passed

gittensor:
237 passed in 4.39s

Runtime:
DAS started in NODE_ENV=production
/api/v1/health returned status ok

Root Cause

The DB schema caps Git file paths at VARCHAR(500):

filename VARCHAR(500) NOT NULL
previous_filename VARCHAR(500)

The ingestion path, however, writes GitHub file paths to the database as-is, without checking their length. Valid Git paths can exceed 500 characters, so the column limit is too small.
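
The mismatch can be illustrated with a small length check, which could also serve as a diagnostic to run over a PR's file list before a schema change lands. This is a sketch under stated assumptions, not code from DAS: `PrFileLike`, `countCodePoints`, and `findOversizedPaths` are hypothetical names, and the 500 limit is taken from the schema quoted above. Postgres counts VARCHAR(n) length in characters, while JavaScript's `.length` counts UTF-16 code units, so the sketch counts code points.

```typescript
// Limit taken from the VARCHAR(500) columns quoted in this issue.
const FILENAME_COLUMN_LIMIT = 500;

// Hypothetical shape: only the two path fields relevant to pr_files.
interface PrFileLike {
  filename: string;
  previous_filename?: string;
}

// Counts Unicode code points, treating a valid surrogate pair as one character.
function countCodePoints(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) i++; // skip the low surrogate
    }
    n++;
  }
  return n;
}

// Returns every path (current or previous) that would not fit in the column.
function findOversizedPaths(files: PrFileLike[], limit: number = FILENAME_COLUMN_LIMIT): string[] {
  const oversized: string[] = [];
  for (const file of files) {
    if (countCodePoints(file.filename) > limit) oversized.push(file.filename);
    const prev = file.previous_filename;
    if (prev !== undefined && countCodePoints(prev) > limit) oversized.push(prev);
  }
  return oversized;
}
```

A check like this only reports the problem; it does not replace widening the columns, since rejecting or truncating a valid path would still leave the PR without scoring data.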

Security/Business Impact

A miner or contributor can create a PR containing a valid long path and cause DAS PR file ingestion to fail for that PR.
Impact:

  • PR file metadata/content is not stored.
  • scoring_data_stored cannot be marked complete.
  • Token/tree-diff scoring can be skipped or degraded.
  • The failure occurs in production ingestion, not only in an API edge case.

Suggested Fix

Use TEXT for file paths in both file tables and in the matching TypeORM entities:

ALTER TABLE pr_files
  ALTER COLUMN filename TYPE TEXT,
  ALTER COLUMN previous_filename TYPE TEXT;

ALTER TABLE pr_file_contents
  ALTER COLUMN filename TYPE TEXT;

Also update the schema files (packages/db/08_pr_files.sql and packages/db/09_pr_file_contents.sql) and add a regression test that inserts a PR file path longer than 500 characters and verifies that ingestion succeeds.
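
One way to keep the column list in a single place is to generate the ALTER statements from a small table-to-columns map. This is a sketch only: `ColumnTarget`, `PATH_COLUMNS`, and `widenToTextStatements` are hypothetical names, and how the statements are executed (for example through `queryRunner.query` in a TypeORM migration's `up` method) is an assumption about how DAS manages its schema.

```typescript
// Hypothetical description of which columns to widen in which table.
interface ColumnTarget {
  table: string;
  columns: string[];
}

// Tables and columns named in this issue.
const PATH_COLUMNS: ColumnTarget[] = [
  { table: "pr_files", columns: ["filename", "previous_filename"] },
  { table: "pr_file_contents", columns: ["filename"] },
];

// Builds one ALTER TABLE statement per table.
// Identifiers are interpolated unquoted, so this is only safe for
// hard-coded, trusted names like the ones above.
function widenToTextStatements(targets: ColumnTarget[]): string[] {
  return targets.map((t) => {
    const actions = t.columns.map((c) => `ALTER COLUMN ${c} TYPE TEXT`);
    return `ALTER TABLE ${t.table} ${actions.join(", ")};`;
  });
}

for (const sql of widenToTextStatements(PATH_COLUMNS)) {
  console.log(sql);
}
```

Reverting to VARCHAR(500) would fail once any stored path exceeds 500 characters, so a down migration, if one is written, should account for that.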
