perf(validator): free per-PR file content after scoring by anderdc · Pull Request #1456 · entrius/gittensor

anderdc · 2026-06-05T20:05:49Z

Summary

score_pr attaches each PR's full source text to its ScoredPR via scored.files — MirrorFile.head_content / base_content, up to ~1 MB per file. That content is only needed transiently to compute the tree-diff scalar scores (token/structural/leaf counts + base score), but it currently stays attached to every ScoredPR and is held across all miners in miner_evaluations for the entire scoring round.

This frees the heavy text as soon as the scalar scores are extracted, keeping only the lightweight file metadata (filename / additions / deletions). The persistent cache already does exactly this — _scored_mirror_pr_for_cache sets files = None (classes.py); this just applies the same treatment to the live round dict.
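As a rough illustration of the idea, the sketch below drops the content fields from each file while leaving the metadata and the scalar score untouched. The `MirrorFile` and `ScoredPR` dataclasses here are simplified stand-ins that carry only the fields named in this description, and `free_file_content` is a hypothetical helper name, not necessarily what the PR adds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirrorFile:
    # Simplified stand-in: only the fields mentioned in the PR description.
    filename: str
    additions: int
    deletions: int
    head_content: Optional[str] = None
    base_content: Optional[str] = None


@dataclass
class ScoredPR:
    # Simplified stand-in: one scalar score plus the attached files.
    base_score: float = 0.0
    files: Optional[List[MirrorFile]] = None


def free_file_content(scored: ScoredPR) -> None:
    """Hypothetical helper: drop the heavy text, keep the file metadata."""
    for f in scored.files or []:
        f.head_content = None
        f.base_content = None


# Usage: after the scalar scores are extracted, release the content.
scored = ScoredPR(
    base_score=12.5,
    files=[MirrorFile("validator/scoring.py", 40, 12, "x" * 1000, "y" * 900)],
)
free_file_content(scored)
```

Once the references are cleared, the strings can be reclaimed as long as nothing else (for example a local `file_contents` dict) still holds them.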

Effect: peak scoring-round memory drops from roughly miners × PRs × files × content down to metadata + a single PR's content at a time, with no change to scoring output.
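To make the scaling concrete, here is a back-of-the-envelope estimate. All numbers are illustrative assumptions (they are not measurements from the validator), and `peak_content_bytes` is a hypothetical helper written only for this example.

```python
def peak_content_bytes(miners, prs_per_miner, files_per_pr,
                       content_bytes, metadata_bytes):
    """Estimate retained bytes before and after freeing file content.

    before: every file of every PR of every miner keeps content + metadata.
    after:  every file keeps metadata only, plus the content of the single
            PR that is being scored at that moment.
    """
    total_files = miners * prs_per_miner * files_per_pr
    before = total_files * (content_bytes + metadata_bytes)
    after = total_files * metadata_bytes + files_per_pr * content_bytes
    return before, after


# Illustrative inputs only: 100 miners, 10 PRs each, 5 files per PR,
# 200 kB of head + base text per file, 200 bytes of metadata per file.
before, after = peak_content_bytes(
    miners=100,
    prs_per_miner=10,
    files_per_pr=5,
    content_bytes=200_000,
    metadata_bytes=200,
)
print(f"before: {before / 1e6:.0f} MB, after: {after / 1e6:.0f} MB")
# prints "before: 1001 MB, after: 2 MB"
```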

Why it's safe

The file content is consumed entirely within score_pr (the file_contents dict built by mirror_files_to_legacy and passed to calculate_base_score_for_pr_files). Nothing downstream reads head_content / base_content after that point:

_calculate_pr_multipliers reads only PR metadata + scalar scores
finalize_miner_scores operates on scalar scores
bulk_store_evaluation → get_all_file_changes reads only file metadata (discards content)
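The safety argument can be sketched as a small check: a consumer that reads only file metadata produces the same result whether or not the content is still attached. `collect_file_changes` below is a hypothetical stand-in for a metadata-only reader such as get_all_file_changes (its real signature is not shown in this PR), and `MirrorFile` is a simplified stand-in with only the fields named above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MirrorFile:
    # Simplified stand-in: only the fields mentioned in the PR description.
    filename: str
    additions: int
    deletions: int
    head_content: Optional[str] = None
    base_content: Optional[str] = None


def collect_file_changes(files: List[MirrorFile]) -> List[Dict[str, object]]:
    """Hypothetical metadata-only reader: never touches the content fields."""
    return [
        {"filename": f.filename, "additions": f.additions, "deletions": f.deletions}
        for f in files
    ]


files = [MirrorFile("validator/scoring.py", 40, 12, "new text", "old text")]

rows_before = collect_file_changes(files)

# Free the content, as the live round now does after scoring.
for f in files:
    f.head_content = None
    f.base_content = None

rows_after = collect_file_changes(files)

# The downstream view is unchanged by freeing the content.
assert rows_before == rows_after
```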

Type of Change

Performance / optimization

Testing

Existing mirror scoring tests pass (72 passed)

Drop MirrorFile head_content/base_content from each ScoredPR once the scalar scores are extracted. The full source text is only needed transiently for tree-diff scoring; retaining it on every ScoredPR across all miners in miner_evaluations for the whole round needlessly inflates peak memory. The persistent cache already does this in _scored_mirror_pr_for_cache; apply the same to the live round. File metadata (filename/additions/deletions) is preserved.

anderdc and others added 2 commits June 5, 2026 15:05

Merge branch 'test' into perf/free-pr-content-after-scoring

137d6e4

LandynDev approved these changes Jun 5, 2026


LandynDev merged commit a1fc8e1 into test Jun 5, 2026
3 checks passed

LandynDev merged 2 commits into test from perf/free-pr-content-after-scoring
