[WIP] RFC-0054: HUD Integration for Out-of-Tree CI Results #96
jewelkm89 wants to merge 2 commits into pytorch:master
Defines the HUD-side ingestion and display layer for OOT CI results, building on RFC-0050 (Cross-Repository CI Relay). Covers the write path, storage schemas, DB protection, security, and three frontend views. Reference implementation: subinz1/test-infra#1.
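The write-path checks assigned to the HUD API (auth check, payload validation, 2MB payload cap) can be sketched as a pure function that a request handler would call before its `PutItem`. This is a minimal sketch under stated assumptions, not the reference implementation: the `X-Hud-Internal-Bot` header, the `{trusted, untrusted}` envelope, the 2MB cap, `artifact_url` and the `in_progress`/`completed` states come from this RFC's data-flow diagram, while the header value being a shared secret, the other field names (`repo`, `job_name`, `conclusion`), the https-only URL rule and the HTTP status codes are hypothetical.

```typescript
// Hypothetical sketch of the /api/oot/results write-path checks.
// From the RFC: header name, 2MB cap, in_progress/completed, {trusted, untrusted}.
// Assumed: shared-secret header value, field names, status codes, https-only URLs.

const MAX_PAYLOAD_BYTES = 2 * 1024 * 1024; // "Payload caps (2MB)"
const VALID_STATUSES = ["in_progress", "completed"];

type JobStatus = "in_progress" | "completed";

interface OotJobRecord {
  repo: string;         // from `trusted`: set by the relay after OIDC verification
  jobName: string;      // from `untrusted`: reported by downstream CI
  status: JobStatus;
  conclusion?: string;  // required once the job is completed (assumed rule)
  artifactUrl?: string; // rendered by the frontend as an external link
}

type IngestResult =
  | { ok: true; record: OotJobRecord; replicateToClickHouse: boolean }
  | { ok: false; httpStatus: number; error: string };

function reject(httpStatus: number, error: string): IngestResult {
  return { ok: false, httpStatus, error };
}

// UTF-8 byte length computed by hand so the sketch needs no runtime typings.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
             s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      bytes += 4; // a surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3; // rest of the BMP (lone surrogates become U+FFFD, 3 bytes)
  }
  return bytes;
}

function validateOotResult(
  headers: Record<string, string | undefined>, // assumed lower-cased, as Node does
  rawBody: string,
  botToken: string,
): IngestResult {
  // Auth check: only the relay should hold the internal bot credential.
  // A production version would use a constant-time comparison.
  if (!botToken || headers["x-hud-internal-bot"] !== botToken) {
    return reject(401, "unauthorized");
  }
  // Payload cap: reject oversized bodies before parsing them.
  if (utf8ByteLength(rawBody) > MAX_PAYLOAD_BYTES) {
    return reject(413, "payload too large");
  }
  let body: any;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return reject(400, "invalid JSON");
  }
  const trusted = body && body.trusted;
  const untrusted = body && body.untrusted;
  if (!trusted || typeof trusted.repo !== "string") {
    return reject(400, "missing trusted.repo");
  }
  if (!untrusted || typeof untrusted.job_name !== "string") {
    return reject(400, "missing untrusted.job_name");
  }
  if (VALID_STATUSES.indexOf(untrusted.status) === -1) {
    return reject(400, "status must be in_progress or completed");
  }
  const status = untrusted.status as JobStatus;
  if (status === "completed" && typeof untrusted.conclusion !== "string") {
    return reject(400, "completed results need a conclusion");
  }
  const artifactUrl = untrusted.artifact_url;
  if (artifactUrl !== undefined &&
      (typeof artifactUrl !== "string" || !/^https:\/\/\S+$/.test(artifactUrl))) {
    return reject(400, "artifact_url must be an https URL");
  }
  return {
    ok: true,
    record: {
      repo: trusted.repo,
      jobName: untrusted.job_name,
      status,
      conclusion: untrusted.conclusion,
      artifactUrl,
    },
    // Both states go to DynamoDB; only completed ones should reach ClickHouse.
    replicateToClickHouse: status === "completed",
  };
}
```

Returning a `replicateToClickHouse` flag instead of performing any writes keeps the sketch testable. Per the data-flow diagram, both states are written to DynamoDB and the completed-only split is enforced downstream by the replicator, so the flag only documents which records will end up in ClickHouse.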
Summary
This RFC defines the HUD-side ingestion and display layer for Out-of-Tree (OOT) CI results, building on RFC-0050 (Cross-Repository CI Relay for PyTorch Out-of-Tree Backends).
Data Flow
flowchart LR
    subgraph Downstream["Downstream CI (OOT Backend)"]
        DS["Run tests\n+ upload artifacts"]
    end
    subgraph ART["Artifact Storage (org-managed)"]
        STORE[("Logs, test reports,\nJUnit XML")]
    end
    subgraph Relay["Relay Server"]
        RH["Result Handler\n• OIDC verify\n• Allowlist check\n• Rate limit"]
    end
    subgraph HUD["HUD"]
        API["/api/oot/results\n• Auth check\n• Payload validation\n• Payload caps (2MB)"]
    end
    subgraph Storage["Storage"]
        DDB[("DynamoDB\ntorchci-oot-workflow-job\n(in_progress + completed)")]
        STR["DynamoDB Stream"]
        REP["clickhouse-replicator-dynamo"]
        CH[("ClickHouse\ndefault.oot_workflow_job\n(completed only)")]
    end
    subgraph Frontend["HUD Frontend"]
        P1["/oot: Global Summary"]
        P2["/oot/org/repo: Per-Backend"]
        P3["/pr/N: OOT Section"]
    end
    DS -->|"Upload artifacts"| STORE
    DS -->|"① POST in_progress\n② POST completed\n+ artifact_url\n(OIDC token)"| RH
    RH -->|"X-Hud-Internal-Bot\n{trusted, untrusted}"| API
    API -->|"PutItem"| DDB
    DDB --> STR --> REP -->|"completed only"| CH
    CH -->|"Query results +\nartifact_url"| P1 & P2 & P3
    P2 & P3 -.->|"User clicks\nexternal link"| STORE
Key points:
- Artifact URLs are carried in the `completed` callback payload and flow through the Result Handler → HUD API → DynamoDB → ClickHouse.
- HUD frontend pages read `artifact_url` from ClickHouse and render it as an external link; there is no direct connection between HUD and downstream storage.
- Only `completed` records are replicated to ClickHouse; `in_progress` records stay in DynamoDB for mutable state tracking.
What this RFC covers
- `in_progress` callbacks → DynamoDB only (mutable state tracking)
- `completed` callbacks → DynamoDB → replicated to ClickHouse (dashboard queries)
- `/oot`: global OOT CI summary (cross-repo health overview, repos sorted by pass rate)
- `/oot/[org]/[repo]`: per-backend dashboard (matrix view: PRs × jobs, failure drill-down, external artifact links)
- `/pr/[number]`: collapsible "Out-of-Tree Backends" section in existing PR pages
Reference implementation
A reference implementation is available at subinz1/test-infra#1, which includes the API endpoint, ClickHouse schema, replicator mapping, saved ClickHouse queries, and all three frontend pages.
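Two pieces of logic described in this RFC, the replicator's completed-only filter and the per-repo pass rate shown on `/oot`, can be sketched in TypeScript for illustration. This is a hypothetical sketch, not the reference implementation, where that logic lives in the replicator mapping and a saved ClickHouse query that this code does not reproduce. The stream record type is a trimmed-down version of the DynamoDB Streams record shape (`eventName`, `dynamodb.NewImage` with typed attribute values); the attribute names, the pass-rate definition (successful jobs divided by all completed jobs) and the lowest-first sort order are assumptions.

```typescript
// Hypothetical sketch, not the reference implementation. Attribute names,
// the pass-rate definition and the sort order are assumptions.

// Minimal subset of a DynamoDB Streams record (string attributes only).
type ItemImage = { [attr: string]: { S?: string } };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { NewImage?: ItemImage };
}

interface CompletedJobRow {
  repo: string;
  jobName: string;
  conclusion: string;
  artifactUrl?: string;
}

interface RepoSummary {
  repo: string;
  total: number;   // completed jobs seen for this repo
  passed: number;  // jobs whose conclusion is "success"
  passRate: number;
}

function attrS(img: ItemImage, name: string): string | undefined {
  const v = img[name];
  return v ? v.S : undefined;
}

// Replicator filter: forward only items whose new image is a completed job.
// MODIFY is kept because in_progress -> completed arrives as an update
// (assuming both callbacks for a job share one DynamoDB key).
function rowsToReplicate(records: StreamRecord[]): CompletedJobRow[] {
  const rows: CompletedJobRow[] = [];
  for (const rec of records) {
    const img = rec.dynamodb.NewImage;
    if (rec.eventName === "REMOVE" || !img) continue;
    if (attrS(img, "status") !== "completed") continue; // in_progress stays in DynamoDB
    rows.push({
      repo: attrS(img, "repo") || "",
      jobName: attrS(img, "job_name") || "",
      conclusion: attrS(img, "conclusion") || "",
      artifactUrl: attrS(img, "artifact_url"),
    });
  }
  return rows;
}

// /oot summary: pass rate per repo, lowest first so unhealthy backends surface.
function summarizeByRepo(rows: CompletedJobRow[]): RepoSummary[] {
  const byRepo: { [repo: string]: RepoSummary | undefined } = {};
  const out: RepoSummary[] = [];
  for (const row of rows) {
    let s = byRepo[row.repo];
    if (!s) {
      s = { repo: row.repo, total: 0, passed: 0, passRate: 0 };
      byRepo[row.repo] = s;
      out.push(s);
    }
    s.total += 1;
    if (row.conclusion === "success") s.passed += 1;
    s.passRate = s.passed / s.total;
  }
  out.sort((a, b) => a.passRate - b.passRate || a.repo.localeCompare(b.repo));
  return out;
}
```

Aggregating only `completed` rows keeps the sketch consistent with ClickHouse, which never sees `in_progress` records under this design.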
Status
This is a WIP draft. Feedback welcome.