[WIP] RFC-0054: HUD Integration for Out-of-Tree CI Results #96

Draft
jewelkm89 wants to merge 2 commits into pytorch:master from jewelkm89:oot-hud-integration-rfc

@jewelkm89

Summary

This RFC defines the HUD-side ingestion and display layer for Out-of-Tree (OOT) CI results, building on RFC-0050 (Cross-Repository CI Relay for PyTorch Out-of-Tree Backends).

Data Flow

flowchart LR
    subgraph Downstream["Downstream CI (OOT Backend)"]
        DS["Run tests\n+ upload artifacts"]
    end

    subgraph ART["Artifact Storage (org-managed)"]
        STORE[("Logs, test reports,\nJUnit XML")]
    end

    subgraph Relay["Relay Server"]
        RH["Result Handler\n• OIDC verify\n• Allowlist check\n• Rate limit"]
    end

    subgraph HUD["HUD"]
        API["/api/oot/results\n• Auth check\n• Payload validation\n• Payload caps (2MB)"]
    end

    subgraph Storage["Storage"]
        DDB[("DynamoDB\ntorchci-oot-workflow-job\n(in_progress + completed)")]
        STR["DynamoDB Stream"]
        REP["clickhouse-replicator-dynamo"]
        CH[("ClickHouse\ndefault.oot_workflow_job\n(completed only)")]
    end

    subgraph Frontend["HUD Frontend"]
        P1["/oot — Global Summary"]
        P2["/oot/org/repo — Per-Backend"]
        P3["/pr/N — OOT Section"]
    end

    DS -->|"Upload artifacts"| STORE
    DS -->|"① POST in_progress\n② POST completed\n+ artifact_url\n(OIDC token)"| RH
    RH -->|"X-Hud-Internal-Bot\n{trusted, untrusted}"| API
    API -->|"PutItem"| DDB
    DDB --> STR --> REP -->|"completed only"| CH
    CH -->|"Query results +\nartifact_url"| P1 & P2 & P3
    P2 & P3 -.->|"User clicks\nexternal link"| STORE

Key points:

  • Artifact URLs are included in the completed callback payload and flow through the Result Handler → HUD API → DynamoDB → ClickHouse
  • HUD pages read artifact_url from ClickHouse and render it as an external link — no direct connection between HUD and downstream storage
  • Only completed records are replicated to ClickHouse; in_progress stays in DynamoDB for mutable state tracking
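The completed-only replication rule in the key points above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual replicator: the two Python containers stand in for DynamoDB and ClickHouse, and the `job_id` and `status` field names and the example URL are assumptions made for the sketch. Only `artifact_url` and the `in_progress` / `completed` values come from the RFC text.

```python
from typing import Any

# Hypothetical in-memory stand-ins for the two stores. In the RFC these are
# DynamoDB (torchci-oot-workflow-job) and ClickHouse (default.oot_workflow_job).
dynamo_table: dict[str, dict[str, Any]] = {}
clickhouse_rows: list[dict[str, Any]] = []


def replicate(record: dict[str, Any]) -> None:
    """Mimic the stream replicator: forward only completed records."""
    if record.get("status") == "completed":
        clickhouse_rows.append(record)


def put_item(record: dict[str, Any]) -> None:
    """Store every callback in the DynamoDB stand-in, then trigger replication.

    The "job_id" key is an assumed field name used only for this sketch.
    """
    dynamo_table[record["job_id"]] = record
    replicate(record)


# An in_progress callback followed by a completed one for the same job.
put_item({"job_id": "job-1", "status": "in_progress"})
put_item(
    {
        "job_id": "job-1",
        "status": "completed",
        "artifact_url": "https://example.com/logs/job-1",  # placeholder URL
    }
)
```

After both callbacks, the DynamoDB stand-in holds the latest state for the job, while the ClickHouse stand-in holds a single completed row carrying the `artifact_url` that a HUD page would render as an external link.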

What this RFC covers

  • Write path: Downstream CI → Result Handler → HUD API → DynamoDB → ClickHouse (completed records only)
    • in_progress callbacks → DynamoDB only (mutable state tracking)
    • completed callbacks → DynamoDB → replicated to ClickHouse (dashboard queries)
    • Artifact URLs flow through the callback payload, not sent directly to HUD
  • Read path: Three new HUD views:
    • /oot — Global OOT CI summary (cross-repo health overview, repos sorted by pass rate)
    • /oot/[org]/[repo] — Per-backend dashboard (matrix view: PRs × jobs, failure drill-down, external artifact links)
    • /pr/[number] — Collapsible "Out-of-Tree Backends" section in existing PR pages
  • Storage schemas: DynamoDB table and ClickHouse table designs
  • DB protection: Rate limiting (per-repo at relay), payload caps (2MB at HUD API)
  • Security: OIDC authentication, trusted/untrusted payload split, error handling strategy (delivered/hud_rejected/hud_unavailable/skipped), signed callback token proposal, state machine for status transitions
  • Sample payloads: In-progress, success, and failure callback examples with full field definitions
  • Implementation plan: 6-phase rollout with task-level breakdown:
    1. Storage Layer — DynamoDB + ClickHouse + replicator mapping
    2. HUD API Endpoint — types, validation, write logic
    3. Relay Integration — handler → HUD forwarding, rate limiting, reusable GHA action
    4. HUD Frontend Pages — 3 views + saved ClickHouse queries
    5. End-to-End Validation — real downstream repo testing
    6. Security Hardening — callback token, state machine (future)
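The two DB protection measures listed above (a per-repo rate limit at the relay and a 2MB payload cap at the HUD API) might look something like the sketch below. The RFC summary gives neither the rate-limit algorithm nor its thresholds, so the fixed-window counter, the `PerRepoRateLimiter` and `within_payload_cap` names, and the reading of 2MB as 2 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
import time
from typing import Optional

# 2MB cap from the RFC; interpreting it as binary megabytes is an assumption.
MAX_PAYLOAD_BYTES = 2 * 1024 * 1024


def within_payload_cap(raw_body: bytes) -> bool:
    """Return True if the raw request body fits under the payload cap."""
    return len(raw_body) <= MAX_PAYLOAD_BYTES


class PerRepoRateLimiter:
    """Fixed-window request counter keyed by repository.

    The limit and window length are hypothetical; the RFC summary only says
    that rate limiting is applied per repo at the relay.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # repo -> (window start time, requests counted in that window)
        self._windows: dict[str, tuple[float, int]] = {}

    def allow(self, repo: str, now: Optional[float] = None) -> bool:
        """Record one request for `repo` and report whether it is allowed."""
        now = time.monotonic() if now is None else now
        start, count = self._windows.get(repo, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # previous window expired; start a new one
        if count >= self.max_requests:
            self._windows[repo] = (start, count)
            return False
        self._windows[repo] = (start, count + 1)
        return True
```

A relay handler could call `limiter.allow("org/repo")` before forwarding a callback, and the HUD endpoint could reject any body for which `within_payload_cap` returns False. Each repository is counted independently, so one noisy backend does not consume another's budget.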

Reference implementation

A reference implementation is available at subinz1/test-infra#1, which includes the API endpoint, ClickHouse schema, replicator mapping, saved ClickHouse queries, and all three frontend pages.

Status

This is a WIP draft. Feedback welcome.

Defines the HUD-side ingestion and display layer for OOT CI results,
building on RFC-0050 (Cross-Repository CI Relay). Covers write path,
storage schemas, DB protection, security, and three frontend views.
Reference implementation: subinz1/test-infra#1
@meta-cla

meta-cla Bot commented Apr 29, 2026

Hi @jewelkm89!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla meta-cla Bot added the cla signed label Apr 29, 2026