Skip to content

Transfer triage spike: CORA-owned TransferPort + Globus adapter (not for merge)#362

Draft
xmap wants to merge 5 commits into
mainfrom
worktree-transferport-spike
Draft

Transfer triage spike: CORA-owned TransferPort + Globus adapter (not for merge)#362
xmap wants to merge 5 commits into
mainfrom
worktree-transferport-spike

Conversation

@xmap

@xmap xmap commented Jun 25, 2026

Copy link
Copy Markdown
Owner

What this is

A design-triage spike, not a merge candidate. It answers the open question from the transfer-orchestration design: is a transfer a step the conductor walks, or a long-running edge job with its own lifecycle? It builds the CORA-owned TransferPort, a test double, a 2-BM/DMagic scenario, and a real GlobusTransferPort adapter, then lets the shape settle the question.

The build trigger has not fired and there is no production consumer yet, so this is intentionally parked on a branch.

Commits

  1. feat(operation): CORA-owned TransferPort + in-memory double (triage spike)
  2. feat(operation): GlobusTransferPort adapter over globus-sdk (triage spike)

CORA-owned, not Globus-shaped

The port is shaped from CORA's own needs and the substrate is tested against it, not copied from it:

  • Verbs begin / observe / cancel (a non-blocking observe-loop), not the corpus submit/poll/cancel.
  • TransferState = Pending / Active / Suspended / Succeeded / Failed / Cancelled; a partial move is carried as files_failed > 0 on a terminal Failed (no PartiallyFailed enum, deferred until a substrate has a native one).
  • idempotency_key is on the request because CORA's replay-determinism needs it; actuation_kind is deliberately off the surface (a route-layer concern, as on the control path).
  • GlobusTransferPort is a pure ACL: injected TransferClient, status + error mapping, asyncio.to_thread around the sync client. A TYPE_CHECKING conformance line fails the type-check if the real client ever drifts from the seam.

Triage conclusion

A transfer is a long-running edge job (begin-then-observe-loop, waits through a non-terminal Suspended on credential expiry, folds a partial terminal), driven by the existing EdgeConductor like the Run FSM. It is not a synchronous conductor step like ComputeStep. The step union is untouched.

Verification

ruff, ruff-format, pyright (strict), tach, the full architecture suite, and the operation unit module are all green; naming-r3 OK on both the port and the adapter. Each commit passed the pre-commit gate.

Caveat: the Globus adapter is unit-tested against a fake TransferClient; it has not been run against a live Globus endpoint.

Before any merge

  • Full gate-review panel.
  • A credentialed live-Globus sanity run.
  • The build trigger must fire (a promote/publish-blocking custody invariant, a substrate with a native partial terminal, or a chain-of-custody the Attestation path cannot carry). The lever is the beamline storage-tier answers, not code.

🤖 Generated with Claude Code

xmap and others added 2 commits June 25, 2026 11:46
…pike)

Finalizes the data-transfer design triage with a CORA-shaped seam rather
than a Globus transliteration. The verbs are begin/observe/cancel (a
non-blocking observe-loop), the state set is Pending/Active/Suspended/
Succeeded/Failed/Cancelled, and a partial move is carried as files_failed>0
on a terminal Failed (no PartiallyFailed enum yet). The in-memory double
plus a 2-BM/DMagic scenario show the deciding fact: a transfer is a
long-running edge job (it waits through a non-terminal Suspended and folds a
partial terminal), not a synchronous conductor step like ComputeStep.

Not for merge: no production consumer yet and the build trigger has not
fired. TransferPort takes the bare-verb port-suffix carve-out alongside
ControlPort/ComputePort.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pike)

The first real TransferPort substrate, validating the CORA-owned shape
against an outside adapter rather than a fake alone. It takes an
already-authorized globus_sdk TransferClient by injection (the OAuth2 dance
stays a composition-root concern), builds a TransferData submission, and maps
Globus task status into TransferState (INACTIVE -> Suspended, a FAILED task
with subtasks_failed>0 -> the partial signal). Globus calls run in
asyncio.to_thread since the client is synchronous. Error mapping: NetworkError
-> EndpointUnreachable; GlobusAPIError dispatched on http_status ->
AccessDenied / EndpointUnreachable / Rejected.

Adds globus-sdk>=3,<4 (resolved 3.65) as a hard dep, following the aioca/p4p
substrate-adapter convention. Unit-tested against a fake TransferClient; NOT
run against a live Globus endpoint (needs credentials + two collections).
Still a spike: no production consumer until the build trigger fires.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  apps/api/src/cora/api
  _distribution_materializer.py 170-173
  apps/api/src/cora/infrastructure/ports
  dataset_distribution_lookup.py
  apps/api/src/cora/operation/adapters
  fdt_transfer_port.py 71-72, 86, 89-91
  globus_transfer_port.py
  in_memory_transfer_port.py 125
  apps/api/src/cora/operation/ports
  __init__.py
  transfer_port.py 249-251, 264-266, 278-279
Project Total  

This report was generated by python-coverage-comment-action

xmap and others added 3 commits June 25, 2026 20:26
The acquisition-to-analysis stage-in substrate for stage-then-reconstruct, the
sibling of GlobusTransferPort (which serves the outer user-delivery leg). It
runs an APS Fast Data Transfer (fdt.jar) client as a subprocess via an injected
TransferRunner and maps the exit code into a TransferState. Integrity and sync
are deferred on purpose: checksum-on-arrival is recorded as an Attestation in
the materialize-a-Distribution edge job that consumes this port, and general
sync belongs to richer substrates (Globus). Progress is coarse (a subprocess
exposes no per-file counters).

Grounded in the real 2-BM pipeline (2bm-docs ops/item_018 + item_025): the
per-scan tomdet:/local1 -> /data2 copy via fdt.jar or scp is exactly the leg
reconstruction on the analysis nodes waits on. Unit-tested against a fake
runner; not run against a real fdt.jar.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onstruct (triage spike)

Sequences the materialize-a-Distribution flow: move bytes over a TransferPort,
then on success register a new analysis-tier Distribution of the same raw
Dataset (register_distribution) and record a checksum Attestation
(record_attestation), whose Match flips the Distribution to Verified in the Data
BC projection. That Verified-at-tier fact is exactly what leg C's start_run gate
reads.

It owns the sequence and the transfer gate (registers only on a Succeeded move,
waits through a non-terminal Suspended); it trusts the caller's
RegisterDistribution (byte-identical-copy fields built from the parent Dataset).
Lives in cora.api, the only module that may reach both operation.ports and the
Data BC handlers; injected collaborators, unit-tested against the in-memory
TransferPort plus fake handlers. Not yet wired into the EdgeConductor or the app
(deferred to the real build, post gate-review of leg C).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 1 (triage spike)

The cross-BC seam the start_run input gate (leg C of stage-then-reconstruct)
will read: given an input Dataset, return its non-Discarded Distributions with
status, so the decider can gate on Verified AND distinguish Stale from absent.
Mirrors SupplyLookup (one Data-BC adapter, multiple consumers; the port returns
rows, the decider partitions on status); lives in cora.infrastructure.ports
because Run may not import the Data-internal DistributionLookup (the Edition
canonical-pick query, a different need).

Ships the Protocol + result DTO + two test stubs; no consumer yet. The Method
input-role declaration, the Run input-Dataset binding, and the start_run
genesis Verified-gate (plus the Data BC Postgres adapter) are the following
sub-slices; the gate sub-slice is where gate-review-before-merge bites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant