feat(platform): support external token providers and simplify caching [PYSDK-123]#513
feat(platform): support external token providers and simplify caching [PYSDK-123]#513
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for supplying an external bearer-token provider to the Platform Client, and refactors operation caching to key per-user via an explicit token_provider callable rather than relying on a global auth helper.
Changes:
- Introduces
_AuthenticatedApi+_OAuth2TokenProviderConfigurationin a new_api.pyto surfacetoken_provideras a top-level attribute (avoiding circular imports). - Updates
cached_operationto accepttoken_provider(replacing the previoususe_tokenflag) and wires it throughClientand resource modules. - Updates and expands unit tests to cover external token providers and the revised caching behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/aignostics/platform/_api.py |
Adds _AuthenticatedApi wrapper + configuration class to lift token_provider out of codegen internals. |
src/aignostics/platform/_client.py |
Adds token_provider parameter to Client, introduces external-provider singleton caching, and passes token_provider into cached_operation. |
src/aignostics/platform/_operation_cache.py |
Replaces use_token with token_provider in the caching decorator API and key generation. |
src/aignostics/platform/resources/runs.py |
Updates resource API type hints and passes token_provider=self._api.token_provider into cached operations. |
src/aignostics/platform/resources/applications.py |
Same as above for applications/versions resources. |
tests/aignostics/platform/client_token_provider_test.py |
Adds unit tests for external token providers and adjusts existing tests to the new API wiring. |
tests/aignostics/platform/client_cache_test.py |
Updates cache tests to use mocked token_provider for token-aware cache keys. |
tests/aignostics/platform/nocache_test.py |
Updates decorator calls to the new cached_operation signature. |
tests/aignostics/platform/conftest.py |
Ensures new external-client singleton cache is cleared between tests; adds token_provider to mocked API clients. |
tests/aignostics/platform/resources/runs_test.py |
Updates resource mocks to include token_provider/api_client attributes. |
tests/aignostics/platform/resources/applications_test.py |
Updates resource mocks to include token_provider/api_client attributes. |
Codecov Report❌ Patch coverage is
|
ab1533d to
b407d30
Compare
7e0a0b9 to
de2c81d
Compare
de2c81d to
7e3fac5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/aignostics/platform/resources/runs.py:137
Run.for_run_id()always constructs its API client viaClient.get_api_client(cache_token=...), so there is currently no way to use it with an externaltoken_provider(the new auth path). Consider adding an optionaltoken_providerparameter (and documentingcache_tokenas ignored when it’s provided) so this convenience constructor remains usable for M2M/service-account flows.
@classmethod
def for_run_id(cls, run_id: str, cache_token: bool = True) -> "Run":
"""Creates an Run instance for an existing run.
Args:
run_id (str): The ID of the application run.
cache_token (bool): Whether to cache the API token.
Returns:
Run: The initialized Run instance.
"""
from aignostics.platform._client import Client # noqa: PLC0415
return cls(Client.get_api_client(cache_token=cache_token), run_id)
|
I believe this impacts https://github.com/aignostics/python-sdk/blob/main/specifications/SPEC_PLATFORM_SERVICE.md, therefore it needs a CR? |
Yes, I concur... -> SW requirement + SIS update is in order here :/ |
Agreed. I will look into this, but only on Monday. |
f2296b9 to
262ac88
Compare
…ows [PYSDK-112] `test_no_retry_on_other_jwk_errors` had two assertions checking the same property (no retry occurred): 1. `assert elapsed_time < 2.0` — wall-clock heuristic, flaky 2. `assert call_count == 1` — direct check on the mock's call count Assertion 2 proves "no retry" rigorously and deterministically. Assertion 1 was a redundant proxy that consistently failed on `windows-latest` GitHub-hosted runners (came in at ~2.4s vs the 2s threshold) while the other four runner combos (ubuntu-latest, ubuntu-24.04-arm, macos-latest, macos-15-intel) all completed in well under 2s. Removing assertion 1 (and the now-unused `start_time`/`elapsed_time` locals) removes the flake without losing test coverage. The companion test `test_jwk_connection_errors_trigger_retry` immediately above already follows this pattern (call-count assertion only, no wall-clock check). Bundled into PYSDK-112 at user direction — surfaced on PR #513 CI run and observed as the only blocker for the otherwise-green test matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows flake resolved ✅Commit
What changed and why this is rock-solid now
Assertion 2 already proves "no retry" rigorously — the timing assertion was a redundant proxy for the same property. The diff dropped assertion 1 (and the now-unused Cumulative state of this PR
PYSDK-112's Impact of Change updated to record the bundled flake fix; comment posted on the ticket separately. 🤖 Posted by Claude Code executing the |
CI run 24957873268 was triggered by the merge commit d9ab794 BEFORE the skip:test:long_running label was removed from PR #513. As a result, the workflow snapshotted the label as still present and skipped the long-running test stage on every matrix combo (each long_running step took 0s and was a no-op). Without long_running tests running, codecov gathered partial coverage and codecov/project failed. This empty commit forces a fresh workflow trigger that picks up the current label set (skip:test:long_running no longer present), so long_running tests will actually run and codecov gets full coverage data.
✅ Full CI green — PR ready for QMS approval gatesRun
What it took to get codecov/project greenThe
Long_running ran for ~11 min on Cumulative state of this PR
What's leftOnly the human approval gates: PM + Tech Lead + QM approvals on PYSDK-112 before the CR can move to Closed and this PR can merge. CR ownership is on @akunft. PYSDK-112 Impact of Change updated to reflect the codecov path and the bundled flake fix. 🤖 Posted by Claude Code executing the |
…ows [PYSDK-112] `test_no_retry_on_other_jwk_errors` had two assertions checking the same property (no retry occurred): 1. `assert elapsed_time < 2.0` — wall-clock heuristic, flaky 2. `assert call_count == 1` — direct check on the mock's call count Assertion 2 proves "no retry" rigorously and deterministically. Assertion 1 was a redundant proxy that consistently failed on `windows-latest` GitHub-hosted runners (came in at ~2.4s vs the 2s threshold) while the other four runner combos (ubuntu-latest, ubuntu-24.04-arm, macos-latest, macos-15-intel) all completed in well under 2s. Removing assertion 1 (and the now-unused `start_time`/`elapsed_time` locals) removes the flake without losing test coverage. The companion test `test_jwk_connection_errors_trigger_retry` immediately above already follows this pattern (call-count assertion only, no wall-clock check). Bundled into PYSDK-112 at user direction — surfaced on PR #513 CI run and observed as the only blocker for the otherwise-green test matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI run 24957873268 was triggered by the merge commit d9ab794 BEFORE the skip:test:long_running label was removed from PR #513. As a result, the workflow snapshotted the label as still present and skipped the long-running test stage on every matrix combo (each long_running step took 0s and was a no-op). Without long_running tests running, codecov gathered partial coverage and codecov/project failed. This empty commit forces a fresh workflow trigger that picks up the current label set (skip:test:long_running no longer present), so long_running tests will actually run and codecov gets full coverage data.
888a147 to
975e526
Compare
Add a `token_provider` parameter to `Client.__init__()` so callers can supply their own bearer-token callable (e.g. for M2M / service-account flows) instead of relying on the built-in OAuth device-code flow. The operation cache needs a token to build per-user cache keys. Three approaches were explored: 1. **CachedApiMixin with Protocol** (explored, discarded) — A `_HasTokenProvider` Protocol plus `CachedApiMixin` base class that provided a `_cached()` convenience method. Saved one kwarg per call site but added multiple-inheritance, a Protocol, and competing `_api` annotations on every resource class. Over-engineering for what amounts to avoiding `token_provider=self._api.token_provider`. 2. **`TokenProvider` type alias** (explored, discarded) — `TokenProvider = Callable[[], str]` exported as public API. Added no value over the raw `Callable[[], str]` since every parameter is already named `token_provider`. Removed to avoid unnecessary imports and indirection. 3. **`_AuthenticatedApi` subclass + explicit `token_provider` at each call site** (chosen) — A thin `_AuthenticatedApi(PublicApi)` subclass in `_api.py` lifts `token_provider` from the deeply-nested codegen `Configuration` to a top-level attribute. Each cached method passes `token_provider=self._api.token_provider` to `@cached_operation(...)`. One kwarg of boilerplate per call site, but no new types, no inheritance, and trivially greppable. `_api.py` exists solely to break the circular import between `_client.py` and the resource modules.
975e526 to
36ac55c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
tests/aignostics/platform/resources/runs_test.py:731
- The unit test coverage for artifact download behavior appears to have been removed from this module (e.g., Artifact.get_download_url redirect/status handling and Run.ensure_artifacts_downloaded download/resume/checksum logic), but the corresponding production code in src/aignostics/platform/resources/runs.py still exists. Please reintroduce these tests (or move them to a new dedicated test module) so changes to /file URL resolution, retry/error mapping, and download-resume behavior remain covered at the unit level (not only via slower e2e/integration tests).
@pytest.mark.unit
def test_run_details_does_not_retry_other_exceptions(app_run, mock_api) -> None:
"""Test that the outer retry does not catch non-NotFoundException errors.
This verifies that exceptions like ForbiddenException pass straight through
the outer retry without being retried.
Args:
app_run: Run instance with mock API.
mock_api: Mock ExternalsApi instance.
"""
from aignx.codegen.exceptions import ForbiddenException
mock_api.get_run_v1_runs_run_id_get.side_effect = ForbiddenException()
with pytest.raises(ForbiddenException):
app_run.details()
assert mock_api.get_run_v1_runs_run_id_get.call_count == 1
|
| if not isinstance(api, _AuthenticatedApi): # runtime guard for untyped callers | ||
| msg = ( # type: ignore[unreachable] | ||
| f"{type(self).__name__} requires _AuthenticatedApi, " | ||
| f"got {type(api).__name__!r}. " | ||
| "Use Client to obtain a correctly configured instance." | ||
| ) |
| class Artifact(_AuthenticatedResource): | ||
| """Represents a single output artifact belonging to a run. | ||
|
|
||
| Provides operations to resolve a fresh presigned download URL via the | ||
| ``GET /api/v1/runs/{run_id}/artifacts/{artifact_id}/file`` endpoint. | ||
| """ | ||
|
|
||
| def __init__(self, api: PublicApi, run_id: str, artifact_id: str) -> None: | ||
| def __init__(self, api: _AuthenticatedApi, run_id: str, artifact_id: str) -> None: | ||
| """Initializes an Artifact instance. | ||
|
|
||
| Args: | ||
| api (PublicApi): The configured API client. | ||
| api (_AuthenticatedApi): The configured API client. | ||
| run_id (str): The ID of the parent run. | ||
| artifact_id (str): The ID of the output artifact. | ||
| """ | ||
| self._api = api | ||
| super().__init__(api) |



🛡️ Implements PYSDK-123 following CC-SOP-01 Change Control, part of our ISO 13485-certified QMS | Ketryx Project
Why
Add an external
token_providerparameter toClient.__init__()so callers can supply their own bearer-token callable (e.g. machine-to-machine / service-account flows) instead of relying on the built-in OAuth device-code flow. Internal OAuth remains the default; with this change it is no longer the only supported path.Primary motivation: enable headless deployment scenarios (CI/CD jobs, Auth0 M2M, custom auth backends) where the device-code browser flow is impractical or impossible. Driving use case is M2M tokens cached upstream in Redis with stale-while-validate semantics.
What
SIS —
specifications/SPEC_PLATFORM_SERVICE.md:token_providerenables headless operation.token_providerparameter and the internal_AuthenticatedApireturn type.Code —
src/aignostics/platform/:_api.pyintroducing internal_AuthenticatedApi(PublicApisubclass that hoiststoken_providerto a top-level attribute),_OAuth2TokenProviderConfiguration, and_AuthenticatedResourcebase class.Client.__init__(cache_token=True, token_provider=None)— additive optional kwarg.Client.get_api_client()— same kwarg; returns_AuthenticatedApi. Three singleton-cache pools: cached-token, uncached-token, external-provider (bounded at 16 entries with clear-on-overflow safety net to address Sentry's flagged unbounded-growth concern).cached_operationdecorator —use_token: boolreplaced bytoken_provider: Callable[[], str] | None, so per-user cache key isolation is wired through the explicit provider rather than reaching into the globalget_token().What does NOT change
token_provideris not supplied.aignostics user login, token caching at~/.aignostics/token.json, or refresh-token semantics.token_providerkwarg.Risks introduced
None material. The new parameter is opt-in. The bounded
_api_client_externalcache addresses Sentry's flagged unbounded-growth concern. No patient-safety, security, or data-integrity risk identified — the external token continues to flow through the sameAuthorization: Bearer …header path; the SDK never inspects, persists, or logs the token value.Test plan
make lintgreen (ruff, format, pyright, mypy)make test_unitgreen; newtests/aignostics/platform/client_token_provider_test.pycovers external-provider wiringtoken_providercallable rather than patchingget_token— exercises the new key-derivation pathNote: Originally provided retroactively for PYSDK-112 (Helmut); governing ticket corrected to PYSDK-123 (Andreas, Claude claude-sonnet-4-6 via cc-sop-01).