Skip to content

Add GitHub-issue-driven test framework (+ src/ library carve-out)#67

Open
gaurav wants to merge 128 commits into
mainfrom
add-github-issue-tests
Open

Add GitHub-issue-driven test framework (+ src/ library carve-out)#67
gaurav wants to merge 128 commits into
mainfrom
add-github-issue-tests

Conversation

@gaurav

@gaurav gaurav commented Jan 8, 2026

Copy link
Copy Markdown
Collaborator

Introduces a GitHub-issue-driven test framework for babel-validation, alongside a library carve-out and supporting tooling.

Major pieces

  • Library carve-out — shared code moves from tests/common/ to an installable src/babel_validation/ package (core, services, sources, assertions, tools). The project now builds as a hatchling wheel; existing from src.babel_validation.X import Y imports keep working.
  • Assertion framework — 8 AssertionHandler types (Resolves, DoesNotResolve, ResolvesWith, DoesNotResolveWith, HasLabel, ResolvesWithType, SearchByName, Needed) with an auto-generated assertions/README.md and a drift test that fails CI if it goes stale.
  • GitHub integration (PyGitHub) — parses {{BabelTest|...}} wiki lines and ```yaml babel_tests: blocks from issue bodies. Open issues are expected to fail (strict xfail, so XPASS flags a closeable issue); closed issues are expected to pass.
  • CSV→YAML CLI (csv-to-babeltests) for bulk-authoring assertion blocks, reusing the same handlers and YAML schema as the issue parser.
  • CI workflow running pytest -m unit on PRs.
  • xdist parallelism with FileLock-coordinated CSV/issue caches; the controller refreshes caches, workers share them.

Review fixes (this PR)

Addressing the code review on the original revision:

  • pytest -m unit now runs fully offline. Both the GitHub issue tests and the Google Sheet / blocklist tests previously fetched from the network at collection time, because their parametrization is built before pytest applies -m deselection. The marker expression is now evaluated up front (shared tests/_pytest_helpers.deselected_by_markexpr), and the GitHub fetch and Google Sheet downloads are skipped when the test would be deselected. Net effect: pytest -m unit drops from ~2.5s + network to ~0.3s offline. Row IDs (row=N) and xfail marks are unchanged, so -k 'row=42' still works.
  • ResolvesWith / DoesNotResolveWith now enforce their documented 2+ CURIE arity. A single-CURIE param_set previously passed (or failed) vacuously; it now fails loudly with a clear message, matching HasLabel/SearchByName. Adds unit coverage.
  • SearchByName top-N is read from targets.ini (NameResXFailIfInTop) instead of being hardcoded to 5, so it's configurable per environment.
  • CachedNodeNorm.normalize_curie uses .get() to avoid a KeyError if NodeNorm ever omits a requested CURIE, consistent with the rest of the code.

Unrelated changes riding along in this PR

These are intentional and correct, but not part of the GitHub-issue test framework itself:

  • pytest bumped 8.4.2 → 9.0.2 (plus new deps: pygithub, pytest-xdist[psutil], pytest-subtests, click, pyyaml, tqdm, filelock, python-dotenv).
  • tests/pytest.ini removed; pytest config (testpaths, timeout, markers) moves into pyproject.toml [tool.pytest.ini_options].
  • tests/targets.ini [ci]/[ci-es] restructure: both CI targets now point NodeNorm at nodenorm-es.ci.transltr.io (the old biothings.ci.transltr.io/nodenorm/ URL is gone); [ci] pairs the ES NodeNorm with the non-ES NameRes, [ci-es] with the ES NameRes. A Repositories list (scanned for embedded assertions) is added to [DEFAULT].
  • tests/nameres/test_nameres_from_gsheet.py: the biolink_type request parameter is now sent as a list ([biolink_class]) instead of a bare string, matching the current NameRes API.

gaurav and others added 10 commits February 15, 2026 02:39
Each assertion type (Resolves, DoesNotResolve, ResolvesWith, ResolvesWithType,
SearchByName, Needed) is now a self-contained class in
src/babel_validation/assertions/, grouped by which service it targets
(nodenorm.py, nameres.py, common.py). A central ASSERTION_HANDLERS registry
in __init__.py maps lowercase assertion names to handler instances, and
NodeNormAssertion / NameResAssertion marker base classes allow isinstance()
checks for applicability. GitHubIssueTest.test_with_nodenorm() and
test_with_nameres() are now 3-line dispatchers. A README.md documents all
supported assertion types with examples in both wiki and YAML syntax.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gaurav and others added 2 commits February 19, 2026 19:03
…pendently.

Bumps pytest to >=9.0 (which includes built-in subtests support). Each
TestResult is now evaluated in its own subtest block, so a failure no longer
short-circuits the rest. Adds post-loop state-consistency subtests: a closed
issue with failing tests fails with a "consider reopening" message, and an open
issue where all tests pass emits an xfail "consider closing" hint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, get_github_issue_id() and GitHubIssueTest.__str__() both resolved
the org/repo name via github_issue.repository.organization.name, which triggers
lazy PyGitHub API calls. Parse org/repo from html_url instead (always present
in the issue JSON, no extra round-trip needed). Also resolves the TODO comment
in get_test_issues_from_issue().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gaurav and others added 4 commits May 17, 2026 23:29
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
A bare YAML scalar (e.g. 'Resolves: CHEBI:15365') is now wrapped as a
single one-element param_set rather than iterating its characters. This
test documents that behaviour and guards against regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 41 changed files in this pull request and generated 5 comments.

Comment on lines +38 to +48
class _FlowList(list):
"""List subclass that yaml.safe_dump emits in inline flow style."""


def _represent_flow_list(dumper, data):
return dumper.represent_sequence(
"tag:yaml.org,2002:seq", data, flow_style=True
)


yaml.SafeDumper.add_representer(_FlowList, _represent_flow_list)
Comment on lines +185 to +191
body = yaml.safe_dump(
data,
sort_keys=False,
default_flow_style=False,
allow_unicode=True,
width=10_000,
)
Comment thread tests/nameres/test_blocklist.py Outdated
Comment on lines +88 to +92
def pytest_generate_tests(metafunc):
global _github_auth_error
if "github_issue_id" not in metafunc.fixturenames:
return
issue_id_filter = metafunc.config.getoption("issue", default=[])
Comment on lines +178 to +180
for assertion, original_param_sets in babel_tests.items():
# YAML syntax: each entry under an assertion key becomes one param_set.
# A bare string becomes a single-element param_set; a list is used as-is.
gaurav and others added 3 commits June 10, 2026 11:35
- Keep blocklist code at src/babel_validation/sources/google_sheets/blocklist.py
  (consistent with this branch's src/ layout), incorporating Optional field types
  and the comment from main's tests/common/blocklist.py
- Adopt main's improved test_blocklist.py: nameres_synonyms_url variable name,
  preferred_curies param, and better assert error messages
- Keep tests/pytest.ini deleted (timeout config already in pyproject.toml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace `found`-flag tracking in NodeNormTest/NameResTest with a
results-list approach. Extract `_to_list()` helper in
github_issues_test_cases to collapse the two-level str/list
type-dispatch into a single reusable function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _silent_unlink() in tests/conftest.py to replace three
copy-paste try/unlink/except blocks. In github_issues/conftest.py:
fix _issue_id() to use issue.repository.full_name instead of parsing
html_url, extract _record_auth_error() to deduplicate 401 handling,
and add _cached_ids sentinel to avoid repeated JSON file reads during
collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 41 changed files in this pull request and generated 1 comment.

Comment on lines +177 to +187
# Parse string as YAML.
yaml_dict = yaml.safe_load(match.removeprefix("```yaml").removesuffix("```"))

babel_tests = yaml_dict.get('babel_tests') if isinstance(yaml_dict, dict) else None
if babel_tests is None:
raise ValueError(
f"YAML block in issue {github_issue_id} matched the detection pattern "
f"but contains no 'babel_tests' top-level key: {match!r}"
)

for assertion, original_param_sets in babel_tests.items():
gaurav and others added 7 commits June 10, 2026 12:36
pytest-subtests failures do not respect the xfail marker on the parent
test, so an open issue with failing assertions was reported as real
SUBFAILED failures (failing the whole run) even though the outer test
xfailed. For open issues, collect failing assertion messages instead of
asserting inside subtests, and xfail with a summary that includes the
first few failure messages. Closed issues keep granular subtest
reporting, and a fully-passing open issue still surfaces as a strict
XPASS failure signalling the issue may be closeable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Assertion names were already case-insensitive, but the wiki marker
regex matched '{{BabelTest' exactly, so '{{babeltest|...}}' was
silently ignored. Match the marker case-insensitively, document it,
and add unit tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The system tests in test_system.py only parse mocked issue bodies and
never call the GitHub API, but the shared github_issues_test_cases
fixture skipped them when GITHUB_TOKEN was unset. Override the fixture
locally with a dummy token so 'pytest -m unit' works anywhere.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Malformed CURIEs were rejected per-param_set, but only after the
cache-warming normalize_curies() call had already sent them to the
NodeNorm API. Validate all param_sets first and exclude invalid ones
from the warm-up request.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pytest deletes the tempdir cache at the start of each run, but other
consumers (csv-to-babeltests, library users) would read a stale cache
indefinitely. Treat cached downloads older than CACHE_TTL_SECONDS
(1 hour) as expired and re-download.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The GitHub issue tests always read Repositories from [DEFAULT];
per-environment overrides are not implemented.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 41 changed files in this pull request and generated 3 comments.

Comment on lines +100 to +118
def pytest_generate_tests(metafunc):
if "github_issue_id" not in metafunc.fixturenames:
return
issue_id_filter = metafunc.config.getoption("issue", default=[])
if issue_id_filter:
try:
issues = _get_github_issues_test_cases().get_issues_by_ids(issue_id_filter)
except GithubException as e:
if e.status == 401:
_record_auth_error(e)
metafunc.parametrize("github_issue_id", [_AUTH_ERROR_ID], ids=[_AUTH_ERROR_ID])
return
raise
ids = [_issue_id(i) for i in issues]
for issue, id_ in zip(issues, ids):
_fetched_issues_cache[id_] = issue
else:
ids = _get_all_test_issue_ids()
metafunc.parametrize("github_issue_id", ids, ids=ids)
Comment on lines +178 to +186
# Parse string as YAML.
yaml_dict = yaml.safe_load(match.removeprefix("```yaml").removesuffix("```"))

babel_tests = yaml_dict.get('babel_tests') if isinstance(yaml_dict, dict) else None
if babel_tests is None:
raise ValueError(
f"YAML block in issue {github_issue_id} matched the detection pattern "
f"but contains no 'babel_tests' top-level key: {match!r}"
)
Comment on lines +12 to +13
def test_with_nodenorm(self, param_sets, nodenorm, label=""):
yield self.failed("Test needed for issue")
gaurav and others added 4 commits June 10, 2026 17:14
pytest_generate_tests for test_github_issue fetches issues from GitHub at
collection time. pytest applies -m marker deselection only *after* generation,
so running 'pytest -m unit' (as CI does) would still hit the GitHub search API
across every configured repo before deselecting the test.

Evaluate the marker expression ourselves against the test's own markers and,
when it would deselect this test, parametrize an empty set instead of fetching.
This keeps unit-only runs offline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both assertions document 'Two or more CURIEs per param_set' but never checked.
A single-CURIE ResolvesWith passed vacuously (canonical id == itself) while a
single-CURIE DoesNotResolveWith always failed (one canonical id) — both silent
authoring traps. Fail loudly with a clear message instead, matching HasLabel
and SearchByName. Adds unit coverage for the too-few-params case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
normalize_curie indexed the result dict directly while every other code path
uses .get(). NodeNorm normally echoes every requested CURIE (null when
unresolvable), but switch to .get() so an omitted key returns None rather than
raising.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The NameRes 'found in top N' threshold was hardcoded to the handler default of
5, ignoring NameResXFailIfInTop in targets.ini. Read it from target_info so the
threshold is configurable per environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gaurav gaurav changed the title Add GitHub issue tests Add GitHub-issue-driven test framework (+ src/ library carve-out) Jun 10, 2026
gaurav and others added 2 commits June 10, 2026 17:57
Move the marker-expression check out of the github_issues conftest into
tests/_pytest_helpers.py so the Google Sheet test modules can reuse it to defer
their own network-backed parametrization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The gsheet and blocklist test modules instantiated GoogleSheetTestCases /
load_blocklist_from_gsheet at import time and parametrized via a module-level
@pytest.mark.parametrize, so every collection — including 'pytest -m unit' —
downloaded the sheets even though those tests aren't unit tests.

Move parametrization into pytest_generate_tests with a lazy, cached fetch, and
skip the fetch entirely when a -m filter would deselect the test. Row IDs
(row=N) and xfail marks are unchanged, so -k 'row=42' still works.

With this, 'pytest -m unit' runs fully offline: ~0.3s and no network, versus
~2.5s and a GitHub + Google Sheet round trip before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants