fix(ci): repair dev live-data test failures #252

Draft
Elarwei001 wants to merge 7 commits into scverse:dev from Elarwei001:fix/dev-ci-repair

@Elarwei001 Elarwei001 commented Jun 25, 2026

Contributor

Summary

This draft consolidates the CI/test-stability fixes for the current dev hatch-test failures into one branch so the GitHub test matrix can be evaluated in a single PR.

Included fixes:

  • OpenTargets drugs: add GraphQL sub-selections for synonyms and tradeNames so the live drugs query no longer returns HTTP 400, then flatten those fields back to lists of labels to preserve the existing gget output shape.
  • ARCHS4 tissue expression: tolerate the upstream CSV omitting optional color; keep required expression columns and sorting.
  • OpenTargets live-data tests: replace exact snapshots for drifting live resources with structural/semantic invariants for diseases, DepMap, interactions, pharmacogenetics, drugs, and ARCHS4.
  • ELM setup in tests: retry transient live ELM database download failures instead of immediately failing collection.

Not included:

Live contracts being tested

  • Drugs: live results must expose valid CHEMBL ids, names, synonym lists, indication rows, and the expected GraphQL columns without HTTP 400.
  • ARCHS4 tissue: live tissue results must return id/min/q1/median/q3/max, numeric quartile ordering, median-descending sort, and no dependency on optional color.
  • OpenTargets drift tests: diseases, DepMap, interactions, and pharmacogenetics assert stable schema/id/score/filter invariants instead of pinning release-specific values.
  • ELM setup: transient download failures get a small number of retries; persistent upstream failure still fails the test rather than being hidden.
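The ELM retry contract above (a few retries for transient failures, while a persistent failure still surfaces) could look roughly like the sketch below. The `retry` helper, its parameters, and the choice of `OSError` as the "transient" exception class are assumptions made for illustration; they are not taken from gget's test code.

```python
import time


def retry(func, attempts=3, delay=0.0, exceptions=(OSError,)):
    """Call func() up to `attempts` times (hypothetical helper).

    Only the listed exception types are treated as transient. If every
    attempt fails, the last error is re-raised so the failure stays visible.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # small pause before the next attempt
    raise last_error


# Simulated download that fails twice and then succeeds.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return "downloaded"


print(retry(flaky_download, attempts=3))
```

Because the last exception is re-raised after the final attempt, an upstream outage still fails the test instead of being silently skipped.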

Local verification

Passing locally:

~/.local/bin/hatch test -py 3.12,3.13,3.14 tests/test_opentargets.py tests/test_archs4.py tests/test_elm.py
# py3.12: 31 passed, 2 skipped
# py3.13: 31 passed, 2 skipped
# py3.14: 31 passed, 2 skipped

uvx pre-commit run ruff-check --files gget/gget_opentargets.py tests/test_opentargets.py tests/fixtures/test_opentargets.json docs/src/en/opentargets.md docs/src/en/updates.md
# Passed

uvx pre-commit run ruff-format --files gget/gget_opentargets.py tests/test_opentargets.py
# Passed

The two skipped tests are the known OpenTargets expression upstream break tracked in #247.

Related

Elarwei001 and others added 5 commits June 25, 2026 13:50
…0 fix)

OpenTargets changed the Drug 'synonyms' and 'tradeNames' fields from
[String!]! to the object type [DrugLabelAndSource!]!, which now requires
a sub-selection. The bare-scalar selection caused every drug query to
fail with HTTP 400.

Request '{ label }' for both fields and flatten the response objects
back to a list of label strings so downstream output stays
backward-compatible (a list of strings).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
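A minimal sketch of the flattening step described in this commit message, assuming each entry arrives as a dict with a `label` key. The helper name `flatten_labels` is hypothetical and is not gget's actual function.

```python
def flatten_labels(items):
    """Turn [{'label': 'Humulin'}, ...] into ['Humulin', ...].

    Hypothetical helper. Plain strings pass through unchanged, so an
    old-style list of strings would still be handled.
    """
    labels = []
    for item in items or []:
        if isinstance(item, dict):
            label = item.get("label")
            if label is not None:
                labels.append(label)
        elif isinstance(item, str):
            labels.append(item)
    return labels


# Mocked drug record shaped like a response to a `{ label }` sub-selection.
drug = {
    "name": "INSULIN HUMAN",
    "synonyms": [{"label": "Human insulin"}, {"label": "Humulin"}],
    "tradeNames": [{"label": "Actrapid"}, {"label": "Afrezza"}],
}
for field in ("synonyms", "tradeNames"):
    drug[field] = flatten_labels(drug.get(field))

print(drug["synonyms"])
print(drug["tradeNames"])
```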
…ev-drift)

ARCHS4's tissue-expression CSV intermittently omits the 'color' column,
which made `gget archs4 --which tissue` crash with
`KeyError: "['color'] not found in axis"`. The 'color' column is only used
for plotting upstream and is dropped (never used) by gget, so a missing
column should not be fatal.

Use `drop(columns=["color"], errors="ignore")` so the request degrades
gracefully when the column is absent. Adds network-free regression tests
covering both the present-color and missing-color responses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
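The effect of `errors="ignore"` can be checked without any network access. The sketch below uses two made-up CSV payloads and a hypothetical `parse_tissue_csv` function; only the `drop(columns=["color"], errors="ignore")` call and the column names come from this PR.

```python
import io

import pandas as pd

REQUIRED = ["id", "min", "q1", "median", "q3", "max"]


def parse_tissue_csv(text):
    """Parse a tissue-expression CSV (hypothetical stand-in for gget's parser)."""
    df = pd.read_csv(io.StringIO(text))
    # errors="ignore" makes the drop a no-op when 'color' is absent.
    df = df.drop(columns=["color"], errors="ignore")
    return df.sort_values("median", ascending=False).reset_index(drop=True)


# Made-up payloads: one with the optional column, one without.
with_color = "id,min,q1,median,q3,max,color\nA,0,1,2,3,4,#fff\nB,1,2,5,6,7,#000\n"
without_color = "id,min,q1,median,q3,max\nA,0,1,2,3,4\nB,1,2,5,6,7\n"

for payload in (with_color, without_color):
    df = parse_tissue_csv(payload)
    print(list(df.columns), df["id"].tolist())
```

Both payloads should yield the same required columns and the same median-descending order.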
OpenTargets retired the `target.expressions` field (it now returns an empty
list for every gene), so `gget opentargets -r expression` returned nothing.
Baseline expression data moved to the paginated `target.baselineExpression`
field with a new per-biosample data model.

- Repoint the expression query to `baselineExpression(page:{index:0,size:250})
  { rows {...} }` and update rows_path to ["baselineExpression","rows"].
- Output columns change accordingly (per-biosample summary stats: median/min/
  q1/q3/max/unit + tissueBiosample/celltypeBiosample ids + datasource/datatype),
  because the upstream data model changed and the old shape no longer exists.
- Remove the two now-invalid live exact-match fixtures and replace them with
  network-free mocked tests; update docs (example, resource table, updates.md).

Verified live: http_json with the new query returns 1409 rows in ~0.6s and the
parsing pipeline yields the documented columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
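As a rough illustration of how a `rows_path` such as `["baselineExpression", "rows"]` can be resolved against a response, here is a small sketch. The `get_rows` helper and the mocked payload values are assumptions for illustration, and the path is assumed to start at the `target` object; gget's real parsing code may differ.

```python
def get_rows(payload, rows_path):
    """Walk a nested dict along rows_path and return the list found there.

    Hypothetical helper: returns [] when a key is missing or the final
    value is not a list, so callers never receive None.
    """
    node = payload
    for key in rows_path:
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    return node if isinstance(node, list) else []


# Mocked `target` object; the values are made up, not live data.
mock_target = {
    "baselineExpression": {
        "rows": [
            {"median": 1.5, "unit": "TPM"},
            {"median": 0.2, "unit": "TPM"},
        ]
    }
}

rows = get_rows(mock_target, ["baselineExpression", "rows"])
print(len(rows), rows[0])

# The retired field is simply absent here, so the old path yields no rows.
print(get_rows(mock_target, ["expressions"]))
```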
…t (data drifts across releases)

OpenTargets is a live database re-released regularly; several opentargets tests
pinned exact current values (disease ids/scores, result hashes, interaction
partner ids, genotypes) that legitimately change every release, so they failed
on unrelated PRs even though gget returns correct current data.

Replace the exact-value/hash assertions for test_opentargets, _diseases,
_depmap, _depmap_filter, _interactions, _interactions_no_limit and
_pharmacogenetics with structural/invariant assertions (expected columns
present, numeric dtypes, value-format patterns — ontology-curie disease/tissue
ids, ENSG interaction partners, ACH DepMap ids, score in [0,1], nucleotide
genotypes — and the depmap filter invariant). The fixture entries are marked
`code_defined`; the structural methods live in tests/test_opentargets.py.

These stay meaningful (they break on wrong columns, malformed ids, non-numeric
scores, broken filtering, or empty-where-guaranteed) without pinning drifting
data. Verified live against current OpenTargets data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
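To make the idea of a structural assertion concrete, the sketch below checks disease rows for required keys, an ontology-style id, and a score in [0, 1]. The function name, the regex, and the row layout are assumptions for illustration; the real checks live in tests/test_opentargets.py and may be stricter.

```python
import re
from numbers import Real

# Assumed id shape: ontology prefix, underscore, digits (e.g. EFO_0000274).
CURIE = re.compile(r"^[A-Za-z]+_\d+$")
REQUIRED_KEYS = {"id", "name", "score"}


def check_disease_rows(rows):
    """Return a list of problems; an empty list means every invariant holds."""
    problems = []
    if not rows:
        problems.append("no rows returned")
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - set(row)
        if missing:
            problems.append(f"row {i}: missing keys {sorted(missing)}")
            continue
        if not CURIE.match(str(row["id"])):
            problems.append(f"row {i}: malformed id {row['id']!r}")
        score = row["score"]
        is_number = isinstance(score, Real) and not isinstance(score, bool)
        if not is_number or not 0 <= score <= 1:
            problems.append(f"row {i}: score {score!r} is not a number in [0, 1]")
    return problems


good = [
    {"id": "EFO_0000274", "name": "atopic eczema", "score": 0.73},
    {"id": "MONDO_0004979", "name": "asthma", "score": 0.66},
]
bad = [{"id": "not an id", "name": "x", "score": "high"}]

print(check_disease_rows(good))
print(check_disease_rows(bad))
```

A check like this keeps passing when scores or rankings shift between releases, but still fails on malformed ids, non-numeric scores, or an empty result.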
codecov-commenter commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 56.40%. Comparing base (5cf607f) to head (2a8a2b7).
⚠️ Report is 1 commits behind head on dev.

Files with missing lines | Patch % | Lines
gget/gget_opentargets.py | 88.88% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #252      +/-   ##
==========================================
+ Coverage   56.14%   56.40%   +0.25%     
==========================================
  Files          29       29              
  Lines        9244     9253       +9     
==========================================
+ Hits         5190     5219      +29     
+ Misses       4054     4034      -20     


Elarwei001 commented Jun 25, 2026

Contributor Author

OpenTargets has changed the `synonyms` and `tradeNames` fields in its resource="drugs" response from plain strings to `DrugLabelAndSource` objects. An example query and its response:

query: { drug(chemblId: "CHEMBL1201631") { name synonyms { label source } tradeNames { label source } } }

HTTP 200
name: INSULIN HUMAN

synonyms (first 8, now WITH source):
{'label': 'Biosulin r', 'source': 'ChEMBL'}
{'label': 'Capsulin', 'source': 'ChEMBL'}
{'label': 'Exubera (inhaled insulin human)', 'source': 'ChEMBL'}
{'label': 'High molecular weight insulin human', 'source': 'ChEMBL'}
{'label': 'Human insulin', 'source': 'ChEMBL'}
{'label': 'Humulin', 'source': 'ChEMBL'}
{'label': 'Insulin', 'source': 'ChEMBL'}
{'label': 'Insulin (human)', 'source': 'ChEMBL'}

tradeNames (first 8, now WITH source):
{'label': 'Actrapid', 'source': 'ChEMBL'}
{'label': 'Afrezza', 'source': 'ChEMBL'}
{'label': 'Exubera', 'source': 'ChEMBL'}
{'label': 'Humulin br', 'source': 'ChEMBL'}
{'label': 'Humulin r', 'source': 'ChEMBL'}
{'label': 'Inpremzia', 'source': 'ChEMBL'}
{'label': 'Ins humulin r', 'source': 'ChEMBL'}
{'label': 'Insuman', 'source': 'ChEMBL'}

-> Each entry now shows {'label': ..., 'source': ...}. So `source` was
always available; the earlier output omitted it only because the query
selected `{ label }` (and gget intentionally keeps just `label` to
preserve its historical plain-string output).

gget therefore needs to change how it decodes the response so that it extracts the label from each returned object.

test code:

#!/usr/bin/env python3
"""
Show that DrugLabelAndSource really has BOTH `label` and `source` —
we only saw `label` before because our query only selected `label`.
GraphQL returns exactly the fields you request, nothing more.

Read-only OpenTargets API call. No GitHub, no writes.
Run: /Users/elar/gget/.venv/bin/python /Users/elar/cc_output/gget/verify-opentargets-source.py
"""
import requests

URL = "https://api.platform.opentargets.org/api/v4/graphql"
DRUG = "CHEMBL1201631"

# Note: now we ALSO request `source` in the sub-selection.
q = f'{{ drug(chemblId: "{DRUG}") {{ name synonyms {{ label source }} tradeNames {{ label source }} }} }}'
print("query:", q, "\n")

r = requests.post(URL, json={"query": q}, timeout=60)
print("HTTP", r.status_code)
drug = (r.json().get("data") or {}).get("drug") or {}
print("name:", drug.get("name"), "\n")

print("synonyms (first 8, now WITH source):")
for item in (drug.get("synonyms") or [])[:8]:
    print("   ", item)

print("\ntradeNames (first 8, now WITH source):")
for item in (drug.get("tradeNames") or [])[:8]:
    print("   ", item)

print(
    "\n-> Each entry now shows {'label': ..., 'source': ...}. So `source` was"
    "\n   always available; the earlier output omitted it only because the query"
    "\n   selected `{ label }` (and gget intentionally keeps just `label` to"
    "\n   preserve its historical plain-string output)."
)

@Elarwei001

Contributor Author

Querying a gene's associated diseases shows that the results can change over time, which means the OpenTargets response is not stable.

For example, compared with the previous test case in dev, the current values differ in both ID and score.

test code:

#!/usr/bin/env python3
"""
Measure the ACTUAL drift of `gget opentargets -r diseases` for IL13
(ENSG00000169194, the gene used in test_opentargets), so we can design a
RIGOROUS test (known-ID-set + score tolerance), not a weak one.

Compares the live OpenTargets top-2 associated diseases against the OLD
fixture values that PR #250 deleted.

Read-only OpenTargets API. No GitHub, no writes.
Run: /Users/elar/gget/.venv/bin/python /Users/elar/cc_output/gget/verify-opentargets-diseases-drift.py
"""
import json

import requests

URL = "https://api.platform.opentargets.org/api/v4/graphql"
GENE = "ENSG00000169194"  # IL13

# What the OLD fixture asserted (the 171 lines PR #250 removed), top 2:
OLD = [
    {"score": 0.7297489019498119, "id": "EFO_0000274", "name": "atopic eczema"},
    {"score": 0.6642728577751653, "id": "MONDO_0004979", "name": "asthma"},
]

q = f"""
{{ target(ensemblId: "{GENE}") {{
     associatedDiseases(page: {{index: 0, size: 2}}) {{
       count
       rows {{ score disease {{ id name }} }}
     }}
}} }}
"""

r = requests.post(URL, json={"query": q}, timeout=60)
print("HTTP", r.status_code)
body = r.json()
rows = (((body.get("data") or {}).get("target") or {}).get("associatedDiseases") or {}).get("rows")
if not rows:
    print(json.dumps(body, indent=2)[:800])
    raise SystemExit("no rows — inspect raw response above")

print(f"\n{'rank':<5}{'LIVE id':<16}{'LIVE name':<22}{'LIVE score':<14}"
      f"{'OLD id':<16}{'OLD score':<14}{'Δscore':<12}{'id changed?'}")
print("-" * 110)
for i, row in enumerate(rows):
    live_id = row["disease"]["id"]
    live_name = row["disease"]["name"]
    live_score = row["score"]
    old = OLD[i] if i < len(OLD) else {"id": "-", "score": float("nan"), "name": "-"}
    dscore = live_score - old["score"]
    rel = abs(dscore) / old["score"] if old["score"] else float("nan")
    idchg = "SAME" if live_id == old["id"] else f"{old['id']} -> {live_id}"
    print(f"{i:<5}{live_id:<16}{live_name:<22}{live_score:<14.6f}"
          f"{old['id']:<16}{old['score']:<14.6f}{dscore:<+12.6f}{idchg}")
    print(f"     (relative score drift this row: {rel*100:.3f}%)")

print("\nUse this to decide: (a) the acceptable ID SET per disease (did EFO<->MONDO"
      "\nflip, or is the name stable while id changed?), and (b) a score TOLERANCE"
      "\nthat survives normal release drift but still catches wrong-data. Run on a"
      "\nfew genes / a few days to see the real drift magnitude before fixing the test.")
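One way the "known-ID-set + score tolerance" idea from the docstring above might be expressed is sketched below. The helper name, the 10% relative tolerance, and the example live rows are assumptions for illustration; the baseline id and score are the old fixture values quoted in the script, and a real tolerance should come from measured drift.

```python
import math

# Old fixture value quoted in the script above (rank 0 for IL13).
BASELINE = {"id": "EFO_0000274", "score": 0.7297489019498119}
ALLOWED_IDS = {"EFO_0000274"}  # extend if an equivalent ontology id is accepted


def matches_baseline(row, allowed_ids, baseline_score, rel_tol=0.10):
    """True if the id is in the accepted set and the score is within tolerance.

    rel_tol=0.10 is an arbitrary placeholder, not a measured drift bound.
    """
    return row["id"] in allowed_ids and math.isclose(
        row["score"], baseline_score, rel_tol=rel_tol
    )


# Made-up live rows, used only to exercise the check.
small_drift = {"id": "EFO_0000274", "score": 0.75}
large_drift = {"id": "EFO_0000274", "score": 0.40}
other_disease = {"id": "MONDO_0004979", "score": 0.73}

for row in (small_drift, large_drift, other_disease):
    print(row["id"], row["score"], matches_baseline(row, ALLOWED_IDS, BASELINE["score"]))
```

Compared with purely structural checks, this approach catches wrong data more aggressively but needs the baseline and tolerance to be maintained as OpenTargets releases change.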
