fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0 #1971

Merged
yoavkatz merged 6 commits into main from fix/hf-namespaced-dataset-paths
May 27, 2026

Conversation

Member

@yoavkatz yoavkatz commented May 26, 2026

Summary

  • HF dataset paths: Update all LoadHF(path=...) calls to use the full namespace/name format required by huggingface_hub >= 1.16 (e.g., hellaswag → Rowan/hellaswag)
  • F1 metrics: Replace evaluate library wrapper with direct sklearn calls to fix numpy 2.0 TypeError on 0-d array scalar conversion
  • text2sql: Cast DataFrame to str before column-wise sorting to fix pandas 3.0 TypeError when assigning string values to int64 columns

Test plan

  • All prepare/cards scripts run successfully and regenerate catalog JSONs
  • CI performance test passes (hellaswag loads correctly)
  • test_f1_multiple_use, test_confidence_interval_off pass
  • test_text2sql_accuracy_different_db_schema passes

🤖 Generated with Claude Code

yoavkatz and others added 4 commits May 26, 2026 16:41
…atibility

huggingface_hub 1.16+ enforces that dataset repository IDs must use the
'namespace/name' format. Bare dataset names (e.g., 'hellaswag') are no
longer accepted, causing HfUriError in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
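
A minimal sketch of the path migration this commit describes, assuming a hand-maintained lookup table. The helper `to_namespaced` and the table `NAMESPACED_PATHS` are hypothetical names used for illustration and are not part of unitxt or huggingface_hub; only the two mappings shown are taken from this PR.

```python
# Hypothetical lookup table: bare dataset name -> 'namespace/name'.
# Only these two entries come from this PR; other datasets need their
# owner looked up on the Hub.
NAMESPACED_PATHS = {
    "hellaswag": "Rowan/hellaswag",
    "wiki_bio": "michaelauli/wiki_bio",
}


def to_namespaced(path: str) -> str:
    """Return a 'namespace/name' dataset path, or raise if the owner is unknown."""
    parts = path.split("/")
    if len(parts) == 2 and all(parts):
        return path  # already in 'namespace/name' form
    if path in NAMESPACED_PATHS:
        return NAMESPACED_PATHS[path]
    raise ValueError(f"No known namespace for dataset path {path!r}")


print(to_namespaced("hellaswag"))
```

In a card, the result would then be passed along the lines of `LoadHF(path="Rowan/hellaswag")`.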
Run all prepare/cards scripts to update the catalog JSON files with the
full namespace/name format for HuggingFace dataset paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
The evaluate library's cached f1.py uses `float(score)` on numpy arrays,
which raises TypeError with numpy >= 2.0. Bypass the evaluate wrapper and
call sklearn's f1_score/precision_score/recall_score directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
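
A hedged sketch of what calling sklearn directly might look like. The function name `macro_scores` and the choice of `average="macro"` are assumptions made for illustration; the actual metric code changed in this PR may be structured differently. With `average` set, each sklearn call returns a single scalar, so the explicit `float()` conversion does not involve an array.

```python
from sklearn.metrics import f1_score, precision_score, recall_score


def macro_scores(references, predictions):
    """Compute macro-averaged F1, precision, and recall with sklearn directly.

    Hypothetical helper: it bypasses any wrapper library and converts each
    scalar result to a plain Python float.
    """
    return {
        "f1": float(f1_score(references, predictions, average="macro")),
        "precision": float(precision_score(references, predictions, average="macro")),
        "recall": float(recall_score(references, predictions, average="macro")),
    }


scores = macro_scores([0, 1, 1, 0], [0, 1, 0, 0])
print(scores)
```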
- F1 metric: replace evaluate library wrapper with direct sklearn calls
  to avoid numpy 2.0 float() TypeError on 0-d arrays
- text2sql: cast DataFrame to str before sorting to avoid pandas 3.0
  TypeError when assigning string values to int64 columns
- wiki_bio: use namespaced HF dataset path (michaelauli/wiki_bio)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
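
The text2sql change could be sketched as follows, assuming the goal is an order-insensitive comparison of two query results. The helper `sort_columns_as_str` is a hypothetical name and not the function changed in this PR; the point it illustrates is casting the whole frame to `str` before any sorted values are written back, so no string is ever assigned into an int64 column.

```python
import pandas as pd


def sort_columns_as_str(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every column to str, then sort each column independently.

    Hypothetical helper. Assumes column names are unique.
    """
    # Cast first: after this, every column holds strings.
    out = df.astype(str).reset_index(drop=True)
    for col in out.columns:
        # sorted() returns a plain list, so no index alignment is involved.
        out[col] = sorted(out[col])
    return out


a = pd.DataFrame({"id": [2, 1], "name": ["b", "a"]})
b = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
print(sort_columns_as_str(a).equals(sort_columns_as_str(b)))
```

Sorting after the cast is lexicographic (for example, "10" sorts before "2"), which is enough for checking that two results contain the same values regardless of row order.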
@yoavkatz yoavkatz changed the title from "fix: Use namespaced HF dataset paths for huggingface_hub >= 1.16" to "fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0" May 27, 2026
yoavkatz and others added 2 commits May 27, 2026 10:19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
@yoavkatz yoavkatz merged commit f2424be into main May 27, 2026
20 checks passed
@yoavkatz yoavkatz deleted the fix/hf-namespaced-dataset-paths branch May 27, 2026 08:22