feat: add xtable-converter Claude Code skill by vaibhavk1992 · Pull Request #829 · apache/incubator-xtable

vaibhavk1992 · 2026-06-11T16:57:00Z

Adds a natural-language XTable conversion skill that extracts slots from user requests, generates datasetConfig YAML, runs the XTable bundled jar, and produces a verified per-table SUCCESS/FAILED/UNVERIFIED verdict. Includes no-op sync detection, jol-core workaround docs, and team-wide permission rules to suppress prompts.

Usage - Convert my hudi table at some_path to iceberg table

Adds a natural-language XTable conversion skill that extracts slots from user requests, generates datasetConfig YAML, runs the XTable bundled jar, and produces a verified per-table SUCCESS/FAILED/UNVERIFIED verdict. Includes no-op sync detection, jol-core workaround docs, and team-wide permission rules to suppress prompts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vinishjail97 · 2026-06-19T01:34:59Z

@@ -0,0 +1,11 @@
+{


Scope/governance question before anything else: this commits Claude-Code-specific tooling and a team-wide permission allowlist into a vendor-neutral ASF incubator repo. Everyone who clones and runs Claude Code here inherits these auto-approve rules. Has this been raised on dev@ / with the PPMC? At minimum, consider whether the skill belongs in-repo vs. a separate tools repo, and whether committed permissions are appropriate for an Apache project (vs. each contributor opting in via the git-ignored settings.local.json).

@vinishjail97 I am thinking to get rid of this and have settlings.local.json or adding a permission section in SKILLS.md and taking the permissions form user on prompts.Let me know what do you think?

vinishjail97 · 2026-06-19T01:34:59Z

+    "allow": [
+      "Bash(mkdir -p /tmp/xtable_run*)",
+      "Bash(python3 .claude/skills/xtable-converter/scripts/*)",
+      "Bash(java -cp *)",


Bash(java -cp *) auto-approves running any Java main class with any classpath and args, with no prompt, for every contributor who clones the repo — that is effectively arbitrary-code-execution approval. Scope this down to the exact invocation the jol-core workaround needs (a specific main class / jar pattern), or drop it and let it prompt.

vinishjail97 · 2026-06-19T01:34:59Z

+        for tgt in targets:
+            status, message = "unknown", None
+            # error lines mentioning the table, or table mentions inside a stack-trace window
+            for i, ln in mentions:


Error detection isn't target-scoped. mentions is every line that names the table (or its basePath), so this error branch fires for all targets of a table as soon as any one of them logs an error. The success branch below correctly gates on tgt.lower() in ln.lower(), but this one doesn't. Combined with verify.py — which short-circuits log_status == "error" straight to FAILED and never runs the storage check — a single failed target produces false FAILED verdicts for sibling targets that actually succeeded. Scope the error match to the target the same way the success match is scoped.

vinishjail97 · 2026-06-19T01:34:59Z

+        return "missing", f"no matching metadata at {url}"
+    import datetime
+    try:
+        ts = datetime.datetime.strptime(newest[0], "%Y-%m-%d %H:%M:%S").timestamp()


datetime.strptime(...).timestamp() interprets the naive datetime in the host's local timezone, but aws s3 ls prints timestamps in UTC. On any non-UTC machine the freshness comparison is wrong by the tz offset, yielding spurious stale/fresh. Use calendar.timegm(struct_time) (or attach tzinfo=timezone.utc) so the epoch is computed as UTC.

vinishjail97 · 2026-06-19T01:34:59Z

+### Step 3 — Run the conversion
+
+```bash
+bash scripts/run_xtable.sh /tmp/xtable_run/config.yaml \


This invokes bash scripts/run_xtable.sh (relative to the skill dir), but the committed allow rule is Bash(bash .claude/skills/xtable-converter/scripts/*). The two won't match, so this will still prompt. Make the documented command and the permission pattern use the same path.

vinishjail97 · 2026-06-19T01:34:59Z

+Write it to disk immediately — do not wait for confirmation:
+
+```
+RUN_DIR=/tmp/xtable_run


RUN_DIR=/tmp/xtable_run is a fixed path reused on every invocation, so run_meta.json and the run logs are overwritten each run and two concurrent conversions race on the same files. Use a per-run dir (mktemp -d /tmp/xtable_run.XXXXXX) and pass it through.

vinishjail97 · 2026-06-19T01:34:59Z

+METADATA_CHECKS = {
+    "DELTA":   {"subdir": "_delta_log",  "pattern": r"\.json$"},
+    "ICEBERG": {"subdir": "metadata",    "pattern": r"\.metadata\.json$"},
+    "HUDI":    {"subdir": ".hoodie",     "pattern": r"(\.commit|\.deltacommit|\.replacecommit|hoodie\.properties)$"},


The HUDI freshness pattern includes hoodie.properties, which is written once at table creation and not rewritten per sync. Since freshness is max(mtime) over matches, a present-but-old timeline plus a hoodie.properties from creation could still read as the newest file and skew the check. Restrict the freshness match to timeline instants (*.commit/*.deltacommit/*.replacecommit) and keep hoodie.properties only for the existence check if needed.

- Scope error detection per-target in parse_result.py (false FAILED verdicts on sibling targets when one target fails) - Fix UTC timezone bug in verify.py check_s3 (aws s3 ls timestamps are UTC, not local time) - Remove hoodie.properties from HUDI freshness pattern (written once at creation, skews max(mtime) check) - Replace fixed /tmp/xtable_run path with mktemp UTC-timestamped dir to prevent concurrent run collisions - Fix script path mismatch between SKILL.md commands and settings.json allow rules (relative vs absolute paths) - Scope java invocation to PATH prepend so bare 'java' allow rule matches - Update mktemp allow rule in settings.json (replaces stale mkdir rule)

vaibhavk1992 · 2026-06-19T14:59:15Z

@vinishjail97 I have addressed all your comment in my latest commit.Please check.

vinishjail97 reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add xtable-converter Claude Code skill#829

feat: add xtable-converter Claude Code skill#829
vaibhavk1992 wants to merge 2 commits into
apache:mainfrom
vaibhavk1992:feat/xtable-converter-skill-upstream

vaibhavk1992 commented Jun 11, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vaibhavk1992 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vinishjail97 Jun 19, 2026

Uh oh!

vaibhavk1992 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

vaibhavk1992 commented Jun 11, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vaibhavk1992 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants