feat: Add bigframes.execution_history API to track BigQuery jobs #16588

Open
shuoweil wants to merge 15 commits into main from shuowei-execution-history

shuoweil (Contributor) commented Apr 8, 2026

This PR promotes execution_history() to the top-level bigframes namespace and upgrades it to track rich metadata for every BigQuery job executed during your session.

Key User Benefits:

  • Easier Access: Call bigframes.execution_history() directly instead of digging into sub-namespaces.

  • Rich Metadata Tracking: Captures structured statistics for both Query Jobs and Load Jobs, including:
    - job_id and a direct Google Cloud Console URL for easy debugging.
    - Performance metrics: total_bytes_processed, duration_seconds, and slot_millis.
    - Query details (a truncated preview of the SQL that was run).

  • Clean, Focused Logs: Automatically filters out internal library overhead (like schema validations and index uniqueness checks) so your history only shows the data processing steps you actually care about.
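A minimal sketch of a per-job record carrying the fields listed above, plus the internal-job filter. The `JobMetadata` layout, the `is_internal` flag, the 100-character preview limit, and the Cloud Console URL format are all assumptions for illustration, not the PR's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed preview length; the PR only says the SQL preview is truncated.
MAX_QUERY_PREVIEW = 100


@dataclass(frozen=True)
class JobMetadata:
    """Hypothetical per-job record mirroring the fields listed above."""

    job_id: str
    job_type: str  # e.g. "query" or "load"
    project: str
    location: str
    total_bytes_processed: int | None = None
    duration_seconds: float | None = None
    slot_millis: int | None = None
    query: str | None = None
    is_internal: bool = False  # assumed marker for library-overhead jobs

    @property
    def job_url(self) -> str:
        # Assumed Cloud Console deep-link format for a BigQuery job.
        return (
            "https://console.cloud.google.com/bigquery"
            f"?project={self.project}&j=bq:{self.location}:{self.job_id}"
            "&page=queryresults"
        )


def preview_query(sql: str, limit: int = MAX_QUERY_PREVIEW) -> str:
    """Collapse whitespace and truncate the SQL text for display."""
    flat = " ".join(sql.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def user_facing(jobs: list[JobMetadata]) -> list[JobMetadata]:
    """Drop jobs flagged as internal overhead (schema or uniqueness checks)."""
    return [job for job in jobs if not job.is_internal]


jobs = [
    JobMetadata("job_a", "query", "my-project", "US",
                total_bytes_processed=1024, query=preview_query("SELECT  1  AS x")),
    JobMetadata("job_b", "query", "my-project", "US", is_internal=True),
]
print([job.job_id for job in user_facing(jobs)])  # ['job_a']
print(jobs[0].query)  # SELECT 1 AS x
```

Filtering at read time (rather than never recording internal jobs) is one possible design; it keeps the full record available for debugging while the default view stays focused.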

Usage Example:

    import bigframes.pandas as bpd
    import pandas as pd
    import bigframes

    # ... run some bigframes operations ...
    df = bpd.read_gbq("SELECT 1")

    # Upload some local data (triggers a Load Job)
    bpd.read_pandas(pd.DataFrame({'a': [1, 2, 3]}))

    # Get a DataFrame of all BQ jobs run in this session
    history = bigframes.execution_history()

    # Inspect recent queries, their costs, and durations
    print(history[['job_id', 'job_type', 'total_bytes_processed', 'duration_seconds', 'query']])

Verified in:

  1. vs code notebook: screen/8u2yhaRV9iHbDbF
  2. colab notebook: screen/9L8VrP5y9DXhnZz

More test cases and a notebook update will be checked in through separate PRs for easier review.

Fixes #<481840739> 🦕

shuoweil self-assigned this on Apr 8, 2026
shuoweil requested review from a team as code owners on April 8, 2026 22:01
shuoweil force-pushed the shuowei-execution-history branch from 35a379d to eab6cdb

gemini-code-assist (bot) left a comment

Code Review

This pull request implements an execution history feature to track and display BigQuery and local Polars jobs initiated during a session. Key changes include the addition of a JobMetadata dataclass, updates to ExecutionMetrics for job tracking, and a specialized _ExecutionHistory DataFrame for formatted output. Review feedback identifies opportunities to improve error logging in the HTML representation, remove redundant attribute assignments in the metrics logic, and ensure that bytes processed during local executions are consistently aggregated into session-level metrics.
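The last review point (local bytes feeding into session-level metrics) can be illustrated by routing every execution, BigQuery or Polars, through one recording method so the job list and the session totals cannot drift apart. A minimal sketch; `SessionMetrics`, `JobEntry`, `record`, and the engine labels are hypothetical names, not the PR's `ExecutionMetrics` API:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobEntry:
    job_id: str
    job_type: str  # "bigquery" or "polars" (assumed labels)
    bytes_processed: int = 0


@dataclass
class SessionMetrics:
    """Hypothetical session-level counters plus a per-job list."""

    execution_count: int = 0
    bytes_processed: int = 0
    jobs: list[JobEntry] = field(default_factory=list)

    def record(self, entry: JobEntry) -> None:
        # One code path for every engine, so local executions update the
        # same session totals as BigQuery jobs.
        self.execution_count += 1
        self.bytes_processed += entry.bytes_processed
        self.jobs.append(entry)


metrics = SessionMetrics()
metrics.record(JobEntry("bq_job_1", "bigquery", bytes_processed=2_000))
metrics.record(JobEntry("polars_1", "polars", bytes_processed=500))
print(metrics.execution_count, metrics.bytes_processed)  # 2 2500
```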

Comment thread on packages/bigframes/bigframes/session/metrics.py (outdated)
Comment thread on packages/bigframes/bigframes/session/metrics.py
        return _ExecutionHistory

    def _repr_html_(self) -> str | None:
        try:

sycai (Contributor) commented on Apr 9, 2026

Would we be able to pin down the lines that are susceptible to exceptions?

If yes, let's reduce the size of this try block by keeping only the code that is likely to throw exceptions.

shuoweil (Contributor, Author) replied:

Done! I've reduced the size of the try block in `_repr_html_` by moving the imports and the `self.empty` check outside it. It now wraps only the code that might throw exceptions during data formatting and rendering.
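A sketch of the narrowed shape described in this reply, assuming a hypothetical `HistoryView` container and an HTML-escaping table renderer (neither is the PR's code). The cheap empty check sits outside `try`, only row formatting is guarded, and a failure is logged (per the earlier review note on error logging) before returning `None`, which IPython treats as "no HTML repr available" and falls back to the plain-text repr:

```python
from __future__ import annotations

import html
import logging

logger = logging.getLogger(__name__)


class HistoryView:
    """Hypothetical container; only the _repr_html_ shape is the point."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    @property
    def empty(self) -> bool:
        return not self._rows

    def _repr_html_(self) -> str | None:
        # Cheap checks that cannot realistically fail stay outside the try.
        if self.empty:
            return "<div>No executions found.</div>"

        try:
            # Only the data-dependent formatting is guarded.
            body = "".join(
                "<tr>"
                + "".join(f"<td>{html.escape(str(value))}</td>" for value in row.values())
                + "</tr>"
                for row in self._rows
            )
        except Exception:
            logger.exception("Failed to render execution history as HTML")
            return None  # IPython then falls back to the plain-text repr
        return f"<table>{body}</table>"


print(HistoryView([])._repr_html_())  # <div>No executions found.</div>
print(HistoryView([{"job_id": "job_a", "job_type": "polars"}])._repr_html_())
# <table><tr><td>job_a</td><td>polars</td></tr></table>
```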

googleapis deleted a comment from gemini-code-assist (bot) on Apr 9, 2026
shuoweil requested a review from sycai on April 10, 2026 19:53
shuoweil requested a review from a team as a code owner on April 10, 2026 19:56

A contributor commented:

Why do we need the execution history to be a DataFrame itself? This comes with a lot of baggage.

shuoweil (Contributor, Author) replied:

Great point! I've refactored _ExecutionHistory to use composition instead of inheritance. It is now a standard class that wraps a DataFrame internally, avoiding the baggage of subclassing. Users can call .to_dataframe() on it to get the DataFrame representation.
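A minimal sketch of the composition approach, assuming the wrapper holds plain job records and builds a pandas DataFrame only in `to_dataframe()`. The class name `ExecutionHistory` and the record shape are hypothetical; the PR's `_ExecutionHistory` may hold a DataFrame directly, as the reply says.

```python
from __future__ import annotations

from typing import Any


class ExecutionHistory:
    """Hypothetical wrapper that holds job records instead of subclassing a DataFrame."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        # Copy the records so later mutation by the caller does not leak in.
        self._records = [dict(record) for record in records]

    def __len__(self) -> int:
        return len(self._records)

    def __repr__(self) -> str:
        return f"ExecutionHistory(n_jobs={len(self)})"

    def to_dataframe(self):
        """Build a fresh pandas DataFrame on demand."""
        import pandas as pd  # lazy import: only needed when a DataFrame is requested

        return pd.DataFrame(self._records)


history = ExecutionHistory([{"job_id": "job_a", "job_type": "query"}])
print(len(history), history)  # 1 ExecutionHistory(n_jobs=1)
```

Because nothing is inherited, the wrapper exposes only what it defines. A pandas subclass, by contrast, has to keep derived objects (slices, copies) typed correctly through hooks such as `_constructor`, which is plausibly part of the "baggage" the reviewer mentions.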

        if self.empty:
            return "<div>No executions found.</div>"

        cols = ["job_id", "status", "total_bytes_processed", "job_url"]

A contributor commented:

How will this generalize to execution tasks other than BigQuery jobs?

shuoweil (Contributor, Author) replied:

I've added job_type to the HTML display to make it clearer when a job is Polars vs BigQuery. I also made the column selection safe against missing columns, so it can adapt to different execution engines that might provide different sets of metadata.
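A minimal sketch of missing-column-safe selection, assuming a hypothetical `select_display_columns` helper and a preferred column order that adds `job_type` to the `cols` list shown in the diff above:

```python
from __future__ import annotations

from typing import Iterable, Sequence

# Preferred display order (assumed); job_type distinguishes BigQuery vs. Polars.
DISPLAY_COLUMNS = ("job_id", "job_type", "status", "total_bytes_processed", "job_url")


def select_display_columns(
    available: Iterable[str], preferred: Sequence[str] = DISPLAY_COLUMNS
) -> list[str]:
    """Return the preferred columns that are actually present, in preferred order."""
    present = set(available)
    return [col for col in preferred if col in present]


# A local (Polars) execution might report no console URL or status:
print(select_display_columns(["total_bytes_processed", "job_id", "job_type"]))
# ['job_id', 'job_type', 'total_bytes_processed']
```

With a pandas DataFrame `df`, the call would be `df[select_display_columns(df.columns)]`, so an engine that omits `job_url` or `status` still renders instead of raising `KeyError`.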

shuoweil requested a review from TrevorBergeron on April 10, 2026 20:32
shuoweil force-pushed the shuowei-execution-history branch from 342e099 to 8c9deb8
shuoweil and others added 6 commits April 15, 2026 14:23
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# Conflicts:
#	packages/bigframes/bigframes/session/loader.py
shuoweil force-pushed the shuowei-execution-history branch from 227fb76 to 39f4c2a on April 15, 2026 22:15
3 participants