Release 4.6.0 · Blosc/python-blosc2

Changes from 4.5.1 to 4.6.0

`CTable.sort_by(view=True)`: zero-copy sorted views

`CTable.sort_by()` now accepts `view=True`, returning a lightweight
sorted view that shares the parent's column data and gathers rows on
demand in sorted order, with no whole-table copy. This is ideal for reading a
sorted slice of a large (possibly on-disk) table:
```
t.sort_by("col", view=True)[:10]      # top-10 without materialising
```
Sorting on a fully indexed column streams directly from the index, so the
table is never materialised. Multi-column sorts and dotted (nested) leaf
names are supported (e.g. `t.sort_by(["trip.begin.lon", "payment.fare"], ascending=[True, False])`).
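To make "gathers rows on demand" concrete, here is a minimal pure-Python sketch of a lazily sorted view. It is not blosc2's implementation: the `SortedView` class and the dict-of-lists table are hypothetical stand-ins, and index streaming is not modeled. The view stores only a permutation of row indices, builds it for several keys with per-key directions by applying stable sorts from the least- to the most-significant key, and reads column values only for the rows a slice asks for.

```python
from typing import Dict, List, Sequence


class SortedView:
    """Hypothetical lazily sorted view over a dict-of-lists table.

    Only a permutation of row indices is stored; the column lists are shared
    with the parent, and values are gathered when a slice is requested.
    """

    def __init__(self, columns: Dict[str, list], by: Sequence[str],
                 ascending: Sequence[bool]):
        self._columns = columns  # shared with the parent, not copied
        nrows = len(next(iter(columns.values())))
        order = list(range(nrows))
        # list.sort is stable (also with reverse=True), so sorting by the
        # least-significant key first and the most-significant key last
        # yields a correct multi-key order with per-key direction.
        for name, asc in reversed(list(zip(by, ascending))):
            order.sort(key=columns[name].__getitem__, reverse=not asc)
        self._order = order

    def __getitem__(self, rows: slice) -> Dict[str, List]:
        picked = self._order[rows]  # row positions for this slice only
        return {name: [col[i] for i in picked]
                for name, col in self._columns.items()}


# Usage: ascending by "lon", then descending by "fare".
table = {"lon": [2, 1, 2, 1], "fare": [7.5, 3.0, 9.0, 4.0]}
view = SortedView(table, by=["lon", "fare"], ascending=[True, False])
top2 = view[:2]
```

Slicing `view[:2]` gathers two rows from each column; the lists in `table` are never copied.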

`where` on dictionary (string) columns

`where` expressions now work over dictionary-encoded (string) columns,
including membership tests such as `"Acme" in company`, so categorical
text columns can be filtered without decoding the whole column.
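A minimal sketch of why such filters need not decode the column, assuming a simple dictionary encoding (each distinct string stored once, rows holding integer codes): the predicate is evaluated once per distinct value, and rows are then selected by comparing codes. The helper names are hypothetical, and the substring check in the usage line is only an example predicate.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def dict_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct string once; each row keeps an integer code."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def filter_rows(dictionary: Sequence[str], codes: Sequence[int],
                predicate: Callable[[str], bool]) -> List[int]:
    """Row positions whose value satisfies `predicate`, without decoding rows.

    The predicate runs once per distinct dictionary entry; rows are then
    selected by integer-code membership.
    """
    matching = {code for code, value in enumerate(dictionary) if predicate(value)}
    return [row for row, code in enumerate(codes) if code in matching]


# Usage with a substring check as the example predicate.
company = ["Acme Corp", "Globex", "Acme Corp", "Initech", "Acme Ltd"]
dictionary, codes = dict_encode(company)
rows = filter_rows(dictionary, codes, lambda s: "Acme" in s)
```

With few distinct values, the string work is proportional to the dictionary size rather than the row count.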

`b2view` is now an opt-in extra

The `b2view` terminal browser and its TUI stack (`textual`,
`textual-plotext`) are no longer core dependencies: a plain
`pip install blosc2` no longer pulls them in, keeping the compression library
lean (and dropping dependencies that are unusable under wasm32, which has no TTY).
Install the viewer with `pip install "blosc2[tui]"`, or
`pip install "blosc2[hires]"` to also get the high-res `h` view. The
`b2view` command prints this hint if the dependencies are missing.
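One common way for a command to detect missing optional dependencies without importing them is `importlib.util.find_spec`. The sketch below assumes that approach and that `textual-plotext` is importable as `textual_plotext`; the function names and message text are hypothetical, not the actual `b2view` code.

```python
import importlib.util
from typing import List, Optional, Sequence

# Assumed import names for the viewer's TUI dependencies.
TUI_MODULES = ("textual", "textual_plotext")


def missing_modules(names: Sequence[str] = TUI_MODULES) -> List[str]:
    """Names of top-level modules that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tui_hint(names: Sequence[str] = TUI_MODULES) -> Optional[str]:
    """Install hint if any dependency is missing, else None (hypothetical text)."""
    missing = missing_modules(names)
    if not missing:
        return None
    return (f"b2view needs {', '.join(missing)}; install with "
            'pip install "blosc2[tui]" (or "blosc2[hires]" for the high-res view)')
```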

`group_by`: flexible aggregation naming

`CTable.group_by(...).agg()` now accepts a list of `(column, ops)` pairs
and explicit output names (pandas-style keyword arguments), alongside the
existing auto-suffixed mapping; the forms can be combined:
```
g.agg({"sales": ["sum", "mean"]})              # auto: sales_sum, sales_mean
g.agg([(t.sales, ["sum", "mean"])])            # auto, but accepts Column objects
g.agg(revenue=("sales", "sum"))                # explicit: revenue
g.agg({"sales": "sum"}, n=("*", "size"))       # combined, with a named row count
```
The list-of-pairs and named forms accept `Column` objects (`t.sales`); the
mapping form cannot, because `Column` is unhashable and therefore cannot be a
dict key.
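All three spellings can be reduced to `(output_name, column, op)` triples. The sketch below is a hypothetical normalizer, not blosc2's code; it assumes auto-generated names take the form `<column>_<op>` (as in the `sales_sum` comment in the example above) and that a column object exposes a `.name` attribute. The `FakeColumn` stand-in also shows why a class that defines `__eq__` without `__hash__` cannot be a dict key.

```python
from typing import Any, List, Tuple


class FakeColumn:
    """Stand-in for a column object. Defining __eq__ (e.g. for elementwise
    comparisons) without __hash__ makes Python set __hash__ to None."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other: Any):
        return NotImplemented  # placeholder for an elementwise comparison


def _name(column: Any) -> str:
    # Accept a plain name or an object with a .name attribute (assumed).
    return getattr(column, "name", column)


def normalize_agg(spec: Any = None, **named: Tuple[Any, str]) -> List[Tuple[str, str, str]]:
    """Hypothetical: turn agg() arguments into (output_name, column, op) triples."""
    # Mapping form -> its items; list-of-pairs form -> used as-is.
    pairs = list(spec.items()) if isinstance(spec, dict) else list(spec or [])
    triples = []
    for column, ops in pairs:
        op_list = [ops] if isinstance(ops, str) else list(ops)
        for op in op_list:
            triples.append((f"{_name(column)}_{op}", _name(column), op))  # auto name
    for out_name, (column, op) in named.items():
        triples.append((out_name, _name(column), op))  # explicit output name
    return triples
```

Writing `{FakeColumn("sales"): "sum"}` raises `TypeError`, while `normalize_agg([(FakeColumn("sales"), ["sum"])])` works because a list of pairs never hashes its first elements.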
Aggregation ops may also be given as the matching blosc2 reduction functions
(`blosc2.sum`, `mean`, `min`, `max`, `argmin`, `argmax`), matched by
identity, e.g. `g.agg([(t.sales, [blosc2.sum, "mean"])])`. This is a
naming shorthand only; arbitrary/UDF callables (and look-alikes such as
`np.sum` or a user function named `sum`) are rejected rather than silently
misinterpreted.

`group_by` / `group_reduce`: tri-state `sort=`

Vectorized dictionary group ordering: `group_by()` result building now
batch-decodes dictionary (string) keys in one pass (`decode_batch`) instead of
calling `decode()` once per group, making high-cardinality string group-bys
dramatically faster (end-to-end `group_by().size()` dropped from seconds to
milliseconds on ~100k-group workloads).
`sort=` is now a tri-state (`None` / `True` / `False`) on both
`CTable.group_by()` and `blosc2.group_reduce()`:
- `True`: always return groups sorted by key.
- `False`: never sort; deterministic but unspecified order.
- `None` (the new default): auto mode, which sorts only when cheap. Integer and
  dictionary keys are sorted (free / vectorized); float and multi-key results,
  whose only ordering is an O(G log G) Python sort over every distinct group,
  are left unsorted to avoid a cost that can rival the grouping itself on
  high-cardinality data.
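The auto rule for `CTable.group_by()` can be summarized as a small decision function. This is a hypothetical sketch of the rule as described, not blosc2's code; the key-kind labels are invented, and `blosc2.group_reduce()` decides per kernel what counts as cheap (its float kernel, for example, sorts under `None`).

```python
from typing import Optional, Sequence

CHEAP_KINDS = {"int", "dict"}  # hypothetical labels for integer / dictionary keys


def resolve_sort(sort: Optional[bool], key_kinds: Sequence[str]) -> bool:
    """Decide whether CTable.group_by() results get key-sorted."""
    if sort is not None:
        return sort  # explicit True/False always wins
    # Auto (None): sort only a single integer or dictionary key; float keys and
    # multi-key groupings would need an O(G log G) sort over all groups.
    return len(key_kinds) == 1 and key_kinds[0] in CHEAP_KINDS
```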
Behavior changes (the two APIs had different prior defaults, so they move
in opposite directions):
- `CTable.group_by()` previously always returned sorted results. Under the
  new `None` default, float-key and multi-key group-bys are no longer
  key-sorted by default; pass `sort=True` to restore sorted output. This is
  a deliberate divergence from pandas (which defaults to `sort=True`), suited
  to blosc2's large / on-disk datasets.
- `blosc2.group_reduce()` previously defaulted to `sort=False` (unsorted).
  Under the new `None` default its cheap kernels now sort by default, most
  visibly for float keys, which previously came out in hash order. Integer
  keys were already ascending; the generic Python fallback stays unsorted.
  Pass `sort=False` to opt out.

Accelerated reductions from index summaries

`min`/`max` on indexed `Column`s, and `argmin`/`argmax` inside `group_by`, are
now accelerated using the index's per-block min/max summaries: when an
index is available, these reductions run from the precomputed summaries instead
of decompressing the underlying data, which is dramatically faster on large
columns. A fast path also builds min/max envelope plots from any index.
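A sketch of the underlying idea, assuming per-block minima are stored alongside the data (NaNs ignored): the column minimum is simply the smallest block minimum, so no data block is read. This sketch still reads the single block that holds the minimum to find its offset for `argmin`; blosc2 may locate the position differently. The function names and the `load_block` callback are hypothetical; `max`/`argmax` are symmetric.

```python
from typing import Callable, List, Sequence, Tuple


def block_summaries(data: Sequence[float], block_len: int) -> List[float]:
    """Per-block minima, as an index might precompute them (ignores NaNs)."""
    return [min(data[i:i + block_len]) for i in range(0, len(data), block_len)]


def min_from_summaries(block_mins: Sequence[float]) -> float:
    # The column minimum is the smallest block minimum: no data block is read.
    return min(block_mins)


def argmin_from_summaries(block_mins: Sequence[float], block_len: int,
                          load_block: Callable[[int], Sequence[float]]) -> Tuple[float, int]:
    """Return (minimum, global row position), reading only one data block."""
    # First block whose summary holds the minimum (first occurrence wins ties).
    best = min(range(len(block_mins)), key=block_mins.__getitem__)
    block = load_block(best)  # the only decompression in this sketch
    offset = min(range(len(block)), key=block.__getitem__)
    return block_mins[best], best * block_len + offset
```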
The last `group_by` operation is memoized and reused when the same
grouping is requested again, avoiding recomputation in interactive / repeated
workflows (e.g. `b2view`).
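For a pure function, Python's `functools.lru_cache(maxsize=1)` gives the same keep-only-the-last-result behavior. A hypothetical sketch keyed by a table name and grouping column; a real memo would also have to notice when the table's data changes.

```python
import functools
from collections import Counter
from typing import Dict

# Hypothetical in-memory tables, looked up by name.
TABLES = {"trips": {"city": ["NYC", "SF", "NYC"], "vendor": [1, 2, 1]}}


@functools.lru_cache(maxsize=1)  # keep only the most recent grouping
def group_sizes(table: str, by: str) -> Dict:
    """Stand-in for an expensive group_by(by).size(); treat the result as read-only."""
    return dict(Counter(TABLES[table][by]))
```

Repeating the same request is served from the cache; requesting a different grouping evicts the previous one.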

b2view: group-by, sort, and richer plots

Interactive group-by (`G`): group a `CTable` by a column (integer, string,
or now float keys) directly in the viewer, with a three-list / two-column
menu; while grouped, `S`/`R` operate on the grouped result and the data
panel's subtitle shows a G(roup) chip. The last grouping is memoized for
instant reuse.
Sort by column (`S`): sort a `CTable` by a fully indexed column via a
dropdown (`R` toggles reverse) as a zero-copy `sort_by(view=True)` that streams
from the index: the table is never materialised, `Esc` restores the original
order, and a SORTED chip shows in the status bar. Non-indexed columns can
now be sorted too. Sort and filter are mutually exclusive; a row window
composes over a sort, and a filter is preserved across Sort / Group.
Better plots of grouped/sorted views: a grouped view plots bars for a
categorical key and lines for a numeric key; numeric-key group plots
render as stem/impulse charts rather than misleading connected lines. Bar
plots gain a hi-res counterpart mirroring the line/scatter plots, and `+`/`-`
zoom anchored at the view's left edge.
`--max` maximizes the current panel, and `Esc` is now the single,
consistent way to back out of every modal.

Other / bug fixes

C-Blosc2 upgraded to 3.1.5.
Open-file cache correctness: cached open handles are now validated against
the file's fingerprint (`st_mtime_ns`, `st_size`), and cached index handles are
released when a table closes, so a file changed underneath an open handle is no
longer served stale.
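A minimal sketch of fingerprint-validated handle caching, assuming a cache keyed by path. Only the `(st_mtime_ns, st_size)` fingerprint comes from the note above; the class and method names are hypothetical. A handle is reused only while the fingerprint it was opened under still matches the file on disk; otherwise the stale handle is closed and the file reopened. Like any stat-based check, it cannot see a rewrite that preserves both size and modification time.

```python
import os
from typing import Any, Callable, Dict, Tuple

Fingerprint = Tuple[int, int]


def fingerprint(path: str) -> Fingerprint:
    """(st_mtime_ns, st_size) of the file as it is on disk right now."""
    st = os.stat(path)
    return st.st_mtime_ns, st.st_size


class OpenFileCache:
    """Hypothetical path -> handle cache that refuses to serve stale handles."""

    def __init__(self, opener: Callable[[str], Any]):
        self._opener = opener
        self._entries: Dict[str, Tuple[Fingerprint, Any]] = {}

    def get(self, path: str) -> Any:
        current = fingerprint(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == current:
            return entry[1]  # unchanged since it was opened: reuse the handle
        self.release(path)  # missing or stale: drop any old handle first
        handle = self._opener(path)
        self._entries[path] = (current, handle)
        return handle

    def release(self, path: str) -> None:
        """Forget a path and close its handle, e.g. when its table is closed."""
        entry = self._entries.pop(path, None)
        if entry is not None and hasattr(entry[1], "close"):
            entry[1].close()
```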
NumPy 2.5 compatibility: adjusted for deprecations in NumPy 2.5.
Substantially reduced test-suite runtime, and emscripten builds no longer
attempt to spawn subprocesses (unsupported there).
