Parallelism of `FeatureUnion` for xarray_filters #22

I just closed issue #10 based on parameterizing chained `MLDataset` transformations, deferring the `FeatureUnion` discussion there to this separate issue.  
* `FeatureUnion` in scikit-learn is a transformer that uses scikit-learn's parallelism (the `n_jobs` option, within one machine) to run several transformers on the same input and concatenate their outputs; with one transformer per column, that means running a transform for each column of a feature matrix.  
* `dask_searchcv` has a `FeatureUnion` based on `dask.distributed` (single- or multi-node parallelism) that follows the same usage patterns.
* `FeatureUnion` is important to the `elm` / `xarray_filters` goals because most of our other parallelism concerns tools for fitting multiple models, where a `Pipeline`-like instance is the embarrassingly parallel task being automated.  Some important workflows in our climate science and satellite imagery use cases may be slow in the per-column processing step(s), and `FeatureUnion` can speed those up, e.g. a `Pipeline` that computes a histogram or fits a Gaussian process on each column individually as a preprocessing step.  
* `FeatureUnion` is associated with scikit-learn, so people generally think of it in ML contexts, but its parallelism approach also has benefits outside of ML, e.g. preprocessing each column of a large array before visualization or summary statistics.  This is a documentation need for however we wrap `FeatureUnion` in `xarray_filters`/`elm`: make sure its usage is explained both inside and outside of ML contexts.
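
For reference, here is a minimal sketch of the per-column pattern described above, using plain scikit-learn on a NumPy array rather than an `MLDataset`. The helper names `select_column` and `make_column_union` are hypothetical, and `StandardScaler` only stands in for a heavier per-column step such as a histogram or Gaussian process; how this would be wrapped in `xarray_filters`/`elm` is still open.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def select_column(X, index):
    """Return column `index` of X as a 2-D array of shape (n_samples, 1)."""
    return X[:, [index]]


def make_column_union(n_columns, n_jobs=2):
    """Build a FeatureUnion with one small pipeline per column.

    Hypothetical helper for illustration; not part of xarray_filters or elm.
    """
    branches = []
    for i in range(n_columns):
        branch = Pipeline([
            # Pick out a single column for this branch.
            ("select", FunctionTransformer(select_column, kw_args={"index": i})),
            # Placeholder for an expensive per-column preprocessing step.
            ("scale", StandardScaler()),
        ])
        branches.append((f"col_{i}", branch))
    # n_jobs lets joblib run the branches in parallel on one machine.
    return FeatureUnion(branches, n_jobs=n_jobs)


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

union = make_column_union(n_columns=X.shape[1])
# Each branch transforms its own column; outputs are concatenated column-wise.
Xt = union.fit_transform(X)
```

Nothing in this sketch is ML-specific: the result `Xt` is just the column-wise concatenation of the per-column outputs, so the same pattern applies to preprocessing before visualization or summary statistics.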
