Sequence conservation as a first-class feature (df_seq / df_parts input)

## Problem

`comp_seq_cons` currently requires a pre-existing MSA file on disk and returns a position-indexed DataFrame independent of AAanalysis' core data structures. Users who already have a `df_seq` or `df_parts` cannot compute conservation without manually exporting sequences, generating an MSA, and round-tripping through files.

## Goal

Expose sequence conservation as a first-class operation on AAanalysis data: accept `df_seq` or `df_parts` as input, compute per-position conservation, and return results in a format compatible with the existing CPP / feature-map pipeline (so conservation can be used as a feature, an annotation, or a filter alongside physicochemical features).
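To make the target format concrete, here is a minimal sketch of how per-column alignment scores could be mapped back to residue positions of a query sequence and labeled by part. Everything in it is an assumption for illustration, not an existing AAanalysis API: the helper name `map_scores_to_parts`, the `tmd_stop` argument, 1-based inclusive TMD boundaries, a fixed JMD length, and `-` as the alignment gap symbol.

```python
def map_scores_to_parts(aligned_query, col_scores, tmd_start, tmd_stop,
                        jmd_len=10, gap="-"):
    """Map alignment-column scores to ungapped query positions (hypothetical helper).

    Positions are 1-based; tmd_start and tmd_stop are assumed to be inclusive.
    Residues outside JMD-N, TMD, and JMD-C are dropped.
    """
    if len(aligned_query) != len(col_scores):
        raise ValueError("aligned_query and col_scores must have equal length")
    rows = []
    pos = 0  # position in the ungapped query sequence
    for aa, score in zip(aligned_query, col_scores):
        if aa == gap:
            continue  # columns where the query has a gap carry no query position
        pos += 1
        if tmd_start - jmd_len <= pos < tmd_start:
            part = "jmd_n"
        elif tmd_start <= pos <= tmd_stop:
            part = "tmd"
        elif tmd_stop < pos <= tmd_stop + jmd_len:
            part = "jmd_c"
        else:
            continue
        rows.append({"pos": pos, "aa": aa, "part": part, "cons": score})
    return rows


rows = map_scores_to_parts("MK-LAV", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                           tmd_start=3, tmd_stop=4, jmd_len=1)
```

The resulting list of records can be passed to `pandas.DataFrame(rows)` to obtain a long-format table, one row per entry position. Whether the final schema should be long-format or `df_feat`-like is left open by this sketch.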

## Tasks

- Add `comp_seq_cons(df_seq=...)` and `comp_seq_cons(df_parts=...)` input modes that internally trigger MSA generation per entry (see related issue on Biopython interface)
- Define output schema: per-entry, per-position conservation aligned to the existing position numbering used in CPP (`tmd_start`, `jmd_n`, `tmd`, `jmd_c` parts)
- Support multiple conservation metrics (Shannon entropy, Jensen-Shannon divergence, von Neumann entropy, position-specific scoring) via a `metric` parameter
- Allow conservation to be merged into a `df_feat`-compatible output so it can be plotted with `plot_feature_map`
- Handle gaps (`_`) consistently with the rest of the AAanalysis API (`accept_gaps=True`)
- Add caching for MSA-derived conservation scores (avoid re-running expensive alignments)
- Add tests with a small synthetic alignment plus one real example (e.g., `DOM_GSEC` benchmark)
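As a starting point for the `metric` parameter and the synthetic-alignment test, the following sketch scores each alignment column with either normalized Shannon entropy or Jensen-Shannon divergence. It rests on several assumptions that are not taken from the current implementation: the function names, the metric keys `"shannon"` and `"jsd"`, a uniform background distribution for JSD, and the choice to ignore every character outside the 20 canonical amino acids (which covers gaps regardless of the gap symbol).

```python
import math
from collections import Counter

AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids


def _entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def column_conservation(column, metric="shannon"):
    """Score one alignment column; higher means more conserved.

    Gaps and non-canonical characters are ignored (an assumption of this sketch).
    """
    residues = [aa for aa in column.upper() if aa in AA_ALPHABET]
    if not residues:
        return 0.0  # column holds only gaps or unknown characters
    counts = Counter(residues)
    n = len(residues)
    p = [counts.get(aa, 0) / n for aa in AA_ALPHABET]
    if metric == "shannon":
        # 1 - H(p) / log2(20): 1.0 for an invariant column, 0.0 for a uniform one
        return 1.0 - _entropy(p) / math.log2(len(AA_ALPHABET))
    if metric == "jsd":
        # Jensen-Shannon divergence to a uniform background (assumed background)
        q = [1.0 / len(AA_ALPHABET)] * len(AA_ALPHABET)
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return _entropy(m) - 0.5 * _entropy(p) - 0.5 * _entropy(q)
    raise ValueError(f"Unknown metric: {metric!r}")


def msa_conservation(aligned_seqs, metric="shannon"):
    """Return one conservation score per alignment column."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("All aligned sequences must have the same length")
    return [column_conservation("".join(col), metric)
            for col in zip(*aligned_seqs)]


scores = msa_conservation(["ACD", "ACE", "A-F"])
```

With this toy alignment the first two columns are fully conserved and the third is not. A real implementation would likely want an empirical background distribution for JSD and an explicit gap penalty; both are deliberately left out here.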

## How this improves AAanalysis

- Closes the gap between sequence-based and feature-based interpretation: conservation becomes another "scale" in the CPP framework rather than a separate workflow
- Enables direct comparison between physicochemical importance (CPP) and evolutionary importance (conservation) on the same plot
- Removes a manual file-handling step that breaks the otherwise-clean Python API

## Acceptance criteria

- `aa.comp_seq_cons(df_seq=df_seq)` returns a DataFrame compatible with `plot_feature_map`
- At least two conservation metrics are supported with documented trade-offs
- End-to-end tutorial showing conservation overlaid on a CPP feature map
- Existing file-based `comp_seq_cons(msa_file=...)` API continues to work (no breaking change)
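One way to keep the file-based call working while adding the new inputs is to require exactly one input source and dispatch on it. The sketch below illustrates that idea only; `resolve_cons_input` is a hypothetical helper, and the rule that the three inputs are mutually exclusive is an assumption rather than a decided design.

```python
def resolve_cons_input(msa_file=None, df_seq=None, df_parts=None):
    """Return (name, value) of the single provided input source (hypothetical helper)."""
    candidates = (("msa_file", msa_file), ("df_seq", df_seq), ("df_parts", df_parts))
    given = [(name, value) for name, value in candidates if value is not None]
    if len(given) != 1:
        names = [name for name, _ in given] or "none"
        raise ValueError(
            "Provide exactly one of 'msa_file', 'df_seq', or 'df_parts' "
            f"(got: {names})"
        )
    return given[0]


# The existing file-based call resolves to the legacy code path unchanged.
source, value = resolve_cons_input(msa_file="alignment.fasta")
```

Because `msa_file` remains a valid keyword, existing callers are unaffected, and ambiguous calls fail early with a clear message instead of silently preferring one input.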
