PARCE

Programmable Agent for Retrieving Contextualized Experiments

PARCE is an agentic workflow that fetches public omics data and produces structured JSON narratives describing how the data was obtained. Each narrative interleaves human-readable descriptions with URI references to raw data files, making it suitable for training multimodal autoregressive embedding models.
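To make the interleaving concrete, here is a minimal sketch of what such a narrative might look like. The field names (`accession`, `segments`, `kind`, `value`) and the `example.org` URI are illustrative assumptions, not PARCE's actual ExperimentNarrative schema. Standard-library dataclasses stand in for the project's Pydantic models so the snippet runs without extra dependencies.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Segment:
    """One piece of a narrative: prose or a pointer to a raw data file."""
    kind: str   # "text" or "uri" (hypothetical labels)
    value: str


@dataclass
class Narrative:
    """Hypothetical stand-in for ExperimentNarrative."""
    accession: str
    segments: list[Segment] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so segments become dicts.
        return json.dumps(asdict(self), indent=2)


# Placeholder content only; none of this describes the real dataset.
narrative = Narrative(
    accession="GSE164378",
    segments=[
        Segment("text", "Placeholder description of how the samples were collected."),
        Segment("uri", "https://example.org/placeholder/raw_counts.h5"),
        Segment("text", "Placeholder description of the processing applied."),
    ],
)

print(narrative.to_json())
```

Keeping the segments in one ordered list preserves the position of each file reference relative to the prose around it.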

Architecture

User / CLI
    │
    ▼
main.py ──► agent/curator.py ──► AzureAIAgentClient
                                        │
                              ┌─────────┴─────────┐
                              │  Azure AI Foundry  │
                              │  (any model)       │
                              └─────────┬─────────┘
                                        │
                            ┌───────────┼───────────┐
                            ▼           │           ▼
                    tool_call:          │    structured output:
                 geo_fetcher.py         │    ExperimentNarrative
                    (tools/)            │       (models/)
                            │           │
                            ▼           │
                     metadata JSON ─────┘

The agent uses Azure AI Foundry as its model gateway. Any model deployed in your Foundry project -- Mistral, GPT-4o, DeepSeek, Llama, etc. -- can be used by changing a single environment variable (AZURE_AI_MODEL_DEPLOYMENT_NAME). Tools, prompts, and Pydantic output schemas remain unchanged.

Directory Structure

parce/
├── pyproject.toml              # Dependencies and build config
├── .env.example                # Template for required env vars
├── src/
│   └── parce/
│       ├── main.py             # Entry point
│       ├── agent/
│       │   ├── curator.py      # Agent factory (AzureAIAgentClient)
│       │   └── prompts.py      # System prompts
│       ├── tools/
│       │   └── geo_fetcher.py  # GEO metadata fetcher (stub)
│       ├── models/
│       │   └── narrative.py    # Pydantic output schemas
│       └── config/
│           └── settings.py     # Env-based configuration
├── data_pipelines/             # Future: Spark / ADLS scripts
├── data/                       # Local test data (gitignored)
└── tests/
    └── test_models.py

Key design decisions:

src-layout prevents accidental imports from the project root.
agent/ is decoupled from tools/ so new data sources (e.g. TCellAtlas, SRA) are added as new tool files without touching orchestration logic.
models/ schemas serve double duty: structured output for the agent (response_format) and validation for downstream consumers.
data_pipelines/ lives at the repo root because Spark jobs are submitted independently from the Python package.
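The decoupling of agent/ from tools/ can be illustrated with a small registry pattern. This is a sketch under assumptions, not how curator.py is actually written: `TOOLS`, `register_tool`, `run_tool`, and both fetcher functions are hypothetical names, and the fetchers return placeholder dictionaries instead of querying any service.

```python
from typing import Callable

# Registry mapping a tool name to the function that implements it.
ToolFn = Callable[[str], dict]
TOOLS: dict[str, ToolFn] = {}


def register_tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that adds a fetcher to the registry under `name`."""
    def decorator(fn: ToolFn) -> ToolFn:
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("geo")
def fetch_geo_metadata(accession: str) -> dict:
    # Stub: a real fetcher would query GEO; this returns placeholder data.
    return {"source": "GEO", "accession": accession}


@register_tool("sra")
def fetch_sra_metadata(accession: str) -> dict:
    # Adding a data source is one new function; run_tool() is unchanged.
    return {"source": "SRA", "accession": accession}


def run_tool(name: str, accession: str) -> dict:
    """Orchestration side: look the tool up by name and call it."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](accession)


print(run_tool("geo", "GSE164378"))
```

The orchestration function only knows tool names, so a new fetcher file never requires edits to it.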

Prerequisites

Python 3.11+
Azure CLI installed
An Azure AI Foundry project with at least one model deployed (e.g. Mistral Large, GPT-4o, DeepSeek-R1)

Local Setup

1. Authenticate with Azure

az login

This lets AzureCliCredential obtain tokens for your Foundry project without managing API keys.

2. Configure environment

cp .env.example .env

Edit .env with your Foundry project endpoint and deployment name:

AZURE_AI_PROJECT_ENDPOINT=https://<your-resource>.services.ai.azure.com/api/projects/<project-id>
AZURE_AI_MODEL_DEPLOYMENT_NAME=mistral-large

You can find the project endpoint on your Azure AI Foundry project's settings page.
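As a rough illustration of env-based configuration, the sketch below reads the two variables and fails early if either is missing. The `Settings` class and `load_settings` function are hypothetical and may not match what config/settings.py contains. The sketch also assumes the variables are already present in the process environment, since reading `os.environ` does not load a .env file by itself.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_KEYS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")


@dataclass(frozen=True)
class Settings:
    project_endpoint: str
    model_deployment_name: str


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from `env`, defaulting to the process environment."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        project_endpoint=env["AZURE_AI_PROJECT_ENDPOINT"],
        model_deployment_name=env["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
    )


# Demo with an explicit mapping so the example does not depend on your shell.
demo = load_settings(
    {
        "AZURE_AI_PROJECT_ENDPOINT": "https://example.invalid/api/projects/demo",
        "AZURE_AI_MODEL_DEPLOYMENT_NAME": "mistral-large",
    }
)
print(demo.model_deployment_name)
```

Passing the mapping as a parameter keeps the loader easy to test without touching real environment variables.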

3. Install

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Note: The agent-framework packages are currently in pre-release. If installation fails, run:
pip install agent-framework agent-framework-azure-ai --pre
pip install -e ".[dev]"

4. Run the agent

parce
# or
python -m parce.main

The agent will fetch mock metadata for GSE164378 and return a structured ExperimentNarrative JSON.

5. Run tests

pytest

Switching Models

Because PARCE uses Azure AI Foundry as a model gateway, swapping the underlying LLM is a one-line change in .env:

# Mistral
AZURE_AI_MODEL_DEPLOYMENT_NAME=mistral-large

# OpenAI
AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o

# DeepSeek
AZURE_AI_MODEL_DEPLOYMENT_NAME=DeepSeek-R1

# Meta Llama
AZURE_AI_MODEL_DEPLOYMENT_NAME=Meta-Llama-3-70B

The deployment name must match a model you have deployed in your Foundry project's model catalog. No code changes are required -- all providers produce a standard Agent with the same interface.

For models not in the Azure AI Foundry catalog (e.g. Anthropic Claude), the Microsoft Agent Framework offers dedicated provider clients (such as AnthropicChatClient) that expose the same agent interface.

Future Roadmap

Data Engineering Pipelines (`data_pipelines/`)

FASTQ-to-Parquet conversion using PySpark on Azure Databricks
Azure Data Lake Storage Gen2 integration for persisting narratives and raw genomic data at scale
Batch orchestration for processing large T-cell transcriptomics datasets

Additional Data Sources

TCellAtlas fetcher tool
SRA direct metadata fetcher
CellxGene Census integration

Embedding Model Training

The structured narratives produced by PARCE will serve as training data for a multimodal autoregressive embedding model that jointly learns from experimental metadata and raw omics signals.
