[3b] Async ingestion worker — chunk, embed, index #10

@andrmaz

What to build

Implement the BullMQ worker that consumes the ingestion jobs enqueued when a source is uploaded. The worker normalizes the raw content, splits it into text chunks, generates embeddings via the configured embedding provider, and persists Document and Chunk rows with pgvector embeddings and org ownership metadata. Failed jobs transition to an error state, and the failure reason is surfaced.
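The normalize and split steps could look roughly like the sketch below. The issue does not specify a chunking strategy, so the fixed-size character window with overlap, the function names `normalize` and `splitIntoChunks`, and the default sizes are all assumptions made for illustration.

```typescript
// Hypothetical helpers: names, defaults, and the windowing strategy are
// illustrative assumptions, not taken from the issue.

/** Unify line endings, collapse space/tab runs, cap blank lines, and trim. */
function normalize(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n")
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

/**
 * Split text into windows of at most `maxChars` characters. Each window
 * starts `maxChars - overlap` characters after the previous one, so
 * neighbouring chunks share `overlap` characters of context.
 */
function splitIntoChunks(text: string, maxChars = 1000, overlap = 100): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    // Stop once the current window already reaches the end of the text.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}
```

A character window is the simplest option to reason about; a token-based or sentence-aware splitter may suit the embedding provider better, and either could be swapped in behind the same signature.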

Acceptance criteria

  • Worker picks up enqueued jobs and processes them without blocking the request path.
  • Chunks and embeddings are persisted in the Chunk table with the pgvector column populated.
  • Document and Chunk rows carry an Organization FK.
  • A failed job (e.g. embedding provider error) surfaces an actionable error state on the source record.
  • Integration test covers the happy path: upload → job → indexed chunks.
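One way to make the criteria above testable is to keep the job logic in a plain function with injected dependencies, so the happy path and the failure path can be exercised without Redis or a live embedding provider. The following is a minimal sketch under that assumption; every type, field, and function name in it (`IngestionJob`, `IngestionDeps`, `processIngestionJob`, the status values) is hypothetical and not taken from the issue.

```typescript
// All names below are hypothetical; the real schema and provider API may differ.
type SourceStatus = "pending" | "indexed" | "error";

interface IngestionJob {
  sourceId: string;
  organizationId: string;
  rawContent: string;
}

interface NewChunk {
  organizationId: string; // org ownership, stored as an FK in the real table
  position: number;
  text: string;
  embedding: number[]; // would map to a pgvector column
}

interface IngestionDeps {
  split(text: string): string[];
  embed(texts: string[]): Promise<number[][]>;
  // Persists one Document plus its chunks and returns the document id.
  saveDocument(job: IngestionJob, chunks: NewChunk[]): Promise<string>;
  setSourceStatus(sourceId: string, status: SourceStatus, reason?: string): Promise<void>;
}

async function processIngestionJob(job: IngestionJob, deps: IngestionDeps): Promise<number> {
  try {
    const texts = deps.split(job.rawContent.trim());
    const vectors = await deps.embed(texts);
    if (vectors.length !== texts.length) {
      throw new Error(
        `embedding provider returned ${vectors.length} vectors for ${texts.length} chunks`,
      );
    }
    const chunks: NewChunk[] = texts.map((text, position) => ({
      organizationId: job.organizationId,
      position,
      text,
      embedding: vectors[position],
    }));
    await deps.saveDocument(job, chunks);
    await deps.setSourceStatus(job.sourceId, "indexed");
    return chunks.length;
  } catch (err) {
    // Record a readable reason on the source, then rethrow so the queue
    // also sees the job as failed.
    const reason = err instanceof Error ? err.message : String(err);
    await deps.setSourceStatus(job.sourceId, "error", reason);
    throw err;
  }
}
```

In a BullMQ setup, a function like this could be called from the processor passed to `new Worker(queueName, processor, { connection })`, with `job.data` supplying the `IngestionJob` payload. An error thrown from the processor marks the BullMQ job as failed, which is why the sketch rethrows after updating the source status.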

Blocked by

Labels: enhancement (New feature or request), mvp (Cortex MVP scope)
