Kiro Ception

Your AI now remembers everything you've ever done with it, across every machine you own. Finally, an elephant-grade memory for your coding assistant, minus the 12,000-pound footprint.

Kiro Ception gives Kiro a long-term memory: persistent recall that spans every session and window, both the CLI and the IDE, and even multiple machines. Your agent remembers what you discussed yesterday, last month, or six months ago, in any project, on any computer you work from. It automatically indexes all conversation history in the background and provides instant hybrid search (semantic + keyword), so you can find past discussions, decisions, and implementations by meaning, keywords, date, or any combination.

"We discussed this already..." "What was that approach we used last week?" "Didn't we solve this exact problem in the other project?" "How did I usually set up CI pipelines?"

All things you can now just ask, and actually get an answer.

How It Works

Kiro Ception is an MCP Power that runs as a background service alongside your Kiro IDE. It:

Discovers all Kiro CLI and IDE session files on your machine
Extracts meaningful messages (filtering out system prompts and boilerplate, condensing long code blocks into [code:lang] placeholders)
Embeds each message into a vector representation using your configured model
Indexes everything into an in-memory numpy matrix for instant hybrid search (semantic + FTS5 keyword)
Serves search results via MCP tools that Kiro can call naturally during conversation
Federates across machines, so you can search your laptop and desktop simultaneously with (optionally encrypted) peer-to-peer queries
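The extraction step above (condensing long code blocks into [code:lang] placeholders) can be sketched roughly as follows. This is a minimal illustration, not the project's actual extractor: the function name condense_code_blocks, the max_chars cutoff, and the "text" fallback tag are all assumptions made for the example.

```python
import re

# Three backticks, built programmatically so this example nests cleanly
# inside a fenced block.
TICKS = "`" * 3

# Hypothetical pattern: opening fence, optional language tag, body, closing fence.
_FENCE = re.compile(TICKS + r"([A-Za-z0-9_+-]*)[^\n]*\n.*?" + TICKS, re.DOTALL)


def condense_code_blocks(text: str, max_chars: int = 200) -> str:
    """Replace fenced code blocks longer than max_chars with [code:lang]."""

    def _substitute(match: re.Match) -> str:
        if len(match.group(0)) <= max_chars:
            return match.group(0)  # short blocks are kept verbatim
        lang = match.group(1) or "text"  # assumed fallback for untagged blocks
        return f"[code:{lang}]"

    return _FENCE.sub(_substitute, text)
```

Condensing before embedding keeps the vector focused on the discussion around the code rather than on the code body itself.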

Sessions are processed newest first, so your most recent conversations are searchable within seconds of startup, even while older history is still being indexed in the background.

Search results include surrounding context (messages before/after each match), relevance scores, workspace origin, and pagination, so Kiro gets the full picture of what was discussed.

Architecture Highlights

Two-process model: A thin MCP proxy (stdio) spawns a separate engine subprocess that holds the index in RAM. Code changes are detected via fingerprinting — the engine auto-restarts with fresh code when you git pull && uv sync.
Non-blocking: Heavy work (indexing, embedding) runs in the engine's background threads. The MCP proxy responds instantly.
Hybrid search: Combines semantic vector similarity (70%) with FTS5 full-text keyword search (30%). Find things by meaning and exact names.
Recency-aware: Recent conversations rank higher automatically. The decay curve scales with your history depth, no manual tuning.
Multi-window efficient: All MCP proxy instances share one engine process. A PID registry tracks connected clients — the engine auto-shuts-down when all clients die.
Multi-machine: Optional peer federation searches across all your computers simultaneously, with AES-256-GCM encrypted transport when a shared secret is configured.
Crash-safe: SQLite with WAL mode. Lose at most one in-flight message on Ctrl+C/crash/quit.
Instant cold-start: Loads from existing cache in under 1 second. No waiting for re-indexing after restarts.
No build step: Uses editable install (python -m kiro_ception.engine_main). Source changes are picked up immediately — no recompile needed.
Auto-migrating: Schema upgrades run automatically on startup, so updates never require deleting your cache.
Observable: Built-in status dashboard, indexing progress monitoring, hot-reloadable config, and health diagnostics, all accessible to the agent or via browser.
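To make the 70/30 hybrid weighting above concrete, here is a small pure-Python sketch. The weights come from the list above; everything else is an assumption for illustration: the function names, the idea that keyword scores are already normalized to the 0-1 range, and the omission of the recency adjustment. The real engine uses a numpy matrix rather than per-vector loops.

```python
import math

SEMANTIC_WEIGHT = 0.7  # from the 70% / 30% split described above
KEYWORD_WEIGHT = 0.3


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; equals the dot product when both vectors are unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(semantic: float, keyword: float) -> float:
    """Blend the two signals; assumes both are already in the 0-1 range."""
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword


def rank(query_vec, doc_vecs, keyword_scores):
    """doc_vecs: {doc_id: vector}; keyword_scores: {doc_id: score in 0-1}."""
    scored = [
        (doc_id, hybrid_score(cosine_similarity(query_vec, vec),
                              keyword_scores.get(doc_id, 0.0)))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these weights, a message that matches only by meaning outranks one that matches only by exact keyword, while a message that matches both ways ranks highest.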

Installation

Prerequisites

Kiro - the AI-powered IDE
Git - for cloning/updating the power
Python 3.11+ (3.12, 3.13 also supported and tested officially)
uv - fast Python package manager

Install as a Kiro Power from Local Clone (Recommended)

Clone the repo and install as a local power. This gives you immediate updates via git pull and the full Power experience (keyword triggers, automatic activation, POWER.md guidance):

git clone https://github.com/DevOps-Nirvana/Kiro-Ception.git
cd Kiro-Ception
uv sync

Then in Kiro IDE: Powers panel → Add power from Local Path → select the Kiro-Ception folder you just cloned.

To update later:

cd Kiro-Ception
git pull
uv sync

Kiro picks up changes automatically — the MCP proxy detects source code changes via fingerprinting and restarts the engine process with the new code. No manual restart needed.

Install as a Kiro Power from GitHub (Alternative)

If you prefer not to manage a local clone:

In Kiro IDE: Powers panel → Add power from GitHub
Enter: https://github.com/DevOps-Nirvana/Kiro-Ception
Click Install

Note: Due to current bugs in how Kiro handles MCP servers within Powers installed from GitHub, the local clone method above is more reliable. The GitHub install may have issues with server startup or reconnection and/or with updating due to a possible split-brain scenario.

Manual MCP Setup (Last Resort)

Warning: Installing as a Power (above) is strongly recommended. The POWER.md file contains keyword triggers and usage guidance that help Kiro automatically activate search when you reference past conversations. With an MCP-only setup, you'll need to explicitly ask Kiro to search history; it won't trigger on its own from phrases like "as we discussed" or "what did we do last time".

If you still prefer manual configuration without the Power wrapper, add the following to your Kiro MCP configuration (~/.kiro/settings/mcp.json):

{
  "mcpServers": {
    "kiro-ception": {
      "command": "uv",
      "args": ["tool", "run", "--from", "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception"]
    }
  }
}

This uses uv tool run to fetch and run the package directly from GitHub, so no local clone is needed.

Alternatively, if you've cloned the repo locally:

{
  "mcpServers": {
    "kiro-ception": {
      "command": "/path/to/Kiro-Ception/.venv/bin/kiro-ception"
    }
  }
}

Replace /path/to/Kiro-Ception with the actual clone location. Saving the MCP config is usually enough for Kiro to pick up the change; restart Kiro if it is not.

Configuration

Create ~/.config/kiro-ception/config.toml to customize behavior. If this file doesn't exist, sensible defaults are used (local CPU-based embeddings with all-MiniLM-L6-v2). Call the get_config tool to see the exact locations of your config file and database.

A full annotated default config is in config.default.toml; copy it as a starting point:

mkdir -p ~/.config/kiro-ception
cp config.default.toml ~/.config/kiro-ception/config.toml

Minimal Config (Zero Setup)

With no config file at all, Kiro Ception uses:

Backend: sentence-transformers (local, CPU-based, no API/GPU needed)
Model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first run)
Sources: Auto-discovers Kiro CLI and IDE conversations in both old and new formats
Memory: Uses up to 1/3 of available RAM for the index (by default)

This is a good starting point; it runs entirely on CPU with no external dependencies.

GPU-Accelerated with Ollama (Recommended for Power Users)

If you have Ollama running with a GPU, you can use much larger, higher-quality embedding models by putting something like the following in your config file:

[embedding]
backend = "openai-compatible"
model = "qwen3-embedding:4b"
api_base = "http://localhost:11434/v1"
dimensions = 1024
batch_size = 1

Setup:

# Install Ollama (if not already): https://ollama.com
ollama pull qwen3-embedding:4b

This gives significantly better search quality than MiniLM, especially for nuanced queries. The 4b model runs comfortably on a 6GB+ GPU and indexes at ~3–5 messages/second.

OpenAI / Hosted Providers

[embedding]
backend = "openai-compatible"
model = "text-embedding-3-large"
api_base = "https://api.openai.com/v1"
api_key = "sk-..."
dimensions = 1024

LM Studio

[embedding]
backend = "openai-compatible"
model = "your-model-name"
api_base = "http://localhost:1234/v1"
dimensions = 768

MCP Tools

Kiro can call these tools naturally during conversation:

Tool	Purpose
`search_project_history`	Search conversations scoped to the current workspace
`search_global_history`	Search across all workspaces (supports `source` filter: all/cli/ide)
`get_indexing_status`	Check indexer progress, rate, errors, ETA
`rescan`	Trigger a rescan for new sessions (`full=True` to re-read everything)
`get_config`	Show effective config, paths, cache stats, instance role, etc
`reload_config`	Hot-reload config from disk without requiring restart of Kiro

Search Parameters

Both search tools accept:

Parameter	Default	Description
`query`	(required)	Natural language search query
`after`	—	Only messages on/after this date (ISO 8601)
`before`	—	Only messages before this date (ISO 8601)
`context_size`	3	Messages before/after each match to include
`threshold`	0.2	Minimum similarity score (0–1)
`max_results`	10	Maximum results to return
`offset`	0	Skip results for pagination
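The interaction between these parameters can be sketched as a simple filter, sort, and slice. This is only an illustration of the semantics in the table (after is inclusive, before is exclusive, threshold is a minimum score); the Hit class and apply_search_params function are hypothetical names, not part of the tool's API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    """Hypothetical search hit used only for this illustration."""
    text: str
    score: float
    timestamp: datetime


def apply_search_params(hits, after=None, before=None,
                        threshold=0.2, max_results=10, offset=0):
    """Filter by score and date window, then return one page of results."""
    selected = [
        h for h in hits
        if h.score >= threshold
        and (after is None or h.timestamp >= after)    # on/after this date
        and (before is None or h.timestamp < before)   # strictly before this date
    ]
    selected.sort(key=lambda h: h.score, reverse=True)
    return selected[offset:offset + max_results]
```

To page through results, keep max_results fixed and increase offset by max_results on each call.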

Technologies & Libraries

Component	Library	Purpose
MCP Server	mcp (FastMCP)	Exposes tools to Kiro via Model Context Protocol
Embedding (local)	sentence-transformers	Local CPU/GPU embeddings (default: all-MiniLM-L6-v2)
Embedding (API)	requests	OpenAI-compatible HTTP API for Ollama/LM Studio/OpenAI
Vector Search	numpy	In-memory cosine similarity via dot product
Data Models	Pydantic	Typed data validation and serialization
Cache	SQLite (stdlib)	Persistent embedding + metadata storage (WAL mode)
Process Coordination	filelock	Engine process election via file locks
Encryption	cryptography + argon2-cffi	AES-256-GCM peer encryption with Argon2id key derivation
Build	hatchling	PEP 517 build backend
Package Manager	uv	Fast dependency resolution and venv management
Linter/Formatter	ruff	Linting and formatting
Tests	pytest	Test framework (300 tests)
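The Cache row above mentions SQLite in WAL mode. As a rough sketch of what that looks like with the standard sqlite3 module, the example below enables WAL and commits one message per transaction, which is one way to get the "lose at most one in-flight message" behavior described earlier. The table schema and function names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a cache database with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per indexed message.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    return conn


def store_message(conn: sqlite3.Connection, text: str, embedding: bytes) -> None:
    """Insert one message in its own transaction (commit on success, rollback on error)."""
    with conn:
        conn.execute(
            "INSERT INTO messages (text, embedding) VALUES (?, ?)",
            (text, embedding),
        )
```

In WAL mode readers do not block the writer, which suits a design where searches run while indexing continues in the background.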

Optional Features

Peer Federation

Search across multiple machines (e.g., your laptop + desktop). Each machine runs its own independent index. When you search, queries fan out to all peers in parallel and results are merged.

[peers]
enabled = true
nodes = ["192.168.1.50:19742", "workpc.tailscale:19742"]
secret = "my-shared-passphrase"  # Optional: encrypts all peer traffic with AES-256-GCM
timeout_seconds = 5

Peers communicate over HTTP. If secret is set, payloads are encrypted with AES-256-GCM (key derived via Argon2id from the passphrase). Both machines must use the same secret. Without a secret, traffic is plaintext, which may be acceptable on a VPN, on Tailscale, or on a trusted home network; that call is yours.
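The fan-out-and-merge behavior described above can be sketched with a thread pool. This is a simplified illustration under stated assumptions: query_peer is a hypothetical caller-supplied function that performs the (optionally encrypted) HTTP request and enforces its own timeout, and each hit is assumed to be a dict with a "score" key. Unreachable peers are skipped rather than failing the whole search.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, nodes, query_peer, max_results=10):
    """Query every peer in parallel and merge the hits by descending score.

    query_peer(node, query) is a hypothetical callable returning a list of
    dicts that each contain a "score" key; it may raise if a peer is down.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [pool.submit(query_peer, node, query) for node in nodes]
        for future in futures:
            try:
                merged.extend(future.result())
            except Exception:
                continue  # a failed peer should not break the search
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:max_results]
```

Because each machine keeps its own independent index, merging only needs the scores to be comparable across peers, which in practice means every peer should use the same embedding model.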

Memory Limits

Control how much RAM the index uses:

[memory]
fraction = 0.33     # Use up to 1/3 of RAM (default)
# limit_mb = 512    # Or set an explicit limit
# limit_mb = 0      # Disable limit (use all available)
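As a rough guide to what a given limit buys you, the arithmetic below estimates how many embeddings fit in a memory budget. It assumes 4-byte float32 values and counts only the raw vector matrix (no metadata or Python overhead), so treat the result as an upper bound; the function name is illustrative.

```python
def max_vectors(limit_mb: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Upper bound on embeddings that fit in limit_mb, assuming float32 vectors."""
    bytes_per_vector = dimensions * bytes_per_value
    return (limit_mb * 1024 * 1024) // bytes_per_vector


# 384-dimension MiniLM vectors take 1,536 bytes each:
print(max_vectors(512, 384))    # 349525
# 1024-dimension vectors take 4,096 bytes each:
print(max_vectors(512, 1024))   # 131072
```

Larger embedding models therefore reduce how much history fits in the same budget, which is worth keeping in mind when switching from MiniLM to a 1024-dimension model.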

Indexing Throttle

Reduce GPU/CPU load during active work:

[indexing]
throttle_ms = 5000   # Sleep 5000ms (5 seconds) between embedding batches (default: 0)
rescan_interval_minutes = 10  # Check for new sessions every 10 minutes (this is the default)

Once your initial index is built, consider setting throttle_ms to 5000-10000 (5-10 seconds) so that background indexing does not slow down your machine while you work. This is especially valuable if you are using a large local GPU-based model.

If you want to conserve battery, or you don't need the index to be up to date quickly, you can raise rescan_interval_minutes to 60, or disable automatic rescanning entirely by setting it to 0.

Performance

Metric	Value
First-time indexing (MiniLM, CPU)	~4 minutes (4300+ sessions)
First-time indexing (Qwen3-Embedding:4b, GPU)	~35 minutes (4300+ sessions)
Subsequent startups	<2 seconds
Search latency	<10ms
Index refresh (backgrounded)	Every 60 seconds
Periodic rescan to update indexes (backgrounded)	Every 10 minutes
Embedding rate (Qwen3-Embedding:4b)	~3–5 messages/second

Indexing order: Sessions are indexed newest first, so your most recent conversations become searchable within seconds of startup. Older conversations fill in progressively in the background.
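Newest-first ordering can be as simple as sorting discovered session files by modification time. The sketch below assumes file mtime is a good proxy for session recency and uses the messages.jsonl filename from the Kiro 1.0 format as the default glob pattern; the function name is hypothetical and the real indexer may order sessions differently.

```python
from pathlib import Path


def sessions_newest_first(root: Path, pattern: str = "**/messages.jsonl") -> list[Path]:
    """Return matching session files, most recently modified first."""
    files = [p for p in root.glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Feeding the indexer from the front of this list is what lets yesterday's conversation become searchable long before six-month-old history has been embedded.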

Troubleshooting

"Backend not ready" or "still loading"

On first startup, the index eagerly loads from SQLite into RAM. If embeddings exist but metadata hasn't populated yet, you'll see a "still loading" message. Retry in a few seconds. Loading also takes longer as your embeddings database grows. For reference, I have six months of Kiro work across 4300 chat documents with a (currently) 300MB embedding db, and it takes 10-15 seconds to load the index into RAM.

Empty search results

Check get_indexing_status; indexing may still be in progress
Use rescan() to immediately pick up recent conversations
Verify your config with get_config
Check the "Kiro Powers / MCP" log

Embedding errors / timeouts

For Ollama: ensure it's running (ollama ps) and the model is pulled
Very long messages (>50K chars) may time out; they're skipped with a warning
Check your "Kiro Powers" outputs for logs/errors

Config changes not taking effect

Use reload_config tool (applies safe changes immediately)
Model/backend/dimensions changes require rescan(full=True)

Multiple windows

All Kiro windows share a single engine process automatically. Each MCP proxy registers its PID with the engine. If the engine dies, the next proxy request will respawn it. Use get_config to see the engine PID and port. If the engine has stale code (you updated the source), it will be killed and restarted automatically via fingerprint comparison.
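One plausible way to implement the fingerprint comparison mentioned above is to hash the contents of every source file and compare the result with the fingerprint the running engine reported at startup. The sketch below is an assumption about the general technique, not the project's actual fingerprinting code; both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def source_fingerprint(src_dir: Path) -> str:
    """Hash the relative path and contents of every .py file under src_dir."""
    digest = hashlib.sha256()
    for path in sorted(src_dir.rglob("*.py")):  # sorted for a stable result
        digest.update(path.relative_to(src_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def engine_is_stale(running_fingerprint: str, src_dir: Path) -> bool:
    """True when the code on disk no longer matches what the engine was started with."""
    return running_fingerprint != source_fingerprint(src_dir)
```

A proxy that finds the engine stale would then stop it and spawn a fresh one, which is what makes git pull followed by uv sync take effect without a manual restart.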

Nuclear option

If the database is corrupt or everything is broken, call the get_config tool to find the path to your database. Then uninstall this power (or disable the MCP server), remove the database, and reinstall the power (or re-enable the MCP server).

rm -rf ~/.cache/kiro-ception/

When you restart Kiro (or re-enable the MCP server), it rebuilds the embeddings database from scratch.

Development

uv sync                         # Install deps
uv run pytest tests/ -q         # Run tests (300, ~30s)
uv run ruff check src/          # Lint
uv run kiro-ception             # Run MCP server locally

Data Locations

For information about where your data is kept, call the MCP tool get_config. On a Unix-like system, the files are at:

Path	Contents
`~/.config/kiro-ception/config.toml`	User configuration
`~/.cache/kiro-ception/cache_<hash>.db`	SQLite database (embeddings, metadata)
`~/.cache/kiro-ception/engine.lock`	Engine process file lock
`~/.cache/kiro-ception/engine.json`	Engine port/PID info for MCP proxies

The cache DB filename includes a hash of the backend configuration. Changing model/backend/dimensions creates a new DB file (old ones are preserved for rollback).
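To illustrate why changing the model produces a new file, here is one way such a filename could be derived. The exact fields hashed, the hash algorithm, and the hash length are assumptions for this sketch; only the cache_<hash>.db shape comes from the table above.

```python
import hashlib
import json


def cache_db_name(backend: str, model: str, dimensions: int, hash_len: int = 12) -> str:
    """Derive a cache filename from the embedding settings (illustrative scheme)."""
    payload = json.dumps(
        {"backend": backend, "model": model, "dimensions": dimensions},
        sort_keys=True,  # stable key order so the same settings always hash the same
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:hash_len]
    return f"cache_{digest}.db"
```

Because embeddings from different models are not comparable, keeping one database per configuration avoids mixing vectors and makes rolling back to a previous model a matter of restoring the old config.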

Session Data Sources (macOS)

Kiro Ception auto-discovers and indexes conversations from three IDE formats plus the CLI:

Format	Location (macOS)	Notes
Kiro 1.0 (current)	`~/.kiro/sessions/<sha256_prefix>/<session_id>/messages.jsonl`	Primary format since Kiro IDE 1.0. Each session has `session.json` (metadata) + `messages.jsonl` (JSONL stream). Full assistant responses stored inline alongside tool calls. Directory names are the first 16 hex chars of SHA256(workspace_path).
Workspace-sessions (pre-1.0)	`~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/workspace-sessions/<base64_path>/<uuid>.json`	Older format where sessions were JSON files with a `history` array. Directory names are base64url-encoded workspace paths. Assistant responses were stubs ("On it.") — real responses came from execution logs.
Legacy .chat	`~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/<workspace_hash>/<uuid>.chat`	Earliest format. Full conversations in a single JSON file with `chat` array.
Execution logs (pre-1.0)	`~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/<workspace_hash>/414d1636299d2b9e4ce7e17fb11f63e9/<exec_id>`	Separate files containing assistant responses (actionType="say") and tool actions. Used to reconstruct full conversations for workspace-sessions format.
CLI	`~/.kiro/cli/conversations.db`	SQLite database with `conversations_v2` table. Indexed automatically.

When the same session exists in multiple formats (e.g., migrated from workspace-sessions to Kiro 1.0), deduplication ensures it is only indexed once, preferring the richest format (Kiro 1.0 > workspace-sessions > legacy).
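The two directory-naming schemes in the table above can be reproduced in a few lines. This sketch assumes the workspace path is encoded as UTF-8 before hashing or encoding, and that base64url padding may be stripped from directory names; both are assumptions, and the function names are illustrative rather than the project's own.

```python
import base64
import hashlib


def kiro_session_dir(workspace_path: str) -> str:
    """Kiro 1.0 style: first 16 hex characters of SHA256(workspace_path)."""
    return hashlib.sha256(workspace_path.encode("utf-8")).hexdigest()[:16]


def encode_workspace_dir(workspace_path: str) -> str:
    """Pre-1.0 style: base64url-encoded workspace path (padding stripped here)."""
    raw = base64.urlsafe_b64encode(workspace_path.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")


def decode_workspace_dir(name: str) -> str:
    """Recover the workspace path, restoring any stripped padding first."""
    padded = name + "=" * (-len(name) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Note the asymmetry: the base64url name can be decoded back to a path, while the SHA256 prefix cannot, so for the Kiro 1.0 format the workspace path has to come from session metadata instead.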

Privacy: All data is processed and stored locally on your machine. There is no telemetry, there are no external API calls, and no data leaves your device unless you explicitly configure a third-party embedding provider (e.g., OpenAI). The default configuration uses fully local, offline embeddings.

Contributing

Found a bug? Have a feature request? Open an issue on GitHub.

Areas Where Help is Wanted

If you're looking to contribute, here are some areas where we'd love help:

Cross-platform testing: The codebase targets macOS, Windows, and Linux. We develop primarily on macOS and have done targeted Windows work, but need broader real-world testing on Windows (especially around the engine subprocess lifecycle, file locking, and native DLL preloading) and Linux (various distros, ARM64).
Integration tests / CI pipeline: Currently all tests are unit/mock-based. We need end-to-end integration tests that spin up the actual engine process with test fixture data and exercise the full MCP proxy → HTTP → engine → SQLite → search path. This would enable a proper GitHub Actions CI matrix across OS and Python versions.
Remove legacy workspace decode fallback: The vector search path (search.py) and FTS search path (cache.py) include fallbacks that decode base64-encoded workspace values at query time. These handle indexes created before the _decode_workspace_dir_name bug was fixed. After a couple of release cycles, these become dead code and can be safely removed.
Migrate engine_main.py from print() to logging: The engine process uses bare print() for all status messages. Switching the entire codebase to Python's logging module would give levels, timestamps, and configurable filtering while still routing through the existing log file support.
SIGTERM-based graceful shutdown on Windows: On Unix, stale engines receive SIGTERM before SIGKILL for graceful cleanup. Windows has no SIGTERM equivalent for non-console processes, so we use TerminateProcess directly. A Windows-native approach (e.g., named event signaling) could enable graceful shutdown there too.

Support

Found a bug? Have a feature request? Open an issue on GitHub.

License

MIT - See: LICENSE.

Attribution

Built by Farley Farley (DevOps-Nirvana), based upon Kiro Total Recall by Danilo Poccia (MIT licensed). The original session loaders, data models, and core embed/search concept originate from that project. Kiro Ception is a ground-up rewrite for production use; see the Architecture Highlights above for what's different.
