Your AI now remembers everything you've ever done with it, across every machine you own. Finally, an elephant-grade memory for your coding assistant, minus the 12,000-pound footprint.
Kiro Ception gives Kiro a long-term memory: persistent recall that spans every session, every window, the CLI and the IDE, and even multiple machines. Your agent remembers what you discussed yesterday, last month, or six months ago, in any project, on any computer you work from. It automatically indexes all conversation history in the background and provides instant hybrid search (semantic + keyword) so you can find past discussions, decisions, and implementations by meaning, keywords, date, or any combination.
"We discussed this already..." "What was that approach we used last week?" "Didn't we solve this exact problem in the other project?" "How did I usually set up CI pipelines?"
These are all things you can now just ask, and actually get an answer.
Kiro Ception is an MCP Power that runs as a background service alongside your Kiro IDE. It:
- Discovers all Kiro CLI and IDE session files on your machine
- Extracts meaningful messages (filtering out system prompts and boilerplate, condensing long code blocks into [code:lang] placeholders)
- Embeds each message into a vector representation using your configured model
- Indexes everything into an in-memory numpy matrix for instant hybrid search (semantic + FTS5 keyword)
- Serves search results via MCP tools that Kiro can call naturally during conversation
- Federates across machines: search your laptop and desktop simultaneously with encrypted peer-to-peer queries
Sessions are processed newest first, so your most recent conversations are searchable within seconds of startup, even while older history is still being indexed in the background.
Search results include surrounding context (messages before/after each match), relevance scores, workspace origin, and pagination, so Kiro gets the full picture of what was discussed.
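As a rough illustration of the extraction step, here is a minimal sketch of how long fenced code blocks might be condensed into [code:lang] placeholders. The function name, the regex, and the 10-line cutoff are assumptions made for this example, not the project's actual implementation.

```python
import re

# The fence marker is built indirectly so this example can itself sit inside a fence.
FENCE = "`" * 3
# Opening fence with optional language tag, lazy body, closing fence.
FENCE_RE = re.compile(FENCE + r"([\w+-]*)[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def condense_code_blocks(text: str, max_lines: int = 10) -> str:
    """Replace fenced blocks longer than max_lines with a [code:lang] placeholder.

    max_lines is a hypothetical threshold chosen for illustration.
    """
    def _swap(match: re.Match) -> str:
        lang = match.group(1) or "text"
        body = match.group(2)
        if body.count("\n") <= max_lines:
            return match.group(0)  # short blocks are kept verbatim
        return f"[code:{lang}]"

    return FENCE_RE.sub(_swap, text)
```

Short snippets survive untouched, so a one-line shell command stays searchable by keyword while a 200-line file dump collapses to a single token.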
- Two-process model: A thin MCP proxy (stdio) spawns a separate engine subprocess that holds the index in RAM. Code changes are detected via fingerprinting, so the engine auto-restarts with fresh code when you run git pull && uv sync.
- Non-blocking: Heavy work (indexing, embedding) runs in the engine's background threads. The MCP proxy responds instantly.
- Hybrid search: Combines semantic vector similarity (70%) with FTS5 full-text keyword search (30%). Find things by meaning and exact names.
- Recency-aware: Recent conversations rank higher automatically. The decay curve scales with your history depth, no manual tuning.
- Multi-window efficient: All MCP proxy instances share one engine process. A PID registry tracks connected clients — the engine auto-shuts-down when all clients die.
- Multi-machine: Optional peer federation searches across all your computers simultaneously with AES-256-GCM encrypted transport.
- Crash-safe: SQLite with WAL mode. Lose at most one in-flight message on Ctrl+C/crash/quit.
- Instant cold-start: Loads from existing cache in under 1 second. No waiting for re-indexing after restarts.
- No build step: Uses editable install (python -m kiro_ception.engine_main). Source changes are picked up immediately; no recompile needed.
- Auto-migrating: Schema upgrades run automatically on startup, so updates never require deleting your cache.
- Observable: Built-in status dashboard, indexing progress monitoring, hot-reloadable config, and health diagnostics, all accessible to the agent or via browser.
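To make the 70/30 hybrid weighting concrete, here is a small sketch of how a blended score could be computed. It uses plain Python lists to stay dependency-free, whereas the real index is described as a numpy matrix. The function names, the assumption that keyword scores are already normalized to 0–1, and the choice to apply the threshold to the blended score are all illustrative guesses.

```python
import math

SEMANTIC_WEIGHT = 0.7  # weights from the 70/30 split described above
KEYWORD_WEIGHT = 0.3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query_vec, doc_vecs, keyword_scores, threshold=0.2, max_results=10):
    """Blend semantic and keyword scores, drop weak hits, return best first.

    keyword_scores are assumed to be pre-normalized to the 0-1 range.
    """
    scored = []
    for idx, (vec, kw) in enumerate(zip(doc_vecs, keyword_scores)):
        score = SEMANTIC_WEIGHT * cosine(query_vec, vec) + KEYWORD_WEIGHT * kw
        if score >= threshold:
            scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]
```

With this weighting, a message that matches only by exact keyword can still surface, but one that matches by meaning will usually outrank it.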
- Kiro - the AI-powered IDE
- Git - for cloning/updating the power
- Python 3.11+ (3.12, 3.13 also supported and tested officially)
- uv - fast Python package manager
Clone the repo and install as a local power. This gives you immediate updates via git pull and the full Power experience (keyword triggers, automatic activation, POWER.md guidance):
git clone https://github.com/DevOps-Nirvana/Kiro-Ception.git
cd Kiro-Ception
uv sync

Then in Kiro IDE: Powers panel → Add power from Local Path → select the Kiro-Ception folder you just cloned.
To update later:
cd Kiro-Ception
git pull
uv sync

Kiro picks up changes automatically: the MCP proxy detects source code changes via fingerprinting and restarts the engine process with the new code. No manual restart needed.
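For the curious, source fingerprinting can be as simple as hashing every source file and comparing the result with what the running engine reported at startup. The sketch below shows one way to do that; the function names and the choice of SHA-256 over all .py files are assumptions, and the project's real fingerprint may be computed differently.

```python
import hashlib
from pathlib import Path


def source_fingerprint(src_dir: str) -> str:
    """Hash the relative path and contents of every .py file under src_dir."""
    digest = hashlib.sha256()
    root = Path(src_dir)
    for path in sorted(root.rglob("*.py")):  # sorted for a stable result
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def engine_is_stale(running_fingerprint: str, src_dir: str) -> bool:
    """True when the code on disk no longer matches what the engine started with."""
    return running_fingerprint != source_fingerprint(src_dir)
```

A proxy could run a check like this before forwarding a request and restart the engine whenever it returns True.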
If you prefer not to manage a local clone:
- In Kiro IDE: Powers panel → Add power from GitHub
- Enter: https://github.com/DevOps-Nirvana/Kiro-Ception
- Click Install
Note: Due to current bugs in how Kiro handles MCP servers within Powers installed from GitHub, the local clone method above is more reliable. The GitHub install may have issues with server startup or reconnection and/or with updating due to a possible split-brain scenario.
Warning: Installing as a Power (above) is strongly recommended. The POWER.md file contains keyword triggers and usage guidance that help Kiro automatically activate search when you reference past conversations. With an MCP-only setup, you'll need to explicitly ask Kiro to search history; it won't trigger on its own from phrases like "as we discussed" or "what did we do last time".
If you prefer manual configuration without the Power wrapper, add the following to your Kiro MCP configuration (~/.kiro/settings/mcp.json):
{
"mcpServers": {
"kiro-ception": {
"command": "uv",
"args": ["tool", "run", "--from", "git+https://github.com/DevOps-Nirvana/Kiro-Ception", "kiro-ception"]
}
}
}

This uses uv tool run to fetch and run the package directly from GitHub; no local clone is needed.
Alternatively, if you've cloned the repo locally:
{
"mcpServers": {
"kiro-ception": {
"command": "/path/to/Kiro-Ception/.venv/bin/kiro-ception"
}
}
}

Replace /path/to/Kiro-Ception with the actual clone location. Saving your MCP config is usually enough for Kiro to pick up the change; if it isn't, restart Kiro.
Create ~/.config/kiro-ception/config.toml to customize behavior. If this file doesn't exist, sensible defaults are used (local CPU-based embeddings with all-MiniLM-L6-v2). Call the get_config tool to see the exact locations of your config file and database.
A full annotated default config is in config.default.toml; copy it as a starting point:
mkdir -p ~/.config/kiro-ception
cp config.default.toml ~/.config/kiro-ception/config.toml

With no config file at all, Kiro Ception uses:
- Backend: sentence-transformers (local, CPU-based, no API/GPU needed)
- Model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first run)
- Sources: Auto-discovers Kiro CLI and IDE conversations in both old and new formats
- Memory: Uses up to 1/3 of available RAM for the index (by default)
This is a good starting point; it runs entirely on CPU with no external dependencies.
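To get a feel for what the default memory budget means in practice, the back-of-the-envelope sketch below estimates how many embeddings fit in a given fraction of RAM. It assumes 4-byte float32 values and ignores metadata overhead, neither of which is confirmed by this README, and the function name is made up for the example.

```python
def max_vectors(total_ram_bytes: int, dimensions: int = 384,
                fraction: float = 0.33, bytes_per_value: int = 4) -> int:
    """Rough upper bound on embeddings that fit in the RAM budget.

    Assumes float32 storage (4 bytes per value) and no per-row overhead.
    """
    budget = int(total_ram_bytes * fraction)
    return budget // (dimensions * bytes_per_value)


# A 384-dimension vector is 384 * 4 = 1536 bytes under these assumptions,
# so a 16 GiB machine has room for a few million messages at the default fraction.
estimate = max_vectors(16 * 1024**3)
```

Larger models change the math quickly: at 1024 dimensions each vector takes roughly 2.7 times the space.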
If you have Ollama running with a GPU, you can use much larger, higher-quality embedding models by putting something like the following in your config file:
[embedding]
backend = "openai-compatible"
model = "qwen3-embedding:4b"
api_base = "http://localhost:11434/v1"
dimensions = 1024
batch_size = 1

Setup:
# Install Ollama (if not already): https://ollama.com
ollama pull qwen3-embedding:4b

This gives significantly better search quality than MiniLM, especially for nuanced queries. The 4b model runs comfortably on a 6GB+ GPU and indexes at ~3–5 messages/second.

To use OpenAI's hosted embedding API instead:
[embedding]
backend = "openai-compatible"
model = "text-embedding-3-large"
api_base = "https://api.openai.com/v1"
api_key = "sk-..."
dimensions = 1024

For any other OpenAI-compatible server, such as LM Studio:

[embedding]
backend = "openai-compatible"
model = "your-model-name"
api_base = "http://localhost:1234/v1"
dimensions = 768

Kiro can call these tools naturally during conversation:
| Tool | Purpose |
|---|---|
| search_project_history | Search conversations scoped to the current workspace |
| search_global_history | Search across all workspaces (supports source filter: all/cli/ide) |
| get_indexing_status | Check indexer progress, rate, errors, ETA |
| rescan | Trigger a rescan for new sessions (full=True to re-read everything) |
| get_config | Show effective config, paths, cache stats, instance role, etc. |
| reload_config | Hot-reload config from disk without restarting Kiro |
Both search tools accept:
| Parameter | Default | Description |
|---|---|---|
| query | (required) | Natural language search query |
| after | (none) | Only messages on/after this date (ISO 8601) |
| before | (none) | Only messages before this date (ISO 8601) |
| context_size | 3 | Messages before/after each match to include |
| threshold | 0.2 | Minimum similarity score (0–1) |
| max_results | 10 | Maximum results to return |
| offset | 0 | Skip results for pagination |
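Kiro normally fills in these parameters for you, but if you are scripting against the tools it can help to see how they fit together. The helper below is a hypothetical convenience function (not part of Kiro Ception) that builds an arguments dict from the parameters in the table and turns a page number into the matching offset.

```python
from datetime import date


def build_search_args(query, after=None, before=None, context_size=3,
                      threshold=0.2, max_results=10, page=0):
    """Assemble arguments for a search tool call; page is a hypothetical helper."""
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    args = {
        "query": query,
        "context_size": context_size,
        "threshold": threshold,
        "max_results": max_results,
        "offset": page * max_results,  # skip the results of earlier pages
    }
    # Dates are sent as ISO 8601 strings, as the table specifies.
    if after is not None:
        args["after"] = after.isoformat()
    if before is not None:
        args["before"] = before.isoformat()
    return args
```

For example, build_search_args("CI pipeline setup", after=date(2025, 1, 1), page=2) asks for the third page of matches from 2025 onward.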
| Component | Library | Purpose |
|---|---|---|
| MCP Server | mcp (FastMCP) | Exposes tools to Kiro via Model Context Protocol |
| Embedding (local) | sentence-transformers | Local CPU/GPU embeddings (default: all-MiniLM-L6-v2) |
| Embedding (API) | requests | OpenAI-compatible HTTP API for Ollama/LM Studio/OpenAI |
| Vector Search | numpy | In-memory cosine similarity via dot product |
| Data Models | Pydantic | Typed data validation and serialization |
| Cache | SQLite (stdlib) | Persistent embedding + metadata storage (WAL mode) |
| Process Coordination | filelock | Engine process election via file locks |
| Encryption | cryptography + argon2-cffi | AES-256-GCM peer encryption with Argon2id key derivation |
| Build | hatchling | PEP 517 build backend |
| Package Manager | uv | Fast dependency resolution and venv management |
| Linter/Formatter | ruff | Linting and formatting |
| Tests | pytest | Test framework (300 tests) |
Search across multiple machines (e.g., your laptop + desktop). Each machine runs its own independent index. When you search, queries fan out to all peers in parallel and results are merged.
[peers]
enabled = true
nodes = ["192.168.1.50:19742", "workpc.tailscale:19742"]
secret = "my-shared-passphrase" # Optional: encrypts all peer traffic with AES-256-GCM
timeout_seconds = 5

Peers communicate over HTTP. If secret is set, payloads are encrypted with AES-256-GCM (key derived via Argon2id from the passphrase). Both machines must use the same secret. Without a secret, traffic is sent in plaintext, which may be acceptable over a VPN, Tailscale, or a trusted home network; that call is yours.
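The fan-out-and-merge behavior can be pictured with the sketch below. It is a simplified stand-in that uses only the standard library; local_search and query_peer are hypothetical callables you would supply, and the real engine's transport, encryption, and result shape are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, local_search, peers, query_peer,
                     timeout_seconds=5, max_results=10):
    """Query the local index and every peer in parallel, then merge by score.

    local_search(query) and query_peer(peer, query, timeout) are assumed to
    return lists of dicts that each carry a numeric "score" key.
    """
    merged = list(local_search(query))
    if peers:
        with ThreadPoolExecutor(max_workers=len(peers)) as pool:
            futures = [pool.submit(query_peer, peer, query, timeout_seconds)
                       for peer in peers]
            for future in futures:
                try:
                    merged.extend(future.result())
                except Exception:
                    continue  # an unreachable peer should not fail the search
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:max_results]
```

Skipping failed peers means a sleeping laptop only costs you its results, not the whole query.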
Control how much RAM the index uses:
[memory]
fraction = 0.33 # Use up to 1/3 of RAM (default)
# limit_mb = 512 # Or set an explicit limit
# limit_mb = 0 # Disable limit (use all available)

Reduce GPU/CPU load during active work:
[indexing]
throttle_ms = 5000 # Sleep 5000ms (5 seconds) between embedding batches (default: 0)
rescan_interval_minutes = 10 # Check for new sessions every 10 minutes (this is the default)

Once your initial index is built, setting throttle_ms to 5-10 seconds (5000-10000) keeps your computer responsive while indexing continues in the background. This is especially valuable if you are using a large local GPU-based model.
If you want to conserve battery, or don't need the index to catch up quickly, you can raise the rescan interval (for example, to 60 minutes) or disable the automatic rescan entirely by setting rescan_interval_minutes to 0.
| Metric | Value |
|---|---|
| First-time indexing (MiniLM, CPU) | ~4 minutes (4300+ sessions) |
| First-time indexing (Qwen3-Embedding:4b, GPU) | ~35 minutes (4300+ sessions) |
| Subsequent startups | <2 seconds |
| Search latency | <10ms |
| Index refresh (backgrounded) | Every 60 seconds |
| Periodic rescan to update indexes (backgrounded) | Every 10 minutes |
| Embedding rate (Qwen3-Embedding:4b) | ~3–5 messages/second |
Indexing order: Sessions are indexed newest first, so your most recent conversations become searchable within seconds of startup. Older conversations fill in progressively in the background.
On first startup, the index eagerly loads from SQLite into RAM. If embeddings exist but metadata hasn't been populated yet, you'll see a "still loading" message; retry in a few seconds. Loading also takes longer as your embeddings database grows. I have six months of Kiro work across 4300 chat documents with a (currently) 300MB embedding DB, and it takes 10-15 seconds to load the index into RAM.
- Check get_indexing_status; indexing may still be in progress
- Use rescan() to immediately pick up recent conversations
- Verify your config with get_config
- Check the "Kiro Powers / MCP" log
- For Ollama: ensure it's running (ollama ps) and the model is pulled
- Very long messages (>50K chars) may time out; they're skipped with a warning
- Check your "Kiro Powers" outputs for logs/errors
- Use the reload_config tool (applies safe changes immediately)
- Model/backend/dimensions changes require rescan(full=True)
All Kiro windows share a single engine process automatically. Each MCP proxy registers its PID with the engine. If the engine dies, the next proxy request will respawn it. Use get_config to see the engine PID and port. If the engine has stale code (you updated the source), it will be killed and restarted automatically via fingerprint comparison.
If the database is corrupt or everything is broken, call the get_config tool to find the path to your database. Then uninstall this power (or disable the MCP server), remove the database, and reinstall the power (or re-enable the MCP server).
rm -rf ~/.cache/kiro-ception/

When you restart Kiro (or re-enable the MCP server), it will rebuild the embeddings database from scratch.
uv sync # Install deps
uv run pytest tests/ -q # Run tests (300, ~30s)
uv run ruff check src/ # Lint
uv run kiro-ception # Run MCP server locally

To find out where your data is kept, call the MCP tool "get_config". On a Unix-like system, the files are at:
| Path | Contents |
|---|---|
| ~/.config/kiro-ception/config.toml | User configuration |
| ~/.cache/kiro-ception/cache_<hash>.db | SQLite database (embeddings, metadata) |
| ~/.cache/kiro-ception/engine.lock | Engine process file lock |
| ~/.cache/kiro-ception/engine.json | Engine port/PID info for MCP proxies |
The cache DB filename includes a hash of the backend configuration. Changing model/backend/dimensions creates a new DB file (old ones are preserved for rollback).
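One way to picture the hashed filename is the sketch below, which derives a short hash from the backend, model, and dimensions. The exact fields, separator, hash algorithm, and hash length that Kiro Ception uses are not documented here, so treat every detail of this function as an assumption.

```python
import hashlib
from pathlib import Path


def cache_db_path(backend: str, model: str, dimensions: int,
                  cache_dir: str = "~/.cache/kiro-ception") -> Path:
    """Build a cache path whose name changes whenever the embedding setup does.

    The key layout and the 12-character SHA-256 prefix are illustrative only.
    """
    key = f"{backend}|{model}|{dimensions}".encode("utf-8")
    short_hash = hashlib.sha256(key).hexdigest()[:12]
    return Path(cache_dir).expanduser() / f"cache_{short_hash}.db"
```

Because embeddings from different models are not comparable, keying the file on the embedding setup keeps each index self-consistent and lets you switch back without re-indexing.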
Kiro Ception auto-discovers and indexes conversations from three IDE formats plus the CLI:
| Format | Location (macOS) | Notes |
|---|---|---|
| Kiro 1.0 (current) | ~/.kiro/sessions/<sha256_prefix>/<session_id>/messages.jsonl | Primary format since Kiro IDE 1.0. Each session has session.json (metadata) + messages.jsonl (JSONL stream). Full assistant responses stored inline alongside tool calls. Directory names are the first 16 hex chars of SHA256(workspace_path). |
| Workspace-sessions (pre-1.0) | ~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/workspace-sessions/<base64_path>/<uuid>.json | Older format where sessions were JSON files with a history array. Directory names are base64url-encoded workspace paths. Assistant responses were stubs ("On it."); real responses came from execution logs. |
| Legacy .chat | ~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/<workspace_hash>/<uuid>.chat | Earliest format. Full conversations in a single JSON file with chat array. |
| Execution logs (pre-1.0) | ~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/<workspace_hash>/414d1636299d2b9e4ce7e17fb11f63e9/<exec_id> | Separate files containing assistant responses (actionType="say") and tool actions. Used to reconstruct full conversations for workspace-sessions format. |
| CLI | ~/.kiro/cli/conversations.db | SQLite database with conversations_v2 table. Indexed automatically. |
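The two directory-naming schemes in the table can be reproduced with a few lines of standard-library Python. This is a sketch for locating your own sessions by hand: the function names are hypothetical, and it assumes the workspace path is hashed as UTF-8 with no extra normalization and that base64 padding has been stripped from the older directory names.

```python
import base64
import hashlib


def kiro_session_dir_name(workspace_path: str) -> str:
    """First 16 hex characters of SHA-256 over the workspace path (UTF-8 assumed)."""
    return hashlib.sha256(workspace_path.encode("utf-8")).hexdigest()[:16]


def decode_workspace_dir(dir_name: str) -> str:
    """Decode a base64url directory name back into a workspace path."""
    padded = dir_name + "=" * (-len(dir_name) % 4)  # restore any stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Note the asymmetry: the pre-1.0 names can be decoded back to a path, while the Kiro 1.0 hash is one-way, so the workspace path has to come from session metadata instead.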
When the same session exists in multiple formats (e.g., migrated from workspace-sessions to Kiro 1.0), deduplication ensures it is only indexed once, preferring the richest format (Kiro 1.0 > workspace-sessions > legacy).
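The preference order above boils down to "keep the copy with the highest format priority". A minimal sketch, assuming each discovered session is a dict with hypothetical session_id and format keys (the real data model will differ):

```python
# Higher number wins; the labels are illustrative names for the formats above.
FORMAT_PRIORITY = {"kiro-1.0": 3, "workspace-sessions": 2, "legacy-chat": 1}


def dedupe_sessions(sessions):
    """Keep one entry per session_id, preferring the richest format."""
    best = {}
    for session in sessions:
        current = best.get(session["session_id"])
        if (current is None
                or FORMAT_PRIORITY[session["format"]] > FORMAT_PRIORITY[current["format"]]):
            best[session["session_id"]] = session
    return list(best.values())
```

A session that was migrated from workspace-sessions to Kiro 1.0 therefore shows up once, in its Kiro 1.0 form.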
Privacy: All data is processed and stored locally on your machine. There is no telemetry, there are no external API calls, and no data leaves your device unless you explicitly configure a third-party embedding provider (e.g., OpenAI). The default configuration uses fully local, offline embeddings.
Found a bug? Have a feature request? Open an issue on GitHub.
If you're looking to contribute, here are some areas where we'd love help:
- Cross-platform testing: The codebase targets macOS, Windows, and Linux. We develop primarily on macOS and have done targeted Windows work, but need broader real-world testing on Windows (especially around the engine subprocess lifecycle, file locking, and native DLL preloading) and Linux (various distros, ARM64).
- Integration tests / CI pipeline: Currently all tests are unit/mock-based. We need end-to-end integration tests that spin up the actual engine process with test fixture data and exercise the full MCP proxy → HTTP → engine → SQLite → search path. This would enable a proper GitHub Actions CI matrix across OS and Python versions.
- Remove legacy workspace decode fallback: The vector search path (search.py) and FTS search path (cache.py) include fallbacks that decode base64-encoded workspace values at query time. These handle indexes created before the _decode_workspace_dir_name bug was fixed. After a couple of release cycles, these become dead code and can be safely removed.
- Migrate engine_main.py from print() to logging: The engine process uses bare print() for all status messages. Switching the entire codebase to Python's logging module would give levels, timestamps, and configurable filtering while still routing through the existing log file support.
- SIGTERM-based graceful shutdown on Windows: On Unix, stale engines receive SIGTERM before SIGKILL for graceful cleanup. Windows has no SIGTERM equivalent for non-console processes, so we use TerminateProcess directly. A Windows-native approach (e.g., named event signaling) could enable graceful shutdown there too.
MIT - See: LICENSE.
Built by Farley Farley (DevOps-Nirvana), based upon Kiro Total Recall by Danilo Poccia (MIT licensed). The original session loaders, data models, and core embed/search concept originate from that project. Kiro Ception is a ground-up rewrite for production use; see the Architecture Highlights above for what's different.
