Bot-BIN: Persistent Semantic Memory for AI Chatbots
How we built a local-first memory system that lets AI assistants remember everything and search by meaning, not keywords
Every AI chatbot today has the same fundamental limitation: it forgets everything the moment the conversation ends. Bot-BIN fixes that by giving your AI persistent, searchable memory that works entirely offline.
The Problem: AI Amnesia
If you've worked with ChatGPT, Claude, or any modern AI assistant, you've experienced the frustration: every conversation starts from scratch. The AI has no memory of your past discussions, your preferences, or the decisions you've made together.
Current context windows are getting larger (Claude can handle 200K tokens), but they're still fundamentally ephemeral. The moment you close that chat, everything is gone. Want to ask about a decision from three months ago? You'll need to find the chat log, copy it in, and hope it fits in the context window.
This creates a stark gap between how AI works and how humans expect intelligent assistants to behave. A good assistant remembers things: your preferences, that conversation from last month where you decided on the API architecture.
Figure 1: Bot-BIN converts ephemeral markdown notes into persistent, searchable vector memory
Enter Bot-BIN
Bot-BIN is a synchronization layer that converts your AI's markdown memory files into a binary format with embedded vector embeddings. This enables semantic search: finding relevant past context by meaning, not just keywords.
How It Works
1. Your AI writes notes to memory/*.md files during sessions
2. Bot-BIN syncs changed files to .aif-bin with 384-dimensional vectors
3. Semantic search finds relevant context by meaning
4. Your AI recalls "what did we decide about X" across sessions
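Step 2's "changed files" detection can be sketched with content hashing. This is an illustrative sketch, not Bot-BIN's actual implementation; the function names and index shape are assumptions:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(memory_dir: Path, index: dict) -> list:
    """Return markdown files whose content hash differs from the last sync.

    `index` maps filename -> digest recorded at the previous sync;
    it is updated in place as changes are found.
    """
    changed = []
    for md in sorted(memory_dir.glob("*.md")):
        digest = file_digest(md)
        if index.get(md.name) != digest:
            changed.append(md)
            index[md.name] = digest
    return changed
```

With an up-to-date index, a second call returns nothing to do, which is why a no-op sync can finish in roughly 10 ms.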
The key insight is that we're not just storing text; we're storing meaning. When you search for "API architecture decisions," Bot-BIN doesn't just match those keywords. It finds semantically related content: discussions about REST vs GraphQL, endpoint structure debates, authentication design.
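Under the hood, "search by meaning" reduces to comparing embedding vectors. A minimal pure-Python sketch of cosine-similarity ranking (the toy 3-dimensional vectors stand in for the real 384-dimensional model embeddings):

```python
import math

def cosine_similarity(a, b):
    """Direction-based similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks, top_k=2):
    """Rank stored (text, vector) chunks by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy "embeddings" standing in for real model output.
chunks = [
    ("We chose REST over GraphQL", [0.9, 0.1, 0.0]),
    ("Lunch plans for Friday",     [0.0, 0.2, 0.9]),
]
results = search([0.8, 0.2, 0.1], chunks)  # query: "API architecture decisions"
```

Because similar meanings map to nearby vectors, the REST/GraphQL chunk outranks the unrelated one even though no query keyword appears in it.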
The Numbers
We obsess over performance because memory search needs to be fast enough to happen mid-conversation without perceptible delay. Here's what our benchmarks show (the full table is in the Benchmark Details section below):
That means for a typical personal memory store with thousands of chunks (representing months or years of conversations), search completes in under 5 milliseconds. By the time you've finished typing your question, the AI has already found the relevant memories.
Figure 2: Search latency scales linearly with collection size, remaining sub-50ms even at 100K chunks
Local-First, Zero Cloud
Bot-BIN operates entirely on your machine. There's no cloud service, no API calls, no data leaving your computer. The embedding model (all-MiniLM-L6-v2) runs locally and gets cached after first download.
This matters for several reasons:
- Privacy: Your conversation history, decisions, and context stay on your machine
- Speed: No network latency, no API rate limits, no downtime
- Cost: No per-query charges, no subscription fees
- Control: Your data, your rules, your backup strategy
Under the Hood: AIF-BIN v2
Bot-BIN uses the AIF-BIN v2 binary format for storage. This is a MessagePack-encoded format that bundles the original text, pre-computed embeddings, and rich metadata into a single portable file.
Each .aif-bin file is self-contained: you can copy it anywhere, share it, or back it up without worrying about database state or index files. The format is documented and open.
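The self-contained layout can be pictured as a single MessagePack document. The field names below are illustrative only, not the documented AIF-BIN v2 schema:

```python
import msgpack  # pip install msgpack

# Hypothetical shape of a memory file: text, vectors, and metadata together.
record = {
    "format": "aif-bin",
    "version": 2,
    "metadata": {"source": "2026-01-28.md", "model": "all-MiniLM-L6-v2"},
    "chunks": [
        {
            "text": "We decided on REST with resource-based endpoints.",
            "embedding": [0.12, -0.08, 0.33],  # truncated; real vectors are 384-dim
        }
    ],
}

blob = msgpack.packb(record)      # one portable byte string
restored = msgpack.unpackb(blob)  # no database state or index files needed
```

Because everything round-trips through one blob, copying the file is a complete backup.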
Figure 3: AIF-BIN v2 binary format structure with header, metadata, and chunk arrays
Real-World Usage
Here's what typical Bot-BIN usage looks like in practice:
```
# Sync changed memory files (runs in ~10ms if nothing changed)
python3 botbin.py sync

# Search for relevant past context
python3 botbin.py search "what API architecture did we choose"
```

Results:

```
#1 [0.847] 2026-01-28.aif-bin
   We decided on REST with resource-based endpoints. GraphQL was
   considered but rejected due to complexity for the MVP...

#2 [0.721] 2026-02-02.aif-bin
   API versioning discussion: agreed on URL path versioning (/v1/)
   rather than headers. Simpler for debugging and documentation...
```
The search returns the most semantically relevant chunks from your memory files, scored by similarity. Your AI can then inject these as context for the current conversation.
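Injecting those hits into the next turn can be as simple as a thresholded join. The 0.5 cutoff and the prompt framing here are arbitrary illustrative choices, not Bot-BIN defaults:

```python
def build_context(results, min_score=0.5):
    """Format (score, source, text) hits into a context preamble for the model."""
    kept = [r for r in results if r[0] >= min_score]
    if not kept:
        return ""
    lines = ["[{:.3f}] {}: {}".format(score, source, text)
             for score, source, text in kept]
    return "Relevant memories from past sessions:\n" + "\n".join(lines)

results = [
    (0.847, "2026-01-28.aif-bin", "We decided on REST with resource-based endpoints."),
    (0.231, "2026-02-14.aif-bin", "Unrelated note about logo colors."),
]
context = build_context(results)
```

Low-scoring chunks are dropped so the context window only pays for memories that are actually relevant.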
Automated Sync
For AI systems with heartbeat or polling mechanisms, Bot-BIN sync can run automatically. We use this pattern with our own Clawdbot setup:
```
# In HEARTBEAT.md (runs every heartbeat cycle)
python3 botbin.py sync
# Silent if no changes, logs if files synced
```
This means memory files are always up-to-date and searchable without manual intervention.
Benchmark Details
For those who want the full picture, here are the detailed benchmark results from our February 2026 test run:
| Operation | Iterations | Avg Latency | Throughput |
|---|---|---|---|
| Cosine Similarity (384 dims) | 10,000 | 0.47 microseconds | 2,111,469 ops/sec |
| Search 1,000 chunks | 1,000 | 0.39 ms | 2,570 ops/sec |
| Search 10,000 chunks | 100 | 4.81 ms | 208 ops/sec |
| Search 100,000 chunks | 10 | 48.04 ms | 20.8 ops/sec |
| Chunk 10K words | 1,000 | 0.46 ms | 2,161 ops/sec |
| MessagePack encode | 100 | 0.50 ms | 1,988 ops/sec |
| MessagePack decode | 100 | 0.48 ms | 2,104 ops/sec |
Test environment: Linux (WSL2), Node.js v22.22.0, 384-dimensional embeddings.
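Figures like these come from repeated timing runs. A comparable harness using only the standard library is sketched below; it times a pure-Python cosine similarity, so absolute numbers will differ from the optimized implementation benchmarked above:

```python
import math
import random
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def bench(fn, iterations):
    """Return (avg latency in seconds, ops/sec) for `fn` over `iterations` runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations, iterations / elapsed

random.seed(0)
a = [random.random() for _ in range(384)]  # 384 dims, matching the table above
b = [random.random() for _ in range(384)]
avg_latency, ops_per_sec = bench(lambda: cosine(a, b), 1000)
```

Averaging over many iterations smooths out scheduler noise, which matters for sub-microsecond operations like the similarity row in the table.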
Figure 4: Operations per second across different Bot-BIN workloads
Part of a Larger Ecosystem
Bot-BIN is one piece of the AIF-BIN ecosystem we're building at Terronex. The full stack includes:
- AIF-BIN Lite: Free CLI for basic file operations
- AIF-BIN Pro: Professional CLI with AI extraction and batch processing
- Bot-BIN: Memory sync for chatbots (you are here)
- Recall: Memory server with HTTP API and MCP integration
- Studio: Visual inspector for memory files
All tools work with the same .aif-bin format, so you can mix and match based on your needs.
What's Next
We're actively developing several enhancements:
- Memory Agent: Autonomous background process that consolidates, prunes, and optimizes memories over time
- VS Code Extension: Search and inspect memories directly from your editor
- Obsidian Plugin: Turn your note-taking into AI-searchable memory
- Better Models: Support for larger embedding models when you need higher accuracy
Figure 5: The AIF-BIN ecosystem of tools for AI memory management
Get Started with Bot-BIN
Bot-BIN is open source and free to use. Clone the repo, install dependencies, and give your AI chatbot a memory.
View on GitHub · Read the Whitepaper