Bot-BIN: Persistent Semantic Memory for AI Chatbots
How we built a local-first memory system that lets AI assistants remember everything and search by meaning, not keywords
Every AI chatbot today has the same fundamental limitation: it forgets everything the moment the conversation ends. Bot-BIN fixes that by giving your AI persistent, searchable memory that works entirely offline.
The Problem: AI Amnesia
If you've worked with ChatGPT, Claude, or any modern AI assistant, you've experienced the frustration: every conversation starts from scratch. The AI has no memory of your past discussions, your preferences, or the decisions you've made together.
Current context windows are getting larger (Claude can handle 200K tokens), but they're still fundamentally ephemeral. The moment you close that chat, everything is gone. Want to ask about a decision from three months ago? You'll need to find the chat log, copy it in, and hope it fits in the context window.
This creates a stark gap between how AI works and how humans expect intelligent assistants to behave. A good assistant remembers things: your preferences, that conversation from last month where you decided on the API architecture.
Figure 1: Bot-BIN converts ephemeral markdown notes into persistent, searchable vector memory
Enter Bot-BIN
Bot-BIN is a synchronization layer that converts your AI's markdown memory files into a binary format with embedded vector embeddings. This enables semantic search: finding relevant past context by meaning, not just keywords.
How It Works
1. Your AI writes notes to memory/*.md files during sessions
2. Bot-BIN syncs changed files to .aif-bin with 384-dimensional vectors
3. Semantic search finds relevant context by meaning
4. Your AI recalls "what did we decide about X" across sessions
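Step 2's "changed files" detection can be sketched with content hashing. This is an illustrative sketch, not Bot-BIN's actual implementation; the function names and index shape are assumptions:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(memory_dir: Path, index: dict) -> list:
    """Return markdown files whose content hash differs from the last sync.

    `index` maps filename -> digest recorded at the previous sync;
    it is updated in place as changes are found.
    """
    changed = []
    for md in sorted(memory_dir.glob("*.md")):
        digest = file_digest(md)
        if index.get(md.name) != digest:
            changed.append(md)
            index[md.name] = digest
    return changed
```

With an up-to-date index, a second call returns nothing to do, which is why a no-op sync can finish in roughly 10 ms.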
The key insight is that we're not just storing text; we're storing meaning. When you search for "API architecture decisions," Bot-BIN doesn't just match those keywords. It finds semantically related content: discussions about REST vs GraphQL, endpoint structure debates, authentication design.
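Under the hood, "search by meaning" reduces to comparing embedding vectors. A minimal pure-Python sketch of cosine-similarity ranking (the toy 3-dimensional vectors stand in for the real 384-dimensional model embeddings):

```python
import math

def cosine_similarity(a, b):
    """Direction-based similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks, top_k=2):
    """Rank stored (text, vector) chunks by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy "embeddings" standing in for real model output.
chunks = [
    ("We chose REST over GraphQL", [0.9, 0.1, 0.0]),
    ("Lunch plans for Friday",     [0.0, 0.2, 0.9]),
]
results = search([0.8, 0.2, 0.1], chunks)  # query: "API architecture decisions"
```

Because similar meanings map to nearby vectors, the REST/GraphQL chunk outranks the unrelated one even though no query keyword appears in it.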
The Numbers
We obsess over performance because memory search needs to be fast enough to happen mid-conversation without perceptible delay. Here's what our benchmarks show (the full table is in the Benchmark Details section below):
That means for a typical personal memory store with thousands of chunks (representing months or years of conversations), search completes in under 5 milliseconds. By the time you've finished typing your question, the AI has already found the relevant memories.
Figure 2: Search latency scales linearly with collection size, remaining sub-50ms even at 100K chunks
Local-First, Zero Cloud
Bot-BIN operates entirely on your machine. There's no cloud service, no API calls, no data leaving your computer. The embedding model (all-MiniLM-L6-v2) runs locally and gets cached after first download.
This matters for several reasons:
- Privacy: Your conversation history, decisions, and context stay on your machine
- Speed: No network latency, no API rate limits, no downtime
- Cost: No per-query charges, no subscription fees
- Control: Your data, your rules, your backup strategy
Under the Hood: AIF-BIN v2
Bot-BIN uses the AIF-BIN v2 binary format for storage. This is a MessagePack-encoded format that bundles the original text, pre-computed embeddings, and rich metadata into a single portable file.
Each .aif-bin file is self-contained: you can copy it anywhere, share it, or back it up without worrying about database state or index files. The format is documented and open.
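The self-contained layout can be pictured as a single MessagePack document. The field names below are illustrative only, not the documented AIF-BIN v2 schema:

```python
import msgpack  # pip install msgpack

# Hypothetical shape of a memory file: text, vectors, and metadata together.
record = {
    "format": "aif-bin",
    "version": 2,
    "metadata": {"source": "2026-01-28.md", "model": "all-MiniLM-L6-v2"},
    "chunks": [
        {
            "text": "We decided on REST with resource-based endpoints.",
            "embedding": [0.12, -0.08, 0.33],  # truncated; real vectors are 384-dim
        }
    ],
}

blob = msgpack.packb(record)      # one portable byte string
restored = msgpack.unpackb(blob)  # no database state or index files needed
```

Because everything round-trips through one blob, copying the file is a complete backup.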
Figure 3: AIF-BIN v2 binary format structure with header, metadata, and chunk arrays
Real-World Usage
Here's what typical Bot-BIN usage looks like in practice:
```
# Sync changed memory files (runs in ~10ms if nothing changed)
python3 botbin.py sync

# Search for relevant past context
python3 botbin.py search "what API architecture did we choose"
```

Results:

```
#1 [0.847] 2026-01-28.aif-bin
   We decided on REST with resource-based endpoints. GraphQL was
   considered but rejected due to complexity for the MVP...

#2 [0.721] 2026-02-02.aif-bin
   API versioning discussion: agreed on URL path versioning (/v1/)
   rather than headers. Simpler for debugging and documentation...
```
The search returns the most semantically relevant chunks from your memory files, scored by similarity. Your AI can then inject these as context for the current conversation.
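Injecting those hits into the next turn can be as simple as a thresholded join. The 0.5 cutoff and the prompt framing here are arbitrary illustrative choices, not Bot-BIN defaults:

```python
def build_context(results, min_score=0.5):
    """Format (score, source, text) hits into a context preamble for the model."""
    kept = [r for r in results if r[0] >= min_score]
    if not kept:
        return ""
    lines = ["[{:.3f}] {}: {}".format(score, source, text)
             for score, source, text in kept]
    return "Relevant memories from past sessions:\n" + "\n".join(lines)

results = [
    (0.847, "2026-01-28.aif-bin", "We decided on REST with resource-based endpoints."),
    (0.231, "2026-02-14.aif-bin", "Unrelated note about logo colors."),
]
context = build_context(results)
```

Low-scoring chunks are dropped so the context window only pays for memories that are actually relevant.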
Automated Sync
For AI systems with heartbeat or polling mechanisms, Bot-BIN sync can run automatically. We use this pattern with our own Clawdbot setup:
```
# In HEARTBEAT.md (runs every heartbeat cycle)
python3 botbin.py sync
# Silent if no changes, logs if files synced
```
This means memory files are always up-to-date and searchable without manual intervention.
Benchmark Details
For those who want the full picture, here are the detailed benchmark results from our February 2026 test run:
| Operation | Iterations | Avg Latency | Throughput |
|---|---|---|---|
| Cosine Similarity (384 dims) | 10,000 | 0.47 microseconds | 2,111,469 ops/sec |
| Search 1,000 chunks | 1,000 | 0.39 ms | 2,570 ops/sec |
| Search 10,000 chunks | 100 | 4.81 ms | 208 ops/sec |
| Search 100,000 chunks | 10 | 48.04 ms | 20.8 ops/sec |
| Chunk 10K words | 1,000 | 0.46 ms | 2,161 ops/sec |
| MessagePack encode | 100 | 0.50 ms | 1,988 ops/sec |
| MessagePack decode | 100 | 0.48 ms | 2,104 ops/sec |
Test environment: Linux (WSL2), Node.js v22.22.0, 384-dimensional embeddings.
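Figures like these come from repeated timing runs. A comparable harness using only the standard library is sketched below; it times a pure-Python cosine similarity, so absolute numbers will differ from the optimized implementation benchmarked above:

```python
import math
import random
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def bench(fn, iterations):
    """Return (avg latency in seconds, ops/sec) for `fn` over `iterations` runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations, iterations / elapsed

random.seed(0)
a = [random.random() for _ in range(384)]  # 384 dims, matching the table above
b = [random.random() for _ in range(384)]
avg_latency, ops_per_sec = bench(lambda: cosine(a, b), 1000)
```

Averaging over many iterations smooths out scheduler noise, which matters for sub-microsecond operations like the similarity row in the table.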
Figure 4: Operations per second across different Bot-BIN workloads
Part of a Larger Ecosystem
Bot-BIN is one piece of the AIF-BIN ecosystem we're building at Terronex. The full stack includes:
- AIF-BIN Lite: Free CLI for basic file operations
- AIF-BIN Pro: Professional CLI with AI extraction and batch processing
- Bot-BIN: Memory sync for chatbots (you are here)
- Recall: Memory server with HTTP API and MCP integration
- Studio: Visual inspector for memory files
All tools work with the same .aif-bin format, so you can mix and match based on your needs.
What's Next
We're actively developing several enhancements:
- Memory Agent: Autonomous background process that consolidates, prunes, and optimizes memories over time
- VS Code Extension: Search and inspect memories directly from your editor
- Obsidian Plugin: Turn your note-taking into AI-searchable memory
- Better Models: Support for larger embedding models when you need higher accuracy
Figure 5: The AIF-BIN ecosystem of tools for AI memory management
Get Started with Bot-BIN
Bot-BIN is open source and free to use. Clone the repo, install dependencies, and give your AI chatbot a memory.
View on GitHub · Read the Whitepaper