Bot-BIN: Persistent Semantic Memory for AI Chatbots

A synchronization framework for converting ephemeral conversation context into durable, searchable memory using the AIF-BIN v2 binary format

Terronex Research

terronex.dev

Version 1.0 | February 2026

Abstract

Modern AI chatbots and agents suffer from context window limitations that prevent long-term memory persistence. Bot-BIN addresses this fundamental constraint by providing a synchronization layer that converts markdown-based memory files into the AIF-BIN v2 binary format with embedded 384-dimensional vector embeddings. This enables semantic search across conversation history, decisions, and accumulated knowledge. Performance benchmarks demonstrate search latency of 0.39ms per 1,000 chunks and encoding throughput of 2.1 million similarity operations per second. Bot-BIN operates entirely offline with zero cloud dependencies, maintaining user privacy while enabling AI systems to develop persistent, queryable memory.

Table of Contents

  1. Introduction
  2. The Memory Problem in AI Systems
  3. System Architecture
  4. AIF-BIN v2 Binary Format
  5. Synchronization Engine
  6. Semantic Search Implementation
  7. Performance Benchmarks
  8. Integration Patterns
  9. Security Considerations
  10. Future Work

1. Introduction

The current generation of large language models operates within fixed context windows, typically ranging from 4,096 to 200,000 tokens. While these windows have grown substantially, they remain fundamentally ephemeral. Each conversation session begins with a blank slate, and accumulated knowledge from prior interactions is lost unless explicitly re-injected into the context.

This limitation creates a significant gap between how AI systems operate and how humans expect intelligent assistants to behave. A human assistant remembers past conversations, accumulates knowledge about preferences and decisions, and builds contextual understanding over time. Current AI chatbots cannot replicate this behavior without external memory systems.

Bot-BIN provides a practical solution to this problem. It operates as a synchronization layer between human-readable markdown files and machine-optimized binary memory files. The system watches for changes to memory files, automatically generates vector embeddings, and maintains a searchable archive of all accumulated knowledge.

2. The Memory Problem in AI Systems

2.1 Context Window Constraints

Large language models process input as a sequence of tokens within a fixed-size window. Once this window is filled, older tokens are truncated or the system must employ compression strategies. In practice this means that conversation history vanishes between sessions, knowledge about preferences and decisions cannot accumulate, and any context the model should retain must be explicitly re-injected on every session.

2.2 Existing Solutions and Their Limitations

Several approaches have been developed to address AI memory limitations:

Approach            Mechanism                       Limitations
Vector Databases    Cloud-hosted embedding storage  Requires network, vendor lock-in, privacy concerns
RAG Pipelines       Retrieve-then-generate          Complex infrastructure, high latency
Conversation Logs   Append-only text files          No semantic search, keyword matching only
Knowledge Graphs    Entity-relationship storage     Requires structured data extraction, brittle

Bot-BIN takes a different approach: local-first binary files that contain both the original text and pre-computed embeddings. This eliminates cloud dependencies while enabling fast semantic search.

3. System Architecture

3.1 Component Overview

workspace/
+-- memory/
|   +-- 2026-02-01.md               Source: Human-readable notes
|   +-- 2026-02-02.md               Source: Daily conversation logs
|   +-- aifbin/
|       +-- 2026-02-01.aif-bin      Binary: Embedded vectors
|       +-- 2026-02-02.aif-bin      Binary: Semantic searchable
+-- MEMORY.md                       Source: Long-term persistent memory
+-- botbin.py                       Engine: Sync + Search CLI
+-- aifbin_pro.py                   Library: AIF-BIN v2 operations
+-- aifbin_spec_v2.py               Library: Binary format spec

3.2 Data Flow

The Bot-BIN system operates through a three-phase pipeline:

Phase 1: Change Detection

+------------------+     +------------------+     +------------------+
|   memory/*.md    | --> |   Hash Compare   | --> |  Changed Files   |
|   MEMORY.md      |     |   (MD5 state)    |     |      Queue       |
+------------------+     +------------------+     +------------------+

Phase 2: Embedding Generation

+------------------+     +------------------+     +------------------+
|  Changed Files   | --> |   Chunk Text     | --> |  all-MiniLM-L6   |
|      Queue       |     |   (500 tokens)   |     |    (384-dim)     |
+------------------+     +------------------+     +------------------+

Phase 3: Binary Serialization

+------------------+     +------------------+     +------------------+
|   Embeddings +   | --> |   MessagePack    | --> |    .aif-bin      |
|    Metadata      |     |    Encoding      |     |      Files       |
+------------------+     +------------------+     +------------------+

3.3 State Management

Bot-BIN maintains sync state in a JSON file that tracks file hashes and modification times. This enables incremental sync, where only changed files are re-processed. The state file structure:

{
  "files": {
    "memory/2026-02-01.md": {
      "hash": "a1b2c3d4e5f6...",
      "synced_at": "2026-02-01T23:45:00Z",
      "chunks": 12,
      "output": "memory/aifbin/2026-02-01.aif-bin"
    }
  },
  "last_sync": "2026-02-01T23:45:00Z"
}
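As an illustration of how this state can be maintained, the sketch below records a file's sync result using the JSON structure shown above. This is not the botbin.py implementation; the function name record_sync and its signature are hypothetical.

```python
# Illustrative sketch (not the actual botbin.py code): record sync state
# for one processed file, using the JSON field names shown above.
import hashlib
import json
from datetime import datetime, timezone

def record_sync(state_path, file_path, content, chunks, output):
    # Load existing state, or start fresh on first run
    try:
        with open(state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {"files": {}}
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    state["files"][file_path] = {
        "hash": hashlib.md5(content.encode()).hexdigest(),
        "synced_at": now,
        "chunks": chunks,
        "output": output,
    }
    state["last_sync"] = now
    with open(state_path, "w") as f:
        json.dump(state, f, indent=2)
    return state
```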

4. AIF-BIN v2 Binary Format

4.1 Format Specification

AIF-BIN v2 uses MessagePack encoding for efficient binary serialization. The format provides a self-contained unit of semantic memory that includes source text, pre-computed embeddings, and rich metadata.

4.1.1 Header Structure

Offset  Size    Field           Description
0       4       Magic           "AIFB" (0x41 0x49 0x46 0x42)
4       1       Version         Format version (2)
5       1       Flags           Compression, encryption flags
6       2       Reserved        Future use
8       4       Metadata Len    Length of metadata block
12      4       Chunks Len      Number of chunks
16      N       Metadata        MessagePack-encoded metadata
16+N    M       Chunks          MessagePack-encoded chunk array
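The 16-byte fixed header maps directly onto Python's struct module. The sketch below packs and parses it per the offset table above; little-endian byte order is an assumption, since the excerpt does not state endianness.

```python
# Sketch of packing/parsing the 16-byte AIF-BIN v2 header with the stdlib
# struct module. Little-endian byte order is assumed (not stated in the spec
# excerpt above).
import struct

HEADER_FMT = "<4sBBHII"  # magic, version, flags, reserved, metadata len, chunk count
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes

def pack_header(metadata_len, chunk_count, flags=0):
    return struct.pack(HEADER_FMT, b"AIFB", 2, flags, 0, metadata_len, chunk_count)

def parse_header(data):
    magic, version, flags, _reserved, meta_len, chunks = struct.unpack(
        HEADER_FMT, data[:HEADER_SIZE])
    if magic != b"AIFB" or version != 2:
        raise ValueError("not an AIF-BIN v2 file")
    return {"flags": flags, "metadata_len": meta_len, "chunk_count": chunks}
```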

4.1.2 Chunk Structure

Each chunk contains:

Field      Type       Description
id         string     UUID v4 chunk identifier
text       string     Original text content (up to 500 tokens)
embedding  float32[]  384-dimensional vector from MiniLM
metadata   object     Source position, timestamps, tags
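A chunk record following this field table can be built as below. The metadata keys (source, position) are illustrative stand-ins for the "source position, timestamps, tags" the format carries; in the real pipeline the embedding comes from all-MiniLM-L6-v2 rather than being passed in.

```python
# Sketch of building one chunk record matching the field table above.
# The metadata keys here are illustrative, not the full spec.
import uuid

EMBED_DIM = 384  # all-MiniLM-L6-v2 output dimensionality

def make_chunk(text, embedding, source, position):
    assert len(embedding) == EMBED_DIM, "MiniLM embeddings are 384-dimensional"
    return {
        "id": str(uuid.uuid4()),                      # UUID v4 identifier
        "text": text,                                 # original content
        "embedding": [float(x) for x in embedding],   # serialized as float32[]
        "metadata": {"source": source, "position": position},
    }
```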

4.2 Embedding Model

Bot-BIN uses the all-MiniLM-L6-v2 sentence transformer model by default. This model provides an optimal balance of speed, quality, and resource efficiency:

Embedding dimensions:  384
Model parameters:      22M
Inference latency:     ~10 ms
Model size:            ~90 MB

5. Synchronization Engine

5.1 Change Detection Algorithm

The sync engine uses MD5 hashing to detect file changes. Content hashing is chosen over file modification times because mtimes can change without the content changing (e.g. a file re-saved identically) and content can change while the mtime is preserved; comparing hashes re-processes a file only when its bytes actually differ.

import hashlib

def needs_sync(file_path, state):
    # Re-sync only when the content hash differs from the recorded one
    with open(file_path, 'rb') as f:
        current_hash = hashlib.md5(f.read()).hexdigest()
    previous_hash = state.get(file_path, {}).get('hash')
    return current_hash != previous_hash

5.2 Chunking Strategy

Text is split into chunks using a paragraph-aware algorithm that respects natural boundaries while maintaining a target chunk size of roughly 500 tokens, the maximum stored per chunk.
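A minimal version of such a chunker is sketched below, in the spirit of the description above. It uses a word count as a rough stand-in for the 500-token budget; the actual tokenizer-based accounting is not specified in this paper.

```python
# Minimal paragraph-aware chunker: packs whole paragraphs into chunks,
# closing a chunk when the next paragraph would exceed the word budget.
# Word count is a rough proxy for the 500-token budget (assumption).
def chunk_text(text, max_words=350):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        if current and count + words > max_words:
            # Close the chunk at a natural paragraph boundary
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```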

5.3 Tracked Files

Bot-BIN automatically tracks the daily logs under memory/*.md and the long-term MEMORY.md file in the workspace.

Additional paths can be configured via environment variables or command-line arguments.

6. Semantic Search Implementation

6.1 Query Processing

Semantic search operates by embedding the query text using the same MiniLM model, then computing cosine similarity against all indexed chunk embeddings:

def search(query, collection, limit=10):
    # Embed the query
    query_embedding = model.encode(query)
    
    # Load all indexed chunks
    results = []
    for file in collection.files:
        chunks = load_aifbin(file)
        for chunk in chunks:
            score = cosine_similarity(query_embedding, chunk.embedding)
            results.append((chunk, score))
    
    # Return top-k by similarity
    results.sort(key=lambda x: x[1], reverse=True)
    return results[:limit]

6.2 Cosine Similarity

Cosine similarity measures the angle between two vectors, producing a score between -1 and 1. For normalized embeddings (which MiniLM produces), this simplifies to a dot product:

similarity(a, b) = sum(a[i] * b[i] for i in range(384))
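The shortcut above can be checked against the full definition. This sketch computes cosine similarity with explicit normalization for the general case, alongside the plain dot product that is valid only for unit-length vectors such as MiniLM embeddings.

```python
# Full cosine similarity vs. the dot-product shortcut for unit vectors.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dot(a, b):
    # Equals cosine similarity only when both vectors are unit-length,
    # as MiniLM embeddings are.
    return sum(x * y for x, y in zip(a, b))
```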

6.3 Hybrid Retrieval

Bot-BIN supports optional hybrid retrieval that combines vector similarity with BM25 keyword matching. This improves recall for queries containing specific named entities or technical terms:

hybrid_score = (alpha * vector_score) + ((1 - alpha) * keyword_score)
# Default alpha = 0.7 (favor semantic similarity)
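The blend above is straightforward to implement. One assumption in this sketch: both scores are already normalized to [0, 1], since the paper does not specify how BM25 scores are normalized before mixing.

```python
# Hybrid scoring as described above. Assumes vector_score and keyword_score
# are both pre-normalized to [0, 1] (normalization scheme is an assumption).
def hybrid_score(vector_score, keyword_score, alpha=0.7):
    # alpha = 0.7 favors semantic similarity over keyword matching
    return alpha * vector_score + (1 - alpha) * keyword_score
```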

7. Performance Benchmarks

7.1 Test Environment

Component             Specification
Platform              Linux (WSL2)
Runtime               Node.js v22.22.0
Embedding Dimensions  384 (all-MiniLM-L6-v2)
Benchmark Date        February 5, 2026

7.2 Results Summary

Cosine similarity:    2.1M ops/sec
Search 1K chunks:     0.39 ms
Search 10K chunks:    4.8 ms
Search 100K chunks:   48 ms

7.3 Detailed Benchmarks

Operation                        Iterations  Avg Latency        Throughput
Cosine Similarity (384 dims)     10,000      0.47 microseconds  2,111,469 ops/sec
Search 1,000 chunks              1,000       0.39 ms            2,570 ops/sec
Search 10,000 chunks             100         4.81 ms            208 ops/sec
Search 100,000 chunks            10          48.04 ms           20.8 ops/sec
Chunk 10K words                  1,000       0.46 ms            2,161 ops/sec
MessagePack encode (100 chunks)  100         0.50 ms            1,988 ops/sec
MessagePack decode (100 chunks)  100         0.48 ms            2,104 ops/sec

Performance Insight

Search latency scales linearly with collection size. For typical chatbot memory stores (1,000-10,000 chunks), search completes in under 5ms, enabling real-time retrieval during conversation without perceptible delay.

8. Integration Patterns

8.1 Heartbeat-Based Sync

For AI systems with periodic heartbeat or polling mechanisms, Bot-BIN sync can be triggered automatically:

# In HEARTBEAT.md or similar automation config
1. Run `python3 botbin.py sync` on each heartbeat
2. Silent output if no files changed
3. Memory files always up-to-date for search

8.2 MCP Integration

Bot-BIN integrates with the Model Context Protocol (MCP) for AI agent tooling. The MCP server exposes memory search as a callable tool:

{
  "mcpServers": {
    "botbin-memory": {
      "command": "python3",
      "args": ["botbin.py", "mcp-server"]
    }
  }
}

8.3 Direct CLI Usage

# Sync all changed memory files
python3 botbin.py sync

# Search memories by meaning
python3 botbin.py search "what API architecture did we choose"

# Show sync status
python3 botbin.py status

# Extract original text from binary
python3 botbin.py extract memory/aifbin/file.aif-bin

9. Security Considerations

9.1 Local-First Architecture

Bot-BIN operates entirely offline. No data is transmitted to external servers, which provides strong privacy guarantees for sensitive conversation data.

9.2 File Security

AIF-BIN v2 supports optional encryption for at-rest security. When enabled, chunk content and embeddings are encrypted with AES-256-GCM before serialization. The encryption key is derived from a user-provided passphrase using Argon2id.
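The passphrase-to-key step can be sketched as follows. The paper specifies Argon2id, which requires a third-party library (e.g. argon2-cffi); as a stdlib-only stand-in, this sketch uses scrypt, and its cost parameters are illustrative rather than the spec's.

```python
# Sketch of passphrase-based key derivation for the optional encryption
# layer. NOTE: the spec calls for Argon2id; scrypt is used here only as a
# stdlib stand-in, with illustrative cost parameters.
import hashlib
import os

def derive_key(passphrase, salt=None):
    salt = salt if salt is not None else os.urandom(16)  # fresh random salt
    key = hashlib.scrypt(passphrase.encode(), salt=salt,
                         n=2**14, r=8, p=1, dklen=32)  # 256-bit key for AES-256-GCM
    return key, salt
```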

9.3 Access Control

Memory files should be protected with appropriate filesystem permissions. Recommended configuration:

# Restrict memory directory to owner only
chmod 700 memory/
chmod 600 memory/*.md memory/aifbin/*.aif-bin

10. Future Work

10.1 Planned Enhancements

10.2 Integration Roadmap

References

  1. AIF-BIN Format Specification v2.0, Terronex, 2026
  2. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Reimers & Gurevych, 2019
  3. MessagePack Specification, msgpack.org
  4. Model Context Protocol, Anthropic, 2024