Your launch hit 86 points and 37 comments in 10 hours. Three of those comments were substantive technical critique (altmanaltman, cyanydeez, axeldunkel). One questioned the entire premise (SwellJoe). Your README answers none of them. This page does.
Local-first MCP server. Forgetting curve + spaced repetition + graph layer over the vector store. The bet: what to forget is as critical as what to remember.
Open-source SDK + hosted SaaS. Compresses chat history into "memories" and re-injects them on recall. Largest community in the category by GitHub stars.
OS-style runtime with three memory tiers (core / recall / archival). You instantiate a Letta agent, not a memory store.
Compiles chat into a time-aware knowledge graph. Strong opinions about entity extraction. Cloud + community edition.
Chunk → embed → top-k retrieve. No memory product. The thing altmanaltman was implicitly comparing you to. Sometimes it's the right answer.
altmanaltman's accusation is the thread's harshest: "the whole 'biological memory' thing seems like marketing fluff on basic cache mechanisms." The honest response isn't to defend the framing — it's to make the mechanism difference legible enough that a skeptic can verify it themselves. Three things below are not in plain RAG.
| Dimension | YourMemory | Mem0 | Letta | Zep | Plain RAG |
|---|---|---|---|---|---|
| Source of truth | Agent's own activity stream + recall events | Chat / message history | Agent runtime state (3 tiers) | Chat compiled into temporal graph | Whatever you chunk |
| Forgetting model | Ebbinghaus curve + spaced repetition: each memory carries a strength score, recall reinforces it, and unused memories decay below a threshold and are pruned | No native decay; relies on compression to keep the recall set tractable | Tier-based eviction (core vs recall vs archival) | Time-anchored (facts carry a `valid_from`/`valid_to`) | None; manual cleanup or unbounded growth |
| Retrieval model | vector + graph — graph layer catches "logical neighbors" that semantic search misses | Vector top-k | Tier-aware (function-call surface for recall) | Graph traversal + vector | Vector top-k |
| Storage layer | DuckDB local-first — no service required, embeddable | SQLite or hosted | Postgres (self-hosted) or hosted | Postgres + Neo4j-style graph | Whatever vector DB you bring |
| Open source? | yes — repo on GitHub | yes Apache-2.0 + hosted | yes Apache-2.0 + hosted | community edition + cloud | yes by definition |
| Reported benchmark | 52% Recall@5 on LoCoMo · ~84% token reduction vs unfiltered context | Mem0 paper claims +26% LLM-as-judge over OpenAI memory | MemGPT paper benchmarks on document QA over long contexts | DMR / LongMemEval benchmarks reported | Varies wildly by chunking + reranker |
| Caveats on the benchmark | LoCoMo has known issues (altmanaltman's point) — see Section 3 | Self-reported; LLM-as-judge has variance | Older benchmark window | Newer; less third-party replication | No single number applies |
| "Hello, memory" code | `mcp.add(text, kind=…)` → recalls auto-strengthen the entry | `m.add(messages, user_id=…)` | `letta.create_agent(memory=…)` | `zep.memory.add(session_id, msgs)` | `vec.add(chunks); vec.search(q)` |
| The unique bet | "Forgetting is a first-class operation, not a side-effect of running out of tokens" | "Chat-history compression is the memory primitive" | "Agents need a typed runtime, not a vector store" | "Time-aware knowledge graphs beat embeddings" | "Just throw it in the prompt" |
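The forgetting model in the table (a strength score per memory, recall reinforces, unused entries decay past a threshold and are pruned) can be sketched in a few lines so a skeptic can see what differs from a cache. This is a minimal illustration, not YourMemory's actual implementation: the `Memory` class, the `stability` field, the doubling `boost`, and the `0.05` threshold are all assumptions picked for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    stability: float = 1.0    # hypothetical: time units until retention falls to 1/e
    last_recall: float = 0.0  # time of the most recent recall


def retention(m: Memory, now: float) -> float:
    """Ebbinghaus-style curve: R = exp(-elapsed / stability)."""
    return math.exp(-(now - m.last_recall) / m.stability)


def recall(m: Memory, now: float, boost: float = 2.0) -> None:
    """Spaced repetition: a recall resets the clock and slows future decay."""
    m.stability *= boost
    m.last_recall = now


def prune(memories: list[Memory], now: float, threshold: float = 0.05) -> list[Memory]:
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if retention(m, now) >= threshold]


# Two memories written at t=0; only one is recalled at t=1.
unused = Memory("old build flag")
reused = Memory("project uses DuckDB")
recall(reused, now=1.0)
survivors = prune([unused, reused], now=4.0)
```

The difference from a plain LRU-style cache is that eviction here depends on the recall history of each entry, not on capacity: an entry that keeps being recalled decays more slowly each time.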
Honest recommendations beat overclaiming. Below is the cohort for which each project — including "no memory product at all" — is the correct answer. Telling SwellJoe "use no memory product" earns the right to tell the next reader why YourMemory matters.
Your agent runs against the same project / codebase / document set across many sessions. Without forgetting, the recall set grows monotonically and either tokens explode or you start hand-pruning. YourMemory makes pruning a property of the data, not a manual chore.
Building a chatbot, a support agent, or anything where the conversation history *is* the memory you need. Mem0's compression-and-recall loop is the most mature implementation of that pattern. Open source means cloud bills can't surprise you.
You buy into the MemGPT thesis: agents need an OS-style runtime with explicit tier management. Pick Letta when you'd rather instantiate Agents than glue a memory product into your own loop.
You're modeling entities and their relationships across time — a CRM agent, a healthcare agent, a legal-events agent. Embeddings keep returning fuzz; you actually need typed nodes and edges.
If your agent runs once, answers a question, and exits — there is no memory problem. Don't add a dependency. SwellJoe's frustration with memory products distracting agents is real for this case. Add memory the moment a session crosses a meaningful boundary.
tra3 in your thread already names this: "preserve all my Claude Code conversations and set the context from there." For a single expert user with strong intuitions about what's worth remembering, hand-curated memory will outperform any automated decay model. YourMemory's argument is for the case where you're not at the keyboard, or you've got many sessions to track.
The thread is the brief. Below is a direct response to each substantive critique, by HN handle, so a reader who arrived here from the comments can verify the page actually engages with what was said.
"I am sorry but the whole 'biological memory' thing seems like marketing fluff on basic cache mechanisms. You said it cuts token usage by 84% but isn't that typical for any typical chunked RAG system? And why did you specifically chose to test against the LoMoCo dataset when there's a lot of issues with it and it being very easy to cheat?"
"the decay rate shouldn't be based on a real clock but a lifetime of it's use within the coding session. Elsewise your memory fades even when there's no process change (eg, coder goes on vacation)."
Change the decay input from `now() - last_recall` to `tick_count - last_recall_tick`. Wall-clock decay can stay as a fallback for use cases where it does fit (e.g., "memories about the user's emotional state probably shouldn't last for years").
Page recommendation: name this out loud as a v0.2 roadmap item. Doing so neutralizes the critique and signals the project is alive.
"What concerns me more are memory chunks with errors in them — they need to be corrected/removed by some other mechanism, not by decay (since they might get retrieved often)."
"I know everybody seems to want the agent to remember every conversation they've ever had with it, but I just don't see the value in that. In fact, it seems to hurt productivity to have the agent second guessing me based on something I said yesterday. Every time I've used any memory system, the agent gets distracted from the current tasks based on previous conversations and branches of development."
"How to decide effectively what to save in the memory: Is it the model to decide what is important, summarize and save it to the memory? How to avoid redundancy and categorize the memory correctly so you could get the right hit and decide what to forget?"
`memory.add()` is deliberately honest about that. Auto-summary is a v0.3 feature, not a v0.1 one.
Redundancy: embed-on-write + similarity check before insert. If a new memory is > 0.95 cosine to an existing one, merge (sum strength, take latest text) instead of inserting. This keeps the index from drowning in near-duplicates from chatty agents.
These two answers should be in the README's "How it works" section. waterbuffaloai is your highest-intent reader in the thread — they're literally building the same thing.
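The redundancy rule (similarity check before insert, merge above 0.95 cosine by summing strength and taking the latest text) could be sketched as follows. This is an illustration under assumptions, not the project's code: `Entry`, `add_or_merge`, and the in-memory list standing in for the index are hypothetical, the embedding vectors are passed in by hand rather than computed, and replacing the stored vector on merge is an added assumption so the vector keeps matching the latest text.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    vector: list[float]
    strength: float = 1.0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def add_or_merge(store: list[Entry], text: str, vector: list[float],
                 strength: float = 1.0, threshold: float = 0.95) -> Entry:
    """Insert a new entry, or merge into the nearest one if it is a near-duplicate."""
    best = max(store, key=lambda e: cosine(e.vector, vector), default=None)
    if best is not None and cosine(best.vector, vector) > threshold:
        best.strength += strength  # sum strength
        best.text = text           # take latest text
        best.vector = vector       # assumption: keep the vector in sync with the text
        return best
    entry = Entry(text, vector, strength)
    store.append(entry)
    return entry


store: list[Entry] = []
add_or_merge(store, "uses pytest", [1.0, 0.0])
add_or_merge(store, "uses pytest for all tests", [0.99, 0.05])  # near-duplicate, merged
add_or_merge(store, "deploys on Fridays", [0.0, 1.0])           # unrelated, inserted
```

A linear scan is fine for a sketch; a real index would ask the vector store for the single nearest neighbor instead of comparing against every entry.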
"It strikes me as funny how we want to get super AI inteligence but keep trying to anthropomorphizing all AI aspects to make it more 'human'. IMHO, if we keep doing it we will create Human AI with all errors and deficiencies humans have."