
Full Conversation History in OpenRouter Chat

Root-cause analysis and implementation plan to fix duplicate user messages and preserve recent conversation context in OpenRouter prompts.

Phase 17 — Full Conversation History in OpenRouter Chat


Goal: The AI should have full context of the entire conversation thread (all previous user messages and AI replies in the session), not just the last message. This document covers the root-cause analysis, exact bugs in the current code, and all file-level changes required.


The observed symptom:

  • Customer asks something; the AI replies correctly.
  • Customer asks a follow-up that refers to the previous exchange (“what about the price of that one?”).
  • AI replies as if it has never seen the prior messages — it answers from scratch.

Step 3: Save current user message ← message saved FIRST
Step 4: Load history from DB      ← query runs AFTER save
        → history includes the just-saved message
Step N: call buildPrompt(chunks, message, historyForPrompt, ...)

const messages = [
  { role: "system", content: systemContent },
  ...history.slice(-6),               // ← last 6 from historyForPrompt
  { role: "user", content: message }, // ← current message added AGAIN
];

Bug 1 — Duplicate current user message

  • The current user message is saved to DB (step 3).
  • History is loaded from DB (step 4) — it now includes the just-saved message.
  • buildPrompt then appends the current message a second time at the end.
  • OpenRouter receives the user’s latest message twice in every request.

Bug 2 — History window is silently too small

  • take: 10 in step 4 fetches at most 10 messages, but one of those 10 is the duplicate current message.
  • history.slice(-6) in buildPrompt cuts that down to 6.
  • Since slot -1 of the 6 is the duplicate current message, only 5 real previous turns reach the model.
  • In a 10-turn conversation (20 messages), the model sees at most 5 messages besides the current one; and because the fetch orders by createdAt ascending (see the ordering note later in this document), those are not even guaranteed to be the most recent ones. Everything else is silently discarded.
What OpenRouter receives today (broken):

[system prompt + store context]
[message N-4] user
[message N-3] assistant
[message N-2] user
[message N-1] assistant
[message N]   user ← current message (from history, slot -1 of slice(-6))
[message N]   user ← current message again (explicit append in buildPrompt)
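Both bugs can be reproduced with plain arrays, without Prisma or OpenRouter. A minimal sketch of the current (broken) flow; the message contents are illustrative:

```javascript
// A short session: 3 full exchanges already stored in the DB.
const stored = [
  { role: "user", content: "q1" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "q2" },
  { role: "assistant", content: "a2" },
  { role: "user", content: "q3" },
  { role: "assistant", content: "a3" },
];
const current = { role: "user", content: "what about the price of that one?" };

stored.push(current);                // step 3: current message saved FIRST
const history = stored.slice(0, 10); // step 4: take: 10 (includes the just-saved message)

// buildPrompt: last 6 history entries, then the current message appended again.
const messages = [
  { role: "system", content: "system prompt + store context" },
  ...history.slice(-6),
  current,
];

// Bug 1: the last two entries are the same user message.
const duplicateTail =
  messages[messages.length - 1].content === messages[messages.length - 2].content;

// Bug 2: of the 6 history slots, one is the duplicate, so only 5 real
// previous messages reach the model.
const realPrevious = messages.slice(1, -1).filter((m) => m !== current).length;

console.log(duplicateTail); // true
console.log(realPrevious);  // 5
```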

What OpenRouter should receive (fixed):

[system prompt + store context]
[message 1] user
[message 2] assistant
[message 3] user
[message 4] assistant
...
[message N-1] user ← all previous turns, up to configured cap
[message N] assistant
[message N+1] user ← current message, once, at the end

No duplicates. History cap is configurable via env. History is fetched before the current message is saved, so it never contaminates the history window.


Files to change:

| File | Type | Change |
| --- | --- | --- |
| app/lib/rag.server.js | Modify | Load history BEFORE saving the current user message; pass the correct history to buildPrompt |
| app/lib/prompt-builder.server.js | Modify | Remove the history.slice(-6) cap so the full configured history window reaches the model |
| .env.sample | Modify | Add CHAT_HISTORY_MESSAGES env var (default 20) |

No DB schema change needed. No migration needed. No widget change needed.


rag.server.js: reorder steps 3 and 4.

Current order (broken):

Step 3: save user message to DB
Step 4: load history (includes current message → bug)
Step N: buildPrompt(chunks, message, history)

Fixed order:

Step 3: load history from DB ← MOVED HERE (BEFORE saving)
Step 4: save user message to DB
Step N: buildPrompt(chunks, message, history) ← history now ends with the last assistant reply, not the current message

Exact diff:

// BEFORE (rag.server.js steps 3–4):

// 3. Save user message
await prisma.chatMessage.create({
  data: { sessionId: session.id, role: "user", messageText: message },
});

// 4. Load conversation history (last 3 exchanges = 6 messages)
const history = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "asc" },
  take: 10,
  select: { role: true, messageText: true },
});
const historyForPrompt = history.map((m) => ({
  role: m.role,
  content: m.messageText,
}));

// AFTER (rag.server.js steps 3–4 swapped + cap from env):

// 3. Load conversation history BEFORE saving the current message.
//    This ensures the current user turn is NOT in the history window
//    and does not create a duplicate in the final prompt sent to OpenRouter.
const parsedLimit = Number.parseInt(process.env.CHAT_HISTORY_MESSAGES ?? "20", 10);
// Guard against NaN / non-positive values from a malformed .env entry.
const historyLimit = Number.isInteger(parsedLimit) && parsedLimit > 0 ? parsedLimit : 20;
const rawHistory = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "asc" },
  take: historyLimit,
  select: { role: true, messageText: true },
});
const historyForPrompt = rawHistory.map((m) => ({
  role: m.role,
  content: m.messageText,
}));

// 4. Save user message (after history is already captured).
await prisma.chatMessage.create({
  data: { sessionId: session.id, role: "user", messageText: message },
});
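The swapped order is easy to sanity-check without a database. A sketch using a plain array as a stand-in for the chatMessage table (assuming insertion order matches createdAt order):

```javascript
// In-memory stand-in for prisma.chatMessage, oldest first.
const table = [
  { role: "user", messageText: "q1" },
  { role: "assistant", messageText: "a1" },
];
const message = "q2"; // the incoming user message
const historyLimit = 20;

// Step 3 (fixed): capture history BEFORE the current message is saved.
const historyForPrompt = table
  .slice(0, historyLimit)
  .map((m) => ({ role: m.role, content: m.messageText }));

// Step 4 (fixed): save the current user message afterwards.
table.push({ role: "user", messageText: message });

// The history window never contains the current message, so buildPrompt
// can safely append it exactly once.
console.log(historyForPrompt.some((m) => m.content === message)); // false
console.log(historyForPrompt.length); // 2
```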

prompt-builder.server.js: keep the single append, remove the slice cap.

The explicit append of { role: "user", content: message } was only harmful in combination with the old save-before-load order. After the fix in rag.server.js, history contains only previous turns (never the current message), so appending the current user message once at the end of buildPrompt is correct and stays.

What must change is history.slice(-6): it was silently discarding everything but the last six history entries. The cap is removed here because the window is now sized upstream by CHAT_HISTORY_MESSAGES.

// BEFORE (prompt-builder.server.js):
const messages = [
  { role: "system", content: systemContent },
  ...history.slice(-6), // ← silently discards old context
  { role: "user", content: message },
];

// AFTER (prompt-builder.server.js):
// history is already sized to CHAT_HISTORY_MESSAGES in rag.server.js.
// We do not slice here, so the full history window reaches the model.
const messages = [
  { role: "system", content: systemContent },
  ...history,
  { role: "user", content: message },
];

.env.sample: add the following lines.

# Number of previous messages (user + assistant) sent to OpenRouter for context.
# Higher = better memory, higher token cost per reply. Default: 20 (= ~10 back-and-forth turns).
CHAT_HISTORY_MESSAGES=20
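One caveat worth handling in code: parseInt returns NaN for a malformed value (e.g. CHAT_HISTORY_MESSAGES=lots), and NaN would then be passed to Prisma's take. A defensive parse sketch; getHistoryLimit is a hypothetical helper name:

```javascript
// Falls back to the documented default of 20 on missing,
// non-numeric, zero, or negative values.
function getHistoryLimit(env = process.env) {
  const parsed = Number.parseInt(env.CHAT_HISTORY_MESSAGES ?? "20", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 20;
}

console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "40" }));   // 40
console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "oops" })); // 20
console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "-5" }));   // 20
console.log(getHistoryLimit({}));                                // 20
```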

Sending more history means more input tokens per call. Approximate cost comparison at gpt-4o-mini rates (~$0.15/M input tokens):

| History setting | Avg extra tokens/call | Extra cost/call |
| --- | --- | --- |
| Current (5 prev messages) | ~500 | ~$0.000075 |
| 20 messages (default) | ~2,000 | ~$0.0003 |
| 40 messages | ~4,000 | ~$0.0006 |

These are small amounts. The default of 20 messages provides good context without meaningful cost increase.
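The table values follow directly from the assumed per-token rate; a quick arithmetic check (rounded to 6 decimal places to sidestep floating-point noise):

```javascript
const RATE_PER_INPUT_TOKEN = 0.15 / 1_000_000; // ~$0.15 per million input tokens

const extraCostPerCall = (extraTokens) =>
  Number((extraTokens * RATE_PER_INPUT_TOKEN).toFixed(6));

console.log(extraCostPerCall(500));  // 0.000075
console.log(extraCostPerCall(2000)); // 0.0003
console.log(extraCostPerCall(4000)); // 0.0006
```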


| Scenario | How it's handled |
| --- | --- |
| First message in session (empty history) | historyForPrompt = []; buildPrompt sends just system + user message. No change needed. |
| Clarifying turn (order number ask) returns early | The clarifying assistant message is still saved via saveClarifyingReply. On the next user message, the history load picks it up correctly. No change needed. |
| Very long conversation (> CHAT_HISTORY_MESSAGES) | take: historyLimit caps the window. Recent context is prioritised by fetching newest-first (orderBy: createdAt desc + take) and reversing; see the ordering note below. |
| Token limit exceeded at model level | Model returns a context-length error. We already retry in chat.server.js. If this becomes frequent, reduce CHAT_HISTORY_MESSAGES. |

Note on ordering: findMany with orderBy: { createdAt: "asc" } and take: N returns the oldest N messages, not the newest N. If the session has 30 messages and CHAT_HISTORY_MESSAGES=20, we’d send messages 1–20 and miss the most recent 10. This must be fixed:

// Fetch the MOST RECENT historyLimit messages, then re-order chronologically.
const rawHistory = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "desc" }, // newest first
  take: historyLimit,
  select: { role: true, messageText: true },
});
const historyForPrompt = rawHistory
  .reverse() // flip back to oldest-first for the prompt
  .map((m) => ({ role: m.role, content: m.messageText }));
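The desc-then-reverse behaviour can be checked against a plain array standing in for the table (the IDs are illustrative):

```javascript
// 30 stored messages, oldest first (ids 1..30).
const stored = Array.from({ length: 30 }, (_, i) => ({ id: i + 1 }));
const historyLimit = 20;

// orderBy: desc + take → the NEWEST historyLimit rows, newest first...
const newestFirst = [...stored].reverse().slice(0, historyLimit);
// ...then reverse() restores chronological order for the prompt.
const historyForPrompt = newestFirst.reverse();

console.log(historyForPrompt.length);                          // 20
console.log(historyForPrompt[0].id);                           // 11 (oldest kept)
console.log(historyForPrompt[historyForPrompt.length - 1].id); // 30 (most recent)
```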

After implementation:

  • Start a new session. Send 5+ messages back and forth on different topics. On message 6, refer to message 2 topic. AI should answer correctly using context.
  • Check debug.rag.systemPromptPreview (with ORDER_LOOKUP_DEBUG=true) — confirm you see multiple user/assistant turns in the messages array sent to OpenRouter.
  • Confirm no duplicate user message at the end of the messages array.
  • Session with 25 messages: confirm only the most recent 20 (not oldest 20) are sent.
  • Clarifying turns (order number): after the AI asks for order number and customer replies, AI should have context of the prior conversation.
  • Cold start (first message ever): no error, normal reply.
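The duplicate-tail check can be scripted so it runs against the debug output. A sketch; hasDuplicateTail is a hypothetical helper that inspects the final messages array:

```javascript
// Flags a prompt whose last two entries are the same user message,
// i.e. the Bug 1 signature.
function hasDuplicateTail(messages) {
  if (messages.length < 2) return false;
  const a = messages[messages.length - 2];
  const b = messages[messages.length - 1];
  return a.role === "user" && b.role === "user" && a.content === b.content;
}

const broken = [
  { role: "system", content: "s" },
  { role: "user", content: "hi" },
  { role: "user", content: "hi" },
];
const fixed = [
  { role: "system", content: "s" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "hi" },
];

console.log(hasDuplicateTail(broken)); // true
console.log(hasDuplicateTail(fixed));  // false
```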

| What | Where | Lines affected |
| --- | --- | --- |
| Swap steps 3 and 4 (load history before save) | rag.server.js | ~10 lines |
| Use orderBy: desc + reverse() for recency-first history | rag.server.js | ~5 lines |
| Replace take: 10 with take: historyLimit from env | rag.server.js | 1 line |
| Remove history.slice(-6) in buildPrompt | prompt-builder.server.js | 1 line |
| Add CHAT_HISTORY_MESSAGES env var | .env.sample | 2 lines |

Total: ~20 lines changed across 3 files. No migration, no DB schema change, no widget change.