
Full Conversation History in OpenRouter Chat

Root-cause analysis and implementation plan to fix duplicate user messages and preserve recent conversation context in OpenRouter prompts.

Phase 17 — Full Conversation History in OpenRouter Chat


Goal: The AI should have full context of the entire conversation thread (all previous user messages and AI replies in the session), not just the last message. This document covers the root-cause analysis, exact bugs in the current code, and all file-level changes required.


The observed symptom:

  • Customer asks something; the AI replies correctly.
  • Customer asks a follow-up that refers to the previous exchange (“what about the price of that one?”).
  • AI replies as if it has never seen the prior messages — it answers from scratch.

Step 3: Save current user message ← message saved FIRST
Step 4: Load history from DB      ← query runs AFTER save
        → history includes the just-saved message
Step N: call buildPrompt(chunks, message, historyForPrompt, ...)

const messages = [
  { role: "system", content: systemContent },
  ...history.slice(-6),               // ← last 6 from historyForPrompt
  { role: "user", content: message }, // ← current message added AGAIN
];

Bug 1 — Duplicate current user message

  • The current user message is saved to DB (step 3).
  • History is loaded from DB (step 4) — it now includes the just-saved message.
  • buildPrompt then appends the current message a second time at the end.
  • OpenRouter receives the user’s latest message twice in every request.

Bug 2 — History window is silently too small

  • take: 10 in step 4 fetches at most 10 messages, but one of those 10 is the duplicate current message.
  • history.slice(-6) in buildPrompt cuts that down to 6.
  • Since slot -1 of the 6 is the duplicate current message, only 5 real previous turns reach the model.
  • In a 10-turn conversation (20 messages), the model sees at most 5 messages besides the current one; and because the fetch orders by createdAt ascending (see the ordering note later in this document), those are not even guaranteed to be the most recent ones. Everything else is silently discarded.
What OpenRouter receives today (broken):

[system prompt + store context]
[message N-4] user
[message N-3] assistant
[message N-2] user
[message N-1] assistant
[message N]   user ← current message (from history, slot -1 of slice(-6))
[message N]   user ← current message again (explicit append in buildPrompt)
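Both bugs can be reproduced with plain arrays, without Prisma or OpenRouter. A minimal sketch of the current (broken) flow; the message contents are illustrative:

```javascript
// A short session: 3 full exchanges already stored in the DB.
const stored = [
  { role: "user", content: "q1" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "q2" },
  { role: "assistant", content: "a2" },
  { role: "user", content: "q3" },
  { role: "assistant", content: "a3" },
];
const current = { role: "user", content: "what about the price of that one?" };

stored.push(current);                // step 3: current message saved FIRST
const history = stored.slice(0, 10); // step 4: take: 10 (includes the just-saved message)

// buildPrompt: last 6 history entries, then the current message appended again.
const messages = [
  { role: "system", content: "system prompt + store context" },
  ...history.slice(-6),
  current,
];

// Bug 1: the last two entries are the same user message.
const duplicateTail =
  messages[messages.length - 1].content === messages[messages.length - 2].content;

// Bug 2: of the 6 history slots, one is the duplicate, so only 5 real
// previous messages reach the model.
const realPrevious = messages.slice(1, -1).filter((m) => m !== current).length;

console.log(duplicateTail); // true
console.log(realPrevious);  // 5
```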

What OpenRouter should receive (fixed):

[system prompt + store context]
[message 1] user
[message 2] assistant
[message 3] user
[message 4] assistant
...
[message N-1] user ← all previous turns, up to configured cap
[message N] assistant
[message N+1] user ← current message, once, at the end

No duplicates. History cap is configurable via env. History is fetched before the current message is saved, so it never contaminates the history window.


Files to change:

| File | Type | Change |
| --- | --- | --- |
| app/lib/rag.server.js | Modify | Load history BEFORE saving the current user message; pass the correct history to buildPrompt |
| app/lib/prompt-builder.server.js | Modify | Remove the history.slice(-6) cap so the full configured history window reaches the model |
| .env.sample | Modify | Add CHAT_HISTORY_MESSAGES env var (default 20) |

No DB schema change needed. No migration needed. No widget change needed.


rag.server.js: reorder steps 3 and 4.

Current order (broken):

Step 3: save user message to DB
Step 4: load history (includes current message → bug)
Step N: buildPrompt(chunks, message, history)

Fixed order:

Step 3: load history from DB ← MOVED HERE (BEFORE saving)
Step 4: save user message to DB
Step N: buildPrompt(chunks, message, history) ← history now ends with the last assistant reply, not the current message

Exact diff:

// BEFORE (rag.server.js steps 3–4):

// 3. Save user message
await prisma.chatMessage.create({
  data: { sessionId: session.id, role: "user", messageText: message },
});

// 4. Load conversation history (last 3 exchanges = 6 messages)
const history = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "asc" },
  take: 10,
  select: { role: true, messageText: true },
});
const historyForPrompt = history.map((m) => ({
  role: m.role,
  content: m.messageText,
}));

// AFTER (rag.server.js steps 3–4 swapped + cap from env):

// 3. Load conversation history BEFORE saving the current message.
//    This ensures the current user turn is NOT in the history window
//    and does not create a duplicate in the final prompt sent to OpenRouter.
const parsedLimit = Number.parseInt(process.env.CHAT_HISTORY_MESSAGES ?? "20", 10);
// Guard against NaN / non-positive values from a malformed .env entry.
const historyLimit = Number.isInteger(parsedLimit) && parsedLimit > 0 ? parsedLimit : 20;
const rawHistory = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "asc" },
  take: historyLimit,
  select: { role: true, messageText: true },
});
const historyForPrompt = rawHistory.map((m) => ({
  role: m.role,
  content: m.messageText,
}));

// 4. Save user message (after history is already captured).
await prisma.chatMessage.create({
  data: { sessionId: session.id, role: "user", messageText: message },
});
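The swapped order is easy to sanity-check without a database. A sketch using a plain array as a stand-in for the chatMessage table (assuming insertion order matches createdAt order):

```javascript
// In-memory stand-in for prisma.chatMessage, oldest first.
const table = [
  { role: "user", messageText: "q1" },
  { role: "assistant", messageText: "a1" },
];
const message = "q2"; // the incoming user message
const historyLimit = 20;

// Step 3 (fixed): capture history BEFORE the current message is saved.
const historyForPrompt = table
  .slice(0, historyLimit)
  .map((m) => ({ role: m.role, content: m.messageText }));

// Step 4 (fixed): save the current user message afterwards.
table.push({ role: "user", messageText: message });

// The history window never contains the current message, so buildPrompt
// can safely append it exactly once.
console.log(historyForPrompt.some((m) => m.content === message)); // false
console.log(historyForPrompt.length); // 2
```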

prompt-builder.server.js: keep the single append, remove the slice cap.

The explicit append of { role: "user", content: message } was only harmful in combination with the old save-before-load order. After the fix in rag.server.js, history contains only previous turns (never the current message), so appending the current user message once at the end of buildPrompt is correct and stays.

What must change is history.slice(-6): it was silently discarding everything but the last six history entries. The cap is removed here because the window is now sized upstream by CHAT_HISTORY_MESSAGES.

// BEFORE (prompt-builder.server.js):
const messages = [
  { role: "system", content: systemContent },
  ...history.slice(-6), // ← silently discards old context
  { role: "user", content: message },
];

// AFTER (prompt-builder.server.js):
// history is already sized to CHAT_HISTORY_MESSAGES in rag.server.js.
// We do not slice here, so the full history window reaches the model.
const messages = [
  { role: "system", content: systemContent },
  ...history,
  { role: "user", content: message },
];

.env.sample: add the following lines.

# Number of previous messages (user + assistant) sent to OpenRouter for context.
# Higher = better memory, higher token cost per reply. Default: 20 (= ~10 back-and-forth turns).
CHAT_HISTORY_MESSAGES=20
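One caveat worth handling in code: parseInt returns NaN for a malformed value (e.g. CHAT_HISTORY_MESSAGES=lots), and NaN would then be passed to Prisma's take. A defensive parse sketch; getHistoryLimit is a hypothetical helper name:

```javascript
// Falls back to the documented default of 20 on missing,
// non-numeric, zero, or negative values.
function getHistoryLimit(env = process.env) {
  const parsed = Number.parseInt(env.CHAT_HISTORY_MESSAGES ?? "20", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 20;
}

console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "40" }));   // 40
console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "oops" })); // 20
console.log(getHistoryLimit({ CHAT_HISTORY_MESSAGES: "-5" }));   // 20
console.log(getHistoryLimit({}));                                // 20
```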

Sending more history means more input tokens per call. Approximate cost comparison at gpt-4o-mini rates (~$0.15/M input tokens):

| History setting | Avg extra tokens/call | Extra cost/call |
| --- | --- | --- |
| Current (5 prev messages) | ~500 | ~$0.000075 |
| 20 messages (default) | ~2,000 | ~$0.0003 |
| 40 messages | ~4,000 | ~$0.0006 |

These are small amounts. The default of 20 messages provides good context without meaningful cost increase.
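The table values follow directly from the assumed per-token rate; a quick arithmetic check (rounded to 6 decimal places to sidestep floating-point noise):

```javascript
const RATE_PER_INPUT_TOKEN = 0.15 / 1_000_000; // ~$0.15 per million input tokens

const extraCostPerCall = (extraTokens) =>
  Number((extraTokens * RATE_PER_INPUT_TOKEN).toFixed(6));

console.log(extraCostPerCall(500));  // 0.000075
console.log(extraCostPerCall(2000)); // 0.0003
console.log(extraCostPerCall(4000)); // 0.0006
```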


| Scenario | How it's handled |
| --- | --- |
| First message in session (empty history) | historyForPrompt = []; buildPrompt sends just system + user message. No change needed. |
| Clarifying turn (order number ask) returns early | The clarifying assistant message is still saved via saveClarifyingReply. On the next user message, the history load picks it up correctly. No change needed. |
| Very long conversation (> CHAT_HISTORY_MESSAGES) | take: historyLimit caps the window. Recent context is prioritised by fetching newest-first (orderBy: createdAt desc + take) and reversing; see the ordering note below. |
| Token limit exceeded at model level | Model returns a context-length error. We already retry in chat.server.js. If this becomes frequent, reduce CHAT_HISTORY_MESSAGES. |

Note on ordering: findMany with orderBy: { createdAt: "asc" } and take: N returns the oldest N messages, not the newest N. If the session has 30 messages and CHAT_HISTORY_MESSAGES=20, we’d send messages 1–20 and miss the most recent 10. This must be fixed:

// Fetch the MOST RECENT historyLimit messages, then re-order chronologically.
const rawHistory = await prisma.chatMessage.findMany({
  where: { sessionId: session.id },
  orderBy: { createdAt: "desc" }, // newest first
  take: historyLimit,
  select: { role: true, messageText: true },
});
const historyForPrompt = rawHistory
  .reverse() // flip back to oldest-first for the prompt
  .map((m) => ({ role: m.role, content: m.messageText }));
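The desc-then-reverse behaviour can be checked against a plain array standing in for the table (the IDs are illustrative):

```javascript
// 30 stored messages, oldest first (ids 1..30).
const stored = Array.from({ length: 30 }, (_, i) => ({ id: i + 1 }));
const historyLimit = 20;

// orderBy: desc + take → the NEWEST historyLimit rows, newest first...
const newestFirst = [...stored].reverse().slice(0, historyLimit);
// ...then reverse() restores chronological order for the prompt.
const historyForPrompt = newestFirst.reverse();

console.log(historyForPrompt.length);                          // 20
console.log(historyForPrompt[0].id);                           // 11 (oldest kept)
console.log(historyForPrompt[historyForPrompt.length - 1].id); // 30 (most recent)
```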

After implementation:

  • Start a new session. Send 5+ messages back and forth on different topics. On message 6, refer to message 2 topic. AI should answer correctly using context.
  • Check debug.rag.systemPromptPreview (with ORDER_LOOKUP_DEBUG=true) — confirm you see multiple user/assistant turns in the messages array sent to OpenRouter.
  • Confirm no duplicate user message at the end of the messages array.
  • Session with 25 messages: confirm only the most recent 20 (not oldest 20) are sent.
  • Clarifying turns (order number): after the AI asks for order number and customer replies, AI should have context of the prior conversation.
  • Cold start (first message ever): no error, normal reply.
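The duplicate-tail check can be scripted so it runs against the debug output. A sketch; hasDuplicateTail is a hypothetical helper that inspects the final messages array:

```javascript
// Flags a prompt whose last two entries are the same user message,
// i.e. the Bug 1 signature.
function hasDuplicateTail(messages) {
  if (messages.length < 2) return false;
  const a = messages[messages.length - 2];
  const b = messages[messages.length - 1];
  return a.role === "user" && b.role === "user" && a.content === b.content;
}

const broken = [
  { role: "system", content: "s" },
  { role: "user", content: "hi" },
  { role: "user", content: "hi" },
];
const fixed = [
  { role: "system", content: "s" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "hi" },
];

console.log(hasDuplicateTail(broken)); // true
console.log(hasDuplicateTail(fixed));  // false
```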

| What | Where | Lines affected |
| --- | --- | --- |
| Swap steps 3 and 4 (load history before save) | rag.server.js | ~10 lines |
| Use orderBy: desc + reverse() for recency-first history | rag.server.js | ~5 lines |
| Replace take: 10 with take: historyLimit from env | rag.server.js | 1 line |
| Remove history.slice(-6) in buildPrompt | prompt-builder.server.js | 1 line |
| Add CHAT_HISTORY_MESSAGES env var | .env.sample | 2 lines |

Total: ~20 lines changed across 3 files. No migration, no DB schema change, no widget change.