Order Lookup Reliability Guide
This guide explains the current order-lookup reliability logic in plain language, including the retry + token-refresh behavior and the main edge cases to watch.
1) End-to-end flow (quick view)
When a shopper asks for order status:
- api.chat resolves the shop and picks a token source (Session table first, then Shop table).
- generateChatReply in app/lib/rag.server.js decides whether this turn is:
  - order intent without number -> asks for order number
  - order lookup turn with parsed reference -> fetches live order
  - regular RAG chat
- If order lookup is needed, it calls fetchOrderByShopResilient in app/lib/order-lookup.server.js.
- fetchOrderByShopResilient:
  - retries auth failures with backoff
  - re-reads latest token candidates each attempt
  - can trigger refresh-token exchange on 401/403
- On success: order context is formatted and included in assistant reply.
- On failure: user gets safe fallback message (auth vs generic API issue vs not found).
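The routing decision in this flow can be sketched roughly as follows. This is a minimal illustration, not the actual generateChatReply code: classifyTurn, parseOrderRef, and the ORDER_INTENT regex are hypothetical names and heuristics invented for the example. Only the awaitingOrderNumber flag comes from this guide.

```javascript
// Hypothetical helper: extracts an order reference such as "#1001".
// A bare number ("1001") is accepted only when explicitly allowed.
function parseOrderRef(text, { allowBareNumber = false } = {}) {
  const hashed = text.match(/#(\d{3,})\b/);
  if (hashed) return `#${hashed[1]}`;
  if (allowBareNumber) {
    const bare = text.trim().match(/^(\d{3,})$/);
    if (bare) return `#${bare[1]}`;
  }
  return null;
}

// Hypothetical intent heuristic; the real detection logic may differ.
const ORDER_INTENT = /\b(order|package|shipment|tracking|delivery)\b/i;

// Decides which of the three branches a turn belongs to.
function classifyTurn(message, metadata = {}) {
  const awaiting = Boolean(metadata.awaitingOrderNumber);
  const orderRef = parseOrderRef(message, { allowBareNumber: awaiting });
  if (orderRef) return { kind: "order_lookup", orderRef };
  if (ORDER_INTENT.test(message)) return { kind: "ask_order_number" };
  return { kind: "rag" };
}
```

In this sketch a bare number only counts as an order reference while the session is waiting for one, which keeps ordinary messages containing digits on the regular RAG path.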
2) Conversation-state logic
State is tracked in ChatSession.metadata:
- awaitingOrderNumber
- orderAskRetries
Behavior:
- If shopper says “where is my order” (no order number), assistant asks for order number.
- If shopper replies with #1001 or 1001, it is accepted as a direct follow-up order ref.
- If order API auth fails, session remains in order-follow-up context so the next #1001 is still handled as an order lookup turn.
- On terminal outcomes (success or not found), order-follow-up state is cleared.
This prevents the flow from drifting into generic RAG after a transient auth issue.
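The state transitions described above could look something like the sketch below. The function name nextOrderState and the outcome labels ("asked", "success", "not_found", "auth_error") are hypothetical. It also assumes orderAskRetries counts how many times the assistant has asked for the number, which this guide does not state explicitly.

```javascript
// Returns the next ChatSession.metadata after an order-related turn.
// Outcome labels are assumptions made for this sketch.
function nextOrderState(metadata, outcome) {
  const retries = metadata.orderAskRetries ?? 0;
  switch (outcome) {
    case "asked":
      // Assistant asked for the order number; wait for the follow-up.
      return { ...metadata, awaitingOrderNumber: true, orderAskRetries: retries + 1 };
    case "success":
    case "not_found":
      // Terminal outcomes clear the follow-up state.
      return { ...metadata, awaitingOrderNumber: false, orderAskRetries: 0 };
    case "auth_error":
      // Stay in follow-up context so the next "#1001" is still a lookup turn.
      return { ...metadata, awaitingOrderNumber: true };
    default:
      // Other outcomes are left unchanged here; the real code may differ.
      return metadata;
  }
}
```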
3) Resilient lookup behavior
fetchOrderByShopResilient currently does:
- Backoff attempts: 0ms, 300ms, 900ms, 1800ms
- Candidate token order per attempt:
  - current request token
  - request fallback token
  - latest offline Session token
  - latest Shop token
- For each candidate, it calls fetchOrderByShop.
- If all candidates in an attempt fail with auth error, it can call the refreshAccessToken callback and try the refreshed token.
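As a rough sketch of the loop described above (not the actual body of fetchOrderByShopResilient), the retry logic might be structured like this. The function name lookupWithRetries, the injected getTokenCandidates and fetchOrder callbacks, and the { found, status } result shape are all assumptions made for the example.

```javascript
const BACKOFF_MS = [0, 300, 900, 1800];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const isAuthError = (result) => result.status === 401 || result.status === 403;

// Dependencies are injected so the loop can be exercised without Shopify:
//   getTokenCandidates(): async, returns tokens in priority order
//   fetchOrder(token): async, returns { found, status } (assumed shape)
//   refreshAccessToken(): optional async, returns a new token or null
async function lookupWithRetries({
  getTokenCandidates,
  fetchOrder,
  refreshAccessToken,
  backoff = BACKOFF_MS,
  wait = sleep,
}) {
  let last = { found: false, status: null };
  for (const delay of backoff) {
    if (delay > 0) await wait(delay);
    // Re-read candidates on every attempt so a token refreshed elsewhere is picked up.
    const tokens = await getTokenCandidates();
    const candidates = [...new Set(tokens.filter(Boolean))];
    for (const token of candidates) {
      last = await fetchOrder(token);
      // Anything other than an auth failure is returned as-is.
      if (!isAuthError(last)) return last;
    }
    // Every candidate failed with an auth error: try a refreshed token if possible.
    if (refreshAccessToken) {
      const fresh = await refreshAccessToken();
      if (fresh) {
        last = await fetchOrder(fresh);
        if (!isAuthError(last)) return last;
      }
    }
  }
  return last;
}
```

Only auth failures are retried in this sketch. A not-found result or a non-auth API error is returned immediately so the caller can choose the matching fallback message.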
4) Refresh-on-401 behavior
Implemented in:
- rag.server.js (chat path)
- api.order-lookup-test.jsx (API test path)
- scripts/test-order-lookup.mjs (CLI script path)
Refresh logic:
- Read latest offline Session row.
- Use refreshToken (if present and not expired).
- Call Shopify token endpoint:
  - POST https://{shopDomain}/admin/oauth/access_token
  - grant_type: refresh_token
- Persist new access token to:
  - Session row (accessToken, optional rotated refresh fields)
  - Shop row (accessToken)
- Retry order lookup immediately.
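The exchange step might be sketched as below. Treat the details as assumptions rather than the real implementation: exchangeRefreshToken and the refreshTokenExpiresAt field are hypothetical names, the JSON request encoding is a guess, and the client_id, client_secret, access_token, and refresh_token field names follow standard OAuth 2.0 conventions. All of these should be checked against Shopify's documentation. Persisting the result to the Session and Shop rows is left to the caller.

```javascript
// Sketch only. Request and response field names follow common OAuth 2.0
// refresh conventions and are assumptions, not confirmed Shopify behavior.
async function exchangeRefreshToken({
  shopDomain,
  clientId,
  clientSecret,
  session,
  fetchImpl = fetch,
  now = Date.now(),
}) {
  if (!session?.refreshToken) return { ok: false, reason: "missing_refresh_token" };
  // refreshTokenExpiresAt is a hypothetical column name.
  const expiresAt = session.refreshTokenExpiresAt;
  if (expiresAt && new Date(expiresAt).getTime() <= now) {
    return { ok: false, reason: "expired_refresh_token" };
  }
  const response = await fetchImpl(`https://${shopDomain}/admin/oauth/access_token`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify({
      client_id: clientId,
      client_secret: clientSecret,
      grant_type: "refresh_token",
      refresh_token: session.refreshToken,
    }),
  });
  if (!response.ok) return { ok: false, reason: `http_${response.status}` };
  const data = await response.json();
  if (!data.access_token) return { ok: false, reason: "no_access_token" };
  return {
    ok: true,
    accessToken: data.access_token,
    // Keep the old refresh token unless the response rotates it.
    refreshToken: data.refresh_token ?? session.refreshToken,
  };
}
```

Returning a reason string instead of throwing lets the caller tell a missing or expired refresh token (reauth needed) apart from a failed HTTP call.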
5) Remaining edge cases + concrete fixes
These are the main scenarios where it can still fail:
- Missing/expired refreshToken in offline Session
  Why it fails: auto-refresh cannot run without a valid refresh token.
  Fix:
  - Re-auth the app once from Shopify Admin to regenerate offline Session credentials.
  - Add a daily health check job: flag shops where offline Session has missing/expired refreshToken.
  - In-app admin banner: if order lookup hits this condition, prompt merchant to “Reconnect Shopify”.
  - Keep fallback behavior user-safe: customer sees temporary order lookup error, not raw auth details.
- Revoked token / reauth required by merchant
  Why it fails: Shopify rejects both access token and refresh token after revocation/scope change/uninstall-reinstall paths.
  Fix:
  - Detect repeated 401 after refresh attempt and mark shop auth status as degraded (internal flag or log signal).
  - Show merchant-facing reconnect CTA in app admin.
  - Block repeated blind retries after hard-auth failure (avoid useless load and noisy logs).
  - After successful reauth, immediately test one known order via script or test endpoint.
- App uninstalled or scopes changed
  Token invalid even after retries; lookup fails until reauth.
- Older orders outside default read_orders window
  Why it fails: read_orders limits visibility to recent orders (the last 60 days by default); older orders may appear as not found even when they exist.
  Fix:
  - Confirm order age in Shopify Admin when lookup returns not found with no auth error.
  - If business use-case truly needs historical orders, request read_all_orders with clear justification for App Review.
  - Add user-facing fallback copy: “I can only access recent orders right now. Please contact support for older orders.”
  - Keep current query strategy (name:#1001, name:1001, #1001) to maximize match quality for accessible orders.
- Shopify-side transient API issues / throttling
  Why it fails: temporary 5xx, 429, or network instability can outlive local retries.
  Fix:
  - Add jitter to retry backoff (for example 300-500ms, 900-1300ms) to reduce synchronized spikes.
  - Treat 429 separately: read throttle hints/headers where available and delay accordingly.
  - Log per-attempt status code and elapsed time when ORDER_LOOKUP_DEBUG=true.
  - Keep graceful shopper fallback and avoid exposing “rate limit” internals in widget replies.
- Concurrent refresh races (rare under load)
  Why it fails: two requests may refresh simultaneously and temporarily use mixed token states.
  Fix:
  - Add a short-lived per-shop refresh lock (DB/advisory lock or in-memory mutex in single-node env).
  - If lock exists, second request waits briefly and re-reads Session token before retrying.
  - Use idempotent update policy: always persist latest refreshed token to both Session and Shop.
  - Keep retry loop in place even with lock, because cross-instance races can still occur.
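For the transient API issues / throttling scenario in this list, the jitter and 429 handling could be sketched as follows. The helper names, the 1800-2400ms range (an extension of the pattern in the example ranges), and the 5-second cap are assumptions for illustration. The sketch expects a fetch-style Headers object and reads the standard Retry-After header as a number of seconds.

```javascript
// Ranges for attempts 0..3. The first two non-zero ranges come from the
// example in this guide; the last range and the cap are assumed values.
const JITTER_RANGES_MS = [[0, 0], [300, 500], [900, 1300], [1800, 2400]];
const MAX_RETRY_AFTER_MS = 5000;

// Picks a random delay inside the range for the given attempt index.
function jitteredDelay(attempt, random = Math.random) {
  const index = Math.min(attempt, JITTER_RANGES_MS.length - 1);
  const [min, max] = JITTER_RANGES_MS[index];
  return Math.round(min + random() * (max - min));
}

// For 429 responses, prefer the server's Retry-After hint (in seconds) when
// it is present and numeric; otherwise fall back to jittered backoff.
function delayForResponse(status, headers, attempt, random = Math.random) {
  if (status === 429) {
    const retryAfter = Number.parseFloat(headers.get("retry-after") ?? "");
    if (Number.isFinite(retryAfter) && retryAfter >= 0) {
      return Math.min(retryAfter * 1000, MAX_RETRY_AFTER_MS);
    }
  }
  return jitteredDelay(attempt, random);
}
```

Passing the random source as a parameter keeps the delay calculation deterministic in tests.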
6) Troubleshooting checklist
Use this order:
- Run script:
  node scripts/test-order-lookup.mjs --shop=<shop> --orderRef=#1001
- If error is 401:
  - confirm Session has valid refreshToken
  - confirm app API key/secret env vars exist
- If found: false with no auth error:
  - verify order ref format and existence in Shopify Admin
  - verify scope window (read_orders vs old order age)
- If intermittent:
  - retry once and inspect server logs for refresh success
  - consider adding temporary debug logs around refresh attempts
7) Recommended hardening backlog
If you want this to be production-strong at scale, implement in this order:
- Per-shop refresh lock around refresh-token exchange.
- Auth degradation signal (internal status/event) after repeated 401 + failed refresh.
- Ops telemetry: attempt count, token source, refresh success/failure reason.
- Merchant reconnect UX in admin for degraded auth state.
- Optional scope expansion to read_all_orders only if product requirements require old-order access.
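The first backlog item, a per-shop refresh lock, might look like the sketch below in a single-node environment. The names refreshOnce and inflightRefreshes are hypothetical. Note that this is a single-flight variant: instead of waiting and then re-reading the Session token, concurrent callers share the result of the one in-flight refresh. It does not protect against cross-instance races, which would need a shared lock such as a database advisory lock.

```javascript
// In-memory and single-process only.
const inflightRefreshes = new Map();

// Runs `refresh` at most once per shop at a time. Concurrent callers for the
// same shop receive the same in-flight promise and therefore the same result.
function refreshOnce(shop, refresh) {
  const existing = inflightRefreshes.get(shop);
  if (existing) return existing;
  const pending = Promise.resolve()
    .then(refresh)
    // Release the lock whether the refresh succeeded or failed.
    .finally(() => inflightRefreshes.delete(shop));
  inflightRefreshes.set(shop, pending);
  return pending;
}
```

Because the entry is removed once the promise settles, a later 401 for the same shop starts a fresh exchange rather than reusing a stale result.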
8) Files to know
- Core fetch + resilience: app/lib/order-lookup.server.js
- Conversation orchestration: app/lib/rag.server.js
- API test endpoint: app/routes/api.order-lookup-test.jsx
- CLI test script: scripts/test-order-lookup.mjs