RAG Pipeline
The codebase contains two distinct RAG implementations that coexist:
- In-memory RAG: local embedding + cosine similarity over per-account documents in S3. Used by helper/ragProcess.js, helper/getRagData.js, and helper/getRagDataTopics.js.
- Cloud RAG (Vertex AI): Google Vertex AI's managed RAG with server-side corpora. Used by helper/functionsForAi/cloudRag.js.
Both surface to higher-level callers (jobs and route handlers) through getAiParametersFunction(), which decides per-account/per-feature whether to use a corpus, an in-memory retrieval, or no retrieval at all.
At a Glance
| Implementation | File | Provider | Vector store | Where used |
|---|---|---|---|---|
| In-memory (single doc) | helper/ragProcess.js | Bedrock Cohere embed | In-memory SimpleVectorStore | Targeted document Q&A |
| In-memory (multi-doc) | helper/getRagData.js | Bedrock Cohere embed + Claude 3 Sonnet (optional) | In-memory cosine | RAG over a user's full document folder |
| In-memory (topics) | helper/getRagDataTopics.js | Same as getRagData.js | In-memory cosine | Topic-focused variant |
| Cloud RAG | helper/functionsForAi/cloudRag.js | Vertex AI (Gemini 2.5 Flash + Vertex RAG store) | Vertex managed corpus | Production content generation |
| Parameter loader | helper/functionsForAi/getAiParameters.js | n/a | n/a | Selects model + prompt for any job |
In-Memory RAG
ragProcess.js: single-doc / single-account flow
Export: ragFilesOperation(accountId, content)
Flow:
- Reads documents from s3://${bucket}/${conf.S3_Path_RAG}account_id_${accountId}/.
- Splits text into chunks (default ~400 chars with ~10-char overlap).
- Embeds each chunk via Bedrock Cohere cohere.embed-multilingual-v3.
- Stores vectors in a SimpleVectorStore in memory.
- Embeds the query, scores all chunks by cosine similarity, returns the top 12 matches.
The SimpleVectorStore is a class private to this file — it's not a shared singleton; each call rebuilds the store from scratch. There is no caching across calls.
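The chunk, score, and top-k steps in this flow can be sketched in a few lines. This is a minimal illustration, not the code in ragProcess.js: chunkText, cosineSimilarity, and topMatches are hypothetical names, and the vectors are assumed to be already computed (in the real flow they come from Bedrock Cohere).

```javascript
// Hypothetical helpers illustrating the chunk -> score -> top-k steps.
// Defaults mirror the numbers quoted in the flow (~400 chars, ~10-char overlap, top 12).

// Split text into fixed-size character windows that overlap slightly.
function chunkText(text, size = 400, overlap = 10) {
  if (overlap >= size) throw new RangeError('overlap must be smaller than size');
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored entry against the query vector and keep the best k.
// entries: [{ text, vector }]
function topMatches(entries, queryVector, k = 12) {
  return entries
    .map((e) => ({ text: e.text, score: cosineSimilarity(e.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because every call rebuilds the store, the cost of this scan is dominated by the embedding requests rather than by the similarity loop itself.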
getRagData.js: multi-format folder RAG
Export: performRAGWorkflow(accountId, query)
Differences from ragProcess.js:
- Lists all files in the account's S3 prefix, not just one document.
- Parses each file based on extension:
  - .pdf → pdf-parse
  - .docx → mammoth
  - .xlsx → xlsx
  - .csv → csv-parse
  - .txt / fallback → raw read
  - Image files → skipped
- Larger chunk size (~2000 chars).
- Returns top 3 chunks with source filename metadata.
- Optionally calls Claude 3 Sonnet (Bedrock) to synthesize an answer over the retrieved chunks.
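The per-extension parsing rule can be pictured as a small dispatch table. The sketch below only selects a parser label and does not call the libraries; pickParser is a hypothetical name, and the list of image extensions is an assumption, since the source only says image files are skipped.

```javascript
// Hypothetical dispatch table: file extension -> parser label.
// Labels are the npm packages named in the list; 'raw' means a plain read.
const PARSER_BY_EXTENSION = new Map([
  ['.pdf', 'pdf-parse'],
  ['.docx', 'mammoth'],
  ['.xlsx', 'xlsx'],
  ['.csv', 'csv-parse'],
  ['.txt', 'raw'],
]);

// Assumed set of image extensions; the real skip list may differ.
const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp']);

// Returns the parser label for an S3 key, or null when the file is skipped.
function pickParser(key) {
  const dot = key.lastIndexOf('.');
  const slash = key.lastIndexOf('/');
  // Only treat the dot as an extension separator if it is in the file name.
  const ext = dot > slash ? key.slice(dot).toLowerCase() : '';
  if (IMAGE_EXTENSIONS.has(ext)) return null;
  return PARSER_BY_EXTENSION.get(ext) ?? 'raw';
}
```

Keeping the mapping in one table makes the fallback explicit: anything unrecognized is read as raw text rather than rejected.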
getRagDataTopics.js: topic variant
Export: getRAGdataTopics(accountId, query)
Architecturally identical to getRagData.js. The prompt and post-processing are tuned for topic-discovery queries (e.g., "what themes show up across this account's brand documents") rather than direct Q&A.
When to use which
| Need | Use |
|---|---|
| Quick lookup against a known document | ragFilesOperation |
| Retrieve over a user's full document library | performRAGWorkflow |
| Discover topics across a corpus | getRAGdataTopics |
| Production-scale RAG with managed indexing | cloudRagResult (below) |
Limitations
- No persistence — every call re-downloads, re-parses, re-embeds. For accounts with many large files, this is slow and costly. Consider caching the vector store per account if you reuse it across requests.
- No incremental updates — adding a file to S3 doesn't invalidate anything because nothing is cached.
- Chunking is naive — fixed character windows, no sentence/paragraph boundary awareness.
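- Bedrock Cohere has a 512-token input cap, so chunks of ~2000 characters sit close to the limit and can exceed it for token-dense text. Depending on the truncate setting sent with the embed request, over-long chunks are either truncated or rejected.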
Cloud RAG (Vertex AI)
File: helper/functionsForAi/cloudRag.js
Export: cloudRagResult({ ragCorpus, finalPrompt, model })
This is the production RAG path. It uses Google Vertex AI's managed vertexRagStore retrieval tool, where the corpus is pre-indexed server-side and the LLM is given retrieval-augmented context automatically by the Vertex API.
Inputs
| Param | Description |
|---|---|
| ragCorpus | Vertex corpus ID; see ⚠ note below about where this actually comes from |
| finalPrompt | Pre-templated prompt string (placeholders already substituted) |
| model | Optional model override; defaults to gemini-2.5-flash. Falls back to Claude 3 Sonnet on certain failures |
Credential handling
GCP service-account credentials are fetched from AWS Secrets Manager (secret name in GCS_SECRET_NAME env var), not from a local JSON file. The Vertex AI client is cached for 3600 seconds to avoid refetching credentials on every call.
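The 3600-second client cache can be approximated with a small time-to-live wrapper. This is a sketch under assumptions, not the code in cloudRag.js: makeCachedClientGetter is a hypothetical name, and createClient stands in for "fetch the secret from Secrets Manager and build the Vertex client".

```javascript
// Minimal TTL cache sketch. createClient is a hypothetical async callback;
// now is injectable so the expiry logic can be exercised without waiting.
function makeCachedClientGetter(createClient, ttlMs = 3600 * 1000, now = Date.now) {
  let cached = null;
  let expiresAt = 0;
  return async function getClient() {
    // Reuse the cached client while it is still inside its TTL window.
    if (cached !== null && now() < expiresAt) return cached;
    cached = await createClient();
    expiresAt = now() + ttlMs;
    return cached;
  };
}
```

A cache like this lives in process memory, so a rotated secret is only picked up after the TTL expires or the process restarts. Note that this simple version does not deduplicate concurrent first calls; two callers arriving at once may each build a client.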
Reliability
- 30-second timeout with an AbortController.
- On retrieval / generation failure, the function logs and returns a stringified fallback so callers don't crash.
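The timeout-plus-fallback behaviour can be sketched as a generic wrapper. The names here are hypothetical: callModel is assumed to accept an AbortSignal, and the JSON shape of the fallback string is an illustration only, since the source does not show what cloudRagResult actually returns on failure.

```javascript
// Sketch of a timeout-plus-fallback wrapper (hypothetical, not cloudRag.js).
// callModel(signal) is expected to reject when the signal is aborted.
async function withTimeoutAndFallback(callModel, timeoutMs = 30000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await callModel(controller.signal);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error('RAG call failed:', message);
    // Return a string so callers that expect text keep working.
    return JSON.stringify({ error: true, message });
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after an early result
  }
}
```

One consequence of this design is that callers cannot distinguish success from failure by catching an exception; they have to inspect the returned string.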
Callers
Higher-level jobs invoke cloudRagResult directly:
- generate_company_industry.js
- generate_organization_profile.js
- generate_template_industry.js
- Various content-generation paths in routes/routes.js and the user-specific AI jobs
getAiParametersFunction
File: helper/functionsForAi/getAiParameters.js
Export: getAiParametersFunction({ languageId, ...otherParams })
This is the single source of truth for AI configuration in the app. Every job and route that calls a model goes through it first.
What it does
- Reads model configuration from tLanguageModels (filtered by languageId and feature ID).
- Optionally pulls examples from tTrainingData for few-shot prompts.
- Substitutes placeholders in the prompt template: ${companyWebsite}, ${orgProfile}, ${content}, ${impersonation}, etc.
- Returns:
{
aiModel, // e.g. "gemini-2.5-flash" or "anthropic.claude-3-sonnet-..."
maxTokens,
temperature,
topP,
topK,
impersonation, // "you are a marketing copywriter" persona
inputData, // raw user input
outputData, // last response (for chained calls)
replacedImpersonation,
replacedPrompt // fully substituted prompt ready to send
}
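The placeholder substitution step can be illustrated with a small helper. This is a sketch, not the implementation in getAiParameters.js: fillTemplate is a hypothetical name, and leaving unknown placeholders untouched is an assumed policy.

```javascript
// Hypothetical substitution helper. Replaces ${name} tokens with values from a
// plain object and leaves placeholders without a matching key untouched.
function fillTemplate(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

// Single-quoted strings keep ${...} literal, so this is a template, not a JS template literal.
const replacedPrompt = fillTemplate(
  'Write a post for ${companyWebsite} about ${content}.',
  { companyWebsite: 'example.com', content: 'spring launch' }
);
```

Leaving unmatched placeholders visible makes a missing value easy to spot in logs, at the cost of sending a literal token to the model if nobody checks.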
Why it matters for RAG
If the row in tLanguageModels has a corpus reference, the caller passes that corpus to cloudRagResult. If not, the caller may either skip retrieval entirely or fall back to the in-memory implementations.
This is also where you'd flip a feature from in-memory RAG to Vertex Cloud RAG — by editing the row in tLanguageModels rather than touching code.
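The routing decision described here can be written down as a small pure function. Everything named below is hypothetical: the ragCorpus field on the row and the allowInMemory flag are illustrative stand-ins, so check the real tLanguageModels schema and caller code before relying on them.

```javascript
// Sketch of the retrieval routing decision (hypothetical field and flag names).
// modelRow: the configuration row loaded for the current feature, or null.
function chooseRetrieval(modelRow, { allowInMemory = false } = {}) {
  if (modelRow && modelRow.ragCorpus) {
    // A corpus reference means the caller should go through cloudRagResult.
    return { mode: 'cloud', ragCorpus: modelRow.ragCorpus };
  }
  // Without a corpus the caller either uses in-memory retrieval or skips it.
  return { mode: allowInMemory ? 'in-memory' : 'none' };
}
```

Keeping the decision in one function means a configuration change alters the mode without touching the call sites.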
Data Sources
S3 layout
In-memory RAG documents live under:
s3://${bucket}/${conf.S3_Path_RAG}account_id_${accountId}/
Files are uploaded by the user via the document-upload UI. No size or count limit is enforced at the API layer.
Vertex corpora
Vertex corpora are managed externally (via the GCP console, Vertex AI's import API, or an out-of-band ingestion pipeline).
⚠ The corpus ID is hardcoded in caller files, not stored in the database.
Earlier versions of this doc claimed the Vertex corpus identifier was persisted on tAccounts.corpusName. That column does not exist in the schema (verified in data-model.md §1).
The actual corpus IDs are literal strings hardcoded in caller files like generate_template_industry.js and generate_company_industry.js.
This is the same anti-pattern as the Polotno API key: every caller embeds the literal, so rotation requires touching every file. Recommended fix: move corpus IDs to environment variables (e.g., VERTEX_RAG_CORPUS, VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY, etc.) and reference them via conf.js.
If the platform needs per-account corpora (different corpus per tenant), this should become a real DB column on tAccounts (or live in tCloudKnowledgeBase, which exists but its role hasn't been fully verified).
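The recommended move to environment variables could look roughly like the sketch below. The variable names are the ones suggested in the note; loadCorpusConfig and the returned key names are hypothetical, and how conf.js is actually structured is not shown in this document.

```javascript
// Sketch of environment-backed corpus configuration (hypothetical helper).
// env is injectable so the loader can be checked without touching process.env.
function loadCorpusConfig(env = process.env) {
  const required = (name) => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable ${name}`);
    return value;
  };
  const defaultCorpus = required('VERTEX_RAG_CORPUS');
  return {
    defaultCorpus,
    // Optional per-job override; falls back to the default corpus when unset.
    templateIndustryCorpus: env.VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY || defaultCorpus,
  };
}
```

Failing fast on a missing variable surfaces a misconfigured deployment at startup instead of at the first Vertex call.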
Per-category enablement
If a category has rag_active = 0, the content-generation pipeline skips RAG retrieval entirely for that category.
Call Chains (Examples)
Production content generation (Vertex)
HTTP request / job tick
→ getAiParametersFunction({languageId, ...})
→ cloudRagResult({ragCorpus, finalPrompt, model})
→ Vertex AI (RAG store retrieval + LLM completion)
→ response written to tUserLibrary
Document Q&A (in-memory, multi-format)
HTTP request
→ performRAGWorkflow(accountId, query)
→ list S3 → parse → chunk → Bedrock Cohere embed → cosine match
→ top-3 chunks (optionally synthesized via Claude 3 Sonnet)
→ response to client
Operational Notes
- Vertex client cache is per-process. If you rotate the GCS secret, restart all PM2 jobs that use Vertex.
- No retries on Vertex failures. A single 30s timeout is the entire failure budget. Add idempotent retries upstream if reliability matters.
- In-memory RAG re-runs per call. Don't use it inside a tight loop — embedding cost scales linearly with input size.
- Bedrock Cohere quota is per-region. If you hit limits, check the Bedrock service quotas in the same region as S3_Region (usually us-east-1).
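For the upstream retries suggested in these notes, a bounded retry with exponential backoff is one option. The wrapper below is a hypothetical sketch: retryWithBackoff is not part of the codebase, and because cloudRagResult returns a fallback string instead of throwing, the wrapped operation would have to detect that fallback and throw for a retry to trigger.

```javascript
// Hypothetical retry wrapper for a flaky async operation.
// Only safe when the wrapped operation is idempotent.
async function retryWithBackoff(operation, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With a 30-second timeout per attempt, three attempts can hold a worker for well over a minute, so the attempt count is worth keeping small.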
Related
- Integration Inventory: Bedrock, Vertex AI, Gemini configuration
- Configuration Reference: S3_Path_RAG, GCS_SECRET_NAME, GEMINI_API_KEY
- Data Model: tLanguageModels, tTrainingData, tDefaultCategories.rag_active (note: tAccounts.corpusName was previously cited but does not exist; see § Vertex corpora above)
- Content Pipeline: where retrieval fits into the end-to-end content generation flow
- User-Specific AI Jobs: async workers that call getAiParametersFunction