
RAG Pipeline

The codebase contains two distinct RAG implementations that coexist:

  1. In-memory RAG — local embedding + cosine similarity over per-account documents in S3. Used by helper/ragProcess.js, getRagData.js, getRagDataTopics.js.
  2. Cloud RAG (Vertex AI) — Google Vertex AI's managed RAG with server-side corpora. Used by helper/functionsForAi/cloudRag.js.

Both surface to higher-level callers (jobs and route handlers) through getAiParametersFunction(), which decides per-account/per-feature whether to use a corpus, an in-memory retrieval, or no retrieval at all.


At a Glance

Implementation | File | Provider | Vector store | Where used
In-memory (single doc) | helper/ragProcess.js | Bedrock Cohere embed | In-memory SimpleVectorStore | Targeted document Q&A
In-memory (multi-doc) | helper/getRagData.js | Bedrock Cohere embed + Claude 3 Sonnet (optional) | In-memory cosine | RAG over a user's full document folder
In-memory (topics) | helper/getRagDataTopics.js | Same as getRagData.js | In-memory cosine | Topic-focused variant
Cloud RAG | helper/functionsForAi/cloudRag.js | Vertex AI (Gemini 2.5 Flash + Vertex RAG store) | Vertex managed corpus | Production content generation
Parameter loader | helper/functionsForAi/getAiParameters.js | - | - | Selects model + prompt for any job

In-Memory RAG

ragProcess.js — single-doc / single-account flow

Export: ragFilesOperation(accountId, content)

Flow:

  1. Reads documents from s3://${bucket}/${conf.S3_Path_RAG}account_id_${accountId}/.
  2. Splits text into chunks (default ~400 chars with ~10-char overlap).
  3. Embeds each chunk via Bedrock Cohere cohere.embed-multilingual-v3.
  4. Stores vectors in a SimpleVectorStore in memory.
  5. Embeds the query, scores all chunks by cosine similarity, returns the top 12 matches.

The SimpleVectorStore is a class private to this file — it's not a shared singleton; each call rebuilds the store from scratch. There is no caching across calls.
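The chunk, store, and score steps above can be sketched roughly as follows. This is a minimal illustration, not the code in ragProcess.js: the names chunkText, cosineSimilarity, and VectorStoreSketch are hypothetical, and the Bedrock embedding call is left out, so vectors are supplied by the caller.

```javascript
// Hypothetical sketch: fixed-size character windows with a small overlap.
function chunkText(text, size = 400, overlap = 10) {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Cosine similarity of two equal-length numeric vectors (0 for zero vectors).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stand-in for the per-call store: rebuilt on every request.
class VectorStoreSketch {
  constructor() {
    this.entries = [];
  }

  add(text, vector) {
    this.entries.push({ text, vector });
  }

  // Scores every entry against the query and returns the k best matches.
  search(queryVector, k = 12) {
    return this.entries
      .map((e) => ({ text: e.text, score: cosineSimilarity(queryVector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}
```

Because every entry is scored on each search, lookup cost grows linearly with the number of chunks, which is acceptable for a single document but not for large folders.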

getRagData.js — multi-format folder RAG

Export: performRAGWorkflow(accountId, query)

Differences from ragProcess.js:

  • Lists all files in the account's S3 prefix, not just one document.
  • Parses each file based on extension:
      • .pdf → pdf-parse
      • .docx → mammoth
      • .xlsx → xlsx
      • .csv → csv-parse
      • .txt / fallback → raw read
      • Image files → skipped
  • Larger chunk size (~2000 chars).
  • Returns top 3 chunks with source filename metadata.
  • Optionally calls Claude 3 Sonnet (Bedrock) to synthesize an answer over the retrieved chunks.
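The extension-based dispatch might look something like the sketch below. It only selects a parser label and does not call the libraries themselves; selectParser, extensionOf, and the list of image extensions are assumptions made for illustration, not names taken from getRagData.js.

```javascript
// Extension -> parser label, mirroring the libraries named above.
const PARSERS = new Map([
  ['.pdf', 'pdf-parse'],
  ['.docx', 'mammoth'],
  ['.xlsx', 'xlsx'],
  ['.csv', 'csv-parse'],
]);

// Assumed list of image extensions; the real skip list may differ.
const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp']);

// Returns the lowercase extension of an S3 key, or '' if there is none.
function extensionOf(key) {
  const name = key.slice(key.lastIndexOf('/') + 1);
  const dot = name.lastIndexOf('.');
  return dot > 0 ? name.slice(dot).toLowerCase() : '';
}

// Returns a parser label, 'raw' for the fallback read, or null to skip the file.
function selectParser(key) {
  const ext = extensionOf(key);
  if (IMAGE_EXTENSIONS.has(ext)) return null;
  return PARSERS.get(ext) || 'raw';
}
```

Keeping the mapping in one table means adding a new format touches a single entry rather than a chain of conditionals.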

getRagDataTopics.js — topic variant

Export: getRAGdataTopics(accountId, query)

Architecturally identical to getRagData.js. The prompt and post-processing are tuned for topic-discovery queries (e.g., "what themes show up across this account's brand documents") rather than direct Q&A.

When to use which

Need | Use
Quick lookup against a known document | ragFilesOperation
Retrieve over a user's full document library | performRAGWorkflow
Discover topics across a corpus | getRAGdataTopics
Production-scale RAG with managed indexing | cloudRagResult (below)

Limitations

  • No persistence — every call re-downloads, re-parses, re-embeds. For accounts with many large files, this is slow and costly. Consider caching the vector store per account if you reuse it across requests.
  • No incremental updates — adding a file to S3 doesn't invalidate anything because nothing is cached.
  • Chunking is naive — fixed character windows, no sentence/paragraph boundary awareness.
  • Bedrock Cohere has a 512-token input cap — chunks larger than ~2000 characters may silently truncate.
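If per-account caching is introduced, one possible shape is a small in-process cache with a time-to-live. The sketch below is hypothetical: AccountStoreCache, its default TTL, and the buildStore callback are illustrative names and values, and nothing like this exists in the codebase today.

```javascript
// Hypothetical per-account cache for built vector stores.
class AccountStoreCache {
  constructor(ttlMs = 10 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for tests
    this.entries = new Map();
  }

  // Returns the cached store for the account, or builds it once via buildStore().
  // The promise is cached so concurrent requests share a single build.
  async getOrBuild(accountId, buildStore) {
    const hit = this.entries.get(accountId);
    if (hit && hit.expiresAt > this.now()) return hit.promise;

    const promise = Promise.resolve().then(buildStore);
    this.entries.set(accountId, { promise, expiresAt: this.now() + this.ttlMs });

    // Do not keep failed builds around; the next call should retry.
    promise.catch(() => {
      const current = this.entries.get(accountId);
      if (current && current.promise === promise) this.entries.delete(accountId);
    });
    return promise;
  }

  // Call this after an upload or delete so stale vectors are not served.
  invalidate(accountId) {
    this.entries.delete(accountId);
  }
}
```

A cache like this reintroduces the invalidation problem that the current design avoids, so the upload path would need to call invalidate for the affected account.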

Cloud RAG (Vertex AI)

File: helper/functionsForAi/cloudRag.js

Export: cloudRagResult({ ragCorpus, finalPrompt, model })

This is the production RAG path. It uses Google Vertex AI's managed vertexRagStore retrieval tool, where the corpus is pre-indexed server-side and the LLM is given retrieval-augmented context automatically by the Vertex API.

Inputs

Param | Description
ragCorpus | Vertex corpus ID (see ⚠ note below about where this actually comes from)
finalPrompt | Pre-templated prompt string (placeholders already substituted)
model | Optional model override; defaults to gemini-2.5-flash. Falls back to Claude 3 Sonnet on certain failures

Credential handling

GCP service-account credentials are fetched from AWS Secrets Manager (secret name in GCS_SECRET_NAME env var), not from a local JSON file. The Vertex AI client is cached for 3600 seconds to avoid refetching credentials on every call.

Reliability

  • 30-second timeout with an AbortController.
  • On retrieval / generation failure, the function logs and returns a stringified fallback so callers don't crash.
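The timeout-plus-fallback behaviour can be approximated with the sketch below. It is not the code in cloudRag.js: callWithTimeout and the shape of the fallback object are assumptions, and the wrapped task stands in for the Vertex request.

```javascript
// Hypothetical wrapper: runs task(signal) with a deadline and never throws.
async function callWithTimeout(task, timeoutMs = 30000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await Promise.race([
      task(controller.signal),
      // Rejects as soon as the deadline aborts the signal.
      new Promise((_, reject) => {
        controller.signal.addEventListener(
          'abort',
          () => reject(new Error(`timed out after ${timeoutMs} ms`)),
          { once: true }
        );
      }),
    ]);
  } catch (err) {
    // Log and hand back a string so callers do not crash (assumed fallback shape).
    console.error('RAG call failed:', err.message);
    return JSON.stringify({ error: true, message: err.message });
  } finally {
    clearTimeout(timer);
  }
}
```

One consequence of returning a string on failure is that callers cannot tell success from failure by catching exceptions; they have to inspect the returned value.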

Callers

Higher-level jobs invoke cloudRagResult directly:

  • generate_company_industry.js
  • generate_organization_profile.js
  • generate_template_industry.js
  • Various content-generation paths in routes/routes.js and the user-specific AI jobs

getAiParametersFunction

File: helper/functionsForAi/getAiParameters.js

Export: getAiParametersFunction({ languageId, ...otherParams })

This is the single source of truth for AI configuration in the app. Every job and route that calls a model goes through it first.

What it does

  1. Reads model configuration from tLanguageModels (filtered by languageId and feature ID).
  2. Optionally pulls examples from tTrainingData for few-shot prompts.
  3. Substitutes placeholders in the prompt template: ${companyWebsite}, ${orgProfile}, ${content}, ${impersonation}, etc.
  4. Returns:
{
  aiModel,              // e.g. "gemini-2.5-flash" or "anthropic.claude-3-sonnet-..."
  maxTokens,
  temperature,
  topP,
  topK,
  impersonation,        // "you are a marketing copywriter" persona
  inputData,            // raw user input
  outputData,           // last response (for chained calls)
  replacedImpersonation,
  replacedPrompt        // fully substituted prompt ready to send
}
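The placeholder substitution step could be sketched as below. substitutePlaceholders is a hypothetical name, and leaving unknown placeholders untouched is an assumption; the real function may blank them or throw instead.

```javascript
// Replaces ${name} tokens in a stored template with values from a plain object.
// Unknown placeholders are left as-is so missing data stays visible (assumption).
function substitutePlaceholders(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

// Usage: the template is a plain string (for example loaded from the database),
// so single quotes are used here to avoid JavaScript's own template interpolation.
const prompt = substitutePlaceholders(
  'Write a post for ${companyWebsite} in the voice of ${impersonation}.',
  { companyWebsite: 'example.com', impersonation: 'a marketing copywriter' }
);
console.log(prompt);
```

Using a replacer function rather than a replacement string means values containing "$" are inserted literally.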

Why it matters for RAG

If the row in tLanguageModels has a corpus reference, the caller passes that corpus to cloudRagResult. If not, the caller may either skip retrieval entirely or fall back to the in-memory implementations.

This is also where you'd flip a feature from in-memory RAG to Vertex Cloud RAG — by editing the row in tLanguageModels rather than touching code.


Data Sources

S3 layout

In-memory RAG documents live under:

s3://${conf.S3_Bucket_Name}/${conf.S3_Path_RAG}account_id_<id>/

Files are uploaded by the user via the document-upload UI. No size or count limit is enforced at the API layer.

Vertex corpora

Vertex corpora are managed externally (via the GCP console, Vertex AI's import API, or an out-of-band ingestion pipeline).

The corpus ID is hardcoded in caller files, not stored in the database.

Earlier versions of this doc claimed the Vertex corpus identifier was persisted on tAccounts.corpusName. That column does not exist in the schema (verified in data-model.md §1).

The actual corpus IDs are literal strings hardcoded in caller files like generate_template_industry.js and generate_company_industry.js:

const ragCorpus = 'projects/1069774850833/locations/us-central1/ragCorpora/7631349568579305472';

This is the same anti-pattern as the Polotno API key — every caller embeds the literal, rotation requires touching every file. Recommended fix: move corpus IDs to environment variables (e.g., VERTEX_RAG_CORPUS, VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY, etc.) and reference them via conf.js.
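A possible shape for the recommended fix is sketched below. loadCorpusConfig and the returned property names are hypothetical; the environment variable names are the ones suggested above, and falling back to the default corpus when no feature-specific value is set is an assumption.

```javascript
// Vertex RAG corpus resource names follow this pattern (see the literal above).
const CORPUS_PATTERN = /^projects\/[^/]+\/locations\/[^/]+\/ragCorpora\/[^/]+$/;

// Hypothetical loader intended to live in conf.js; env is injectable for tests.
function loadCorpusConfig(env = process.env) {
  const read = (name, fallback) => {
    const value = env[name] || fallback;
    if (!value) throw new Error(`Missing required environment variable ${name}`);
    if (!CORPUS_PATTERN.test(value)) {
      throw new Error(`${name} is not a valid Vertex RAG corpus resource name`);
    }
    return value;
  };

  const defaultCorpus = read('VERTEX_RAG_CORPUS');
  return {
    defaultCorpus,
    // Feature-specific corpus, falling back to the default one (assumption).
    templateIndustryCorpus: read('VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY', defaultCorpus),
  };
}
```

Validating at load time makes a missing or malformed corpus ID fail at startup instead of on the first Vertex request.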

If the platform needs per-account corpora (different corpus per tenant), this should become a real DB column on tAccounts (or live in tCloudKnowledgeBase, which exists but its role hasn't been fully verified).

Per-category enablement

tDefaultCategories.rag_active   # boolean per content category

If a category has rag_active = 0, the content-generation pipeline skips RAG retrieval entirely for that category.
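Put together with the corpus check described earlier, the retrieval decision might reduce to something like the sketch below. chooseRetrieval and its parameter names are hypothetical, and preferring in-memory retrieval whenever the account has documents is an assumption; the doc only says callers may skip retrieval or fall back.

```javascript
// Hypothetical decision helper; returns 'none', 'cloud', or 'in-memory'.
function chooseRetrieval({ ragActive, ragCorpus, hasAccountDocuments }) {
  // rag_active = 0 on the category disables retrieval outright.
  if (!ragActive) return 'none';
  // A corpus reference routes the call to cloudRagResult.
  if (ragCorpus) return 'cloud';
  // Otherwise fall back to in-memory retrieval only if there is something to search.
  return hasAccountDocuments ? 'in-memory' : 'none';
}
```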


Call Chains (Examples)

Production content generation (Vertex)

HTTP request / job tick
  → getAiParametersFunction({languageId, ...})
  → cloudRagResult({ragCorpus, finalPrompt, model})
  → Vertex AI (RAG store retrieval + LLM completion)
  → response written to tUserLibrary

Document Q&A (in-memory, multi-format)

HTTP request
  → performRAGWorkflow(accountId, query)
  → list S3 → parse → chunk → Bedrock Cohere embed → cosine match
  → top-3 chunks (optionally synthesized via Claude 3 Sonnet)
  → response to client

Operational Notes

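  • Vertex client cache is per-process. If you rotate the GCS secret, restart all PM2 jobs that use Vertex so they pick up the new credentials immediately; otherwise each process keeps its cached client for up to 3600 seconds.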
  • No retries on Vertex failures. A single 30s timeout is the entire failure budget. Add idempotent retries upstream if reliability matters.
  • In-memory RAG re-runs per call. Don't use it inside a tight loop — embedding cost scales linearly with input size.
  • Bedrock Cohere quota is per-region. If you hit limits, check the Bedrock service quotas in the same region as S3_Region (usually us-east-1).
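An upstream retry wrapper could look like the sketch below. retryWithBackoff, its defaults, and the injectable sleep are all hypothetical. It assumes the wrapped operation throws on failure, so a caller wrapping cloudRagResult would first need to turn its stringified fallback into an exception.

```javascript
// Hypothetical retry helper with exponential backoff between attempts.
async function retryWithBackoff(operation, options = {}) {
  const {
    attempts = 3,
    baseDelayMs = 500,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Wait baseDelayMs, then 2x, 4x, ... but not after the final attempt.
      if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Retries only make sense if the generation call is safe to repeat; with three attempts at a 30-second timeout each, the worst-case latency exceeds 90 seconds.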

  • Integration Inventory: Bedrock, Vertex AI, Gemini configuration
  • Configuration Reference: S3_Path_RAG, GCS_SECRET_NAME, GEMINI_API_KEY
  • Data Model: tLanguageModels, tTrainingData, tDefaultCategories.rag_active (note: tAccounts.corpusName was previously cited but does not exist; see § Vertex corpora above)
  • Content Pipeline: where retrieval fits into the end-to-end content generation flow
  • User-Specific AI Jobs: async workers that call getAiParametersFunction