RAG Pipeline

The codebase contains two distinct RAG implementations that coexist:

In-memory RAG: Bedrock Cohere embeddings plus in-process cosine similarity over per-account documents in S3. Used by helper/ragProcess.js, helper/getRagData.js, helper/getRagDataTopics.js.
Cloud RAG (Vertex AI): Google Vertex AI's managed RAG with server-side corpora. Used by helper/functionsForAi/cloudRag.js.

Both are exposed to higher-level callers (jobs and route handlers) through getAiParametersFunction(), which decides per account and per feature whether to use a corpus, in-memory retrieval, or no retrieval at all.

At a Glance

Implementation	File	Provider	Vector store	Where used
In-memory (single doc)	`helper/ragProcess.js`	Bedrock Cohere embed	In-memory `SimpleVectorStore`	Targeted document Q&A
In-memory (multi-doc)	`helper/getRagData.js`	Bedrock Cohere embed + Claude 3 Sonnet (optional)	In-memory cosine	RAG over a user's full document folder
In-memory (topics)	`helper/getRagDataTopics.js`	Same as `getRagData.js`	In-memory cosine	Topic-focused variant
Cloud RAG	`helper/functionsForAi/cloudRag.js`	Vertex AI (Gemini 2.5 Flash + Vertex RAG store)	Vertex managed corpus	Production content generation
Parameter loader	`helper/functionsForAi/getAiParameters.js`	—	—	Selects model + prompt for any job

In-Memory RAG

`ragProcess.js`: single-doc / single-account flow

Export: ragFilesOperation(accountId, content)

Flow:

Reads documents from s3://${bucket}/${conf.S3_Path_RAG}account_id_${accountId}/.
Splits text into chunks (default ~400 chars with ~10-char overlap).
Embeds each chunk via Bedrock Cohere cohere.embed-multilingual-v3.
Stores vectors in a SimpleVectorStore in memory.
Embeds the query, scores all chunks by cosine similarity, returns the top 12 matches.
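
The chunking step in the flow above can be sketched as a fixed-window splitter. This is a minimal illustration, not the code in ragProcess.js: the name `chunkText` and its parameters are assumptions, and the defaults only mirror the approximate sizes quoted above.

```javascript
// Hypothetical fixed-window chunker; name and signature are illustrative.
function chunkText(text, chunkSize = 400, overlap = 10) {
  if (overlap >= chunkSize) {
    throw new RangeError('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk repeats the last `overlap` characters of the previous one, so a phrase cut at a window edge still appears whole in at least one chunk when it is shorter than the overlap.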

The SimpleVectorStore is a class private to this file — it's not a shared singleton; each call rebuilds the store from scratch. There is no caching across calls.
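
For readers unfamiliar with the pattern, a store of this kind can be sketched in a few lines. The class below is an illustration in the spirit of SimpleVectorStore, not a copy of it: the class name, the `add` and `search` methods, and the result shape are all assumptions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative in-memory store; method names are assumptions.
class InMemoryVectorStore {
  constructor() {
    this.entries = [];
  }

  add(text, vector) {
    this.entries.push({ text, vector });
  }

  // Scores every stored chunk against the query and returns the best topK.
  search(queryVector, topK = 12) {
    return this.entries
      .map((e) => ({ text: e.text, score: cosineSimilarity(queryVector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}
```

Because the search is a full scan, query cost grows linearly with the number of chunks, which is acceptable for a store that is rebuilt per call and discarded afterwards.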

`getRagData.js`: multi-format folder RAG

Export: performRAGWorkflow(accountId, query)

Differences from ragProcess.js:

- Lists all files in the account's S3 prefix, not just one document.
- Parses each file based on extension:
    - .pdf → pdf-parse
    - .docx → mammoth
    - .xlsx → xlsx
    - .csv → csv-parse
    - .txt / fallback → raw read
    - Image files → skipped
- Larger chunk size (~2000 chars).
- Returns top 3 chunks with source filename metadata.
- Optionally calls Claude 3 Sonnet (Bedrock) to synthesize an answer over the retrieved chunks.
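
The extension-based parsing described above amounts to a dispatch table. The sketch below only returns the name of the parser to use; the function name, the table, and in particular the list of image extensions are assumptions, since the source does not say which image types are recognized.

```javascript
// Hypothetical dispatch table; the real file may structure this differently.
const PARSER_BY_EXTENSION = {
  '.pdf': 'pdf-parse',
  '.docx': 'mammoth',
  '.xlsx': 'xlsx',
  '.csv': 'csv-parse',
  '.txt': 'raw',
};

// Assumed set of image types; adjust to whatever the codebase actually skips.
const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp']);

function pickParser(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot).toLowerCase();
  if (IMAGE_EXTENSIONS.has(ext)) return null; // images are skipped
  return PARSER_BY_EXTENSION[ext] || 'raw'; // unknown types fall back to a raw read
}
```

Returning `null` for skipped files lets the caller filter them out before any download or parsing work happens.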

`getRagDataTopics.js`: topic variant

Export: getRAGdataTopics(accountId, query)

Architecturally identical to getRagData.js. The prompt and post-processing are tuned for topic-discovery queries (e.g., "what themes show up across this account's brand documents") rather than direct Q&A.

When to use which

Need	Use
Quick lookup against a known document	`ragFilesOperation`
Retrieve over a user's full document library	`performRAGWorkflow`
Discover topics across a corpus	`getRAGdataTopics`
Production-scale RAG with managed indexing	`cloudRagResult` (below)

Limitations

No persistence — every call re-downloads, re-parses, re-embeds. For accounts with many large files, this is slow and costly. Consider caching the vector store per account if you reuse it across requests.
No incremental updates — adding a file to S3 doesn't invalidate anything because nothing is cached.
Chunking is naive — fixed character windows, no sentence/paragraph boundary awareness.
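Bedrock Cohere has a 512-token input cap: chunks of ~2000 characters or more can exceed it, and depending on the request's truncate setting the excess is either silently dropped or the call is rejected.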
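
One way to soften the naive-chunking limitation is to pack whole sentences into each chunk instead of cutting at a fixed offset. The sketch below is not part of the codebase: `chunkBySentence` is a hypothetical name, and the punctuation-based split is a crude assumption that will mis-handle abbreviations and decimals.

```javascript
// Packs whole sentences into chunks of at most maxChars characters.
function chunkBySentence(text, maxChars = 2000) {
  // Split after ., ! or ? followed by whitespace; crude but dependency-free.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? current + ' ' + sentence : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      // A single sentence longer than maxChars becomes its own oversized chunk.
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Oversized sentences are kept intact here rather than split, so a caller that must respect the embedding input cap would still need a hard length check afterwards.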

Cloud RAG (Vertex AI)

File: helper/functionsForAi/cloudRag.js

Export: cloudRagResult({ ragCorpus, finalPrompt, model })

This is the production RAG path. It uses Google Vertex AI's managed vertexRagStore retrieval tool, where the corpus is pre-indexed server-side and the LLM is given retrieval-augmented context automatically by the Vertex API.

Inputs

Param	Description
`ragCorpus`	Vertex corpus ID — see ⚠ note below about where this actually comes from
`finalPrompt`	Pre-templated prompt string (placeholders already substituted)
`model`	Optional model override; defaults to `gemini-2.5-flash`. Falls back to Claude 3 Sonnet on certain failures
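
To make the inputs concrete, the sketch below shows how `ragCorpus` and `finalPrompt` might be combined into a generateContent-style request that attaches the corpus as a retrieval tool. It is an assumption about the request shape, not the contents of cloudRag.js: the function name is hypothetical, and the field names (`retrieval`, `vertexRagStore`, `ragResources`) should be verified against the Vertex AI SDK version in use.

```javascript
// Builds a request body that attaches a Vertex RAG corpus as a retrieval tool.
// Field names are assumed; check them against the SDK or REST reference.
function buildRagRequest({ ragCorpus, finalPrompt }) {
  if (!ragCorpus) throw new Error('ragCorpus is required');
  if (!finalPrompt) throw new Error('finalPrompt is required');
  return {
    contents: [{ role: 'user', parts: [{ text: finalPrompt }] }],
    tools: [
      { retrieval: { vertexRagStore: { ragResources: [{ ragCorpus }] } } },
    ],
  };
}
```

Keeping the request construction in a pure function like this makes it possible to unit-test the payload without credentials or network access.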

Credential handling

GCP service-account credentials are fetched from AWS Secrets Manager (secret name in GCS_SECRET_NAME env var), not from a local JSON file. The Vertex AI client is cached for 3600 seconds to avoid refetching credentials on every call.
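
The client caching described above follows a common time-based pattern. The helper below is a generic sketch, not the code in cloudRag.js: `createTtlCache` is a hypothetical name, and `factory` stands in for "fetch the secret from Secrets Manager and build the Vertex client".

```javascript
// Generic time-based cache. `now` is injectable so expiry can be tested.
function createTtlCache(factory, ttlMs = 3600 * 1000, now = Date.now) {
  let value;
  let expiresAt = 0;
  return function get() {
    if (now() >= expiresAt) {
      value = factory(); // may return a promise; callers can await it
      expiresAt = now() + ttlMs;
    }
    return value;
  };
}
```

One caveat of this simple form: if `factory` returns a promise that rejects, the rejected promise stays cached until the TTL expires, so production code would likely want to clear the entry on failure.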

Reliability

30-second timeout with an AbortController.
On retrieval / generation failure, the function logs and returns a stringified fallback so callers don't crash.
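
The timeout-plus-fallback behavior can be sketched as a small wrapper. This illustrates the pattern only: the function name and the shape of the fallback payload are assumptions, and the wrapped task must itself honor the abort signal for the timeout to take effect.

```javascript
// Runs task(signal) with a deadline. On timeout or error, logs and returns
// a stringified fallback instead of throwing.
async function runWithTimeout(task, timeoutMs = 30000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } catch (err) {
    console.error('RAG call failed:', err.message);
    // Hypothetical fallback shape; the real payload may differ.
    return JSON.stringify({ error: true, message: err.message });
  } finally {
    clearTimeout(timer); // always release the timer
  }
}
```

Returning a string on failure keeps callers from crashing, but it also means they must inspect the result to tell a real answer from a fallback.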

Callers

Higher-level jobs invoke cloudRagResult directly:

generate_company_industry.js
generate_organization_profile.js
generate_template_industry.js
Various content-generation paths in routes/routes.js and the user-specific AI jobs

getAiParametersFunction

File: helper/functionsForAi/getAiParameters.js

Export: getAiParametersFunction({ languageId, ...otherParams })

This is the single source of truth for AI configuration in the app. Every job and route that calls a model goes through it first.

What it does

Reads model configuration from tLanguageModels (filtered by languageId and feature ID).
Optionally pulls examples from tTrainingData for few-shot prompts.
Substitutes placeholders in the prompt template:
${companyWebsite}, ${orgProfile}, ${content}, ${impersonation}, etc.
Returns:

{
  aiModel,              // e.g. "gemini-2.5-flash" or "anthropic.claude-3-sonnet-..."
  maxTokens,
  temperature,
  topP,
  topK,
  impersonation,        // "you are a marketing copywriter" persona
  inputData,            // raw user input
  outputData,           // last response (for chained calls)
  replacedImpersonation,
  replacedPrompt        // fully substituted prompt ready to send
}
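
The placeholder substitution step that produces `replacedPrompt` can be sketched as a single regular-expression pass. This is an illustration under assumptions: the function name is hypothetical, and leaving unknown placeholders untouched is a design choice of this sketch, not confirmed behavior of getAiParameters.js.

```javascript
// Replaces ${name} tokens with values from `values`. Unknown tokens are left
// as-is so that missing data stays visible in the final prompt.
function substitutePlaceholders(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}
```

Using a replacer function rather than a replacement string avoids surprises when a substituted value itself contains `$` sequences that `String.prototype.replace` would otherwise interpret.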

Why it matters for RAG

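If the row in tLanguageModels has a corpus reference, the caller passes a corpus to cloudRagResult. If not, the caller may either skip retrieval entirely or fall back to the in-memory implementations. Note that the corpus ID itself is not part of the returned object: it is currently a literal in the caller (see Vertex corpora below).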

This is also where you'd flip a feature from in-memory RAG to Vertex Cloud RAG — by editing the row in tLanguageModels rather than touching code.

Data Sources

S3 layout

In-memory RAG documents live under:

s3://${conf.S3_Bucket_Name}/${conf.S3_Path_RAG}account_id_<id>/

Files are uploaded by the user via the document-upload UI. No size or count limit is enforced at the API layer.
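
As an illustration of the layout, the helper below builds the `Bucket` and `Prefix` input for an S3 ListObjectsV2 call over one account's folder. The function name is hypothetical, and the account ID check is an assumption added here as a guard against malformed prefixes; the source does not say what format account IDs take.

```javascript
// Builds ListObjectsV2 input for one account's RAG folder.
// `conf` mirrors the config keys named above.
function ragListParams(conf, accountId) {
  const id = String(accountId);
  // Assumed constraint: IDs contain only letters, digits, '_' or '-'.
  if (!/^[A-Za-z0-9_-]+$/.test(id)) {
    throw new Error('unexpected accountId: ' + id);
  }
  return {
    Bucket: conf.S3_Bucket_Name,
    Prefix: `${conf.S3_Path_RAG}account_id_${id}/`,
  };
}
```

Since no size or count limit is enforced on upload, a listing over this prefix may be paginated, so callers should be prepared to follow continuation tokens.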

Vertex corpora

Vertex corpora are managed externally (via the GCP console, Vertex AI's import API, or an out-of-band ingestion pipeline).

⚠ The corpus ID is hardcoded in caller files, not stored in the database.

Earlier versions of this doc claimed the Vertex corpus identifier was persisted on tAccounts.corpusName. That column does not exist in the schema (verified in data-model.md §1).

The actual corpus IDs are literal strings hardcoded in caller files like generate_template_industry.js and generate_company_industry.js:
const ragCorpus = 'projects/1069774850833/locations/us-central1/ragCorpora/7631349568579305472';
This is the same anti-pattern as the Polotno API key: every caller embeds the literal, so rotation requires touching every file. Recommended fix: move corpus IDs to environment variables (e.g., VERTEX_RAG_CORPUS, VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY, etc.) and reference them via conf.js.
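
A sketch of what the recommended fix could look like, assuming the two variable names suggested above. Neither variable exists in the codebase today, and `loadCorpusIds` is a hypothetical helper rather than an existing part of conf.js.

```javascript
// Expected shape of a Vertex RAG corpus resource name.
const CORPUS_PATTERN = /^projects\/[^/]+\/locations\/[^/]+\/ragCorpora\/[^/]+$/;

// Reads corpus IDs from the environment and fails fast on malformed values.
function loadCorpusIds(env = process.env) {
  const names = ['VERTEX_RAG_CORPUS', 'VERTEX_RAG_CORPUS_TEMPLATE_INDUSTRY'];
  const corpora = {};
  for (const name of names) {
    const value = env[name];
    if (!value) continue; // unset means "no corpus configured"
    if (!CORPUS_PATTERN.test(value)) {
      throw new Error(`${name} is not a valid ragCorpora resource name`);
    }
    corpora[name] = value;
  }
  return corpora;
}
```

Validating at startup surfaces a mistyped corpus ID when the process boots instead of on the first production request.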

If the platform needs per-account corpora (different corpus per tenant), this should become a real DB column on tAccounts (or live in tCloudKnowledgeBase, which exists but whose role hasn't been fully verified).

Per-category enablement

tDefaultCategories.rag_active   # boolean per content category

If a category has rag_active = 0, the content-generation pipeline skips RAG retrieval entirely for that category.
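
Combined with the corpus and in-memory options described earlier, the gating logic might look like the sketch below. The function, its inputs, and the order of the checks are all assumptions made for illustration; the real pipeline may combine these conditions differently.

```javascript
// Decides which retrieval path to take for one generation request.
function selectRetrievalMode({ ragActive, ragCorpus, hasAccountDocuments }) {
  if (!ragActive) return 'none'; // rag_active = 0 disables retrieval
  if (ragCorpus) return 'vertex'; // a corpus is available
  if (hasAccountDocuments) return 'in-memory'; // fall back to S3 documents
  return 'none';
}
```

Checking `ragActive` first means a disabled category never pays for a corpus lookup or an S3 listing.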

Call Chains (Examples)

Production content generation (Vertex)

HTTP request / job tick
  → getAiParametersFunction({languageId, ...})
  → cloudRagResult({ragCorpus, finalPrompt, model})
  → Vertex AI (RAG store retrieval + LLM completion)
  → response written to tUserLibrary

Document Q&A (in-memory, multi-format)

HTTP request
  → performRAGWorkflow(accountId, query)
  → list S3 → parse → chunk → Bedrock Cohere embed → cosine match
  → top-3 chunks (optionally synthesized via Claude 3 Sonnet)
  → response to client

Operational Notes

Vertex client cache is per-process. If you rotate the GCS secret, restart all PM2 jobs that use Vertex; otherwise each process keeps its cached client for up to 3600 seconds.
No retries on Vertex failures. A single 30s timeout is the entire failure budget. Add idempotent retries upstream if reliability matters.
In-memory RAG re-runs per call. Don't use it inside a tight loop — embedding cost scales linearly with input size.
Bedrock Cohere quota is per-region. If you hit limits, check the Bedrock service quotas in the same region as S3_Region (usually us-east-1).
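
For the missing-retries note, an upstream retry wrapper could look like the sketch below. It is not part of the codebase: the name, the defaults, and the exponential backoff schedule are assumptions, and it should only wrap calls that are safe to repeat.

```javascript
// Retries an async operation with exponential backoff.
async function retryWithBackoff(operation, { retries = 2, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts
      const delay = baseDelayMs * 2 ** attempt; // 500 ms, 1000 ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Because cloudRagResult returns a stringified fallback instead of throwing, the wrapped operation would need to detect that fallback and throw for a retry to be triggered at all.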

Integration Inventory — Bedrock, Vertex AI, Gemini configuration
Configuration Reference — S3_Path_RAG, GCS_SECRET_NAME, GEMINI_API_KEY
Data Model — tLanguageModels, tTrainingData, tDefaultCategories.rag_active (note: tAccounts.corpusName was previously cited but does not exist — see § Vertex corpora above)
Content Pipeline — where retrieval fits into the end-to-end content generation flow
User-Specific AI Jobs — async workers that call getAiParametersFunction