Control-Flow Verification Report

A second-pass audit of the architectural narratives that the prior verification (verification-report.md §F) explicitly did not cover. It is based on four parallel deep reads that compare the actual code paths against the doc claims.

Generated as a snapshot. Re-run when the codebase changes materially.

Summary

| Category | Count |
| --- | --- |
| Claims verified ✅ | 38 |
| Claims with drift 🔧 | 14 |
| Critical findings (must-fix) | 5 |
| Already-flagged but now resolved | 2 |

Five findings rise to the level of "doc is materially wrong about a security or operational claim." They are listed first; the rest follows by subsystem.

Critical Findings

CF-1 ⚠️ `/auth/login` issues plaintext tokens, not encrypted

Doc claim (authentication.md): every token is JWT-signed and AES-encrypted via encryptData.

Reality (routes/routes.js:3024–3086): the login handler returns a token that is a plain string concatenation:

access_token = `${userId}_${timestamp}_${userId}`

There is no encryptData call on the login path. The encrypted-token mechanism described in the doc applies elsewhere (e.g., OAuth callbacks via social.js) but not to first-party login.

Severity: high. Nothing in the token is secret:

- Anyone who reads it in a log, analytics pipeline, or cookie store can trivially extract the user ID.
- Anyone who knows or guesses a user ID can forge a token of the same shape by guessing a timestamp.
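
A minimal sketch of why this format offers no protection. It assumes the token is exactly `${userId}_${timestamp}_${userId}` and that user IDs contain no underscores (an assumption). `parseLoginToken` and `buildLoginToken` are hypothetical helpers, not code from `routes/routes.js`. Whether a rebuilt token is actually accepted depends on how the server validates it, which this audit did not trace.

```javascript
// Hypothetical helpers illustrating CF-1; not taken from routes/routes.js.
// Assumed token format: `${userId}_${timestamp}_${userId}` (no secret involved).

function parseLoginToken(token) {
  const parts = String(token).split('_');
  // Well-formed tokens have three parts whose first and last parts match.
  if (parts.length !== 3 || parts[0] !== parts[2]) return null;
  return { userId: parts[0], timestamp: Number(parts[1]) };
}

function buildLoginToken(userId, timestamp) {
  // Building a token needs only a user ID and a time value: no key, no signature.
  return `${userId}_${timestamp}_${userId}`;
}

// Example value only, not real data.
const observed = '4821_1718000000000_4821';
const parsed = parseLoginToken(observed);
console.log(parsed); // { userId: '4821', timestamp: 1718000000000 }
console.log(buildLoginToken(parsed.userId, parsed.timestamp) === observed); // true
```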

Impact:

- authentication.md § Token Lifecycle is misleading.
- security.md § Authentication may need to add this as a finding.
- This is a real security issue, not just a doc fix.

Recommendation: add this to the security findings list and treat it as a security-backlog item. Update authentication.md to state explicitly that login produces a plaintext token while OAuth produces an encrypted one.

CF-2 ⚠️ Streaming uses Server-Sent Events (SSE), not Socket.IO

Doc claim (agents-and-ai.md § ResearchAgent / ProfileAgent): streaming events (research_chunk, profile_chunk, etc.) are emitted via Socket.IO.

Reality (routes/routes.js): the streaming callbacks are handled via SSE (res.write('data: ...\n\n')) on the HTTP response. Socket.IO references in routes/routes.js are commented out (lines 38–39).
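
For comparison with the Socket.IO description, here is a minimal sketch of what the SSE transport looks like. It uses Node's built-in `http` module instead of Express, borrows the `research_chunk` event name from the doc claim, and sets an `event:` field. Whether `routes/routes.js` names its events this way, or only writes bare `data:` lines as the `res.write('data: ...\n\n')` pattern suggests, is not confirmed.

```javascript
const http = require('node:http');

// Format one Server-Sent Events frame: optional "event:" line, one "data:"
// line per payload line, and a blank line that terminates the event.
function formatSseEvent(data, eventName) {
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  let frame = eventName ? `event: ${eventName}\n` : '';
  for (const line of payload.split('\n')) {
    frame += `data: ${line}\n`;
  }
  return frame + '\n';
}

// Streaming handler: the ordinary HTTP response stays open and receives
// frames via res.write(); no Socket.IO connection is involved.
function handleStream(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  const chunks = ['first chunk', 'second chunk']; // stand-in for agent output
  for (const text of chunks) {
    res.write(formatSseEvent({ text }, 'research_chunk'));
  }
  res.end();
}

// Not started automatically; call server.listen(3000) and try `curl -N`.
const server = http.createServer(handleStream);
```

On the client, a stream like this is consumed with the browser's `EventSource` API (GET only) or a `fetch` body reader, not with a Socket.IO client. That distinction is what the client-code check below is looking for.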

Impact:

- agents-and-ai.md is materially wrong about the streaming transport.
- realtime-events.md characterizes Socket.IO as "prepared but unused"; the agents-and-ai.md claim contradicts this.

Recommendation: update agents-and-ai.md § Streaming sections to reference SSE. Verify whether any client code expects Socket.IO for these events (they may be receiving SSE and the doc is the only thing that's wrong).

CF-3 ⚠️ `tMemberAuth` schema differs materially from documented

Doc claim (data-model.md §5.1, authentication.md): tMemberAuth has columns access_token, refresh_token, token_secret, expiry_date, acc_name, profileUrl, isLinked, isDeleted.

Reality (someli-schema.sql): the actual columns are id, accountId, member_Id, provider_Id, providerAuth, verified, detail, acctResp, FbAccounts, IgAccounts, LnAccounts, TwtAccounts, token_status, tactive, Added_at, expiryDate, Updated_at, removed, revoked_at, refresh_revoked, refresh_at, isinvalid, last_error, last_check_at.

Significant differences:

- Tokens live inside the JSON detail blob, not as discrete columns.
- Per-platform columns FbAccounts, IgAccounts, LnAccounts, TwtAccounts exist but aren't documented.
- Status fields verified, tactive, removed, revoked_at, refresh_revoked, isinvalid aren't documented.
- Audit fields last_error, last_check_at aren't documented.
- acc_name, profileUrl, isLinked, isDeleted are claimed by the doc but don't exist as columns.

Impact: anyone querying tMemberAuth based on the doc will write broken queries. This is one of the most-referenced tables in the codebase.

Recommendation: rewrite the tMemberAuth section in data-model.md §5.1 from the actual schema. Tokens-in-JSON is itself a finding worth calling out (less indexable, harder to rotate, harder to audit).

CF-4 ⚠️ `tEmailSchedule.Status` uses strings, not integers

Doc claim (data-model.md §5.13, notifications.md): the email queue uses status 0 (pending) → 1 (sent) → 2 (failed).

Reality (job_send_mail.js, helper/helper.js): the column holds string values:

- INSERT: status = 'Inserted' or status = 'Delivered'
- Polling: WHERE status = 'Inserted'
- Update on success: status = 'Delivered'
- Update on failure: status = 'Pending'

Using 'Pending' to mean "failed" (rather than "waiting to be retried") is itself confusing and probably reflects ambiguity in the original implementation.
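
One way to pin the observed behavior down is to encode it as data. The sketch below restates the four transitions listed above. `EmailStatus` and `nextStatus` are hypothetical names, and the sketch deliberately preserves the code's 'Pending'-means-failed semantics rather than fixing them.

```javascript
// Observed tEmailSchedule.status values: strings, not 0/1/2.
const EmailStatus = Object.freeze({
  INSERTED: 'Inserted',   // written by producers; the only state the poller selects
  DELIVERED: 'Delivered', // written after a successful send (some producers insert it directly)
  PENDING: 'Pending',     // written after a FAILED send, despite the name
});

// Status that job_send_mail.js writes back after one send attempt.
function nextStatus(current, sendSucceeded) {
  if (current !== EmailStatus.INSERTED) {
    throw new Error(`job_send_mail.js does not select rows with status '${current}'`);
  }
  return sendSucceeded ? EmailStatus.DELIVERED : EmailStatus.PENDING;
}

console.log(nextStatus('Inserted', true));  // Delivered
console.log(nextStatus('Inserted', false)); // Pending
```

Because the poll filters on `'Inserted'`, this job's query never selects a row again once it has moved to `'Pending'`, so this job does not retry a failed send. The audit did not check whether anything else retries it.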

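Impact:

- data-model.md's tEmailSchedule row is wrong about column behavior.
- notifications.md § Email queue flow describes an integer state machine that doesn't exist.
- Anyone querying this table based on the doc would get wrong results. Depending on the database's type coercion, status = 0 either matches nothing or matches every row regardless of its actual status (MySQL, for example, casts non-numeric strings to 0).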

Recommendation: update both docs to reflect string-based status. Flag the 'Pending' overload (means "failed, not pending") as a confusion point worth fixing in code eventually.

CF-5 ⚠️ `job_send_mail.js` filters by a single TemplateId

Doc claim (notifications.md): job_send_mail.js polls all pending emails (any template ID), calls SendGrid for each, updates status.

Reality (job_send_mail.js:20): the SELECT query filters to a single specific TemplateId 'd-bf7b8ec288304eefba4039d08ccf0cbb':

SELECT ... FROM tEmailSchedule WHERE status = 'Inserted' AND TemplateId = 'd-bf7b8ec288304eefba4039d08ccf0cbb'

This means this job processes only emails that use that one template. Either:

- (a) other templates are processed by other workers I haven't found, or
- (b) other templates accumulate forever in tEmailSchedule with status = 'Inserted'.
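
Telling (a) from (b) ultimately needs a query against the live table, but the logic is simple enough to sketch. Assuming rows shaped like `{ status, TemplateId }` (only the two columns this check needs), the hypothetical helper below counts `'Inserted'` rows per template. It separates the one template `job_send_mail.js` handles from every template the job ignores.

```javascript
// The single template ID hardcoded in job_send_mail.js:20.
const HANDLED_TEMPLATE = 'd-bf7b8ec288304eefba4039d08ccf0cbb';

// Count rows still waiting ('Inserted') per TemplateId, split into the
// template this job processes and all others (candidates for a silent backlog).
function summarizeInsertedBacklog(rows, handledTemplate = HANDLED_TEMPLATE) {
  const ignored = new Map();
  let handled = 0;
  for (const row of rows) {
    if (row.status !== 'Inserted') continue;
    if (row.TemplateId === handledTemplate) {
      handled += 1;
    } else {
      ignored.set(row.TemplateId, (ignored.get(row.TemplateId) ?? 0) + 1);
    }
  }
  return { handled, ignored: Object.fromEntries(ignored) };
}

// Illustrative rows only; 'd-other-template' is a made-up ID.
const sample = [
  { status: 'Inserted', TemplateId: HANDLED_TEMPLATE },
  { status: 'Inserted', TemplateId: 'd-other-template' },
  { status: 'Inserted', TemplateId: 'd-other-template' },
  { status: 'Delivered', TemplateId: 'd-other-template' },
];
console.log(summarizeInsertedBacklog(sample));
// { handled: 1, ignored: { 'd-other-template': 2 } }
```

An `ignored` count that keeps growing over time points toward (b). If those counts drain, some other worker is picking them up, which is (a).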

Impact:

- notifications.md is materially wrong about the email-flow architecture.
- Possible operational issue: if (b) is the case, the email queue has a silent backlog.

Recommendation: investigate whether other email-sending workers exist. Update notifications.md to reflect what the code actually does. If (b) is the case, this is also an operational bug.

Subsystem Verifications (Detail)

A. Authentication Subsystem

| Claim | Result |
| --- | --- |
| `encryptData` and `decryptData` exist in `helper/tokenGenerator.js` and use AES via CryptoJS | ✅ |
| Tokens are JWT-signed, then AES-encrypted (in `tokenGenerator.js`) | ✅ |
| Key source: `JWT_SECRET_KEY` env var | ✅ |
| AES mode is CBC with random IV embedded in ciphertext (CryptoJS default) | ✅ (clarification: with a string passphrase, CryptoJS uses AES-256-CBC, not ECB. Strictly, the random value embedded in the ciphertext is an 8-byte salt, and both key and IV are derived from it) |
| Dual auth paths (`middlewares/auth.js` + `methods.js`) are independent implementations | ✅ |
| `middlewares/auth.js` returns 401 + plain string `"unauthorized access"` | ✅ |
| `methods.js` returns 403 + `sendStatus` (no body) | ✅ (clarification: Express's `res.sendStatus(403)` still sends the status text `Forbidden` as a plain-text body) |
| OAuth flow uses Passport's standard `authenticate` callback | ✅ |
| `tMemberAuth` columns match doc | ❌ CF-3 |
| `express-session` hardcoded secret on `server.js:13` | ✅ |
| Token header read from `TOKEN_HEADER_KEY` env var | 🔧 only `middlewares/auth.js` does this; `methods.js` hardcodes `"authorization"` |
| `/auth/login` produces an encrypted token | ❌ CF-1 |
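
The CBC clarification above hinges on how CryptoJS handles a string passphrase. In its OpenSSL-compatible mode, `CryptoJS.AES.encrypt(message, passphrase)` does three things:

- It draws a random 8-byte salt.
- It derives both a 256-bit key and the IV from passphrase + salt using OpenSSL's `EVP_BytesToKey` (MD5, one iteration).
- It emits Base64 of `"Salted__"`, then the salt, then the ciphertext.

The sketch below reproduces that layout with Node's built-in `crypto` module. It assumes `encryptData` passes `JWT_SECRET_KEY` as a string passphrase rather than a raw key, which was not checked line by line. The function names are hypothetical.

```javascript
const crypto = require('node:crypto');

// OpenSSL EVP_BytesToKey with MD5 and one iteration (CryptoJS's default for
// string passphrases): D_i = MD5(D_{i-1} || passphrase || salt).
function evpBytesToKey(passphrase, salt, keyLen = 32, ivLen = 16) {
  const pass = Buffer.from(passphrase, 'utf8');
  const blocks = [];
  let prev = Buffer.alloc(0);
  let total = 0;
  while (total < keyLen + ivLen) {
    prev = crypto.createHash('md5').update(Buffer.concat([prev, pass, salt])).digest();
    blocks.push(prev);
    total += prev.length;
  }
  const out = Buffer.concat(blocks);
  return { key: out.subarray(0, keyLen), iv: out.subarray(keyLen, keyLen + ivLen) };
}

// Same layout CryptoJS emits: Base64("Salted__" || 8-byte salt || ciphertext).
function encryptLikeCryptoJS(plaintext, passphrase) {
  const salt = crypto.randomBytes(8); // the random part; the IV is derived from it
  const { key, iv } = evpBytesToKey(passphrase, salt);
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv); // PKCS#7 padding by default
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([Buffer.from('Salted__', 'latin1'), salt, body]).toString('base64');
}

function decryptLikeCryptoJS(b64, passphrase) {
  const raw = Buffer.from(b64, 'base64');
  if (raw.subarray(0, 8).toString('latin1') !== 'Salted__') {
    throw new Error('not in OpenSSL salted format');
  }
  const { key, iv } = evpBytesToKey(passphrase, raw.subarray(8, 16));
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([decipher.update(raw.subarray(16)), decipher.final()]).toString('utf8');
}
```

Encrypting the same token twice yields different ciphertexts because the salt changes, while the mode stays CBC throughout. `CryptoJS.AES.decrypt(ciphertext, passphrase)` is expected to read output in this layout.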

Other findings worth noting:

- OAuth callback social.js:32–38 uses encryptData(userData, '10m') to wrap the token before the redirect. This confirms that the encryption mechanism IS applied on OAuth flows but NOT on first-party login.
- The DB write that creates the tMemberAuth row appears to happen elsewhere (not in the visible callback handler). It is worth tracing if the team needs to debug OAuth onboarding issues.

B. AI Subsystems

| Claim | Result |
| --- | --- |
| 4 agents (Conversation, InputParser, Research, Profile) are imported in `routes/routes.js` | ✅ |
| ConversationAgent instantiates InputParserAgent internally (`conversationAgent.js:2,32`) | ✅ |
| Sequential orchestration (Conversation → Parser → Research → Profile) | 🔧 not strictly sequential; orchestrated as a session-managed conditional flow |
| All 4 agents use `gemini-2.5-flash` | ✅ |
| ResearchAgent uses `tools: [{ googleSearch: {} }]` | ✅ |
| Streaming via Socket.IO | ❌ CF-2 (uses SSE) |
| brand → objective → recom dependency chain enforced by status flags | ✅ |
| `job_brand_positioning_ai.js` uses Gemini via `getGeminiResult` | ✅ |
| `job_recom_subjects_ai.js` uses `LlamaFunction({ id: 45, ... })` | ✅ |
| `job_translate_and_rephrase.js` enqueues `RGEN` follow-up | ✅ |
| `tJobs` status state machine 0→1→2/3/4 | ✅ |
| Doc-claimed type codes match reality | 🔧 see below |
| `helper/ragProcess.js` uses Bedrock Cohere `cohere.embed-multilingual-v3` | ✅ |
| Cloud RAG corpus ID source | 🔧 hardcoded literal in callers, not a DB column |
| `getAiParametersFunction` reads from `tLanguageModels` + substitutes | ✅ |

Type-code drift (significant):

The documented and actual type-code lists diverge:

| In docs but NOT in code | In code but NOT in docs |
| --- | --- |
| `DPGEN`, `UMGEN`, `MCC`, `TCH`, `FC`, `RTCON` | `BGEN`, `CLKB`, `GEML`, `HPB`, `PIC`, `TCC` |

Actual code-used type codes: BGEN, CGEN, CLKB, DGEN, GEML, HPB, PIC, RGEN, TCC, TCON, TIV, TSB, UCL, UGEN (14 codes).

Doc-claimed type codes: CGEN, UGEN, DGEN, DPGEN, UMGEN, TIV, MCC, TCH, UCL, TSB, FC, TCON, RTCON, RGEN (14 codes; only 8 overlap with reality).
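
The overlap arithmetic is easy to get wrong by hand, so here it is as a set comparison over the two lists exactly as given above. Re-running it after editing either list keeps the doc's counts honest. `compareTypeCodes` is a hypothetical helper.

```javascript
const codeTypes = ['BGEN', 'CGEN', 'CLKB', 'DGEN', 'GEML', 'HPB', 'PIC',
                   'RGEN', 'TCC', 'TCON', 'TIV', 'TSB', 'UCL', 'UGEN'];
const docTypes = ['CGEN', 'UGEN', 'DGEN', 'DPGEN', 'UMGEN', 'TIV', 'MCC',
                  'TCH', 'UCL', 'TSB', 'FC', 'TCON', 'RTCON', 'RGEN'];

// Split two code lists into shared codes and codes unique to each side.
function compareTypeCodes(fromCode, fromDocs) {
  const code = new Set(fromCode);
  const docs = new Set(fromDocs);
  return {
    overlap: [...code].filter((t) => docs.has(t)).sort(),
    docsOnly: [...docs].filter((t) => !code.has(t)).sort(),
    codeOnly: [...code].filter((t) => !docs.has(t)).sort(),
  };
}

const diff = compareTypeCodes(codeTypes, docTypes);
console.log(diff.overlap.length);      // 8
console.log(diff.docsOnly.join(', ')); // DPGEN, FC, MCC, RTCON, TCH, UMGEN
console.log(diff.codeOnly.join(', ')); // BGEN, CLKB, GEML, HPB, PIC, TCC
```

The same function can take the `type` values returned from `tJobs` in place of `docTypes`, which compares the code-side list against the running database.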

Recommendation: update content-pipeline.md § Job Type Codes and jobs-inventory.md § Job Type Codes from the actual codes. The doc-claimed codes that aren't in code may be (a) historical and since removed, (b) my own audit's invention based on file naming, or (c) defined in a part of the code I didn't inspect. To settle which codes the running DB actively uses, run `SELECT type, COUNT(*) AS n FROM tJobs GROUP BY type ORDER BY n DESC`.

Cloud RAG corpus ID: the agent confirmed that the corpus ID is hardcoded as a literal in callers like generate_template_industry.js:

const ragCorpus = 'projects/1069774850833/locations/us-central1/ragCorpora/7631349568579305472';

This resolves the open question from verification-report.md §C.2 (the tAccounts.corpusName reference). It's not a DB column at all — it's a hardcoded literal. This is itself a finding: changing the corpus requires editing every caller that uses it, similar to the Polotno-key problem. Should be moved to .env (VERTEX_RAG_CORPUS).
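
A minimal sketch of what moving the corpus to `.env` could look like. It assumes the `VERTEX_RAG_CORPUS` name proposed above and the resource-name shape of the literal currently in `generate_template_industry.js`. `getRagCorpus` is a hypothetical helper. Failing fast on a missing or malformed value avoids silently querying the wrong corpus.

```javascript
// Expected shape: projects/<project>/locations/<location>/ragCorpora/<corpusId>
const RAG_CORPUS_PATTERN = /^projects\/[^/]+\/locations\/[^/]+\/ragCorpora\/[^/]+$/;

function getRagCorpus(env = process.env) {
  const value = (env.VERTEX_RAG_CORPUS || '').trim();
  if (!value) {
    throw new Error('VERTEX_RAG_CORPUS is not set');
  }
  if (!RAG_CORPUS_PATTERN.test(value)) {
    throw new Error(`VERTEX_RAG_CORPUS is not a ragCorpora resource name: ${value}`);
  }
  return value;
}

// Callers would then replace the hardcoded literal with:
// const ragCorpus = getRagCorpus();
```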

C. Operational Subsystems

| Claim | Result |
| --- | --- |
| Email producers INSERT with status (numeric 0) | ❌ CF-4 |
| `tEmailSchedule` columns match doc | 🔧 doc omits `Updated_time` and `Sent_time` |
| `job_send_mail.js` runs every 60 seconds | ✅ (`/1 * * *`) |
| `job_send_mail.js` polls ALL pending emails | ❌ CF-5 |
| Slack token + channel hardcoded in multiple files | ✅ |
| Slack failures silently dropped, no retry | ✅ |
| `helper/postValidation.js` `checkVisibity` flow | 🔧 see below |
| Polotno hardcoded key in 49 files | ✅ (the agent reported 97, which counts occurrences rather than files; both figures are correct, and the doc's 49 is the file count) |
| 22 of 97 createInstance sites lack a close call | 🔧 see below |
| Dashboard mounted at `/auth` in production | ✅ |
| Dashboard standalone on port 6001 | ✅ |
| Dashboard cron job schedules | 🔧 `job_post_insights.js` schedule differs from the doc (see below) |
| Dashboard has 22 endpoints | ✅ |
| Socket.IO `socketConnection` global | ✅ |
| Only `update` event handled, broadcast to all | ✅ |
| `emitDataToClient` defined in actions.js, never called | ✅ |

postValidation.js checkVisibity flow:

The agent confirmed the documented sequence is mostly correct, with two clarifications:

- Step 5, "Download JSON again", is present in code but not in the doc. It is a redundant fetch worth flagging.
- Step 11, instance.close(), is not called in checkVisibity, which confirms that this function leaks the Polotno instance.

This is itself a leak site to add to the count. The doc lists a "close instance" step, but the helper function never closes the instance.
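
A common way to make this class of leak hard to reintroduce is to route every instance through one helper that closes it in a `finally` block. The sketch assumes the polotno-node pattern of an async `createInstance(options)` that resolves to an object with an async `close()`. `withPolotnoInstance` is a hypothetical name. The factory is passed in so the example runs without Polotno installed.

```javascript
// Run work(instance) and always close the instance, even if work throws.
async function withPolotnoInstance(createInstance, options, work) {
  const instance = await createInstance(options);
  try {
    return await work(instance);
  } finally {
    await instance.close();
  }
}

// Stand-in factory so the sketch runs without Polotno; it records calls.
function makeFakeFactory(log) {
  return async () => ({
    async render() { log.push('render'); return 'image-data'; },
    async close() { log.push('close'); },
  });
}

// Example: the instance is closed even when the work function fails.
const log = [];
withPolotnoInstance(makeFakeFactory(log), {}, async () => {
  throw new Error('render failed');
}).catch(() => console.log(log)); // [ 'close' ]
```

With this pattern, a function like checkVisibity would pass its rendering steps as `work` and no longer need its own close call.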

Polotno leak count clarification:

I previously claimed "22 leak sites" based on 97 createInstance calls - 75 close calls = 22. The agent counted differently, as files that contain createInstance but no close call in the same file, and got 31. The two numbers measure different things (22 call sites versus 31 files), so they should not be read as one range, but both point to a leak problem of similar scale. The docs should pick one metric and state how it was computed.
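
The difference between the two methodologies is easiest to see on a toy per-file tally. The sketch below computes both numbers from hypothetical `{ file, creates, closes }` counts, such as grep hits for `createInstance` and for the close call. The file names and counts are made up.

```javascript
// Method 1: call-site delta across the whole codebase.
function callSiteDelta(tally) {
  const creates = tally.reduce((n, f) => n + f.creates, 0);
  const closes = tally.reduce((n, f) => n + f.closes, 0);
  return creates - closes;
}

// Method 2: files that create an instance but never close one in the same file.
function filesWithoutClose(tally) {
  return tally.filter((f) => f.creates > 0 && f.closes === 0).map((f) => f.file);
}

const tally = [
  { file: 'a.js', creates: 3, closes: 3 }, // balanced
  { file: 'b.js', creates: 2, closes: 0 }, // uncovered file
  { file: 'c.js', creates: 1, closes: 0 }, // uncovered file
  { file: 'd.js', creates: 0, closes: 2 }, // closes instances created elsewhere
];
console.log(callSiteDelta(tally));     // 1
console.log(filesWithoutClose(tally)); // [ 'b.js', 'c.js' ]
```

Here the closes in `d.js` offset the delta while leaving two files uncovered. An effect like this could be why the two real figures diverge, though that has not been confirmed.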

Dashboard job_post_insights.js schedule:

- Documented: every 30 seconds
- Actual: `0 */12 * * *`, i.e., at minute 0 of every 12th hour: twice a day, at 00:00 and 12:00 in the scheduler's time zone
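
To quantify the gap, the sketch below lists the times of day at which `0 */12 * * *` fires. It is a deliberately tiny matcher that reads only the minute and hour fields and supports only `*`, `*/n`, and plain numbers. The day, month, and weekday fields are all `*` here. It is not a full cron implementation, and times are in whatever time zone the scheduler uses.

```javascript
// Does one cron field match a value? Supports '*', '*/n', and 'n' only.
function fieldMatches(field, value) {
  if (field === '*') return true;
  if (field.startsWith('*/')) return value % Number(field.slice(2)) === 0;
  return Number(field) === value;
}

// List every HH:MM in a day at which the expression fires
// (minute and hour fields only; remaining fields assumed to be '*').
function firingTimesPerDay(expr) {
  const [minuteField, hourField] = expr.trim().split(/\s+/);
  const times = [];
  for (let hour = 0; hour < 24; hour += 1) {
    for (let minute = 0; minute < 60; minute += 1) {
      if (fieldMatches(minuteField, minute) && fieldMatches(hourField, hour)) {
        times.push(`${String(hour).padStart(2, '0')}:${String(minute).padStart(2, '0')}`);
      }
    }
  }
  return times;
}

console.log(firingTimesPerDay('0 */12 * * *')); // [ '00:00', '12:00' ]
```

That is two runs a day, versus the 2,880 implied by the documented every-30-seconds schedule: a factor of 1,440.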

This is a substantial scheduling drift. Either:

- (a) the doc is wrong and the cron is intentionally rare,
- (b) the cron was changed without updating the doc, or
- (c) there's a real operational issue (insights aren't being captured at the documented frequency).

Recommendation: investigate which is the case. Update dashboard-analytics.md to match reality, or fix the cron if the change was unintentional.

D. Deployment & Architecture

| Claim | Result |
| --- | --- |
| Three deployment paths (GHA / Jenkins / push.sh) | ✅ |
| Dockerfile has no `yarn install` step | ✅ |
| GitHub Actions does not run `yarn install` | ✅ |
| Jenkinsfile DOES run `yarn install --frozen-lockfile` | ✅ |
| `push.sh` hardcodes `/home/nisanth/someli-api` | ✅ |
| `nginx.conf` server name `uapi.someli.ai _`, port 3000, `100M` body limit | ✅ |
| Express body parser at 150 MB | ✅ (mismatch with nginx's 100M confirmed: for traffic routed through nginx, bodies over 100M are rejected with a 413 before they reach Express, so the 150 MB limit is never the effective one) |
| Middleware stack order matches doc | ✅ |
| Helmet and rate-limiting absent from middleware chain | ✅ |
| `App = { db, server, socket }`, exposed via `module.exports.appData` | ✅ |
| `/health` and `/db-health` endpoints exist | ✅ |
| `/health` returns `{ status: 'Server is running', port, environment, timestamp }` | 🔧 doc said `status: 'ok'`; actual is the string `'Server is running'` |
| `getSuccessResponse` / `getErrorResponse` shapes match `helper/helper.js` | ✅ |
| No global Express error handler | ✅ |
| `actions/actions.js` CRUD methods signal errors via `cb(null)` | ✅ |
| `winston` in package.json but never imported | 🔧 imported once in `helper/functionsForAi/cloudRag.js`; effectively unused, but not zero callers |

Other findings:

- Healthcheck: /health returns status: 'Server is running' (a string), not 'ok'. Minor doc fix.
- Winston's single import in cloudRag.js is the one place it's used. This isolated usage is worth noting in logging-observability.md so the picture is complete.

Already-Flagged Items Now Resolved

The control-flow audit answered two open questions from prior reports:

| Question | Resolution |
| --- | --- |
| Where is the RAG corpus ID actually stored? (`verification-report.md` §C.2) | Hardcoded as a literal in caller files (e.g., `generate_template_industry.js`). Not in any DB column. Should be moved to `.env`. |
| Which job-type codes are actually used? (`data-model.md` §1 audit) | The 14-code list above (BGEN, CGEN, CLKB, DGEN, GEML, HPB, PIC, RGEN, TCC, TCON, TIV, TSB, UCL, UGEN) is the code-side truth. Run `SELECT DISTINCT type FROM tJobs` to confirm against the running DB. |

Recommended Doc Fixes

In priority order:

| # | Doc | Change |
| --- | --- | --- |
| 1 | `authentication.md` § Token Lifecycle | Add explicit note: first-party `/auth/login` returns a plaintext concatenation token; OAuth callbacks return AES-encrypted tokens. The two flows are not consistent. |
| 2 | `security.md` § Authentication / Hard Truths | Add the plaintext-login-token finding |
| 3 | `agents-and-ai.md` § Streaming sections | Replace Socket.IO references with SSE |
| 4 | `data-model.md` §5.1 `tMemberAuth` | Rewrite from actual schema; flag tokens-in-JSON as an anti-pattern |
| 5 | `data-model.md` §5.13 `tEmailSchedule` + `notifications.md` § Email queue | Status is string-based (`Inserted`/`Delivered`/`Pending`), not numeric; add the `Updated_time` and `Sent_time` columns |
| 6 | `notifications.md` § Email flow | Note the single-template filter in `job_send_mail.js`; flag as an open question |
| 7 | `content-pipeline.md` § Job Type Codes + `jobs-inventory.md` § Job Type Codes | Replace the doc-claimed list with the actual 14 codes |
| 8 | `rag-pipeline.md` § Cloud RAG | Note the corpus ID is hardcoded in callers, not a DB column. Recommend moving to `.env`. |
| 9 | `dashboard-analytics.md` § Cron services | Fix the `job_post_insights.js` schedule (every 12 hours, at 00:00 and 12:00, not every 30 seconds) |
| 10 | `media-processing.md` + `Integration-inventory.md` §16 | Note that `helper/postValidation.js:checkVisibity` is itself a Polotno leak site |
| 11 | `agents-and-ai.md` § Onboarding flow | Soften "sequential pipeline" → "session-managed conditional flow" |
| 12 | `authentication.md` § Token validation | Note that `methods.js` hardcodes the `"authorization"` header rather than reading `TOKEN_HEADER_KEY` |
| 13 | `logging-observability.md` § Logging | Correct: Winston has one caller (`cloudRag.js`), not zero |
| 14 | `architecture-overview.md` § Healthcheck | `/health` returns `status: 'Server is running'`, not `'ok'` |

The first 6 are the substantive corrections. 7–14 are minor accuracy fixes.

What This Audit Did NOT Cover

In the interest of finishing this pass:

- Per-endpoint auth role declarations: there are 728 endpoints, and the audit didn't validate that auth posture matches what the docs claim per route family.
- The tCloudKnowledgeBase table: earlier audits flagged it as a likely RAG-corpus location. Now that the corpus ID is confirmed to be hardcoded, this table's role remains unclear and is worth a focused investigation.
- Browser / Socket.IO client side: we don't have access to the web client repo, so claims that "the frontend uses Socket.IO" can't be verified end-to-end.
- Production runtime: only static code analysis was done. Questions such as whether the listed cron schedules actually fire as expected, or whether queue depth is what we'd expect, need observability data, not code reading.

Recommendation

The 5 critical findings (CF-1 through CF-5) should be:

- Reflected in the relevant docs immediately.
- Triaged as engineering issues:
  - CF-1 (plaintext login token): a security finding; it should be on the security backlog.
  - CF-2 (SSE, not Socket.IO): purely a doc fix.
  - CF-3 (tMemberAuth schema): purely a doc fix.
  - CF-4 (string status): a doc fix. The string-based status is not technically wrong, just not what's documented; the 'Pending'-means-failed overload is a separate, lower-priority code-clarity issue.
  - CF-5 (single-template filter): needs investigation to decide whether it is intentional or an operational bug. Either way, the doc needs to reflect reality.

I can apply doc fixes #1–#14 from §Recommended Doc Fixes if you'd like — same as the prior pass. Some of the changes (CF-3 specifically — rewriting tMemberAuth) require reading the actual schema for the new column list, but otherwise these are all surgical edits.

Want me to apply the fixes?