Control-Flow Verification Report
A second-pass audit verifying the architectural narratives that the prior verification (verification-report.md §F) explicitly did not cover. It consisted of four parallel deep-reads of the actual code paths, checked against the doc claims.
Generated as a snapshot. Re-run when the codebase changes materially.
Summary
| Category | Count |
|---|---|
| Claims verified ✅ | 38 |
| Claims with drift 🔧 | 14 |
| Critical findings (must-fix) | 5 |
| Already-flagged but now resolved | 2 |
Five findings rise to the level of "doc is materially wrong about a security or operational claim." They are listed first; the rest follows by subsystem.
Critical Findings
CF-1 ⚠️ /auth/login issues plaintext tokens, not encrypted
Doc claim (authentication.md): every token is JWT-signed and AES-encrypted via encryptData.
Reality (routes/routes.js:3024–3086): the login handler returns a token built by plain string concatenation; per the severity note below, its parts include the user ID and a timestamp.
There is no encryptData call on the login path. The encrypted-token mechanism described in the doc applies elsewhere (e.g., OAuth callbacks via social.js) but not to first-party login.
Severity: high. Anyone who reads this token in any log, analytics pipeline, or cookie store can:
- Trivially extract the user ID
- Forge a token by guessing a timestamp
Impact:
- authentication.md § Token Lifecycle is misleading
- security.md § Authentication may need to add this as a finding
- This is a real security issue, not just a doc fix
Recommendation: treat as a finding for the security backlog. Update authentication.md to be explicit that login produces a plaintext token while OAuth produces an encrypted one. Add to the security findings list.
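To make the forgery risk concrete, here is a minimal sketch. The `userId:timestamp` layout is a hypothetical stand-in (the real concatenation in routes/routes.js is not reproduced in this report), and `plaintextToken`, `signedToken`, and `verifySignedToken` are illustrative names, not functions in the codebase. It contrasts an unsigned token with one that carries an HMAC-SHA256 tag computed with Node's built-in crypto module.

```javascript
const crypto = require('node:crypto');

// HYPOTHETICAL layout: stands in for the unsigned token issued by /auth/login.
// Anyone who can guess the two fields can build a valid-looking token.
function plaintextToken(userId, issuedAtMs) {
  return `${userId}:${issuedAtMs}`;
}

// Same payload, plus an HMAC-SHA256 tag that only the secret holder can compute.
function signedToken(userId, issuedAtMs, secret) {
  const payload = `${userId}:${issuedAtMs}`;
  const mac = crypto.createHmac('sha256', secret).update(payload).digest('hex');
  return `${payload}.${mac}`;
}

// Returns the parsed payload, or null when the tag is missing or wrong.
function verifySignedToken(token, secret) {
  const dot = token.lastIndexOf('.');
  if (dot < 0) return null;
  const payload = token.slice(0, dot);
  const mac = Buffer.from(token.slice(dot + 1), 'hex');
  const expected = crypto.createHmac('sha256', secret).update(payload).digest();
  // timingSafeEqual throws on length mismatch, so check the length first.
  if (mac.length !== expected.length || !crypto.timingSafeEqual(mac, expected)) {
    return null;
  }
  const [userId, issuedAtMs] = payload.split(':');
  return { userId, issuedAtMs: Number(issuedAtMs) };
}
```

With the unsigned form, guessing the fields is enough to mint a token. With the signed form, a guessed payload fails verification unless the attacker also holds the secret. This only illustrates the gap; it is not a proposal for what the login path should adopt.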
CF-2 ⚠️ Streaming uses Server-Sent Events (SSE), not Socket.IO
Doc claim (agents-and-ai.md § ResearchAgent / ProfileAgent): streaming events (research_chunk, profile_chunk, etc.) are emitted via Socket.IO.
Reality (routes/routes.js): the streaming callbacks are handled via SSE (res.write('data: ...\n\n')) on the HTTP response. Socket.IO references in routes/routes.js are commented out (lines 38–39).
Impact:
- agents-and-ai.md is materially wrong about the streaming transport
- realtime-events.md characterizes Socket.IO as "prepared but unused" — the agents-and-ai claim contradicts this
Recommendation: update agents-and-ai.md § Streaming sections to reference SSE. Verify whether any client code expects Socket.IO for these events (they may be receiving SSE and the doc is the only thing that's wrong).
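For readers updating the streaming sections, the sketch below shows what SSE framing on a plain HTTP response looks like. `sseFrame` and `streamChunks` are hypothetical helpers, and the `{ text }` payload shape is an assumption; the report only confirms that the handlers call res.write with `data: ...` frames terminated by a blank line. Whether the real handlers also emit an `event:` line is not confirmed.

```javascript
// Build one SSE frame: a named event, a single data line, and the blank-line terminator.
function sseFrame(eventName, payload) {
  // JSON.stringify escapes newlines, so one "data:" line is always enough here.
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Stream chunks over any response-like object exposing writeHead/write/end
// (an Express `res` satisfies this). ASSUMPTION: the payload shape is { text }.
function streamChunks(res, eventName, chunks) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  for (const chunk of chunks) {
    res.write(sseFrame(eventName, { text: chunk }));
  }
  res.end();
}
```

A browser client would consume frames like these with EventSource and `addEventListener('research_chunk', ...)`; no Socket.IO connection is involved.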
CF-3 ⚠️ tMemberAuth schema differs materially from documented
Doc claim (data-model.md §5.1, authentication.md): tMemberAuth has columns access_token, refresh_token, token_secret, expiry_date, acc_name, profileUrl, isLinked, isDeleted.
Reality (someli-schema.sql): the actual columns are id, accountId, member_Id, provider_Id, providerAuth, verified, detail, acctResp, FbAccounts, IgAccounts, LnAccounts, TwtAccounts, token_status, tactive, Added_at, expiryDate, Updated_at, removed, revoked_at, refresh_revoked, refresh_at, isinvalid, last_error, last_check_at.
Significant differences:
- Tokens live inside the JSON detail blob, not as discrete columns
- Per-platform columns FbAccounts, IgAccounts, LnAccounts, TwtAccounts exist that aren't documented
- Status fields verified, tactive, removed, revoked_at, refresh_revoked, isinvalid aren't documented
- Audit fields last_error, last_check_at aren't documented
- acc_name, profileUrl, isLinked, isDeleted claimed by doc — don't exist as columns
Impact: anyone querying tMemberAuth based on the doc will write broken queries. This is one of the most-referenced tables in the codebase.
Recommendation: rewrite the tMemberAuth section in data-model.md §5.1 from the actual schema. Tokens-in-JSON is itself a finding worth calling out (less indexable, harder to rotate, harder to audit).
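Because tokens live inside the JSON detail blob, callers have to parse it rather than select a column. The sketch below shows that pattern. `readTokensFromDetail` is a hypothetical helper, and the key names `access_token` and `refresh_token` are assumptions borrowed from the documented column names; the actual keys inside detail were not verified in this pass.

```javascript
// Hypothetical helper: pull token fields out of a tMemberAuth row's JSON `detail` blob.
function readTokensFromDetail(row) {
  let detail;
  try {
    // Drivers may hand back JSON columns as strings or as already-parsed objects.
    detail = typeof row.detail === 'string' ? JSON.parse(row.detail) : row.detail;
  } catch (err) {
    return null; // malformed JSON in the blob
  }
  if (!detail || typeof detail !== 'object') return null;
  // ASSUMED key names; confirm against real rows before relying on them.
  const {
    access_token: accessToken = null,
    refresh_token: refreshToken = null,
  } = detail;
  return { accessToken, refreshToken };
}
```

A query written from the old doc (selecting access_token directly) fails because no such column exists; the parse step above is what replaces it.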
CF-4 ⚠️ tEmailSchedule.Status uses strings, not integers
Doc claim (data-model.md §5.13, notifications.md): the email queue uses status 0 (pending) → 1 (sent) → 2 (failed).
Reality (job_send_mail.js, helper/helper.js): the column holds string values:
- INSERT: status = 'Inserted' or status = 'Delivered'
- Polling: WHERE status = 'Inserted'
- Update on success: status = 'Delivered'
- Update on failure: status = 'Pending'
The string 'Pending' for failure (as opposed to retry) is itself confusing and probably reflects ambiguity in the original implementation.
Impact:
- data-model.md's tEmailSchedule row is wrong about column behavior
- notifications.md § Email queue flow describes an integer state machine that doesn't exist
- Anyone querying this table based on the doc would miss every email
Recommendation: update both docs to reflect string-based status. Flag the 'Pending' overload (means "failed, not pending") as a confusion point worth fixing in code eventually.
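One way to keep the string states from being misread is to name them once. The sketch below maps the observed strings to their actual meaning; the constant names (`QUEUED`, `SENT`, `FAILED`) and both helper functions are hypothetical and do not exist in the codebase.

```javascript
// Status strings observed in job_send_mail.js and helper/helper.js.
// The property names are illustrative labels for what each string means in practice.
const EMAIL_STATUS = Object.freeze({
  QUEUED: 'Inserted',  // written by producers, polled by the worker
  SENT: 'Delivered',   // set on success (some INSERTs also use it)
  FAILED: 'Pending',   // set on failure, despite the name
});

// True for rows the worker's polling query would pick up.
function isAwaitingSend(row) {
  return row.status === EMAIL_STATUS.QUEUED;
}

// Translate a stored status string into its real meaning.
function describeEmailStatus(status) {
  switch (status) {
    case EMAIL_STATUS.QUEUED: return 'queued';
    case EMAIL_STATUS.SENT: return 'sent';
    case EMAIL_STATUS.FAILED: return 'failed';
    default: return 'unknown';
  }
}
```

A filter written from the old doc, such as comparing status to the integer 0, matches none of these rows, which is the "miss every email" failure described above.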
CF-5 ⚠️ job_send_mail.js filters by a single TemplateId
Doc claim (notifications.md): job_send_mail.js polls all pending emails (any template ID), calls SendGrid for each, updates status.
Reality (job_send_mail.js:20): the SELECT query filters to a single specific TemplateId 'd-bf7b8ec288304eefba4039d08ccf0cbb':
SELECT ... FROM tEmailSchedule WHERE status = 'Inserted' AND TemplateId = 'd-bf7b8ec288304eefba4039d08ccf0cbb'
This means only emails using that one template are processed by this job. Either:
- (a) Other templates are processed by other workers I haven't found
- (b) Other templates accumulate forever in tEmailSchedule with status = 'Inserted'
Impact:
- notifications.md is materially wrong about the email-flow architecture
- Possible operational issue — if (b) is the case, the email queue has a silent backlog
Recommendation: investigate whether other email-sending workers exist. Update notifications.md to reflect what the code actually does. If (b) is the case, this is also an operational bug.
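A quick way to tell (a) from (b) is to group the queued rows by template. The sketch below pairs a diagnostic query with a small filter over its result. `BACKLOG_BY_TEMPLATE_SQL` and `strandedTemplates` are hypothetical names, and how the query is executed is left to whatever DB client the project already uses.

```javascript
// Diagnostic query: how many rows are still queued, per template?
const BACKLOG_BY_TEMPLATE_SQL = `
  SELECT TemplateId, COUNT(*) AS queued
  FROM tEmailSchedule
  WHERE status = 'Inserted'
  GROUP BY TemplateId
  ORDER BY queued DESC`;

// The one template that job_send_mail.js selects (job_send_mail.js:20).
const PROCESSED_TEMPLATE_ID = 'd-bf7b8ec288304eefba4039d08ccf0cbb';

// Given the query's rows, return templates that this job will never pick up.
function strandedTemplates(rows, processedId = PROCESSED_TEMPLATE_ID) {
  return rows.filter(
    (row) => row.TemplateId !== processedId && Number(row.queued) > 0
  );
}
```

An empty result suggests other workers are draining those templates, or that none are ever queued. A non-empty result that grows between runs points toward the silent-backlog case.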
Subsystem Verifications (Detail)
A. Authentication Subsystem
| Claim | Result |
|---|---|
| encryptData and decryptData exist in helper/tokenGenerator.js and use AES via CryptoJS | ✅ |
| Tokens use JWT-then-AES dual encryption (in tokenGenerator.js) | ✅ |
| Key source: JWT_SECRET_KEY env var | ✅ |
| AES mode is CBC with random IV embedded in ciphertext (CryptoJS default) | ✅ (clarification: CryptoJS defaults to AES-256-CBC, not ECB) |
| Dual auth paths (middlewares/auth.js + methods.js) are independent implementations | ✅ |
| middlewares/auth.js returns 401 + plain string "unauthorized access" | ✅ |
| methods.js returns 403 + sendStatus (no body) | ✅ |
| OAuth flow uses Passport's standard authenticate callback | ✅ |
| tMemberAuth columns match doc | ❌ CF-3 |
| express-session hardcoded secret on server.js:13 | ✅ |
| Token header read from TOKEN_HEADER_KEY env var | 🔧 only middlewares/auth.js does this; methods.js hardcodes "authorization" |
| /auth/login produces an encrypted token | ❌ CF-1 |
Other findings worth noting:
- OAuth callback social.js:32–38 uses encryptData(userData, '10m') to wrap the token before the redirect, confirming the encryption mechanism IS applied on OAuth flows but NOT on first-party login.
- The DB write that creates the tMemberAuth row appears to happen elsewhere (not in the visible callback handler); worth tracing if the team needs to debug OAuth onboarding issues.
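The header-name drift between the two auth paths (the TOKEN_HEADER_KEY row in the table above) could be closed with one shared lookup. The sketch below is an assumption about how that might look: `readAuthToken` is a hypothetical helper, and the fallback to "authorization" mirrors what methods.js hardcodes today.

```javascript
// Hypothetical shared helper: both auth paths would call this instead of
// reading the header name in two different ways.
function readAuthToken(req, env = process.env) {
  // Node lowercases incoming header names, so normalise the configured key too.
  const headerName = (env.TOKEN_HEADER_KEY || 'authorization').toLowerCase();
  return req.headers[headerName] || null;
}
```

With this in place, changing TOKEN_HEADER_KEY would affect both paths at once instead of only middlewares/auth.js.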
B. AI Subsystems
| Claim | Result |
|---|---|
| 4 agents (Conversation, InputParser, Research, Profile) are imported in routes/routes.js | ✅ |
| ConversationAgent instantiates InputParserAgent internally (conversationAgent.js:2,32) | ✅ |
| Sequential orchestration (Conversation → Parser → Research → Profile) | 🔧 not strictly sequential; orchestrated as session-managed conditional flow |
| All 4 agents use gemini-2.5-flash | ✅ |
| ResearchAgent uses tools: [{ googleSearch: {} }] | ✅ |
| Streaming via Socket.IO | ❌ CF-2 (uses SSE) |
| brand → objective → recom dependency chain enforced by status flags | ✅ |
| job_brand_positioning_ai.js uses Gemini via getGeminiResult | ✅ |
| job_recom_subjects_ai.js uses LlamaFunction({ id: 45, ... }) | ✅ |
| job_translate_and_rephrase.js enqueues RGEN follow-up | ✅ |
| tJobs status state machine 0→1→2/3/4 | ✅ |
| Doc-claimed type codes match reality | 🔧 see below |
| helper/ragProcess.js uses Bedrock Cohere cohere.embed-multilingual-v3 | ✅ |
| Cloud RAG corpus ID source | 🔧 hardcoded literal in callers, not a DB column |
| getAiParametersFunction reads from tLanguageModels + substitutes | ✅ |
Type-code drift (significant):
The documented and actual type-code lists diverge:
| In docs but NOT in code | In code but NOT in docs |
|---|---|
| DPGEN, UMGEN, MCC, TCH, FC, RTCON | BGEN, CLKB, GEML, HPB, PIC, TCC |
Actual code-used type codes: BGEN, CGEN, CLKB, DGEN, GEML, HPB, PIC, RGEN, TCC, TCON, TIV, TSB, UCL, UGEN (14 codes).
Doc-claimed type codes: CGEN, UGEN, DGEN, DPGEN, UMGEN, TIV, MCC, TCH, UCL, TSB, FC, TCON, RTCON, RGEN (14 codes; only 8 overlap with reality).
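The overlap figures can be reproduced mechanically from the two lists. The sketch below does so with a plain set difference; `diffCodes` is a hypothetical helper, and the arrays are copied from the lists in this section.

```javascript
// Type codes as claimed by the docs and as found in code (copied from this report).
const docCodes = ['CGEN', 'UGEN', 'DGEN', 'DPGEN', 'UMGEN', 'TIV', 'MCC',
  'TCH', 'UCL', 'TSB', 'FC', 'TCON', 'RTCON', 'RGEN'];
const codeCodes = ['BGEN', 'CGEN', 'CLKB', 'DGEN', 'GEML', 'HPB', 'PIC',
  'RGEN', 'TCC', 'TCON', 'TIV', 'TSB', 'UCL', 'UGEN'];

// Split two code lists into doc-only, code-only, and shared entries.
function diffCodes(documented, actual) {
  const doc = new Set(documented);
  const act = new Set(actual);
  return {
    docOnly: documented.filter((code) => !act.has(code)),
    codeOnly: actual.filter((code) => !doc.has(code)),
    shared: documented.filter((code) => act.has(code)),
  };
}

const drift = diffCodes(docCodes, codeCodes);
console.log(drift.docOnly.length, drift.codeOnly.length, drift.shared.length);
// prints: 6 6 8
```

Re-running the same comparison after the docs are updated should leave docOnly and codeOnly empty.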
Recommendation: update content-pipeline.md § Job Type Codes and jobs-inventory.md § Job Type Codes from the actual codes. The doc-claimed codes that aren't in code may be (a) historical and since removed, (b) an invention of my own earlier audit based on file naming, or (c) defined in a part of the code I didn't inspect. To settle which codes are actively used in the running DB, run SELECT type, COUNT(*) AS n FROM tJobs GROUP BY type ORDER BY n DESC.
Cloud RAG corpus ID: the agent confirmed that the corpus ID is hardcoded as a literal in callers such as generate_template_industry.js.
This resolves the open question from verification-report.md §C.2 (the tAccounts.corpusName reference). It's not a DB column at all — it's a hardcoded literal. This is itself a finding: changing the corpus requires editing every caller that uses it, similar to the Polotno-key problem. Should be moved to .env (VERTEX_RAG_CORPUS).
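A minimal sketch of the suggested move to .env follows. `getRagCorpusId` is a hypothetical helper; only the variable name VERTEX_RAG_CORPUS comes from the recommendation above, and loading the .env file into process.env is assumed to happen elsewhere.

```javascript
// Hypothetical single source of truth for the corpus ID.
// Callers would import this instead of embedding the literal.
function getRagCorpusId(env = process.env) {
  const corpusId = (env.VERTEX_RAG_CORPUS || '').trim();
  if (!corpusId) {
    // Fail loudly at startup instead of silently querying the wrong corpus.
    throw new Error('VERTEX_RAG_CORPUS is not set');
  }
  return corpusId;
}
```

Changing the corpus then becomes a one-line config change rather than an edit to every caller.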
C. Operational Subsystems
| Claim | Result |
|---|---|
| Email producers INSERT with status (numeric 0) | ❌ CF-4 |
| tEmailSchedule columns match doc | 🔧 missing Updated_time, Sent_time from doc |
| job_send_mail.js runs every 60 seconds | ✅ (*/1 * * * *) |
| job_send_mail.js polls ALL pending emails | ❌ CF-5 |
| Slack token + channel hardcoded in multiple files | ✅ |
| Slack failures silently dropped, no retry | ✅ |
| helper/postValidation.js checkVisibity flow | 🔧 see below |
| Polotno hardcoded key in 49 files | ✅ (97 occurrences; agent counted occurrences vs files; both metrics are correct, and the doc says files) |
| 22 of 97 createInstance sites lack a close call | 🔧 see below |
| Dashboard mounted at /auth in production | ✅ |
| Dashboard standalone on port 6001 | ✅ |
| Dashboard cron job schedules | 🔧 job_post_insights.js is wrong |
| Dashboard has 22 endpoints | ✅ |
| Socket.IO socketConnection global | ✅ |
| Only update event handled, broadcast to all | ✅ |
| emitDataToClient defined in actions.js, never called | ✅ |
postValidation.js checkVisibity flow:
The agent confirmed the documented sequence is mostly correct, with two clarifications:
- Step 5 "Download JSON again" is present in code but not in the doc — a redundant fetch worth flagging
- Step 11 instance.close() is not called in checkVisibity — confirms this function leaks the Polotno instance
This is itself a leak site to add to the count. The doc said "close instance"; reality is the helper function leaks.
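The usual remedy for this kind of leak is a try/finally wrapper that closes the instance on every exit path. The sketch below is generic: `withInstance` is a hypothetical helper, and `create` stands for whatever factory the call site uses today. It only assumes the created object exposes a close() method, as the audited code's instance.close() calls imply.

```javascript
// Hypothetical wrapper: create an instance, run the work, always close it.
async function withInstance(create, work) {
  const instance = await create();
  try {
    return await work(instance);
  } finally {
    // Runs on success and on throw, so no exit path skips the close.
    await instance.close();
  }
}
```

A call site such as checkVisibity could then wrap its body in withInstance, passing a factory like `() => createInstance({ key })`, which removes the need to remember a close on each return or error branch.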
Polotno leak count clarification:
I previously claimed "22 leak sites" based on 97 createInstance - 75 close = 22. The agent counted differently: "files with createInstance but no close in same file" = 31. Both metrics tell a similar story (~22-31 leak locations) but the methodology differs. The 22 figure is the call-site delta; the 31 figure is the file-level non-coverage. Both are reasonable signals; the docs should pick one and explain.
Dashboard job_post_insights.js schedule:
- Documented: every 30 seconds
- Actual: "0 */12 * * *" = every 12 hours at the hour boundary

This is a substantial scheduling drift. Either:
- (a) The doc is wrong and the cron is intentionally rare
- (b) The cron was changed without updating the doc
- (c) There's a real operational issue (insights aren't being captured at the documented frequency)
Recommendation: investigate which is the case. Update dashboard-analytics.md to match reality, or fix the cron if the change was unintentional.
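To put a number on the drift, the sketch below counts how many times per day a schedule fires. It is deliberately minimal: `fieldMatches` and `firesPerDay` are hypothetical helpers that read only the minute and hour fields, support just `*`, `*/n`, and single numbers, and assume the remaining fields are `*`. It is not a general cron parser.

```javascript
// Does one cron field ("*", "*/n", or a single number) match the given value?
function fieldMatches(field, value) {
  if (field === '*') return true;
  if (field.startsWith('*/')) return value % Number(field.slice(2)) === 0;
  return Number(field) === value;
}

// Count fires in one day for a 5-field expression, using minute and hour only.
function firesPerDay(expression) {
  const [minute, hour] = expression.trim().split(/\s+/);
  let count = 0;
  for (let h = 0; h < 24; h += 1) {
    for (let m = 0; m < 60; m += 1) {
      if (fieldMatches(minute, m) && fieldMatches(hour, h)) count += 1;
    }
  }
  return count;
}

console.log(firesPerDay('0 */12 * * *')); // prints: 2
console.log(firesPerDay('*/1 * * * *'));  // prints: 1440
```

The documented 30-second interval would mean 2,880 runs per day, against the 2 runs per day the actual expression produces.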
D. Deployment & Architecture
| Claim | Result |
|---|---|
| Three deployment paths (GHA / Jenkins / push.sh) | ✅ |
| Dockerfile has no yarn install step | ✅ |
| GitHub Actions does not run yarn install | ✅ |
| Jenkinsfile DOES run yarn install --frozen-lockfile | ✅ |
| push.sh hardcodes /home/nisanth/someli-api | ✅ |
| nginx.conf server name uapi.someli.ai _, port 3000, 100M body limit | ✅ |
| Express body parser at 150 MB | ✅ (mismatch with nginx's 100M confirmed) |
| Middleware stack order matches doc | ✅ |
| Helmet and rate-limiting absent from middleware chain | ✅ |
| App = { db, server, socket }, exposed via module.exports.appData | ✅ |
| /health and /db-health endpoints exist | ✅ |
| /health returns { status: 'Server is running', port, environment, timestamp } | 🔧 doc said status: 'ok'; actual is the string 'Server is running' |
| getSuccessResponse / getErrorResponse shapes match helper/helper.js | ✅ |
| No global Express error handler | ✅ |
| actions/actions.js CRUD methods signal errors via cb(null) | ✅ |
| winston in package.json but never imported | 🔧 imported once in helper/functionsForAi/cloudRag.js; effectively unused but not zero callers |
Other findings:
- Healthcheck /health returns status: 'Server is running' (string), not 'ok'. Minor doc fix.
- Winston's single import in cloudRag.js is the one place it's used; worth noting this isolated usage in logging-observability.md so the picture is complete.
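Drift like the /health status string is cheap to catch with a small shape check. The sketch below is one possible form. `healthShapeProblems` is a hypothetical helper; the expected keys and status string come from the table above, and the value types of port, environment, and timestamp are left unchecked because this pass did not verify them.

```javascript
// Keys the /health body is reported to contain.
const EXPECTED_HEALTH_KEYS = ['status', 'port', 'environment', 'timestamp'];

// Return a list of human-readable problems; an empty list means the shape matches.
function healthShapeProblems(body) {
  const problems = EXPECTED_HEALTH_KEYS
    .filter((key) => !(key in body))
    .map((key) => `missing key: ${key}`);
  if (body.status !== 'Server is running') {
    problems.push(`unexpected status: ${body.status}`);
  }
  return problems;
}
```

Running a check like this against the live endpoint whenever the docs are regenerated would flag the next change to the payload.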
Already-Flagged Items Now Resolved
The control-flow audit answered two open questions from prior reports:
| Question | Resolution |
|---|---|
| Where is the RAG corpus ID actually stored? (verification-report.md §C.2) | Hardcoded as a literal in caller files (e.g., generate_template_industry.js). Not in any DB column. Should be moved to .env. |
| Which job-type codes are actually used? (data-model.md §1 audit) | The 14-code list above (BGEN, CGEN, CLKB, DGEN, GEML, HPB, PIC, RGEN, TCC, TCON, TIV, TSB, UCL, UGEN) is the code-side truth. Run SELECT DISTINCT type FROM tJobs to confirm against the running DB. |
Recommended Doc Fixes
In priority order:
| # | Doc | Change |
|---|---|---|
| 1 | authentication.md § Token Lifecycle | Add explicit note: first-party /auth/login returns a plaintext concatenation token; OAuth callbacks return AES-encrypted tokens. The two flows are not consistent. |
| 2 | security.md § Authentication / Hard Truths | Add the plaintext-login-token finding |
| 3 | agents-and-ai.md § Streaming sections | Replace Socket.IO references with SSE |
| 4 | data-model.md §5.1 tMemberAuth | Rewrite from actual schema; flag tokens-in-JSON as anti-pattern |
| 5 | data-model.md §5.13 tEmailSchedule + notifications.md § Email queue | Status is string-based (Inserted/Delivered/Pending), not numeric; add Updated_time, Sent_time columns; flag the single-template filter in job_send_mail.js |
| 6 | notifications.md § Email flow | Note the single-template filter in job_send_mail.js; flag as an open question |
| 7 | content-pipeline.md § Job Type Codes + jobs-inventory.md § Job Type Codes | Replace the doc-claimed list with the actual 14 codes |
| 8 | rag-pipeline.md § Cloud RAG | Note the corpus ID is hardcoded in callers, not a DB column. Recommend moving to .env. |
| 9 | dashboard-analytics.md § Cron services | Fix the job_post_insights.js schedule (every 12h at hour, not every 30s) |
| 10 | media-processing.md + Integration-inventory.md §16 | Note that helper/postValidation.js:checkVisibity is itself a Polotno leak site |
| 11 | agents-and-ai.md § Onboarding flow | Soften "sequential pipeline" → "session-managed conditional flow" |
| 12 | authentication.md § Token validation | Note methods.js hardcodes "authorization" header rather than reading TOKEN_HEADER_KEY |
| 13 | logging-observability.md § Logging | Correct: Winston has one caller (cloudRag.js), not zero |
| 14 | architecture-overview.md § Healthcheck | /health returns status: 'Server is running', not 'ok' |
The first 6 are the substantive corrections. 7–14 are minor accuracy fixes.
What This Audit Did NOT Cover
In the interest of finishing this pass:
- Per-endpoint auth role declarations — there are 728 endpoints; the audit didn't validate that auth posture matches what the docs claim per route family.
- The tCloudKnowledgeBase table: flagged in earlier audits as a likely RAG-corpus location, but with the corpus-ID-hardcoded finding now confirmed, this table's role remains unclear. Worth a focused investigation.
- Browser / Socket.IO client side: we don't have access to the web client repo, so claims about "the frontend uses Socket.IO" can't be verified end-to-end.
- Production runtime — only static code analysis was done. Whether the cron schedules listed actually fire as expected, whether the queue depth is what we'd expect, etc., need observability data, not code reading.
Recommendation
The 5 critical findings (CF-1 through CF-5) should be:
- Reflected in the relevant docs immediately
- Triaged as engineering issues:
  - CF-1 (plaintext login token): a security finding. Should be on the security backlog.
  - CF-2 (SSE not Socket.IO): purely a doc fix.
  - CF-3 (tMemberAuth schema): purely a doc fix.
  - CF-4 (string status): purely a doc fix; the string-based status is not technically wrong, just not what's documented.
  - CF-5 (single-template filter): needs investigation. Is it intentional or an operational bug? Either way the doc needs to reflect reality.
I can apply doc fixes #1–#14 from §Recommended Doc Fixes if you'd like — same as the prior pass. Some of the changes (CF-3 specifically — rewriting tMemberAuth) require reading the actual schema for the new column list, but otherwise these are all surgical edits.
Want me to apply the fixes?