Observability

Effectively none, as in the sibling backends.

What exists

console.log / console.error everywhere
One job logs "Starting Job..." with new Date() to mark cron ticks (job_check_color_with_json.js)
Slack notifications for missing content (teamsnotification.js)

What does not exist

No structured logging
No error tracker (Sentry, Bugsnag)
No tracing / metrics
No /health or /ready endpoint
No DB connection pool metrics
No OpenAI usage tracking (cost visibility)
No Polotno render duration tracking

Specific concerns

57 cron jobs, each calling console.log("\n Starting Job... ", new Date()), add up to a noisy log stream. Because the lines carry no structure, finding the right job's failure requires grepping by filename.
No correlation between routes and jobs. When a user-reported frontend issue turns out to be a stale-content problem, diagnosing it requires reading both the routes/routes.js logs and several job_*.js logs and piecing the sequence together by hand.
OpenAI cost is invisible. The bots burn OpenAI tokens hourly, and there is no per-bot cost reporting. Adding a tApiCall log table would close this gap, and it would pay for itself within a month if it surfaces even one bot making wasteful calls.
Polotno render duration is invisible. If renders slow down (a Chrome memory leak, a slow network path to S3), no one notices until the queue backs up.
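Both blind spots come down to two calls that nobody measures. The sketch below uses only Node's standard library and is illustrative, not existing code: timed is a hypothetical wrapper that could surround the render call (for example timed('polotno.render', () => instance.render(...))) and warns past a threshold, defaulting to the 10 s suggested on this page. trackedOpenAICall, recordCall and the row's column names are also hypothetical; the sketch assumes the bots use the OpenAI chat-completions API, whose responses include a usage object with prompt_tokens, completion_tokens and total_tokens.

```javascript
'use strict';
const { performance } = require('node:perf_hooks');

// Hypothetical helper: times any async call and logs one structured line,
// whether the call resolves or throws. Lines above warnMs go to log.warn.
async function timed(label, fn, { warnMs = 10_000, log = console } = {}) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    const durationMs = Math.round(performance.now() - start);
    if (durationMs > warnMs) log.warn({ label, durationMs });
    else log.info({ label, durationMs });
  }
}

// Hypothetical tApiCall tracker. Assumes an OpenAI chat-completions response
// with a `usage` object; the row's column names are illustrative only.
// recordCall stands in for an INSERT into tApiCall; without it the row is logged.
async function trackedOpenAICall(bot, model, call, { recordCall, log = console } = {}) {
  const start = performance.now();
  const response = await call();
  const usage = (response && response.usage) || {};
  const row = {
    bot,
    model,
    promptTokens: usage.prompt_tokens ?? 0,
    completionTokens: usage.completion_tokens ?? 0,
    totalTokens: usage.total_tokens ?? 0,
    durationMs: Math.round(performance.now() - start),
    calledAt: new Date().toISOString(),
  };
  if (recordCall) await recordCall(row);
  else log.info({ event: 'openai_call', ...row });
  return response;
}
```

As written, a failed OpenAI call throws before any row is recorded; a fuller version would also log failures so that retries show up in the cost picture.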

Minimal improvements

In rough order of cost, cheapest first:

Add /health endpoint (1-liner)
Add morgan access log to server.js (1 line)
Add structured logging via pino — replace console.log with logger.info({...})
Add request-id middleware for log correlation
Tag every job's log lines with [<job-name>] prefix — minimal change, big readability win
Add tJobRun audit table — every cron tick inserts a row with start, end, rows-processed, errors
Track Polotno render duration — wrap instance.render(...) in a timer; warn on > 10s
Track OpenAI token usage — every OpenAI call inserts a row in tApiCall
Sentry integration for unhandled errors
Prometheus metrics exposed at /metrics (if a Grafana board exists)
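The three cheapest HTTP-side items (the /health endpoint, one-JSON-object-per-line logging, and request-id middleware) fit in a few dozen lines. The sketch below is stdlib-only and assumes server.js mounts Express-style (req, res, next) middleware; requestId, logInfo, health, requestContext and the x-request-id header name are hypothetical choices, and logInfo is only a stand-in until pino replaces it.

```javascript
'use strict';
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

// Holds per-request context (here just the request id) across async hops,
// so log calls deep inside a route can still find the id.
const requestContext = new AsyncLocalStorage();

// Express-style middleware: reuse an incoming x-request-id or mint a new one,
// echo it on the response, and run the rest of the chain inside that context.
function requestId(req, res, next) {
  const incoming = req.headers['x-request-id'];
  const id = typeof incoming === 'string' && incoming.length > 0 ? incoming : randomUUID();
  res.setHeader('x-request-id', id);
  requestContext.run({ requestId: id }, next);
}

// Stand-in structured logger: one JSON object per line, tagged with the
// current request id when called inside a request.
function logInfo(fields, write = (s) => process.stdout.write(s)) {
  const store = requestContext.getStore();
  const line = { level: 'info', time: new Date().toISOString(), ...fields };
  if (store) line.requestId = store.requestId;
  write(JSON.stringify(line) + '\n');
}

// Liveness handler for GET /health. It deliberately skips the database,
// so it reports process health only; a /ready check would add a DB ping.
function health(req, res) {
  res.status(200).json({ status: 'ok', uptimeSec: Math.round(process.uptime()) });
}
```

Wiring would look roughly like app.use(requestId) registered before the routes, plus app.get('/health', health). If the same id is passed along when a route enqueues work for a job, the job's log lines can carry it too, which is what would let the routes.js and job_*.js logs be joined.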

Recommendations

Same as sibling backends. The biggest leverage here is tagging logs with job names and adding a tJobRun audit table — both are cheap and immediately useful for the content team.
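Both recommendations can share one wrapper around each cron tick. The sketch below assumes each job_*.js can move its tick body into an async function that resolves to a row count; runJob, jobLogger, recordRun and the row's fields are hypothetical, and recordRun stands in for whatever INSERT into tJobRun the real table would need.

```javascript
'use strict';

// Console-like helpers that prefix every line with [jobName],
// replacing the bare "Starting Job..." lines.
function jobLogger(jobName, sink = console) {
  const tag = `[${jobName}]`;
  return {
    info: (...args) => sink.log(tag, ...args),
    error: (...args) => sink.error(tag, ...args),
  };
}

// Wraps one cron tick: tags its logs, times it, and hands a tJobRun-style row
// (job, startedAt, endedAt, rowsProcessed, error) to recordRun.
async function runJob(jobName, work, { recordRun, sink = console } = {}) {
  const log = jobLogger(jobName, sink);
  const row = { job: jobName, startedAt: new Date(), endedAt: null, rowsProcessed: 0, error: null };
  log.info('Starting job', row.startedAt.toISOString());
  try {
    // `work` gets the tagged logger and resolves to the number of rows it touched.
    row.rowsProcessed = (await work(log)) ?? 0;
  } catch (err) {
    // Record the failure instead of rethrowing, so one bad tick does not
    // take down the process that schedules the other jobs.
    row.error = err instanceof Error ? err.message : String(err);
    log.error('Job failed:', row.error);
  } finally {
    row.endedAt = new Date();
    log.info('Finished job', { rowsProcessed: row.rowsProcessed, durationMs: row.endedAt - row.startedAt });
  }
  if (recordRun) await recordRun(row);
  return row;
}
```

Swallowing the error is a design choice for unattended cron ticks; since the failure lands in tJobRun, the content team can query for rows where error is not null rather than waiting for an alert.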

Cross-component implication

Whatever observability stack is adopted platform-wide should cover a mix of Node.js services, PM2-managed processes, and bots/jobs. Sentry, OpenTelemetry, and Pino all support this. The pattern would be:

Pick the stack (recommend Sentry for errors + Pino for logs)
Implement in one of the smaller services first (e.g., someli-dashboard-be or Someli-admin-api)
Roll out to designer-api next (high benefit because of the cron-job sprawl)
Roll out to someli-api last (biggest blast radius)