
Observability

Effectively none, same as sibling backends.

What exists

  • console.log / console.error everywhere
  • One job logs "Starting Job..." with new Date() to mark cron ticks (job_check_color_with_json.js)
  • Slack notifications for missing content (teamsnotification.js)

What does not exist

  • No structured logging
  • No error tracker (Sentry, Bugsnag)
  • No tracing / metrics
  • No /health / /ready endpoint
  • No DB connection pool metrics
  • No OpenAI usage tracking (cost visibility)
  • No Polotno render duration tracking

Specific concerns

  1. 57 cron jobs each logging console.log("\n Starting Job... ", new Date()) make for a noisy log stream. With no structure, finding a specific job's failure requires grepping by filename.
  2. No correlation between routes and jobs. When a user-reported frontend issue turns out to be a stale-content problem, piecing it together means reading the routes/routes.js logs alongside several job_*.js logs.
  3. OpenAI cost is invisible. The bots consume OpenAI tokens every hour, and there is no per-bot cost reporting. A tApiCall log table would close this gap, and it would likely pay for itself within a month if it surfaces even one bot making wasteful calls.
  4. Polotno render duration is invisible. If renders slow down (a Chrome memory leak, slow network transfers to S3), no one notices until the queue backs up.
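A minimal sketch of the row a tApiCall table could hold for concern 3, assuming the bots call the Chat Completions endpoint, whose responses include a usage object with prompt_tokens, completion_tokens, and total_tokens. The column names, the buildApiCallRow helper, the "example-model" key, and the per-1K-token prices are hypothetical placeholders; real prices have to come from OpenAI's current pricing.

```javascript
// Hypothetical per-1K-token prices in USD, keyed by model name.
// Replace with the current prices for the models the bots actually use.
const PRICE_PER_1K = {
  "example-model": { prompt: 0.001, completion: 0.002 },
};

/**
 * Build one tApiCall row from a Chat Completions response body.
 * Column names are hypothetical; adapt them to the real table.
 * startedAt / endedAt are epoch milliseconds (e.g. Date.now()).
 */
function buildApiCallRow(botName, response, startedAt, endedAt) {
  const usage = response.usage || {};
  const promptTokens = usage.prompt_tokens || 0;
  const completionTokens = usage.completion_tokens || 0;
  const price = PRICE_PER_1K[response.model];

  // Models without a known price get a null cost instead of a guessed one.
  const costUsd = price
    ? (promptTokens / 1000) * price.prompt +
      (completionTokens / 1000) * price.completion
    : null;

  return {
    bot_name: botName,
    model: response.model,
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    total_tokens: usage.total_tokens ?? promptTokens + completionTokens,
    cost_usd: costUsd,
    duration_ms: endedAt - startedAt,
    created_at: new Date(endedAt).toISOString(),
  };
}
```

In a bot, the row would be built right after each OpenAI call and written with whatever database helper the service already uses. With the v3 openai Node SDK the response body (including usage) sits under response.data; with v4 it is the returned object itself.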

Minimal improvements

In rough order of increasing cost:

  1. Add /health endpoint (1-liner)
  2. Add morgan access log to server.js (1 line)
  3. Add structured logging via pino — replace console.log with logger.info({...})
  4. Add request-id middleware for log correlation
  5. Tag every job's log lines with [<job-name>] prefix — minimal change, big readability win
  6. Add tJobRun audit table — every cron tick inserts a row with start, end, rows-processed, errors
  7. Track Polotno render duration — wrap instance.render(...) in a timer; warn on > 10s
  8. Track OpenAI token usage — every OpenAI call inserts a row in tApiCall
  9. Sentry integration for unhandled errors
  10. Prometheus metrics exposed at /metrics (if a Grafana board exists)
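For item 4, a minimal sketch of request-id middleware that uses only Node built-ins (crypto.randomUUID, AsyncLocalStorage) and the Express (req, res, next) middleware signature. The x-request-id header, the 128-character cap, and the requestId and log names are assumptions; in practice log would be replaced by the pino logger from item 3.

```javascript
const { randomUUID } = require("node:crypto");
const { AsyncLocalStorage } = require("node:async_hooks");

// Per-request context: code running inside a request (including awaited
// calls) can read the id without it being passed around as an argument.
const requestContext = new AsyncLocalStorage();

// Express-style middleware: reuse a sane incoming X-Request-Id (e.g. set by
// a proxy) or mint a new UUID, echo it back, and run the rest of the chain
// inside this request's context.
function requestId(req, res, next) {
  const incoming = req.headers["x-request-id"]; // Node lower-cases header names
  const id =
    typeof incoming === "string" && incoming.length > 0 && incoming.length <= 128
      ? incoming
      : randomUUID();
  req.id = id;
  res.setHeader("x-request-id", id);
  requestContext.run({ requestId: id }, next);
}

// Hypothetical stand-in for a structured logger: prints one JSON line and
// attaches the current request id when called inside a request.
function log(msg, fields = {}) {
  const store = requestContext.getStore();
  const entry = {
    ...fields,
    msg,
    requestId: store ? store.requestId : undefined, // omitted by JSON.stringify
    time: new Date().toISOString(),
  };
  console.log(JSON.stringify(entry));
  return entry;
}
```

Registered with app.use(requestId) ahead of the routes, the same id can also appear in the morgan access log from item 2 via morgan.token("id", (req) => req.id) and a format string containing :id.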
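For item 7, a sketch of a duration wrapper, assuming instance.render(...) returns a promise. timeRender, its report hook, and the log fields are hypothetical; the 10 s threshold is the one from the list. Timing happens in a finally block, so failed renders are measured as well as successful ones.

```javascript
const { performance } = require("node:perf_hooks");

const RENDER_WARN_MS = 10_000; // warn threshold from the improvement list

// Default reporter: one JSON line, routed to stderr when the render is slow.
function defaultReport(entry) {
  const line = JSON.stringify(entry);
  if (entry.slow) console.warn(line);
  else console.log(line);
}

// Await renderFn(), pass its result through unchanged, and report how long
// it took. Errors are rethrown after the duration has been reported.
async function timeRender(label, renderFn, { warnMs = RENDER_WARN_MS, report = defaultReport } = {}) {
  const start = performance.now();
  let ok = false;
  try {
    const result = await renderFn();
    ok = true;
    return result;
  } finally {
    const durationMs = Math.round(performance.now() - start);
    report({ msg: "render finished", label, durationMs, ok, slow: durationMs > warnMs });
  }
}
```

At the call site only the wrapping changes, e.g. await timeRender(templateId, () => instance.render(...)) with the existing render arguments left as they are (templateId is a hypothetical label). The report hook is where a Prometheus histogram from item 10 could later be fed.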

Recommendations

Same as sibling backends. The biggest leverage here is tagging logs with job names and adding a tJobRun audit table — both are cheap and immediately useful for the content team.
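A minimal sketch of both ideas together: a wrapper that hands a job body a logger prefixed with [<job-name>] and produces one tJobRun record per tick with start, end, rows processed, and error. withJobRun, the saveRun hook (standing in for an INSERT into tJobRun), the column names, and the convention that the job body returns its row count are all assumptions.

```javascript
/**
 * Wrap a cron job body so that every log line it emits is tagged with
 * [jobName] and each tick yields one tJobRun-style record.
 * saveRun is a hypothetical persistence hook (e.g. an INSERT into tJobRun).
 */
function withJobRun(jobName, jobFn, saveRun) {
  const tag = `[${jobName}]`;
  const log = {
    info: (...args) => console.log(tag, ...args),
    error: (...args) => console.error(tag, ...args),
  };

  return async function runTick() {
    const run = {
      job_name: jobName,
      started_at: new Date(),
      ended_at: null,
      rows_processed: 0,
      error: null,
    };
    log.info("Starting job", run.started_at.toISOString());
    try {
      // The job body returns how many rows it touched; anything else counts as 0.
      const rows = await jobFn(log);
      run.rows_processed = Number.isFinite(rows) ? rows : 0;
    } catch (err) {
      // Record the failure instead of rethrowing, so the scheduler keeps running.
      run.error = err instanceof Error ? err.message : String(err);
      log.error("Job failed:", err);
    } finally {
      run.ended_at = new Date();
      log.info("Finished job", {
        ms: run.ended_at - run.started_at,
        rows: run.rows_processed,
        error: run.error,
      });
      await saveRun(run);
    }
    return run;
  };
}
```

Each job_*.js file would swap its console.log calls for the log argument and return a row count. If the jobs are scheduled with node-cron, registration would look like cron.schedule("0 * * * *", withJobRun("job_check_color_with_json", checkColors, saveRun)), where checkColors and saveRun are hypothetical.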

Cross-component implication

Whatever observability stack is adopted platform-wide should cover the mix of Node.js services, PM2-managed processes, and bots/jobs. Sentry, OpenTelemetry, and Pino all support this setup. The pattern would be:

  1. Pick the stack (recommend Sentry for errors + Pino for logs)
  2. Implement in one of the smaller services first (e.g., someli-dashboard-be or Someli-admin-api)
  3. Roll out to designer-api next (high benefit because of the cron-job sprawl)
  4. Roll out to someli-api last (biggest blast radius)