Error Handling

Policy in practice

Error handling has the same shape as in the sibling APIs:

  • Handler try { ... } catch (e) { console.log('error', e); /* return error envelope */ }
  • No next(err) propagation
  • No central Express error middleware
  • No retry on transient failures

Anti-patterns inherited from siblings

(All of these were called out for Someli-admin-api, and all apply here too.)

  1. No next(err) → async errors must be caught per-handler
  2. No retry / dead-letter for jobs and bots
  3. No correlation id
  4. "Something Went Wrong!" default message → not actionable for debugging
  5. Mixed adoption of getSuccessResponse / getErrorResponse (in helper/index.js) vs hand-built envelopes
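
Items 1, 3 and 4 could be addressed together with a handler wrapper plus one central error middleware. The following is a minimal sketch, assuming Express 4 (where a rejected promise from an async handler is not forwarded to error middleware automatically). The names asyncHandler and errorMiddleware, the x-correlation-id header, and the envelope fields are all hypothetical and are not taken from helper/index.js.

```javascript
const crypto = require('node:crypto');

// Hypothetical wrapper: forwards both sync throws and async rejections to next(err).
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Hypothetical central error middleware. Express identifies error middleware
// by its four-argument signature, so `next` stays in the parameter list.
function errorMiddleware(err, req, res, next) {
  // Reuse the caller's id if one was sent, otherwise generate one.
  const correlationId = req.headers['x-correlation-id'] || crypto.randomUUID();
  console.error(`[${correlationId}] ${req.method} ${req.originalUrl}`, err);
  res.status(err.status || 500).json({
    success: false, // assumed envelope shape, for illustration only
    message: 'Request failed',
    correlationId, // lets a user report be matched to the log line
  });
}
```

Under these assumptions, routes would be registered as router.get('/path', asyncHandler(async (req, res) => { ... })), and app.use(errorMiddleware) would come after all routes in server.js.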

Specific to designer-api

Cron-tick error swallowing

Most jobs are structured as:

async function doWork() {
  if (!isOnProcess) {
    isOnProcess = true;
    try {
      let result = con.query(`SELECT ...`);
      result.forEach(async (row) => { ... });  // ← unawaited
      isOnProcess = false;
    } catch (e) {
      console.log("error", e);
      isOnProcess = false;
    }
  }
}

Two problems:

  1. forEach(async ...) does not await its callbacks, so errors thrown inside them never reach the outer try/catch. They become unhandled promise rejections, which terminate the process by default on Node.js 15 and later.
  2. isOnProcess = false runs as soon as forEach returns, before the async work completes. The next cron tick can re-enter even though the previous work is still in flight.

Fix: replace forEach(async ...) with for (const row of result) { await ... } and reset isOnProcess = false only after the loop has finished, ideally in a finally block so the flag is released even when the loop throws.
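
A minimal sketch of the corrected job shape follows. fetchRows and processRow are hypothetical stand-ins for the job's SELECT and its per-row work, and the per-row try/catch is an added design choice (so that one bad row does not abort the rest of the batch), not something the existing jobs do.

```javascript
let isOnProcess = false;

// `fetchRows` and `processRow` are hypothetical stand-ins for the job's
// SELECT query and its per-row async work.
async function doWork(fetchRows, processRow) {
  if (isOnProcess) return; // previous tick is still running
  isOnProcess = true;
  try {
    const rows = fetchRows();
    for (const row of rows) {
      try {
        await processRow(row); // awaited, so a rejection lands in this catch
      } catch (e) {
        console.log('error', row, e); // log and continue with the next row
      }
    }
  } catch (e) {
    console.log('error', e); // e.g. the query itself failed
  } finally {
    isOnProcess = false; // released only after every row has settled
  }
}
```

Rows are processed one at a time here. That is slower than the original fire-and-forget loop, but it is what makes the isOnProcess guard meaningful.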

Polotno render errors

Polotno-node uses headless Chrome via puppeteer. If Chrome crashes or hangs, the render promise may never resolve. A job that awaits the render without a timeout will hang forever on that row, and because isOnProcess is never reset, every subsequent tick is skipped.

Fix: wrap renders in Promise.race([render, timeoutPromise]) with a 30-second timeout.
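
One way to write that wrapper is sketched below. withTimeout is a hypothetical helper name, and the sketch only bounds how long the job waits: Promise.race does not cancel the losing promise, so a hung Chrome instance would still need to be closed or recycled by the caller.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer whichever side wins, so it does not keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A job would then call something like await withTimeout(render(row), 30 * 1000, 'polotno render'), where render stands for whatever function the job already uses to produce the image.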

OpenAI errors

Bots call OpenAI without retry. A 429 (rate limit) or 5xx is logged and the call is abandoned. The DB row may end up in an inconsistent state (status flipped to "in-progress" but never completed).

Fix: per-call retry with exponential backoff; mark the DB row "failed" if all retries are exhausted (so the next cron tick doesn't pick it up).
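
The retry part could look roughly like the sketch below. withRetry and isTransient are hypothetical names, and the sketch assumes the thrown error exposes the HTTP status as err.status; that depends on which client the bots use and should be checked against it.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the client's error object carries the HTTP status as `err.status`.
function isTransient(err) {
  return err.status === 429 || (err.status >= 500 && err.status < 600);
}

// Hypothetical helper: retries `fn` on transient errors with exponential backoff.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (attempt >= retries || !isTransient(err)) throw err;
      // baseDelayMs, 2x, 4x, ... plus random jitter of up to one base delay.
      await sleep(baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs);
    }
  }
}
```

The caller would wrap the OpenAI request in withRetry and, in its own catch block, set the row's status to "failed" so that the row is not left "in-progress".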

MySQL errors

sync-mysql throws on connection loss. The job's outer try/catch catches it but the connection object stays in a broken state. Subsequent calls fail.

Fix: re-create the sync-mysql connection on error, or use a connection pool.
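
The re-create-on-error option can be sketched as a thin wrapper around query. makeReconnectingQuery and createConnection are hypothetical names; with sync-mysql the factory might be () => new MySql({ host, user, password, database }). The sketch retries once on any error and does not distinguish connection loss from SQL errors, so a real version would probably inspect the error before retrying, especially for writes.

```javascript
// `createConnection` is a hypothetical factory that returns an object
// with a synchronous query(sql, params) method.
function makeReconnectingQuery(createConnection) {
  let con = createConnection();
  return function query(sql, params) {
    try {
      return con.query(sql, params);
    } catch (firstError) {
      console.log('query failed, re-creating connection', firstError);
      con = createConnection(); // replace the broken connection object
      return con.query(sql, params); // single retry; a second failure propagates
    }
  };
}
```

Because the replacement connection is kept in the closure, later calls reuse it rather than reconnecting each time.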

Recommendations

| ID  | Recommendation | Effort |
|-----|----------------|--------|
| E-1 | Replace forEach(async) with for...of await in all 60+ workers | Medium (touches every job file) |
| E-2 | Add Polotno render timeout + retry | Small |
| E-3 | Add OpenAI retry with exponential backoff (move to helper/openaiClient.js) | Small |
| E-4 | Add a tJobFailure table to capture errors instead of console.log | Medium |
| E-5 | Add an Express error middleware in server.js | Trivial |
| E-6 | Standardise on getSuccessResponse / getErrorResponse envelopes | Medium |
| E-7 | Add setTimeout(() => process.exit(1), 60*1000) watchdog in cron-only worker files so a hung handler kills its own PM2 process (PM2 restarts it) | Small |
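
For E-7, if a worker file stays alive across ticks, the watchdog timer would need to be armed per tick and cleared when the tick finishes; otherwise the process would exit after 60 seconds whether or not anything hung. A minimal sketch under that assumption follows, with runWithWatchdog as a hypothetical name and onHang injectable so the behaviour can be tested without exiting.

```javascript
// Hypothetical wrapper: runs one cron tick under a watchdog timer.
// By default a hung tick exits the process so PM2 can restart the worker.
async function runWithWatchdog(
  tick,
  ms = 60 * 1000,
  onHang = () => process.exit(1)
) {
  const timer = setTimeout(onHang, ms);
  try {
    return await tick();
  } finally {
    clearTimeout(timer); // a tick that finishes in time disarms the watchdog
  }
}
```

The cron callback would then call runWithWatchdog(doWork) instead of doWork directly.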