Error Handling¶

Policy in practice¶

The policy has the same shape as in the sibling services:

Each handler wraps its body in try { ... } catch (e) { console.log('error', e); /* return error envelope */ }
No next(err) propagation
No central Express error middleware
No retry on transient failures
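
For contrast, a central error path could look roughly like the sketch below. The names `asyncHandler` and `errorMiddleware` are hypothetical, and the envelope shape is an assumption for illustration rather than the one produced by `getErrorResponse`. Only the four-argument middleware signature is standard Express behaviour.

```javascript
// Hypothetical helper: forwards a rejection from an async handler to next(err),
// so individual handlers no longer need their own try/catch.
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Express treats a middleware with four arguments as an error handler.
// The envelope fields below are assumed, not taken from helper/index.js.
function errorMiddleware(err, req, res, next) {
  console.error('unhandled route error', err);
  if (res.headersSent) {
    return next(err); // response already started: defer to Express's default handler
  }
  res.status(err.status || 500).json({
    success: false,
    message: err.message || 'Internal error',
  });
}

// Usage, assuming an Express app object in server.js:
//   app.get('/designs', asyncHandler(async (req, res) => { /* ... */ }));
//   app.use(errorMiddleware); // registered after all routes
```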

Anti-patterns inherited from siblings¶

(All called out for Someli-admin-api, all apply here too.)

No next(err) → async errors must be caught per-handler
No retry / dead-letter for jobs and bots
No correlation id
"Something Went Wrong!" default message → not actionable for debugging
Mixed adoption of getSuccessResponse / getErrorResponse (in helper/index.js) vs hand-built envelopes

Specific to `designer-api`¶

Cron-tick error swallowing¶

Most jobs are structured as:

async function doWork() {
  if (!isOnProcess) {
    isOnProcess = true;
    try {
      let result = con.query(`SELECT ...`);
      result.forEach(async (row) => { ... });  // ← unawaited
      isOnProcess = false;
    } catch (e) {
      console.log("error", e);
      isOnProcess = false;
    }
  }
}

Two problems:

forEach ignores the promises returned by its async callback, so errors thrown inside the callback never reach the outer try/catch. They become unhandled promise rejections.
isOnProcess = false runs as soon as forEach returns, before the async work completes. The next cron tick can re-enter even though the previous work is still in flight.

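Fix: replace forEach(async ...) with for (const row of result) { await ... } and reset isOnProcess = false only after the loop has finished, ideally in a finally block so it is cleared whether the loop succeeds or throws.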
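
A minimal sketch of the corrected shape follows. `fetchRows` and `processRow` are hypothetical stand-ins for the job's query and per-row work, injected here only to keep the example self-contained. The inner try/catch is an optional design choice that lets one failing row be logged without aborting the rest of the batch.

```javascript
let isOnProcess = false;

// fetchRows and processRow are hypothetical placeholders for the job's own code.
async function doWork(fetchRows, processRow) {
  if (isOnProcess) return; // previous tick is still running
  isOnProcess = true;
  try {
    const rows = fetchRows();
    for (const row of rows) {
      try {
        await processRow(row); // a rejection now surfaces here
      } catch (e) {
        console.log('error', row, e); // log and continue with the next row
      }
    }
  } catch (e) {
    console.log('error', e); // e.g. the query itself failed
  } finally {
    isOnProcess = false; // released only after every row has settled
  }
}
```

Rows are processed one at a time in this form, which is slower than the accidental parallelism of forEach(async ...) but makes the guard flag meaningful.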

Polotno render errors¶

Polotno-node uses headless Chrome via puppeteer. If Chrome crashes or hangs, the render promise may never resolve. Jobs that don't add a timeout will hang forever on that row, blocking subsequent ticks.

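Fix: wrap renders in Promise.race([render, timeoutPromise]) with a 30-second timeout. Promise.race only stops waiting for the render; it does not cancel it, so the hung Chrome instance still has to be closed or the process restarted.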
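
One way to express the timeout is a small helper like the one below. `withTimeout` is a hypothetical name, and the `render(row)` call in the usage comment stands in for whatever render function the job already uses.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer in both outcomes so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a job (render is a placeholder name):
//   const image = await withTimeout(render(row), 30 * 1000, 'polotno render');
```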

OpenAI errors¶

Bots call OpenAI without retry. A 429 (rate limit) or 5xx is logged and the call is abandoned. The DB row may end up in an inconsistent state (status flipped to "in-progress" but never completed).

Fix: retry each call with exponential backoff, and mark the DB row "failed" if all retries are exhausted (so the next cron tick doesn't pick it up).
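
The retry could be sketched as below. `withRetry`, `isTransient`, `callOpenAI`, and `markRow` are hypothetical names, and the sketch assumes the thrown error exposes the HTTP status as `err.status`. The retry count and base delay are illustrative defaults, not values from the codebase.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the error object carries the HTTP status on `err.status`.
function isTransient(err) {
  return err.status === 429 || (err.status >= 500 && err.status < 600);
}

// Hypothetical helper: retries `fn` on transient errors, doubling the delay each time.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000 ms by default
    }
  }
}

// Usage inside a bot (callOpenAI and markRow are placeholders):
//   try {
//     const reply = await withRetry(() => callOpenAI(prompt));
//     markRow(row.id, 'done', reply);
//   } catch (e) {
//     markRow(row.id, 'failed'); // the next cron tick skips this row
//   }
```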

MySQL errors¶

sync-mysql throws on connection loss. The job's outer try/catch catches it, but the connection object stays in a broken state and subsequent calls fail.

Fix: re-create the sync-mysql connection on error, or use a connection pool.
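
The re-create-on-error option might look like the sketch below. `makeReconnectingQuery` is a hypothetical wrapper, and `createConnection` is assumed to be whatever factory the job uses to build its client. For simplicity the sketch rebuilds the connection after any thrown error, including plain SQL errors; a real implementation would probably inspect the error first.

```javascript
// Hypothetical wrapper around a synchronous client that exposes query(sql, params).
// `createConnection` is supplied by the caller, e.g. () => new MySql(config).
function makeReconnectingQuery(createConnection) {
  let con = createConnection();
  return function query(sql, params) {
    try {
      return con.query(sql, params);
    } catch (err) {
      // Assumption: any throw may have left the connection unusable,
      // so replace it before the next call and let this call fail.
      con = createConnection();
      throw err;
    }
  };
}
```

The failed call still throws, so the job's existing try/catch keeps working; only the following call benefits from the fresh connection.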

Recommendations¶

ID	Recommendation	Effort
E-1	Replace `forEach(async)` with a `for...of` loop that awaits each row in all 60+ workers	Medium (touches every job file)
E-2	Add Polotno render timeout + retry	Small
E-3	Add OpenAI retry with exponential backoff (move to `helper/openaiClient.js`)	Small
E-4	Add a `tJobFailure` table to capture errors instead of `console.log`	Medium
E-5	Add an Express error middleware in `server.js`	Trivial
E-6	Standardise on `getSuccessResponse` / `getErrorResponse` envelopes	Medium
E-7	Add a per-tick watchdog in cron-only worker files: arm `setTimeout(() => process.exit(1), 60*1000)` when a tick starts and clear it when the tick finishes, so a hung handler kills its own PM2 process (PM2 restarts it)	Small
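
For E-7, the watchdog has to be armed per tick and disarmed on completion; a bare timer set once at startup would exit the process after 60 seconds regardless of whether anything hung. A minimal sketch, with `runWithWatchdog` as a hypothetical name and the exit callback injectable so it can be exercised without killing the process:

```javascript
// Hypothetical helper: arms a timer before a tick and disarms it once the tick settles.
// `onTimeout` defaults to exiting, on the assumption that PM2 restarts the process.
async function runWithWatchdog(
  tick,
  ms = 60 * 1000,
  onTimeout = () => process.exit(1)
) {
  const timer = setTimeout(onTimeout, ms);
  try {
    return await tick();
  } finally {
    clearTimeout(timer); // a tick that finishes in time never trips the watchdog
  }
}

// Usage in a cron-only worker (doWork is the job's tick function):
//   await runWithWatchdog(() => doWork());
```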