Error Handling¶
Policy in practice¶
Same shape as siblings:
- Handler `try { ... } catch (e) { console.log('error', e); /* return error envelope */ }`
- No `next(err)` propagation
- No central Express error middleware
- No retry on transient failures
Anti-patterns inherited from siblings¶
(All called out for Someli-admin-api, all apply here too.)
- No `next(err)` → async errors must be caught per-handler
- No retry / dead-letter for jobs and bots
- No correlation id
- `"Something Went Wrong!"` default message → not actionable for debugging
- Mixed adoption of `getSuccessResponse` / `getErrorResponse` (in `helper/index.js`) vs hand-built envelopes
Specific to designer-api¶
Cron-tick error swallowing¶
Most jobs are structured as:

    async function doWork() {
      if (!isOnProcess) {
        isOnProcess = true;
        try {
          let result = con.query(`SELECT ...`);
          result.forEach(async (row) => { ... }); // ← unawaited
          isOnProcess = false;
        } catch (e) {
          console.log("error", e);
          isOnProcess = false;
        }
      }
    }
Two problems:
- The `forEach(async ...)` doesn't await, so errors inside the async callback don't reach the outer `try/catch`. They become unhandled promise rejections.
- The `isOnProcess = false` runs immediately after `forEach`, before the async work completes. The next cron tick can re-enter even though the previous work is still in flight.
Fix: replace `forEach(async ...)` with `for (const row of result) { await ... }` and reset `isOnProcess = false` only after the loop has finished, ideally in a `finally` block so it is also reset when a row throws.
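A minimal sketch of that fix, assuming the query and the per-row work are passed in as `fetchRows` and `processRow`. Both names are hypothetical stand-ins (for `con.query(...)` and the body of the old `forEach` callback), not functions from the codebase, and the boolean return value is only there to make the behaviour observable.

```javascript
// Guard flag: true while a tick is still working.
let isOnProcess = false;

// fetchRows and processRow are hypothetical stand-ins for the job's
// synchronous query and its async per-row work.
async function doWork(fetchRows, processRow) {
  if (isOnProcess) return false; // previous tick still in flight, skip
  isOnProcess = true;
  try {
    const rows = fetchRows();
    for (const row of rows) {
      // Awaiting here means a rejection lands in the catch below
      // instead of becoming an unhandled promise rejection.
      await processRow(row);
    }
    return true;
  } catch (e) {
    console.log("error", e);
    return false;
  } finally {
    // Runs once the loop has finished or failed, never before.
    isOnProcess = false;
  }
}
```

Rows are processed one at a time in this sketch, and the first failing row aborts the rest of the tick. If one bad row should not block the others, the `try/catch` could instead be moved inside the loop body.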
Polotno render errors¶
Polotno-node uses headless Chrome via puppeteer. If Chrome crashes or hangs, the render promise may never resolve. Jobs that don't add a timeout will hang forever on that row, blocking subsequent ticks.
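Fix: wrap renders in `Promise.race([render, timeoutPromise])` with a 30-second timeout. `Promise.race` only stops waiting; it does not cancel the render, so a hung Chrome instance may still need to be closed separately.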
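One way to build the timeout side of that race is a small generic helper. This is a sketch, not code from the repository: `withTimeout` is a hypothetical name, and `renderDesign` in the usage comment stands in for whatever function performs the Polotno render.

```javascript
// Rejects with a timeout error if `promise` has not settled within `ms`.
// This only stops waiting; it does not cancel the underlying work.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a job (renderDesign is hypothetical):
//   const image = await withTimeout(renderDesign(row), 30000, "polotno render");
```

Because the rejection is an ordinary error, the job's existing `try/catch` can treat a timeout like any other render failure.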
OpenAI errors¶
Bots call OpenAI without retry. A 429 (rate limit) or 5xx is logged and the call is abandoned. The DB row may end up in an inconsistent state (status flipped to "in-progress" but never completed).
Fix: per-call retry with exponential backoff; mark the DB row "failed" if all retries are exhausted (so the next cron tick doesn't pick it up).
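The retry half of that fix could look roughly like the helper below. It is a sketch under two assumptions: the error thrown by the OpenAI client exposes the HTTP status as `err.status` (this depends on the client version in use), and the names `withRetry` and `isTransient` are hypothetical. It does not honour a `Retry-After` header.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the thrown error carries the HTTP status as `err.status`.
function isTransient(err) {
  const status = err && err.status;
  return status === 429 || (status >= 500 && status < 600);
}

// Calls `fn` up to `retries + 1` times, doubling the delay after each
// transient failure. Non-transient errors are rethrown immediately.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

The caller would wrap the OpenAI call in `withRetry(...)` and, in its own `catch`, set the row's status to "failed" so the row does not stay "in-progress".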
MySQL errors¶
`sync-mysql` throws on connection loss. The job's outer `try/catch` catches the error, but the connection object stays in a broken state, so subsequent calls fail.
Fix: re-create the `sync-mysql` connection on error, or use a connection pool.
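The "re-create on error" option can be sketched as a thin wrapper around the connection. `makeResilientQuery` and `createConnection` are hypothetical names; the factory is injected so the sketch makes no assumption about connection config, and it assumes only that the connection object has a synchronous `query(sql, values)` method.

```javascript
// `createConnection` is a hypothetical factory, for example
// () => new MySql(config) when using sync-mysql.
function makeResilientQuery(createConnection) {
  let con = null;
  return function query(sql, values) {
    if (con === null) con = createConnection(); // lazily (re)connect
    try {
      return con.query(sql, values);
    } catch (err) {
      // Assumption: any throw may have left the connection unusable,
      // so drop it and let the next call build a fresh one.
      con = null;
      throw err;
    }
  };
}
```

This discards the connection on every error, including plain SQL mistakes, which is wasteful but simple. A stricter version would inspect the error and reconnect only on connection-level failures.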
Recommendations¶
| ID | Recommendation | Effort |
|---|---|---|
| E-1 | Replace `forEach(async)` with `for...of` + `await` in all 60+ workers | Medium (touches every job file) |
| E-2 | Add Polotno render timeout + retry | Small |
| E-3 | Add OpenAI retry with exponential backoff (move to `helper/openaiClient.js`) | Small |
| E-4 | Add a `tJobFailure` table to capture errors instead of `console.log` | Medium |
| E-5 | Add an Express error middleware in `server.js` | Trivial |
| E-6 | Standardise on `getSuccessResponse` / `getErrorResponse` envelopes | Medium |
| E-7 | Add a `setTimeout(() => process.exit(1), 60*1000)` watchdog, cleared when the tick completes, in cron-only worker files so a hung handler kills its own PM2 process (PM2 restarts it) | Small |
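For E-5, a sketch of what the middleware and a matching async wrapper might look like. `asyncHandler`, `errorMiddleware`, and `listDesigns` are hypothetical names, and the response envelope shown is an assumption, not the actual output of `getErrorResponse`. The functions are written without importing Express so they can be read and tested in isolation.

```javascript
// Wraps an async route handler so a rejected promise is forwarded to
// next(err) instead of becoming an unhandled rejection.
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Express recognises error middleware by its four-argument signature.
// The envelope shape below is an assumption for illustration.
function errorMiddleware(err, req, res, next) {
  console.error("unhandled route error", err);
  if (res.headersSent) return next(err); // response already started
  res.status(err.status || 500).json({
    success: false,
    message: err.message || "Something Went Wrong!",
  });
}

// Registration in server.js, after all routes (listDesigns is hypothetical):
//   app.get("/designs", asyncHandler(listDesigns));
//   app.use(errorMiddleware);
```

With both pieces in place, handlers no longer need their own `try/catch` just to build an error envelope, which also makes E-6 easier to roll out.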