Skip to content

Engineering Practices

This is the operational catalog of practices that the Engineering Charter commits to. Each practice has:

  • What — the rule
  • Why — the rationale (so engineers can apply judgement to edge cases)
  • Enforcement — how the rule is enforced (automated, process, cultural)
  • Escape valve — when and how the rule can be exceptioned

For the per-PR checklist that translates these practices into a single forcing function, see Definition of Done.


Adoption Waves

Practices are not adopted all at once — that approach has consistently failed in industry. They are adopted in waves, each closing before the next opens.

Wave Timing Focus Closes when
Wave 1 Week 1 Pre-commit hooks, PR template, branch protection Hooks installed; one week of zero direct-pushes to main
Wave 2 Weeks 2–4 CI gates: lint, typecheck (TS files), no-secrets, format One week of zero CI bypasses on these gates
Wave 3 Months 1–2 Structured logging, validation on new endpoints, migrations framework, module-boundary linter (advisory) One sprint of new code 100% compliant
Wave 4 Months 2–4 Patch coverage gate, OpenAPI required for new endpoints, module-boundary linter blocking, dep justification, ADR process New code passes all gates routinely
Wave 5 Months 4+ SLO budgets, postmortem process, maturity rating refresh, dep cadence, license review Steady-state operation

Phase 2 of the enterprise-readiness roadmap (modular monolith refactor) does not start until Wave 3 is in force. Otherwise the refactor stream chases entropy that the feature stream is producing in real time.

The current wave is tracked in verify-markers.md.


A. Code-Level Standards

A.1 New files use TypeScript

Rule: every new source file in src/ is .ts (or .tsx for components if/when frontend lives here). Existing .js files are grandfathered.

Why: the type system catches the most frequent class of bugs in this codebase (typos, missing fields, wrong shapes) at compile time. TypeScript also makes the codebase legible to new engineers — types serve as inline documentation.

Enforcement: ESLint rule disallowing new .js files in src/. CI fails if a .js file is added under src/.

Escape valve: none. New .js files are not allowed.

A.2 strict: true in tsconfig; no implicit any

Rule: TypeScript's strict mode is enabled. any requires an explicit annotation and a // @ts-expect-error: <reason> if used to silence an error.

Why: the value of TypeScript drops sharply with any. Strict mode forces the discipline.

Enforcement: tsconfig.json strict flag. ESLint @typescript-eslint/no-explicit-any set to error.

Escape valve: explicit any is permitted with a one-line comment explaining why; the comment is reviewed.

A.3 ESLint with team ruleset, no warnings

Rule: all source files pass ESLint. Warnings count as errors in CI.

Why: "warnings we'll fix later" become "warnings nobody fixes." Either a rule matters and we fail on it, or it doesn't and we drop it.

Enforcement: eslint --max-warnings 0 in CI.

Escape valve: disable a specific rule with an inline comment + reason: // eslint-disable-next-line <rule> -- <reason>.

A.4 Prettier-formatted; no formatting debates

Rule: all files pass prettier --check.

Why: time spent debating formatting is time not spent building. Pick one config; never look back.

Enforcement: pre-commit hook runs prettier --write. CI runs prettier --check.

Escape valve: none. Formatting is not a debate.

A.5 No console.log in committed code

Rule: use the structured logger from platform/logger.ts. console.log, console.error, console.warn are forbidden in source.

Why: unstructured console.log produces unparseable output. With ~108 workers logging, this is the difference between "grep can find your incident" and "your incident is lost."

Enforcement: ESLint no-console rule set to error.

Escape valve: during local development, comment out the rule for a single line. The rule must be re-enabled before merging.

A.6 File and function size limits

Rule: - Files SHOULD be under 400 lines (advisory) and MUST be under 1,000 lines (hard). - Functions SHOULD be under 50 lines and SHOULD have at most 5 parameters.

Why: the 21,000-line routes.js is the result of never having this rule. Files past these thresholds cannot be reviewed effectively or held in working memory.

Enforcement: ESLint max-lines and max-lines-per-function rules. The hard limit fails CI; the advisory limit produces a warning that reviewers can choose to enforce.

Escape valve: the 1,000-line hard limit may be exceptioned for generated files (OpenAPI types, etc.) with a // @generated marker.

A.7 No // @ts-ignore without rationale

Rule: use // @ts-expect-error: <reason> if you must suppress a type error. The plain // @ts-ignore is forbidden.

Why: @ts-expect-error complains if the underlying error is fixed later, which prevents stale suppressions from accumulating. @ts-ignore does not.

Enforcement: ESLint @typescript-eslint/ban-ts-comment.

Escape valve: none.

A.8 Module-boundary import discipline

Rule: code in src/modules/A/ may only import from src/modules/B/index.ts, never from src/modules/B/internal/*.

Why: the public/internal distinction is the entire value of module boundaries. Once one engineer reaches into another module's internals, the boundary is gone.

Enforcement: eslint-plugin-boundaries (Wave 3 advisory; Wave 4 blocking).

Escape valve: if a function genuinely belongs in another module's public API, move it — don't bypass the boundary.

A.9 No new direct DB connections

Rule: all DB access goes through platform/db.ts. New mysql.createConnection, mysql2.createPool, or new mysql({...}) calls are forbidden in any new file.

Why: the 892 separate connection-construction sites are the operational nightmare flagged in the readiness audit. Centralizing fixes connection pooling, secrets handling, observability, and migration paths in one stroke.

Enforcement: ESLint no-restricted-imports blocking direct imports of mysql, mysql2/promise, and sync-mysql. The platform/db module is the sole sanctioned consumer.

Escape valve: changes to platform/db.ts itself are obviously exempt.

A.10 No new sync-mysql callers

Rule: new files must not import sync-mysql. Existing 127 callers are tagged for migration during Phase 1.

Why: synchronous MySQL blocks the event loop. It's an architectural antipattern. Every new caller compounds the migration cost.

Enforcement: ESLint no-restricted-imports for sync-mysql.

Escape valve: none. New sync-mysql usage is rejected.


B. Pull-Request Requirements

B.1 At least one review

Rule: PRs require ≥1 review approval before merging. Two reviews for security, billing, and authentication paths.

Why: code review is the single highest-leverage quality practice on a small team. It also distributes knowledge — every PR reviewed is one engineer who has seen the code.

Enforcement: GitHub branch protection on dev_api, uat_api, main. CODEOWNERS file routes sensitive-path reviewers automatically.

Escape valve: hotfix-emergency PRs (production down) may be merged with one review post-hoc, with an incident report following.

B.2 No self-approval

Rule: the PR author cannot approve their own PR.

Why: self-approval defeats the purpose of review. It happens with surprising frequency on under-pressure teams.

Enforcement: GitHub branch protection setting.

B.3 PRs target the right base

Rule: - Feature PRs target dev_api - uat_apidev_api (promotion only — no direct feature work) - mainuat_api (promotion only)

Why: the branch strategy fails if engineers PR feature work directly into main.

Enforcement: branch protection rules.

B.4 PR description follows the template

Rule: the PR template's checkboxes (the Definition of Done) are filled out, not skipped.

Why: the checklist is a forcing function. Skipping it is the silent non-compliance failure mode.

Enforcement: repository PR template. Reviewers reject PRs that ignore the checklist.

B.5 Files > 500 lines added in a single PR triggers extra review

Rule: PRs that add a file > 500 lines or modify a file by > 500 lines require a second reviewer.

Why: large PRs are reviewed less effectively. A second pair of eyes is a cheap mitigation.

Enforcement: convention; enforced by reviewers and CODEOWNERS.

Escape valve: generated files; fixture files; documentation. Mark with // @generated or place in a designated path.

B.6 PR title convention

Rule: PR titles follow conventional-commits style: feat:, fix:, refactor:, docs:, chore:, test:, perf:, style:, build:, ci:.

Why: lets us auto-generate changelogs and release notes, and signals intent.

Enforcement: CI title check (e.g., commitlint or a GitHub Action).


C. Test Requirements

C.1 New endpoints have integration tests

Rule: any new HTTP endpoint ships with at least one integration test (request → response, hits the DB).

Why: a unit test of the handler is a fiction; the real behavior emerges from middleware + handler + DB. Integration tests catch the breakage that unit tests don't.

Enforcement: patch-coverage gate on changed-line coverage. Reviewers verify test exists.

Escape valve: internal-only / cron-only changes that don't expose an endpoint are exempt.

C.2 New helper functions have unit tests

Rule: new functions in helper/, platform/, or modules/*/internal/ have at least one unit test that exercises a non-trivial path.

Why: these are the building blocks. A bug here propagates through every consumer.

Enforcement: patch-coverage gate.

C.3 Bug fixes include a regression test

Rule: a bug fix PR includes a test that fails before the fix and passes after.

Why: without the regression test, the fix is one rebase away from being undone.

Enforcement: reviewer checklist. Cannot be fully automated, but reviewers must explicitly confirm.

Escape valve: truly trivial fixes (e.g., typo in a string literal) where a test would be artificial. Reviewer judgement.

C.4 Coverage targets on critical paths

Rule: auth, billing, and publishing modules require ≥80% line coverage on changed lines (patch coverage, not whole repo).

Why: these are the code paths where untested code is a customer-impact event waiting to happen. They are also small enough to make the goal tractable.

Enforcement: Codecov or Coveralls patch-coverage gate, scoped to these modules.

Escape valve: none on these paths.

C.5 Flaky tests are quarantined fast

Rule: a test identified as flaky (passing then failing without code changes) is moved to a __flaky__ quarantine directory within 24 hours of identification, with a tracking issue.

Why: a single accepted-flaky test trains the team to ignore CI. That habit destroys CI's signal value within weeks.

Enforcement: flaky-test detection in CI (e.g., GitHub Actions plugin); the on-call engineer triages.

Escape valve: the test stays quarantined until fixed. There is no "merge despite the flake."

C.6 No tests skipped without a referenced issue

Rule: it.skip(), xit(), etc. are forbidden without a comment containing an issue number.

Why: skipped tests rot. Without a tracking issue they are invisible.

Enforcement: ESLint custom rule scanning test files.


D. Architectural Boundaries

D.1 New code lives in src/modules/

Rule: new HTTP handlers, services, and helpers go in src/modules/<area>/. Adding to routes/routes.js or routes/auth.js is forbidden.

Why: these two files are the architectural failure named in the readiness doc. Every new line added to them is a step away from the modular monolith.

Enforcement: lint rule + reviewer enforcement + grep gate in CI (lines added to those files trigger explicit reviewer attention).

Escape valve: trivial bug fixes inside an existing handler in those files (no logic change). Anything more substantive must be extracted first.

D.2 The boy-scout rule

Rule: when modifying a handler in a legacy file, extract it into the appropriate module first. Add at least one test for it as you extract.

Why: this is what makes the feature stream contribute to the cleanup rather than fight it. Over time, every legacy line that gets touched gets upgraded; lines nobody touches were stable anyway.

Enforcement: reviewer enforcement; PR template asks "if touching a legacy file, what was upgraded?"

Escape valve: if extraction would make the PR much larger than its purpose warrants, document the reason and create a follow-up issue. The follow-up issue is then prioritized in the next sprint.

D.3 One module owns one table (logically)

Rule: every database table has exactly one module that owns it (writes to it, defines its schema). Other modules read from it via the owning module's public API.

Why: without ownership, tables drift. A table that 8 modules write to ends up with 8 different invariants, and none of them hold.

Enforcement: ownership documented in data-model.md §6 catalog. Reviewer enforcement.

Escape valve: read-only joins across module boundaries are pragmatic and allowed during the migration period (single DB). After Phase 3 (if any module is extracted), this becomes a hard rule.

D.4 No new event buses, message queues, or anti-corruption layers between modules

Rule: cross-module communication is direct synchronous function calls (await content.generateDraft(...)). Adding a queue, event bus, or "anti-corruption layer" between two modules in the same process is forbidden.

Why: these patterns add complexity that doesn't pay off in a single-process application. They are appropriate when modules genuinely become services. See enterprise-readiness §5.2.

Enforcement: reviewer enforcement; ADR required for any deviation.

Escape valve: the job queue (BullMQ / Redis) is a special case — it's between the HTTP API and the worker fleet, not between modules of the API. Cross-process events are fine.


E. Security Defaults

E.1 No secrets in source

Rule: API keys, tokens, passwords, and credentials are loaded from environment variables (or a secrets manager). They never appear as string literals in source files.

Why: every committed secret has been compromised. Rotating it is the hard part; preventing it is the easy part.

Enforcement: - Pre-commit gitleaks hook - CI gitleaks job - Periodic full-history scan - Reviewers reject PRs containing patterns matching credential shapes

Escape valve: none.

E.2 Input validation at the boundary

Rule: every HTTP endpoint that accepts request bodies, query params, or path params runs a schema validator (Zod / Yup / express-validator) at handler entry.

Why: input validation is the cheapest first line of defense against injection, malformed-input crashes, and ill-defined contracts.

Enforcement: lint rule that checks for a schema-validation call in handler files. (Wave 4.) Reviewer enforcement until then.

Escape valve: internal-only endpoints (called only by other workers, never by external clients) may skip validation, but must declare so explicitly.

E.3 Auth-required endpoints declare their required role

Rule: every authenticated endpoint declares the required role(s) at the route level: router.get('/admin/x', requireRole(['admin']), handler).

Why: scattered auth checks inside handlers are the failure mode that produces missing-auth bugs. Declaring it at the route level makes the requirement greppable and reviewable.

Enforcement: convention; reviewer enforcement.

E.4 External API calls have timeouts and retries

Rule: every call to an external service (AI providers, social platforms, payment gateways, etc.) has: - A timeout (default 30 seconds; explicit per-call where different) - A retry policy with exponential backoff - An idempotency key for write operations

Why: without these, a single slow third party stalls the whole worker. Without idempotency, retries cause data corruption.

Enforcement: wrapper utility (platform/http.ts or similar) that enforces these defaults. Direct axios / fetch calls bypass the defaults and are rejected by reviewers.

Escape valve: documented rationale + ADR for cases where the wrapper doesn't fit.

E.5 Cost-bearing endpoints declare a rate limit

Rule: any endpoint that triggers AI generation, media rendering, or social-platform posting declares a rate limit (per-user and per-account).

Why: abuse / runaway-cost protection. Without per-tenant limits, one bad actor or one buggy client can produce a six-figure bill overnight.

Enforcement: lint rule (Wave 4) requiring rateLimit({...}) middleware on these endpoints. Reviewer enforcement until then.

E.6 Critical-severity dependency advisories block merge

Rule: if yarn audit (or Snyk / Dependabot) reports a Critical or High severity finding, no merge until it's resolved or explicitly time-boxed via an exception.

Why: these are public, indexed by vulnerability scanners, and exploited within days.

Enforcement: required CI status check.

Escape valve: explicit exception by an engineering lead, with a follow-up tracking issue and a < 30 day resolution window.

E.7 Auth and billing files cannot have suppressed type errors

Rule: // @ts-expect-error is forbidden in src/modules/identity/, src/modules/billing/, and any file that touches authentication or payment flows.

Why: these are the paths where a silent type bug = customer breach or financial incident. The bar is higher.

Enforcement: CI grep gate scoped to these paths.


F. Observability Defaults

F.1 Every request has a correlation ID

Rule: middleware attaches a correlation ID (UUID) to every request. The ID propagates through logs, downstream calls, and traces.

Why: without correlation IDs, debugging a single user's request across 100+ workers is impossible.

Enforcement: middleware in app.use chain at startup. Lint check that the middleware is registered.

F.2 Every external API call is wrapped in a span

Rule: the platform/http.ts wrapper from §E.4 also creates an OpenTelemetry span for the call.

Why: distributed tracing is the difference between "the AI provider was slow" being a guess and a fact.

Enforcement: wrapper enforces this; direct calls don't get traces.

F.3 Every job emits start / success / failure events

Rule: every background job's main function emits structured job.start, job.success, job.failure events with jobType, jobId, accountId (if applicable), and duration.

Why: queue depth, throughput, and error rate dashboards depend on these.

Enforcement: wrapper utility (platform/job.ts or similar) handles this automatically.

F.4 Logging follows the structured shape

Rule: logs are JSON, with fields: - level (debug / info / warn / error) - msg (human-readable) - correlationId - accountId (if known) - arbitrary structured context

Why: centralized log aggregation (Loki / ELK / CloudWatch / Datadog) requires structure to be queryable.

Enforcement: the logger from platform/logger.ts produces this shape. Direct console.log is forbidden (§A.5).

F.5 New endpoints add a panel to the dashboard

Rule: new endpoints are added to the appropriate observability dashboard before they ship.

Why: the failure mode is "we ship a feature, it breaks, we have no dashboard, we don't know it's broken." Adding the panel during PR ensures visibility from day 1.

Enforcement: convention; reviewer enforcement; on-call review at end of sprint.

Escape valve: trivial endpoints (health-check variants, etc.) may skip.


G. Database Hygiene

G.1 Every schema change is a migration

Rule: schema changes (new tables, new columns, index changes, etc.) ship as a versioned migration file in migrations/.

Why: ad-hoc ALTER TABLE produces environment drift. Migrations make schema state reproducible.

Enforcement: CI gate: "schema diff between fresh migrate-up and the canonical baseline = empty." If the diff is non-empty, the migration is incomplete.

Escape valve: none.

G.2 No manual ALTER TABLE in production

Rule: running schema changes outside the migration framework, in any environment, is forbidden.

Why: the moment one engineer does this, the migration framework's invariant ("migrations describe the schema") is broken. Recovery is painful.

Enforcement: operational policy. DBA-equivalent gating on production access.

G.3 New columns are NOT NULL DEFAULT or have a backfill plan

Rule: adding a NOT NULL column without a default requires a backfill migration plan in the same PR.

Why: a NOT NULL column with no default fails on existing rows during deploy.

Enforcement: reviewer enforcement.

G.4 New tables go through architectural review

Rule: any new table requires a 1-page ADR in the documentation tree (e.g. someli-doc/audit/someli-api/adr/####-table-name.md) covering: owning module, primary key, foreign-key relationships (if any), retention policy, anticipated row count, indexing strategy.

Why: the schema is the single hardest thing to change later. Decisions made at table-creation time persist for years.

Enforcement: required for migrations that include CREATE TABLE. Reviewer enforcement (Wave 4).

G.5 New tables include foreign-key constraints

Rule: new tables declare foreign-key relationships at the schema level. The grandfathered legacy tables (which have zero FKs) are not retroactively required to add them.

Why: schema-level integrity prevents the orphan-row class of bugs.

Enforcement: reviewer enforcement.

Escape valve: for tables that will eventually be extracted to a different service (so cross-DB FKs would prevent extraction), document the rationale and use an indexed column instead.


H. Dependency Hygiene

H.1 New dependencies require justification

Rule: adding a package to package.json requires a paragraph in the PR description: what does it do, what alternatives were considered, why this one.

Why: the npm ecosystem rewards "just install a package." The 91-package tree (with 14 fully unused) is the result of treating this decision casually.

Enforcement: PR template field; reviewer enforcement.

H.2 New packages are scanned for license, size, and maintenance

Rule: before adding a package, verify: - License is permissive (MIT / Apache-2.0 / BSD / ISC). GPL / AGPL / proprietary are not allowed without legal review. - Bundled tree size is reasonable (under ~5 MB unless justified). - Package has been updated in the last 12 months. - No critical CVEs in current version.

Why: cumulative effect on bundle size, security exposure, and license compliance.

Enforcement: CI advisory checks for license + size; reviewer enforcement for maintenance status.

H.3 Deprecated packages cannot be added

Rule: packages on the deprecated list (currently: request, moment, node-fetch, body-parser standalone, raw crypto-js for new flows, sync-mysql, sync-request) cannot be added to new code.

Why: these are tagged for migration in dependencies-inventory.md. Adding new callers compounds the migration cost.

Enforcement: ESLint no-restricted-imports rule enumerating the banned list.

Escape valve: none for new code. Existing callers are migration targets.

H.4 Bundle-size growth requires acknowledgement

Rule: PRs that grow node_modules by more than 10% require explicit acknowledgement from a reviewer.

Why: unbounded growth is how a 1.5 GB node_modules becomes a 3 GB one.

Enforcement: CI advisory check measuring delta.


I. Documentation Requirements

I.1 New endpoints update OpenAPI spec

Rule: new HTTP endpoints update the OpenAPI spec in the same PR.

Why: spec is the contract. Spec lagging behind code = clients confused, SDK generators broken, test gaps.

Enforcement: CI: schema-validate the spec; require diff in openapi.yaml (or wherever the spec lives) for PRs that touch handlers (Wave 4). Reviewer enforcement until then.

I.2 Architectural changes update the relevant doc

Rule: changes that materially alter architecture (new module, new external integration, new infrastructure component) update the corresponding document in the audit tree (someli-doc/audit/someli-api/*.md) alongside the code change.

Why: docs that lag code are wrong docs, which is worse than no docs.

Enforcement: reviewer enforcement.

I.3 New env vars are added to .env.example and configuration.md

Rule: every new process.env.<KEY> reference in source has a corresponding row in .env.example (with a placeholder value) and configuration.md (with the description).

Why: new joiners cannot bootstrap an environment if env vars are tribal knowledge.

Enforcement: CI grep check: env vars in conf.js must appear in both .env.example and configuration.md.

I.4 New jobs are added to jobs-inventory

Rule: new background jobs are added to jobs-inventory.md in the same PR.

Why: the operational picture of "what runs in this fleet" must stay accurate.

Enforcement: reviewer enforcement.

I.5 New external integrations are added to Integration-inventory

Rule: any new third-party integration is added to Integration-inventory.md with the standard fields (Configuration / Key Files / Operations / Notes) in the same PR.

Why: the integrations doc is the canonical view of "what does this platform talk to."

Enforcement: reviewer enforcement.


J. Operational Practices

J.1 ADRs for architectural decisions

Rule: non-trivial architectural decisions are captured as ADRs in the documentation tree (e.g. someli-doc/audit/someli-api/adr/####-title.md) using the standard format: Context / Decision / Consequences.

Why: the why of a decision is the most valuable part. Without ADRs, every new engineer re-litigates decisions made years ago.

Enforcement: convention. Reviewers ask "where's the ADR?" for decisions that warrant one.

J.2 Postmortems for Sev-1 / Sev-2 incidents

Rule: every Sev-1 (customer-impacting outage) and Sev-2 (degraded service) incident gets a postmortem within 5 business days.

Why: without postmortems, the same incidents recur. Memory of the lesson is shorter than the gap between recurrences.

Enforcement: on-call rotation responsibility. Tracked in a postmortem index.

Format: 1. Timeline (UTC) 2. Customer impact (severity, duration, scope) 3. Root cause(s) 4. Contributing factors 5. What worked 6. What didn't 7. Action items, with owners and due dates — including: which engineering practice would have prevented this?

J.3 Action items from postmortems are tracked to closure

Rule: every postmortem action item has an owner, a due date, and a tracking issue. Aging action items are escalated.

Why: action items that aren't tracked aren't done.

Enforcement: weekly review of open postmortem actions.

J.4 Tech-debt rotation

Rule: every engineer spends ~10–20% of their time on cleanup work, on a rotating basis. There is no permanent "feature team" vs "refactor team" division.

Why: a permanent refactor team is treated as second-class; a permanent feature team accumulates debt their cleanup peers can't keep up with. Rotation distributes ownership.

Enforcement: sprint planning; reviewed quarterly.

J.5 Recognition follows cleanup work, not just features

Rule: performance reviews and promotion criteria explicitly include cleanup, refactoring, and operational improvements alongside feature delivery.

Why: if promotion only follows visible new features, no engineer with promotion ambitions will pay down debt.

Enforcement: management practice; reviewed annually.


How to use this document

For an engineer:

  • Day-to-day: the Definition of Done is the operational summary. Read this doc when you have a question about why a rule exists.
  • When in doubt: the rule applies. Exceptions are explicit, time-boxed, and documented in the PR.
  • When you disagree: raise it. This document is reviewed and updated; rules that turn out to be wrong are changed (not silently violated).

For a reviewer:

  • Reject PRs that silently violate rules. Silent non-compliance is the failure mode.
  • Ask for the rationale on exceptions. A real exception has a real reason.
  • Cite the rule, not your opinion. "Practice §E.1 prohibits this" is a clearer feedback than "I don't like this."

For a tech lead:

  • Update the wave schedule as practices migrate from advisory to blocking.
  • Track adoption in verify-markers.md.
  • Sponsor the new wave before opening it — secure buy-in, fix the obvious blockers, then turn the gate on.

For engineering leadership:

  • Sign the Engineering Charter. Without leadership commitment, the practices below are advisory.
  • Defend the team's time to do this work. The two-streams problem is solved by not pulling the team off cleanup when feature pressure rises.