[VERIFY] Markers: Reference and Guidance
Several runbooks and reference docs contain [VERIFY] markers in places where the codebase does not provide the answer and the team must fill in operational knowledge from outside the repo. Currently tracked across:
- getting-started.md
- first-deployment.md
- dependencies-inventory.md
- engineering-charter.md (sign-off names — to be added by leadership at adoption)
This document lists every marker, what context it sits in, why it matters, and concrete guidance on what to write.
Once a marker is resolved: edit the relevant doc, replace the [VERIFY] block with the actual answer, and remove the entry from this list.
How to use this doc
- Work through markers in priority order (each group's table below is sorted that way).
- For each marker, the Guidance column tells you what an acceptable answer looks like — not the answer itself.
- The Owner column suggests who likely has the answer; adjust based on actual team roles.
- A marker is "resolved" when the underlying runbook contains a concrete, copy-pasteable instruction in place of the [VERIFY] block.
If the answer is "we don't do this and we should" — capture that as a separate ticket and leave a [VERIFY] pointing at the ticket. Don't bury gaps.
Group 1: Schema & data (highest priority)
These block any new environment bootstrap. Resolve first.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| Canonical schema source | getting-started §7, first-deployment §6 | Without a schema.sql no one can stand up a fresh database. Currently this is tribal knowledge. | Pick one of: (a) check a schema.sql into the repo and update it on every schema change; (b) script mysqldump --no-data from a designated source-of-truth env into a known location; (c) document a wiki page with the latest dump. (a) is recommended. | DBA / platform lead |
| Initial seed data | getting-started §7, first-deployment §6 | Empty schemas need reference rows (plans, default templates, admin user) to be functional. | List required seed rows. Either ship a seed.sql alongside schema.sql, or document the manual SQL to run after schema load. | DBA / product |
| Future migration tool | first-deployment §6 (note) | The "no migration tool" gap is also a roadmap item. The runbook needs to point at the tool once adopted. | Decide between Knex / Prisma Migrate / Flyway. Update the runbook to reference yarn migrate (or equivalent). Until then, the manual instruction stands. | Platform lead |
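
Option (b) above could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling: `SCHEMA_HOST`, `SCHEMA_USER`, `SCHEMA_DB`, and the `db/schema.sql` output path are hypothetical names, and the password is assumed to come from a MySQL client option file rather than the command line. The normalization step strips `AUTO_INCREMENT` counters so that repeated dumps only differ when the schema itself changes.

```shell
#!/usr/bin/env bash
# Sketch only: SCHEMA_HOST, SCHEMA_USER, SCHEMA_DB and db/schema.sql are
# hypothetical names, not values taken from the repo.

# Remove table-level AUTO_INCREMENT counters, which track row counts
# rather than schema structure and would otherwise create noisy diffs.
normalize_schema() {
  sed -E 's/ AUTO_INCREMENT=[0-9]+//'
}

# Dump structure only (no rows) from the designated source-of-truth env.
# Credentials are assumed to come from a client option file such as ~/.my.cnf.
dump_schema() {
  mysqldump --no-data --skip-comments \
    --host="$SCHEMA_HOST" --user="$SCHEMA_USER" "$SCHEMA_DB" \
    | normalize_schema
}

# Usage (not executed here):
#   dump_schema > db/schema.sql
```

Committing the normalized output on every schema change would also cover option (a), since the checked-in file and the scripted dump would then be the same artifact.
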
Group 2: Credentials & secrets (high priority)
Without these defined, new joiners can't get a working environment and new envs can't pull secrets.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| .env template location | getting-started §8 | New joiners need a template that lists every key. Sharing .env files via Slack/email is a security gap. | Recommended: commit a .env.example at the repo root with every key from conf.js listed and placeholder values like <your-aws-key>. Removes the "ask another engineer in DM" pattern. | Platform lead |
| AWS credentials for new joiners | getting-started §8 | Engineers can't talk to S3, Bedrock, or RDS without IAM credentials. Today this is undocumented. | Document: (1) which IAM role / user template to apply; (2) which environments they get access to (dev only, presumably); (3) how to rotate. Recommended: a someli-dev-engineer IAM policy attached to per-engineer IAM users with MFA enforced. | Security / platform lead |
| GCP service account for new joiners | getting-started §8 | Vertex AI / GCS access. Currently fetched at runtime from AWS Secrets Manager (GCS_SECRET_NAME); new joiners need either their own GCP creds or access to the secret. | Recommended: a per-engineer GCP service account with minimum-needed Vertex AI / GCS permissions. Or, share a dev-only service account. Document the setup. | Security / platform lead |
| Secrets Manager adoption status | first-deployment §2.5 | The new-env runbook assumes secrets live in AWS Secrets Manager. If they don't yet, the first-deployment process must either fix that gap first or fall back to .env baked into the image (not recommended). | Either: (a) confirm Secrets Manager is in use and the runbook is correct; or (b) document the temporary fallback while flagging this as a Phase 0a item. | Security lead |
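
A first draft of the recommended .env.example could be generated rather than typed by hand. The sketch below assumes conf.js reads its settings as `process.env.SOME_KEY`; that access pattern is an assumption, and keys read dynamically (for example `process.env[name]`) would be missed and need adding manually. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes conf.js reads settings as process.env.SOME_KEY.

# Print every distinct environment key referenced in the given file.
list_env_keys() {
  grep -oE 'process\.env\.[A-Za-z_][A-Za-z0-9_]*' "$1" \
    | sed 's/^process\.env\.//' \
    | sort -u
}

# Emit KEY=<your-key> lines, mirroring the <your-aws-key> placeholder style.
write_env_example() {
  list_env_keys "$1" | while IFS= read -r key; do
    printf '%s=<your-%s>\n' "$key" "$(printf '%s' "$key" | tr 'A-Z_' 'a-z-')"
  done
}

# Usage (not executed here):
#   write_env_example conf.js > .env.example
```

The generated file contains only placeholders, so it is safe to commit; real values still have to come from whichever secret store the team settles on.
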
Group 3: Infrastructure topology (medium priority)
These shape every new environment. Pin them down before standing up the next one.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| VPC and subnet layout | first-deployment §2.1 | Determines whether a new env shares networking with existing or gets its own VPC. Affects security boundaries, cost, and DR. | Recommended: one VPC per environment (dev / staging / prod) with public + private subnets in 2+ AZs. Document the standard CIDR ranges. | DevOps / platform lead |
| Production MySQL version | first-deployment §2.2 | New envs should match production exactly to avoid version-specific bugs. | Run SELECT VERSION(); against production and document. Likely MySQL 8.0.x. | DBA |
| Multi-AZ policy | first-deployment §2.2 | Cost vs. availability trade-off. Production should be Multi-AZ; dev usually shouldn't. | Document the per-environment policy (dev: single-AZ; staging: single-AZ; prod: Multi-AZ). | DevOps |
| S3 bucket naming convention | first-deployment §2.3 | New envs need consistent bucket names for clarity and IAM-policy reuse. | Recommended: someli-{env}-primary-{region} and someli-{env}-media-{region}. Document the convention and apply going forward. | DevOps |
| CloudFront in front of S3 | first-deployment §2.3 | Affects performance, cost, and the URLs the app uses for media. | Document whether prod uses CloudFront for the media buckets. If yes, list the distribution domains and update the runbook. If not, note this as a future improvement. | DevOps |
| ECR repository policy | first-deployment §2.4 | Single shared repo or one per environment? Affects access control, image isolation, and rollback. | Recommended: one ECR repo per environment (someli-api-dev, someli-api-uat, someli-api-prod). Lifecycle policy: keep last 50 tags. | DevOps |
| GCP project organization | first-deployment §2.6 | One shared project across envs is cheaper but mixes blast radii. Per-env is cleaner. | Recommended: per-environment GCP project for blast-radius isolation. Document the project IDs. | Platform lead |
| Hostname convention per env | first-deployment §2.7 | UAT today is uapi.someli.ai. Other envs need a parallel scheme. | Standardize: dev-api, staging-api, api (prod). Document and apply to Route 53. | DevOps |
| Production IAM policy structure | first-deployment §4 | New tasks need IAM roles. Reusing existing structure is better than reinventing. | Document the existing prod IAM role names and policy ARNs. Provide them as templates for new envs. | Security lead |
| Shared dev RDS (if any) | getting-started §6 | New joiners need to know if there's a shared dev DB or if they should run local. | If yes: document hostname, security-group access process, and shared credentials. If no: confirm "local-only" is the intended dev model. | DBA / platform lead |
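
The naming recommendations in this table are easier to apply consistently if they live in one small helper instead of being retyped per environment. The sketch below encodes the recommended bucket, ECR, and hostname patterns as written above. It assumes every environment shares the someli.ai domain (inferred from uapi.someli.ai), and the function names and the example region are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: encodes the *recommended* conventions from the table above.
# Assumes all environments live under someli.ai; function names are illustrative.

# bucket_name ENV KIND REGION  ->  someli-{env}-{kind}-{region}
bucket_name() {
  printf 'someli-%s-%s-%s\n' "$1" "$2" "$3"
}

# ecr_repo_name ENV  ->  someli-api-{env}
ecr_repo_name() {
  printf 'someli-api-%s\n' "$1"
}

# api_hostname ENV  ->  api.someli.ai for prod, {env}-api.someli.ai otherwise
api_hostname() {
  case "$1" in
    prod) printf 'api.someli.ai\n' ;;
    *)    printf '%s-api.someli.ai\n' "$1" ;;
  esac
}

# Usage (not executed here):
#   bucket_name staging media us-east-1
#   api_hostname dev
```

If the team adopts different patterns when resolving these markers, only these three functions would need to change.
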
Group 4: Worker deployment strategy (medium priority)
Fargate task design choices that don't have an obvious answer.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| Per-worker tasks vs PM2-in-Fargate | first-deployment §5.2 | Per-worker tasks scale and isolate failures better; PM2-in-Fargate is cheaper. | Recommended: per-worker tasks for the most-active and most-failure-prone jobs; PM2-in-Fargate for the long tail of low-traffic jobs. Document which. | Platform lead |
| Per-worker CPU/memory sizing | first-deployment §5.2 | Polotno renderers need much more memory than DB-housekeeping jobs. Wrong sizing causes OOM or wasted spend. | Profile each worker class (or measure under load) and document right-sized CPU/memory per type. Common categories: rendering (high mem), AI-orchestration (medium), DB-poll (low). | Platform lead |
| Always-on vs scheduled workers | first-deployment §5.2 | Workers that are pure cron (e.g., daily reports) are cheaper as ECS Scheduled Tasks. | Audit ecosystem.config.js and tag each worker as "always-on" or "scheduled." Implement scheduled tasks for the latter. | Platform lead |
| Which workers to skip in dev | getting-started §10 | Saves new joiners from running expensive (Bedrock, Vertex) jobs locally and burning budget. | Provide a short list of workers safe to skip in dev (probably most of them) and the small set that's relevant for typical local development. | Platform lead |
Group 5: OAuth & integrations (medium priority)
Per-environment configuration that must be done in vendor consoles, not in this repo.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| Exact OAuth callback paths | first-deployment §7.3 | The runbook lists conventional paths; need to confirm against routes/social.js for each provider. | Grep routes/social.js for each passport.use and the corresponding callback URL. Document the exact path per provider. One-time, ~30 min. | Platform lead |
| Sandbox vs production Paddle per env | first-deployment §7.4 | Determines PADDLE_ENV and which webhook key applies. Wrong setting = bills going to the wrong account. | Document: dev / staging → sandbox; prod → production. Apply at Secrets Manager / .env level. | Billing / platform lead |
| Concrete login curl example | getting-started §11 | New joiners need a copy-pasteable smoke test for the auth flow. | Capture the exact request bodies for /auth/register + /auth/login against a fresh dev DB. Add as a code block. | Backend lead |
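
The Paddle mapping suggested above (dev / staging → sandbox; prod → production) is small enough to encode directly, which avoids setting PADDLE_ENV by hand per environment. This is a sketch under that assumed mapping; `DEPLOY_ENV` and the function name are hypothetical, and any environment not named in the table (UAT, for example) fails loudly instead of defaulting to a billing account.

```shell
#!/usr/bin/env bash
# Sketch only: mapping follows the guidance above; DEPLOY_ENV is a
# hypothetical variable name, not one defined by the repo.

# paddle_env_for ENV  ->  sandbox | production, or an error for unknown envs
paddle_env_for() {
  case "$1" in
    dev|staging) echo sandbox ;;
    prod)        echo production ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Usage (not executed here):
#   PADDLE_ENV="$(paddle_env_for "$DEPLOY_ENV")" || exit 1
#   export PADDLE_ENV
```

Refusing unknown environments is deliberate: a silent default in either direction is how bills end up in the wrong account.
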
Group 6: Operational (lower priority but high value)
Quality-of-life improvements for ongoing operation.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| Post-deploy smoke-test script | first-deployment §8 | The 12-step manual checklist is fine but a verify-env.sh script is more reliable. | Author a single script that runs through the §8 checks against a BASE_URL argument. Add to repo, document invocation. | Platform lead |
| Real costs from existing envs | first-deployment §10 | The $600–900/mo estimate is a guess; AWS Cost Explorer has the actuals. | Pull last 30 days of cost from AWS Cost Explorer per environment. Replace the estimate table. | Finance / DevOps |
| Team-specific install gotchas | getting-started §12 | Tribal knowledge that bites every new joiner unless captured. | Each new joiner who hits an undocumented problem and resolves it adds a row to the symptoms table. The doc grows as the team grows. | Whoever joined most recently |
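
A possible skeleton for the verify-env.sh script mentioned above is sketched here. It does not reproduce the §8 checklist, which lives in the runbook; the `/health` path and expected status are placeholders to be replaced with the real checks, and the function names are illustrative. Fetching is isolated in one function so that the comparison logic can be exercised without a live environment.

```shell
#!/usr/bin/env bash
# Sketch only: /health and the expected status are placeholders for the
# real first-deployment §8 checks. Function names are illustrative.

# Print the HTTP status code returned for a URL (000 if the request fails).
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# check_status BASE_URL PATH EXPECTED  ->  prints PASS/FAIL, returns 1 on FAIL
check_status() {
  local url="${1%/}$2" expected="$3" actual
  actual="$(fetch_status "$url")"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $url ($actual)"
  else
    echo "FAIL $url (expected $expected, got $actual)"
    return 1
  fi
}

# run_checks BASE_URL  ->  runs every check, returns non-zero if any failed
run_checks() {
  local base_url="$1" failures=0
  check_status "$base_url" /health 200 || failures=$((failures + 1))
  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}

# Usage (not executed here):
#   run_checks "$BASE_URL"
```

Each additional §8 step would become one more `check_status` line in `run_checks`, so the script's exit code can gate a deploy pipeline.
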
Group 6.5: Dependency hygiene (medium priority)
Markers added by the dependencies-inventory.md audit. Most are one-time investigations that produce a baseline, then close.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| konva peer-dependency status | dependencies-inventory §2 | Determines whether konva (zero direct references) can be removed or if polotno-node requires it as a peer-dep | Run yarn why konva. If parents include polotno-node and the version pin matters, leave it. Otherwise add to the removal batch in §6. | Platform lead |
| yarn audit results | dependencies-inventory §9 | Open CVEs need to be tracked, especially before SOC 2 Type I | Run yarn audit --level high. Document open advisories, severity, and remediation plan. Update quarterly. | Security lead |
| Dependabot / Snyk status | dependencies-inventory §9 | Ongoing dep-vuln scanning. Phase 0 item per readiness roadmap | Configure .github/dependabot.yml (free, GitHub-native) or enable Snyk's GitHub integration. Gate merges on no High/Critical advisories. | Security lead |
| npm audit signatures in CI | dependencies-inventory §9 | Supply-chain attack defense (event-stream, ua-parser-js, debug/chalk style attacks have hit npm repeatedly) | Add npm audit signatures --production to CI as a non-blocking job initially, then promote to blocking | Security / platform lead |
| License inventory | dependencies-inventory §10 | Required for SOC 2 / enterprise procurement vendor questionnaires | Run npx license-checker --production --summary, then npx license-checker --production --csv > licenses.csv. Flag any GPL / AGPL / proprietary. Update §10 of dependencies doc. Repeat annually | Legal / platform lead |
| node_modules size baseline | dependencies-inventory §11 | Lets future cleanups demonstrate measurable progress | `du -sh node_modules` and `du -sh node_modules/* \| sort -h \| tail -20`. Capture once, re-capture after major removals | Platform lead |
| Dep-update cadence policy | dependencies-inventory §12 | Without a stated cadence, updates drift; security patches age | Document a policy: weekly Dependabot, monthly yarn outdated review, quarterly major-version review, ad-hoc for CVEs. Reference it in the doc | Platform lead |
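
For the "gate merges on no High/Critical advisories" guidance, one approach with Yarn 1.x is to inspect the exit code of `yarn audit`, which is documented as a bitmask of the severities found (1 info, 2 low, 4 moderate, 8 high, 16 critical). The sketch below is one possible CI gate built on that, not an existing script in the repo; the function names are illustrative, and a non-audit failure such as a network error would need separate handling.

```shell
#!/usr/bin/env bash
# Sketch only: relies on the Yarn 1.x audit exit-code bitmask
# (1 info, 2 low, 4 moderate, 8 high, 16 critical).

# Succeeds when the exit code has the high (8) or critical (16) bit set.
has_high_or_critical() {
  [ $(( $1 & 24 )) -ne 0 ]
}

# Run the audit and fail only on high or critical advisories.
gate_on_audit() {
  local code=0
  yarn audit || code=$?
  if has_high_or_critical "$code"; then
    echo "Blocking: high or critical advisories present (exit $code)"
    return 1
  fi
  echo "No high or critical advisories (exit $code)"
}

# Usage in CI (not executed here):
#   gate_on_audit
```

Lower-severity findings still appear in the audit output but do not fail the job, which matches the suggested policy of blocking only on High and Critical.
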
Group 7: Strategic (lowest priority; schedule, don't rush)
Decisions that aren't blocking today but should be made deliberately rather than by drift.
| Marker | Where | Why it matters | Guidance | Owner |
|---|---|---|---|---|
| Yarn upgrade strategy | getting-started §4 | Yarn 1.x is in maintenance mode. Moving to Yarn 4 (Berry), typically installed and pinned via Corepack, is a real decision. | Either: (a) commit to staying on 1.22.22 indefinitely (defensible, since it works); (b) plan a Yarn 4 upgrade as a tooling project. Document the decision so new joiners aren't surprised. | Platform lead |
| Windows / WSL2 support | getting-started §1 | If the team supports Windows-via-WSL2, the docs need WSL-specific notes. If not, the doc should explicitly say so. | Decide: Linux + macOS only, or also WSL2. Document. Note that Polotno / Puppeteer Chromium support is the typical sticking point on native Windows. | Platform lead |
| Fargate as the new-env standard | first-deployment Appendix A | If new envs should always be Fargate, deprecate Appendix A. | Recommended: standardize on Fargate, deprecate Appendix A, plan to migrate the existing EC2/Lightsail dev and prod to Fargate over the next year. | Platform lead |
How to maintain this doc
This doc and the runbooks it points at are living artifacts. When a marker is resolved:
- In getting-started.md or first-deployment.md, replace the [VERIFY] block with the concrete, copy-pasteable answer.
- In this doc, remove the corresponding row.
- The marker is now closed.
When a new gap is discovered while standing up an environment or onboarding an engineer:
- Add a [VERIFY] block in the relevant runbook section with a brief description of what's unknown.
- Add a row to this doc in the appropriate group with context, guidance, and an owner.
- Now the gap is tracked rather than tribal.
The goal is for this doc to shrink over time until it is empty. An empty verify-markers.md means both runbooks are fully self-sufficient — at which point this doc can be deleted entirely.
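
Progress toward that empty state can be measured by counting the markers that remain in the runbooks. The sketch below assumes the docs are Markdown files under a single directory (the `docs/` path in the usage line is hypothetical) and excludes verify-markers.md itself, since this tracker mentions the marker in its own prose.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the runbooks are *.md files under one directory.

# count_verify_markers DIR  ->  number of [VERIFY] occurrences in the runbooks,
# ignoring this tracker file, which names the marker in its own text.
count_verify_markers() {
  grep -rFo --include='*.md' '[VERIFY]' "$1" \
    | grep -v '/verify-markers\.md:' \
    | wc -l \
    | tr -d ' '
}

# Usage (not executed here):
#   count_verify_markers docs
```

A count that only ever goes down between releases is a reasonable signal that gaps are being closed rather than accumulated.
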
Related
- Getting Started
- First Deployment
- Deployment & DevOps — for ongoing day-to-day deploys
- Configuration Reference — full env-var inventory