Deployment & DevOps
This document describes how the someli-api codebase is built and deployed across environments. There is no single pipeline: three deployment paths coexist (two for dev, one for UAT), each with different tooling, and production has no automation in this repo.
At a glance
┌──────────────────────────────────────────────────────────────────────┐
│ DEV environment │
│ • API: Lightsail / EC2 + nginx + PM2 │
│ • Deploy on push to dev_api via GitHub Actions (and/or Jenkins) ── │
│ • Workers: NOT deployed by either pipeline (manual or none) │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ UAT environment (uapi.someli.ai) │
│ • Single ECR image, run as multiple Fargate tasks (one per role) │
│ • API task + per-worker tasks + Paddle-webhook task — all share │
│ the same image, with CMD overrides selecting which script runs │
│ • Image build: manual (developer runs ./push.sh) — no automation │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ PROD environment │
│ • API: EC2 + nginx + PM2 (inferred — same pattern as dev) │
│ • No deployment automation in this repo │
│ • Promotion: branch-merge to `main`; deploy is manual │
└──────────────────────────────────────────────────────────────────────┘
The three environments are deployed by different mechanisms, on different platforms, with different levels of automation. §6 covers the consequences.
1. Branch Strategy
dev_api ────→ uat_api ─────→ main
   │             │             │
   ▼             ▼             ▼
Dev (CI/CD)   UAT/Fargate    Production
              (manual ECR)   (no automation visible)
| Branch | Environment | Deploy mechanism |
|---|---|---|
| dev_api | Dev | GitHub Actions on push (and Jenkins, see §6.1) |
| uat_api | UAT (Fargate) | Manual ./push.sh from a developer's workstation |
| main | Production | Not automated in this repo |
Feature-branch naming patterns observed: new/, fix/, dev/, feature/.
2. Path 1 - Dev: GitHub Actions → SSH → Lightsail + PM2
File: .github/workflows/dev-api-deploy.yml
Trigger: push to dev_api.
Steps:
- Checkout (depth 2, for diff stat).
- Extract commit metadata: author, email, message, SHA, files changed, +/- counts.
- Detect associated PR via the GitHub API (multiple fallback strategies).
- SSH into the deploy host, pull the latest code, and reload the API process.
- Send Microsoft Teams notification (Adaptive Card) on success or failure with author / PR / diff details.
Notably, this pipeline does NOT run yarn install. It assumes the dev host either has dependencies already installed and unchanged, or that someone runs install manually if package.json changes. Adding a new dependency requires manual intervention on the host.
Required secrets:
| Secret | Purpose |
|---|---|
| LIGHTSAIL_HOST | Deployment server hostname or IP |
| LIGHTSAIL_USER | SSH user (default ubuntu) |
| LIGHTSAIL_SSH_KEY | Private SSH key |
| DEV_API_WEBHOOK_URL | Microsoft Teams webhook |
What it deploys: the HTTP API process only. The 39 PM2 worker apps in ecosystem.config.js are not touched.
3. Path 2 - Dev (alternate): Jenkins → SSH → EC2 + PM2
File: Jenkinsfile
Trigger: Jenkins job (manual or polled — not specified in the file).
Hardcoded target:
DEPLOY_HOST = '54.189.254.22' // us-west-2 EC2
DEPLOY_PATH = '/home/ubuntu/someli-api'
BRANCH = 'dev_api'
Stages:
Stage 1 - Deploy and Build
- SSH into the host using sshagent(['somelijenkins3']).
- Generate a temp script with embedded GITHUB_TOKEN; scp to the host; execute.
- Inside the host:
  - git pull or initial clone using the token.
  - yarn install --frozen-lockfile (fallback to yarn install, then npm install).
  - Locate or install PM2.
  - Reload the dev_api PM2 process, or start it fresh.
  - Fallback: nohup node server.js & if PM2 cannot be installed.
  - Cleanup token artifacts.
Stage 2 - Health Check
- 10-second wait. Optional curl to /health (commented out).
Notifications
- Success: Office365 connector (Teams) message with commit range, committer, developer, change list, build URL.
- Failure: Same channel with the last 20 lines of build log.
Required Jenkins credentials:
| Credential ID | Purpose |
|---|---|
| somelijenkins1 | GitHub token (string) |
| somelijenkins3 | SSH key (sshagent) |
This pipeline DOES run yarn install on every deploy — unlike the GHA pipeline.
4. Path 3 - UAT: Manual ECR Push → Fargate
Files: push.sh, deploy_image.py, Dockerfile, nginx.conf
How it's triggered
# push.sh — note the hardcoded developer path
python deploy_image.py \
--image-tag "uapi" \
--repository-uri 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat \
--dockerfile /home/nisanth/someli-api
A developer runs ./push.sh from their workstation. There is no CI trigger.
What deploy_image.py does
- Authenticates to ECR via boto3.
- Runs docker build.
- Tags the image as 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi.
- Pushes to ECR.
The same :uapi tag is overwritten on every push — there is no image versioning.
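The steps listed above can be sketched as follows. This is not the content of deploy_image.py, which is not reproduced in this document; the helper names decode_ecr_token and docker_commands are hypothetical. The sketch relies on one known fact, that an ECR authorization token is the base64 encoding of AWS:<password>, and it leaves out the boto3 call that fetches the token so that it runs without AWS credentials.

```python
import base64


def decode_ecr_token(authorization_token: str) -> tuple[str, str]:
    """Split an ECR authorization token into (username, password).

    ECR returns base64("AWS:<password>"). Fetching the token (for example
    with boto3's ECR client) is deliberately omitted from this sketch.
    """
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def docker_commands(repository_uri: str, image_tag: str, context_dir: str) -> list[list[str]]:
    """Return the docker build and push invocations as argv lists (not executed)."""
    image = f"{repository_uri}:{image_tag}"
    return [
        # Building with -t applies the final tag in the same step.
        ["docker", "build", "-t", image, context_dir],
        ["docker", "push", image],
    ]
```

Returning argv lists instead of running them keeps the sketch side-effect free; a real script would pass each list to subprocess.run after logging in to the registry.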
How the image is consumed
The Fargate task definitions, ECS service config, ALB rules, and IAM policies that consume this image live outside this repo (likely in a separate IaC repo or AWS Console). The pattern in use is:
One ECR image, many Fargate tasks. Each task definition pulls the same image but overrides CMD to run a different script:
- The API task: CMD ["node", "server.js"] (or entrypoint.sh running nginx + node)
- One task per worker job: CMD ["node", "job_facebook_publish.js"], etc.
- A Paddle-webhook task
This is consistent with the Dockerfile's comment: # Default command — override with specific job script using docker run args.
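Because the task definitions are not in this repo, the following is only an illustration of the one-image, many-tasks pattern, not a copy of the real configuration. It builds ECS container-definition dictionaries that share one image and differ only in their command override. The role names and the helpers container_definition and definitions_for are hypothetical; the two script names are the ones quoted above.

```python
IMAGE = "255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi"

# Role name -> command override. The role names are illustrative only.
ROLES = {
    "api": ["node", "server.js"],
    "facebook-publish": ["node", "job_facebook_publish.js"],
}


def container_definition(name: str, image: str, command: list[str]) -> dict:
    """Build one entry for the containerDefinitions list of an ECS task definition."""
    return {"name": name, "image": image, "command": command, "essential": True}


def definitions_for(roles: dict[str, list[str]], image: str) -> list[dict]:
    """Every role shares the same image; only the command differs."""
    return [container_definition(name, image, cmd) for name, cmd in roles.items()]
```

Calling definitions_for(ROLES, IMAGE) yields one dictionary per role, which is the shape that would be serialized into each task definition.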
Hostname
nginx.conf declares server_name uapi.someli.ai _; and proxies to 127.0.0.1:3000. The uapi.someli.ai hostname is the UAT API endpoint.
5. Path 4 - Production: undocumented
There is no production deployment automation in this repo. Inferred state:
- The HTTP API runs on a dedicated EC2 host with the same nginx + PM2 pattern as dev.
- Production deploys are manual (someone SSHes in, git pulls main, runs pm2 reload).
- The worker fleet's production runtime is unclear: it may be Fargate (using the UAT pattern with a different ECR tag), or may also be PM2-on-EC2.
This gap should be closed. Manual prod deployments with no audit trail, no rollback plan, and no documented runbook are not acceptable for a paid-customer platform.
6. Issues & Risks
The artifacts in this repo reveal several deployment-pipeline issues that warrant attention.
6.1 Two CI systems on the same branch
Both Jenkins (Jenkinsfile) and GitHub Actions (dev-api-deploy.yml) are wired to deploy on dev_api. Possibilities:
- Both are live and deploy to different dev hosts (Jenkins → 54.189.254.22, GHA → LIGHTSAIL_HOST secret). In that case, the dev fleet has two parallel APIs being kept in sync: wasted compute, and deploy races are possible.
- Both target the same host. In that case, every commit triggers two deploys racing each other: wasted CI and a real risk of mid-deploy collision.
- One is legacy and is still wired up but not in active use.
Action required: confirm which is canonical and retire the other.
6.2 Hardcoded values in deploy artifacts
| File | Hardcoded value | Risk |
|---|---|---|
| Jenkinsfile | 54.189.254.22 (EC2 IP) | If host is replaced, pipeline breaks silently |
| push.sh | /home/nisanth/someli-api (dev workstation path) | UAT releases break if anyone but nisanth tries one (truck factor of 1 for UAT releases) |
| push.sh | ECR account 255061853867, region us-west-2, tag uapi | If account / region / tagging convention changes, manual edit required |
Action required: parameterize these via env vars or config.
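One way to parameterize the push settings is to read them from the environment, as in the minimal sketch below. The variable names ECR_REPOSITORY_URI, IMAGE_TAG, and BUILD_CONTEXT, and the PushConfig and load_push_config names, are hypothetical and do not exist in the repo today.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PushConfig:
    repository_uri: str
    image_tag: str
    build_context: str


def load_push_config(env=None) -> PushConfig:
    """Read push settings from environment variables.

    All variable names are assumptions made for this sketch. The repository
    URI is required; the tag and build context fall back to defaults.
    """
    env = os.environ if env is None else env
    repository_uri = env.get("ECR_REPOSITORY_URI")
    if not repository_uri:
        raise ValueError("ECR_REPOSITORY_URI must be set")
    return PushConfig(
        repository_uri=repository_uri,
        image_tag=env.get("IMAGE_TAG", "uapi"),
        # Default to the current directory rather than one developer's home path.
        build_context=env.get("BUILD_CONTEXT", "."),
    )
```

Failing fast on a missing repository URI means a misconfigured workstation stops before any image is built, instead of pushing to an unintended registry.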
6.3 The Dockerfile has no install step
FROM node:20.18.1-slim
# ... apt install nginx etc. ...
COPY nginx.conf /etc/nginx/nginx.conf
COPY . .
# (no `yarn install` or `npm ci` anywhere)
CMD ["/entrypoint.sh"]
.dockerignore excludes node_modules, so a build from a clean checkout would produce an image with no dependencies, and that image would fail on startup. Two possible explanations:
- push.sh is run from the developer's working directory (/home/nisanth/someli-api), which contains an installed node_modules. If that directory is not actually excluded (for some reason the .dockerignore rule does not take effect in the developer's checkout), the pushed image bakes in dependencies installed on nisanth's laptop. Such images are not reproducible: every UAT release contains whatever was on the dev laptop at build time.
- There is a separate build process not in this repo. In that case, the artifacts here are a partial picture of UAT releases.
Action required: add RUN yarn install --frozen-lockfile --production to the Dockerfile so the build is reproducible from a clean checkout.
6.4 GitHub Actions pipeline doesn't install dependencies
The dev-api-deploy.yml pipeline pulls and reloads, but does NOT run yarn install. If a PR adds a dependency, the deploy will silently succeed but the new module will be missing. The next time someone manually SSHes in to install it, behavior changes mid-day with no clear cause.
Action required: add the install step to the GHA workflow (and verify whether the host has cached dependencies that mask this).
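The workflow already collects the list of changed files, so one option is to run the install only when a dependency manifest changed. The sketch below shows that decision in isolation; the function names are hypothetical, and the command strings are stand-ins for whatever the workflow actually runs on the host. Running the install unconditionally is the simpler alternative and is equally valid.

```python
# Files whose change means the installed node_modules may be stale.
DEPENDENCY_MANIFESTS = {"package.json", "yarn.lock"}


def needs_install(changed_files: list[str]) -> bool:
    """Return True if any changed path is a dependency manifest at the repo root."""
    return any(path in DEPENDENCY_MANIFESTS for path in changed_files)


def deploy_steps(changed_files: list[str]) -> list[str]:
    """Return the shell commands a deploy would run, in order (not executed)."""
    steps = ["git pull"]
    if needs_install(changed_files):
        steps.append("yarn install --frozen-lockfile")
    steps.append("pm2 reload dev_api")
    return steps
```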
6.5 No image versioning
push.sh tags the image as :uapi and overwrites it every push. This means:
- No ability to roll back to a known-good image without rebuilding.
- No correlation between an image and a git commit.
- ECS Fargate task definitions referencing :uapi always pull the latest, regardless of testing state.
Action required: tag images with the git SHA (uapi-${git_sha}) and update task definitions on each release.
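A minimal sketch of SHA-based tagging, assuming the build runs inside a git checkout. The helper names current_git_sha and versioned_image are hypothetical; the uapi prefix follows the convention proposed above.

```python
import re
import subprocess


def current_git_sha(short: int = 12) -> str:
    """Return the abbreviated SHA of HEAD; must be run inside a git checkout."""
    out = subprocess.run(
        ["git", "rev-parse", f"--short={short}", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def versioned_image(repository_uri: str, prefix: str, git_sha: str) -> str:
    """Build an image reference of the form <repo>:<prefix>-<sha>."""
    # Reject anything that is not a hex SHA so a mutable tag cannot slip in.
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return f"{repository_uri}:{prefix}-{git_sha}"
```

With immutable tags, rolling back becomes a matter of pointing the task definition at a previous tag rather than rebuilding.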
6.6 nginx and Express body limits don't match
| Layer | Limit |
|---|---|
| nginx (nginx.conf) | 100 MB |
| Express (server.js) | 150 MB |
The smaller limit wins. The Express setting is misleading: a request body between 100 and 150 MB never reaches Express, because nginx rejects it first with 413 Request Entity Too Large. Also, 100 MB is itself unusually large for an HTTP API.
Action required: decide a single canonical limit (much smaller — see Security), align both layers.
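The "smallest limit wins" rule can be made concrete with a short sketch. It assumes both layers count a megabyte as 1024 × 1024 bytes; parse_size and effective_limit are hypothetical helpers, not code from the repo.

```python
def parse_size(text: str) -> int:
    """Parse a size such as '100M' or '150mb' into bytes (k, m, g suffixes, base 1024)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    value = text.strip().lower().removesuffix("b")
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def effective_limit(*limits: str) -> int:
    """A request must pass every layer, so the smallest limit is the real one."""
    return min(parse_size(limit) for limit in limits)
```

A check like effective_limit("100M", "150mb") could run in CI to flag the two layers drifting apart again after they are aligned.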
6.7 No worker deployment in dev
The Jenkins and GHA pipelines start only the dev_api PM2 process. The 39 worker apps in ecosystem.config.js are not touched. Either:
- Dev has no workers running, so UAT and prod are the only environments where the worker fleet executes and worker bugs are hard to debug before they reach UAT, or
- Workers are started manually via pm2 start ecosystem.config.js once and left alone, in which case they are not reloaded when their code changes (silent staleness).
Action required: decide whether dev needs workers, and either deploy them via the same pipeline or document that they're production-only.
6.8 No production automation
No artifact in this repo deploys to production. Promotions to main happen via merge, but the deploy is human-driven on an unknown machine. No rollback plan, no audit trail, no health-gating.
Action required: add a production deploy pipeline (Path 1 or Path 3 pattern, hardened) before the next major release.
6.9 ecosystem.config.js script-path drift
Per Jobs Inventory § Known Discrepancies, ecosystem.config.js references at least 7 nonexistent or wrong-path script files. PM2 will fail to start them. If the same file is consumed verbatim in a Fargate task definition spawn list, those tasks will crash-loop.
Action required: reconcile ecosystem.config.js against the actual script tree.
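The reconciliation can be automated with a small check like the sketch below. It assumes the script paths have already been extracted from ecosystem.config.js into a plain list (for example by dumping the config as JSON with node); that extraction step is not shown, and missing_scripts is a hypothetical helper.

```python
from pathlib import Path


def missing_scripts(script_paths: list[str], repo_root: str) -> list[str]:
    """Return the script paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [p for p in script_paths if not (root / p).is_file()]
```

Run as a CI step, a non-empty result would fail the build before PM2 or a Fargate task ever tries to start a nonexistent script.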
7. Shared Runtime Configuration
Regardless of which path deploys it, the running application has the same shape.
Docker image (UAT/Fargate)
Base: node:20.18.1-slim (multi-arch — supports both linux/amd64 and linux/arm64).
System deps in the image: nginx, libnss3, libexpat1, fontconfig (for Puppeteer/Chromium).
Entrypoint: /entrypoint.sh (the image's default CMD, per the Dockerfile excerpt in §6.3).
Exposed ports: 80 (nginx), 8080 (alternate).
Build: docker build -t someli-api:tag . (but see §6.3 for the missing install step).
nginx (nginx.conf)
listen 80;
server_name uapi.someli.ai _;
location / { proxy_pass http://127.0.0.1:3000; }
client_max_body_size 100M;
gzip on;
Logs go to /var/log/nginx/access.log and error.log.
Application (server.js)
| Setting | Value |
|---|---|
| Port | process.env.port \|\| conf.port \|\| 5002 (binds 3000 in production behind nginx) |
| Body parser | 150 MB (JSON), 50,000 parameter limit (capped to 100 MB by nginx) |
| File upload | enabled via express-fileupload |
| Heap limit (PM2) | 2 GB (--max-old-space-size=2048) |
| Graceful shutdown | SIGTERM and SIGINT handlers close DB connection before exit (server.js:205–214) |
Health endpoints
| Endpoint | Purpose |
|---|---|
| GET /health | Liveness: server status, port, environment, timestamp |
| GET /db-health | Readiness: database connection & query test |
Both should be wired to load-balancer health checks. Per Logging & Observability, they are currently the entire health-monitoring surface: there are no metrics or trace endpoints.
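A post-deploy gate could poll both endpoints, as in the sketch below. It uses only the standard library and treats HTTP 200 as healthy, which is an assumption about how the endpoints report status. The names http_status and check_health are hypothetical.

```python
import urllib.error
import urllib.request

HEALTH_PATHS = ("/health", "/db-health")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for a GET, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def check_health(base_url: str, fetch=http_status) -> dict[str, bool]:
    """Map each health path to True when it answered with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in HEALTH_PATHS}
```

The fetch parameter exists so the check can be exercised with a stub; in a pipeline, check_health would be called with the environment's base URL and the deploy marked failed if any value is False.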
8. PM2 Process Management
File: ecosystem.config.js
Defines 39 PM2-managed processes. The HTTP API server is started separately by the deploy pipelines, not by pm2 start ecosystem.config.js.
App-config shape
No error_file, out_file, log_date_format, cron_restart, or memory-limit directives are set on any of the 39 entries. Logs go to PM2's default location.
Categories represented
| Category | Examples |
|---|---|
| Token refresh | job_refresh_fb_token, job_weekly_refresh_fb_token, vertex_token |
| Content generation | job_design_generation, job_new_post_generation, subject_content_generation |
| AI processing | brandPositioning-AI-Response, objective-AI-Response, recomSubjects-AI-Response |
| Scheduling & feed | job_auto_post, job_post_schedule (App Friday_Cron) |
| Media & validation | job_post_color_image_verification, media_image_correction (autorestart, args=100) |
| Data processing | job_invoice_paid, job_library_search_terms, job_updateTemplateValid |
| Insights | job_account_insights, post_insights |
| Other | job_auto_disconnect, job_website_info |
For the full list and discrepancies between ecosystem.config.js and the actual file tree, see Jobs Inventory.
Common PM2 commands
pm2 start ecosystem.config.js # Start every entry in the ecosystem
pm2 reload dev_api # Zero-downtime reload of the API
pm2 list # Process list
pm2 logs job_design_generation # Tail one process
pm2 monit # Live monitoring TUI
pm2 save # Save the current process list (restored at boot when pm2 startup is set up)
pm2 restart all # Restart everything
9. Port Mappings
| Service | Internal | External | Notes |
|---|---|---|---|
| Node.js API | 3000 | — | Behind nginx |
| Nginx | — | 80 | Public HTTP |
| Docker secondary | — | 8080 | Optional |
| MySQL (RDS) | 3306 | — | AWS internal |
10. Infrastructure Overview
| Component | Service | Region |
|---|---|---|
| Dev API host | AWS Lightsail and/or EC2 | us-west-2 |
| UAT runtime | AWS Fargate (ECS) | us-west-2 |
| ECR registry | AWS ECR account 255061853867 | us-west-2 |
| Production API host | AWS EC2 (assumed) | us-west-2 (assumed) |
| Database | AWS RDS (MySQL) | us-west-1 |
| Primary storage | AWS S3 (someli) | us-west-2 |
| Media storage | AWS S3 (someli-medialib) | us-west-1 |
| AI inference | AWS Bedrock | us-east-1 |
| Knowledge base | Google Cloud Storage | us-central1 |
| AI (Gemini / Vertex) | Google Cloud | us-central1 |
Cross-region traffic (us-west-2 ↔ us-west-1 for DB and media; us-west-2 ↔ us-east-1 for Bedrock; us-west-2 ↔ us-central1 for Google) implies non-trivial egress cost and per-call latency. This topology should be reviewed against actual usage patterns.
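As a starting point for that review, the cross-region dependencies can be listed mechanically from the table. The sketch below hardcodes the regions from the table and treats us-west-2 as the compute region; cross_region is a hypothetical helper.

```python
COMPUTE_REGION = "us-west-2"

# Dependency -> region, copied from the infrastructure table.
DEPENDENCIES = {
    "Database (RDS MySQL)": "us-west-1",
    "Primary storage (S3 someli)": "us-west-2",
    "Media storage (S3 someli-medialib)": "us-west-1",
    "AI inference (Bedrock)": "us-east-1",
    "Knowledge base (GCS)": "us-central1",
    "AI (Gemini / Vertex)": "us-central1",
}


def cross_region(dependencies: dict[str, str], home: str) -> dict[str, str]:
    """Return the dependencies whose region differs from the compute region."""
    return {name: region for name, region in dependencies.items() if region != home}
```

Only the primary S3 bucket shares a region with the compute hosts; every other dependency in the table is a cross-region call.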
11. Local Development Setup
# Install dependencies
yarn install
# Start with hot-reload (nodemon)
npm run dev
# Or start production-style
node server.js
# Start the worker fleet (optional in dev — usually skipped)
pm2 start ecosystem.config.js
Prerequisites:
- Node.js 20.18.1 (recommended via NVM: nvm install 20.18.1 && nvm use 20.18.1)
- MySQL/MariaDB (local or accessible RDS instance)
- .env file populated (see Configuration Reference)
- pm2 installed globally if you want to run workers locally: npm install -g pm2
- For media-related jobs: Sharp + Puppeteer system dependencies (libnss3, fontconfig)
12. Deployment Checklist
Before promoting a build to UAT or production:
- Required environment variables set (Configuration Reference)
- Schema migrations applied on the target DB (manual; no migration tool exists, see Enterprise Readiness §5.4)
- S3 buckets reachable with the deploy environment's credentials
- OAuth callback URLs match the target APP_URL
- Paddle webhook URLs updated for the environment
- Health endpoints respond: GET /health, GET /db-health
- nginx active on port 80, proxying to Node on 3000
- PM2 ecosystem started for background workers (if applicable to the environment)
- For UAT: Fargate task definitions point to the new image tag (verify in AWS Console; task defs are not in this repo)
- Smoke test the high-risk paths: login, scheduled-post creation, design render
13. Related
- Jobs Inventory: full PM2 catalog and the discrepancies with ecosystem.config.js
- Configuration Reference: env vars per environment
- Enterprise Readiness Assessment: strategic recommendations to consolidate the three pipelines and add IaC
- Security: body-limit, CORS, secrets concerns relevant to deployment
- Logging & Observability: PM2-log dependence and gaps in production observability