Deployment & DevOps¶

This document describes how the someli-api codebase is built and deployed across environments. The reality is more complex than a single pipeline: three distinct deployment pipelines coexist (two for dev, one for UAT), each with different tooling, and production has no deployment automation in this repo at all.

At a glance¶

┌──────────────────────────────────────────────────────────────────────┐
│ DEV environment                                                      │
│   • API: Lightsail / EC2 + nginx + PM2                              │
│   • Deploy on push to dev_api via GitHub Actions (and/or Jenkins)    │
│   • Workers: NOT deployed by either pipeline (manual or none)        │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│ UAT environment (uapi.someli.ai)                                     │
│   • Single ECR image, run as multiple Fargate tasks (one per role)   │
│   • API task + per-worker tasks + Paddle-webhook task — all share    │
│     the same image, with CMD overrides selecting which script runs   │
│   • Image build: manual (developer runs ./push.sh) — no automation   │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│ PROD environment                                                     │
│   • API: EC2 + nginx + PM2 (inferred — same pattern as dev)          │
│   • No deployment automation in this repo                            │
│   • Promotion: branch-merge to `main`; deploy is manual              │
└──────────────────────────────────────────────────────────────────────┘

The three environments are deployed by different mechanisms, on different platforms, with different levels of automation. §6 covers the consequences.

1. Branch Strategy¶

dev_api  ──→  uat_api  ──→  main
   │              │            │
   ▼              ▼            ▼
Dev (CI/CD)   UAT/Fargate   Production
              (manual ECR)  (no automation visible)

Branch	Environment	Deploy mechanism
`dev_api`	Dev	GitHub Actions on push (and Jenkins — see §6.1)
`uat_api`	UAT (Fargate)	Manual `./push.sh` from a developer's workstation
`main`	Production	Not automated in this repo

Feature-branch naming patterns observed: new/, fix/, dev/, feature/.

2. Path 1 — Dev: GitHub Actions → SSH → Lightsail + PM2¶

File: .github/workflows/dev-api-deploy.yml

Trigger: push to dev_api.

Steps:

Checkout (depth 2, for diff stat).
Extract commit metadata: author, email, message, SHA, files changed, +/- counts.
Detect associated PR via the GitHub API (multiple fallback strategies).

SSH into the deploy host and run:

cd /home/ubuntu/someli-api
git pull origin dev_api
source /home/ubuntu/.nvm/nvm.sh
pm2 reload dev_api || pm2 reload 0

Send Microsoft Teams notification (Adaptive Card) on success or failure with author / PR / diff details.

Notably, this pipeline does NOT run yarn install. It assumes the dev host either has dependencies already installed and unchanged, or that someone runs install manually if package.json changes. Adding a new dependency requires manual intervention on the host.

Required secrets:

Secret	Purpose
`LIGHTSAIL_HOST`	Deployment server hostname or IP
`LIGHTSAIL_USER`	SSH user (default `ubuntu`)
`LIGHTSAIL_SSH_KEY`	Private SSH key
`DEV_API_WEBHOOK_URL`	Microsoft Teams webhook

What it deploys: the HTTP API process only. The 39 PM2 worker apps in ecosystem.config.js are not touched.

3. Path 2 — Dev (alternate): Jenkins → SSH → EC2 + PM2¶

File: Jenkinsfile

Trigger: Jenkins job (manual or polled — not specified in the file).

Hardcoded target:

DEPLOY_HOST = '54.189.254.22'           // us-west-2 EC2
DEPLOY_PATH = '/home/ubuntu/someli-api'
BRANCH = 'dev_api'

Stages:

Stage 1 — Deploy and Build¶

SSH into the host using sshagent(['somelijenkins3']).
Generate a temp script with embedded GITHUB_TOKEN; scp to the host; execute.
Inside the host:
git pull or initial clone using the token.
yarn install --frozen-lockfile (fallback to yarn install, then npm install).
Locate or install PM2.

Reload the dev_api PM2 process, or start it fresh:

pm2 start ${DEPLOY_PATH}/server.js \
  --name "dev_api" \
  --node-args="--max-old-space-size=2048"

Fallback: nohup node server.js & if PM2 cannot be installed.
Cleanup token artifacts.

Stage 2 — Health Check¶

10-second wait. Optional curl to /health (commented out).
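
The fixed 10-second wait could be replaced by polling the health endpoint. The following is a minimal sketch, not what the Jenkinsfile does today: `waitForHealthy`, its options, and the assumption that `/health` answers with HTTP 200 once the server is up are all illustrative.

```javascript
// Poll a health probe until it succeeds or the attempts run out.
// `probe` must resolve to an HTTP status code. It is injected so the
// sketch does not depend on any particular HTTP client.
async function waitForHealthy(probe, { attempts = 10, delayMs = 1000, sleep } = {}) {
  const pause = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      const status = await probe();
      if (status === 200) {
        return { healthy: true, attempt };
      }
    } catch (err) {
      // Connection refused while the process restarts: treat as "not yet".
    }
    if (attempt < attempts) {
      await pause(delayMs);
    }
  }
  return { healthy: false, attempt: attempts };
}

// Usage on Node 18+ (global fetch); replace <host> with the deploy target:
//   waitForHealthy(() => fetch("http://<host>/health").then((r) => r.status))
//     .then((result) => process.exit(result.healthy ? 0 : 1));
```

Failing the stage when `healthy` is false would turn the health check into an actual gate rather than a pause.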

Notifications¶

Success: Office365 connector (Teams) message with commit range, committer, developer, change list, build URL.
Failure: Same channel with the last 20 lines of build log.

Required Jenkins credentials:

Credential ID	Purpose
`somelijenkins1`	GitHub token (string)
`somelijenkins3`	SSH key (sshagent)

This pipeline DOES run yarn install on every deploy — unlike the GHA pipeline.

4. Path 3 — UAT: Manual ECR Push → Fargate¶

Files: push.sh, deploy_image.py, Dockerfile, nginx.conf

How it's triggered¶

# push.sh — note the hardcoded developer path
python deploy_image.py \
  --image-tag "uapi" \
  --repository-uri 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat \
  --dockerfile /home/nisanth/someli-api

A developer runs ./push.sh from their workstation. There is no CI trigger.

What `deploy_image.py` does¶

Authenticates to ECR via boto3.
Runs docker build.
Tags the image as 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi.
Pushes to ECR.

The same :uapi tag is overwritten on every push — there is no image versioning.

How the image is consumed¶

The Fargate task definitions, ECS service config, ALB rules, and IAM policies that consume this image live outside this repo (likely in a separate IaC repo or AWS Console). The pattern in use is:

One ECR image, many Fargate tasks. Each task definition pulls the same image but overrides CMD to run a different script:

- The API task: CMD ["node", "server.js"] (or entrypoint.sh running nginx + node)
- One task per worker job: CMD ["node", "job_facebook_publish.js"], etc.
- A Paddle-webhook task

This is consistent with the Dockerfile's comment: # Default command — override with specific job script using docker run args.
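
As a rough sketch of the one-image, many-tasks pattern, the helper below builds ECS container definitions that share an image URI and differ only in `command`. The real task definitions are not in this repo, so the role names and the container naming scheme are hypothetical; only the image URI, `server.js`, and `job_facebook_publish.js` come from this document.

```javascript
// Sketch only: the real task definitions live outside this repo.
// The image URI is the one push.sh pushes to.
const IMAGE = "255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi";

// Role -> script. The role names are hypothetical; the two scripts are
// the ones named in this document.
const ROLE_SCRIPTS = {
  api: "server.js",
  "facebook-publish": "job_facebook_publish.js",
};

// Build one ECS container definition per role. `command` overrides the
// image's default CMD, which is how a single image can serve many roles.
function containerDefinitionFor(role, image = IMAGE) {
  const script = ROLE_SCRIPTS[role];
  if (!script) {
    throw new Error(`Unknown role: ${role}`);
  }
  return {
    name: `someli-${role}`, // hypothetical naming scheme
    image,
    essential: true,
    command: ["node", script],
  };
}

console.log(JSON.stringify(containerDefinitionFor("facebook-publish"), null, 2));
```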

Hostname¶

nginx.conf declares server_name uapi.someli.ai _; and proxies to 127.0.0.1:3000. The uapi.someli.ai hostname is the UAT API endpoint.

5. Path 4 — Production: undocumented¶

There is no production deployment automation in this repo. Inferred state:

The HTTP API runs on a dedicated EC2 host with the same nginx + PM2 pattern as dev.
Production deploys are manual (someone SSHes in, git pulls main, runs pm2 reload).
The worker fleet's production runtime is unclear: it may be Fargate (using the UAT pattern with a different ECR tag), or may also be PM2-on-EC2.

This gap should be closed. Manual prod deployments with no audit trail, no rollback plan, and no documented runbook are not acceptable for a paid-customer platform.

6. Issues & Risks¶

The artifacts in this repo reveal several deployment-pipeline issues that warrant attention.

6.1 Two CI systems on the same branch¶

Both Jenkins (Jenkinsfile) and GitHub Actions (dev-api-deploy.yml) are wired to deploy on dev_api. Possibilities:

Both are live and deploy to different dev hosts (Jenkins → 54.189.254.22, GHA → LIGHTSAIL_HOST secret). In that case, the dev fleet has two parallel APIs being kept in sync, which wastes compute and makes deploy races possible.
Both target the same host, in which case every commit triggers two deploys racing each other — wasted CI and a real risk of mid-deploy collision.
One is legacy and is still wired up but not in active use.

Action required: confirm which is canonical and retire the other.

6.2 Hardcoded values in deploy artifacts¶

File	Hardcoded value	Risk
`Jenkinsfile`	`54.189.254.22` (EC2 IP)	If host is replaced, pipeline breaks silently
`push.sh`	`/home/nisanth/someli-api` (dev workstation path)	UAT releases break if anyone but `nisanth` tries one — truck factor of 1 for UAT releases
`push.sh`	ECR account `255061853867`, region `us-west-2`, tag `uapi`	If account / region / tagging convention changes, manual edit required

Action required: parameterize these via env vars or config.
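
One way to parameterize these values is to read them from the environment and fail loudly when one is missing. This is a minimal sketch under stated assumptions: every variable name below (`DEPLOY_HOST`, `ECR_ACCOUNT_ID`, and so on) is hypothetical and does not exist in the repo today.

```javascript
// Sketch: read deploy settings from environment variables instead of
// hardcoding them. All variable names here are hypothetical.
const REQUIRED = ["DEPLOY_HOST", "ECR_ACCOUNT_ID", "AWS_REGION", "ECR_REPOSITORY"];

function loadDeployConfig(env = process.env) {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Fail loudly rather than deploying to a stale default.
    throw new Error(`Missing deploy settings: ${missing.join(", ")}`);
  }
  return {
    deployHost: env.DEPLOY_HOST,
    // Falls back to the path the Jenkinsfile currently hardcodes.
    deployPath: env.DEPLOY_PATH || "/home/ubuntu/someli-api",
    // Build from wherever the script is run, not from one person's home dir.
    buildContext: env.BUILD_CONTEXT || process.cwd(),
    repositoryUri:
      `${env.ECR_ACCOUNT_ID}.dkr.ecr.${env.AWS_REGION}.amazonaws.com/${env.ECR_REPOSITORY}`,
  };
}
```

With `ECR_ACCOUNT_ID=255061853867`, `AWS_REGION=us-west-2`, and `ECR_REPOSITORY=uat`, `repositoryUri` reproduces the value currently hardcoded in push.sh.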

6.3 The Dockerfile has no install step¶

FROM node:20.18.1-slim
# ... apt install nginx etc. ...
COPY nginx.conf /etc/nginx/nginx.conf
COPY . .
# (no `yarn install` or `npm ci` anywhere)
CMD ["/entrypoint.sh"]

.dockerignore excludes node_modules, so a clean checkout would build an image with no dependencies that fails on startup. Two possible explanations:

push.sh is run from the developer's working directory (/home/nisanth/someli-api), which contains an installed node_modules. If .dockerignore is not applied to that build for some reason, so that node_modules is not actually excluded, the pushed image bakes in whatever dependencies were installed on that workstation. This produces non-reproducible images: every UAT release contains whatever was on the developer's machine at build time.
There is a separate build process not in this repo. In that case, the artifacts here are a partial picture of UAT releases.

Action required: add RUN yarn install --frozen-lockfile --production to the Dockerfile so the build is reproducible from a clean checkout.

6.4 GitHub Actions pipeline doesn't install dependencies¶

The dev-api-deploy.yml pipeline pulls and reloads, but does NOT run yarn install. If a PR adds a dependency, the deploy will silently succeed but the new module will be missing. The next time someone manually SSHes in to install it, behavior changes mid-day with no clear cause.

Action required: add the install step to the GHA workflow (and verify whether the host has cached dependencies that mask this).
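
A hedged sketch of how the remote command sequence could gain a conditional install step. `needsInstall` and `buildRemoteCommands` are hypothetical helpers, not part of the workflow; the sketch also assumes a `yarn.lock` at the repo root (implied by the Jenkins path's `--frozen-lockfile`) and that changed-file paths are relative to the repo root.

```javascript
// Files whose changes mean dependencies must be reinstalled.
// yarn.lock is assumed to exist at the repo root.
const DEPENDENCY_FILES = new Set(["package.json", "yarn.lock"]);

function needsInstall(changedFiles) {
  return changedFiles.some((file) => DEPENDENCY_FILES.has(file));
}

// Build the command line the workflow would run over SSH.
function buildRemoteCommands(changedFiles) {
  const commands = [
    "cd /home/ubuntu/someli-api",
    "git pull origin dev_api",
    "source /home/ubuntu/.nvm/nvm.sh",
  ];
  if (needsInstall(changedFiles)) {
    commands.push("yarn install --frozen-lockfile");
  }
  // Parenthesized so the `||` fallback applies to the reload only and
  // cannot mask a failed pull or install earlier in the chain.
  commands.push("(pm2 reload dev_api || pm2 reload 0)");
  return commands.join(" && ");
}

console.log(buildRemoteCommands(["package.json", "server.js"]));
```

Joining with `&&` stops the sequence at the first failing step, so a failed install no longer ends in a reload of stale code. Running the install unconditionally would be simpler and is equally valid; the conditional form only saves time on deploys that do not touch dependencies.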

6.5 No image versioning¶

push.sh tags the image as :uapi and overwrites it every push. This means:

No ability to roll back to a known-good image without rebuilding.
No correlation between an image and a git commit.
Fargate tasks whose task definition references :uapi pull whatever image currently carries that tag each time a task starts, regardless of testing state.

Action required: tag images with the git SHA (uapi-${git_sha}) and update task definitions on each release.
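
The tagging scheme could look roughly like the sketch below. `buildImageTag` and `buildImageUri` are hypothetical helpers, and the 7-character short SHA is an arbitrary choice; the sketch assumes SHA-1 commit hashes.

```javascript
// Build an immutable image tag such as "uapi-1a2b3c4" from a commit SHA.
function buildImageTag(sha, prefix = "uapi") {
  if (!/^[0-9a-f]{7,40}$/.test(sha)) {
    throw new Error(`Not a git SHA: ${sha}`);
  }
  return `${prefix}-${sha.slice(0, 7)}`;
}

// Full image reference for the given ECR repository and commit.
function buildImageUri(repositoryUri, sha) {
  return `${repositoryUri}:${buildImageTag(sha)}`;
}
```

The SHA would come from `git rev-parse HEAD` at build time. Because each tag maps to exactly one commit, rolling back becomes a matter of pointing the task definition at an earlier tag instead of rebuilding.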

6.6 nginx and Express body limits don't match¶

Layer	Limit
nginx (`nginx.conf`)	100 MB
Express (`server.js`)	150 MB

The smaller limit wins. The Express setting is misleading: requests in the 100–150 MB band are rejected by nginx (HTTP 413) before they ever reach Express. Separately, 100 MB is itself unusually large for an HTTP API.

Action required: decide a single canonical limit (much smaller — see Security), align both layers.
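
A small check like the following could be run in CI to catch the two layers drifting apart. It is a sketch: `parseSize` and `effectiveLimit` are hypothetical helpers, and the `"150mb"` string is an assumption about how the Express limit is written in server.js.

```javascript
// Parse sizes such as "100M" (nginx style) or "150mb" (Express style) to bytes.
const UNITS = { "": 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };

function parseSize(value) {
  const match = /^(\d+)\s*([kmg]?)b?$/i.exec(String(value).trim());
  if (!match) {
    throw new Error(`Unrecognized size: ${value}`);
  }
  return Number(match[1]) * UNITS[match[2].toLowerCase()];
}

// A request must pass every layer, so the smallest limit is the real one.
function effectiveLimit(limits) {
  const entries = Object.entries(limits).map(([layer, v]) => [layer, parseSize(v)]);
  entries.sort((a, b) => a[1] - b[1]);
  const [layer, bytes] = entries[0];
  const aligned = entries.every(([, b]) => b === bytes);
  return { layer, bytes, aligned };
}

console.log(effectiveLimit({ nginx: "100M", express: "150mb" }));
```

For the current values this reports nginx as the binding layer and `aligned: false`; a CI step could fail the build whenever `aligned` is false.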

6.7 No worker deployment in dev¶

The Jenkins and GHA pipelines start only the dev_api PM2 process. The 39 worker apps in ecosystem.config.js are not touched. Either:

Dev has no workers running (and UAT/prod are the only environments where the worker fleet executes — making local debugging of worker bugs hard), or
Workers are started manually via pm2 start ecosystem.config.js once and left alone (in which case they don't get reloaded when their code changes — silent staleness).

Action required: decide whether dev needs workers, and either deploy them via the same pipeline or document that they're production-only.

6.8 No production automation¶

No artifact in this repo deploys to production. Promotions to main happen via merge, but the deploy is human-driven on an unknown machine. No rollback plan, no audit trail, no health-gating.

Action required: add a production deploy pipeline (Path 1 or Path 3 pattern, hardened) before the next major release.

6.9 ecosystem.config.js script-path drift¶

Per Jobs Inventory § Known Discrepancies, ecosystem.config.js references at least 7 nonexistent or wrong-path script files. PM2 will fail to start them. If the same file is consumed verbatim in a Fargate task definition spawn list, those tasks will crash-loop.

Action required: reconcile ecosystem.config.js against the actual script tree.
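
The reconciliation can be automated with a short script. The sketch below assumes ecosystem.config.js uses the standard PM2 layout (`module.exports = { apps: [...] }`), which this document does not confirm; `findMissingScripts` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Return the ecosystem entries whose `script` does not exist on disk.
// `exists` is injectable so the check can be tested without a real tree.
function findMissingScripts(apps, rootDir, exists = fs.existsSync) {
  return apps
    .filter((app) => !exists(path.resolve(rootDir, app.script)))
    .map((app) => ({ name: app.name, script: app.script }));
}

// Usage from the repo root (assumes the standard PM2 export shape):
//   const { apps } = require("./ecosystem.config.js");
//   console.table(findMissingScripts(apps, __dirname));
```

Run in CI, a non-empty result would catch script-path drift before PM2 or a Fargate task fails at startup.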

7. Shared Runtime Configuration¶

Regardless of which path deploys it, the running application has the same shape.

Docker image (UAT/Fargate)¶

Base: node:20.18.1-slim (multi-arch — supports both linux/amd64 and linux/arm64).

System deps in the image: nginx, libnss3, libexpat1, fontconfig (for Puppeteer/Chromium).

Entrypoint:

#!/bin/bash
nginx -g "daemon off;" &
node server.js

Exposed ports: 80 (nginx), 8080 (alternate).

Build: docker build -t someli-api:tag . (but see §6.3 for the missing install step).

nginx (`nginx.conf`)¶

listen     80
server_name uapi.someli.ai _;
location / { proxy_pass http://127.0.0.1:3000; }
client_max_body_size 100M;
gzip on;

Logs go to /var/log/nginx/access.log and error.log.

Application (`server.js`)¶

Setting	Value
Port	`process.env.port || conf.port || 5002` (binds 3000 in production behind nginx)
Body parser	150 MB (JSON), capped to 100 MB in practice by nginx; 50,000 parameter limit
File upload	enabled via `express-fileupload`
Heap limit (PM2)	2 GB (`--max-old-space-size=2048`)
Graceful shutdown	SIGTERM and SIGINT handlers close DB connection before exit (`server.js:205–214`)

Health endpoints¶

Endpoint	Purpose
`GET /health`	Liveness — server status, port, environment, timestamp
`GET /db-health`	Readiness — database connection & query test

Both should be wired to load-balancer health checks. Per Logging & Observability, they are currently the entire health-monitoring surface: there are no metrics or trace endpoints.

8. PM2 Process Management¶

File: ecosystem.config.js

Defines 39 PM2-managed processes. The HTTP API server is started separately by the deploy pipelines, not by pm2 start ecosystem.config.js.

App-config shape¶

{
  name: "job_name",
  script: "./job_file.js",
  args: "optional_args",
  autorestart: true | false
}

No error_file, out_file, log_date_format, cron_restart, or memory-limit directives are set on any of the 39 entries. Logs go to PM2's default location.
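
If explicit log and memory settings are wanted, they could be applied uniformly rather than edited into 39 entries by hand. This is a sketch only: `withOperationalDefaults`, the log directory, and the 512M ceiling are illustrative values, not project settings. The option names themselves (`error_file`, `out_file`, `log_date_format`, `max_memory_restart`) are standard PM2 ecosystem options.

```javascript
// Illustrative location; the project does not define a log directory.
const LOG_DIR = "/var/log/someli";

// Apply shared defaults to one ecosystem entry. The entry is spread last,
// so any setting it already declares wins over the default.
function withOperationalDefaults(app) {
  return {
    error_file: `${LOG_DIR}/${app.name}.err.log`,
    out_file: `${LOG_DIR}/${app.name}.out.log`,
    log_date_format: "YYYY-MM-DD HH:mm:ss Z",
    max_memory_restart: "512M", // illustrative ceiling
    ...app,
  };
}

console.log(withOperationalDefaults({ name: "job_name", script: "./job_file.js", autorestart: true }));
```

In ecosystem.config.js this would be used as `apps: entries.map(withOperationalDefaults)`, where `entries` is the existing array.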

Categories represented¶

Category	Examples
Token refresh	`job_refresh_fb_token`, `job_weekly_refresh_fb_token`, `vertex_token`
Content generation	`job_design_generation`, `job_new_post_generation`, `subject_content_generation`
AI processing	`brandPositioning-AI-Response`, `objective-AI-Response`, `recomSubjects-AI-Response`
Scheduling & feed	`job_auto_post`, `job_post_schedule` (`App Friday_Cron`)
Media & validation	`job_post_color_image_verification`, `media_image_correction` (autorestart, args=100)
Data processing	`job_invoice_paid`, `job_library_search_terms`, `job_updateTemplateValid`
Insights	`job_account_insights`, `post_insights`
Other	`job_auto_disconnect`, `job_website_info`

For the full list and discrepancies between ecosystem.config.js and the actual file tree, see Jobs Inventory.

Common PM2 commands¶

pm2 start ecosystem.config.js     # Start every entry in the ecosystem
pm2 reload dev_api                # Zero-downtime reload of the API
pm2 list                          # Process list
pm2 logs job_design_generation    # Tail one process
pm2 monit                         # Live monitoring TUI
pm2 save                          # Persist current process list across reboots
pm2 restart all                   # Restart everything

9. Port Mappings¶

Service	Internal	External	Notes
Node.js API	3000	—	Behind nginx
Nginx	—	80	Public HTTP
Docker secondary	—	8080	Optional
MySQL (RDS)	3306	—	AWS internal

10. Infrastructure Overview¶

Component	Service	Region
Dev API host	AWS Lightsail and/or EC2	us-west-2
UAT runtime	AWS Fargate (ECS)	us-west-2
ECR registry	AWS ECR account `255061853867`	us-west-2
Production API host	AWS EC2 (assumed)	us-west-2 (assumed)
Database	AWS RDS (MySQL)	us-west-1
Primary storage	AWS S3 (`someli`)	us-west-2
Media storage	AWS S3 (`someli-medialib`)	us-west-1
AI inference	AWS Bedrock	us-east-1
Knowledge base	Google Cloud Storage	us-central1
AI (Gemini / Vertex)	Google Cloud	us-central1

Cross-region traffic (us-west-2 ↔ us-west-1 for DB and media; us-west-2 ↔ us-east-1 for Bedrock; us-west-2 ↔ us-central1 for Google) implies non-trivial egress cost and per-call latency. This topology should be reviewed against actual usage patterns.

11. Local Development Setup¶

# Install dependencies
yarn install

# Start with hot-reload (nodemon)
npm run dev

# Or start production-style
node server.js

# Start the worker fleet (optional in dev — usually skipped)
pm2 start ecosystem.config.js

Prerequisites:

Node.js 20.18.1 (recommended via NVM: nvm install 20.18.1 && nvm use 20.18.1)
MySQL/MariaDB (local or accessible RDS instance)
.env file populated — see Configuration Reference
pm2 installed globally if you want to run workers locally: npm install -g pm2
For media-related jobs: Sharp + Puppeteer system dependencies (libnss3, fontconfig)

12. Deployment Checklist¶

Before promoting a build to UAT or production:

Required environment variables set (Configuration Reference)
Schema migrations applied on the target DB — manual; no migration tool exists (see Enterprise Readiness §5.4)
S3 buckets reachable with the deploy environment's credentials
OAuth callback URLs match the target APP_URL
Paddle webhook URLs updated for the environment
Health endpoints respond: GET /health, GET /db-health
nginx active on port 80, proxying to Node on 3000
PM2 ecosystem started for background workers (if applicable to the environment)
For UAT: Fargate task definitions point to the new image tag (verify in AWS Console — task defs are not in this repo)
Smoke test the high-risk paths: login, scheduled-post creation, design render

Related documents:

Jobs Inventory: full PM2 catalog and the discrepancies with ecosystem.config.js
Configuration Reference: env vars per environment
Enterprise Readiness Assessment: strategic recommendations to consolidate the three pipelines and add IaC
Security: body-limit, CORS, secrets concerns relevant to deployment
Logging & Observability: PM2-log dependence and gaps in production observability