Deployment & DevOps

This document describes how the someli-api codebase is built and deployed across environments. There is no single pipeline: several deployment paths coexist across three environments, each with different tooling.


At a glance

┌──────────────────────────────────────────────────────────────────────┐
│ DEV environment                                                      │
│   • API: Lightsail / EC2 + nginx + PM2                               │
│   • Deploy on push to dev_api via GitHub Actions (and/or Jenkins)    │
│   • Workers: NOT deployed by either pipeline (manual or none)        │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│ UAT environment (uapi.someli.ai)                                     │
│   • Single ECR image, run as multiple Fargate tasks (one per role)   │
│   • API task + per-worker tasks + Paddle-webhook task — all share    │
│     the same image, with CMD overrides selecting which script runs   │
│   • Image build: manual (developer runs ./push.sh) — no automation   │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│ PROD environment                                                     │
│   • API: EC2 + nginx + PM2 (inferred — same pattern as dev)          │
│   • No deployment automation in this repo                            │
│   • Promotion: branch-merge to `main`; deploy is manual              │
└──────────────────────────────────────────────────────────────────────┘

The three environments are deployed by different mechanisms, on different platforms, with different levels of automation. Section §6 covers the consequences.


1. Branch Strategy

dev_api  ──→  uat_api  ──→  main
   │              │            │
   ▼              ▼            ▼
Dev (CI/CD)   UAT/Fargate   Production
              (manual ECR)  (no automation visible)

| Branch | Environment | Deploy mechanism |
|---|---|---|
| dev_api | Dev | GitHub Actions on push (and Jenkins, see §6.1) |
| uat_api | UAT (Fargate) | Manual ./push.sh from a developer's workstation |
| main | Production | Not automated in this repo |

Feature-branch naming patterns observed: new/, fix/, dev/, feature/.


2. Path 1 — Dev: GitHub Actions → SSH → Lightsail + PM2

File: .github/workflows/dev-api-deploy.yml

Trigger: push to dev_api.

Steps:

  1. Checkout (depth 2, for diff stat).
  2. Extract commit metadata: author, email, message, SHA, files changed, +/- counts.
  3. Detect associated PR via the GitHub API (multiple fallback strategies).
  4. SSH into the deploy host and run:
    cd /home/ubuntu/someli-api
    git pull origin dev_api
    source /home/ubuntu/.nvm/nvm.sh
    pm2 reload dev_api || pm2 reload 0
    
  5. Send Microsoft Teams notification (Adaptive Card) on success or failure with author / PR / diff details.

Notably, this pipeline does NOT run yarn install. It assumes the dev host either has dependencies already installed and unchanged, or that someone runs install manually if package.json changes. Adding a new dependency requires manual intervention on the host.

Required secrets:

| Secret | Purpose |
|---|---|
| LIGHTSAIL_HOST | Deployment server hostname or IP |
| LIGHTSAIL_USER | SSH user (default ubuntu) |
| LIGHTSAIL_SSH_KEY | Private SSH key |
| DEV_API_WEBHOOK_URL | Microsoft Teams webhook |

What it deploys: the HTTP API process only. The 39 PM2 worker apps in ecosystem.config.js are not touched.


3. Path 2 — Dev (alternate): Jenkins → SSH → EC2 + PM2

File: Jenkinsfile

Trigger: Jenkins job (manual or polled — not specified in the file).

Hardcoded target:

DEPLOY_HOST = '54.189.254.22'           // us-west-2 EC2
DEPLOY_PATH = '/home/ubuntu/someli-api'
BRANCH = 'dev_api'

Stages:

Stage 1 — Deploy and Build

  1. SSH into the host using sshagent(['somelijenkins3']).
  2. Generate a temp script with embedded GITHUB_TOKEN; scp to the host; execute.
  3. Inside the host:
       • git pull or initial clone using the token.
       • yarn install --frozen-lockfile (fallback to yarn install, then npm install).
       • Locate or install PM2.
       • Reload the dev_api PM2 process, or start it fresh:
           pm2 start ${DEPLOY_PATH}/server.js \
             --name "dev_api" \
             --node-args="--max-old-space-size=2048"

       • Fallback: nohup node server.js & if PM2 cannot be installed.
       • Clean up token artifacts.

Stage 2 — Health Check

  • 10-second wait. Optional curl to /health (commented out).

Notifications

  • Success: Office365 connector (Teams) message with commit range, committer, developer, change list, build URL.
  • Failure: Same channel with the last 20 lines of build log.

Required Jenkins credentials:

| Credential ID | Purpose |
|---|---|
| somelijenkins1 | GitHub token (string) |
| somelijenkins3 | SSH key (sshagent) |

This pipeline DOES run yarn install on every deploy — unlike the GHA pipeline.


4. Path 3 — UAT: Manual ECR Push → Fargate

Files: push.sh, deploy_image.py, Dockerfile, nginx.conf

How it's triggered

# push.sh — note the hardcoded developer path
python deploy_image.py \
  --image-tag "uapi" \
  --repository-uri 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat \
  --dockerfile /home/nisanth/someli-api

A developer runs ./push.sh from their workstation. There is no CI trigger.

What deploy_image.py does

  1. Authenticates to ECR via boto3.
  2. Runs docker build.
  3. Tags the image as 255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi.
  4. Pushes to ECR.

The same :uapi tag is overwritten on every push — there is no image versioning.
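
The four steps above can be sketched roughly as follows. This is not the contents of deploy_image.py, which is not reproduced in this document. The function names, the use of the docker CLI through subprocess, and the default region are assumptions made for illustration. The boto3 call get_authorization_token is the standard ECR login API, and docker build -t covers the build and tag steps in one command.

```python
import base64
import subprocess
from typing import List, Tuple


def decode_ecr_token(token_b64: str) -> Tuple[str, str]:
    """Split ECR's base64-encoded 'user:password' authorization token."""
    user, _, password = base64.b64decode(token_b64).decode().partition(":")
    return user, password


def build_and_push_commands(repository_uri: str, image_tag: str, context: str) -> List[List[str]]:
    """Return the docker CLI invocations for build/tag and push."""
    image = f"{repository_uri}:{image_tag}"
    return [
        ["docker", "build", "-t", image, context],  # build and tag in one step
        ["docker", "push", image],
    ]


def deploy(repository_uri: str, image_tag: str, context: str, region: str = "us-west-2") -> None:
    """Authenticate to ECR, then build and push (hypothetical entry point)."""
    import boto3  # imported lazily so the helpers above work without AWS libraries

    ecr = boto3.client("ecr", region_name=region)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    user, password = decode_ecr_token(auth["authorizationToken"])
    subprocess.run(
        ["docker", "login", "--username", user, "--password-stdin", auth["proxyEndpoint"]],
        input=password.encode(),
        check=True,
    )
    for command in build_and_push_commands(repository_uri, image_tag, context):
        subprocess.run(command, check=True)
```

Only deploy() touches AWS or Docker; the two helpers are pure functions and can be checked without credentials.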

How the image is consumed

The Fargate task definitions, ECS service config, ALB rules, and IAM policies that consume this image live outside this repo (likely in a separate IaC repo or AWS Console). The pattern in use is:

One ECR image, many Fargate tasks. Each task definition pulls the same image but overrides CMD to run a different script:

  • The API task: CMD ["node", "server.js"] (or entrypoint.sh running nginx + node)
  • One task per worker job: CMD ["node", "job_facebook_publish.js"], etc.
  • A Paddle-webhook task

This is consistent with the Dockerfile's comment: # Default command — override with specific job script using docker run args.
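
As an illustration of the one-image, many-tasks pattern, the sketch below builds ECS container definitions that share an image and differ only in command. The real task definitions live outside this repo, so the role names are hypothetical labels, and only the two commands quoted above are taken from the source. The Paddle-webhook command is not known and is left out. The name, image, command, and essential keys are standard ECS container-definition fields.

```python
from typing import Dict, List

IMAGE = "255061853867.dkr.ecr.us-west-2.amazonaws.com/uat:uapi"

# Role label -> command. The labels are hypothetical; the commands are the
# two examples quoted in this section.
ROLE_COMMANDS: Dict[str, List[str]] = {
    "api": ["node", "server.js"],
    "worker-facebook-publish": ["node", "job_facebook_publish.js"],
}


def container_definition(role: str, command: List[str], image: str = IMAGE) -> dict:
    """Build one ECS container definition that reuses the shared image."""
    return {
        "name": role,
        "image": image,
        "command": command,  # replaces the Dockerfile's default CMD
        "essential": True,
    }


definitions = [container_definition(role, cmd) for role, cmd in ROLE_COMMANDS.items()]
```

Each entry would sit in the containerDefinitions list of its own task definition; everything except command stays identical across roles.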

Hostname

nginx.conf declares server_name uapi.someli.ai _; and proxies to 127.0.0.1:3000. The uapi.someli.ai hostname is the UAT API endpoint.


5. Path 4 — Production: undocumented

There is no production deployment automation in this repo. Inferred state:

  • The HTTP API runs on a dedicated EC2 host with the same nginx + PM2 pattern as dev.
  • Production deploys are manual (someone SSHes in, git pulls main, runs pm2 reload).
  • The worker fleet's production runtime is unclear: it may be Fargate (using the UAT pattern with a different ECR tag), or may also be PM2-on-EC2.

This gap should be closed. Manual prod deployments with no audit trail, no rollback plan, and no documented runbook are not acceptable for a paid-customer platform.


6. Issues & Risks

The artifacts in this repo reveal several deployment-pipeline issues that warrant attention.

6.1 Two CI systems on the same branch

Both Jenkins (Jenkinsfile) and GitHub Actions (dev-api-deploy.yml) are wired to deploy on dev_api. Possibilities:

  • Both are live and deploy to different dev hosts (Jenkins → 54.189.254.22, GHA → LIGHTSAIL_HOST secret). In that case the dev fleet runs two parallel APIs that have to be kept in sync, which wastes compute and makes deploy races possible.
  • Both target the same host, in which case every commit triggers two deploys racing each other — wasted CI and a real risk of mid-deploy collision.
  • One is legacy and is still wired up but not in active use.

Action required: confirm which is canonical and retire the other.

6.2 Hardcoded values in deploy artifacts

| File | Hardcoded value | Risk |
|---|---|---|
| Jenkinsfile | 54.189.254.22 (EC2 IP) | If host is replaced, pipeline breaks silently |
| push.sh | /home/nisanth/someli-api (dev workstation path) | UAT releases break if anyone but nisanth tries one (truck factor of 1 for UAT releases) |
| push.sh | ECR account 255061853867, region us-west-2, tag uapi | If account / region / tagging convention changes, manual edit required |

Action required: parameterize these via env vars or config.
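
One possible shape for that parameterization, sketched for the argument parsing in deploy_image.py. The three flags are the ones push.sh already passes. The environment variable names (IMAGE_TAG, ECR_REPOSITORY_URI, BUILD_CONTEXT) are hypothetical and would need to be agreed on.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_args(
    argv: Optional[Sequence[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> argparse.Namespace:
    """Resolve settings from flags first, then environment variables."""
    parser = argparse.ArgumentParser(description="Build and push the API image")
    # Environment variable names below are assumptions, not existing config.
    parser.add_argument("--image-tag", default=env.get("IMAGE_TAG", "uapi"))
    parser.add_argument("--repository-uri", default=env.get("ECR_REPOSITORY_URI"))
    # Default to the current directory instead of one developer's home path.
    parser.add_argument("--dockerfile", default=env.get("BUILD_CONTEXT", "."))
    args = parser.parse_args(argv)
    if not args.repository_uri:
        parser.error("set --repository-uri or ECR_REPOSITORY_URI")
    return args
```

With this in place, push.sh could shrink to a bare python deploy_image.py call that works from any checkout, with the account and region supplied by the caller's environment.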

6.3 The Dockerfile has no install step

FROM node:20.18.1-slim
# ... apt install nginx etc. ...
COPY nginx.conf /etc/nginx/nginx.conf
COPY . .
# (no `yarn install` or `npm ci` anywhere)
CMD ["/entrypoint.sh"]

.dockerignore excludes node_modules, so a clean checkout would build an image with no dependencies that fails on startup. Two possible explanations:

  1. push.sh is run from the developer's working directory (/home/nisanth/someli-api), which contains an installed node_modules. If that directory is not actually excluded from the build context for some reason (such as a modified .dockerignore in the developer's local checkout), COPY . . bakes in whatever dependencies were installed on that workstation. The resulting images are not reproducible: every UAT release contains whatever was in the developer's checkout at build time.
  2. There is a separate build process not in this repo. In that case, the artifacts here are a partial picture of UAT releases.

Action required: add RUN yarn install --frozen-lockfile --production to the Dockerfile so the build is reproducible from a clean checkout.
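
Until the Dockerfile is fixed, a small guard could fail the build early when no install step is present. The sketch below is an assumption about how such a check might look, not an existing script in the repo. It only recognizes yarn install, npm ci, and npm install inside RUN instructions and ignores comment lines.

```python
import re

# Matches a RUN instruction that invokes a Node dependency install.
INSTALL_RE = re.compile(
    r"^\s*RUN\b.*\b(yarn\s+install|npm\s+(ci|install))\b",
    re.IGNORECASE | re.MULTILINE,
)


def has_install_step(dockerfile_text: str) -> bool:
    """Return True if any RUN instruction installs Node dependencies."""
    # Join backslash-continued lines so multi-line RUN instructions are seen whole.
    joined = re.sub(r"\\\r?\n", " ", dockerfile_text)
    return bool(INSTALL_RE.search(joined))
```

Run against the excerpt above, the check reports no install step, because the only mention of yarn install sits in a comment.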

6.4 GitHub Actions pipeline doesn't install dependencies

The dev-api-deploy.yml pipeline pulls and reloads, but does NOT run yarn install. If a PR adds a dependency, the deploy will silently succeed but the new module will be missing. The next time someone manually SSHes in to install it, behavior changes mid-day with no clear cause.

Action required: add the install step to the GHA workflow (and verify whether the host has cached dependencies that mask this).
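
A minimal sketch of the decision the workflow would need to make, assuming the list of changed files from the existing commit-metadata step is available and that package.json and yarn.lock sit at the repo root. The helper names are hypothetical. The shell commands are the ones the workflow already runs, plus the install.

```python
from typing import Iterable, List

# Files whose change means dependencies must be reinstalled (assumed root-level).
DEPENDENCY_FILES = {"package.json", "yarn.lock"}


def needs_install(changed_files: Iterable[str]) -> bool:
    """Return True if the push touched a dependency manifest or lockfile."""
    return any(path in DEPENDENCY_FILES for path in changed_files)


def remote_commands(changed_files: Iterable[str]) -> List[str]:
    """Build the command list for the SSH step, adding an install when needed."""
    commands = [
        "cd /home/ubuntu/someli-api",
        "git pull origin dev_api",
        "source /home/ubuntu/.nvm/nvm.sh",
    ]
    if needs_install(changed_files):
        commands.append("yarn install --frozen-lockfile")
    commands.append("pm2 reload dev_api || pm2 reload 0")
    return commands
```

Because the checkout depth is 2, a push carrying several commits may hide a lockfile change from the diff. Running the install unconditionally, as the Jenkins pipeline does, is the simpler and safer variant.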

6.5 No image versioning

push.sh tags the image as :uapi and overwrites it every push. This means:

  • No ability to roll back to a known-good image without rebuilding.
  • No correlation between an image and a git commit.
  • ECS Fargate task definitions referencing :uapi always pull the latest, regardless of testing state.

Action required: tag images with the git SHA (uapi-${git_sha}) and update task definitions on each release.
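
A sketch of the tagging scheme suggested above, assuming the build runs inside a git checkout. The function names and the 12-character abbreviation are illustrative choices.

```python
import subprocess


def current_git_sha(length: int = 12) -> str:
    """Return the abbreviated SHA of HEAD in the current checkout."""
    result = subprocess.run(
        ["git", "rev-parse", f"--short={length}", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def image_reference(repository_uri: str, git_sha: str, prefix: str = "uapi") -> str:
    """Build an image reference of the form <repository>:<prefix>-<sha>."""
    return f"{repository_uri}:{prefix}-{git_sha}"
```

For example, image_reference("255061853867.dkr.ecr.us-west-2.amazonaws.com/uat", current_git_sha()) yields a tag that maps each image back to one commit, so a rollback becomes a task-definition change instead of a rebuild.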

6.6 nginx and Express body limits don't match

| Layer | Limit |
|---|---|
| nginx (nginx.conf) | 100 MB |
| Express (server.js) | 150 MB |

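The smaller limit wins. The Express setting is misleading: a request body between 100 MB and 150 MB is rejected by nginx with HTTP 413 before it ever reaches Express, so the client sees a rejection that the Express configuration does not explain. Separately, 100 MB is itself unusually large for an HTTP API.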

Action required: decide a single canonical limit (much smaller — see Security), align both layers.
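
The arithmetic behind "the smaller limit wins" can be made explicit. The sketch below parses size strings and reports which layer sets the effective limit. The nginx value is the one in nginx.conf; the exact string used in server.js is not quoted in this document, so "150mb" is an assumed spelling.

```python
import re
from typing import Dict, Tuple

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert strings such as '100M' or '150mb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmg]?)b?\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def effective_limit(limits: Dict[str, str]) -> Tuple[str, int]:
    """Return the layer with the smallest limit and that limit in bytes."""
    layer = min(limits, key=lambda name: parse_size(limits[name]))
    return layer, parse_size(limits[layer])


print(effective_limit({"nginx": "100M", "express": "150mb"}))
# -> ('nginx', 104857600)
```

A check like this could run in CI against both config files, so the two layers cannot drift apart again.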

6.7 No worker deployment in dev

The Jenkins and GHA pipelines start only the dev_api PM2 process. The 39 worker apps in ecosystem.config.js are not touched. Either:

  • Dev has no workers running (and UAT/prod are the only environments where the worker fleet executes — making local debugging of worker bugs hard), or
  • Workers are started manually via pm2 start ecosystem.config.js once and left alone (in which case they don't get reloaded when their code changes — silent staleness).

Action required: decide whether dev needs workers, and either deploy them via the same pipeline or document that they're production-only.

6.8 No production automation

No artifact in this repo deploys to production. Promotions to main happen via merge, but the deploy is human-driven on an unknown machine. No rollback plan, no audit trail, no health-gating.

Action required: add a production deploy pipeline (Path 1 or Path 3 pattern, hardened) before the next major release.

6.9 ecosystem.config.js script-path drift

Per Jobs Inventory § Known Discrepancies, ecosystem.config.js references at least 7 nonexistent or wrong-path script files. PM2 will fail to start them. If the same file is consumed verbatim in a Fargate task definition spawn list, those tasks will crash-loop.

Action required: reconcile ecosystem.config.js against the actual script tree.
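
A reconciliation check could look like the sketch below. It assumes the app list has first been dumped to JSON, for example with node -e "console.log(JSON.stringify(require('./ecosystem.config.js').apps))", which in turn assumes the file exports the usual PM2 { apps: [...] } layout. The function name is hypothetical.

```python
import os
from typing import Iterable, List, Mapping


def missing_scripts(apps: Iterable[Mapping[str, str]], repo_root: str) -> List[str]:
    """Return the names of PM2 apps whose script file does not exist."""
    missing = []
    for app in apps:
        # Script paths are resolved relative to the repository root.
        script = os.path.normpath(os.path.join(repo_root, app["script"]))
        if not os.path.isfile(script):
            missing.append(app["name"])
    return missing
```

Wired into CI, a non-empty result would fail the build before a broken entry reaches PM2 or a task definition.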


7. Shared Runtime Configuration

Regardless of which path deploys it, the running application has the same shape.

Docker image (UAT/Fargate)

Base: node:20.18.1-slim (multi-arch — supports both linux/amd64 and linux/arm64).

System deps in the image: nginx, libnss3, libexpat1, fontconfig (for Puppeteer/Chromium).

Entrypoint:

#!/bin/bash
nginx -g "daemon off;" &
node server.js

Exposed ports: 80 (nginx), 8080 (alternate).

Build: docker build -t someli-api:tag . (but see §6.3 for the missing install step).

nginx (nginx.conf)

listen 80;
server_name uapi.someli.ai _;
location / { proxy_pass http://127.0.0.1:3000; }
client_max_body_size 100M;
gzip on;

Logs go to /var/log/nginx/access.log and error.log.

Application (server.js)

| Setting | Value |
|---|---|
| Port | process.env.port \|\| conf.port \|\| 5002 (binds 3000 in production behind nginx) |
| Body parser | 150 MB (JSON), 50,000 parameter limit (capped to 100 MB by nginx) |
| File upload | enabled via express-fileupload |
| Heap limit (PM2) | 2 GB (--max-old-space-size=2048) |
| Graceful shutdown | SIGTERM and SIGINT handlers close DB connection before exit (server.js:205–214) |

Health endpoints

| Endpoint | Purpose |
|---|---|
| GET /health | Liveness: server status, port, environment, timestamp |
| GET /db-health | Readiness: database connection & query test |

Both should be wired to load-balancer health checks. Per Logging & Observability, they currently are the entire health-monitoring surface — there are no metrics or trace endpoints.
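
Until load-balancer checks are in place, a post-deploy probe of both endpoints is easy to script. The sketch below assumes that an HTTP 200 response means healthy, which this document does not confirm. The function names and the injectable fetch parameter are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

ENDPOINTS = ("/health", "/db-health")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a code
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def check_health(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Probe each health endpoint; True means it answered with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ENDPOINTS}
```

A deploy script could call check_health("http://127.0.0.1:3000") after the reload and fail the run if any value is False.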


8. PM2 Process Management

File: ecosystem.config.js

Defines 39 PM2-managed processes. The HTTP API server is started separately by the deploy pipelines, not by pm2 start ecosystem.config.js.

App-config shape

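{
  name: "job_name",          // PM2 process name
  script: "./job_file.js",   // job entry script
  args: "optional_args",     // optional; passed to the script
  autorestart: true          // or false, depending on the job
}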

No error_file, out_file, log_date_format, cron_restart, or memory-limit directives are set on any of the 39 entries. Logs go to PM2's default location.

Categories represented

| Category | Examples |
|---|---|
| Token refresh | job_refresh_fb_token, job_weekly_refresh_fb_token, vertex_token |
| Content generation | job_design_generation, job_new_post_generation, subject_content_generation |
| AI processing | brandPositioning-AI-Response, objective-AI-Response, recomSubjects-AI-Response |
| Scheduling & feed | job_auto_post, job_post_schedule (App Friday_Cron) |
| Media & validation | job_post_color_image_verification, media_image_correction (autorestart, args=100) |
| Data processing | job_invoice_paid, job_library_search_terms, job_updateTemplateValid |
| Insights | job_account_insights, post_insights |
| Other | job_auto_disconnect, job_website_info |

For the full list and discrepancies between ecosystem.config.js and the actual file tree, see Jobs Inventory.

Common PM2 commands

pm2 start ecosystem.config.js     # Start every entry in the ecosystem
pm2 reload dev_api                # Zero-downtime reload of the API
pm2 list                          # Process list
pm2 logs job_design_generation    # Tail one process
pm2 monit                         # Live monitoring TUI
pm2 save                          # Persist current process list across reboots
pm2 restart all                   # Restart everything

9. Port Mappings

| Service | Port | Notes |
|---|---|---|
| Node.js API | 3000 | Behind nginx |
| Nginx | 80 | Public HTTP |
| Docker secondary | 8080 | Optional |
| MySQL (RDS) | 3306 | AWS internal |

10. Infrastructure Overview

| Component | Service | Region |
|---|---|---|
| Dev API host | AWS Lightsail and/or EC2 | us-west-2 |
| UAT runtime | AWS Fargate (ECS) | us-west-2 |
| ECR registry | AWS ECR account 255061853867 | us-west-2 |
| Production API host | AWS EC2 (assumed) | us-west-2 (assumed) |
| Database | AWS RDS (MySQL) | us-west-1 |
| Primary storage | AWS S3 (someli) | us-west-2 |
| Media storage | AWS S3 (someli-medialib) | us-west-1 |
| AI inference | AWS Bedrock | us-east-1 |
| Knowledge base | Google Cloud Storage | us-central1 |
| AI (Gemini / Vertex) | Google Cloud | us-central1 |

Cross-region traffic (us-west-2 ↔ us-west-1 for DB and media; us-west-2 ↔ us-east-1 for Bedrock; us-west-2 ↔ us-central1 for Google) implies non-trivial egress cost and per-call latency. This topology should be reviewed against actual usage patterns.


11. Local Development Setup

# Install dependencies
yarn install

# Start with hot-reload (nodemon)
npm run dev

# Or start production-style
node server.js

# Start the worker fleet (optional in dev — usually skipped)
pm2 start ecosystem.config.js

Prerequisites:

  • Node.js 20.18.1 (recommended via NVM: nvm install 20.18.1 && nvm use 20.18.1)
  • MySQL/MariaDB (local or accessible RDS instance)
  • .env file populated — see Configuration Reference
  • pm2 installed globally if you want to run workers locally: npm install -g pm2
  • For media-related jobs: Sharp + Puppeteer system dependencies (libnss3, fontconfig)

12. Deployment Checklist

Before promoting a build to UAT or production:

  1. Required environment variables set (Configuration Reference)
  2. Schema migrations applied on the target DB — manual; no migration tool exists (see Enterprise Readiness §5.4)
  3. S3 buckets reachable with the deploy environment's credentials
  4. OAuth callback URLs match the target APP_URL
  5. Paddle webhook URLs updated for the environment
  6. Health endpoints respond: GET /health, GET /db-health
  7. nginx active on port 80, proxying to Node on 3000
  8. PM2 ecosystem started for background workers (if applicable to the environment)
  9. For UAT: Fargate task definitions point to the new image tag (verify in AWS Console — task defs are not in this repo)
  10. Smoke test the high-risk paths: login, scheduled-post creation, design render