Media Processing

Documentation for image/video handling: AWS S3 (dual-bucket / dual-region), Sharp for raster transforms, FFmpeg for video frames, Polotno for template-based design rendering, and stock-image providers (Pexels / Pixabay / Unsplash).


Overview

┌────────────────────────────────────────────────────────────────────┐
│                        Media Pipeline                              │
│                                                                    │
│  Input source              Processing                Storage       │
│  ──────────                ──────────                ───────       │
│  User upload  ─┐                                  ┌─► S3 (primary) │
│  Stock APIs   ─┼─► Sharp (resize/JPEG)  ──────────┤                │
│  Polotno JSON ─┤   FFmpeg (video frame)           └─► S3 (region2) │
│  AI gen image ─┘   Polotno (jsonToImage)                           │
└────────────────────────────────────────────────────────────────────┘

AWS S3 — Dual-Bucket Setup

S3 access is initialized in 36 files across the codebase using the AWS SDK v2 pattern. Configuration is centralized in conf.js:

| Config key | Purpose |
| --- | --- |
| S3_Bucket_Name, S3_Region | Primary bucket; most uploads land here |
| S3_Bucket_Name2, S3_Region2 | Secondary bucket / region |
| AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY | Shared IAM credentials |
| S3_Path_RAG | Prefix under which RAG documents are stored (see RAG Pipeline) |

Standard initialization

const AWS = require('aws-sdk');
const s3 = new AWS.S3({
    accessKeyId: conf.AWS_ACCESS_KEY,
    region: conf.S3_Region,
    secretAccessKey: conf.AWS_SECRET_ACCESS_KEY,
});

This block appears verbatim in helper/postValidation.js and ~35 job files. There is no shared S3 helper — each consumer instantiates its own client.

Bucket-selection convention

  • Primary bucket (S3_Bucket_Name / S3_Region) — user-generated content, generated designs, RAG documents
  • Secondary bucket (S3_Bucket_Name2 / S3_Region2) — typically used for region-specific delivery or as a backup region

The exact split is controlled per-job; see individual job source for which bucket is targeted.
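Since there is no shared S3 helper today, the following is only a sketch of what one could look like, assuming the config keys listed above. The names createS3Targets and getTarget are hypothetical and do not exist in the codebase; AWS and conf are passed in rather than required so the sketch makes no claim about module paths.

```javascript
// Hypothetical shared helper (not in the codebase today): builds one S3 client
// per target and reuses it on later calls.
function createS3Targets(AWS, conf) {
    const targets = {
        primary: { region: conf.S3_Region, bucket: conf.S3_Bucket_Name },
        secondary: { region: conf.S3_Region2, bucket: conf.S3_Bucket_Name2 },
    };
    const clients = new Map(); // target name -> cached AWS.S3 instance

    return function getTarget(name = 'primary') {
        const target = targets[name];
        if (!target) {
            throw new Error(`Unknown S3 target: ${name}`);
        }
        if (!clients.has(name)) {
            clients.set(name, new AWS.S3({
                accessKeyId: conf.AWS_ACCESS_KEY,
                region: target.region,
                secretAccessKey: conf.AWS_SECRET_ACCESS_KEY,
            }));
        }
        return { s3: clients.get(name), bucket: target.bucket };
    };
}

// Usage (assumes the same aws-sdk module and conf object the jobs already load):
//   const getTarget = createS3Targets(require('aws-sdk'), conf);
//   const { s3, bucket } = getTarget('secondary');
//   await s3.putObject({ Bucket: bucket, Key: key, Body: body }).promise();
```

A helper along these lines would make the bucket choice explicit at each call site instead of leaving it implicit in which region the client was built with.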

Credential helper — helper/files_upload.js

files_upload.js adds an extra layer:

  • Fetches GCS service-account credentials from AWS Secrets Manager (used by Vertex/Cloud RAG; see RAG Pipeline)
  • Caches secrets for 1 hour
  • Provides dual-cloud upload (S3 + Google Cloud Storage) for files that need to land in both clouds
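The one-hour secret cache can be pictured with a small time-to-live wrapper. This is a minimal sketch and not the code in files_upload.js; cacheWithTtl is a hypothetical name, and the injectable now parameter exists only to make expiry easy to test.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Wraps an async loader so its result is reused until ttlMs has elapsed.
// `now` defaults to Date.now and can be replaced with a fake clock in tests.
function cacheWithTtl(loader, ttlMs = ONE_HOUR_MS, now = Date.now) {
    let cached = null; // { value, expiresAt } once the first load succeeds

    return async function get() {
        if (cached && now() < cached.expiresAt) {
            return cached.value;
        }
        const value = await loader(); // a failed load leaves the cache untouched
        cached = { value, expiresAt: now() + ttlMs };
        return value;
    };
}

// Usage with AWS SDK v2 Secrets Manager (the SecretId value is a placeholder):
//   const secrets = new AWS.SecretsManager({ region: conf.S3_Region });
//   const getGcsCredentials = cacheWithTtl(async () => {
//       const res = await secrets.getSecretValue({ SecretId: 'my-secret-id' }).promise();
//       return JSON.parse(res.SecretString);
//   });
```

Note that concurrent callers arriving during a cache miss would each trigger their own load in this sketch; whether that matters depends on how often the helper is hit.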


Sharp — Image Resize & Compression

Used in: 36 files, primarily background jobs and helper/postValidation.js.

Standard transform

const sharp = require('sharp');

const buffer = await sharp(inputBuffer)
    .resize({ width: 500 })
    .jpeg({ quality: 100 })
    .toBuffer();

This 500-pixel width / quality-100 JPEG is the codebase's de-facto thumbnail format. Examples:

| Caller | Purpose |
| --- | --- |
| job_update_color_logo.js | Resize after applying brand colors / logo |
| job_media_image_correction.js | Resize after color-correcting designs |
| helper/postValidation.js | Generate thumbnails alongside full-size uploads |
| job_carousel_pdf.js | Pre-process pages before PDF assembly |
| job_user_media_generation.js | Personalized media render output |
| job_thumbnail_*.js | Thumbnail-specific variants |

Notes

  • quality: 100 keeps JPEG compression to a minimum, so the conversion is effectively being used for format normalization, not size reduction.
  • Sharp has no shared wrapper; every job duplicates the resize call. If you want to change the standard size, expect to update many files.
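If the duplicated resize call were ever consolidated, a wrapper could look roughly like the sketch below. createThumbnailer and THUMBNAIL_DEFAULTS are hypothetical names; the defaults simply mirror the standard transform shown earlier, and sharp is passed in so the sketch does not assume how the module is loaded.

```javascript
// Hypothetical shared defaults, mirroring the 500 px / quality-100 transform.
const THUMBNAIL_DEFAULTS = Object.freeze({ width: 500, quality: 100 });

// Returns a function that applies the standard resize + JPEG conversion.
function createThumbnailer(sharp, defaults = THUMBNAIL_DEFAULTS) {
    return function toThumbnail(inputBuffer, overrides = {}) {
        const { width, quality } = { ...defaults, ...overrides };
        return sharp(inputBuffer)
            .resize({ width })
            .jpeg({ quality })
            .toBuffer(); // resolves to a Buffer containing the JPEG
    };
}

// Usage:
//   const toThumbnail = createThumbnailer(require('sharp'));
//   const thumb = await toThumbnail(inputBuffer);
//   const small = await toThumbnail(inputBuffer, { width: 200 });
```

With a single definition of the defaults, changing the standard size becomes a one-line edit rather than a sweep across job files.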

FFmpeg — Video Frame Extraction

Used in: routes/routes.js (the only direct caller).

Setup

const ffmpeg = require('fluent-ffmpeg');
const ffmpegPath = require('ffmpeg-static');
const ffprobePath = require('ffprobe-static').path;
ffmpeg.setFfmpegPath(ffmpegPath);
ffmpeg.setFfprobePath(ffprobePath);

ffmpeg-static and ffprobe-static ship the binaries with the package, so no system-level FFmpeg is required.

Operations

| Operation | Detail |
| --- | --- |
| Thumbnail extraction | Seeks to 00:00:03, captures a single frame at configurable dimensions |
| Metadata probing | ffprobe to read codec / duration metadata |

Two endpoints in routes/routes.js use this to extract preview frames from uploaded videos before posting them to a social platform.
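The two operations can be sketched as promise wrappers around fluent-ffmpeg. This is an illustration rather than the code in routes/routes.js: the function names, the options object, and the default size are assumptions, and ffmpeg is expected to be the module configured in the Setup snippet.

```javascript
const path = require('path');

// Captures one frame at `timestamp` and writes it to `outputPath`.
// A size of '640x?' asks fluent-ffmpeg to derive the height from the aspect ratio.
function extractPreviewFrame(ffmpeg, inputPath, { outputPath, size = '640x?', timestamp = '00:00:03' }) {
    return new Promise((resolve, reject) => {
        ffmpeg(inputPath)
            .on('end', () => resolve(outputPath))
            .on('error', reject)
            .screenshots({
                timestamps: [timestamp],
                folder: path.dirname(outputPath),
                filename: path.basename(outputPath),
                size,
            });
    });
}

// Reads container metadata via ffprobe and resolves to the duration in seconds.
function probeDuration(ffmpeg, inputPath) {
    return new Promise((resolve, reject) => {
        ffmpeg.ffprobe(inputPath, (err, metadata) => {
            if (err) {
                reject(err);
                return;
            }
            resolve(Number(metadata.format.duration));
        });
    });
}
```

Probing the duration first would let a caller pick an earlier timestamp for clips shorter than three seconds; whether the existing endpoints do this is not stated here.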

Limitations

  • No video encoding / re-encoding — inputs are passed through as-is.
  • No streaming — the full video must be on disk for ffprobe to read it.

Polotno — Template-Based Design Rendering

Polotno is the design engine. Designs are stored as JSON definitions; Polotno renders them to PNG/PDF via the polotno-node package.

Used in: ~49 files, including most generation and validation jobs.

Standard render call

const { createInstance } = require('polotno-node');

const instance = await createInstance({
    key: 'FXZvloSJvAe09-bdR9iC',  // shared client key, hardcoded
});
const base64Png = await instance.jsonToImageBase64(updateJson);

Supported operations

| Method | Use |
| --- | --- |
| jsonToImageBase64(json) | Render a design to PNG/JPEG base64 |
| imageToPdf(images) | Combine an image sequence into a PDF (used for carousel posts) |

Major callers

  • job_user_media_generation.js — render personalized post images
  • job_media_image_correction.js — re-render after color correction
  • job_color_check.js — render to verify text/background contrast
  • job_carousel_pdf.js — assemble multi-page carousels into PDFs
  • helper/postValidation.js — render + upload during the validation pipeline

Notes & caveats

  • The Polotno key is hardcoded across all 49 files. If it ever needs to be rotated, expect a sweeping search-and-replace.
  • Each createInstance() call spins up a fresh Chromium-based renderer. There is no shared renderer pool — costs scale linearly with parallelism.
  • Polotno is the single biggest source of memory pressure across the job fleet.

helper/postValidation.js

checkVisibity does not call instance.close(), which makes it one of the Polotno leak sites flagged in Integration #16. After the render and S3 uploads, the Polotno instance and the Chromium process it spawned are left running. Because many of the validation jobs call this function, the leak compounds. Calling await instance.close() once rendering is finished, ideally in a finally block so it also runs when a render or upload throws, would address it.
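One way to make the close call hard to forget is a small wrapper that owns the instance lifecycle. The sketch below is an assumption about how that could be structured, not existing code; withPolotnoInstance is a hypothetical name, and createInstance is the polotno-node export passed in by the caller.

```javascript
// Runs `fn` with a fresh Polotno instance and always closes the instance
// afterwards, including when `fn` throws.
async function withPolotnoInstance(createInstance, key, fn) {
    const instance = await createInstance({ key });
    try {
        return await fn(instance);
    } finally {
        await instance.close(); // releases the underlying browser process
    }
}

// Usage inside a job:
//   const { createInstance } = require('polotno-node');
//   const base64Png = await withPolotnoInstance(createInstance, key, (instance) =>
//       instance.jsonToImageBase64(updateJson));
```

Because the key is an argument, a wrapper like this would also give the hardcoded key a single place to live.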

helper/postValidation.js — pipeline detail

Purpose: runs the post validation pipeline — checks color contrast, fixes visibility issues, renders the result, and uploads.

Key export: checkVisibity(data, processResolve)

Pipeline (top-of-file):

  1. Inspect the design JSON for text elements.
  2. Parse RGB colors and compute complementary contrast values.
  3. Update text colors in the JSON if contrast is insufficient.
  4. Upload the corrected JSON to S3.
  5. Render the JSON via Polotno → base64.
  6. Sharp resize → JPEG (full image and thumbnail).
  7. Upload both to S3 with the standard naming convention.
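Steps 2 and 3 can be illustrated with a contrast check. The exact formula checkVisibity uses is not described here, so the sketch below substitutes the WCAG 2.x contrast ratio as a stand-in; all four function names are hypothetical.

```javascript
// Parses "rgb(r, g, b)" or "rgba(r, g, b, a)" into [r, g, b]; null otherwise.
function parseRgb(color) {
    const match = /^rgba?\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})/i.exec(color);
    return match ? match.slice(1, 4).map(Number) : null;
}

// WCAG 2.x relative luminance of an sRGB color, in the range 0..1.
function relativeLuminance([r, g, b]) {
    const [lr, lg, lb] = [r, g, b].map((channel) => {
        const c = channel / 255;
        return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    });
    return 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
}

// WCAG contrast ratio: 1 for identical colors, 21 for black on white.
function contrastRatio(rgbA, rgbB) {
    const [lighter, darker] = [relativeLuminance(rgbA), relativeLuminance(rgbB)]
        .sort((x, y) => y - x);
    return (lighter + 0.05) / (darker + 0.05);
}

// Returns black or white, whichever contrasts more with the background.
function readableTextColor(backgroundRgb) {
    const black = [0, 0, 0];
    const white = [255, 255, 255];
    return contrastRatio(black, backgroundRgb) >= contrastRatio(white, backgroundRgb)
        ? 'rgb(0, 0, 0)'
        : 'rgb(255, 255, 255)';
}
```

A caller could treat a ratio below some threshold (WCAG suggests 4.5 for body text) as insufficient and swap in readableTextColor(background); the threshold the real pipeline applies should be checked in source.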

This file is the canonical example of the full media pipeline. New jobs that produce designs typically copy this pattern.


Stock Images — helper/stockImage.js

Providers: Pexels, Pixabay, Unsplash.

Configuration:

  • Unsplash: uses the unsplash-js SDK (createApi(...)) with a key.
  • Pexels / Pixabay: API keys retrieved at runtime from the tApiKeys database table (so they can be rotated without redeploy).

Exports

| Function | Purpose |
| --- | --- |
| getStockImage({ content, type, con, length, typeId }) | Search a provider for stock photos matching a content topic |
| uploadToS3({ sourceUrl, thumb, thumbnailS3Key, s3Key }) | Download a remote image, upload full-size + thumbnail to S3, return { imageLink, s3Key } |

Provider selection

The type / typeId parameters select the provider. Typical pattern: try one, fall back to the next on failure or empty results.
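The try-then-fall-back pattern can be sketched generically. This does not reproduce helper/stockImage.js; searchWithFallback and the { name, search } provider shape are assumptions made for illustration.

```javascript
// Tries each provider in order and returns the first non-empty result.
// Each provider is { name, search }, where search(query) resolves to an array.
async function searchWithFallback(providers, query) {
    const errors = [];
    for (const provider of providers) {
        try {
            const images = await provider.search(query);
            if (Array.isArray(images) && images.length > 0) {
                return { provider: provider.name, images, errors };
            }
        } catch (err) {
            // Record the failure and move on to the next provider.
            errors.push({ provider: provider.name, message: err.message });
        }
    }
    return { provider: null, images: [], errors };
}

// Usage (the search functions are placeholders for real provider calls):
//   const result = await searchWithFallback([
//       { name: 'pexels', search: searchPexels },
//       { name: 'pixabay', search: searchPixabay },
//       { name: 'unsplash', search: searchUnsplash },
//   ], 'office teamwork');
```

Returning the collected errors alongside the result keeps rate-limit failures visible even when a later provider succeeds.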


Standalone "generate_*" scripts

Three top-level scripts are AI-driven content generators that aren't in ecosystem.config.js:

| Script | Cron | Purpose |
| --- | --- | --- |
| generate_company_industry.js | every 40s | Classify company industry via Vertex Cloud RAG |
| generate_organization_profile.js | every 40s | Generate organization profile (spawns subprocess) |
| generate_template_industry.js | every 30s | Industry classification for templates |

They run as standalone Node processes with their own node-cron schedules and an isOnProcess re-entry guard. To run them under PM2 you have to launch them manually — pm2 start ecosystem.config.js will not pick them up.
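The re-entry guard mentioned above can be pictured as a wrapper around the scheduled task. This is a sketch of the general pattern rather than the scripts' actual code; withReentryGuard and classifyTemplateIndustry are hypothetical names, and only the isOnProcess flag name is taken from the description.

```javascript
// Wraps an async task so overlapping invocations are skipped, not queued.
// Resolves to true if the task ran and false if the call was skipped.
function withReentryGuard(task) {
    let isOnProcess = false;

    return async function guarded(...args) {
        if (isOnProcess) {
            return false; // the previous tick is still running
        }
        isOnProcess = true;
        try {
            await task(...args);
            return true;
        } finally {
            isOnProcess = false; // reset even if the task throws
        }
    };
}

// Usage with node-cron (six fields; the first one is seconds):
//   const cron = require('node-cron');
//   cron.schedule('*/30 * * * * *', withReentryGuard(classifyTemplateIndustry));
```

Resetting the flag in a finally block matters here: without it, a single thrown error would leave the guard set and silently stop every later tick.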

These scripts are not strictly part of the media pipeline but are documented here because they share the AI/cron pattern with media-generation jobs and are easy to miss.


Naming & Path Conventions

S3 keys follow consistent prefixes (verify in source for exact format):

| Prefix | Content |
| --- | --- |
| account_id_<id>/ | Per-account user uploads + RAG docs (under S3_Path_RAG) |
| posts/<account>/<post>/ | Generated post images + thumbnails |
| templates/... | Polotno design templates |

Thumbnails are conventionally suffixed with _thumb or stored under /thumb/.


Operational Notes

  • No CDN configuration in code — assets are served via direct S3 URLs (or whatever CDN is fronting S3 at the infra level).
  • No image optimization beyond Sharp resize — no WebP, no AVIF, no responsive variants. Just JPEG @ 500 px.
  • No streaming uploads — large files are buffered in memory (the express.json({ limit: '150mb' }) body limit lets this happen — see Security).
  • Polotno workers are not pooled — each job spawns its own Chromium. Watch RSS in production.
  • Stock-image providers can rate-limit — keys live in tApiKeys, so you can rotate without a deploy.