Media Processing¶

Documentation for image/video handling: AWS S3 (dual-bucket / dual-region), Sharp for raster transforms, FFmpeg for video frames, Polotno for template-based design rendering, and stock-image providers (Pexels / Pixabay / Unsplash).

Overview¶

┌────────────────────────────────────────────────────────────────────┐
│                        Media Pipeline                              │
│                                                                    │
│  Input source              Processing                Storage       │
│  ──────────                ──────────                ───────       │
│  User upload  ─┐                                  ┌─► S3 (primary) │
│  Stock APIs   ─┼─► Sharp (resize/JPEG)  ──────────┤                │
│  Polotno JSON ─┤   FFmpeg (video frame)           └─► S3 (region2) │
│  AI gen image ─┘   Polotno (jsonToImage)                           │
└────────────────────────────────────────────────────────────────────┘

AWS S3 — Dual-Bucket Setup¶

S3 access is initialized in 36 files across the codebase using the AWS SDK v2 pattern. Configuration is centralized in conf.js:

Config key	Purpose
`S3_Bucket_Name`, `S3_Region`	Primary bucket — most uploads land here
`S3_Bucket_Name2`, `S3_Region2`	Secondary bucket / region
`AWS_ACCESS_KEY`, `AWS_SECRET_ACCESS_KEY`	Shared IAM credentials
`S3_Path_RAG`	Prefix under which RAG documents are stored (see RAG Pipeline)

Standard initialization¶

const AWS = require('aws-sdk');
const s3 = new AWS.S3({
    accessKeyId: conf.AWS_ACCESS_KEY,
    region: conf.S3_Region,
    secretAccessKey: conf.AWS_SECRET_ACCESS_KEY,
});

This block appears verbatim in helper/postValidation.js and ~35 job files. There is no shared S3 helper — each consumer instantiates its own client.

Bucket-selection convention¶

Primary bucket (S3_Bucket_Name / S3_Region) — user-generated content, generated designs, RAG documents
Secondary bucket (S3_Bucket_Name2 / S3_Region2) — typically used for region-specific delivery or as a backup region

The exact split is controlled per-job; see individual job source for which bucket is targeted.
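Because each consumer builds its own client, the bucket name and the region have to be kept in sync by hand. The following is a minimal sketch of how that pairing could be centralized. `s3Target` is a hypothetical helper that does not exist in the codebase, and it assumes the `conf` keys listed above are plain strings.

```javascript
// Hypothetical helper: not part of the codebase described here.
// Maps 'primary' | 'secondary' to the matching bucket/region pair from conf.js.
function s3Target(conf, which = 'primary') {
    const targets = {
        primary: { bucket: conf.S3_Bucket_Name, region: conf.S3_Region },
        secondary: { bucket: conf.S3_Bucket_Name2, region: conf.S3_Region2 },
    };
    if (!Object.prototype.hasOwnProperty.call(targets, which)) {
        throw new Error(`Unknown S3 target: ${which}`);
    }
    const target = targets[which];
    return {
        bucket: target.bucket,
        // Same shape as the options object in the standard initialization.
        clientOptions: {
            accessKeyId: conf.AWS_ACCESS_KEY,
            region: target.region,
            secretAccessKey: conf.AWS_SECRET_ACCESS_KEY,
        },
    };
}

// Usage (requires aws-sdk v2 at runtime):
//   const AWS = require('aws-sdk');
//   const { bucket, clientOptions } = s3Target(conf, 'secondary');
//   const s3 = new AWS.S3(clientOptions);
```

Returning the bucket together with the client options keeps a job from pairing the secondary bucket with the primary region by accident.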

Credential helper — `helper/files_upload.js`¶

files_upload.js adds an extra layer:

- Fetches GCS service-account credentials from AWS Secrets Manager (used by Vertex/Cloud RAG; see RAG Pipeline)
- Caches secrets for 1 hour
- Provides dual-cloud upload (S3 + Google Cloud Storage) for files that need to land in both clouds
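The one-hour secret cache can be pictured as a small time-to-live map. This is only a sketch of the idea, not the code in files_upload.js: `createSecretCache`, `fetchSecret`, and the injectable `now` clock are hypothetical names, and `fetchSecret` stands in for whatever Secrets Manager call the helper actually makes.

```javascript
// Hypothetical TTL cache; `fetchSecret` stands in for the real Secrets Manager call.
const ONE_HOUR_MS = 60 * 60 * 1000;

function createSecretCache(fetchSecret, ttlMs = ONE_HOUR_MS, now = Date.now) {
    const entries = new Map(); // secretId -> { value, expiresAt }

    return async function getSecret(secretId) {
        const hit = entries.get(secretId);
        if (hit && hit.expiresAt > now()) {
            return hit.value; // still fresh: skip the network call
        }
        const value = await fetchSecret(secretId);
        entries.set(secretId, { value, expiresAt: now() + ttlMs });
        return value;
    };
}
```

Passing the clock in as a parameter makes the expiry behavior testable without waiting an hour.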

Sharp — Image Resize & Compression¶

Used in: 36 files, primarily background jobs and helper/postValidation.js.

Standard transform¶

const sharp = require('sharp');

const buffer = await sharp(inputBuffer)
    .resize({ width: 500 })
    .jpeg({ quality: 100 })
    .toBuffer();

This 500-pixel width / quality-100 JPEG is the codebase's de-facto thumbnail format. Examples:

Caller	Purpose
`job_update_color_logo.js`	Resize after applying brand colors / logo
`job_media_image_correction.js`	Resize after color-correcting designs
`helper/postValidation.js`	Generate thumbnails alongside full-size uploads
`job_carousel_pdf.js`	Pre-process pages before PDF assembly
`job_user_media_generation.js`	Personalized media render output
`job_thumbnail_*.js`	Thumbnail-specific variants

Notes¶

At quality: 100 the JPEG step applies almost no compression, so the conversion serves as format normalization rather than size reduction.
Sharp has no shared wrapper; every job duplicates the resize call. If you want to change the standard size, expect to update many files.
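If the duplication ever becomes a problem, the standard transform could live in one place. The sketch below is an assumption about what such a wrapper might look like: `createThumbnailer` and `THUMBNAIL_DEFAULTS` are hypothetical names, and the Sharp module is passed in rather than required so the defaults can be exercised without the native dependency.

```javascript
// Hypothetical wrapper; `sharpLib` is injected so the defaults live in one place.
const THUMBNAIL_DEFAULTS = { width: 500, quality: 100 };

function createThumbnailer(sharpLib, defaults = THUMBNAIL_DEFAULTS) {
    return function toThumbnail(inputBuffer, overrides = {}) {
        const { width, quality } = { ...defaults, ...overrides };
        return sharpLib(inputBuffer)
            .resize({ width })
            .jpeg({ quality })
            .toBuffer(); // resolves to a Buffer containing the JPEG
    };
}

// Usage (requires sharp at runtime):
//   const toThumbnail = createThumbnailer(require('sharp'));
//   const buffer = await toThumbnail(inputBuffer);
```

With a wrapper like this, changing the standard size would mean editing one constant instead of many job files.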

FFmpeg — Video Frame Extraction¶

Used in: routes/routes.js (the only direct caller).

Setup¶

const ffmpeg = require('fluent-ffmpeg');
const ffmpegPath = require('ffmpeg-static');
const ffprobePath = require('ffprobe-static').path;
ffmpeg.setFfmpegPath(ffmpegPath);
ffmpeg.setFfprobePath(ffprobePath);

ffmpeg-static and ffprobe-static ship the binaries with the package, so no system-level FFmpeg is required.

Operations¶

Operation	Detail
Thumbnail extraction	Seeks to `00:00:03`, captures a single frame at configurable dimensions
Metadata probing	`ffprobe` to read codec / duration metadata

Two endpoints in routes/routes.js use this to extract preview frames from uploaded videos before posting them to a social platform.
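The two operations can be sketched with fluent-ffmpeg's `screenshots()` and `ffprobe()` calls. This is an illustration, not the code in routes/routes.js: the function names, the output-path handling, and the default `320x240` size are assumptions, and `ffmpeg` is expected to be the module configured in the setup above.

```javascript
const path = require('path');

// Builds the options object for fluent-ffmpeg's screenshots() call.
function previewFrameOptions(outputPath, size = '320x240') {
    return {
        timestamps: ['00:00:03'], // same seek position described above
        folder: path.dirname(outputPath),
        filename: path.basename(outputPath),
        size, // 'WIDTHxHEIGHT'; the default here is an assumption
    };
}

// Captures a single frame and resolves with the path it was written to.
function extractPreviewFrame(ffmpeg, videoPath, outputPath, size) {
    return new Promise((resolve, reject) => {
        ffmpeg(videoPath)
            .on('end', () => resolve(outputPath))
            .on('error', reject)
            .screenshots(previewFrameOptions(outputPath, size));
    });
}

// Wraps ffmpeg.ffprobe() in a promise and returns the container duration in seconds.
function probeDuration(ffmpeg, videoPath) {
    return new Promise((resolve, reject) => {
        ffmpeg.ffprobe(videoPath, (err, metadata) => {
            if (err) return reject(err);
            resolve(metadata.format.duration);
        });
    });
}
```

Probing the duration first would let a caller detect videos shorter than three seconds before attempting the seek.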

Limitations¶

No video encoding / re-encoding — inputs are passed through as-is.
No streaming — the full video must be on disk for ffprobe to read it.

Polotno — Template-Based Design Rendering¶

Polotno is the design engine. Designs are stored as JSON definitions; Polotno renders them to PNG/PDF via the polotno-node package.

Used in: ~49 files, including most generation and validation jobs.

Standard render call¶

const { createInstance } = require('polotno-node');

const instance = await createInstance({
    key: 'FXZvloSJvAe09-bdR9iC',  // shared client key, hardcoded
});
const base64Png = await instance.jsonToImageBase64(updateJson);

Supported operations¶

Method	Use
`jsonToImageBase64(json)`	Render a design to PNG/JPEG base64
`imageToPdf(images)`	Combine an image sequence into a PDF (used for carousel posts)

Major callers¶

job_user_media_generation.js — render personalized post images
job_media_image_correction.js — re-render after color correction
job_color_check.js — render to verify text/background contrast
job_carousel_pdf.js — assemble multi-page carousels into PDFs
helper/postValidation.js — render + upload during the validation pipeline

Notes & caveats¶

The Polotno key is hardcoded in every file that calls createInstance() (~49 files). If it ever needs to be rotated, expect a sweeping search-and-replace.
Each createInstance() call spins up a fresh Chromium-based renderer. There is no shared renderer pool — costs scale linearly with parallelism.
Polotno is the single biggest source of memory pressure across the job fleet.

helper/postValidation.js¶

⚠ checkVisibity does not call instance.close(). This function is itself one of the Polotno leak sites flagged in Integration #16. After the render and S3 uploads, the Polotno instance (which spawned a Chromium process) is left open. This function is called from many of the validation jobs, so the leak compounds. Adding await instance.close() at the end of the function would address it.
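One way to make the cleanup hard to forget is to wrap the instance lifetime in a try/finally, so `close()` also runs when the render or an upload throws. The wrapper below is a sketch under that assumption: `withPolotnoInstance` is a hypothetical name, and `createInstance` is injected so the pattern can be tested without launching Chromium.

```javascript
// Hypothetical wrapper: guarantees close() runs even when the work throws.
// `createInstance` is injected; in the jobs it would come from polotno-node.
async function withPolotnoInstance(createInstance, key, work) {
    const instance = await createInstance({ key });
    try {
        return await work(instance);
    } finally {
        await instance.close(); // shuts down the Chromium process behind the instance
    }
}

// Usage (requires polotno-node at runtime):
//   const { createInstance } = require('polotno-node');
//   const base64Png = await withPolotnoInstance(createInstance, key, (instance) =>
//       instance.jsonToImageBase64(updateJson));
```

Compared with a single `await instance.close()` at the end of the function, the `finally` block also covers early returns and rejected promises.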

helper/postValidation.js — pipeline detail¶

Purpose: runs the post-validation pipeline, which checks color contrast, fixes visibility issues, renders the result, and uploads it.

Key export: checkVisibity(data, processResolve)

Pipeline (top-of-file):

1. Inspect the design JSON for text elements.
2. Parse RGB colors and compute complementary contrast values.
3. Update text colors in the JSON if contrast is insufficient.
4. Upload the corrected JSON to S3.
5. Render the JSON via Polotno → base64.
6. Sharp resize → JPEG (full image and thumbnail).
7. Upload both to S3 with the standard naming convention.
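Steps 1 to 3 can be illustrated with a small, self-contained contrast check. This sketch deliberately swaps in the WCAG 2.x contrast ratio and a black-or-white replacement, which may differ from the complementary-color formula checkVisibity actually uses. The JSON shape (`pages[].children[]` with `type` and `fill`), the single background color, the 4.5 threshold, and all function names are assumptions made for the example.

```javascript
// Illustrative only: WCAG 2.x contrast ratio, not necessarily checkVisibity's formula.

// Parses 'rgb(r, g, b)' or 'rgba(r, g, b, a)' into [r, g, b]; null if unparseable.
function parseRgb(color) {
    const match = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)/i.exec(color || '');
    return match ? match.slice(1, 4).map(Number) : null;
}

// Relative luminance per WCAG: linearize each sRGB channel, then weight.
function luminance([r, g, b]) {
    const [lr, lg, lb] = [r, g, b].map((channel) => {
        const c = channel / 255;
        return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
    });
    return 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
}

// Ratio of the lighter to the darker luminance, from 1 (identical) to 21.
function contrastRatio(rgbA, rgbB) {
    const [lighter, darker] = [luminance(rgbA), luminance(rgbB)].sort((x, y) => y - x);
    return (lighter + 0.05) / (darker + 0.05);
}

// Returns a copy of the design with low-contrast text recolored black or white.
function fixTextContrast(design, backgroundRgb, minRatio = 4.5) {
    // Pick whichever of black/white contrasts more with the background.
    const blackWins =
        contrastRatio([0, 0, 0], backgroundRgb) >= contrastRatio([255, 255, 255], backgroundRgb);
    const replacement = blackWins ? 'rgb(0, 0, 0)' : 'rgb(255, 255, 255)';

    const pages = design.pages.map((page) => ({
        ...page,
        children: page.children.map((el) => {
            const rgb = el.type === 'text' ? parseRgb(el.fill) : null;
            // Non-text elements and unparseable fills are left untouched.
            if (!rgb || contrastRatio(rgb, backgroundRgb) >= minRatio) return el;
            return { ...el, fill: replacement };
        }),
    }));
    return { ...design, pages };
}
```

The function returns a new object rather than mutating its input, so the original JSON remains available for comparison before the corrected version is uploaded.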

This file is the canonical example of the full media pipeline. New jobs that produce designs typically copy this pattern.

Stock Images — `helper/stockImage.js`¶

Providers: Pexels, Pixabay, Unsplash.

Configuration:

- Unsplash: uses the unsplash-js SDK (createApi(...)) with a key.
- Pexels / Pixabay: API keys are retrieved at runtime from the tApiKeys database table (so they can be rotated without redeploy).

Exports¶

Function	Purpose
`getStockImage({ content, type, con, length, typeId })`	Search a provider for stock photos matching a content topic
`uploadToS3({ sourceUrl, thumb, thumbnailS3Key, s3Key })`	Download a remote image, upload full-size + thumbnail to S3, return `{ imageLink, s3Key }`

Provider selection¶

The type / typeId parameters select the provider. Typical pattern: try one, fall back to the next on failure or empty results.
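The try-then-fall-back pattern can be sketched as a loop over an ordered provider list. This does not reproduce how helper/stockImage.js maps `type` / `typeId` to a provider: `searchWithFallback` and the `{ name, search }` provider shape are hypothetical and exist only to show the control flow.

```javascript
// Hypothetical fallback chain. Each provider is { name, search(query) } where
// search resolves to an array of results; the real provider calls differ.
async function searchWithFallback(providers, query) {
    const errors = [];
    for (const provider of providers) {
        try {
            const results = await provider.search(query);
            if (Array.isArray(results) && results.length > 0) {
                return { provider: provider.name, results, errors };
            }
            // Empty result set: fall through to the next provider.
        } catch (err) {
            errors.push(`${provider.name}: ${err.message}`); // remember and move on
        }
    }
    return { provider: null, results: [], errors };
}
```

Collecting the errors instead of throwing on the first failure lets the caller log which providers were rate-limited while still returning any images that a later provider found.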

Standalone "generate_*" scripts¶

Three top-level scripts are AI-driven content generators that aren't in ecosystem.config.js:

Script	Cron	Purpose
`generate_company_industry.js`	every 40s	Classify company industry via Vertex Cloud RAG
`generate_organization_profile.js`	every 40s	Generate organization profile (spawns subprocess)
`generate_template_industry.js`	every 30s	Industry classification for templates

They run as standalone Node processes with their own node-cron schedules and an isOnProcess re-entry guard. To run them under PM2 you have to launch them manually — pm2 start ecosystem.config.js will not pick them up.
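The re-entry guard matters because the schedules are shorter than a slow run can take: without it, a new tick would start while the previous one is still calling the AI backend. A minimal sketch of the pattern follows. Only the `isOnProcess` flag name comes from the scripts, while `withReentryGuard` and `classifyNextTemplate` are hypothetical.

```javascript
// Re-entry guard in the style described above; names other than
// isOnProcess are hypothetical.
function withReentryGuard(task) {
    let isOnProcess = false;
    return async function guardedTick() {
        if (isOnProcess) return false; // previous run still in flight: skip this tick
        isOnProcess = true;
        try {
            await task();
            return true;
        } finally {
            isOnProcess = false; // always release, even if the task throws
        }
    };
}

// Usage (requires node-cron at runtime; six fields, the first is seconds):
//   const cron = require('node-cron');
//   cron.schedule('*/30 * * * * *', withReentryGuard(classifyNextTemplate));
```

Releasing the flag in `finally` avoids the failure mode where one thrown error leaves the guard set and silently disables every later tick. The task itself should still catch and log its own errors, since a rejection from the guarded function is not handled here.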

These scripts are not strictly part of the media pipeline but are documented here because they share the AI/cron pattern with media-generation jobs and are easy to miss.

Naming & Path Conventions¶

S3 keys follow consistent prefixes (verify in source for exact format):

Prefix	Content
`account_id_<id>/`	Per-account user uploads + RAG docs (under `S3_Path_RAG`)
`posts/<account>/<post>/`	Generated post images + thumbnails
`templates/...`	Polotno design templates

Thumbnails are conventionally suffixed with _thumb or stored under /thumb/.
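The two thumbnail conventions can be expressed as key transformations. Both helpers below are hypothetical, and the exact placement of the suffix (before the file extension) and of the `thumb/` folder (directly above the file name) is an assumption that should be checked against the job source.

```javascript
const path = require('path');

// Hypothetical helpers for the two thumbnail conventions mentioned above.
// S3 keys always use forward slashes, so path.posix is used on every platform.

// 'posts/1/2/image.jpg' -> 'posts/1/2/image_thumb.jpg'
function thumbKeyBySuffix(key) {
    const { dir, name, ext } = path.posix.parse(key);
    return path.posix.join(dir, `${name}_thumb${ext}`);
}

// 'posts/1/2/image.jpg' -> 'posts/1/2/thumb/image.jpg'
function thumbKeyByFolder(key) {
    const { dir, base } = path.posix.parse(key);
    return path.posix.join(dir, 'thumb', base);
}
```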

Operational Notes¶

No CDN configuration in code — assets are served via direct S3 URLs (or whatever CDN is fronting S3 at the infra level).
No image optimization beyond Sharp resize — no WebP, no AVIF, no responsive variants. Just JPEG @ 500 px.
No streaming uploads — large files are buffered in memory (the express.json({ limit: '150mb' }) body limit lets this happen — see Security).
Polotno workers are not pooled — each job spawns its own Chromium. Watch RSS in production.
Stock-image providers can rate-limit — keys live in tApiKeys, so you can rotate without a deploy.
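For files that are already on disk, the in-memory buffering noted above could be avoided by handing the AWS SDK v2 `upload()` call a readable stream as `Body`. This is a sketch of that option rather than a description of existing code: `uploadFileStreaming` is a hypothetical name, and the S3 client is passed in by the caller.

```javascript
const fs = require('fs');

// Sketch: `s3` is an AWS SDK v2 S3 client. upload() accepts a readable stream
// as Body and sends it in parts instead of buffering the whole file.
function uploadFileStreaming(s3, bucket, key, filePath, contentType) {
    const body = fs.createReadStream(filePath);
    return s3
        .upload({ Bucket: bucket, Key: key, Body: body, ContentType: contentType })
        .promise(); // resolves with the upload result (Location, Bucket, Key, ETag)
}
```

This only helps once the request body itself is no longer parsed into memory, so it would have to be paired with a streaming or multipart upload handler on the Express side.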

RAG Pipeline — S3 layout for RAG documents
Content Pipeline — how generation, validation, and branding jobs feed media into S3
Jobs Inventory — full list of media-related jobs
Integration Inventory — AWS, Polotno, stock-image provider details
Configuration Reference — required S3 / AWS env vars