No Rendercam. No YOLO. No EMA smoothing. The pipeline mirrors Periscope's in-browser crop algorithm frame-for-frame, so the demo renders are the exact same crop your users would see once the production encoder path ships.
/api/vionlabs_timeline/vertical_video/<video_id>/
VionLabs analyzes the source film and produces per-frame x_center values
describing where the crop window should sit horizontally. We cache each title's response as
coords/<slug>_periscope.json — an array of {ts, x_center} samples plus
source_frame_width (typically 416 px, the analysis-frame width).
Gotcha: VionLabs' has_vertical_video=true flag does not mean a vertical MP4 exists — it only
means the coord JSON is available. Rendering is on us.
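The cache lookup can be sketched as a small loader (the `samples` wrapper key and the fallback width are assumptions about the cached shape; only the `{ts, x_center}` sample schema and `source_frame_width` are confirmed above):

```python
import json
from pathlib import Path

def load_coords(slug: str, coords_dir: str = "coords") -> dict:
    """Load the cached VionLabs timeline for one title.

    Returns {'samples': [{ts, x_center}, ...], 'source_frame_width': int}.
    Raises FileNotFoundError if the title was never cached -- remember,
    has_vertical_video=true only means this JSON exists, not an MP4.
    """
    path = Path(coords_dir) / f"{slug}_periscope.json"
    data = json.loads(path.read_text())
    # Tolerate either a bare sample array or a wrapping object (assumed).
    if isinstance(data, dict):
        return {
            "samples": data["samples"],
            "source_frame_width": data.get("source_frame_width", 416),
        }
    return {"samples": data, "source_frame_width": 416}
```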
gs://cbs_ent/cbs_ent/movies/<id>/<name>_hdpmezz_*.mp4
Source MP4 URIs live in src_url_60.json. batch_cut_60.sh pulls each source
once via gcloud storage cp, cuts every variant (v1 / v5 / v6 / g3) from the same local copy,
then deletes the source to reclaim ~100 GB of local disk per catalog pass.
Gotcha: VionLabs proxies are often 360p. When we need 1080p, we reach past the proxy to
the cbs_ent mezzanine master.
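The fetch-once / cut-many / delete pattern in batch_cut_60.sh looks roughly like this, expressed as a command plan in Python (the vionlabs_cut.py flags shown are illustrative, not its real CLI):

```python
import os

VARIANTS = ("v1", "v5", "v6", "g3")  # variant names from the doc

def plan_batch_cut(source_uri: str, workdir: str = "/tmp/cuts") -> list[list[str]]:
    """Command sequence for one title: pull the source once via
    gcloud storage cp, cut every variant from the same local copy,
    then delete it to reclaim local disk (~100 GB per catalog pass)."""
    local = os.path.join(workdir, os.path.basename(source_uri))
    plan = [["gcloud", "storage", "cp", source_uri, local]]
    plan += [["python", "vionlabs_cut.py", local, "--variant", v] for v in VARIANTS]
    plan += [["rm", local]]  # reclaim disk before the next title
    return plan
```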
vionlabs_cut.py
Two piped ffmpeg processes with numpy in the middle: a producer decodes the clip
range to raw BGR24; Python computes each frame's media time and interpolates x_center from
the coord timeline; a crop window of width frame_height × 9 ÷ 16, centered on the interpolated
x_center and clamped to the frame edges, is sliced out; and a consumer re-encodes with a fixed filter chain:
hqdn3d=2:1.5:4:3                                         # gentle denoise
scale=1080:1920:flags=lanczos+full_chroma+accurate_rnd
unsharp=5:5:0.6:5:5:0.0                                  # luma-only sharpen
libx264 preset=medium crf=20                             # ~4s encode per clip
Audio is muxed from the source as AAC at 192 kb/s, with +faststart for web playback.
A 5-clip set finishes in about 22 seconds, roughly 200× faster than a Rendercam full-movie render.
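The per-frame crop math above (interpolate x_center, scale from analysis pixels to decode pixels, center a 9:16 window, clamp) can be sketched like this; the exact rounding is an assumption:

```python
def crop_x(ts: float, samples: list[dict], src_w: int,
           frame_w: int, frame_h: int) -> int:
    """Left edge (px) of the 9:16 crop window at media time ts.

    x_center samples are in analysis-frame pixels (src_w wide, typically
    416); the decoded frame is frame_w x frame_h."""
    win_w = frame_h * 9 // 16                       # crop width for 9:16 output
    ts = max(samples[0]["ts"], min(ts, samples[-1]["ts"]))  # hold at the ends
    prev = nxt = samples[0]
    for a, b in zip(samples, samples[1:]):          # find the bracketing samples
        if a["ts"] <= ts <= b["ts"]:
            prev, nxt = a, b
            break
    if nxt["ts"] > prev["ts"]:                      # linear interpolation
        t = (ts - prev["ts"]) / (nxt["ts"] - prev["ts"])
        xc = prev["x_center"] + t * (nxt["x_center"] - prev["x_center"])
    else:
        xc = prev["x_center"]
    xc *= frame_w / src_w                           # analysis px -> decode px
    left = round(xc - win_w / 2)                    # center the window
    return max(0, min(left, frame_w - win_w))       # clamp inside the frame
```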
whisper_subs.py
Runs locally on Apple Silicon CPU via faster-whisper with compute_type=int8 and
beam_size=1. VAD filter with 400 ms minimum silence. Output is a sibling
<clip>.subs.json with per-segment, per-word timings in clip-local seconds:
{
"segments": [{
"start": 0.0, "end": 2.84,
"words": [{"w":"I", "s":0.0, "e":0.12}, ...]
}, ...]
}
Runs at ~5–10× realtime on an M2. The feed viewer uses requestVideoFrameCallback to highlight
the active word frame-accurately.
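Shaping faster-whisper's transcription output into the subs.json schema above is a small transform; this sketch assumes only the segment attributes faster-whisper yields when word_timestamps=True (`.start`, `.end`, `.words` with `.word`/`.start`/`.end`):

```python
def to_subs_json(segments) -> dict:
    """Convert faster-whisper segments to the <clip>.subs.json shape:
    per-segment start/end plus per-word {w, s, e}, all in clip-local
    seconds. Whisper pads words with leading spaces, so strip them."""
    return {"segments": [{
        "start": round(seg.start, 2),
        "end": round(seg.end, 2),
        "words": [{"w": w.word.strip(),
                   "s": round(w.start, 2),
                   "e": round(w.end, 2)}
                  for w in (seg.words or [])],
    } for seg in segments]}
```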
gs://kevin-shortform-demo/clips/
Clips and subs upload to a public-read bucket with
roles/storage.objectViewer: domain:cbsinteractive.com — authenticated Paramount users get 200,
external users get 403. CORS allows GitHub Pages, localhost, and trycloudflare origins,
so feed.html fetches MP4s and word-timing JSON directly from the browser with no server.
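The bucket's CORS policy is set once, e.g. with gcloud storage buckets update gs://kevin-shortform-demo --cors-file=cors.json. A config of roughly this shape covers the three origin classes above (the exact hostnames here are illustrative, not the real list):

```json
[
  {
    "origin": [
      "https://example.github.io",
      "http://localhost:8000",
      "https://demo.trycloudflare.com"
    ],
    "method": ["GET", "HEAD"],
    "responseHeader": ["Content-Type", "Range"],
    "maxAgeSeconds": 3600
  }
]
```

Allowing the Range response header matters for video: the browser issues range requests while scrubbing MP4s.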
Why not signed URLs? Signed URLs expire after 12 hours; the domain-gated bucket gives stable, non-expiring links for internal demo sharing.
| Inputs | GCS source MP4 (HD master) + VionLabs coord JSON (per-frame x_center) |
|---|---|
| Outputs | 1080×1920 H.264 / AAC MP4 + .subs.json word timings |
| Hardware | CPU only. No GPU path. Runs on Kevin's MacBook. |
| Determinism | Cut is fully deterministic (same inputs → identical MP4). Whisper is near-deterministic (greedy decode). |
| Throughput | ~4 seconds per clip on M2. A 5-clip set finishes in ~22 seconds. |
| Filter chain | hqdn3d → scale=lanczos → unsharp → libx264 crf=20 |
| Periscope fidelity | Mirrors the canvas-crop algorithm 1:1. Same x_center interpolation, same clamp behavior. |
Every Voyager page is served from gs://kevin-shortform-demo/. MP4s, .subs.json
word timings, narrative tree JSONs, feed_data_*.js per-variant manifests, and every HTML page
live at the root or under clips/. There's no intermediate server, no CDN layer, no signed-URL
refresh ritual. The browser talks directly to GCS.