Zap supports 11 step kinds. Each maps to a generation category and is routed to the appropriate provider adapter at runtime — the adapter’s supports(capability, model) method is checked before any job is submitted. Steps are connected via inputs references that name upstream step IDs, forming a directed dependency graph that the runtime resolves in order.
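This page does not show how the runtime orders steps. The following is a minimal sketch that assumes a standard topological sort (Kahn's algorithm) over the inputs references. resolveOrder and StepRef are hypothetical names, not Zap APIs. The sketch rejects unknown input IDs and cycles, because neither can be resolved in order.

```typescript
// Hypothetical minimal step shape: only the fields needed to resolve order.
interface StepRef {
  id: string;
  inputs?: string[];
}

// Kahn's algorithm: returns step IDs so that every step appears after all of its inputs.
function resolveOrder(steps: StepRef[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    indegree.set(s.id, 0);
    dependents.set(s.id, []);
  }
  for (const s of steps) {
    for (const dep of s.inputs ?? []) {
      if (!indegree.has(dep)) {
        throw new Error(`step "${s.id}" references unknown input "${dep}"`);
      }
      indegree.set(s.id, (indegree.get(s.id) ?? 0) + 1);
      dependents.get(dep)!.push(s.id);
    }
  }
  // Seed the queue in declaration order so that the result is deterministic.
  const queue = steps.filter((s) => indegree.get(s.id) === 0).map((s) => s.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any step left unvisited is part of a cycle.
  if (order.length !== steps.length) {
    throw new Error("dependency cycle detected among steps");
  }
  return order;
}
```

For example, a graph with stitch listed first still resolves initial_frame before initial_gen, and initial_gen before stitch.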

Creative Pipeline Grammar

The canonical pipeline flow for a Zap video recipe is:
InitialFrame -> InitialGen -> ExtendGen x N -> stitch -> Zap.mp4
An optional revision step can be inserted between InitialGen and ExtendGen:
InitialFrame -> InitialGen -> InitialGenReViz? -> ExtendGen x N -> stitch -> Zap.mp4
Audio steps (audio.tts, audio.music, audio.sfx) and keyframes can be placed anywhere in the graph and are merged during stitching.

image.gen

Generate a first frame, storyboard, character sheet, or reference image. This is typically the first step in a video pipeline — its output is passed as the anchor frame to video.gen. Key fields: model, prompt, reference_images, candidates, tier
- id: initial_frame
  kind: image.gen
  provider: mock
  model: mock-image
  prompt: prompts/initial-frame.md
Set candidates: 4 together with rlhf: true to generate four candidate frames and have a human (or a VLM judge) pick the best one before the video pipeline begins.

image.edit

Transform an input image while preserving subject identity. Use inputs to reference the upstream step whose output should be edited. Common uses: style transfer, background replacement, lighting adjustment, or inpainting. Key fields: inputs (upstream image step ID), model, prompt, reference_images
- id: styled_frame
  kind: image.edit
  provider: gmi
  model: gmi-edit-model
  inputs: [initial_frame]
  prompt: prompts/style-transfer.md

video.gen

Animate image or prompt inputs into a video clip. The upstream image.gen step is typically listed in inputs to provide the first frame; duration_s sets the clip length billed to the provider. Key fields: inputs, duration_s, model, prompt, candidates, tier
- id: initial_gen
  kind: video.gen
  provider: gmi
  model: seedance-2-0-260128
  inputs: [initial_frame]
  duration_s: 5
  prompt: prompts/initial-gen.md
Provider notes:
  • seedance-2-0-260128 — billed at $0.07/s
  • fal-ai/kling-video/v2.1/pro/image-to-video — billed at $0.28/s
  • fal-ai/veo3.1 — billed at $0.45/s
  • happyhorse-1.1-i2v — billed at $0.28/s
  • gemini-omni-flash-preview — billed at $0.10/s

video.extend

Continue a clip from its last frame. The inputs field references the step to extend from. Use repeat to define a variable-length extension chain: at run time, the extendCount parameter controls how many copies expandRepeatSteps() instantiates. Use extend.mode to choose whether each iteration chains from the previous clip’s last frame (chain) or always anchors to the original first frame (anchored). Key fields: inputs, duration_s, model, prompt, repeat, extend
- id: extend
  kind: video.extend
  provider: gmi
  model: seedance-2-0-260128
  inputs: [initial_gen]
  duration_s: 5
  repeat:
    min: 0
    max: 4
    default: 2
  extend:
    mode: chain
The planner expands this into up to 4 numbered steps: extend_1, extend_2, extend_3, extend_4. With extendCount: 2, only extend_1 and extend_2 are submitted.
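The following is a minimal sketch of this expansion. It is not the real expandRepeatSteps(), and it rests on three assumptions. First, extendCount is clamped to [repeat.min, repeat.max]. Second, the function falls back to repeat.default when no count is given. Third, in anchored mode every copy keeps the template's original inputs. The names expandRepeat and ExtendStep are hypothetical.

```typescript
// Hypothetical shapes mirroring the YAML fields above; the real planner types may differ.
interface RepeatSpec {
  min: number;
  max: number;
  default: number;
}

interface ExtendStep {
  id: string;
  kind: "video.extend";
  inputs: string[];
  duration_s?: number;
  repeat?: RepeatSpec;
  extend?: { mode: "chain" | "anchored" };
}

function expandRepeat(step: ExtendStep, extendCount?: number): ExtendStep[] {
  if (!step.repeat) return [step];
  const { min, max } = step.repeat;
  // Assumed: use repeat.default when no count is given, then clamp into [min, max].
  const requested = extendCount ?? step.repeat.default;
  const count = Math.min(max, Math.max(min, requested));
  const mode = step.extend?.mode ?? "chain"; // assumed default mode
  const { repeat: _repeat, ...template } = step; // copies do not repeat again
  const copies: ExtendStep[] = [];
  for (let i = 1; i <= count; i++) {
    // chain: copy i extends copy i-1; copy 1 extends the template's input.
    // anchored (assumed): every copy keeps the template's original inputs.
    const inputs = mode === "chain" && i > 1 ? [`${step.id}_${i - 1}`] : [...step.inputs];
    copies.push({ ...template, id: `${step.id}_${i}`, inputs });
  }
  return copies;
}
```

With the extend step above and extendCount: 2, this yields extend_1 (input initial_gen) and extend_2 (input extend_1). A count of 10 is clamped to 4.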

video.edit

Revise a clip with a prompt or composition layer. Use this step to apply motion effects, overlay graphics, adjust pacing, or re-light a clip in post. References an upstream video.gen or video.extend step via inputs. Key fields: inputs, model, prompt, duration_s
- id: revised_clip
  kind: video.edit
  provider: gmi
  model: gmi-video-edit
  inputs: [initial_gen]
  prompt: prompts/revision.md

video.upscale

Produce a higher-resolution version of a clip. Typically placed after the final video.extend step and before stitch. Uses a dedicated upscale model variant. Key fields: inputs, model, duration_s
- id: upscaled
  kind: video.upscale
  provider: gmi
  model: seedance-2-0-260128-upscale
  inputs: [extend]
  duration_s: 5
Provider notes: seedance-2-0-260128-upscale is billed at $0.056/s.

audio.tts

Generate voiceover from text. The prompt file contains the spoken text with optional {VARIABLE} references for dynamic content. Output is a .wav asset that the stitch step mixes into the final video. Key fields: model, prompt
- id: voiceover
  kind: audio.tts
  provider: gmi
  model: tts-model
  prompt: prompts/voiceover.md
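This page does not specify how {VARIABLE} references are filled in. The following is a minimal sketch that assumes simple string substitution, with unknown placeholders left untouched; the real runtime may raise an error instead. renderPrompt is a hypothetical helper.

```typescript
// Hypothetical helper: replace {NAME} placeholders with values from `vars`.
// Placeholders without a matching value are kept verbatim (an assumption).
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}
```

For example, renderPrompt("Meet {PRODUCT}, now in {COLOR}.", { PRODUCT: "Zap" }) produces "Meet Zap, now in {COLOR}.".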

audio.music

Generate background music for the video. Describe the mood, genre, and tempo in the prompt file. Duration is synchronized with the video length at stitch time. Key fields: model, prompt, duration_s
- id: background_music
  kind: audio.music
  provider: gmi
  model: music-gen
  prompt: prompts/music.md
  duration_s: 30

audio.sfx

Generate sound effects triggered at specific moments in the video. Use the prompt file to describe the sound (e.g. “whoosh, cinematic impact”). The stitch step positions SFX assets at the correct timestamp. Key fields: model, prompt
- id: sfx_impact
  kind: audio.sfx
  provider: gmi
  model: sfx-gen
  prompt: prompts/impact-sfx.md

keyframes

Extract, score, or prepare frames for a downstream step. Use this step to isolate key moments from a video clip for reference-based generation, or to provide scored candidate frames to a video.edit step. keyframes and stitch are the only two “local” step kinds — quoteStep() returns $0 for both. Key fields: inputs, keyframes (provider-specific config record)
- id: extracted_frames
  kind: keyframes
  inputs: [initial_gen]

stitch

Combine all resolved assets into the final Zap artifact. This must be the last step in the pipeline. The inputs array lists every clip and audio asset to include; ordering determines the timeline sequence. stitch and keyframes are local steps — no provider API call is made and no cost is incurred. Key fields: inputs, stitch (ZapStitch config)
- id: stitch
  kind: stitch
  inputs: [initial_gen]
  stitch:
    engine: auto
    format: mp4
    quality: standard
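The following sketch illustrates two of the rules above under stated assumptions. The first function treats "last step" as the last entry in the step list. The second lays video assets end to end in inputs order. How audio is positioned is not documented here, so audio assets are only collected for mixing. Image and keyframe assets are ignored. PlannedStep, ResolvedAsset, assertStitchIsLast, and buildTimeline are hypothetical names.

```typescript
// Hypothetical shapes; the real ZapStitch input types are not documented here.
interface PlannedStep {
  id: string;
  kind: string;
  inputs?: string[];
}

interface ResolvedAsset {
  stepId: string;
  kind: string; // e.g. "video.gen", "audio.tts"
  duration_s: number;
}

// Enforce that stitch comes last and that no step consumes its output.
function assertStitchIsLast(steps: PlannedStep[]): void {
  const last = steps[steps.length - 1];
  if (!last || last.kind !== "stitch") {
    throw new Error("the last step must be of kind stitch");
  }
  for (const s of steps) {
    if ((s.inputs ?? []).includes(last.id)) {
      throw new Error(`step "${s.id}" consumes the stitch output`);
    }
  }
}

// Place video clips back to back in input order; collect audio separately.
function buildTimeline(assets: ResolvedAsset[]) {
  let cursor = 0;
  const clips: { stepId: string; start_s: number; end_s: number }[] = [];
  const audio: ResolvedAsset[] = [];
  for (const a of assets) {
    if (a.kind.startsWith("video.")) {
      clips.push({ stepId: a.stepId, start_s: cursor, end_s: cursor + a.duration_s });
      cursor += a.duration_s;
    } else if (a.kind.startsWith("audio.")) {
      audio.push(a);
    }
  }
  return { clips, audio, total_s: cursor };
}
```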

ZapStitch Engine Options

| Engine | Description |
|---|---|
| auto | Runtime selects the best available engine: prefers hyperframes when installed, falls back to local |
| local | FFmpeg-based assembly; always available, no extra dependencies |
| hyperframes | HTML composition via the HyperFrames CLI; requires DESIGN.md in the recipe root |
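The following sketch shows the auto rule from the table. The hyperframesInstalled flag is hypothetical, because this page does not say how the runtime detects the CLI. An explicit engine is returned unchanged; failures of an explicit hyperframes stitch are handled by the fallback described in the next section.

```typescript
type StitchEngine = "auto" | "local" | "hyperframes";

// auto: prefer hyperframes when its CLI is available, otherwise use local (FFmpeg).
function resolveEngine(
  requested: StitchEngine,
  hyperframesInstalled: boolean, // hypothetical detection result
): "local" | "hyperframes" {
  if (requested === "auto") {
    return hyperframesInstalled ? "hyperframes" : "local";
  }
  return requested;
}
```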

HyperFrames Stitching

Use engine: hyperframes when the recipe needs HTML-based composition:
- id: stitch
  kind: stitch
  inputs: [upscaled, voiceover, background_music]
  stitch:
    engine: hyperframes
    format: mp4
    quality: standard
    fps: 24
HyperFrames recipes must include a DESIGN.md visual identity file before composition HTML is generated. At runtime, Zap writes a minimal temporary DESIGN.md automatically so that provider assets render through a compliant HyperFrames project.
If the HyperFrames CLI is not installed or a generated composition check fails (npx hyperframes lint / validate / inspect), the runtime records the error on the step and falls back to the first resolved stitch asset. The run does not fail.
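The fallback behavior can be sketched as follows. The compose callback is a hypothetical stand-in for the real HyperFrames invocation, including its lint, validate, and inspect checks. It is modeled as synchronous, and it is assumed to throw on any failure. StitchOutcome and stitchWithFallback are illustrative names, not Zap APIs.

```typescript
interface StitchOutcome {
  asset?: string; // composed output, or the first resolved input on fallback
  error?: string; // recorded on the step when HyperFrames fails
}

function stitchWithFallback(
  resolvedInputs: string[],
  compose: (inputs: string[]) => string, // hypothetical HyperFrames call
): StitchOutcome {
  try {
    return { asset: compose(resolvedInputs) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Record the error and fall back to the first resolved asset; do not fail the run.
    return { asset: resolvedInputs[0], error: message };
  }
}
```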

Model Rate Table

The following rates are used by quoteStep() (sourced from planner.ts). Steps whose model is not in this table are quoted at $0: local steps, mock models, and models without a declared rate.
| Model | Billing | Rate |
|---|---|---|
| fal-ai/flux/dev | Per request | $0.03 |
| fal-ai/kling-video/v2.1/pro/image-to-video | Per second | $0.28/s |
| fal-ai/veo3.1 | Per second | $0.45/s |
| gemini-omni-flash-preview | Per second | $0.10/s |
| happyhorse-1.1-i2v | Per second | $0.28/s |
| seedance-2-0-260128 | Per second | $0.07/s |
| seedance-2-0-260128-upscale | Per second | $0.056/s |
For time-billed models, the formula is rate × duration_s. If duration_s is not set on the step, a default of 1 second is used for the estimate.
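The quoting rules above can be re-implemented as a short illustrative sketch. This is not planner.ts itself. The names RATES, QuotableStep, and quoteStepSketch are hypothetical, and per-request models are assumed to cost a flat rate regardless of candidates.

```typescript
// Rates from the table above.
type Rate = { per: "request" | "second"; usd: number };

const RATES: Record<string, Rate> = {
  "fal-ai/flux/dev": { per: "request", usd: 0.03 },
  "fal-ai/kling-video/v2.1/pro/image-to-video": { per: "second", usd: 0.28 },
  "fal-ai/veo3.1": { per: "second", usd: 0.45 },
  "gemini-omni-flash-preview": { per: "second", usd: 0.1 },
  "happyhorse-1.1-i2v": { per: "second", usd: 0.28 },
  "seedance-2-0-260128": { per: "second", usd: 0.07 },
  "seedance-2-0-260128-upscale": { per: "second", usd: 0.056 },
};

interface QuotableStep {
  kind: string;
  model?: string;
  duration_s?: number;
}

function quoteStepSketch(step: QuotableStep): number {
  if (step.kind === "keyframes" || step.kind === "stitch") return 0; // local steps
  const rate = step.model ? RATES[step.model] : undefined;
  if (!rate) return 0; // mock or unlisted models
  if (rate.per === "request") return rate.usd;
  return rate.usd * (step.duration_s ?? 1); // default of 1 s when duration_s is unset
}
```

For the initial_gen step earlier on this page (seedance-2-0-260128, duration_s: 5), this gives 0.07 × 5 = $0.35. Adding two 5-second extensions on the same model brings the estimate to 3 × $0.35 = $1.05.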

Step Kind Summary

| Kind | Category | Local | Cost Basis |
|---|---|---|---|
| image.gen | Image generation | No | Per request |
| image.edit | Image transformation | No | Per request |
| video.gen | Video generation | No | Per second |
| video.extend | Video continuation | No | Per second |
| video.edit | Video revision | No | Per second |
| video.upscale | Video upscaling | No | Per second |
| audio.tts | Text-to-speech | No | Per request |
| audio.music | Music generation | No | Per request |
| audio.sfx | Sound effects | No | Per request |
| keyframes | Frame extraction | Yes | $0 |
| stitch | Final assembly | Yes | $0 |