Zap Step Kinds: All 11 Pipeline Step Types Explained

Zap supports 11 step kinds. Each maps to a generation category and is routed to the appropriate provider adapter at runtime — the adapter’s supports(capability, model) method is checked before any job is submitted. Steps are connected via inputs references that name upstream step IDs, forming a directed dependency graph that the runtime resolves in order.
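The ordering described above amounts to a topological sort over the inputs references. The sketch below illustrates that idea only: StepRef and resolveOrder are hypothetical names, and the real runtime may order or validate steps differently.

```typescript
// Hypothetical minimal step shape: only the fields needed for ordering.
interface StepRef {
  id: string;
  inputs?: string[];
}

// Returns step IDs so that every step appears after all of its inputs.
// Throws on references to unknown step IDs and on dependency cycles.
function resolveOrder(steps: StepRef[]): string[] {
  const byId: Record<string, StepRef> = Object.create(null);
  steps.forEach((s) => { byId[s.id] = s; });

  const state: Record<string, "visiting" | "done"> = Object.create(null);
  const order: string[] = [];

  function visit(id: string, from: string): void {
    const step = byId[id];
    if (!step) throw new Error(`step "${from}" references unknown input "${id}"`);
    if (state[id] === "done") return;
    if (state[id] === "visiting") throw new Error(`dependency cycle at "${id}"`);
    state[id] = "visiting";
    (step.inputs || []).forEach((dep) => visit(dep, id));
    state[id] = "done";
    order.push(id); // all inputs are already in the list at this point
  }

  steps.forEach((s) => visit(s.id, s.id));
  return order;
}

// Steps may be declared in any order; the result respects the dependencies.
const exampleOrder = resolveOrder([
  { id: "stitch", inputs: ["initial_gen", "voiceover"] },
  { id: "initial_gen", inputs: ["initial_frame"] },
  { id: "voiceover" },
  { id: "initial_frame" },
]);
console.log(exampleOrder.join(" -> "));
```

A depth-first walk keeps the sketch short and surfaces cycles and dangling references as errors before any job would be submitted.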

Creative Pipeline Grammar

The canonical pipeline flow for a Zap video recipe is:

InitialFrame -> InitialGen -> ExtendGen x N -> stitch -> Zap.mp4

An optional revision step can be inserted between InitialGen and ExtendGen:

InitialFrame -> InitialGen -> InitialGenReViz? -> ExtendGen x N -> stitch -> Zap.mp4

Audio steps (audio.tts, audio.music, audio.sfx) and keyframes can be placed anywhere in the graph and are merged during stitching.

`image.gen`

Generate a first frame, storyboard, character sheet, or reference image. This is typically the first step in a video pipeline — its output is passed as the anchor frame to video.gen. Key fields: model, prompt, reference_images, candidates, tier

- id: initial_frame
  kind: image.gen
  provider: mock
  model: mock-image
  prompt: prompts/initial-frame.md

Set candidates: 4 combined with rlhf: true to generate four candidate frames and have a human (or VLM judge) pick the best before the video pipeline begins.

`image.edit`

Transform an input image while preserving subject identity. Use inputs to reference the upstream step whose output should be edited. Common uses: style transfer, background replacement, lighting adjustment, or inpainting. Key fields: inputs (upstream image step ID), model, prompt, reference_images

- id: styled_frame
  kind: image.edit
  provider: gmi
  model: gmi-edit-model
  inputs: [initial_frame]
  prompt: prompts/style-transfer.md

`video.gen`

Animate image or prompt inputs into a video clip. The upstream image.gen step is typically listed in inputs to provide the first frame; duration_s sets the clip length in seconds, which is also the basis for per-second billing. Key fields: inputs, duration_s, model, prompt, candidates, tier

- id: initial_gen
  kind: video.gen
  provider: gmi
  model: seedance-2-0-260128
  inputs: [initial_frame]
  duration_s: 5
  prompt: prompts/initial-gen.md

Provider notes:

seedance-2-0-260128 — billed at $0.07/s
fal-ai/kling-video/v2.1/pro/image-to-video — billed at $0.28/s
fal-ai/veo3.1 — billed at $0.45/s
happyhorse-1.1-i2v — billed at $0.28/s
gemini-omni-flash-preview — billed at $0.10/s

`video.extend`

Continue a clip from its last frame. The inputs field references the step to extend from. Use repeat to define a variable-length extension chain — the extendCount parameter at run time controls how many copies are instantiated by expandRepeatSteps(). Use extend.mode to control whether each iteration chains from the previous clip’s last frame (chain) or always anchors to the original first frame (anchored). Key fields: inputs, duration_s, model, prompt, repeat, extend

- id: extend
  kind: video.extend
  provider: gmi
  model: seedance-2-0-260128
  inputs: [initial_gen]
  duration_s: 5
  repeat:
    min: 0
    max: 4
    default: 2
  extend:
    mode: chain

The planner expands this into up to 4 numbered steps: extend_1, extend_2, extend_3, extend_4. With extendCount: 2, only extend_1 and extend_2 are submitted.
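The expansion can be sketched roughly as follows. This is not the actual expandRepeatSteps() implementation, whose signature is not shown on this page; expandRepeatSketch, the type names, the clamping of out-of-range counts, and the use of the step's declared inputs as the anchor in anchored mode are all assumptions made for illustration.

```typescript
interface RepeatSpec { min: number; max: number; default: number; }

interface ExtendStep {
  id: string;
  kind: "video.extend";
  inputs: string[];
  repeat?: RepeatSpec;
  extend?: { mode: "chain" | "anchored" };
}

interface ExpandedStep { id: string; kind: "video.extend"; inputs: string[]; }

// Hypothetical expansion of one repeatable step into numbered copies.
function expandRepeatSketch(step: ExtendStep, extendCount?: number): ExpandedStep[] {
  if (!step.repeat) {
    return [{ id: step.id, kind: step.kind, inputs: step.inputs.slice() }];
  }
  const { min, max } = step.repeat;
  const requested = extendCount === undefined ? step.repeat.default : extendCount;
  // Assumption: a count outside [min, max] is clamped rather than rejected.
  const count = Math.max(min, Math.min(max, requested));
  // Assumption: chain is the mode used when extend.mode is omitted.
  const mode = step.extend ? step.extend.mode : "chain";

  const out: ExpandedStep[] = [];
  for (let i = 1; i <= count; i++) {
    // chain: copy i continues from copy i - 1; anchored: every copy uses the declared inputs.
    const chained = mode === "chain" && i > 1;
    out.push({
      id: `${step.id}_${i}`,
      kind: step.kind,
      inputs: chained ? [`${step.id}_${i - 1}`] : step.inputs.slice(),
    });
  }
  return out;
}

const extendStep: ExtendStep = {
  id: "extend",
  kind: "video.extend",
  inputs: ["initial_gen"],
  repeat: { min: 0, max: 4, default: 2 },
  extend: { mode: "chain" },
};
const expandedChain = expandRepeatSketch(extendStep, 2);
```

For the recipe above with extendCount: 2, the sketch yields extend_1 (input initial_gen) and extend_2 (input extend_1). In anchored mode both copies would take initial_gen as their input.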

`video.edit`

Revise a clip with a prompt or composition layer. Use this step to apply motion effects, overlay graphics, adjust pacing, or re-light a clip in post. References an upstream video.gen or video.extend step via inputs. Key fields: inputs, model, prompt, duration_s

- id: revised_clip
  kind: video.edit
  provider: gmi
  model: gmi-video-edit
  inputs: [initial_gen]
  prompt: prompts/revision.md

`video.upscale`

Produce a higher-resolution version of a clip. Typically placed after the final video.extend step and before stitch. Uses a dedicated upscale model variant. Key fields: inputs, model, duration_s

- id: upscaled
  kind: video.upscale
  provider: gmi
  model: seedance-2-0-260128-upscale
  inputs: [extend]
  duration_s: 5

Provider notes: seedance-2-0-260128-upscale is billed at $0.056/s.

`audio.tts`

Generate voiceover from text. The prompt file contains the spoken text with optional {VARIABLE} references for dynamic content. Output is a .wav asset that the stitch step mixes into the final video. Key fields: model, prompt

- id: voiceover
  kind: audio.tts
  provider: gmi
  model: tts-model
  prompt: prompts/voiceover.md
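As an illustration of the {VARIABLE} references mentioned above, the following sketch substitutes values into prompt text. renderPrompt is a hypothetical helper, not part of Zap's documented API; it assumes variable names are upper-case identifiers and that unknown variables are left untouched.

```typescript
// Hypothetical helper: replaces {NAME}-style placeholders with supplied values.
// Placeholders without a matching value are left as-is (an assumption).
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{([A-Z0-9_]+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match
  );
}

const spokenText = renderPrompt(
  "Hello {CUSTOMER}, welcome to {PRODUCT}.",
  { CUSTOMER: "Ada" }
);
console.log(spokenText);
```

Leaving unresolved placeholders visible makes a missing variable easy to spot in the rendered text; a stricter variant could throw instead.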

`audio.music`

Generate background music for the video. Describe the mood, genre, and tempo in the prompt file. Duration is synchronized with the video length at stitch time. Key fields: model, prompt, duration_s

- id: background_music
  kind: audio.music
  provider: gmi
  model: music-gen
  prompt: prompts/music.md
  duration_s: 30

`audio.sfx`

Generate sound effects triggered at specific moments in the video. Use the prompt file to describe the sound (e.g. “whoosh, cinematic impact”). The stitch step positions SFX assets at the correct timestamp. Key fields: model, prompt

- id: sfx_impact
  kind: audio.sfx
  provider: gmi
  model: sfx-gen
  prompt: prompts/impact-sfx.md

`keyframes`

Extract, score, or prepare frames for a downstream step. Use this step to isolate key moments from a video clip for reference-based generation, or to provide scored candidate frames to a video.edit step. keyframes and stitch are the only two “local” step kinds — quoteStep() returns $0 for both. Key fields: inputs, keyframes (provider-specific config record)

- id: extracted_frames
  kind: keyframes
  inputs: [initial_gen]

`stitch`

Combine all resolved assets into the final Zap artifact. This must be the last step in the pipeline. The inputs array lists every clip and audio asset to include; ordering determines the timeline sequence. stitch and keyframes are local steps — no provider API call is made and no cost is incurred. Key fields: inputs, stitch (ZapStitch config)

- id: stitch
  kind: stitch
  inputs: [initial_gen]
  stitch:
    engine: auto
    format: mp4
    quality: standard

ZapStitch Engine Options

Engine	Description
`auto`	Runtime selects the best available engine — prefers `hyperframes` when installed, falls back to `local`
`local`	FFmpeg-based assembly — always available, no extra dependencies
`hyperframes`	HTML composition via the HyperFrames CLI — requires `DESIGN.md` in the recipe root
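The auto rule in the table can be written as a small function. This is a sketch of the selection logic as described here, with resolveEngine and the hyperframesInstalled flag as hypothetical names; how the runtime actually detects the HyperFrames CLI is not covered on this page.

```typescript
type StitchEngine = "auto" | "local" | "hyperframes";

// Resolves the configured engine to a concrete one.
// "auto" prefers hyperframes when it is installed and otherwise uses local.
function resolveEngine(
  requested: StitchEngine,
  hyperframesInstalled: boolean
): "local" | "hyperframes" {
  if (requested === "auto") {
    return hyperframesInstalled ? "hyperframes" : "local";
  }
  return requested; // explicit choices are passed through unchanged
}

console.log(resolveEngine("auto", false));
```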

HyperFrames Stitching

Use engine: hyperframes when the recipe needs HTML-based composition:

- id: stitch
  kind: stitch
  inputs: [upscaled, voiceover, background_music]
  stitch:
    engine: hyperframes
    format: mp4
    quality: standard
    fps: 24

HyperFrames recipes must include a DESIGN.md visual identity file before composition HTML is generated. At runtime, Zap writes a minimal temporary DESIGN.md automatically so that provider assets render through a compliant HyperFrames project.

If the HyperFrames CLI is not installed or a generated composition check fails (npx hyperframes lint / validate / inspect), the runtime records the error on the step and falls back to the first resolved stitch asset. The run does not fail.
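The fallback behavior can be pictured as a guarded call. The following is a minimal sketch under stated assumptions: stitchWithFallback, StitchResult, and the compose callback are hypothetical stand-ins for the HyperFrames composition step, and the real runtime's record format is not documented here.

```typescript
interface StitchResult {
  output: string;                      // path of the artifact that was produced
  engine: "hyperframes" | "fallback";
  error?: string;                      // recorded on the step when composition fails
}

// Tries the composition; on any failure, records the error and returns the
// first resolved asset instead of failing the run.
function stitchWithFallback(
  assets: string[],
  compose: (assets: string[]) => string
): StitchResult {
  if (assets.length === 0) {
    throw new Error("stitch needs at least one resolved asset");
  }
  try {
    return { output: compose(assets), engine: "hyperframes" };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { output: assets[0], engine: "fallback", error: message };
  }
}

// Simulates a failed composition check.
const fallbackResult = stitchWithFallback(["upscaled.mp4", "voiceover.wav"], () => {
  throw new Error("hyperframes lint failed");
});
console.log(fallbackResult.output, fallbackResult.error);
```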

Model Rate Table

The following rates are used by quoteStep() (sourced from planner.ts). Any step whose model is not in this table is quoted at $0; this covers local steps, mock models, and models without a declared rate.

Model	Billing	Rate
`fal-ai/flux/dev`	Per request	$0.03
`fal-ai/kling-video/v2.1/pro/image-to-video`	Per second	$0.28/s
`fal-ai/veo3.1`	Per second	$0.45/s
`gemini-omni-flash-preview`	Per second	$0.10/s
`happyhorse-1.1-i2v`	Per second	$0.28/s
`seedance-2-0-260128`	Per second	$0.07/s
`seedance-2-0-260128-upscale`	Per second	$0.056/s

For time-billed models, the formula is rate × duration_s. If duration_s is not set on the step, a default of 1 second is used for the estimate.
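The quoting rule can be sketched as below. This is an illustrative reconstruction from the table and formula on this page, not the code in planner.ts; quoteStepSketch, RATES, and the type names are hypothetical.

```typescript
type Billing = "per_request" | "per_second";
interface Rate { billing: Billing; usd: number; }

// Rates copied from the Model Rate Table above.
const RATES: Record<string, Rate> = {
  "fal-ai/flux/dev": { billing: "per_request", usd: 0.03 },
  "fal-ai/kling-video/v2.1/pro/image-to-video": { billing: "per_second", usd: 0.28 },
  "fal-ai/veo3.1": { billing: "per_second", usd: 0.45 },
  "gemini-omni-flash-preview": { billing: "per_second", usd: 0.10 },
  "happyhorse-1.1-i2v": { billing: "per_second", usd: 0.28 },
  "seedance-2-0-260128": { billing: "per_second", usd: 0.07 },
  "seedance-2-0-260128-upscale": { billing: "per_second", usd: 0.056 },
};

interface QuotableStep { kind: string; model?: string; duration_s?: number; }

// Estimated cost in USD for a single step.
function quoteStepSketch(step: QuotableStep): number {
  // Local step kinds never call a provider.
  if (step.kind === "keyframes" || step.kind === "stitch") return 0;
  const model = step.model;
  if (model === undefined || !Object.prototype.hasOwnProperty.call(RATES, model)) {
    return 0; // mock models and models without a declared rate
  }
  const rate = RATES[model];
  if (rate.billing === "per_request") return rate.usd;
  // Time-billed: rate × duration_s, with 1 second assumed when duration_s is unset.
  const seconds = step.duration_s === undefined ? 1 : step.duration_s;
  return rate.usd * seconds;
}

const clipQuote = quoteStepSketch({
  kind: "video.gen",
  model: "seedance-2-0-260128",
  duration_s: 5,
});
console.log(clipQuote.toFixed(2));
```

Under these rates a 5-second seedance-2-0-260128 clip is estimated at about $0.35 (0.07 × 5). The result is a floating-point number, so it should be rounded for display.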

Step Kind Summary

Kind	Category	Local	Cost Basis
`image.gen`	Image generation	No	Per request
`image.edit`	Image transformation	No	Per request
`video.gen`	Video generation	No	Per second
`video.extend`	Video continuation	No	Per second
`video.edit`	Video revision	No	Per second
`video.upscale`	Video upscaling	No	Per second
`audio.tts`	Text-to-speech	No	Per request
`audio.music`	Music generation	No	Per request
`audio.sfx`	Sound effects	No	Per request
`keyframes`	Frame extraction	Yes	$0
`stitch`	Final assembly	Yes	$0
