Is there any way to formulate a prompt that implies: do action A, and after action A is done, do action B?
There seem to be several methods available, but some of them are difficult to use in an 8GB VRAM environment:
Wan2.2 RapidBase I2V: sequential actions, prompt weights, and continuity-safe A → B workflows
Short answer:
Yes, you can write prompts like “do action A, then after A is done, do action B.” The model can understand that language. But in a normal single-prompt I2V workflow, that instruction is usually a soft temporal suggestion, not a reliable frame-accurate command.
For the current RapidBase workflow, the safest ranking is:
1. One-clip two-beat prompt
2. Two clips with a handoff frame
3. Neutral overlap + short crossfade
4. FLF2V bridge clip
5. Prompt Relay
6. Prompt Schedule / FizzNodes
For prompt weights:
Avoid:
(((action:1.9)))
Prefer:
(same face and identity:1.10)
(preserve exact face:1.10)
(static camera:1.10)
(tiny natural smile:1.05)
The key rule is:
Weight preservation more than action.
The current workflow is working because it preserves the source image. Anything that pushes too hard toward complex action can also push the model into repainting, hallucination, or face drift.
1. Does the model understand “A, then B”?
It can understand the wording, but it does not necessarily execute it as an exact timeline.
A prompt like this is understandable:
The same person first blinks once, then after a brief pause makes a tiny natural smile.
But in a normal I2V generation, the text prompt conditions the whole clip. It is not automatically split into exact frame ranges like:
frames 0-16:
action A
frames 17-33:
action B
So the model may interpret “first A, then B” loosely.
Possible outcomes:
| Prompt | Possible model behavior |
|---|---|
| blink once, then smile | blink and smile happen in the right order |
| blink once, then smile | smile starts before the blink finishes |
| blink once, then smile | only the smile happens |
| look down, then look back | gaze drifts vaguely instead of following exact order |
| A then B then C | one action is skipped or the face starts drifting |
This is normal for a single-prompt video model. The model sees the whole instruction, but it is not a strict animation timeline unless you use timeline-control tools.
Useful background:
- How to get the most out of prompts for WAN models
- Kijai ComfyUI-PromptRelay
- RunComfy Wan2.2 Prompt Relay workflow
2. Best first method: one-clip two-beat prompting
For the current RapidBase workflow, this is the best first method.
It does not add nodes, LoRAs, bridge models, scheduling tools, or extra VRAM load. It also protects the main thing the current setup is good at:
same source image
same face
same lighting
same texture
same background
low hallucination
The limitation is that A → B order is only approximate.
Good use cases
Use one-clip two-beat prompts for small actions:
blink once -> tiny smile
gentle breathing -> blink once
look slightly downward -> return eyes to camera
tiny smile -> neutral expression
eyes shift slightly left -> eyes return to camera
neutral expression -> tiny smile
Bad use cases
Avoid large or multi-stage sequences:
turn head -> talk -> raise hand
walk forward -> gesture -> camera zooms in
look away -> laugh -> turn back
large smile -> speaking -> hair blowing
pose change -> lighting change -> background reaction
Each extra action increases the chance of:
face drift
changed mouth shape
changed eye shape
new lighting
new camera angle
background mutation
AI-looking repainting
3. Good wording for “after A is done, do B”
Use completion language, not just a loose list.
Weak:
blink and smile
Better:
first blinks once, then after the blink is complete, slowly forms a tiny natural smile
Good sequencing phrases:
first <action A>, then after a brief pause <action B>
after <action A> is complete, <action B>
begins still, then <action A>, then settles into <action B>
first holds a neutral expression, then gradually <action B>
after returning to neutral, <action B>
Avoid vague or overloaded phrasing:
blink and smile naturally
perform a sequence of expressions
react emotionally
do a cute expression
move seductively
act naturally
Vague words invite the model to improvise. Improvisation is where identity drift usually starts.
4. Practical one-clip formula
Use this structure:
[identity lock] + [starting state] + [action A] + [pause/settle] + [action B] + [camera lock] + [scene lock]
Example:
The same person from the source image keeps the exact same face and identity. The video begins with a calm neutral expression. First, the person gently blinks once. After the blink is complete, the person slowly forms a tiny natural smile. Preserve the same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No pan. No scene change.
This is better than:
she blinks then smiles
because it tells the model:
who must remain the same
what state to start from
what action comes first
what happens after
what must not change
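The structure above can be wrapped in a small helper so the pieces always appear in the same order. This is a sketch only: the default strings are taken from the example prompt, the function name is illustrative, and nothing here is part of any ComfyUI API.

```python
def build_two_beat_prompt(
    action_a: str,
    action_b: str,
    identity_lock: str = ("The same person from the source image keeps "
                          "the exact same face and identity."),
    starting_state: str = "The video begins with a calm neutral expression.",
    scene_lock: str = ("Preserve the same hairstyle, clothing, lighting, "
                       "colors, camera angle, and background."),
    camera_lock: str = "Static camera. No zoom. No pan. No scene change.",
) -> str:
    """Assemble: [identity lock] + [starting state] + [action A] +
    [pause/settle] + [action B] + [scene lock] + [camera lock]."""
    beats = (f"First, the person {action_a}. "
             f"After that is complete, the person {action_b}.")
    return " ".join([identity_lock, starting_state, beats,
                     scene_lock, camera_lock])

prompt = build_two_beat_prompt("gently blinks once",
                               "slowly forms a tiny natural smile")
print(prompt)
```

Because the locks are defaults, every generated prompt keeps the same preservation language even when only the two actions change between runs.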
5. One-clip two-beat prompt templates
Safest A → B template
The same person from the source image first blinks once, then after a brief pause makes a tiny natural smile. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No pan. No scene change. Subtle natural motion only.
More explicit timing template
The video begins with the same person holding still. First, the person gently blinks once. After the blink is complete, the person slowly forms a tiny natural smile. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No scene change.
Face-first template
Preserve the exact same face and identity throughout the video. The same person first blinks once, then after a brief pause makes a tiny natural smile. Same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No scene change.
Short template
Same person, same face and identity. First one subtle blink, then a tiny natural smile. Static camera. Same lighting and background.
Very safe template
The same person keeps the exact same face and identity throughout the video. First, one small blink. Then, a tiny natural smile. Same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera.
6. When a single prompt is not enough
If exact order matters, use two clips.
Do not do this:
Clip 1:
original source image -> action A
Clip 2:
original source image -> action B
That creates two independent clips from the same original starting point. Clip 2 does not know where Clip 1 ended.
Better:
Clip 1:
original source image -> action A only
Handoff frame:
clean stable frame near the end of Clip 1
Clip 2:
handoff frame -> action B only
This is the most practical way to get reliable A → B ordering without adding complex nodes.
7. Handoff-frame workflow
Process
1. Generate Clip 1 with action A only.
2. Inspect the last 3-10 frames.
3. Do not blindly use the final frame.
4. Pick the cleanest stable frame:
- best face
- least blur
- stable lighting
- stable background
- expression suitable for the next action
5. Save that frame as PNG.
6. Use it as the source image for Clip 2.
7. Prompt Clip 2 for action B only.
8. Keep settings consistent:
- same resolution
- same FPS
- same VAE
- same text encoder
- same sampler
- same scheduler
- same CFG
- same steps
- same prompt style
Clip 1 example
The same person gently blinks once, then returns to a calm neutral expression. Preserve the same face, identity, lighting, clothing, colors, camera angle, and background. Static camera.
Clip 2 example
The same person begins from a calm neutral expression, then slowly forms a tiny natural smile. Preserve the same face, identity, lighting, clothing, colors, camera angle, and background. Static camera.
This is more reliable than trying to force a complex sequence into one prompt.
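Step 2 of the handoff process (inspecting the last frames) can be scripted. A minimal sketch that asks ffmpeg to dump roughly the last half second of Clip 1 as PNGs for manual review; it assumes ffmpeg is on PATH, and uses `-sseof` to seek relative to the end of the file. The function only builds the command so you can inspect it before running.

```python
import subprocess
from pathlib import Path

def dump_tail_frames(video: str, out_dir: str,
                     seconds: float = 0.5) -> list[str]:
    """Build an ffmpeg command that writes the last `seconds` of `video`
    as numbered PNGs into `out_dir` for manual handoff-frame review."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg", "-y",
        "-sseof", f"-{seconds}",      # seek relative to end of file
        "-i", video,
        f"{out_dir}/frame_%03d.png",  # frame_001.png, frame_002.png, ...
    ]
    return cmd

cmd = dump_tail_frames("clip1.mp4", "handoff_candidates")
# subprocess.run(cmd, check=True)  # uncomment to actually extract
```

Pick the cleanest PNG by eye, then feed it to Clip 2 as the source image.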
8. Neutral overlap and crossfade
If using two clips, make the join happen during a neutral moment.
Bad join:
Clip 1 ends during a blink.
Clip 2 starts with a smile.
Better join:
Clip 1 ends after returning to neutral.
Clip 2 starts from the neutral handoff frame.
If the join is slightly visible, use a short crossfade.
Typical overlap:
4-8 frames
same FPS
same resolution
same color settings
same encoding settings
FFmpeg example:

```
ffmpeg \
  -i clip1.mp4 \
  -i clip2.mp4 \
  -filter_complex "xfade=transition=fade:duration=0.25:offset=2.75" \
  -c:v libx264 -crf 18 -preset slow \
  output.mp4
```
offset must be adjusted to match Clip 1: xfade measures offset from the start of the first input, so a common choice is offset = (Clip 1 duration) − (fade duration). The 2.75 above assumes a 3.0-second Clip 1 with a 0.25-second fade.
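The offset arithmetic can be checked with a tiny helper (a sketch; it assumes you already know Clip 1's duration, for example from ffprobe):

```python
def xfade_offset(clip1_duration: float, fade_duration: float) -> float:
    """Start the crossfade so it ends exactly at the end of Clip 1."""
    if fade_duration > clip1_duration:
        raise ValueError("fade longer than clip")
    return clip1_duration - fade_duration

print(xfade_offset(3.0, 0.25))  # 2.75, matching the ffmpeg example above
```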
Important:
A crossfade can hide a small seam. It cannot fix a true face, lighting, or background mismatch.
If the face is different between the clips, a crossfade may create ghosting or a double-face dissolve.
9. FLF2V bridge clip
FLF2V means First-Last Frame to Video.
Instead of simply crossfading Clip A into Clip B, you provide:
first frame = stable end frame of Clip A
last frame = stable start frame of Clip B
Then the model generates the transition between them.
Concept:
Clip A:
source -> action A
Clip B:
handoff/source -> action B
Bridge:
first frame = stable end frame of Clip A
last frame = stable start frame of Clip B
prompt = smooth subtle transition, same face, same lighting, static camera
Why it can help:
more natural transition than crossfade
can reduce a sudden jump between two clips
uses actual visual endpoints
Why it may not be ideal for the current 8GB RapidBase workflow:
separate workflow family
may be heavier
may not preserve the same RapidBase look
may introduce bloom or airbrushed style
may require more setup and testing
Use FLF2V only if:
Clip A is good.
Clip B is good.
The join is visibly bad.
A simple crossfade is not good enough.
The bridge can be short.
References:
- ComfyUI official Wan2.2 guide
- Comfy Wan2.2 14B FLF2V workflow
- Comfy blog: Wan2.2 FLF2V native support
10. Prompt Relay
Prompt Relay is closer to the real solution for “A happens in one segment, B happens in another segment.”
Instead of relying on a single prompt, Prompt Relay routes different prompts through different temporal segments.
Concept:
Global prompt:
same person, same face, same identity, same lighting, same background, static camera
Segment 1:
blink once
Segment 2:
tiny natural smile
Why it is attractive:
A and B happen inside one timeline
less independent-clip continuity drift
global identity/camera constraints can stay active
different segments can receive different action prompts
Why it should be treated carefully:
changes the workflow structure
may not plug cleanly into the current RapidBase GGUF workflow
may increase complexity
8GB behavior is uncertain
could break the source-fidelity look
Do not add it to the working workflow directly. Test only in a duplicate workflow.
11. Prompt Schedule / FizzNodes
Prompt scheduling is the general concept of changing prompt conditioning over time.
Concept:
Frames 0-16:
same person gently blinks once
Frames 17-33:
same person slowly forms a tiny smile
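With a scheduling node such as FizzNodes' BatchPromptSchedule, the concept above is usually written as keyframed prompt text with frame numbers as keys. This is a sketch only; check the exact syntax and the node's frame-count settings against the FizzNodes documentation before relying on it.

```
"0" :"same person gently blinks once, (preserve the exact same face and identity:1.15), static camera",
"17" :"same person slowly forms a tiny natural smile, (preserve the exact same face and identity:1.15), static camera"
```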
Why it may help:
more explicit temporal control
frame/segment-based prompt changes
better than hoping a single prompt follows order
Why it is not the first recommendation here:
not guaranteed to fit the current RapidBase GGUF workflow
can change conditioning behavior
may break the current source-fidelity look
adds complexity
12. Recommended method ranking
| Rank | Method | A → B reliability | Source fidelity | 8GB friendliness | Recommendation |
|---|---|---|---|---|---|
| 1 | One-clip two-beat prompt | Medium | High | High | Try first |
| 2 | Two clips + handoff frame | High | Medium-high | High | Best practical method |
| 3 | Two clips + neutral crossfade | Medium-high | Medium-high | High | Good polish |
| 4 | FLF2V bridge | High for transition | Medium | Medium-low | Separate experiment |
| 5 | Prompt Relay | High conceptually | Unknown | Unknown | Advanced experiment |
| 6 | Prompt Schedule / FizzNodes | Medium-high conceptually | Unknown | Medium | Experimental |
Best practical rule:
simple A -> B:
use one-clip two-beat prompt
strict A -> B:
use two clips with a handoff frame
smooth transition:
use handoff frame + optional crossfade
true timeline control:
test Prompt Relay or Prompt Schedule only in a duplicate workflow
13. Prompt weights: do they work?
Yes, ComfyUI prompt weights can work.
Common syntax:
(phrase:1.2)
Plain parentheses also increase weight. ComfyUI’s CLIPTextEncode documentation says plain parentheses apply a default weight of 1.1, and the ComfyUI Community Manual says nested weights multiply.
Examples:
(phrase)
roughly increases emphasis
(phrase:1.2)
explicit weight
((phrase:1.2):0.5)
nested weights multiply
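The multiplication rule for nested weights can be illustrated numerically. This is a sketch of the arithmetic only, not of ComfyUI's actual prompt parser.

```python
def effective_weight(*weights: float) -> float:
    """Nested ComfyUI-style weights multiply:
    ((phrase:1.2):0.5) -> 1.2 * 0.5."""
    result = 1.0
    for w in weights:
        result *= w
    return result

print(effective_weight(1.2, 0.5))  # 0.6  -> ((phrase:1.2):0.5)
print(effective_weight(1.1, 1.1))  # ((phrase)) with default 1.1 twice ≈ 1.21
```

This is why stacking parentheses escalates quickly: three plain parentheses at the default 1.1 already compound to about 1.33.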
References:
- ComfyUI CLIPTextEncode documentation
- ComfyUI Community Manual: Text Prompts
- ComfyUI Wiki: prompt weighting
- ComfyUI Wiki: basic prompt syntax
14. Is (((weighted prompts:1.9))) useful here?
Probably not. For this workflow, it is more likely to hurt than help.
Avoid:
(((turns head and smiles:1.9)))
Avoid:
(((first blinks then smiles:1.9)))
Avoid:
(((action A then action B:1.9)))
Why? Because a huge action weight tells the model:
This action matters more than preserving the source image.
That can cause:
face drift
changed facial geometry
changed skin texture
changed lighting
hallucinated details
mouth/teeth weirdness
background changes
overcooked motion
loss of source fidelity
The current RapidBase workflow works because it is conservative. Heavy action weights fight that.
15. Better prompt-weight strategy
Do not heavily weight the action. Lightly weight preservation.
Better:
The same person first blinks once, then after a brief pause makes a tiny natural smile. (Preserve the exact same face and identity:1.15). Same hairstyle, clothing, lighting, colors, camera angle, and background. (Static camera:1.10). No zoom. No scene change.
Risky:
The same person (((first blinks then smiles:1.9))). Same face and background.
The first prompt says:
identity and camera are important
action is small
do not repaint
The second prompt says:
force this action even if the model has to invent
For face permanence, that is the wrong priority.
16. Suggested weight ranges
| Weight | Use |
|---|---|
| 1.00 | normal baseline |
| 1.05 | tiny emphasis |
| 1.10 | safe emphasis |
| 1.15 | useful emphasis for identity/static camera |
| 1.20 | upper normal test |
| 1.25 | mild stress test |
| 1.35 | risky; use sparingly |
| 1.50+ | likely too strong |
| 1.90 | avoid for source-faithful I2V |
For this workflow, use mostly:
1.05-1.20
Maybe test:
1.25
Avoid:
1.50+
1.90
triple-parentheses action forcing
17. What to weight
Good things to weight
(same face and identity:1.10)
(preserve exact face:1.10)
(preserve source image:1.10)
(static camera:1.10)
(no scene change:1.05)
(same lighting and background:1.10)
(tiny natural smile:1.05)
(one subtle blink:1.05)
Risky things to weight
(turns head:1.4)
(speaks:1.4)
(laughs widely:1.4)
(raises hand:1.4)
(hair blowing:1.4)
(camera zooms in:1.4)
Very risky
(((wide smile:1.9)))
(((speaking:1.9)))
(((turning head:1.9)))
(((complex action sequence:1.9)))
Large facial motion and mouth motion are exactly where face permanence usually breaks.
18. Weighted A → B examples
Safe weighted A → B prompt
The same person from the source image first blinks once, then after a brief pause makes a tiny natural smile. (Preserve the exact same face and identity:1.15). Same hairstyle, clothing, lighting, colors, camera angle, and background. (Static camera:1.10). No zoom. No pan. No scene change.
Slightly stronger action prompt
The same person begins still, then (gently blinks once:1.05), then slowly forms a (tiny natural smile:1.10). Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No scene change.
Face-first prompt
(Preserve the exact same face and identity:1.15). The same person first blinks once, then slowly forms a tiny natural smile. Same hairstyle, clothing, lighting, colors, camera angle, and background. (Static camera:1.10). No zoom. No scene change.
Minimal weighted prompt
(Same face and identity:1.15). First one subtle blink, then a tiny smile. Same lighting and background. (Static camera:1.10).
19. What not to do
Avoid this:
(((The person first blinks, then smiles, then turns their head, then speaks:1.9)))
That stacks three problems:
too many actions
too much weight
weighting the part that causes identity drift
Also avoid:
First she blinks, then smiles, then speaks, then turns her head, while the camera zooms in and the lighting becomes cinematic.
That asks the model to solve:
facial motion
mouth motion
head rotation
camera motion
lighting change
identity preservation
background stability
That is too much for a source-faithful RapidBase clip.
20. Practical A → B workflow
Step 1 — choose the smallest version of the action
Instead of:
turns head and smiles
use:
tiny eye movement and tiny smile
Instead of:
speaks
use:
subtle mouth movement
Instead of:
laughs
use:
tiny natural smile
Step 2 — write a two-beat prompt
The same person first <action A>, then after a brief pause <action B>. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No scene change.
Step 3 — add light preservation weights
(Preserve the exact same face and identity:1.15)
(Static camera:1.10)
Step 4 — batch seeds
8-16 seeds
same prompt
same settings
short preview
Pick by:
1. face permanence
2. correct order
3. natural motion
4. prompt obedience
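Step 4 can be as simple as a fixed seed list, so a good result is always reproducible later. A sketch; how the seeds are fed in depends on your ComfyUI setup (for example, queueing one job per seed with the KSampler seed changed each run).

```python
import random

def seed_batch(n: int = 12, base: int = 20240101) -> list[int]:
    """Deterministic batch of n seeds: rerunning with the same `base`
    yields the same list, so any keeper can be regenerated exactly."""
    rng = random.Random(base)
    return [rng.randrange(2**31) for _ in range(n)]

seeds = seed_batch()
print(seeds[:3])
```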
Step 5 — split into two clips if order fails
If the model keeps blending A and B, use:
Clip 1:
action A only
Handoff:
clean stable frame near the end of Clip 1
Clip 2:
action B only
21. Best practical recommendation
For the current RapidBase workflow:
Use one-clip two-beat prompts first.
Use "first A, then after a brief pause B."
Keep A and B very small.
Batch seeds.
Weight preservation, not action.
Avoid 1.9 weights.
Use handoff-frame two-clip generation when strict order matters.
Only test Prompt Relay / scheduling in a duplicate workflow.
22. Example final prompts
Blink → smile
The same person from the source image first blinks once, then after a brief pause makes a tiny natural smile. (Preserve the exact same face and identity:1.15). Same hairstyle, clothing, lighting, colors, camera angle, and background. (Static camera:1.10). No zoom. No pan. No scene change.
Look down → return gaze
The same person from the source image first looks slightly downward with only a tiny eye movement, then returns the eyes to the camera. (Preserve the exact same face and identity:1.15). Same hairstyle, clothing, lighting, colors, camera angle, and background. (Static camera:1.10). No zoom. No scene change.
Neutral → tiny smile
The video begins with the same person holding a calm neutral expression. Then the person slowly forms a tiny natural smile. (Preserve the exact same face and identity:1.15). Same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No scene change.
Breathing → blink
The same person from the source image keeps the exact same face and identity. The person gently breathes with subtle natural motion, then blinks once after a brief pause. Same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No scene change.
23. Reference links
FLF2V / Wan2.2 workflows
- ComfyUI official Wan2.2 guide
- Comfy Wan2.2 14B FLF2V workflow
- Comfy blog: Wan2.2 FLF2V native support
Prompt weighting
- ComfyUI CLIPTextEncode documentation
- ComfyUI Community Manual: Text Prompts
- ComfyUI Wiki: prompt weighting
- ComfyUI Wiki: basic prompt syntax