Deep Dive: Veo 3 (I spent $400 testing prompts so you don't have to)
Recently I’ve been spending a lot of time with Veo 3, Google’s new video generation model. I’ve even experimented with posting some of the clips and have picked up over 100k views from just 15-20 posts.
It’s an incredible video generation model, but the real breakthrough is that it generates audio to go along with the video. As far as I know, this is the first model to do this (and do it well), which is part of why it’s taken over the video generation meta.
Whether you know it or not, you have probably seen a Veo 3 video in your social media feed, whether it’s the viral “cloud cutting” ASMR, animal olympics, or a fake news clip.




It really is incredible how quickly it’s started to infiltrate social media and how creatively people are using it.
The main limitation on AI video right now is cost.
Using the highest quality Veo 3 model will run about $1 per 8-second video (so experimentation is costly!).
When you add in that you usually have to work through several generations to get the right video this can really add up (which is how I’ve burned through a couple hundred bucks pretty quickly).
Cost is something that I believe will be solved pretty quickly though.
Given how fast image generation improved, I think we’ll have affordable (<$0.25/video) models that can produce Veo 3 level media within 6 months.
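To make the cost math above concrete, here is a rough back-of-the-envelope sketch. The $1 per clip and $0.25 per clip figures come from this post; the number of clips and the "5 generations per usable clip" figure are assumptions for illustration, not measurements.

```python
def estimate_cost(keepers: int, attempts_per_keeper: float,
                  price_per_clip: float = 1.00) -> float:
    """Rough spend in dollars to end up with `keepers` usable clips.

    attempts_per_keeper is a guess at how many generations it takes
    to get one clip worth posting (an assumption, not a measured rate).
    """
    return keepers * attempts_per_keeper * price_per_clip


# 20 posted clips, assuming about 5 generations per usable clip
print(estimate_cost(20, 5))        # 100.0 at roughly $1 per clip
print(estimate_cost(20, 5, 0.25))  # 25.0 if the price drops to $0.25 per clip
```

The point is just that the retry rate multiplies the per-clip price, so anything that cuts wasted generations saves real money.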
Here are a couple examples:
Capybara Olympics
Cloud Cutting ASMR
And people are getting millions of views from posting clips just like this on TikTok, Shorts, etc.
What I have learned so far
There is a lot of noise and different opinions out there about prompting structure and techniques for Veo 3. Some of it works and is helpful, but I’ve found that it really just boils down to a few key items:
Lead your prompt with the “style” of video you want. Say things like:
“create a selfie vlog style video of”
“Hyper realistic live action sports footage”
“4K cutting asmr style video”
Be as detailed as possible
Getting viral results really comes from the surprising details like “crackling electricity when cutting into a cloud” or “the corgi does a perfect varial kickflip and lands it into a rail grind”.
When you don’t provide details, Veo 3 will fill in the blanks and it tends to produce more generic results (which makes sense).
Negative prompting helps
Call out specific details you don’t want like “no subtitles”, “no slow motion”, or “no camera movement”
Character consistency is pretty good if you use the same description
One of the things I was confused about at first is how people got such consistent characters in their Veo 3 clips. What I found is that you will get similar characters if you provide the same description in each prompt.
So, if you say “create a selfie vlog with a realistic bigfoot camping in the woods”, reusing “realistic bigfoot” should get you similar-looking characters in future clips.
I’ve found even small changes in the character description (“realistic bigfoot” vs “realistic bigfoot wearing glasses”) will break this though, so be consistent.
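If you generate a lot of clips, it can help to assemble prompts the same way every time. The sketch below is only a plain string helper following the tips above (style first, then details, then what to avoid, with the character description stored once and reused). The function name, the constant, and the second scene are made up for illustration; nothing here calls Veo 3 itself.

```python
from typing import Sequence

# Keep the character wording in one place and reuse it verbatim.
# Even a small change ("... wearing glasses") can change the character.
BIGFOOT = "a realistic bigfoot"


def build_prompt(style: str, subject: str,
                 details: Sequence[str] = (),
                 avoid: Sequence[str] = ()) -> str:
    """Assemble a prompt: style lead, subject, details, then negatives."""
    sentences = [f"{style} {subject}"]
    sentences.extend(details)
    sentences.extend(f"No {item}" for item in avoid)
    # Normalize so every sentence ends with exactly one period.
    return " ".join(s.strip().rstrip(".") + "." for s in sentences)


camping = build_prompt(
    style="Create a selfie vlog style video of",
    subject=f"{BIGFOOT} camping in the woods",
    avoid=["subtitles", "slow motion"],
)
# Hypothetical second scene that reuses the same character description
fishing = build_prompt(
    style="Create a selfie vlog style video of",
    subject=f"{BIGFOOT} fishing at a lake",
    avoid=["subtitles", "slow motion"],
)
print(camping)
print(fishing)
```

Both prompts share the exact phrase "a realistic bigfoot", which is the part that seems to matter for getting a similar-looking character.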
Examples and Prompts
Prompt: locked/fixed camera shot low street-level camera angle down a narrow Asian alley at dusk. Paper lanterns sway, puddles ripple under steady rain, neon kanji signs flicker and reflect on wet cobblestones. A lone umbrellaed pedestrian crosses frame once, droplets cascading off edges. Style tags: Blade-Runner palette, high-contrast reflections, 24 fps slight motion blur.
Prompt: Highly realistic ultra-8 K live-action POV video of you hiking a narrow alpine trail through a lush, Switzerland-style valley at golden hour.
The camera is head-level and steady; snow-dusted wildflowers frame the path while crystal lakes mirror the sky. Above, entire granite peaks levitate in slow rotation—sheer cliffs floating weightlessly, waterfalls spilling from their bases and vaporizing into drifting mist before they reach the meadow below. Cowbells echo from distant slopes, but no animals are visible—only the gentle clink carried on crisp mountain air.
Each boot step sounds crisp against gravel, and your trekking pole taps the earth with satisfying cadence. Wind tousles the grass, sending shimmering seedheads across the trail; shafts of sunlight break through cotton clouds, catching rainbow spray beneath the hovering waterfalls. No background distractions—just your forearms, the path ahead, monumental floating mountains, and immersive cinematic framing in a serene, surreal alpine world.
Prompt: Highly realistic ultra-8 K ASMR video of a human hand slicing a hyper-detailed, fiercely active galaxy-space cloud on a wooden cutting board. The cloud seethes like a living nebula—deep indigo and magenta swirls expand and contract in slow-motion breaths, cosmic dust plumes drift off the surface, and thin strands of violet plasma lightning arc continuously through star-speckled vapor. The camera is in close-up with shallow depth of field, capturing cinematic lighting and intricate cloud textures. The knife moves smoothly and deliberately, making three separate satisfying cuts – one after another – creating 3 to 4 clean, evenly spaced slices. Each stroke feels precise and crisp, with realistic slicing sound and natural motion. The sliced pieces gently separate, each revealing shimmering constellations and glowing stardust inside the cut faces. No background distractions, same iconic camera angle as professional food ASMR videos. No dialogue, text, or extra props—just galaxy cloud, knife, board, and the static high-end food-prep ASMR framing.
Prompt: highly realistic, 8k live stream olympic sports video, two capybaras boxing, filmed in live boxing style with an announcer, intense action, realistic sound effects.
Prompt: Selfie-style vertical phone footage, POV of an explorer wearing the iconic bright-yellow hazmat suit of Backrooms lore, holding the phone close. Footage starts at the top of a dimly-lit waterslide within the Backrooms; yellow-tiled walls visible behind, fluorescent lights flickering ominously. The vlogger cheerfully waves to the camera, muffled voice enthusiastically says, "Hey guys, finally checking out the Backrooms waterslide—let’s go!" Quickly pushes off down the slide at 2 seconds, accelerating slightly as water rushes beneath. Expression quickly shifts at around 4 seconds to visible anxiety; distorted echoing laughter emerges faintly from deeper within the tunnel behind. By 6 seconds, vlogger glances back nervously, revealing multiple shadowy humanoid silhouettes swiftly entering the slide behind them, reflected dimly in the visor of the hazmat suit. The vlogger begins panicking and blurts out, "Oh god, there's someone behind me—" before footage abruptly glitches into intense VHS static distortion at exactly 8 seconds, cutting off mid-sentence.
Style: Typical "Backrooms tapes" aesthetic; grainy VHS noise, washed-out yellow-teal color grade, subtle chromatic aberration, prominent fluorescent bloom and diffusion effects.
Ambiance: Continuous gentle rushing water sound, buzzing fluorescents, distant distorted laughter echoes becoming gradually louder.
Background Audio: Rapid breathing increasingly panicked through hazmat respirator, rustling hazmat suit fabric, sudden VHS static burst aligned exactly with final scare.
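The Backrooms prompt above follows a pattern you can reuse: an opening description, beats pinned to specific seconds, then labeled sections for style and audio. Here is a minimal sketch of a helper that lays a prompt out that way and checks that every beat fits inside the 8-second clip. The helper name and the shortened beat text are illustrative, and there is no guarantee the model follows the timings exactly.

```python
CLIP_SECONDS = 8  # clip length mentioned in this post


def timed_prompt(opening: str, beats: dict[int, str],
                 sections: dict[str, str]) -> str:
    """Lay out a prompt as: opening, timed beats in order, labeled sections."""
    lines = [opening]
    for second, action in sorted(beats.items()):
        if not 0 <= second <= CLIP_SECONDS:
            raise ValueError(f"beat at {second}s is outside the {CLIP_SECONDS}s clip")
        lines.append(f"At {second} seconds: {action}")
    for label, text in sections.items():
        lines.append(f"{label}: {text}")
    return "\n".join(lines)


prompt = timed_prompt(
    opening="Selfie-style vertical phone footage at the top of a dimly-lit waterslide.",
    beats={
        6: "vlogger glances back nervously",
        2: "vlogger pushes off down the slide",
        8: "footage cuts to VHS static",
    },
    sections={
        "Style": "grainy VHS noise, washed-out yellow-teal color grade",
        "Ambiance": "rushing water, buzzing fluorescents",
    },
)
print(prompt)
```

Beats are sorted by time, so you can write them in any order while drafting.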
Conclusion
So as you can see from the examples, there are many different ways to prompt, and ultimately there is quite a bit of randomness involved in the generation. Sometimes I try a prompt and it gives a terrible result, and then I try it again and it’s perfect.
Because Veo 3 is so expensive right now, it helps to go in with some knowledge of how to prompt it so you waste fewer credits, but I wouldn’t overthink it either:
lead with the style
be specific
leverage consistent characters
call out what you don’t want
Hope this was helpful!