V3 Instant Avatars are created from a short user-recorded or uploaded video and preserve the original background and natural movement.
They deliver perfectly lip-synced narration using a cloned voice, synthetic voice, or recorded audio, enabling rapid production of authentic, professional-looking video content at scale.
Generated in Full HD (1080p) with visible body and hand movements, these avatars are ideal for fast offline video creation, but are not supported for streaming or use with D-ID Agents.
How to create a V3 Instant Avatar
1. Log in to the D-ID Creative Studio and go to the Avatars section in the side menu.
2. Click "Create Avatar", then select the "Create with a video" option.
3. Submit your footage: a 1–2 minute video of yourself speaking naturally. You can upload a pre-recorded clip or use your webcam to record live.
- The system will learn your body and head movements from the video.
- If you read a script aloud, the system can also clone your voice.
Closely follow the video shooting guidelines:
- Keep your face visible at all times
- Record in a quiet, well-lit environment
- Look directly at the camera
- Pause between sentences with your mouth closed
4. Read the Consent Statement provided in the flow, including the three randomly generated words at the end of the script.
5. Finalize and submit your footage.
Your avatar will begin processing. On average, 1 minute of video takes 4–10 minutes to generate.
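As a rough planning aid, the average quoted above can be turned into a quick estimate. The sketch below is illustrative only: the function name `estimate_processing_minutes` is hypothetical, and it assumes processing time scales linearly with video length, which the 4–10 minute average suggests but does not guarantee. Actual times may vary.

```python
def estimate_processing_minutes(video_minutes: float) -> tuple[float, float]:
    """Return a (low, high) processing-time estimate in minutes.

    Assumption: processing scales linearly at 4-10 minutes per minute
    of footage, based on the average figure quoted in this article.
    """
    if video_minutes <= 0:
        raise ValueError("video length must be positive")
    low_rate, high_rate = 4, 10  # minutes of processing per minute of video
    return video_minutes * low_rate, video_minutes * high_rate


# Example: a 1.5-minute clip, within the recommended 1-2 minute range.
low, high = estimate_processing_minutes(1.5)
print(f"Estimated processing time: {low:.0f} to {high:.0f} minutes")
```

For the recommended 1–2 minute footage, this works out to somewhere between about 4 and 20 minutes in total.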
Advantages of V3 Instant Avatars
- Realistic, full-body avatars created quickly
- Adaptive model trained on many avatars and fine-tuned to match your look
- Requires only short video input; no special equipment needed
Common reasons for failed generation
- Consent not validated (e.g., speech recording doesn't match the provided script or the face is not verified)
- Celebrity content triggers moderation
- Face not detected in footage