Interactive Avatar 101: Your Ultimate Guide
Last updated August 8, 2024
Get ready to chat! We recently launched HeyGen Interactive Avatars. These Avatars are optimized for real-time interaction and speech, making them the perfect virtual support agent, salesperson, translator, expert, or any number of other roles for your business.
Think of a HeyGen Interactive Avatar as a digital worker, with a realistic and engaging human presence, that you can put to any task you can dream up. Intrigued? Let's dive in!
What is an Interactive Avatar?
HeyGen’s Interactive Avatar API lets you start ‘streaming’ video sessions in which a HeyGen Avatar speaks text on command, in real time and with low latency.
It’s like a FaceTime call with an Avatar!
The most common use case for HeyGen’s Interactive Avatar is to connect it, via code, to a Large Language Model like ChatGPT. For example, you could initiate an Interactive Avatar video session, and then send it text from ChatGPT for the Avatar to say. In this way, the Interactive Avatar acts as the ‘body’ of the Large Language Model, while the ‘brain’ and source of the interaction remains ChatGPT.
A developer can recreate our interactive demo experience with relatively little effort, adding custom branding and interaction patterns based on your specific business needs.
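The 'brain and body' split described above can be sketched in a few lines of TypeScript. This is a minimal illustration and not HeyGen's SDK: `AvatarSession`, `AskLlm`, and `relayTurn` are hypothetical names. In a real integration they would wrap your LLM client and the Streaming API calls described in HeyGen's documentation.

```typescript
// Hypothetical shape of an open streaming session; not HeyGen's actual SDK types.
interface AvatarSession {
  speak(text: string): Promise<void>; // ask the Avatar to say this text
}

// Hypothetical LLM call: takes the user's message, returns the reply text.
type AskLlm = (userMessage: string) => Promise<string>;

// One conversational turn: the LLM (the "brain") writes the reply,
// the Avatar (the "body") says it.
async function relayTurn(
  userMessage: string,
  askLlm: AskLlm,
  session: AvatarSession,
): Promise<string> {
  const reply = (await askLlm(userMessage)).trim();
  if (reply.length > 0) {
    await session.speak(reply); // empty replies are skipped so the Avatar stays idle
  }
  return reply;
}
```

Because both the LLM call and the session are passed in, the same function works with ChatGPT or any other model, which matches the point that the text can come from anywhere.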
Interactive Avatar in Action: Real-Life Examples
You can check out our demo to see what an Interactive Avatar interaction is like:
labs.heygen.com/interactive-avatar
You can also see how Reply.io added an Interactive Avatar to their Chatbot solution by visiting their landing page and clicking the ‘Chat’ widget in the bottom right-hand corner.
How to create an Interactive Avatar
To get started, find the Labs tab on the bottom left side of the screen and choose 'Interactive Avatar'.
Then click 'Create an Interactive Avatar' and follow the instructions. If you'd like to use one of our Public Avatars instead, several are available to you free of charge!
Processing Time for Interactive Avatars
Please note that your Interactive Avatar will not be ready immediately. Below are estimated processing times per tier.
Free Users: 4 to 7 days
Creator Tier: 3 to 5 days
Team Tier: 2 to 3 days
Enterprise: 24 hours
Recording footage best practices
Total Duration: 2 minutes, in 3 parts
1.) Listening (15 seconds)
- Purpose: To show active engagement.
- Engage with facial expressions (e.g., raise eyebrows, smile) to demonstrate you are attentively listening.
2.) Talking (90 seconds)
- Purpose: To convey your message.
- Speak clearly and confidently, ensuring your message is concise and to the point.
3.) Idling (15 seconds)
- Purpose: To show attentive presence.
- Maintain a neutral expression and simply nod occasionally, without additional facial expressions.
Key Differences:
- Listening: Show active engagement through facial expressions, as if you are listening attentively.
- Idling: Keep a neutral demeanor and only nod, demonstrating attentive presence without extra expressions.
Important:
Try to keep the same body-position for the whole 120 seconds of the input video!
How much does it cost to make an Interactive Avatar?
Each Interactive Avatar costs $49 per month.
How much does it cost to use my custom Interactive Avatar?
You can use your Interactive Avatar in multiple ways. We include a low-code 'Embed' snippet that you can use to add a chat experience with your Interactive Avatar to your webpage.
Alternatively, for technical users, you can deploy your Interactive Avatar in different mediums using HeyGen's Streaming API.
You can test the Streaming API for free by using your Trial Token to create sessions. Sessions are unique instances of the Interactive Avatar being displayed in a window on a website or app. There is no cost to creating sessions or using the Streaming API when the sessions are created with your Trial Token.
If you are ready to move beyond testing, and plan to integrate the Interactive Avatar on your website or in your product, you will need to purchase Streaming Credits. Streaming Credits cost $0.10 for every minute of the Interactive Avatar speaking, and when you complete your purchase of Streaming Credits, you receive access to an Enterprise API Token in your Space Settings. When you use the Enterprise API Token in concert with the Streaming API, you can create up to 100 concurrent sessions, with unlimited monthly usage and session length. If you anticipate needing more than that, please get in touch with our Sales team.
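As a rough budgeting aid, the two prices quoted above can be combined into a monthly estimate. This is a sketch under stated assumptions: it assumes the $49 monthly fee applies per custom Interactive Avatar, that the two charges simply add up, and that partial speaking minutes are billed pro rata. None of these billing details are confirmed here, and `estimateMonthlyCostUsd` is a hypothetical helper name.

```typescript
// Figures taken from the pricing described above.
const AVATAR_MONTHLY_FEE_CENTS = 4900; // $49 per custom Interactive Avatar per month
const STREAMING_CENTS_PER_MINUTE = 10; // $0.10 per minute of the Avatar speaking

// Estimate one month's bill in US dollars.
// Assumption: partial speaking minutes are billed pro rata; check your invoice.
function estimateMonthlyCostUsd(customAvatars: number, speakingMinutes: number): number {
  if (customAvatars < 0 || speakingMinutes < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  // Work in cents to avoid floating-point drift on values like 0.10.
  const cents =
    customAvatars * AVATAR_MONTHLY_FEE_CENTS +
    speakingMinutes * STREAMING_CENTS_PER_MINUTE;
  return Math.round(cents) / 100;
}

// One custom Avatar that speaks for 1,500 minutes in a month:
console.log(estimateMonthlyCostUsd(1, 1500)); // 199
```

Pass `0` for `customAvatars` if you only use Public Avatars, since those are described as free of charge.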
Technical integration of an Interactive Avatar
We have a starter project on GitHub that developers can use to get started with Interactive Avatars from scratch, as well as an NPM Package that can easily add Interactive Avatar functionality to existing web apps:
Interactive Avatar Starter Project on GitHub
Interactive Avatar FAQ
I want to add this to my site, but I can’t code. What do I do?
The Interactive Avatar API comes with resources and documentation that will enable most web developers to add Interactive Avatar functionality to your website or product. Currently, doing so requires some programming: there is no easy way to copy and paste our Interactive Demo onto your site with your own customizations.
I want my users to be able to speak with the Interactive Avatar out loud. How do I do that?
- Currently HeyGen does not provide this feature; you can use a library such as OpenAI’s Whisper to convert an End User’s audio to text and send that text to your LLM.
- We have added a Whisper ASR implementation to our GitHub demo project, but an OpenAI API Key is required to enable it. Please see the GitHub demo project Readme for instructions on where to put your OpenAI API Key to enable both Whisper functionality and Chat Completions API functionality.
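The flow described above (speech to text, text to the LLM, reply to the Avatar) can be sketched as a three-stage pipeline. This is an illustrative outline only and is not taken from the GitHub demo project: `VoicePipeline` and `handleSpokenTurn` are hypothetical names, and each stage is passed in as a function so that no particular Whisper, LLM, or Streaming API signature is assumed.

```typescript
// All three stages are injected so the sketch stays independent of any vendor SDK.
interface VoicePipeline {
  transcribe(audio: Uint8Array): Promise<string>; // e.g. a Whisper call
  askLlm(userText: string): Promise<string>; // e.g. a chat completion call
  speak(text: string): Promise<void>; // e.g. a Streaming API call
}

// Handle one spoken turn: audio in, Avatar speech out.
// Returns the transcript and reply so the UI can show captions,
// or null when nothing intelligible was heard.
async function handleSpokenTurn(
  audio: Uint8Array,
  pipeline: VoicePipeline,
): Promise<{ transcript: string; reply: string } | null> {
  const transcript = (await pipeline.transcribe(audio)).trim();
  if (transcript === "") {
    return null; // silence or noise: nothing to send to the LLM
  }
  const reply = await pipeline.askLlm(transcript);
  await pipeline.speak(reply);
  return { transcript, reply };
}
```

Returning early on an empty transcript keeps the Avatar from responding to background noise, which is one reasonable design choice among several.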
Can the Interactive Avatar speak any language?
There are two answers to this question:
- Firstly, remember that your user isn’t speaking with our Interactive Avatar; the user is speaking with ChatGPT or whichever LLM (Large Language Model) you have connected to our Interactive Avatar. So if that LLM can converse in multiple languages, the Interactive Avatar will speak those languages too.
- The second answer is that the accent of the Interactive Avatar is actually controlled by the Voice that you assign it. The different Voice IDs on HeyGen come from different Text-to-speech providers, including ElevenLabs, OpenAI, Azure, and Google, and the accent and sound of the voices will differ between those providers.
- In keeping with the aforementioned analogy, the Interactive Avatar is only a ‘body’, and you use the Streaming API to instruct the Avatar what to say. Because this text can come from anywhere, the developer needs to write code to send the text from your GPT or LLM to the HeyGen Streaming API.
Can I use a Photo Avatar as an Interactive Avatar?
Only Instant Avatars and Studio Avatars are supported. Photo Avatars are not supported.
Which voice can I use with the Streaming API?
- You can use different Voice IDs with the Interactive Avatars. To browse through the available Voice IDs, you can visit the AI Voices section on the HeyGen Homepage.
- The developer can use the voices provided by HeyGen Public Voices, Private Voices that are created when they create avatars, or Voices brought from third-party sites such as ElevenLabs.
My Interactive Avatar makes weird motions. How do I improve it?
- When filming footage to turn into an Interactive Avatar, specifically, it is recommended to:
  - Look directly at the camera
  - Restrict movement of hands, arms, and head to ‘micro movements’
  - Have a pleasant, relaxed expression
- For additional Avatar creation tips please visit our how to create your first avatar page.
I want to redo my Interactive Avatar
We currently offer one remake of an Interactive Avatar if you are not satisfied with the end result. This is the only remake that we offer free of charge. Please contact support@heygen.com for further information.