Question by Tails1942 · Jul 04, 2011 at 03:22 PM · automatic · workflow · lipsync

Any way of "automatic" lip syncing?

Hello fellow Unity users

My brother and I are in the process of making a game that will contain a LOT of recorded dialogue, so we want to be able to lip sync without having to do every single line by hand.

We don't care if it's top-of-the-line lip syncing; it just has to move the character's mouth while it's talking and stop when finished. Think of the old PlayStation era, when there was no lip sync, only mouth movement.

Is there any add-on for this, maybe in the Asset Store, and where can I find it? Or is there an easy way to do it ourselves?

And do keep in mind that we have limited money, so we probably won't have hundreds of dollars to spare.


Is there at least a way to make a character open his mouth more or less depending on the volume of the sound coming out? I'd imagine this is a fairly easy script to make?

LegionIsTaken · Jul 04, 2011 at 04:32 PM

That sounds awesome!

9 Replies

Best Answer

Answer by aldonaletto · Jul 05, 2011 at 03:39 AM

This script uses audio.GetSpectrumData to analyze the audio data and calculate the instantaneous volume of a given range of frequencies. To use GetSpectrumData, we must supply a power-of-two-sized float array as the first argument, which the function fills with the spectrum of the sound currently playing. Each element in this array contains the instantaneous volume (0..1) of its corresponding frequency, calculated as N * 24000Hz / arraySize, where N is the element index.
The function BandVol(fLow, fHigh) below calculates the averaged volume of all frequencies between fLow and fHigh. In this case, where voice sounds must be analyzed, we can set the range to 200Hz-800Hz; it produces good results, although other ranges can be tested as well (voice sounds range from about 150Hz to 3kHz). If bass sounds were to be analyzed instead, for instance, we should use a lower range like 50Hz to 250Hz.
To test it, I used a simple object (assigned to mouth) which has its Y position elevated proportionally to the output of BandVol. A variable called volume sets how much the mouth rises. You can change this and use the value returned by BandVol to control the mouth's vertical scale instead, for instance.
This script must be added to the object which contains the AudioSource, and another object must be assigned to the mouth variable. It plays the audio clip defined in the AudioSource and moves the mouth up and down following the sound played. In order to reproduce several different sounds, you can use PlayOneShot(audioClip) instead of Play().
EDITED: PlayOneShot doesn't affect GetSpectrumData, as @FutureRobot observed in his answer below. To play different sounds, declare an AudioClip array and populate it with the clips in the Inspector. To play one of these clips, assign it to audio.clip and use the good old Play() (array and function PlaySoundN included below):

 var sounds: AudioClip[]; // set the array size and the sounds in the Inspector    
 private var freqData: float[];
 private var nSamples: int = 256;
 private var fMax = 24000;
 private var audio: AudioSource; // AudioSource attached to this object


 function BandVol(fLow:float, fHigh:float): float {

     fLow = Mathf.Clamp(fLow, 20, fMax); // limit low...
     fHigh = Mathf.Clamp(fHigh, fLow, fMax); // and high frequencies
     // get spectrum: freqData[n] = vol of frequency n * fMax / nSamples
     audio.GetSpectrumData(freqData, 0, FFTWindow.BlackmanHarris); 
     var n1: int = Mathf.Floor(fLow * nSamples / fMax);
     var n2: int = Mathf.Floor(fHigh * nSamples / fMax);
     var sum: float = 0;
     // average the volumes of frequencies fLow to fHigh
     for (var i=n1; i<=n2; i++){
         sum += freqData[i];
     }
     return sum / (n2 - n1 + 1);
 }
  
 var mouth: GameObject;
 var volume = 40;
 var frqLow = 200;
 var frqHigh = 800;
 private var y0: float;
 
 function Start() {
 
     audio = GetComponent.<AudioSource>(); // get AudioSource component
     y0 = mouth.transform.position.y;
     freqData = new float[nSamples];
     audio.Play();
 }
 
 function Update() {
 
     mouth.transform.position.y = y0 + BandVol(frqLow,frqHigh) * volume;
 }
 
 // A function to play sound N:
 function PlaySoundN(N: int){
 
     audio.clip = sounds[N];
     audio.Play();
 }
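The bin arithmetic in BandVol is easy to check outside Unity. Here is a minimal Python sketch (the helper names are mine, not part of the answer) reproducing the index calculation and the band average, with nSamples = 256 and fMax = 24000 as in the script above:

```python
def band_indices(f_low, f_high, n_samples=256, f_max=24000):
    """Map a frequency range to spectrum-array indices, as BandVol does."""
    f_low = max(20, min(f_low, f_max))        # limit low...
    f_high = max(f_low, min(f_high, f_max))   # and high frequencies
    n1 = int(f_low * n_samples / f_max)       # Mathf.Floor equivalent
    n2 = int(f_high * n_samples / f_max)
    return n1, n2

def band_vol(spectrum, f_low, f_high, f_max=24000):
    """Average the bin volumes between f_low and f_high."""
    n1, n2 = band_indices(f_low, f_high, len(spectrum), f_max)
    return sum(spectrum[n1:n2 + 1]) / (n2 - n1 + 1)

# Each bin spans f_max / n_samples = 93.75 Hz, so the 200-800 Hz
# voice band covers bins 2 through 8.
print(band_indices(200, 800))  # (2, 8)
```

With only 256 samples the band is quite coarse (seven bins for the whole voice range), which is part of why this approach gives a rough envelope rather than real phoneme detection.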







aldonaletto · Jul 05, 2011 at 04:29 AM

I've made a few more tests and got better results by filtering the output of BandVol. I also inverted the movement: the mouth moves down instead of up, like a human chin. The code below includes a moving-average filter and the modified Update. If you want to test it, just delete the old Update and paste this code at the end of the script.

 // moving average filter to smooth mouth movement
 
 private var sizeFilter: int = 5;
 private var filter: float[];
 private var filterSum: float;
 private var posFilter: int = 0;
 private var qSamples: int = 0;
 
 function MovingAverage(sample: float): float {
 
     if (qSamples==0) filter = new float[sizeFilter];
     filterSum += sample - filter[posFilter];
     filter[posFilter++] = sample;
     if (posFilter > qSamples) qSamples = posFilter;
     posFilter = posFilter % sizeFilter;
     return filterSum / qSamples;
 }
 
 function Update() {
 
     mouth.transform.position.y = y0 - MovingAverage(BandVol(frqLow,frqHigh)) * volume;
 }
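As a side note, the circular-buffer filter above can be exercised outside Unity; this is a plain Python transliteration (class name is mine) of the same logic, useful for checking that the running sum and the warm-up behavior are right:

```python
class MovingAverage:
    """Circular-buffer moving average, mirroring the UnityScript above."""
    def __init__(self, size=5):
        self.filter = [0.0] * size
        self.filter_sum = 0.0
        self.pos = 0
        self.q_samples = 0  # samples seen so far, capped at size

    def sample(self, x):
        # Replace the oldest value in the running sum with the new one.
        self.filter_sum += x - self.filter[self.pos]
        self.filter[self.pos] = x
        self.pos += 1
        if self.pos > self.q_samples:
            self.q_samples = self.pos
        self.pos %= len(self.filter)
        return self.filter_sum / self.q_samples

avg = MovingAverage(5)
print([avg.sample(x) for x in (1.0, 3.0, 5.0, 7.0, 9.0, 11.0)])
# [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
```

Note that during warm-up the average is taken over only the samples received so far, so the first few frames are not biased toward zero.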
Chris D · Jul 05, 2011 at 04:32 AM

Sweet! Nice work @aldonaletto!

aldonaletto · Aug 21, 2011 at 12:39 PM

After some tests in another case, I concluded that Unity samples sounds at 48000Hz, not 44100Hz as assumed previously in this answer. The only thing altered in the script is the initialization value of fMax: it changed to 24000 (was 22050). I fixed the answer text too: the frequency associated with element[N] is calculated by N*24000/arraySize (instead of N*22050/arraySize).
The answer has already been edited; this comment is for people who used the script with the wrong fMax value. The frequency error with the wrong value is about 9%, which makes no noticeable difference in this particular application (but may produce unacceptable errors in other, more critical cases).

wakeupscreaming · Jul 19, 2012 at 11:45 PM

Hi Aldonaletto,

I'm trying to figure out this script, even just to get a cube moving up or down. The function BandVol has "for (var i=n1; i]]] " at the end, which appears to be an error. Was there something else that worked?

aldonaletto · Jan 26, 2013 at 03:39 PM

The maximum frequency we can digitize is 50% of the sampling frequency (the Nyquist limit): 48000Hz is the sampling frequency, thus the limit is 24000Hz.


Answer by Chris D · Jul 04, 2011 at 03:59 PM

Here's what I've found:

  1. Forum topic (discusses some alternatives)

  2. Script on the wiki (if you model the individual sounds as separate meshes, it looks like this allows you to smoothly transition from one state to the other)

If you don't particularly care about the accuracy of the animations, just rig your characters' mouths and play an animation any time they're supposed to speak.

Alternatively, bones are just transforms, so you should be able to reference them like any other transform. Just have your script adjust the magnitude of the movement based on the audio clip's volume. (See Al's note below.)

aldonaletto · Jul 04, 2011 at 06:32 PM

@ChrisD, the last suggestion will not work: audio.volume is the volume you set, not the clip's instantaneous volume. The instantaneous volume could be obtained from audio.GetSpectrumData - IF we knew how to use it, of course, which we don't, because the docs are too vague about this function...

Chris D · Jul 04, 2011 at 08:14 PM

Ah, good call, thanks Al. I did find this forum thread mentioning GetOutputData, though. Maybe it'd be worth fiddling (blindly experimenting) with that?

aldonaletto · Jul 04, 2011 at 08:36 PM

Good suggestion. I'll try their script. With a little (or a lot of) luck, maybe I can get something to work.

Tails1942 · Jul 04, 2011 at 08:40 PM

If you get anything to work, please share! Even though my brother and I know some programming, we're still kind of new to the whole Unity scene.

aldonaletto · Jul 05, 2011 at 03:49 AM

@ChrisD, thanks for the hints. I read the forum and also found an interesting tutorial at http://unity3dtutorial.com/unity-3d-tutorials/audio-based-procedural-levelgeneration-manipulation-in-unity-3/ which showed how to use GetSpectrumData. I've posted an answer which does a reasonable job of controlling the mouth based on the audio played.


Answer by testure · Jul 04, 2011 at 06:25 PM

If I were doing lip syncs, I would probably rig up some standard phoneme mouth shapes/blends (M-E-O-W) and write some sort of custom blending system that would interpret a Magpie (or similar) script and blend to the appropriate mouth shape.

Magpie, and other software like it, will take recorded dialogue and attempt to generate a 'timing script' based on the phonemes it detects. In my experience it usually needed very little cleanup work; after that you have a pretty usable 'script' from which you can get all the timing info about your dialogue. From there you could easily write something that parses the lip-sync timing and, when a particular phoneme is detected, just blends to that shape.

This is all just theory - I haven't implemented it before, but that's probably how I would do it if accuracy were any concern.
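The parsing side of this idea is small. As a rough illustration (the timing format below is invented for the example; real Magpie exports look different), looking up the active phoneme at a given playback time could be sketched like this:

```python
def parse_timing(text):
    """Parse lines like '0.00 M' into (time, phoneme) pairs.
    This format is hypothetical, for illustration only."""
    events = []
    for line in text.strip().splitlines():
        t, phoneme = line.split()
        events.append((float(t), phoneme))
    return events

def phoneme_at(events, t):
    """Return the phoneme active at time t (last event starting at or before t)."""
    current = events[0][1]
    for start, phoneme in events:
        if start <= t:
            current = phoneme
        else:
            break
    return current

script = "0.00 M\n0.15 E\n0.40 O\n0.70 W"
events = parse_timing(script)
print(phoneme_at(events, 0.5))  # O
```

In-engine, you would call the equivalent of phoneme_at each frame with the AudioSource's playback time and blend toward the matching mouth shape.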


Answer by Futurerobot · Aug 02, 2011 at 03:45 PM

Great stuff aldonaletto, I was looking for exactly this! However, the BandVol function doesn't seem to respond to audio played with PlayOneShot, only to clips directly assigned to the audio component. Any idea how to get around that?

In the spirit of sharing, I managed to get OK-looking results by taking the value returned by your script and hooking it up to an additive animation.

I basically made a mouth animation which starts closed, opens to an O-like shape, then goes on to a wider shout-style pose. This animation is enabled from the start with maximum weight and zero playback speed. Since it's additive, it won't affect any objects at frame 0. I then used your BandVol function to control the normalized time of the additive animation. Since the additive animation has nonlinear movement and some variation in it, it gave a more organic result than if I were to rotate the jaw or fade a pose in and out by controlling its weight.

I also used a cutoff value that makes the character close his mouth at low values, encouraging a more "talky" motion as opposed to the half-open vibrating pose that can happen at lower volumes. And finally a Lerp so I could tweak how smooth the mouth movements should be. In the end it worked well for my cartoony flappy-mouth character.

The extra variables used:

 private float mouthCurrentPose;
 private float mouthTargetPose;
 public float voiceVolumeCutoff;
 public float mouthBlendSpeed;

The setup of the additive animation from Start()

 animation["anim_talk"].layer = 5;
 animation["anim_talk"].blendMode = AnimationBlendMode.Additive;
 animation["anim_talk"].speed = 0.0f;
 animation["anim_talk"].weight = 1.0f;
 animation["anim_talk"].enabled = true;
 animation["anim_talk"].wrapMode = WrapMode.ClampForever;

and the function running the mouth pose:

 void LipSynch()
 {
     mouthTargetPose = BandVol(frqLow,frqHigh)* volume;
 
 // Tweak the voiceVolumeCutoff to get a good result, I used 0.1f myself
     if(mouthTargetPose<voiceVolumeCutoff)
         mouthTargetPose = 0.0f;
     
     mouthCurrentPose = Mathf.Lerp(mouthCurrentPose,mouthTargetPose,Time.deltaTime*mouthBlendSpeed);
 
 // I didn't bother clamping the result since the additive animation clamps itself.
 // Tweak the volume value to get results between 0.0 and 1.0 from your voice samples.
     animation["anim_talk"].normalizedTime = mouthCurrentPose;
 }


You can get better results if you tweak volume and voiceVolumeCutoff to match each voice clip.
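The cutoff-and-Lerp shaping described in this answer is framework-independent, so it can be sketched outside Unity. The Python below (function names and default constants are illustrative, not from the answer's project) shows how low volumes snap the target pose to zero while the current pose eases toward the target:

```python
def lerp(a, b, t):
    """Unity-style Mathf.Lerp with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t

def step_mouth(current, band_vol, volume=40.0,
               cutoff=0.1, blend_speed=8.0, dt=1.0 / 60.0):
    """One frame of the mouth-pose update: scale, gate, smooth."""
    target = band_vol * volume
    if target < cutoff:          # gate: close the mouth at low volumes
        target = 0.0
    return lerp(current, target, dt * blend_speed)

pose = 0.0
for _ in range(60):              # one second of steady, loud input
    pose = step_mouth(pose, band_vol=0.02)
print(round(pose, 3))  # 0.8
```

The blend_speed parameter trades responsiveness against smoothness: higher values track the volume envelope more tightly but bring back the jittery half-open pose the cutoff is meant to avoid.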

aldonaletto · Aug 02, 2011 at 09:15 PM

Good job! Matching animations to the sound spectrum is the best way to achieve more realistic results. Congratulations! If possible, post a video on YouTube and put a link here to show the results.
About PlayOneShot: it really doesn't affect the audio source the way an assigned clip does; other audio source features don't work either, like isPlaying.
But it's easy to work around this: you can make an array, assign the audio clips to it in the Inspector (or read them from the Resources folder) and assign the desired clip to the audio source before playing it with Play() - kind of:

 var sounds: AudioClip[]; // set the array size and clips in the Inspector
 ...
 // when you want to play sound n:
 audio.clip = sounds[n];
 audio.Play();

This works fine even if you change the clip before the previous one has finished: the old clip just stops and the new one starts playing. Since isPlaying works as well, you can use it to know when the clip has ended.

khamael-2 · Jan 03, 2012 at 03:18 PM

Hi guys,

I've managed to use this kind of solution to get lip sync in my Unity game. But I have a little problem. Sometimes, when my character is shouting for instance, the animation keeps on playing and the mouth goes crazy! Check the image. My guess is that somehow the frame delta keeps being added? I tried to truncate it, and to disable the anim when mouthCurrentPose reaches 1, but without success. Any ideas?

thanks!!

aldonaletto · Jan 03, 2012 at 03:37 PM

Dear God! The guy in the picture looks like a victim from The Ring! I'm completely clueless about animations, but I would try clamping the value assigned to normalizedTime to 0..1:

 animation["anim_talk"].normalizedTime = Mathf.Clamp(mouthCurrentPose, 0, 1);
khamael-2 · Jan 03, 2012 at 03:48 PM

Yeah.. I thought of clamping, but it doesn't work. My code is like this:

 function LipSync() {
     mouthTargetPose = MovingAverage(BandVol(frqLow,frqHigh) * volume);
     // Tweak the voiceVolumeCutoff to get a good result, I used 0.1f myself
     if (mouthTargetPose < voiceVolumeCutoff)
         mouthTargetPose = 0.0f;

     //mouthCurrentPose = Mathf.Lerp(0, mouthTargetPose, Time.deltaTime*mouthBlendSpeed);
     mouthCurrentPose = Mathf.Clamp(mouthTargetPose, 0, 1);

     sphere.transform.position.z = mouthCurrentPose;
     mouth.normalizedTime = mouthCurrentPose;
 }

The sphere is just there so I can see it bouncing around, for 'debug', and mouth is the animation clip animation["anim_talk"]. It has about 15 frames: mouth closed, mouth open, and mouth closed again. My guess is that when the sound is too loud, it passes and stays at the clamped value, and somehow the delta keeps being added?.. Weird, no? Although useful for other torture anims ;)

khamael-2 · Jan 05, 2012 at 02:26 PM

Hi there, another addition to my problem. I thought another way to do some facial animation would be to apply some transforms to the bones - simple stuff like moving the jaw. So I wrote a little script where I open or close the jaw (just by changing its transform.position.y on keypress), and it works fine. BTW, I store the jaw's y position in Update and add the delta in LateUpdate - not sure if it's the proper way to do it, but it seems to work. Now the problem is when I have another animation running. When I trigger an animation that moves the head and then trigger the move-jaw script, say with jaw.position.y = oldy - 0.001, the jaw keeps moving down until the end of the animation clip. What's happening here? It's probably the same cause as with the additive animation! Any ideas?


Answer by Hannibalov · Jun 27, 2012 at 11:32 AM

Hi,

I'm trying the answers mentioned here directly on microphone input instead of a loaded AudioClip, and it doesn't work :( Is there any reason for that?

I'm using these functions (adapted from the ones mentioned here and in another post):

 private float GetVolume()
 {
     if(audio==null)
         return 0;
     float[] data = new float[samples];
     audio.GetOutputData(data, 0);
     
     //take the median of the recorded samples
     ArrayList s = new ArrayList();
     foreach (float f in data)
     {
         s.Add(Mathf.Abs(f));
     }
     s.Sort();
     return (float)s[samples / 2];
 }
 
 float fMax = 24000;
 private float HumanFreq(float fLow, float fHigh)
 {
     if(audio==null)
         return 0;
     float[] data = new float[samples];
     fLow = Mathf.Clamp(fLow, 20, fMax); // limit low...
     fHigh = Mathf.Clamp(fHigh, fLow, fMax); // and high frequencies
     // get spectrum: freqData[n] = vol of frequency n * fMax / nSamples
     audio.GetSpectrumData(data, 0, FFTWindow.BlackmanHarris); 
     int n1 = (int)Mathf.Floor(fLow * samples / fMax);
     int n2 = (int)Mathf.Floor(fHigh * samples / fMax);
     float sum = 0;
     // average the volumes of frequencies fLow to fHigh
     for (var i=n1; i<=n2; i++){
         sum += data[i];
     }
     return sum / (n2 - n1 + 1);
 }

GetVolume virtually always returns 0 (with random exceptions), and HumanFreq doesn't distinguish whether it's a human voice or just noise. Am I missing something? I tried changing the samples value, but with no effect.
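For reference, the median-of-absolute-samples estimate in GetVolume is sound as a robust loudness measure, and it's easy to verify outside Unity with synthetic data (plain Python, names are mine), which helps separate a logic bug from a silent input buffer:

```python
def median_abs_volume(samples):
    """Median of absolute sample values: a loudness estimate that
    ignores brief spikes, mirroring the GetVolume function above."""
    s = sorted(abs(x) for x in samples)
    return s[len(s) // 2]

silence = [0.0] * 256
tone = [0.5 if i % 2 else -0.5 for i in range(256)]  # square-ish wave
print(median_abs_volume(silence))  # 0.0
print(median_abs_volume(tone))     # 0.5
```

Since the math returns sensible values on non-silent input, constant zeros from the Unity version would suggest GetOutputData is handing back an empty or silent buffer (e.g. the microphone clip never reaching the AudioSource) rather than a fault in the median itself.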

aldonaletto · Jul 21, 2012 at 05:31 PM

@Hannibalov, you should post a new question - this one is way too crowded, and we can't post answers to answers!
