Question by carlitoselmago · Jul 22, 2016 at 08:17 AM · audiosource, speech, tts

Using Speechlib from SAPI (Microsoft text to speech API) as an AudioSource

I'm building an app which has a chatbot and uses SAPI for text to speech, along with the SALSA asset for lip sync. What I'm trying to accomplish is to create a live AudioSource that feeds directly from the TTS audio output. I have successfully done this by saving a wav file for each sentence and then loading the wav files at runtime into the GameObject that has the lip sync etc. This works, but the continuous loading of wav files makes the app slow; it freezes each time it loads one and sometimes even crashes.

I know it's possible to make a live audiosource from a microphone on the computer. So what I want to do is something like that.

I tried what, at my naive level as a programmer, seemed the logical way: simply connect the audio output stream from the TTS to the AudioSource's audio clip, like this:

TTSvoice.AudioOutputStream = AudioSource.clip;

and I get this error:

error CS0029: Cannot implicitly convert type `UnityEngine.AudioClip' to `SpeechLib.ISpeechBaseStream'

I know that in Python you can connect audio objects from different libraries through numpy, by converting the audio to a standard raw data array. But I'm also kind of new to C# and Unity.

Here's my code:

 using UnityEngine;
 
 using System.Collections;
 using SpeechLib;
 using System.Xml;
 using System.IO;
 using System;
 using System.Diagnostics;
 
 public class controller : MonoBehaviour {
 
     private SpVoice voice;
     public AudioSource soundvoice;
     private int counter; // sentence counter used by talksome()
 
     // Use this for initialization
     void Start () {
 
         voice = new SpVoice();
 
         GameObject character = GameObject.Find("character");
         soundvoice = character.GetComponent<AudioSource>();
 
         // This is the line that fails with CS0029: an AudioClip is not an ISpeechBaseStream
         voice.AudioOutputStream = soundvoice.clip;
       
         StartCoroutine(talksome());
     }
     
     // Update is called once per frame
     void Update () {
 
         
 
     }
 
     IEnumerator talksome() {
         while (true)
         {
             counter++;
             string sentence = "counting " + counter;
             voice.Speak(sentence);
             print(sentence);
 
             voice.WaitUntilDone(1);
             yield return new WaitForSeconds(2);
         }
     }
 }
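
One possible direction, sketched here under assumptions and not tested in this project: instead of assigning the clip to the stream, let SAPI render into an in-memory stream and copy the raw samples into a new AudioClip, much like the numpy idea above. The class name `TtsToAudioSource` and the methods `Say` and `PcmToFloat` are hypothetical names made up for this sketch. It assumes the SpeechLib interop exposes `SpMemoryStream` with `Format.Type` and `GetData()`, that the stream holds headerless 16-bit mono PCM at 22050 Hz, and that `Speak` with the default flag returns only after synthesis has finished.

```csharp
using System;
using SpeechLib;
using UnityEngine;

// Hypothetical helper, not part of the original post.
public class TtsToAudioSource : MonoBehaviour
{
    public AudioSource soundvoice;

    // Assumption: must match the SAFT22kHz16BitMono format chosen below.
    private const int SampleRate = 22050;

    private SpVoice voice;

    void Start()
    {
        voice = new SpVoice();
        Say("counting 1");
    }

    // Synthesizes one sentence into memory and plays it on the AudioSource.
    public void Say(string sentence)
    {
        // Render into memory instead of the sound card or a wav file.
        SpMemoryStream stream = new SpMemoryStream();
        stream.Format.Type = SpeechAudioFormatType.SAFT22kHz16BitMono;
        voice.AudioOutputStream = stream;

        // Default flag = synchronous: this call blocks until synthesis is done.
        voice.Speak(sentence, SpeechVoiceSpeakFlags.SVSFDefault);

        // Assumption: GetData() returns the raw PCM bytes boxed as an object.
        byte[] pcm = (byte[])stream.GetData();
        float[] samples = PcmToFloat(pcm);
        if (samples.Length == 0)
        {
            return; // nothing was synthesized
        }

        // Mono clip, so the length in samples equals the array length.
        AudioClip clip = AudioClip.Create("tts", samples.Length, 1, SampleRate, false);
        clip.SetData(samples, 0);

        soundvoice.clip = clip;
        soundvoice.Play();
    }

    // Converts little-endian 16-bit PCM bytes to floats in the range [-1, 1).
    private static float[] PcmToFloat(byte[] pcm)
    {
        float[] samples = new float[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            samples[i] = BitConverter.ToInt16(pcm, i * 2) / 32768f;
        }
        return samples;
    }
}
```

This avoids the disk round trip but is still not a true live stream: each sentence is fully synthesized before playback starts, and the synchronous `Speak` call may stall the frame for long sentences.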
 





2 Replies

Answer by kapor · Jun 15, 2017 at 12:03 AM

Hi. Did you find any solution for this, @carlitoselmago? Thanks. g.

Answer by macanannym_unity · Sep 06, 2017 at 09:16 PM

If you're trying to go from microphone to avatar, then you should know that TTS includes the viseme callbacks, so that lip syncing seems more natural in apps that can pull that information (or .Net).

However, your live voice doesn't include this when it is read from an audio source such as a microphone. In short, you really can't accurately do what you are asking, unless you have some crazy code that captures your voice, runs speech to text on it, deciphers phonemes into viseme callbacks, and then sends the info to your other script that controls those viseme blends.

That would be quite a task. With what I'm doing, there is about a 1/2 delay to decipher what I have said, look it up in the dictionary, then perform the associated task. And I've got a pretty boss machine as well (all SSD, 8 core, 16 GB, 970 Asus etc.), so it really comes down to how fast you can transpose your voice to text and then to the appropriate blends. Good luck. If anyone gets this to work in a quick manner, I'm sure we would all appreciate seeing the code! Hope this helps.

//pardon my geekspeak as I really don't know all the proper TTS terms, but I think this covers it.
