Introduction
Voice-controlled AI assistants are becoming increasingly popular, and we incorporated some of that utility into Zeus, Please! A well-implemented voice control system can add a far deeper sense of interactivity and play to your game, so I'll walk through how you can add this functionality to your own projects.
Requirements and Conditions
Before we begin, there are certain conditions to implementing this system. Because my method uses the inbuilt Windows Speech system, it is only applicable to projects being built for Windows, on Windows. Secondly, the code works by recognizing certain phrases and performing preset actions in response; if you are looking for a dictation system, this is not the tutorial for you. Lastly (and most obviously), you need a microphone plugged in.
Step 1: Setting up Unity
You can either add this functionality to an existing project or create a new one – either works. The setting you have to change is under Player Settings, accessible from Edit -> Project Settings -> Player, or File -> Build Settings and pressing the “Player Settings” button under the platform selection list.
Within the player settings, we want to allow Unity to use our microphone. This is under the options specific to the “Universal Windows Platform” build. Under that tab, open up the “Publishing Settings”, and scroll down to the “Capabilities” list. Ensure that “Microphone” has a check mark so that Unity is given access.
Note that, despite the setting being in the “Universal Windows Platform” build, the application will still work even if you build using the “Windows Standalone” setting. You do NOT need to use the “Universal Windows Platform” build type.
Step 2: Programming
Now we can begin coding. For this tutorial we’re going to just send messages to the console, but you can extend the application’s behaviors very easily.
We’ll introduce a script into the scene, calling it “voiceRecognition” – depending on what we want it to do, we can attach it to whatever GameObject makes the most sense. In my case, I’ll add it to an empty GameObject.
By default, the script won’t have access to any of the voice recognition types, so we first need to add in the Windows Speech namespace. At the top of the script, we add the following line:
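```csharp
using UnityEngine.Windows.Speech;
```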
This gives us access to all the Windows Speech functions we need to get voice recognition to work.
Within the script, we declare three variables: a "KeywordRecognizer", which listens out for certain words or phrases; a string array, which holds those words and phrases; and a "ConfidenceLevel", which tells Unity how strict you want the detection to be. We add these before the "Start" function:
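Something like the following works (the variable names here are just illustrative - pick whatever suits your project):

```csharp
// Listens for our keywords and fires an event when one is matched.
private KeywordRecognizer recognizer;

// The words or phrases we want to be recognized.
private string[] phrases = { "hello" };

// How confident the detection must be before a match counts.
private ConfidenceLevel confidence = ConfidenceLevel.Medium;
```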
In this example, I've used "hello" as the word to be recognized, but you can also use phrases. I would also recommend using regular, existing words for simplicity. You can use proper nouns or words of your own creation, but you may need to spell things out phonetically (as we had to with Dionysus, which was stored as "dye oh nice us").
An array can hold more than one value, so the "KeywordRecognizer" can listen for multiple phrases at once. If I wanted it to also listen for "hi" and "hey", I could replace the declaration with:
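```csharp
private string[] phrases = { "hello", "hi", "hey" };
```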
Now that our variables are in place, we need to initialize the "KeywordRecognizer" and tell Unity what to do when it recognizes something. In the Start() method, we add three more lines:
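Continuing with the illustrative names from above:

```csharp
void Start()
{
    // Create the recognizer with our phrase list and strictness level.
    recognizer = new KeywordRecognizer(phrases, confidence);

    // Tell Unity which method to run when a phrase is recognized.
    recognizer.OnPhraseRecognized += Listen;

    // Begin listening.
    recognizer.Start();
}
```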
You'll likely get an error where "Listen" is written - we'll fix this in a moment, but first I'll explain what each line is doing. The first line initializes the "KeywordRecognizer", giving it our list of words to listen for and how strict the detection should be. The second line tells Unity what to do when a matching word or phrase is detected. It's an event, so we use the "+=" syntax to give it a method to run - we haven't defined "Listen" yet, which is why we're getting an error. The last line tells Unity to actually start listening.
Now, let's fix our error. We'll create a method called "Listen", which we need to structure like so:
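```csharp
private void Listen(PhraseRecognizedEventArgs speech)
{
    // Log the recognized phrase to the console.
    Debug.Log(speech.text);
}
```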
The "PhraseRecognizedEventArgs" contains information about what Unity thought it detected, which needs to be passed on. Within this method we can have Unity do whatever it is we want - play an animation or update a score, for example. We can also use the information that the speech detection gives us, as I have in this example - sending it to the console. We could also use "speech.text" in a conditional statement to change behavior depending on what was said.
In the same way that we can start a "KeywordRecognizer", we can also stop it. This is useful if you don't want Unity to be listening at a certain point in the game. You can also call these stop/start functions in the "Listen" method, so if you want Unity to only listen for a phrase once, and then stop listening, you'd adjust the "Listen" function to be:
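```csharp
private void Listen(PhraseRecognizedEventArgs speech)
{
    Debug.Log(speech.text);

    // Stop listening after the first recognized phrase.
    recognizer.Stop();
}
```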
A final point: you can have more than one "KeywordRecognizer" active at once. This means you don't need to have all the behaviors in one function - we had one recognizer per action and gave each its own array of words and phrases. It also means you can have phrases that activate and deactivate other "KeywordRecognizer" instances. What we did for Zeus, Please! was to have a master activation phrase (which, of course, was just "Zeus please") that would enable other recognizers depending on the current state of the game. Essentially, we built our own "Hey Siri"/"Ok Google" system! A rough sketch of that idea follows below.
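As a minimal sketch (the class name, command words, and handler names here are illustrative, not our actual code):

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class MasterActivation : MonoBehaviour
{
    private KeywordRecognizer masterRecognizer;
    private KeywordRecognizer commandRecognizer;

    void Start()
    {
        // The master recognizer only listens for the activation phrase.
        masterRecognizer = new KeywordRecognizer(new[] { "zeus please" }, ConfidenceLevel.Medium);
        masterRecognizer.OnPhraseRecognized += OnActivate;
        masterRecognizer.Start();

        // The command recognizer holds the actual commands,
        // but stays inactive until the activation phrase is heard.
        commandRecognizer = new KeywordRecognizer(new[] { "smite", "bless" }, ConfidenceLevel.Medium);
        commandRecognizer.OnPhraseRecognized += OnCommand;
    }

    private void OnActivate(PhraseRecognizedEventArgs speech)
    {
        // Hand off: stop the master and start listening for commands.
        masterRecognizer.Stop();
        commandRecognizer.Start();
    }

    private void OnCommand(PhraseRecognizedEventArgs speech)
    {
        Debug.Log("Command heard: " + speech.text);

        // Return to waiting for the activation phrase.
        commandRecognizer.Stop();
        masterRecognizer.Start();
    }
}
```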
I hope this helps you bring your Unity project to the next level. I'd love to hear if any of you do implement this, and if you have any questions or suggestions, feel free to leave a comment!