Announcements, product updates, and technical posts from the Serenade team.
Software development involves much more than just writing code, and Serenade aims to bring voice to the entire development process. As a developer, chances are you spend a lot of time in a web browser reading documentation, browsing code on GitHub, and looking up answers on Stack Overflow. Today, we’re excited to bring voice to all of these workflows with the launch of Serenade for Chrome.
Serenade for Chrome brings Serenade’s powerful voice commands to the web browser. Navigation is as simple as saying “open stack overflow” or “back”, you can manage tabs with commands like “new tab” or “next tab”, and text can be input with commands like “type hello”, the same commands you’re already accustomed to using in Atom and VS Code.
Here’s a demo of Serenade for Chrome in action:
Serenade for Chrome also introduces the new “show” command, which can be used to show selectable links, inputs, and code. For instance, by saying “show code” followed by a number, you can copy a block of code from a Stack Overflow answer or GitHub gist, then paste it into your editor by just saying “paste”. Or, you can use “show links” followed by a number to navigate link-heavy pages.
All of Serenade’s text editing commands, like “delete next two words” and “copy previous line”, are available in Chrome as well, whether you’re typing a GitHub search or a Gmail reply.
We’re excited to hear your feedback! If you run into any issues or have ideas for features, don’t hesitate to reach out in the community channel.
Today, we’re excited to launch a new version of the Serenade speech engine after several months of development. Our speech engine is now faster and more powerful than ever before, which should boost the productivity of every developer using Serenade.
The new speech engine uses a state-of-the-art acoustic model and language model with significantly better performance than our previous models. These new models are trained on a much larger set of data, which means the new engine should be more accurate across a wider variety of pronunciations. We’ve also introduced a dedicated model for noise detection, which enables Serenade to handle background noise more effectively than before.
Perhaps the biggest change in our new speech engine is that it’s able to use context from the file you’re editing. For instance, let’s say you’re editing a function, and a variable called “docusaurus” is in scope. Without any context, Serenade would probably rank the word “docusaurus” fairly low, since it’s not a common word. But, with the context from the code you’re working with, Serenade can learn that “docusaurus” is actually much more likely and rank it at the top of the alternatives list. So, as you speak the names of variables, functions, classes, etc., Serenade will know what you mean much more often.
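To make the idea of context-aware ranking concrete, here is a minimal Python sketch. It is not Serenade's actual implementation: the function names, the additive `boost`, and the scores are all assumptions chosen for illustration. It simply raises the score of any alternative that mentions an identifier found in the file being edited.

```python
import re


def identifiers_in(code):
    """Collect identifier-like tokens from source code (a crude stand-in for real scope analysis)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))


def rerank(alternatives, code, boost=2.0):
    """Re-sort (transcript, score) pairs, favoring words that appear in the code.

    The additive boost is a hypothetical scoring rule, used only for illustration.
    """
    known = {name.lower() for name in identifiers_in(code)}

    def adjusted(item):
        transcript, score = item
        hits = sum(1 for word in transcript.lower().split() if word in known)
        return score + boost * hits

    return sorted(alternatives, key=adjusted, reverse=True)


# Hypothetical file contents and made-up scores from a context-free speech model.
code = "def build(docusaurus):\n    return docusaurus.render()\n"
alternatives = [("type doc is for us", 0.6), ("type docusaurus", 0.2)]

ranked = rerank(alternatives, code)
print(ranked[0][0])
```

In this sketch, the uncommon word wins only because it appears in the surrounding code; with an empty file, the original ordering is preserved.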
Let’s talk results. In order to measure accuracy, we look at recall metrics, which measure how frequently the correct transcript was found in the top n results. For instance, recall@5 measures how frequently the correct transcript appeared in the first 5 results. Of particular importance is recall@1, which essentially measures how frequently the first result was correct, meaning no clarification commands were needed, and the developer could continue with their workflow.
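As a small illustration of the metric, recall@k can be computed along these lines. The utterances and alternatives below are made up for the example and are not taken from Serenade's evaluation data.

```python
def recall_at_k(results, k):
    """Fraction of utterances whose correct transcript appears in the top k alternatives.

    `results` is a list of (correct_transcript, ranked_alternatives) pairs.
    """
    hits = sum(1 for correct, ranked in results if correct in ranked[:k])
    return hits / len(results)


# Hypothetical evaluation set: each entry pairs the true transcript with the engine's ranked guesses.
results = [
    ("add parameter foo", ["add parameter foo", "add parameter food"]),
    ("delete first function", ["delete first functions", "delete first function"]),
    ("next tab", ["next tap", "next cab", "next tab"]),
    ("type hello", ["type yellow", "type hollow"]),
]

print(recall_at_k(results, 1))  # 0.25
print(recall_at_k(results, 2))  # 0.5
print(recall_at_k(results, 5))  # 0.75
```

Recall can only stay the same or grow as k increases, which is why recall@1 is the strictest of the three numbers.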
With our new engine, we’re seeing significantly higher recall metrics across the board.
recall@1: 35% reduction in error rate
recall@5: 57% reduction in error rate
recall@10: 60% reduction in error rate
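For readers unfamiliar with the phrasing, a "reduction in error rate" is usually read as a relative drop in the share of misses (1 minus recall). The sketch below assumes that reading. The absolute recall values in it are hypothetical, picked only so the arithmetic lands on 35%; they are not measurements from the old or new engine.

```python
def error_rate_reduction(old_recall, new_recall):
    """Relative reduction in error rate, where error rate is 1 - recall."""
    old_error = 1.0 - old_recall
    new_error = 1.0 - new_recall
    return (old_error - new_error) / old_error


# Hypothetical numbers: recall@1 going from 80% to 87% means the
# error rate falls from 20% to 13%, a 35% relative reduction.
reduction = error_rate_reduction(0.80, 0.87)
print(f"{reduction:.0%}")
```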
In addition to having significantly higher accuracy, our new speech engine is much faster. Previously, live transcript results would only appear every ~700ms due to limitations in our speech processing and streaming backend. Now, our speech engine is able to respond much more quickly, using smaller chunks of audio, so live results will appear much more frequently. You should also see a much shorter delay between when you finish speaking a voice command and when the result appears in your editor, which helps keep you in your development flow.
speech decoding speed: ~50% faster
end-to-end speed: ~60% faster
We hope these changes will help you be more productive than ever when coding with voice. If you have any questions or feedback, don’t hesitate to reach out to us in the community channel.
Last year, I developed a repetitive strain injury, commonly known as an RSI, in my wrists. With this condition, typing at a keyboard for even a few minutes caused immense hand pain—years of sitting at a computer for 8+ hours a day finally caught up to me. Suddenly, it seemed like I wouldn’t be able to write code anymore. After all, the vast majority of code is written using a keyboard and mouse, tools that I could no longer use.
I looked around to see what people with similar conditions were doing. For some, physical therapy (through stretching and exercises) caused the pain to subside. For others, switching to an ergonomic keyboard and mouse (like the Kinesis keyboard or Evoluent vertical mouse) enabled them to use a computer without discomfort. Neither worked for me.
So, I turned to dictation software, since that didn’t require my hands at all. With a few of these dictation apps, I was able to start writing code again (which felt amazing!) and I was really impressed with the state of speech technology. But, I was far from fully productive, as the learning curve was quite steep. Some apps required you to speak using the NATO alphabet, and others required you to define and memorize your own mapping of words to keystrokes (e.g., “pineapple” for the “enter” key, since you don’t often say “pineapple” when programming). Even after that learning curve, needing to dictate every character that occurred in source code was much too slow: creating a function in Python by saying “def hello left parenthesis right parenthesis colon newline indent…” simply isn’t efficient.
With nothing allowing me to be sufficiently productive, I needed to leave my job. I knew there had to be a better way to write code without a keyboard. So, I started working on a prototype of a new voice coding app (with someone else typing for me to start) called Serenade, alongside my close friend Tommy, now my co-founder. We wanted to create a product that was really easy to use, to the point where you could just speak naturally, as you would in a conversation, and code would be written for you. As Serenade got better and better, I could slowly feel my productivity increasing.
Fast-forward to today, and I’m fully productive again using Serenade. In fact, I’m using Serenade full-time to build itself. Serenade is unlike any other voice programming solution in a few ways:
Serenade comes with its own speech-to-text engine, using a custom model specifically designed for code. Most other speech-to-text technologies are trained on typical conversations between people, which isn’t ideal for code. After all, how often do you say “attr” or “enum” in conversational speech? Instead, Serenade learns common programming constructs, variable names, and other words you’d say when programming, making it much more accurate for coding.
Dictating code word-for-word (or even worse, letter-for-letter) is really slow. Instead of relying on just dictation, Serenade uses natural English input, so to create a function called hello, you can just say “create function hello”, without needing to worry about any syntax or memorization. In the same way, you can naturally describe manipulations to existing code, like “delete class” or “add parameter url”.
If Serenade isn’t confident in what you said, you’ll see a list of alternatives you can choose from. With many speech apps that only use the first result, it can be frustrating to repeat yourself just to correct a single word. Instead, Serenade allows you to just select a different result, which can dramatically streamline your workflow.
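To give a feel for how commands like “create function hello” and “add parameter url” differ from character-by-character dictation, here is a toy Python sketch. It is purely illustrative and is not how Serenade works internally: `apply_command` is a hypothetical helper that handles only these two phrasings with simple string matching, and the generated `pass` body is an assumption.

```python
import re


def apply_command(source, command):
    """Apply one toy voice command to a Python source string (illustrative only)."""
    words = command.lower().split()

    if words[:2] == ["create", "function"] and len(words) > 2:
        # Join the remaining words into a snake_case function name.
        name = "_".join(words[2:])
        return source + f"def {name}():\n    pass\n"

    if words[:2] == ["add", "parameter"] and len(words) > 2:
        param = "_".join(words[2:])
        # Add the parameter to the last function signature found in the source.
        matches = list(re.finditer(r"def (\w+)\((.*?)\):", source))
        if not matches:
            return source
        last = matches[-1]
        existing = last.group(2).strip()
        params = f"{existing}, {param}" if existing else param
        signature = f"def {last.group(1)}({params}):"
        return source[:last.start()] + signature + source[last.end():]

    raise ValueError(f"unsupported command: {command!r}")


source = apply_command("", "create function hello")
source = apply_command(source, "add parameter url")
print(source)
```

Two short phrases produce a complete function signature, with no need to speak parentheses, colons, or indentation.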
Coding by voice with Serenade can actually be faster than using a keyboard and mouse. (And, it’s certainly more fun.)
Is your cursor at the bottom of your screen, but you know you want to delete the function at the top of the file? Just say “delete first function”.
Are you in the middle of writing a function, and you realize that you forgot to pass in a variable called foo? Just say “add parameter foo”.
Do you have a dictionary that really should be defined as an enum instead? Just say “convert dictionary to enum”.
Many editors and IDEs have similar refactoring functionalities, but speaking is often more efficient than navigating menus upon menus or memorizing hundreds of keyboard shortcuts. And, the same Serenade commands work across any programming language, so whether you’re writing TypeScript or Python, the same natural commands like “add enum colors” will work.
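As an illustration of the kind of refactor a command like “convert dictionary to enum” describes, here is what the before and after might look like in Python. The names and values are hypothetical, and the exact code Serenade generates may differ.

```python
from enum import Enum

# Before: a plain dictionary standing in for a fixed set of named values.
COLORS = {"RED": 1, "GREEN": 2, "BLUE": 3}


# After: the same names and values expressed as an enum.
class Colors(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


# Python's functional Enum API can perform the same conversion at runtime.
ColorsFromDict = Enum("Colors", COLORS)

print(Colors.GREEN.value)
print([member.name for member in ColorsFromDict])
```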
You can talk faster than you can type. We’re building a world where you’ll be able to code faster than you ever could before.