The Serenade Blog

Announcements, product updates, and technical posts from the Serenade team.

Serenade for Chrome

Cheng Gong

Software development involves much more than just writing code, and Serenade aims to bring voice to the entire development process. As a developer, chances are you spend a lot of time in a web browser reading documentation, browsing code on GitHub, and looking up answers on Stack Overflow. Today, we’re excited to bring voice to all of these workflows with the launch of Serenade for Chrome.

Serenade for Chrome brings Serenade’s powerful voice commands to the web browser. Navigation is as simple as saying “open stack overflow” or “back”. You can manage tabs with commands like “new tab” or “next tab”, and input text with commands like “type hello”. These are the same commands you’re already accustomed to using in Atom and VS Code.

Here’s a demo of Serenade for Chrome in action:

Serenade for Chrome also introduces the new “show” command, which can be used to show selectable links, inputs, and code. For instance, by saying “show code” followed by a number, you can copy a block of code from a Stack Overflow answer or GitHub gist, then paste it into your editor by just saying “paste”. Or, you can use “show links” followed by a number to navigate link-heavy pages.

[Image: Chrome links overlay]

All of Serenade’s text editing commands, like “type hello”, “delete next two words”, and “copy previous line”, are available in Chrome as well, whether you’re typing a GitHub search or a Gmail reply.

Serenade for Chrome is now freely available from the Chrome Web Store. To learn all of the voice commands supported in Serenade for Chrome, check out our Chrome documentation.

We’re excited to hear your feedback! If you run into any issues or have ideas for features, don’t hesitate to reach out in the community channel.

A New Speech Engine

Tommy MacWilliam

Today, we’re excited to launch a new version of the Serenade speech engine after several months of development. Our speech engine is now faster and more powerful than ever before, which should boost the productivity of every developer using Serenade.

The new speech engine uses a state-of-the-art acoustic model and language model with significantly better performance than our previous models. These new models are trained on a much larger set of data, which means the new engine should be more accurate across a wider variety of pronunciations. We’ve also introduced a dedicated model for noise detection, which enables Serenade to handle background noise more effectively than before.

Perhaps the biggest change in our new speech engine is that it’s able to use context from the file you’re editing. For instance, let’s say you’re editing a function, and a variable called “docusaurus” is in scope. Without any context, Serenade would probably rank the word “docusaurus” fairly low, since it’s not a common word. But, with the context from the code you’re working with, Serenade can learn that “docusaurus” is actually much more likely and rank it at the top of the alternatives list. So, as you speak the names of variables, functions, classes, etc., Serenade will know what you mean much more often.
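To make the idea concrete, here is a minimal sketch of context-aware re-ranking in Python. It is not Serenade's actual implementation: the Candidate class, the rerank_with_context function, the scores, and the boost value are all hypothetical, chosen only to show how identifiers found in the current file could push an otherwise unlikely transcript to the top of the alternatives list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str     # one possible transcript of what was said
    score: float  # hypothetical base score from the speech models (higher is better)


def rerank_with_context(candidates, identifiers_in_scope, boost=2.0):
    """Sort candidates by score, boosting those that mention an in-scope identifier.

    The additive boost is an assumption for illustration, not Serenade's real scoring.
    """
    in_scope = {name.lower() for name in identifiers_in_scope}

    def boosted_score(candidate):
        words = candidate.text.lower().split()
        hits = sum(1 for word in words if word in in_scope)
        return candidate.score + boost * hits

    return sorted(candidates, key=boosted_score, reverse=True)


# Made-up alternatives for a single spoken command.
candidates = [
    Candidate("type doc you saw us", 0.6),
    Candidate("type docusaurus", 0.1),
    Candidate("type dinosaurs", 0.3),
]

# Without context, the uncommon word stays at the bottom of the list.
no_context = rerank_with_context(candidates, [])

# With "docusaurus" in scope, the same candidate moves to the top.
with_context = rerank_with_context(candidates, ["docusaurus", "config"])

print([c.text for c in no_context])
print([c.text for c in with_context])
```

In this sketch the context is just a list of names. A real engine would presumably fold that context into its language model rather than apply a fixed bonus after the fact.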

Let’s talk results. In order to measure accuracy, we look at recall metrics, which measure how frequently the correct transcript was found in the top n results. For instance, recall@5 measures how frequently the correct transcript appeared in the first 5 results. Of particular importance is recall@1, which essentially measures how frequently the first result was correct, meaning no clarification commands were needed, and the developer could continue with their workflow.
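As a rough illustration of the metric, recall@n can be computed as in the sketch below. The recall_at_k function and the sample utterances are made up for this example and are not taken from our evaluation set.

```python
def recall_at_k(results, k):
    """Fraction of utterances whose correct transcript appears in the top k alternatives.

    `results` is a list of (correct_transcript, ranked_alternatives) pairs,
    with alternatives ordered from most to least likely.
    """
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(1 for correct, alternatives in results if correct in alternatives[:k])
    return hits / len(results)


# Hypothetical evaluation data: what was actually said, and what the engine returned.
utterances = [
    ("add parameter foo", ["add parameter foo", "add parameter food"]),
    ("delete class", ["delete glass", "delete class"]),
    ("type hello", ["type yellow", "type jello", "type hollow"]),
    ("next tab", ["next tab"]),
]

recall_1 = recall_at_k(utterances, 1)  # first result correct for 2 of 4 utterances
recall_5 = recall_at_k(utterances, 5)  # correct transcript in the top 5 for 3 of 4

print(f"recall@1 = {recall_1:.2f}")  # recall@1 = 0.50
print(f"recall@5 = {recall_5:.2f}")  # recall@5 = 0.75
```

Here “delete class” counts toward recall@5 but not recall@1, which is exactly the case where a developer would need to pick a different alternative before continuing.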

With our new engine, we’re seeing significantly higher recall metrics across the board.

recall@1: 35% reduction in error rate

recall@5: 57% reduction in error rate

recall@10: 60% reduction in error rate
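To see what a reduction in error rate means for recall, the sketch below assumes that the error rate is simply one minus recall. The 80% baseline is a made-up number used only to show the arithmetic; it is not a measured Serenade result.

```python
def recall_after_error_reduction(old_recall, relative_reduction):
    """New recall after the error rate (1 - recall) shrinks by a relative amount.

    Assumes error rate = 1 - recall, which is an interpretation made for this example.
    """
    old_error = 1.0 - old_recall
    new_error = old_error * (1.0 - relative_reduction)
    return 1.0 - new_error


# Hypothetical baseline: the first result was correct 80% of the time.
baseline_recall_at_1 = 0.80

# A 35% reduction in error rate takes the error rate from 20% down to 13%.
new_recall_at_1 = recall_after_error_reduction(baseline_recall_at_1, 0.35)

print(f"recall@1: {baseline_recall_at_1:.0%} -> {new_recall_at_1:.0%}")  # recall@1: 80% -> 87%
```

The same relative reduction yields a smaller absolute gain when the baseline recall is already high, which is why the improvements are reported as reductions in error rate.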

In addition to having significantly higher accuracy, our new speech engine is much faster. Previously, live transcript results would only appear every ~700ms due to limitations in our speech processing and streaming backend. Now, our speech engine is able to respond much more quickly, using smaller chunks of audio, so live results will appear much more frequently. You should also see a much shorter delay between when you finish speaking a voice command and when the result appears in your editor, which helps keep you in your development flow.

speech decoding speed: ~50% faster

end-to-end speed: ~60% faster

We hope these changes will help you be more productive than ever when coding with voice. If you have any questions or feedback, don’t hesitate to reach out to us in the community channel.

Creating Serenade

Matt Wiethoff

Last year, I developed a repetitive strain injury, commonly known as an RSI, in my wrists. With this condition, typing at a keyboard for even a few minutes caused immense hand pain—years of sitting at a computer for 8+ hours a day finally caught up to me. Suddenly, it seemed like I wouldn’t be able to write code anymore. After all, the vast majority of code is written using a keyboard and mouse, tools that I could no longer use.

I looked around to see what people with similar conditions were doing. For some, physical therapy (through stretching and exercises) caused the pain to subside. For others, switching to an ergonomic keyboard and mouse (like the Kinesis keyboard or Evoluent vertical mouse) enabled them to use a computer without discomfort. Neither worked for me.

So, I turned to dictation software, since that didn’t require my hands at all. With a few of these dictation apps, I was able to start writing code again (which felt amazing!) and I was really impressed with the state of speech technology. But, I was far from fully productive, as the learning curve was quite steep. Some apps required you to speak using the NATO alphabet, and others required you to define and memorize your own mapping of words to keystrokes (e.g., “pineapple” for the “enter” key, since you don’t often say “pineapple” when programming). Even after that learning curve, needing to dictate every character that occurred in source code was much too slow. Creating a function in Python by saying “def hello left parenthesis right parenthesis colon newline indent…” simply isn’t efficient.

With nothing allowing me to be sufficiently productive, I needed to leave my job. I knew there had to be a better way to write code without a keyboard. So, I started working on a prototype of a new voice coding app (with someone else typing for me to start) called Serenade, alongside my close friend Tommy, now my co-founder. We wanted to create a product that was really easy to use, to the point where you could just speak naturally, as you would in a conversation, and code would be written for you. As Serenade got better and better, I could slowly feel my productivity increasing.

Fast-forward to today, and I’m fully productive again using Serenade. In fact, I’m using Serenade full-time to build itself. Serenade is unlike any other voice programming solution in a few ways:

  1. Serenade comes with its own speech-to-text engine, using a custom model specifically designed for code. Most other speech-to-text technologies are trained on typical conversations between people, which isn’t ideal for code. After all, how often do you say “attr” or “enum” in conversational speech? Instead, Serenade learns common programming constructs, variable names, and other words you’d say when programming, making it much more accurate for coding.

  2. Dictating code word-for-word (or even worse, letter-for-letter) is really slow. Instead of relying on just dictation, Serenade uses natural English input, so to create a function called hello, you can just say “create function hello”, without needing to worry about any syntax or memorization. In the same way, you can naturally describe manipulations to existing code, like “delete class” or “add parameter url”.

  3. If Serenade isn’t confident in what you said, you’ll see a list of alternatives you can choose from. With many speech apps that only use the first result, it can be frustrating to repeat yourself just to correct a single word. Instead, Serenade allows you to just select a different result, which can dramatically streamline your workflow.

Coding by voice with Serenade can actually be faster than using a keyboard and mouse. (And, it’s certainly more fun.)

  • Is your cursor at the bottom of your screen, but you know you want to delete the function at the top of the file? Just say “delete first function”.

  • Are you in the middle of writing a function, and you realize that you forgot to pass in a variable called foo? Just say “add parameter foo”.

  • Do you have a dictionary that really should be defined as an enum instead? Just say “convert dictionary to enum”.

Many editors and IDEs have similar refactoring functionalities, but speaking is often more efficient than navigating menus upon menus or memorizing hundreds of keyboard shortcuts. And, the same Serenade commands work across any programming language, so whether you’re writing TypeScript or Python, the same natural commands like “add enum colors” will work.

You can talk faster than you can type. We’re building a world where you’ll be able to code faster than you ever could before.