Getting Started

The Serenade API is a powerful way to write your own custom voice commands. With the Serenade API, you can create custom automations (like keypresses, clicks, and more), custom pronunciations, and custom snippets (which insert customizable code snippets into your editor).

All of your custom commands will be defined in JavaScript (Node.js) files in the ~/.serenade/scripts directory. Any JavaScript file in that directory will be loaded by Serenade, and you can also require other files and third-party libraries. Each script in that directory will have access to a global object called serenade that serves as the entry point for the Serenade API. If you prefer, you can also symlink ~/.serenade/scripts to another directory on your device.

A repository of example custom commands can be found at https://github.com/serenadeai/custom-commands. Feel free to use these commands directly, or use them as a reference for writing your own. If you do create your own commands, open up a pull request to that repository to share them with other Serenade developers!

Automations

With custom automations, you can create your own voice commands to automate keypresses, clicks, and more. For instance, you could write a voice command to search Stack Overflow or switch to a terminal and build your project.

Defining Automations

Custom automations are defined in ~/.serenade/scripts/. Serenade has already created a file called ~/.serenade/scripts/custom.js for you, which you can use as a starting point. You can create other files and install npm packages in that directory as well.

Every script in ~/.serenade/scripts has a global variable called serenade in scope, which can be used to create voice commands. A custom automation can either be global, meaning it can be activated from any application, or it can be scoped to one or more apps. For instance, if you only wanted to be able to trigger a certain automation from Chrome, you could scope the command to chrome, a case-insensitive substring of the process name.

All output from scripts in ~/.serenade/scripts is piped to ~/.serenade/serenade.log, so if you use functions like console.log from your custom automations, output will appear in ~/.serenade/serenade.log.

Let's look at an example. The custom automation below will bring your terminal to the foreground (launching it if it's not already running), type in a bash command to make a project, and execute it.

serenade.global().command("make", async (api) => {
  await api.focusOrLaunchApplication("terminal");
  await api.typeText("make clean && make");
  await api.pressKey("return");
});

The global method on the serenade object specifies that we'd like this command to be triggerable from any application. The command method takes two arguments:

  • The voice command you want to create, specified as a string
  • The automation that will be executed when you speak that command, specified as a callback. The api object that's passed to the callback as the first argument has a variety of automation methods, all of which are outlined in the API Reference.

In this example, we used focusOrLaunchApplication to bring the terminal app to the foreground if it's running and to launch it if not, then typeText to type a string of text, and finally, pressKey to press a key on the keyboard.

Let's look at another example. This custom automation will search the current web page in Chrome for a string you specify with voice. For instance, you could trigger this automation by saying find hello world:

serenade.app("chrome").command("find <%text%>", async (api, matches) => {
  await api.pressKey("f", ["command"]);
  await api.typeText(matches.text);
});

This time, rather than specifying global(), we used app("chrome") to make this command valid only when Google Chrome is in the foreground. In the first argument, surrounding text in <% %> creates a matching slot that will match anything. The words matched by a slot are passed to the callback via the matches parameter. So, for example, if you said find hello world, this command would be triggered, and matches.text would have a value of hello world. This automation will press the f key while holding down the command key, which will open Chrome's search box, then will type in whatever you said into the box.

You can specify multiple slots in a voice command, and matches will be populated with all of them.
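As a rough sketch of a command with two slots, the script below searches Stack Overflow for a query restricted to a tag. The buildSearchUrl helper, the slot names tag and query, and the use of command+L to focus Chrome's address bar on macOS are assumptions made for illustration. The typeof guard is only there so the file can also be run outside Serenade, where the serenade global doesn't exist.

```javascript
// Hypothetical helper: build a Stack Overflow search URL for a tag and a query.
function buildSearchUrl(tag, query) {
  return (
    "https://stackoverflow.com/search?q=" +
    encodeURIComponent(`[${tag}] ${query}`)
  );
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade
    .app("chrome")
    .command("search <%tag%> for <%query%>", async (api, matches) => {
      // matches.tag and matches.query hold the words matched by each slot.
      await api.pressKey("l", ["command"]); // assumed: focuses the address bar on macOS
      await api.typeText(buildSearchUrl(matches.tag, matches.query));
      await api.pressKey("return");
    });
}
```

With this in place, saying search python for list comprehension would populate matches.tag with python and matches.query with list comprehension.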

Dynamic Automations

After defining an automation, you can dynamically enable or disable it using the Serenade API. For instance, suppose you wanted to create voice commands to enter and exit a "mode" where only some commands are valid. You could implement something like this:

const spellingModeCommands = [
  serenade.global().key("alpha", "a"),
  serenade.global().key("bravo", "b")
  // and more!
];

// disabled by default, until you say "start spelling"
serenade.global().disable(spellingModeCommands);

serenade.global().command("start spelling", async (api) => {
  serenade.global().enable(spellingModeCommands);
});

serenade.global().command("stop spelling", async (api) => {
  serenade.global().disable(spellingModeCommands);
});

Snippets

With custom snippets, you can create shortcuts for code you write regularly. Like custom automations, custom snippets are defined via JavaScript files in the ~/.serenade/scripts directory. To write custom snippets, create a JavaScript file in ~/.serenade/scripts, like ~/.serenade/scripts/snippets.js, and then you can use the Serenade API to register new voice commands.

Here's an example of a snippet that creates a new Python method whose name is prefixed with test_.

serenade.language("python").snippet(
  "test method <%name%>",
  "def test_<%name%>(self):<%newline%><%indent%>pass",
  { "name": ["identifier", "underscores"] },
  "method"
);

Now, if you say test method foo, the following code will be generated:

def test_foo(self):
    pass

The snippet method takes four parameters:

  • A string that specifies the trigger for the voice command. Surrounding text in <% %> creates a matching slot that matches any text. You can then reference the matched text in the generated snippets, much like regular expression capture groups.
  • A snippet to generate. If you defined a matching slot called <%name%> in the trigger, then <%name%> in the snippet will be replaced by the words that were matched in the transcript.
  • A map of slots to styles. Styles describe how text should be formatted, and a slot can have multiple styles. For instance, if a slot represents an identifier (e.g., a class name) where symbols aren't allowed, and that identifier should be pascal case, then the values ["identifier", "pascal"] could be used. See the API Reference for possible values.
  • How to add the snippet to your code. In the above example, we're specifying that this block should be added as a method, so if your cursor is outside of a class, it will move to the nearest class before inserting anything, just as it would if you said "add method". The default value for this argument is statement. See the API Reference for possible values.

As another example, here's a snippet to add a new React component in a JavaScript file:

serenade.language("javascript").snippet(
  "add component <%name%>",
  "const <%name%><%cursor%>: React.FC = () => {};",
  { "name": ["identifier", "pascal"] }
);

Notice that you can use the special slot <%cursor%> to specify where the cursor will be placed after the snippet. The full list of special slots is:

  • <%cursor%>: Where the cursor will be placed after the snippet is added.
  • <%indent%>: One additional level of indentation.
  • <%newline%>: A newline.
  • <%terminator%>: The statement terminator for the current language, often a semicolon.
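To illustrate the <%terminator%> slot, here is a minimal sketch of a logging snippet. The trigger wording, the logSnippet object, and the choice of the expression style for the value slot are assumptions made for this example. The guard lets the file run outside Serenade as well.

```javascript
// Hypothetical snippet definition that ends with the language's statement terminator.
const logSnippet = {
  trigger: "log <%value%>",
  generated: "console.log(<%value%>)<%terminator%>",
  // Format the matched words as an expression, so spoken symbols are mapped.
  styles: { value: ["expression"] },
};

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade
    .language("javascript")
    .snippet(logSnippet.trigger, logSnippet.generated, logSnippet.styles);
}
```

Because no fourth argument is given, the snippet is added with the default statement behavior.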

As one last example, here's a snippet to create a Java class with an extends and implements in one command:

serenade.language("java").snippet(
  "new class <%name%> extends <%extends%> implements <%implements%>",
  "public class <%name%><%cursor%> extends <%extends%> implements <%implements%> {<%newline%>}",
  {
    "name": ["pascal", "identifier"],
    "extends": ["pascal", "identifier"],
    "implements": ["pascal", "identifier"]
  },
  "class"
);

Pronunciations

You can also create your own custom pronunciations. For instance, if Serenade consistently hears hat when you say cat, then you can simply remap hat to cat. That way, the word you intended to say is what's used in each command Serenade hears.

To define new pronunciations, you can use the .pronounce method. For instance, to remap the word prize to price, you can add the below to your custom.js file:

serenade.global().pronounce("prize", "price")

Just as with all custom commands, you can also use filters like .language and .extension.
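As a sketch of a filtered pronunciation, the script below registers a few remappings that apply only when editing Python. The word pairs and the registerPythonRemaps helper are hypothetical. The helper takes the serenade object as a parameter so it can be exercised outside Serenade.

```javascript
// Hypothetical remappings: misheard word -> intended word, for Python files only.
const pythonRemaps = {
  deaf: "def",
  lamb: "lambda",
};

// Register each remapping on a builder scoped to Python and collect the returned IDs.
function registerPythonRemaps(serenadeApi) {
  const builder = serenadeApi.language("python");
  return Object.entries(pythonRemaps).map(([before, after]) =>
    builder.pronounce(before, after)
  );
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  registerPythonRemaps(serenade);
}
```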

System

In addition to scripting automations and creating snippets, you can also customize how Serenade interacts with your system.

Accessibility API

On macOS and Windows, Serenade integrates with OS-level accessibility APIs in order to enable dictation into applications even without official Serenade plugins or the Revision Box. Since these APIs are often inconsistently implemented across applications, this behavior is opt-in. To enable accessibility API support for an application, add it to ~/.serenade/settings.json:

{
  "use_accessibility_api": [
    "slack",
    "discord"
  ]
}
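If you would rather update this setting from a script than edit it by hand, something like the following could work. This is a minimal sketch: enableAccessibilityFor is a hypothetical helper that operates on an already-parsed settings object, and reading and writing the file itself is left out.

```javascript
// Return a copy of a parsed settings object with `app` added to use_accessibility_api.
function enableAccessibilityFor(settings, app) {
  const current = settings.use_accessibility_api || [];
  // Avoid duplicate entries if the app is already listed.
  const updated = current.includes(app) ? current : [...current, app];
  return { ...settings, use_accessibility_api: updated };
}

const updatedSettings = enableAccessibilityFor(
  { use_accessibility_api: ["slack"] },
  "discord"
);
console.log(JSON.stringify(updatedSettings, null, 2));
```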

Revision Box

When Serenade can't read a text field, either because there's no dedicated Serenade plugin or because accessibility APIs aren't implemented properly, you can configure the Revision Box to appear automatically. In your ~/.serenade/settings.json, you can configure the behavior of the Revision Box on a per-application basis. In the example below, the default behavior for applications is to not show the Revision Box at all, slack always shows the Revision Box, and mail shows the Revision Box only when the accessibility API returns no value.

{
  "show_revision_box": {
    "all_apps": "never",
    "slack": "always",
    "mail": "auto"
  }
}
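The sketch below shows one way to change a per-application Revision Box mode on a parsed settings object. setRevisionBox is a hypothetical helper, and the list of modes contains only the three values that appear in the example above; whether other values exist is an open assumption.

```javascript
// Modes taken from the example above; treated here as the full set (an assumption).
const REVISION_BOX_MODES = ["never", "always", "auto"];

// Return a copy of a parsed settings object with the mode for `app` set.
function setRevisionBox(settings, app, mode) {
  if (!REVISION_BOX_MODES.includes(mode)) {
    throw new Error(`Unknown Revision Box mode: ${mode}`);
  }
  return {
    ...settings,
    show_revision_box: { ...(settings.show_revision_box || {}), [app]: mode },
  };
}

const revised = setRevisionBox(
  { show_revision_box: { all_apps: "never" } },
  "slack",
  "always"
);
console.log(JSON.stringify(revised, null, 2));
```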

API Reference

Below is a reference for all methods that are available in the Serenade API.

class Serenade

Methods to create new Builder objects with either a global scope or scoped to a single application. You can access an instance of this class via the serenade global in any script.

global()

Create a new Builder with a global scope. Any commands registered with the builder will be valid regardless of which application is focused or language is used.

app(application)

Create a new Builder scoped to the given application. Any commands registered with the builder will only be valid when the given application is in the foreground.

  • application <string> Application to scope commands to.

language(language)

Create a new Builder scoped to the given language. Any commands registered with the builder will only be valid when editing a file of the given language.

  • language <string> Language to scope commands to.

extension(extension)

Create a new Builder scoped to the given file extension. Any commands registered with the builder will only be valid when editing a file with the given extension.

  • extension <string> File extension to scope commands to.

scope(applications, languages)

Create a new Builder scoped to the given applications and languages. Any commands registered with the builder will only be valid when one of the given applications is focused and one of the given languages is being used. To specify any application or language, pass an empty list for that parameter.

  • applications <string[]> List of applications to scope commands to.
  • languages <string[]> List of languages to scope commands to.
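A minimal sketch of scope in use is shown below. The application names code and atom are assumed process-name substrings, and the triggers, the command+S shortcut, and the registerEditorCommands helper are hypothetical.

```javascript
function registerEditorCommands(serenadeApi) {
  // Valid only in the listed editors, and only while editing Python or JavaScript.
  const editors = serenadeApi.scope(["code", "atom"], ["python", "javascript"]);
  editors.key("save file", "s", ["command"]);

  // An empty application list means any application; the language must be Python.
  serenadeApi
    .scope([], ["python"])
    .text("main guard", 'if __name__ == "__main__":');
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  registerEditorCommands(serenade);
}
```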

url(urls)

Create a new Builder scoped to a specific URL or domain name when using Chrome with the Serenade extension. Commands registered with this builder will only be valid when the active tab matches one of the given URLs.

  • urls <string[]> List of URLs to scope commands to.

class Builder

Methods to register new voice commands.

command(trigger, callback[, options])

Register a new voice command.

  • trigger <string> Voice trigger for this command.
  • callback <function> Function to be executed when the specified trigger is heard. Arguments to the callback are:
    • api <object> An instance of the API class.
    • matches <object> A map from slot names to matched text.
  • options <object> Options for how this command is executed. (Available only in the latest Serenade beta.)
    • autoExecute <boolean> Whether this command executes automatically or requires confirmation. For destructive commands (e.g., closing a window), you likely want this to be false, and for non-destructive commands (e.g., scrolling up), you likely want this to be true. Defaults to false.
    • chainable <string> Whether this command can be chained together with other custom commands. Possible values are:
      • none This command is not chainable with other custom commands.
      • any This command can appear anywhere in a chain.
      • firstOnly This command can only appear as the first element of a chain.
      • lastOnly This command can only appear as the last element of a chain.
      Defaults to none.
  • Returns <string> Command ID that can be passed to enable or disable.
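As a sketch of the options argument, the script below registers one non-destructive command that executes automatically and can be chained anywhere, and one destructive command that requires confirmation and can only end a chain. The triggers, the command+W shortcut (macOS), and the registerPagingCommands helper are assumptions made for illustration.

```javascript
function registerPagingCommands(serenadeApi) {
  // Non-destructive: run immediately and allow it anywhere in a chain.
  serenadeApi.global().command(
    "jump down",
    async (api) => {
      await api.pressKey("pagedown");
    },
    { autoExecute: true, chainable: "any" }
  );

  // Destructive: require confirmation and only allow it at the end of a chain.
  serenadeApi.global().command(
    "close this window",
    async (api) => {
      await api.pressKey("w", ["command"]);
    },
    { autoExecute: false, chainable: "lastOnly" }
  );
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  registerPagingCommands(serenade);
}
```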

disable(id)

Disable a voice command.

  • id <string[] | string> List of command IDs or a single command ID, which is the return value when the command was registered.

enable(id)

Enable a voice command.

  • id <string[] | string> List of command IDs or a single command ID, which is the return value when the command was registered.

hint(word)

Give a hint to the speech engine that a word is more likely to be heard than would be assumed otherwise.

  • word <string> Word to hint to the speech engine.
  • Returns <string> Command ID that can be passed to enable or disable.

key(trigger, key[, modifiers, options])

Shortcut for the command method if you just want to map a voice trigger to a keypress. This method is equivalent to: command("trigger", async api => { api.pressKey(key, modifiers); });

  • trigger <string> Voice trigger for this command.
  • key <string> Key to press. See keys for a full list.
  • modifiers <string[]> Modifier keys (e.g., "command" or "alt") to hold down when pressing key. See keys for a full list.
  • options <object> Options for how this command is executed. (Available only in the latest Serenade beta.) See command for possible values.
  • Returns <string> Command ID that can be passed to enable or disable.
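For instance, a pair of Chrome shortcuts could be registered as follows. This is only a sketch: the triggers and the registerChromeShortcuts helper are hypothetical, and the key combinations are the macOS Chrome shortcuts for reopening a closed tab and reloading without the cache.

```javascript
function registerChromeShortcuts(serenadeApi) {
  const chrome = serenadeApi.app("chrome");
  // Each call returns a command ID that can later be passed to enable or disable.
  return [
    chrome.key("reopen tab", "t", ["command", "shift"]),
    chrome.key("hard refresh", "r", ["command", "shift"]),
  ];
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  registerChromeShortcuts(serenade);
}
```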

pronounce(before, after)

Remap the pronunciation of a word from before to after.

  • before <string> What to remap.
  • after <string> What to remap to.
  • Returns <string> Command ID that can be passed to enable or disable.

snippet(templated, generated[, transform])

Register a new snippet.

  • templated <string> A string that specifies the trigger for the voice command. Surrounding text in <% %> creates a matching slot that matches any text. You can then reference the matched text in the generated snippets, much like regular expression capture groups.
  • generated <string> A snippet to generate. You can use <% %> to reference matching slots. You can also define the default formatting for any matching slot by putting a colon after the slot's name; to specify multiple styles, separate them with commas. The default text style is lowercase. Possible values for formatting are:
    • caps All capital letters.
    • capital The first letter of the first word capitalized.
    • camel Camel case.
    • condition The condition of an if, for, while, etc.; symbols like "equals" will automatically become "==". condition implies expression.
    • dashes Dashes between words.
    • expression Any expression; symbols will be automatically mapped, so dash will become -.
    • identifier The name of a function, class, variable, etc.; symbols will be automatically escaped, so dash will become dash.
    • lowercase Spaces between words.
    • pascal Pascal case.
    • underscores Underscores between words.
  • transform <string> How to add the snippet to your code. Defaults to statement. Possible values are:
    • inline (directly at the cursor)
    • argument
    • attribute
    • catch
    • class
    • decorator
    • element (i.e., an element of a list)
    • else
    • else_if
    • entry (i.e., an element of a dictionary)
    • enum
    • extends
    • finally
    • function
    • import
    • method
    • parameter
    • return_value
    • ruleset (i.e., a CSS ruleset)
    • statement
    • tag (i.e., an HTML tag)

text(trigger, text[, options])

Shortcut for the command method if you just want to map a voice trigger to typing a string. This method is equivalent to: command("trigger", async api => { api.typeText(text); });

  • trigger <string> Voice trigger for this command.
  • text <string> Text to type.
  • options <object> Options for how this command is executed. (Available only in the latest Serenade beta.) See command for possible values.
  • Returns <string> Command ID that can be passed to enable or disable.

class API

Methods for workflow automation. An instance of API is passed as the first argument to the callback passed to the command method on a Builder. All methods on the API are async, so you should await their result, or use .then() to attach a callback.
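The two functions below sketch the same sequence of steps written with await and with .then(). The function names, the trigger, and the slack application name are assumptions made for illustration.

```javascript
// With await, each step finishes before the next one starts.
async function typeGreeting(api) {
  await api.focusApplication("slack");
  await api.typeText("hello");
  await api.pressKey("return");
}

// The same sequence expressed as a promise chain.
function typeGreetingWithThen(api) {
  return api
    .focusApplication("slack")
    .then(() => api.typeText("hello"))
    .then(() => api.pressKey("return"));
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade.global().command("say hello", typeGreeting);
}
```

Calling an API method without awaiting it (or chaining on it) would let later steps start before earlier ones have finished.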

click([button][, count])

Trigger a mouse click.

  • button <string> Mouse button to click. Can be left, right, or middle.
  • count <number> How many times to click. For instance, 2 would be a double-click, and 3 would be a triple-click.
  • Returns <Promise> Fulfills with undefined upon success.

clickButton(button)

Click a native system button matching the given text. Currently macOS only.

  • button <string> Button to click. This value is a substring of the text displayed in the button.
  • Returns <Promise> Fulfills with undefined upon success.

domBlur(selector)

Currently available only in Chrome. Remove keyboard focus from the first DOM element matching the given CSS selector string.

  • selector <string> CSS selector string corresponding to the element to defocus.

domClick(selector)

Currently available only in Chrome. Click on the first DOM element matching the given CSS selector string.

  • selector <string> CSS selector string corresponding to the element to click.

domCopy(selector)

Currently available only in Chrome. Copy all of the text contained within the first DOM element matching the given CSS selector string.

  • selector <string> CSS selector string corresponding to the element containing the text to be copied.

domFocus(selector)

Currently available only in Chrome. Give keyboard focus to the first DOM element matching the given CSS selector string.

  • selector <string> CSS selector string corresponding to the element to focus.

domScroll(selector)

Currently available only in Chrome. Scroll to the first DOM element matching the given CSS selector string.

  • selector <string> CSS selector string corresponding to the element to scroll to.

evaluateInPlugin(command)

Currently available only on VS Code. Evaluate a command inside of a plugin. On VS Code, the command argument is passed to vscode.commands.executeCommand.

  • command <string> Command to evaluate within the plugin.
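As a sketch, the command below toggles the VS Code sidebar by passing a built-in VS Code command ID through evaluateInPlugin. The trigger, the code application name, and the registerVsCodeCommands helper are assumptions made for illustration.

```javascript
function registerVsCodeCommands(serenadeApi) {
  serenadeApi.app("code").command("toggle sidebar", async (api) => {
    // The string is handed to vscode.commands.executeCommand by the plugin.
    await api.evaluateInPlugin("workbench.action.toggleSidebarVisibility");
  });
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  registerVsCodeCommands(serenade);
}
```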

focusApplication(application)

Bring an application to the foreground.

  • application <string> Application to focus. This value is a substring of the application's path.
  • Returns <Promise> Fulfills with undefined upon success.

getActiveApplication()

Get the path of the currently-active application.

  • Returns: <Promise<string>> Fulfills with the name of the active application upon success.

getClickableButtons()

Get a list of all of the buttons that can currently be clicked (i.e., are visible in the active application). Currently macOS only.

  • Returns: <Promise<string[]>> Fulfills with a list of button titles upon success.

getInstalledApplications()

Get a list of applications installed on the system.

  • Returns: <Promise<string[]>> Fulfills with a list of application paths upon success.

getMouseLocation()

Get the current mouse coordinates.

  • Returns: <Promise<{ x: number, y: number }>> Fulfills with the location of the mouse upon success.

getRunningApplications()

Get a list of currently-running applications.

  • Returns: <Promise<string[]>> Fulfills with a list of application paths upon success.
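One possible use of this list is to check whether an application is running before acting on it. In the sketch below, isRunning is a hypothetical helper that does a case-insensitive substring match on the returned paths, and the trigger and the slack application name are assumptions.

```javascript
// Resolve to true if any running application's path contains `name`.
async function isRunning(api, name) {
  const paths = await api.getRunningApplications();
  return paths.some((path) => path.toLowerCase().includes(name.toLowerCase()));
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade.global().command("close slack", async (api) => {
    if (await isRunning(api, "slack")) {
      await api.quitApplication("slack");
    } else {
      console.log("Slack is not running");
    }
  });
}
```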

launchApplication(application)

Launch an application.

  • application <string> Substring of the application to launch.
  • Returns <Promise> Fulfills with undefined upon success.

mouseDown([button])

Press the mouse down.

  • button <string> The mouse button to press. Can be left, right, or middle.
  • Returns <Promise> Fulfills with undefined upon success.

mouseUp([button])

Release a mouse press.

  • button <string> The mouse button to release. Can be left, right, or middle.
  • Returns <Promise> Fulfills with undefined upon success.
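Combined with getMouseLocation and setMouseLocation, mouseDown and mouseUp can approximate a drag. The sketch below assumes that moving the mouse while a button is held is treated as a drag by the target application; dragBy and the trigger are hypothetical.

```javascript
// Drag from the current mouse position by the given pixel offsets.
async function dragBy(api, dx, dy) {
  const { x, y } = await api.getMouseLocation();
  await api.mouseDown("left");
  await api.setMouseLocation(x + dx, y + dy);
  await api.mouseUp("left");
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade.global().command("drag right", async (api) => {
    await dragBy(api, 200, 0);
  });
}
```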

pressKey(key[, modifiers][, count])

Press a key on the keyboard, optionally while holding down other keys.

  • key <string> Key to press. Can be a letter, number, or the name of the key, like enter, backspace, or comma.
  • modifiers <string[]> List of modifier keys to hold down while pressing the key. Can be one or more of control, alt, command, option, shift, or function.
  • count <number> The number of times to press the key.
  • Returns <Promise> Fulfills with undefined upon success.

quitApplication(application)

Quit an application.

  • application <string> Substring of the application to quit.
  • Returns <Promise> Fulfills with undefined upon success.

runCommand(command)

Execute a voice command.

  • command <string> Transcript of the command to run (e.g., "go to line 1" or "next tab").
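For example, several existing voice commands could be run in sequence from one custom command. This is a sketch: runAll and the trigger are hypothetical, and next tab is the transcript mentioned above.

```javascript
// Run each transcript in order, waiting for one to finish before starting the next.
async function runAll(api, transcripts) {
  for (const transcript of transcripts) {
    await api.runCommand(transcript);
  }
}

// Register only when running inside Serenade, where the `serenade` global exists.
if (typeof serenade !== "undefined") {
  serenade.app("chrome").command("skip a tab", async (api) => {
    await runAll(api, ["next tab", "next tab"]);
  });
}
```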

runShell(command[, args][, options][, callback])

Run a command at the shell.

setMouseLocation(x, y)

Move the mouse to the given coordinates, with the origin at the top-left of the screen.

  • x <number> x-coordinate of the mouse.
  • y <number> y-coordinate of the mouse.
  • Returns <Promise> Fulfills with undefined upon success.

typeText(text)

Type a string of text.

  • text <string> Text to type.
  • Returns <Promise> Fulfills with undefined upon success.

Keys

You can speak any key name in order to reference it in a Serenade command. In addition to any letter or number, you can also say any of the below:

Transcript      Key           Transcript             Key
plus            +             dash, minus            -
star, times     *             slash, divided by      /
less than       <             greater than           >
equal           =             comma                  ,
colon           :             dot, period            .
underscore      _             semicolon              ;
bang, exclam    !             question mark          ?
tilde           ~             percent, mod           %
at              @             dollar                 $
backslash       \             hash                   #
caret           ^             ampersand              &
backtick        `             pipe                   |
left brace      {             right brace            }
left bracket    [             right bracket          ]
single quote    '             quote, double quote    "
tab             <tab>         enter, return          <enter>
space           <space>       delete                 <delete>
backspace       <backspace>   up                     <up>
down            <down>        left                   <left>
right           <right>       escape                 <escape>
pageup          <pageup>      pagedown               <pagedown>
home            <home>        end                    <end>
caps            <caps lock>   shift                  <shift>
command         <command>     control                <control>
alt             <alt>         option                 <option>
win, windows    <windows>     function, fn           <fn>
f1              <F1>          f2                     <F2>
f3              <F3>          f4                     <F4>
f5              <F5>          f6                     <F6>
f7              <F7>          f8                     <F8>
f9              <F9>          f10                    <F10>
f11             <F11>         f12                    <F12>